PhD Proposal: Reference-guided Metagenomic Assembly

Victoria Cepeda-Espinoza
08.25.2017 12:00 to 13:30
CBCB 3118

Microorganisms play an important role in all of the Earth's ecosystems and are critical for the health of humans, plants, and animals. Most microbes are not easily grown in a laboratory. Metagenomics, the analysis of organismal DNA sequences obtained directly from an environmental sample, enables the study of microorganisms that are not easily cultured.
Metagenomic studies have exploded in recent years due to the increased availability of inexpensive high-throughput sequencing technologies. Thousands of bacterial genomes have been sequenced, and the number is expected to grow rapidly in the next few years. These sequenced genomes provide a great resource for performing reference-guided assembly of metagenomic sequences. While database-driven approaches have been employed in certain analyses, they have not been used in the assembly of metagenomic data. This is due in part to the small size and biased coverage of public genome databases, and in part to the inherent computational cost of mapping tens of millions of reads to thousands of full genome sequences.
In this prospectus, I develop reference-guided computational methods to recruit and assemble metagenomic sequences. I describe MetaCompass, the first software package for the reference-guided assembly of metagenomic data. We use an indexing strategy to quickly construct sample-specific reference collections and show that this approach effectively complements de novo (non-reference-guided) assembly methods. Specifically, using data generated as part of the Human Microbiome Project, we show that combining comparative and de novo assembly approaches improves the contiguity and completeness of metagenomic assemblies.
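The idea of using an index to pick a sample-specific reference collection can be illustrated with a minimal sketch. This is not MetaCompass's implementation: the exact-match k-mer index, the function names (kmers, build_index, recruit_references), and the parameters k and min_reads are all assumptions chosen for illustration, and reverse-complement matches are ignored for brevity.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it.

    `references` is a hypothetical dict of name -> genome sequence.
    """
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def recruit_references(reads, index, k, min_reads):
    """Return the references that share a k-mer with at least min_reads reads.

    Illustrative assumption: a single shared k-mer counts as a hit, and
    only the forward strand of each read is checked.
    """
    hits = defaultdict(int)
    for read in reads:
        matched = set()
        for kmer in kmers(read, k):
            matched |= index.get(kmer, set())
        for name in matched:
            hits[name] += 1
    return sorted(name for name, count in hits.items() if count >= min_reads)


# Toy data, invented for this example only.
references = {
    "genomeA": "ACGTACGTGGCCTTAA",
    "genomeB": "TTTTGGGGCCCCAAAA",
}
reads = ["ACGTACGT", "GTGGCCTT", "GGGGCCCC"]

index = build_index(references, k=5)
selected = recruit_references(reads, index, k=5, min_reads=2)
```

In this toy case only genomeA is supported by two reads, so it alone would be kept as a reference for the guided assembly step. Reads that match no selected reference would be left to a de novo assembler.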
Examining Committee:
Chair: Dr. Mihai Pop
Dept rep: Jordan Boyd-Graber
Member: Dr. Hector Corrada Bravo