Molloy Is Designing Efficient Algorithms for Reconstructing Evolutionary Trees


Perhaps the most iconic image in evolutionary biology is Charles Darwin's sketch of an evolutionary tree. The illustration highlights Darwin’s transformational idea that the evolutionary relationships among species can be depicted through a branching pattern, a concept known as the Tree of Life.

While Darwin primarily relied on the physical characteristics shared by subsets of species to determine an evolutionary tree’s structure, scientists today are using vast amounts of genomic data to reconstruct evolutionary trees—a field known as phylogenetics.

Erin Molloy, who joins the University of Maryland on July 1 as an assistant professor in the Department of Computer Science, is part of this new genomic revolution.

She is using powerful computational tools to unlock the full breadth of information available in genomic data, designing efficient algorithms to estimate evolutionary trees. This type of information is vital for determining the evolutionary history of birds, plants and even microbes, such as SARS-CoV-2, the virus that causes COVID-19.

“The main goal is to estimate the tree—and other parameters—given the observed genomic data,” says Molloy, who will also hold an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS). “The resulting phylogeny is not only interesting in its own right, but it is also important for downstream analyses.”

For example, she notes that the phylogeny for SARS-CoV-2 is not only useful for studying how the virus evolves and how new strains emerge, but also for strain identification—that is, determining which strains are present in a sample—and even contact tracing.

Molloy recently completed a year-long postdoctoral researcher position in the Machine Learning and Genomics Lab at the University of California, Los Angeles.

She says she is looking forward to expanding her research agenda at Maryland, where she can take advantage of UMIACS’ vast computational resources.

A major line of Molloy’s research focuses on the development of phylogeny estimation methods that can effectively utilize distributed-memory systems.

In this context, she says, the genomic data set is distributed across multiple processors, and the algorithm may require these processors to communicate with each other.

“In the worst case, the processors must synchronize with each other at specific points in the algorithm,” Molloy explains. “All of this dramatically slows down the computation. My goal is to design methods that reduce communication bottlenecks, while achieving the same accuracy and statistical guarantees as existing methods.”

Molloy says she looks forward to working with graduate students and faculty at Maryland, particularly within the Center for Bioinformatics and Computational Biology.

She has previously collaborated with Mihai Pop, the director of UMIACS, on a project that utilizes estimated phylogenies to perform taxon identification and abundance profiling from metagenomics data sets.

“I hope to continue working with Mihai on problems in metagenomics, where modeling evolutionary processes could prove advantageous,” Molloy says.

Other UMIACS faculty she expects to collaborate with include Brantley Hall, who works on identifying the functions of the genes in the microbiome, and Michael Cummings, who also approaches phylogeny estimation through a high-performance computing lens.

Although Cummings and Molloy utilize different methodologies for phylogeny estimation, Molloy says working together could lead to new approaches.

“There is a lot of potential for creating a hybrid method that combines aspects of our different approaches,” she says. “I look forward to discussing these ideas with my new colleagues at the University of Maryland.”

—Story by Melissa Brachfeld
