Supertree-like Methods for Advancing Evolutionary Genomic Biology
An "unlimited thirst for genome sequencing" is driving research in many domains. Evolutionary genomic biology is no exception, as demonstrated by the 10,000 Plant Genomes Project, the Vertebrate Genomes Project (which targets roughly 70,000 species), and the Earth BioGenome Project (which aims to assemble genomes for all living species on Earth). A goal for these ultra-large datasets is to enable researchers to address fundamental questions, such as how species evolve and adapt to their environments, and how biodiversity is created and maintained. But computational advances are needed to transform these data into scientific insights. Estimating evolutionary trees is a key step in many research studies; however, many of the current leading methods are heuristics for NP-hard optimization problems, and the time required to run them on large datasets can be prohibitive. In this talk, I will present my recent work to address this challenge through the development of three new supertree-like methods. All of these methods run in polynomial time and enable provably statistically consistent estimation of phylogenies (evolutionary trees). They achieve accuracy similar to that of the current leading methods while dramatically reducing memory usage and running time. I will also discuss open challenges and future work.