Theory and practice for phylogenomic analyses of SNP-like markers

Talk
Erin Molloy
Time: 09.03.2021 11:00 to 12:00
Location: IRB 0318

Also on Zoom: https://umd.zoom.us/j/96718034173?pwd=clNJRks5SzNUcGVxYmxkcVJGNDB4dz09

An "unlimited thirst for genome sequencing" is driving research in many domains. Evolutionary genomic biology is no exception, as demonstrated by the 10,000 Plant Genomes Project, the (60,000) Vertebrate Genomes Project, and the Earth BioGenome Project, which aims to assemble 1.5 million eukaryotic genomes in the next 10 years. A goal for these ultra-large datasets is to enable researchers to address fundamental questions, such as how species evolve and adapt to their environments and how biodiversity is created and maintained. Estimating evolutionary histories is a key step in many research studies. In this talk, we will focus on recent methodological advances for estimating evolutionary trees and networks (admixture graphs) from SNP-like markers, that is, markers that can be modeled under the neutral Wright-Fisher + infinite-sites model.

In the first half of this talk, I will present two new quartet-based methods for species tree estimation. These methods are statistically consistent and outperform traditional parsimony-based methods, especially when the species tree is in the anomaly zone. Furthermore, the use of quartets enables efficient estimation of branch lengths and support values.

In the second half of this talk, we will turn our attention to admixture graphs, specifically the popular estimation method TreeMix, which operates by computing an evolutionary tree and then iteratively augmenting it with admixture (or gene flow) edges. As I will show, TreeMix and related methods are guaranteed to get stuck in a local optimum and return an incorrect network topology for even a simple model with one admixed population incident to a leaf. This motivates the introduction (and evaluation) of a new graph search strategy, referred to as maximum likelihood network orientation (MLNO). Overall, these results provide insights into the performance of existing methods and suggest future directions for research.