Finding needles in genomic haystacks: achieving single-nucleotide resolution via accurate and efficient multiple sequence alignment
The DNA data deluge is upon us. Hundreds of thousands of draft and complete genomes comprising closely related pathogen strains are now available from public databases. These data provide us with an unprecedented opportunity to track genome evolution across all domains of life and trace the spread of infectious disease. Multiple sequence alignment has proven to be a versatile tool for global and local comparison of DNA sequences. However, exact multiple sequence alignment under practical scoring schemes requires O(n^k) time for k sequences of length n, which makes it intractable with traditional techniques when both n and k are large. This talk will highlight computational strategies and new data structures that circumvent this bottleneck in pursuit of single-nucleotide resolution for multiple genome comparison and ab initio repeat family detection. I will conclude with emerging research opportunities in genomics, framed by the contributions highlighted in this talk.
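To make the O(n^k) bottleneck concrete, here is a minimal sketch that counts the cells in the dynamic-programming lattice a textbook exact aligner would fill. It assumes the standard k-dimensional generalization of pairwise alignment, in which each cell looks at up to 2^k - 1 predecessor cells. The function names are hypothetical and chosen for illustration only; they are not taken from the talk or from any library.

```python
def dp_lattice_cells(n: int, k: int) -> int:
    """Cells in the k-dimensional DP lattice for k sequences of length n.

    Each sequence contributes n + 1 positions (including the empty prefix),
    so the lattice holds (n + 1)**k cells, which grows as O(n^k).
    """
    return (n + 1) ** k


def predecessors_per_cell(k: int) -> int:
    """Upper bound on predecessor cells examined per lattice cell.

    Each of the k sequences either advances or emits a gap in a column;
    the all-gap column is excluded, leaving 2**k - 1 possible moves.
    """
    return 2 ** k - 1


if __name__ == "__main__":
    n = 1000  # illustrative sequence length, far shorter than a genome
    for k in (2, 3, 5, 10):
        cells = dp_lattice_cells(n, k)
        moves = predecessors_per_cell(k)
        print(f"k={k:2d}  cells={cells:.3e}  moves per cell={moves}")
```

Even at a length of 1,000 bases, the cell count grows by roughly three orders of magnitude with each added sequence, which is why whole-genome comparison of many strains calls for approaches other than filling the full lattice.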