- Instructor: Erin Molloy
Meeting Info for Fall 2022:
- Class convenes Tuesdays and Thursdays, 11am-12:15pm
- Drop-by hours are Mondays 4-6pm
- Post to CampusWire (like Piazza but better) for public matters, such as questions on course material, assignments, and exams.
- Send a message to the instructors on ELMS for personal matters.
Target Audience and Prerequisites
CMSC829A is a computer science MS/PhD qualifying course in bioinformatics. (Search CMSC829A under CS grad coursework.) The target audience for this course is graduate students from CS, ECE, AMSC, and statistics. No prior knowledge of biology is required! Familiarity with algorithms, probability, and basic statistics is expected. In addition, you should be comfortable programming in at least one language. Please contact me at ekmolloy [at] umd.edu if you are interested in this course but unsure if you should enroll.
CMSC829A covers models and algorithms for fundamental problems in evolutionary genomics, including multiple sequence alignment, phylogeny estimation (both gene and species trees), and phylogenetic network and admixture graph estimation. Evolutionary relationships among molecular sequences or among species are not only of interest in and of themselves but are also routinely leveraged in applied research, spanning pandemic response, food safety, medicine, and even national security. As a result, methods for these problems are some of the most widely used -- and widely developed -- bioinformatics tools (see MAFFT v7, IQ-TREE v2, ASTRAL v3, and PhyloNet, for example). This course will not teach you how to run such tools. Instead, we focus on the underlying algorithmic ideas, which come from discrete optimization, combinatorics, graph theory, statistics, and machine learning. As such, CMSC829A should appeal to students looking for an application-driven CS/stats/math course as well as students specifically interested in bioinformatics research.
In particular, despite significant methodological advances, the problems covered in this course are still unsolved, especially in light of recent applications (e.g., metagenomics, protein function and structure prediction, and cancer genomics) and ongoing large-scale sequencing initiatives that aim to assemble ultra-large genomic data sets. Methodological issues typically fall under computational challenges (e.g., NP-hard optimization problems and compute-intensive likelihood functions), statistical challenges (e.g., high-dimensional spaces and model misspecification), and "big data" challenges (e.g., large, heterogeneous, and error-ridden data sets). This motivates the following questions: How do we translate a biological problem into a statistical/computational problem? What are our (model) assumptions, and are they reasonable for our data? How do we rigorously evaluate methods from both a theoretical and a practical perspective? And ultimately, how do we design new and improved computational methods? Students will have the opportunity to read and discuss recent scientific papers (see reading groups) and build upon state-of-the-art methods (see projects and highlights).
All course materials will be posted to ELMS and linked to this website. Coursework can be completed by referring to the lecture slides, so there is no required textbook. However, I will point interested students to the relevant textbook readings as well as freely available resources online.
For a course to be MS/PhD qualifying, it "must primarily (at least 75%) base the course grade on a combination of homework, programming assignments, research projects, and exams. Any of these components are optional, except the course's written exam(s) which must account for at least 30% of the grade" (this information is from Tom Hurst). The final grade for CMSC829A will likely have the following breakdown:
- 30% exam
- 30% project
- 20% homework (one assignment)
- 20% reading groups (10% presentation and 10% participation)
For other course policies, refer to the syllabus.
Course evaluations are important, and the department and faculty take student feedback seriously. Near the end of the semester, students can go to http://www.courseevalum.umd.edu to complete their evaluations.
The image above illustrates evolution at a particular region of the genome (called a locus) for a species network (a) and a species tree (b). Yunheng Han and I created this image by combining ...