Course Logistics
Instructor: Erin Molloy
Please log into ELMS (button to the left) to see course meeting time and location.
Target Audience and Prerequisites
CMSC829A is a computer science MS/PhD qualifying course in bioinformatics. (Search CMSC829A under CS grad coursework.) The target audience for this course is graduate students from CS, ECE, AMSC, and statistics. No prior knowledge of biology is required!
Familiarity with algorithms, probability, and basic statistics is required; in addition, you should be very comfortable programming in at least one language. The course assignment will be given in Python; however, you may complete the assignment in any language of your choosing. If you do not use Python, it is your responsibility to re-write any functions that were distributed with the assignment, like those for reading the input data. Please come to office hours if you are unsure whether you should enroll in the course.
Biology Graduate Students. If you are a graduate student in biology and the course material is relevant to your research, you are very welcome to take this class either for credit or for audit. If taking the course for credit, you should expect to dedicate more time to the class (e.g., doing the recommended readings, starting with Appendix B in Computational Phylogenetics). I would also like to meet with you during the first few weeks of the semester to hear about your goals for the course and to discuss my expectations regarding homework and exams. Typically, the programming assignment is replaced with a data analysis assignment. Additionally, some (but not all) of the problems on written homeworks and exams may be replaced with ones more relevant/suitable to your research/background. The students eligible for these modifications must be in the following biology graduate programs: BEES and Entomology (please contact me if you think your program should be added).
CMSC829A covers core computational problems for evolutionary genomics. Topics include methods for building multiple sequence alignments and reconstructing evolutionary histories: phylogenetic trees as well as phylogenetic networks, admixture graphs, and ancestral recombination graphs. Accurate and efficient inference of these graphical models is critical for resolving fundamental questions in biology. Moreover, they are routinely leveraged in applied research, spanning pandemic response, food safety, medicine, and even national security. As a result, the methods examined in this course are some of the most widely used bioinformatics tools (see MAFFT v7, IQ-TREE v2, ASTRAL v3, and PhyloNet, for example). This course will not teach you how to run such tools. Instead, we focus on the underlying algorithmic ideas, which come from discrete optimization, combinatorics, graph theory, statistics, and machine learning.
Lastly, despite significant methodological advances over the last few decades, the problems covered in this course are still unsolved, especially in light of applications (e.g., metagenomics, protein function and structure prediction, and cancer genomics) and emerging data types. This motivates us to consider the following questions:
- How do we translate a biological problem into a statistical and/or computational problem?
- What are our (model) assumptions and are they reasonable for our data?
- How do we rigorously evaluate methods from both a theoretical and practical perspective?
- And ultimately, where should we start when designing new and improved computational methods for biologists?
Overall, CMSC829A should appeal to students looking for an application-driven CS/stats/math course as well as students specifically interested in bioinformatics research.
Use this website; most things will be linked here!
- Slides will be posted to ELMS. These are for your own use and should not be distributed.
- Graded assignments will be posted to ELMS/Gradescope.
- Graded assignments will be submitted on Gradescope.
General course communication will be through CampusWire.
- The instructor will post class-wide announcements to CampusWire. Some announcements may additionally be posted to ELMS if they are very important and time sensitive.
- You are responsible for checking your email as well as ELMS and CampusWire regularly.
- All personal course communication (e.g., about excused absences or grades) must be through ELMS. Please do NOT email the course staff.
All course materials will be posted to ELMS and linked to this website. Coursework can be completed by referring to the lecture slides, so there is no required textbook. However, there are recommended readings from the textbook Computational Phylogenetics by my PhD advisor Tandy Warnow. Many students in the past have found these readings helpful for doing the homework. In any case, if you want to work in this field, I strongly encourage you to do the readings.
For a course to be MS/PhD qualifying, it "must primarily (at least 75%) base the course grade on a combination of homework, programming assignments, research projects, and exams. Any of these components are optional, except the course's written exam(s) which must account for at least 30% of the grade" (this information is from Tom Hurst). The final grade for CMSC829A will likely have the following breakdown:
- 30% exam (in-class on Tues Nov 14th)
- 35% final project
- 25% written homeworks
- 10% programming assignment
For other course policies, refer to the syllabus.
Course evaluations are important, and the department and faculty take student feedback seriously. Near the end of the semester, students can go to http://www.courseevalum.umd.edu to complete their evaluations.
The image above illustrates evolution at a particular region of the genome (called a locus) for a species network (a) and species tree (b). Yunheng Han and I created this image using butterflies from this paper.