CMSC 858s: Computational Genomics

About this course

This course will explore computational techniques for processing and analyzing large-scale biological sequence coming from the genomics revolution. We will study string search and comparison algorithms, data structures for efficiently storing and querying strings, and techniques such as Hidden Markov Models for finding patterns within long strings. The emphasis will be on the underlying algorithms, rather than on the use of particular tools or databases (though examples of widely used tool will be presented).

Instructor: Carl Kingsford; carlk AT; Office hours: by appointment in CBCB 3113.
Stephen Altschul (NIH) will give a significant number of the lectures.

Class Time: MWF, 10:00-10:50.

Prerequisites: No previous biological knowledge is required. A certain amount of mathematical maturity is required (that any CS Ph.D. student will likely have) and basic familiarity with algorithm design techniques will be assumed. The class will generally be self-contained.

Grading: There will be several homework sets, a midterm, an in-class final, and a class programming project.

Credit: This course is a core Ph.D. or M.S. course, and it does count towards MS Comps. The area is Bioinformatics.


Lecture slides, Notes

Slides will be posted here as they are available.

Relevant Papers are here (accessible only from computers)

  1. Introduction, Syllabus
  2. Exact String Matching, Z-Algorithm
  3. Knuth-Morris-Pratt and Boyer-Moore
  4. Seminumerical string matching
  5. Sequence Alignment
  6. Gap Scores
  7. Local Alignment Statistics
  8. Substitution Matrices
  9. BLAST
  10. Linear Space Alignment
  11. MUMmer
  12. RNA Folding
  13. MSA Weights
  14. Column Scores
  15. The Dirichlet Mixture Model
  17. Multiple Sequence Alignment
  18. Local Multiple Sequence Alignment
  19. Gibbs sampling
  20. Suffix Trees
  21. Ukkonen's algorithm, board: 1, 2, 3, 4
  22. Suffix Arrays
  23. Burrows-Wheeler transform and FM-index
  24. HMMs and Gene Finding
  25. Profile Motifs
  26. Phylogenetics
  27. Genome Assembly
  28. Network Alignment