Description

This course covers selected statistical inference and machine learning algorithms for computational genomics. Why join the course? Machine learning is an exciting topic, with many major developments in recent years (some within just the last year). As you might guess, advances in NLP (e.g. LLMs) and computer vision (e.g. CNNs) are being leveraged to analyze molecular sequence data. However, if you read a scientific paper, or even a blog post (check out this recent blog post from Dr. Serafim Batzoglou), you will find that much of the text focuses on concepts from computational biology and uses domain-specific terminology such as alignment, phylogeny, coalescence, selection, and coevolution.

CMSC 498Y is for undergraduate computer science students who are interested in the intersection of machine learning and computational biology and who want hands-on experience working with models and data. The course material will cover classical machine learning algorithms for molecular sequence data (many of which are based on hidden Markov models) as well as more recent developments (based on LLMs and CNNs), along with key background from computational biology. The course will be divided into modules, each focusing on a specific prediction task. For each task, we will consider (1) the input data and how it is curated for training or testing, (2) the models used for prediction as well as how they are trained and evaluated, and (3) the relevant biology and how it is incorporated into the model or data curation (as applicable).

CMSC 498Y counts towards the CS Undergraduate Major requirements under Area (2) Information Processing.

Course Logistics

The syllabus is available on ELMS. The course schedule will be updated on this website, along with links to materials and assignments.

Web Accessibility

Please provide feedback on web accessibility to the instructor.