Description
CMSC 498Y is for undergraduate computer science students who are interested in the intersection of machine learning and computational biology and who want hands-on experience working with models and data. The course covers classical machine learning algorithms for molecular sequence data (many of which are based on hidden Markov models) as well as more recent developments (based on large language models (LLMs) and convolutional neural networks (CNNs)), along with key background from computational biology. The course is divided into modules, each focusing on a specific prediction task. For each task, we will consider (1) the biology relevant to the prediction task, (2) the input data, including how it is generated and curated for supervised training, and (3) the models used for prediction and how they are evaluated.
This year, there will be four modules:
- Basic models of biological sequences
- Protein family prediction
- RNA secondary structure prediction
- Protein secondary and tertiary structure prediction
CMSC 498Y counts towards the CS Undergraduate Major requirements under Area (2) Information Processing.