Systems for Machine Learning (CMSC828G)

Group Project

Overview

In the second half of the course, you will work on a project in groups of two or three. The idea is to apply the systems concepts learned in the course to accelerate and/or parallelize an ML workload. You may select the project topic yourself, but you must discuss it with the instructor once to ensure that the scope is appropriate.

Group Formation and Project Proposal

The first deadline is forming a group and submitting a project proposal. You should turn in a 1-2 page PDF in the MLSys 2025 style with the following sections.

  • Group members: Names of 2 or 3 group members.
  • Background and Motivation: Describe the systems or performance problem you want to solve.
  • Potential Approach(es): Describe your planned solution(s) to the problem. It is fine to have only rough ideas at this point.
  • Required Compute Estimate: An important part of systems for ML research is planning out your compute budget. Provide a rough estimate of the compute your project will need (CPU hours, GPU hours, etc.); see the worked example after this list.
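
As a hypothetical illustration of the level of detail expected (all numbers below are assumptions, not targets), a common back-of-the-envelope rule for training-style workloads is FLOPs ≈ 6 × N × D, where N is the parameter count and D is the number of training tokens. For example, fine-tuning a 1B-parameter model on 1B tokens for one epoch:

    FLOPs ≈ 6 × 1e9 × 1e9 = 6e18
    time  ≈ 6e18 / (0.40 × 312e12 FLOP/s) ≈ 4.8e4 s ≈ 13 A100 GPU hours per epoch

Here 312 TFLOP/s is the A100's bf16 peak and 40% is an assumed utilization. The point is to show your reasoning, not to land on an exact number.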

Interim Report

The interim report updates the project proposal with the progress made so far and should include preliminary profiling/performance results.
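
One way to gather such preliminary numbers is PyTorch's built-in profiler, shown here as a minimal sketch assuming a PyTorch workload (the model and inputs below are stand-ins for your own):

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Stand-in workload; substitute your own model and inputs.
    model = torch.nn.Linear(4096, 4096)
    x = torch.randn(64, 4096)

    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        model, x = model.cuda(), x.cuda()
        activities.append(ProfilerActivity.CUDA)

    # Record per-operator timings over a few iterations.
    with profile(activities=activities, record_shapes=True) as prof:
        for _ in range(10):
            model(x)

    # Print the top operators by total time.
    sort_key = "cuda_time_total" if torch.cuda.is_available() else "cpu_time_total"
    print(prof.key_averages().table(sort_by=sort_key, row_limit=10))

Tools such as Nsight Systems or Nsight Compute are equally appropriate; use whatever matches your workload.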

Final Report

At the end of the semester, students will present their projects in class and submit a final report that builds on the interim report. The presentations will take place during the last two weeks of class; each should be 12 minutes long, with 3-5 minutes for questions. The final report should be written like a research paper (introduction, background, methodology, results, etc.) in the MLSys format.

Important Dates

  Topic                                   Due on*
  Group formation and project proposal    Mar 10
  Interim report                          Apr 14
  Final presentation (in class)           Apr 30 - May 7
  Final report and code                   May 11

*All deadlines are at 11:59 PM Eastern time.

Project Ideas

An example project could be writing high-performance Triton kernels for an existing ML workflow in your research; another could be efficiently distributing a single-GPU workload across multiple GPUs. Please reach out to the instructor if you need help coming up with project ideas.
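
To make the first idea concrete, the sketch below shows a minimal Triton kernel (an elementwise add, assuming Triton and a CUDA-capable GPU are installed); a real project would target fused or memory-bound operations from your own workload:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide slice of the input.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the ragged final block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = x.numel()
        grid = (triton.cdiv(n, 1024),)  # one program per 1024 elements
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out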

Titles of some projects from previous semesters:

  • Explore Lossless Compression for Communication in LLM Training
  • Scaling Graph Neural Network Training Across Multiple GPUs
  • Developing Efficient WebGPU Compute Shaders to Accelerate ML Workloads
  • Efficient Implementation of LLMs on Handheld Edge Devices
  • Profiling and Optimizing KV Caching and Prefix Caching in LLM Inference
  • Efficient LLM Inference on Heterogeneous GPUs
  • Accelerating Distributed LLM Inference on FPGAs
  • Profiling and Optimizing CycleGAN