Robust Machine Learning for Biomedical Data: Efficiency, Reliability, and Generalizability

Talk
Chenyu You
Time: 04.30.2026 11:00 to 12:00

Machine learning for healthcare is often developed on curated datasets but deployed in settings where labels are scarce, classes are imbalanced, and data distributions shift across hospitals, patient populations, and imaging modalities. This gap raises a central question: how can we build learning methods that are data-efficient, reliable, and robust to the heterogeneity of real clinical data? In this talk, I will present my work on this question. I will begin with statistically grounded methods for learning from imperfect medical data, focusing on biomedical image analysis with limited annotations and long-tailed class distributions. I will then show how to build learning frameworks with formal guarantees, including methods for provably accurate anatomical modeling that incorporate domain structure directly into the learning process. Finally, I will present recent work on foundation models for biomedical imaging and on scalable systems for clinical prediction under distribution shift. Together, these projects aim to make biomedical machine learning systems robust in real clinical settings, where labels are scarce, data are heterogeneous, and distributions shift.