PhD Proposal: Fairness and Equity in Machine Learning for Healthcare

Daniel Smolyak
07.11.2024 11:00 to 12:30

The use of machine learning in healthcare settings has become increasingly common, ranging from predicting individual patient outcomes to supporting policy decisions by public health officials. However, these models often replicate or exacerbate human biases and discrimination. I seek to address this problem both by identifying bias in existing healthcare modeling settings and by developing approaches to mitigate it. In this proposal, I focus on several complementary problems.

I audit predictive models of COVID-19 cases, examining whether models perform equally well across geographic regions with different demographic compositions when (a) human mobility data is included as a model feature and (b) various approaches are used to correct case underreporting. I propose a new model for correcting underreported case numbers that accounts for socioeconomic differences in testing, for use in the downstream allocation of medical resources.

I also investigate approaches to improve model performance specifically for small subgroups. I develop a regression model for joint estimation across multiple groups that uses sample weighting and separate sparsity penalties to boost performance for smaller groups. I then explore the potential of large language models to generate group-specific synthetic health data for group-wise data augmentation.

Given historical inequities in the allocation of health resources to marginalized communities, and current disparities across a wide range of health outcomes, it is important both to prevent machine learning systems from causing further harm by perpetuating allocation inequities and to leverage machine learning to actively correct these harms.
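The abstract does not specify the form of the joint multi-group regression model, so the following is only a minimal sketch of one possible formulation, not the proposed method. It assumes (1) each group's coefficients decompose into shared coefficients plus a group-specific deviation, (2) "sample weighting" means weighting each observation by the inverse size of its group, so every group contributes equally to the loss, and (3) "separate sparsity penalties" means a distinct L1 penalty on the shared block and on each group's deviation block. The names `build_design`, `fit_joint_lasso`, `predict`, `lam_shared`, and `lam_groups` are hypothetical, and the solver is plain weighted-lasso coordinate descent written with NumPy.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used in lasso coordinate descent."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def build_design(X, groups, n_groups):
    """Hypothetical design [1, X | (1, X)*1{g=0} | ... | (1, X)*1{g=G-1}]."""
    n = X.shape[0]
    base = np.hstack([np.ones((n, 1)), X])  # shared intercept + shared slopes
    blocks = [base]
    for k in range(n_groups):
        mask = (groups == k).astype(float)[:, None]
        blocks.append(base * mask)  # group-k deviation (intercept + slopes)
    return np.hstack(blocks)


def fit_joint_lasso(X, y, groups, lam_shared, lam_groups, n_iter=500, tol=1e-8):
    """Minimise 0.5 * sum_i w_i (y_i - z_i @ b)^2 + sum_j pen_j * |b_j|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups, dtype=int)
    n_groups = len(lam_groups)
    p = X.shape[1]
    Z = build_design(X, groups, n_groups)

    # Assumed weighting: 1 / (size of the observation's group), normalised to sum to 1.
    counts = np.bincount(groups, minlength=n_groups)
    w = 1.0 / counts[groups]
    w = w / w.sum()

    # Per-column penalties: shared intercept unpenalised, shared slopes use
    # lam_shared, and group k's deviation block uses its own lam_groups[k].
    pen = np.concatenate(
        [[0.0], np.full(p, lam_shared)] + [np.full(p + 1, lam) for lam in lam_groups]
    )

    beta = np.zeros(Z.shape[1])
    col_scale = (w[:, None] * Z**2).sum(axis=0)
    resid = y - Z @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(Z.shape[1]):
            if col_scale[j] == 0.0:
                continue
            old = beta[j]
            # Weighted correlation of column j with the partial residual.
            rho = (w * Z[:, j] * (resid + Z[:, j] * old)).sum()
            new = soft_threshold(rho, pen[j]) / col_scale[j]
            if new != old:
                resid -= Z[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def predict(beta, X, groups, n_groups):
    """Predictions from the shared-plus-deviation coefficients."""
    return build_design(np.asarray(X, dtype=float), np.asarray(groups, dtype=int), n_groups) @ beta
```

In this sketch, a large entry in `lam_groups` shrinks that group's deviation to zero, so the group falls back on the shared coefficients estimated from all groups; a smaller entry lets the group depart from them. The inverse-group-size weights keep a small group's observations from being swamped by a large group's in the shared fit.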