PhD Defense: Uncovering, Understanding, and Mitigating Social Biases in Language Models

Talk
Haozhe An
Time: 05.08.2025, 09:30 to 11:30
Location: IRB-4109

This dissertation investigates how language models, including contemporary LLMs, can perpetuate social biases related to gender, race, and ethnicity as inferred from first names. Guided by the principle of counterfactual fairness, we use name substitution to uncover, understand, and mitigate these biases across three domains: stereotypes about personal attributes, occupational bias, and overgeneralized assumptions about romantic relationships.
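As an illustration of name substitution under counterfactual fairness, the minimal sketch below probes a model with prompts that differ only in the first name; the template, probe names, and model are hypothetical placeholders, not the dissertation's actual setup.

```python
# Minimal sketch (illustrative, not the dissertation's code): counterfactual
# name substitution. Only the first name changes between prompts, so under
# counterfactual fairness the model's answer should not change either.
from transformers import pipeline

TEMPLATE = ("{name} applied for the software engineering position. "
            "Should we interview {name}? Answer yes or no.")
NAMES = ["Emily", "Lakisha", "Jamal", "Brad"]  # hypothetical probe names

generator = pipeline("text-generation", model="gpt2")  # placeholder model

for name in NAMES:
    prompt = TEMPLATE.format(name=name)
    output = generator(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"]
    # Differences in the continuation across names signal name-based bias.
    print(name, "->", output[len(prompt):].strip())
```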
By analyzing model behavior across diverse names, this dissertation reveals patterns of unfair treatment, such as demographically influenced personality judgments in social commonsense reasoning, hiring discrimination based on gender, race, and ethnicity, and heteronormative bias in relationship predictions. To address these issues, we propose open-ended diagnostic frameworks, interpretability analyses based on contextualized embeddings, and a novel consistency-guided finetuning method.
Together, these contributions aim to build fairer, more interpretable, and more inclusive language technologies.