AI Alignment through Participation and Evaluation: Promises and Pitfalls
Researchers have been studying whether AI systems reflect stakeholders’ values and developing strategies for alignment when systems do not. In this talk, I will illustrate how two existing approaches to AI alignment, participation and evaluation (of and by relevant stakeholders), fall short of achieving their purported goals. First, I will discuss how 1) participation is poorly understood and operationalized by AI researchers and practitioners, and 2) existing participatory mechanisms are insufficient to guarantee alignment. Next, I will show how red-teaming, one general evaluation approach proposed to analyze AI alignment, is an ill-defined process with highly variable inputs and outputs. I will conclude by previewing my ongoing and future research agenda, including empirical study of the impact of red-teaming design choices (e.g., instructions for human versus automated evaluation approaches) on evaluation outcomes, with the aim of developing more robust AI evaluation methods that empower stakeholders.