PhD Proposal: [Postponed] Aligning and Evaluating Artificial Intelligence with Human Expertise

Talk
Feng Gu
Time: 
02.04.2026 12:00 to 12:30
Location: 

Evaluation drives AI: models are optimized for benchmark scores, and benchmarks orient system development. However, the current evaluation paradigm is flawed. These benchmarks, which function as static "tests," lack ecological and construct validity. They reward systems that excel at pattern matching rather than those that master the dynamic "game" of language in realistic environments. Consequently, while Large Language Models achieve superhuman scores, they fail to deliver corresponding utility in high-stakes, domain-specific tasks.
This proposal provides empirical evidence for this gap and argues for a paradigm shift. We first present a case study showing that state-of-the-art Large Language Models fail to match expert-level agreement in a specialized annotation task. More critically, we show that even when an AI agent achieves superhuman performance in a complex Language Game like Diplomacy, it fails to be helpful to human experts. This reveals a core misalignment in modern AI evaluation: that a model can win does not mean it is useful.
To address this, we propose a new evaluation framework centered on Language Games: grounded, interactive, and goal-oriented environments that demand robust and adaptive language use. However, as our Diplomacy findings show, simply adopting a new testbed is insufficient. We must also develop new methods to close the critical gap between performance and helpfulness.
This proposal outlines two such methods. First, to address the "win vs. help" gap, we propose a new training methodology in Diplomacy that aligns a model's policy with expert macro-intentions rather than with a generally optimal strategy. Second, we test the generalizability of this alignment principle by proposing a model for a traditional question-answering task that learns to personalize its advice to individual user preferences and cognitive styles. By developing and evaluating AI in environments that reward dynamic, expert-aligned helpfulness, we aim to advance the development of truly human-centered AI.