I am a full professor in the University of Maryland Computer Science Department (tenure home), Institute of Advanced Computer Studies, INFO, and Language Science Center.

My research focuses on making machine learning more useful, more interpretable, and better able to learn from and interact with humans. This work helps users sift through decades of documents; discover when individuals lie, reframe, or change the topic in a conversation; or compete against humans in games based in natural language.

Book a meeting with me (collaborators and UMD students).

Recent Publications

  • Zongxia Li, Wenhao Yu, Chengsong Huang, Rui Liu, Zhenwen Liang, Fuxiao Liu, Jingxi Che, Dian Yu, Jordan Boyd-Graber, Haitao Mi, and Dong Yu. Self-Rewarding Vision-Language Model via Reasoning Decomposition. International Conference on Learning Representations, 2026. [Bibtex]
  • Feng Gu, Zongxia Li, Carlos R. Colon, Benjamin Evans, Ishani Mondal, and Jordan Boyd-Graber. Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators. Findings of the Association for Computational Linguistics, 2026. [Bibtex]
  • Tasnim Kabir, Dmytro Kurdydyk, Aadi Palnitkar, Liam Dorn, Ahmed Haj Ahmed, and Jordan Boyd-Graber. AUDITA: A New Dataset to Audit Humans or AI is Better at Audio QA. Findings of the Association for Computational Linguistics, 2026. [Bibtex]
  • Maharshi Gor, Yoo Yeon Sung, Yu Hou, Eve Fleisig, Zhu Irene Ying, Tianyi Zhou, and Jordan Boyd-Graber. AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering? Findings of the Association for Computational Linguistics, 2026. [Bibtex]
  • Ishani Mondal, Meera Bharadwaj, Ayush Roy, Aparna Garimella, and Jordan Boyd-Graber. SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity. European Association for Computational Linguistics, 2026. [Bibtex]
  • HyoJung Han, Nishant Balepur, Jordan Boyd-Graber, and Marine Carpuat. Measuring User's Mental Models of Speech Translation in Human-MT Collaboration. Association for Computational Linguistics, 2026. [Bibtex]
  • Nishant Balepur, Malachi Hamada, Varsha Kishore, Sergey Feldman, Amanpreet Singh, Pao Siangliulue, Joseph Chee Chang, Eunsol Choi, Jordan Boyd-Graber, and Aakanksha Naik. Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users. Association for Computational Linguistics, 2026. [Bibtex]
    Accessible Abstract: Deep Research systems help scientists discover more relevant research papers, but existing tools have no understanding of their users. We design MyScholarQA, the first personalized deep research system that learns from a researcher's interests to suggest more relevant papers. We evaluate our system with a mix of offline evaluations, using LLMs that simulate users, and online interviews, ultimately showing that LLMs cannot replace the insights gained from speaking with real humans.
  • Nishant Balepur, Bhavya Rajasekaran, Hyunjin Jane Oh, Michael Xie, Atrey Desai, Vipul Gupta, Steven James Moore, Eunsol Choi, Rachel Rudinger, and Jordan Boyd-Graber. BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks. Association for Computational Linguistics, 2026. [Bibtex]
    Accessible Abstract: Multiple-choice questions are a standard way to evaluate NLP systems, but they are riddled with flaws that limit their validity. Extending our previous position paper, we draw on educational testing theory to design BenchMarker, a toolkit that detects faulty MCQs that exist on the Internet, have guessable shortcuts, and writing issues that confuse students and LLMs. We show how BenchMarker can detect and help fix flaws in NLP benchmarks.
  • Benjamin Börschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, and Lierni Sestorain Saralegu. Meta Answering for Machine Reading. ArXiv, 2020. [Preprint] [Bibtex]
  • Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, 2020. [Webpage] [Bibtex]