I am a full professor in the University of Maryland Computer Science Department (tenure home), Institute of Advanced Computer Studies, INFO, and Language Science Center.

My research focuses on making machine learning more useful, more interpretable, and better able to learn from and interact with humans. This work helps users sift through decades of documents; discover when individuals lie, reframe, or change the topic in a conversation; or compete against humans in games based in natural language.

Book a meeting with me (collaborators and UMD students).

Recent Publications

  • Zongxia Li, Wenhao Yu, Chengsong Huang, Rui Liu, Zhenwen Liang, Fuxiao Liu, Jingxi Che, Dian Yu, Jordan Boyd-Graber, Haitao Mi, and Dong Yu. Self-Rewarding Vision-Language Model via Reasoning Decomposition. International Conference on Learning Representations, 2026. [Bibtex]
  • Feng Gu, Zongxia Li, Carlos R. Colon, Benjamin Evans, Ishani Mondal, and Jordan Boyd-Graber. Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators. Findings of the Association for Computational Linguistics, 2026. [Bibtex]
  • Tasnim Kabir, Dmytro Kurdydyk, Aadi Palnitkar, Liam Dorn, Ahmed Haj Ahmed, and Jordan Boyd-Graber. AUDITA: A New Dataset to Audit Humans or AI is Better at Audio QA. Findings of the Association for Computational Linguistics, 2026. [Bibtex]
  • Maharshi Gor, Yoo Yeon Sung, Yu Hou, Eve Fleisig, Zhu Irene Ying, Tianyi Zhou, and Jordan Boyd-Graber. AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering? Findings of the Association for Computational Linguistics, 2026. [Bibtex]
  • Ishani Mondal, Meera Bharadwaj, Ayush Roy, Aparna Garimella, and Jordan Boyd-Graber. SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity. European Association for Computational Linguistics, 2026. [Bibtex]
  • HyoJung Han, Nishant Balepur, Jordan Boyd-Graber, and Marine Carpuat. Measuring User's Mental Models of Speech Translation in Human-MT Collaboration. Association for Computational Linguistics, 2026. [Bibtex]
  • Nishant Balepur, Malachi Hamada, Varsha Kishore, Sergey Feldman, Amanpreet Singh, Pao Siangliulue, Joseph Chee Chang, Eunsol Choi, Jordan Boyd-Graber, and Aakanksha Naik. Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users. Association for Computational Linguistics, 2026. [Bibtex]
    Accessible Abstract: Deep Research systems help scientists discover more relevant research papers, but existing tools have no understanding of their users. We design MyScholarQA, the first personalized deep research system that learns from a researcher's interests to suggest more relevant papers. We evaluate our system with a mix of offline evaluations, using LLMs that simulate users, and online interviews, ultimately showing that LLMs cannot replace the insights gained from speaking with real humans.
  • Nishant Balepur, Bhavya Rajasekaran, Hyunjin Jane Oh, Michael Xie, Atrey Desai, Vipul Gupta, Steven James Moore, Eunsol Choi, Rachel Rudinger, and Jordan Boyd-Graber. BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks. Association for Computational Linguistics, 2026. [Bibtex]
    Accessible Abstract: Multiple-choice questions are a standard way to evaluate NLP systems, but they are riddled with flaws that limit their validity. Extending our previous position paper, we draw on educational testing theory to design BenchMarker, a toolkit that detects faulty MCQs that exist on the Internet, have guessable shortcuts, and writing issues that confuse students and LLMs. We show how BenchMarker can detect and help fix flaws in NLP benchmarks.
  • Benjamin Börschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, and Lierni Sestorain Saralegu. Meta Answering for Machine Reading. ArXiv, 2020. [Preprint] [Bibtex]
  • Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, 2020. [Webpage] [Bibtex]