HCIL Seminar Series
The HCIL Seminar Series offers a common ground that can promote interdisciplinary discussion on a wide range of topics relating to Human-Computer Interaction.
Special thanks to the Dingman Center for Entrepreneurship for sponsoring these events!
Fall 2009 Speakers
December 2, 2009
Wednesday, 11am, Room 2119, Hornbake Bldg, South Wing
Chris Callison-Burch
Johns Hopkins University
Fast, Cheap and Creative: Evaluating Translation Quality with Amazon’s Mechanical Turk
Abstract
I will describe a series of experiments that I ran on Amazon's Mechanical Turk, an online labor market that lets you pay people small amounts of money to complete Human Intelligence Tasks (HITs). Amazon makes it extremely convenient to pay people as little as $0.01 per task. A large number of people actively frequent Mechanical Turk and complete these tasks with surprising accuracy, given the slim rewards. I analyze the quality of the results I got from Turkers on the following tasks: evaluating the quality of machine translation systems, computing human-in-the-loop translation edit rate, and producing translations for Spanish, French, German, Chinese and Urdu. I further show how to use Mechanical Turk creatively to administer a reading comprehension test that measures translation quality. I give a cost breakdown for my experiments and speculate about their implications for the fields of speech and language processing. My advice: stop doing unsupervised learning and start gathering data.
Biography
Chris Callison-Burch is an assistant research professor at the Center for Language and Speech Processing at Johns Hopkins University. His research group recently released Joshua, an open source decoder for statistical translation models based on synchronous context-free grammars (see http://cs.jhu.edu/~ccb/joshua/). He co-organizes WMT, an annual workshop on statistical machine translation that features shared tasks to evaluate the quality of machine translation, system combination techniques, and automatic evaluation metrics. Last year he obsessively built a 10^9-word French-English parallel corpus by scraping just about every bilingual site on the web.
