Crowdsourcing Translation without Bilingual People



We propose a rethinking of the translation problem to bring together translation technology and human-computer interaction, producing a framework for translation that exploits imperfect technology and limited human abilities in tandem to achieve capabilities neither can achieve alone. The core of this framework is MonoTrans, an iterative protocol in which monolingual human participants work together to improve imperfect machine translations.



MonoTrans: Collaborative Translation by Monolingual People

MonoTrans is an iterative protocol in which monolingual people collaborate to produce a translation. Monolingual translation, meaning translation by people who speak only the source language or only the target language, can be used to translate between rare languages or to achieve quality translation at a large scale. At its core are protocols in which the human participants (monolingual source-language or target-language speakers) work together to make sense of machine translations. Because monolingual translation does not depend on bilingual humans, it can enable translation between uncommon language pairs for which a bilingual translator is hard to find. In addition, monolingual translation can draw on a much larger population of participants, and is therefore likely to yield higher throughput.
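The iterative idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MonoTrans implementation: the function name `monolingual_round_trip`, the three callables, and the use of back-translation as the channel through which the source-language speaker checks the result are all hypothetical choices made for this sketch.

```python
from typing import Callable


def monolingual_round_trip(
    source: str,
    translate: Callable[[str, str], str],        # (text, direction) -> text; stand-in for any MT system
    target_edit: Callable[[str], str],           # a target-language speaker repairs the MT output
    source_accepts: Callable[[str, str], bool],  # a source-language speaker compares original and back-translation
    max_rounds: int = 3,
) -> str:
    """Alternate between the two monolingual sides until the source side is satisfied."""
    candidate = translate(source, "src->tgt")
    for _ in range(max_rounds):
        # Target side: make the machine output fluent without seeing the source language.
        candidate = target_edit(candidate)
        # Assumption: the edited text is sent back through MT so the source side can read it.
        back = translate(candidate, "tgt->src")
        # Source side: decide whether the meaning survived the round trip.
        if source_accepts(source, back):
            break
    return candidate
```

Neither human callable ever needs to read the other language; machine translation is the only bridge between the two sides, which is the property the protocol relies on.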

MonoTrans2: Asynchronous Protocol


Source Side UI

Target Side UI

Target Side: Identifying Translation Errors

MonoTrans2 improves on MonoTrans by making the protocol asynchronous. As its design reflects, the tasks in a translation process can be arranged so that every user can perform a task independently of the others. The tasks can be broken down and shortened so that there is always a task available for any user. These independent short tasks remove the requirement that participants work at the same time, which in turn improves the scalability of monolingual translation.
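One way to picture independent short tasks is a shared pool that any participant can draw from whenever they arrive. The sketch below is hypothetical: the `Task` fields, the task kinds named in the comment, and the `TaskPool` class are assumptions made for illustration and are not taken from MonoTrans2 itself.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class Task:
    sentence_id: int
    side: str  # "source" or "target": which monolingual group can do this task
    kind: str  # hypothetical task types, e.g. "edit", "flag_error", "rate"


class TaskPool:
    """Holds short tasks so that no participant has to wait for a partner."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[Task]] = {"source": deque(), "target": deque()}

    def add(self, task: Task) -> None:
        # Each task is self-contained, so it can be queued at any time.
        self._queues[task.side].append(task)

    def next_for(self, side: str) -> Optional[Task]:
        # A user on either side takes the oldest open task for that side, if any.
        queue = self._queues[side]
        return queue.popleft() if queue else None
```

Because completing one task only adds further tasks to the pool, participants on the two sides never need to be online together.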

ParaTrans: Error-Driven Paraphrase

The source text provided to a machine translation system is typically only one of many ways the input sentence could have been expressed, and an alternative form of expression can often produce a better translation. We introduce error-driven paraphrasing of source sentences: instead of paraphrasing a source sentence exhaustively, we obtain paraphrases only for the parts that are predicted to be problematic for the translation system.
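A minimal sketch of the targeted approach follows, assuming that problematic parts are given as token index ranges and that paraphrases come from some external provider (for example, a source-language speaker). The function name `paraphrase_candidates`, the span representation, and the `get_paraphrases` callable are all hypothetical; how the problematic spans are predicted is outside this sketch.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of token indices


def paraphrase_candidates(
    tokens: List[str],
    problem_spans: List[Span],
    get_paraphrases: Callable[[List[str]], List[List[str]]],
) -> List[List[str]]:
    """Build alternative source sentences by rewriting only the flagged spans."""
    candidates: List[List[str]] = []
    for start, end in problem_spans:
        phrase = tokens[start:end]
        for alternative in get_paraphrases(phrase):
            # Everything outside the flagged span is kept exactly as written.
            candidates.append(tokens[:start] + alternative + tokens[end:])
    return candidates
```

Each candidate differs from the original sentence in exactly one flagged span, so the paraphrasing effort grows with the number of predicted problems rather than with sentence length.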



  Ben Bederson's Google Tech Talk, September 2009.
  Ben Bederson's talk at MSRA and Tsinghua University.


Public Citations

  • NPR Morning Edition, June 22, 2010, Using The Wisdom Of Crowds To Translate Language [link]
  • WAMU Kojo Show, Aug 24, 2010, Automating Language Translation: The Future of a Unified Internet? [link]
  • New Scientist, June 28, 2011, The man-machine: Harnessing humans in a hive mind [link]


We organized a Crowdsourcing and Translation workshop in June 2010.

Other Related Projects from HCIL

CrowdFlow by Alex Quinn:

  • Quinn, A., Bederson, B., Yeh, T., Lin, J. CrowdFlow: Integrating Machine Learning with Mechanical Turk for Speed-Cost-Quality Flexibility, HCIL Tech Report, May 2010 [link]

