Divergences in Neural Machine Translation
Despite the explosion of online content worldwide, much information remains isolated by language barriers. While deep neural network models have dramatically improved the quality of machine translation (MT), truly breaking language barriers requires not only translating accurately, but also comparing what is said and how it is said across languages. In this talk, I will argue that modeling divergences from common assumptions about MT training data can both improve MT and broaden its framing to make it more responsive to user needs. I will first discuss recent work on automatically detecting cross-lingual semantic divergences, which occur when a translation does not preserve the meaning of its source, and on their impact on MT training. Next, I will introduce a training objective for neural sequence-to-sequence models that accounts for divergences between MT model hypotheses and human reference translations. Finally, I will argue that translation does not necessarily need to preserve all properties of the input, and I will introduce a family of models that let us tailor translation style while preserving input meaning.