PhD Proposal: Detecting Fine-Grained Semantic Divergences to Improve Translation Understanding Across Languages

Eleftheria Briakou
05.20.2022 13:30 to 15:30

IRB 5105

One of the core goals of natural language processing (NLP) is to develop computational representations and methods for comparing and contrasting text meaning across languages. Such methods are essential to many NLP tasks, such as question answering and information retrieval. One limitation of these methods is their lack of sensitivity to fine-grained semantic divergences, i.e., fine meaning differences in sentences that overlap in content. Yet such differences abound even in parallel texts, i.e., texts in two different languages that are typically perceived as exact translations of each other. Detecting such fine-grained semantic divergences across languages matters for machine translation systems, as divergences yield challenging training samples, and for humans, who can benefit from a nuanced understanding of the source.

In this proposal, we focus on detecting fine-grained semantic divergences in parallel texts to improve machine and human translation understanding. In our first piece of completed work, we provide empirical evidence that such small meaning differences exist and can be reliably annotated both at the sentence and at the sub-sentential level. We then show that they can be automatically detected, without supervision, by fine-tuning large pre-trained language models to rank synthetic divergences of varying granularity. In our second piece of completed work, we analyze the impact of fine-grained divergences on Neural Machine Translation (NMT) training and show that they negatively affect several aspects of NMT outputs, e.g., translation quality and confidence.
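The learning-to-rank idea above can be sketched in miniature: perturb one side of a parallel pair to manufacture a synthetic divergence, then train a scorer so that the original pair ranks above its perturbed variant. The function names and the toy span-deletion perturbation below are our own illustration, not the proposal's actual implementation, which builds on large pre-trained language models.

```python
import random

def make_synthetic_divergence(tokens, rng):
    """Toy synthetic divergence: delete a short contiguous span of tokens.
    A stand-in for the varying-granularity perturbations described above."""
    start = rng.randrange(len(tokens))
    end = min(len(tokens), start + rng.randint(1, 3))
    return tokens[:start] + tokens[end:]

def margin_ranking_loss(equivalent_score, divergent_score, margin=1.0):
    """Standard margin ranking loss: zero once the (near-)equivalent pair
    outscores its synthetic divergent variant by at least `margin`."""
    return max(0.0, margin - (equivalent_score - divergent_score))

# Illustrative usage with a fixed seed and hand-picked scores.
rng = random.Random(0)
sentence = ["the", "delegates", "approved", "the", "budget", "yesterday"]
divergent = make_synthetic_divergence(sentence, rng)
loss_when_ranked_wrong = margin_ranking_loss(0.2, 0.9)   # positive loss
loss_when_ranked_right = margin_ranking_loss(2.0, 0.5)   # margin satisfied
```

In the actual work, the scores would come from a fine-tuned multilingual encoder rather than being set by hand; the ranking objective is what lets training proceed without human-labeled divergences.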
Based on these findings, we propose two orthogonal approaches to mitigating the negative impact of divergences and improving machine translation quality: first, we introduce a divergence-aware NMT framework that models divergences at training time; second, we propose generation-based approaches for revising divergences in mined parallel texts so that the corresponding references become more equivalent in meaning.

Having observed how subtle meaning differences in parallel texts impact downstream applications (i.e., NMT), we then ask how divergence detection can be used by humans directly. We propose to extend our current divergence detection methods to explain the nature of divergences. Our approach will not only point to specific divergent segments within parallel texts, but also augment them with information external to the input (e.g., the translated segment is more specific than the original), indicating not only whether but also how two texts differ. The success of our approach will be quantified both automatically, by comparing the explanations with gold-standard annotations, and via a user study that tests whether explanations help humans understand translations better.
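One plausible instantiation of divergence-aware training, offered purely as a hedged sketch with invented names and an invented weighting scheme, is to down-weight the per-token training loss on reference tokens flagged as divergent, so that mismatched spans contribute less to the gradient:

```python
def divergence_aware_loss(token_losses, divergence_mask, down_weight=0.1):
    """Aggregate per-token NMT losses, scaling down tokens flagged as
    divergent (mask value 1). This simple reweighting is illustrative,
    not the proposal's actual formulation."""
    weights = [down_weight if flagged else 1.0 for flagged in divergence_mask]
    total = sum(w * l for w, l in zip(weights, token_losses))
    return total / sum(weights)

# Illustrative usage: the middle token is tagged divergent, so its loss
# counts for half as much under down_weight=0.5.
loss = divergence_aware_loss([2.0, 1.0, 3.0], [0, 1, 0], down_weight=0.5)
```

The per-token divergence mask would come from a sub-sentential divergence detector such as the one developed in the completed work; many other ways of conditioning the model on divergence information are possible.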
Examining Committee:

Chair: Dr. Marine Carpuat
Department Representative: Dr. Leo Zhicheng Liu
Members: Dr. Philip Resnik, Dr. Hal Daumé III, Dr. Luke Zettlemoyer (Univ of WA)