PhD Proposal: Stronger Inductive Biases for Sample-Efficient and Controllable Neural Machine Translation

Weijia Xu
12.13.2021 13:00 to 15:00

IRB 4105

As one of the oldest applications of natural language processing, machine translation (MT) has a growing impact on human lives, both as an end application and as a key component of cross-lingual processing tasks such as cross-lingual information retrieval and dialogue generation. Although neural machine translation (NMT) models have achieved impressive performance on some language pairs, they rely on large amounts of training data, consisting of source sentences paired with reference translations, to achieve reasonable translation quality. In addition, they are notorious for generating fluent outputs that do not faithfully reflect the meaning of the source sentence, and it is difficult to incorporate users' preferences into their outputs. The goal of this thesis is to address these issues by incorporating stronger inductive biases, in the form of training algorithms or model architectures, that yield more sample-efficient and controllable NMT models.

In our first line of research, we study how to integrate stronger inductive biases through training algorithms that make more effective use of the available data, including supervised data consisting of source sentences and their reference translations, and unsupervised data consisting of sentences in the source or target language. We start by introducing a new training objective to address the exposure bias problem, a common problem in sequence generation models that causes errors to accumulate along the generated sequence at inference time, especially when training data is limited. Next, we study how prior knowledge about the language distribution can be embedded in NMT models through semi-supervised learning objectives.
We introduce a novel training objective with a theoretical guarantee on its global optimum and show that it can be effectively approximated and leads to improved performance in practice.

In our second line of research, we study inductive biases in the form of NMT model architectures that allow end users to control model outputs more easily by specifying their preferences at inference time. Controlling the outputs of standard NMT models is difficult and incurs high computational cost at training or inference time. We develop an edit-based NMT model with novel edit operations that can incorporate users' lexical constraints at low computational cost at both training and inference time. In addition, we introduce a modular framework that helps NMT models leverage terminology constraints in more flexible morphological forms.

Extending our second line of research toward more controllable and faithful NMT, we propose to investigate internal model signals that may flag hallucination errors in MT, that is, translation outputs that are partially or entirely detached from the source. We ask the question: can we introduce stronger inductive biases in MT models so that they can self-diagnose when hallucinations are generated? First, we propose to conduct an empirical study of internal model signals for categories of constructed inputs that are likely to lead to hallucinations. Next, we plan to investigate to what degree the uncovered model signals can help detect hallucinations and support mitigation strategies.

Examining Committee:

Chair: Dr. Marine Carpuat
Department Representative: Dr. Soheil Feizi
Members: Dr. Hal Daumé III, Dr. Doug Oard, Dr. He He