PhD Proposal: Incorporating Stylistic Properties with Machine Translation

Talk
Xing Niu
Time: 05.01.2018 09:00 to 11:00
Location: AVW 3258

In this research, we aim to control formality, the prime stylistic dimension, in Machine Translation (MT). Human translators translate a document for a specific audience and often ask about the expected tone of the content when taking on a new translation job. Yet this type of style information is not taken into account in modern machine translation. We explore different approaches to controlling the formality of MT output while minimizing the impact on how well the original content is conveyed.

Controlling style requires being able to model stylistic variations in text, for instance to annotate the corpora used to train MT systems. We propose to model stylistic variations by inducing a stylistic subspace from the original word vector space. We hypothesize that differences between the embeddings of words that share the same meaning are indicative of style differences. To test this hypothesis, we introduce a method based on Principal Component Analysis that identifies salient dimensions of variation between the word embeddings of lexical paraphrases.

Given formality annotations derived from modeling lexical stylistic variations, we are able to control the formality of machine translation output. We define this task as Formality-Sensitive Machine Translation (FSMT). We implement an initial FSMT system based on a standard phrase-based MT architecture with n-best reranking. The reranking module promotes translation hypotheses whose formality levels are closer to the user-provided formality level.

Lexical formality models provide a useful but imperfect estimate of sentential formality, while neural methods are more promising for modeling formality in the context of sentences. We implement an initial neural FSMT system by training machine translation and style transfer jointly via multi-task learning. The integrated model is able to perform FSMT without being explicitly trained on style-annotated translation examples.

Meaning preservation is a crucial objective for FSMT because text generated in the desired style is not useful if the intended meaning is not conveyed precisely. We propose to explicitly model meaning equivalence, regardless of style, for the task of style transfer and to use this model to evaluate the translation adequacy of neural FSMT. We also propose to optimize multiple objectives for neural FSMT, combining meaning preservation with other objectives such as matching the desired formality and translation fluency.
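
As a rough illustration of the lexical approach described in the abstract, the sketch below finds a dominant direction of variation among paraphrase embedding differences and then reranks an n-best list toward a requested formality level. It is a toy sketch under stated assumptions, not the system presented in the talk: the 2-D embeddings, the function names, the seed pair used to orient the direction, the averaging of word scores into a sentence score, and the linear penalty in the reranker are all hypothetical. Only the first principal component is computed, using power iteration rather than a full PCA, and paraphrase pairs are treated as unordered, so the differences are not mean-centred.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def formality_direction(embeddings, paraphrase_pairs, seed_pair, iters=50):
    """First principal direction of paraphrase embedding differences.

    Assumption: pairs are unordered, so each difference d is equivalent to -d
    and the second-moment matrix sum(d d^T) plays the role of the covariance.
    """
    diffs = [[a - b for a, b in zip(embeddings[x], embeddings[y])]
             for x, y in paraphrase_pairs]
    dim = len(diffs[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # Apply sum(d d^T) to v without building the matrix.
        w = [0.0] * dim
        for d in diffs:
            c = dot(d, v)
            for i in range(dim):
                w[i] += c * d[i]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # A principal direction has arbitrary sign; orient it with a seed pair
    # (hypothetical choice) so that the formal word scores higher.
    formal, informal = seed_pair
    if dot(embeddings[formal], v) < dot(embeddings[informal], v):
        v = [-x for x in v]
    return v

def sentence_formality(tokens, embeddings, direction):
    """Mean projection of known words onto the direction (an assumption)."""
    scores = [dot(embeddings[t], direction) for t in tokens if t in embeddings]
    return sum(scores) / len(scores) if scores else 0.0

def rerank(nbest, target, embeddings, direction, weight=1.0):
    """Sort (text, model_score) hypotheses, penalising distance to target."""
    def key(hyp):
        text, model_score = hyp
        level = sentence_formality(text.split(), embeddings, direction)
        return model_score - weight * abs(level - target)
    return sorted(nbest, key=key, reverse=True)

# Toy 2-D embeddings, made up for illustration: axis 0 varies with style,
# axis 1 with meaning.
emb = {
    "purchase": [1.0, 0.2], "buy": [-1.0, 0.2],
    "assist": [0.9, -0.5], "help": [-1.1, -0.5],
    "commence": [1.1, 0.7], "start": [-0.9, 0.7],
}
pairs = [("purchase", "buy"), ("assist", "help"), ("commence", "start")]
direction = formality_direction(emb, pairs, seed_pair=("purchase", "buy"))

nbest = [("please purchase it", -1.0), ("please buy it", -0.9)]
formal_first = rerank(nbest, 1.0, emb, direction)
informal_first = rerank(nbest, -1.0, emb, direction)
```

With these toy values, a high target level moves the hypothesis containing "purchase" to the top and a low target level moves the one containing "buy" to the top, even though the made-up model scores slightly favour the latter.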

Examining Committee:

Chair: Dr. Marine Carpuat
Dept. rep: Dr. Furong Huang
Members: Dr. Jordan Boyd-Graber, Dr. Philipp Koehn