PhD Proposal: Complexity Controlled Natural Language Generation

Talk
Sweta Agrawal
Time: 11.02.2022, 09:00 to 11:00
Location: IRB 5137

Natural Language Generation (NLG) is the process of using computational methods to produce natural human-language text. NLG technologies have been used to model the generation, summarization, and translation of texts in many of the world's languages. One challenge that makes NLG hard is the existence of multiple valid ways of expressing the same information in text. However, not all variations are equally acceptable, and knowing the intended audience and purpose of a text can help narrow down this space of acceptable variations. For example, generating text at the right level of complexity, so that the target audience can comprehend the information presented, has the potential to make texts more accessible to a wide range of users, including non-native speakers, language learners, and people with language or cognitive impairments.

In this proposal, we introduce models that control the complexity of generated text so that it is tailored to specific audiences. In our first piece of completed work, we show that while standard neural machine translation (NMT) models can generate many valid translations of a given source text, they fail to capture the full spectrum of diverse translations that might reflect users' language proficiency, vocabulary, and stylistic preferences. To address this, our second piece of completed work introduces the task of Complexity Controlled Machine Translation, where the goal is to translate a source text into a given target language at an appropriate complexity level. We propose a multi-task model that learns to jointly translate and simplify a source text toward a desired complexity level, and we construct datasets that enable the generation and evaluation of such outputs. Finally, in our third piece of completed work, we give users more fine-grained control over the complexity of generated text within the same language by framing audience-specific text simplification as a text editing task and tailoring general-purpose Edit-based Non-Autoregressive Sequence-to-Sequence models that directly model editing operations such as insertions and deletions. We show that our proposed models generate higher-quality outputs and match the desired target complexity more accurately than existing models, while also showing users the sequence of text transformations that produced a given output. (Illustrative sketches of both control interfaces follow the abstract.)

Having designed models that enable the generation of complexity-controlled outputs, our proposed work aims to evaluate the potential impact of the generated text on people's ability to comprehend content. We propose reading comprehension-based assessments of generated texts on medical, scientific, and legal topics to test people's comprehension at different complexity levels.
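The abstract does not specify the complexity-controlled translation model at implementation level, so the following minimal Python sketch is purely illustrative: it shows one common way such control is exposed, by prepending a target-complexity token to the source text before translation. The "<grade_k>" token format and the function name are assumptions made for illustration, not the talk's actual interface.

```python
# Purely illustrative: the "<grade_k>" control token and this function are
# assumptions about how a complexity-controlled NMT model could be
# conditioned; the proposal's actual multi-task architecture may differ.

def format_controlled_input(source: str, target_grade: int) -> str:
    """Prepend a target-complexity token (here, a U.S. school grade level)
    to the source sentence, a standard way to condition a sequence-to-
    sequence model on a desired output attribute."""
    return f"<grade_{target_grade}> {source}"

# The same source can then be requested at different complexity levels:
for grade in (4, 9):
    print(format_controlled_input("The physician administered the medication.", grade))
# <grade_4> The physician administered the medication.
# <grade_9> The physician administered the medication.
```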
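Similarly, the edit-based simplification models mentioned above predict explicit editing operations rather than generating text from scratch. The sketch below (with assumed operation names and format, not the talk's exact specification) shows how such an edit program can be applied to an input token sequence; note that the program itself doubles as an interpretable record of the transformations applied.

```python
# Purely illustrative: a minimal interpreter for the kind of editing
# operations (keep / delete / insert) that edit-based non-autoregressive
# models predict; the operation inventory here is an assumption.

def apply_edits(tokens, edits):
    """Apply a per-token edit program to an input token sequence.

    edits: one (op, inserts) pair per input token, where op is "KEEP"
    or "DEL", and inserts is a (possibly empty) list of new tokens to
    place after that position.
    """
    out = []
    for token, (op, inserts) in zip(tokens, edits):
        if op == "KEEP":
            out.append(token)
        # "DEL" drops the input token entirely.
        out.extend(inserts)
    return out

source = "the physician administered the medication".split()
# Simplify by replacing harder words; each input token gets one operation.
edits = [
    ("KEEP", []),
    ("DEL", ["doctor"]),
    ("DEL", ["gave"]),
    ("KEEP", []),
    ("DEL", ["medicine"]),
]
print(" ".join(apply_edits(source, edits)))
# -> "the doctor gave the medicine"
```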

Examining Committee

Chair: Dr. Marine Carpuat

Department Representative: Dr. Abhinav Shrivastava

Members:

Dr. Philip Resnik

Dr. Jordan Boyd-Graber

Dr. Ani Nenkova (Adobe Research)