PhD Proposal: Interactive Question Answering with Natural Language Feedback

Talk
Ahmed Elgohary Ghoneim
Time: 12.19.2019, 14:00 to 16:00
Location: IRB 5105

Generalizing beyond the training examples is the primary goal of machine learning. In natural language processing, even impressive models struggle to generalize when faced with test examples that differ from the training examples, e.g., in genre, domain, or language, or that come from a closely related task. We study adaptation methods that incrementally improve an initial model by interactively collecting human feedback while the model is on the job. Unlike previous work that adopts simple forms of feedback (e.g., labeling predictions as correct/wrong or answering yes/no clarification questions), we focus on learning from free-form natural language (NL) feedback, which can convey richer information and offers more flexible interaction.

We conduct this research in the context of question answering (QA). In particular, we focus on two instances of QA tasks: 1) conversational question answering (CQA), in which multiple interconnected questions are asked in an information-seeking dialog and models have to integrate the dialog history to properly interpret each question; and 2) QA over structured databases, in which models generate a semantic parse that produces an answer to a given question when executed against the relevant database.

First, we study model adaptation to a closely related task in the context of CQA over raw text. Ample progress has been made on answering stand-alone questions using a given evidence document. In information-seeking dialogs, e.g., with personal assistants, humans interact with a QA system by asking a sequence of related questions. CQA requires models to link questions together to resolve the conversational dependencies between them. Our preliminary work establishes simple models showing that incorporating conversational context indeed improves QA quality. Then, we reduce the CQA task to stand-alone QA by rewriting questions-in-context into stand-alone forms.
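The question-rewriting reduction can be illustrated with a toy sketch. The dialog data and the naive pronoun-substitution heuristic below are hypothetical, used only to show the shape of the task; the proposal's rewrites are collected from humans, not produced by rules:

```python
# Toy illustration (hypothetical data, not the proposal's model) of reducing
# CQA to stand-alone QA: a question-in-context is rewritten so it can be
# answered without the dialog history.
dialog_history = [
    ("Who wrote Beloved?", "Toni Morrison"),
]

def rewrite_in_context(question: str, history: list) -> str:
    """Naive heuristic: substitute the most recent answer for a pronoun.
    Real question rewriting is learned from human rewrites, not rule-based."""
    last_answer = history[-1][1]
    for pronoun in ("she", "he", "it", "they"):
        # Pad with spaces so we only match whole-word pronouns.
        if f" {pronoun} " in f" {question} ":
            return question.replace(pronoun, last_answer, 1)
    return question  # already stand-alone

print(rewrite_in_context("When was she born?", dialog_history))
# When was Toni Morrison born?
```

A stand-alone QA model can then answer the rewritten question directly, without access to the conversation.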
We collect a large dataset of human rewrites and use it to construct an evaluation framework for models that collaborate with humans on the CQA task. In the envisioned framework, we assess the ability of CQA models to decide when to ask for human assistance (e.g., question rewriting) and the extent to which models can incorporate the provided assistance to improve over time.

Next, we consider the task of cross-domain semantic parsing over SQL databases, in which models are evaluated on databases that are never seen at training time. In preliminary work, we design a framework that generates simplified explanations of predicted SQL parses for humans, who in turn give NL feedback to the parser on what needs to be fixed. Using our framework, we crowdsource a dataset of mispredicted queries paired with NL feedback and with the corrected predictions obtained after properly incorporating the feedback. We introduce the task of SQL parse correction using NL feedback, along with simple baselines.

Having established these frameworks, we propose new models that can better incorporate human feedback in both CQA and text-to-SQL parsing. Further, we plan to use our CQA framework to evaluate a range of uncertainty estimation methods for the subtask of deciding when to ask for human assistance.

Examining Committee:

Chair: Dr. Jordan Boyd-Graber
Dept rep: Dr. Rob Patro
Members: Dr. Doug Oard