PhD Proposal: Transfer Learning in Natural Language Processing through Interactive Feedback

Talk
Michelle Yuan
Time: 12.22.2020 12:00 to 14:00
Location: Remote

Machine learning models cannot easily adapt to new data domains and applications. For natural language processing (NLP), this is especially detrimental because language is perpetually changing. As people develop new ideas, written records reflect these innovations. Across the globe, there are thousands of distinct languages due to linguistic and cultural differences. Transfer learning transmits knowledge from source to target settings by modifying model architecture and optimization. This dissertation proposal goes a step further by including a “human in the loop”. If language is a byproduct of human thought, then human feedback should help transfer knowledge for NLP problems. Therefore, our goal is to improve model generalization in low-resource settings through interactive learning.

First, we develop an active learning strategy to annotate examples for text classifiers that have been trained on little to no data. State-of-the-art language models learn general text representations by predicting token occurrence over large corpora. Thus, our strategy uses language modeling loss to bootstrap classification uncertainty and sample representative points from surprisal clusters. Next, we refine cross-lingual word embeddings through user feedback for low-resource languages. Bilingual speakers transfer knowledge from English to the target language by aligning the cross-lingual embedding space. Finally, we create a multilingual, interactive topic modeling system for users to refine topics across languages. The user-constructed topic model bridges multilingual gaps in knowledge.

In the proposed work, we plan to explore interactive learning for NLP problems that require a comprehensive understanding of human language. For tasks like coreference resolution and question answering, users can link entities to help the model automate information extraction. Therefore, we will design algorithms and interfaces for users to efficiently transfer knowledge by labeling text spans.

Examining Committee:

Chair: Dr. Jordan Boyd-Graber
Dept rep: Dr. John Dickerson
Members: Dr. Rachel Rudinger