Better! Faster! Stronger (theorems)! Learning to Balance Accuracy and Efficiency when Predicting Linguistic Structures
Viewed abstractly, many classic problems in natural language processing can be cast as trying to map a complex input (eg., a sequence of words) to a complex output (eg., a syntax tree or semantic graph). This task is challenging both because language is ambiguous (learning difficulties) and represented with discrete combinatorial structures (computational difficulties). I will describe my multi-pronged research effort to develop learning algorithms that explicitly learn to trade-off accuracy and efficiency, applied to a variety of language processing phenomena. Moreover, I will show that in some cases, we can actually obtain a model that is faster and more accurate by exploiting smarter learning algorithms. And yes, those algorithms come with stronger theoretical guarantees too. The key insight that makes this possible is a connection between the task of predicting structured objects (what I care about) and imitation learning (a subfield in robotics). This insight came about as a result of my work a few years ago, and has formed the backbone of much of my work since then. These connections have led other NLP and robotics researchers to make their own independent advances using many of these ideas. At the end of the talk, I'll briefly survey some of my other contributions in the areas of domain adaptation and multilingual modeling, both of which also fall under the general rubric of "what goes wrong when I try to apply off-the-shelf machine learning models to real language processing problems?"