PhD Defense: Learning With Minimal Supervision: New Meta-Learning and Reinforcement Learning Algorithms

Amr Sharaf
11.05.2020 13:30 to 15:30


Standard machine learning approaches thrive on learning from huge amounts of labeled training data, but what if we don’t have access to large labeled datasets? Humans have a remarkable ability to learn from only a few examples. To do so, they either build upon their prior learning experiences or adapt to new circumstances by observing sparse learning signals. In this dissertation, we promote algorithms that learn with minimal amounts of supervision, inspired by these two ideas. We discuss two families of minimally supervised learning algorithms, based on meta-learning (or learning to learn) and on reinforcement learning.

In the first part of the dissertation, we discuss meta-learning approaches for learning with minimal supervision. We present three meta-learning algorithms: one for few-shot adaptation of neural machine translation systems, one for promoting fairness in learned models by learning to actively learn under fairness parity constraints, and one for learning better exploration policies in the interactive contextual bandit setting. All of these algorithms simulate settings in which the learner has access to only a few labeled samples. Based on these simulations, the agent learns how to solve future learning tasks given only a few labeled examples. As a result, these algorithms provide a method to promote the learning of fair and adaptive models given a minimal amount of supervision.

In the second part of the dissertation, we study learning algorithms based on reinforcement and imitation learning. In many settings the learning agent doesn’t have access to fully supervised training data; however, it might be able to leverage a sparse reward signal or an expert that can be queried to collect labeled data. It is therefore important to use these learning signals efficiently. Toward this goal, we present three learning algorithms: one for learning from very sparse reward signals, one for leveraging access to noisy guidance, and one for solving structured prediction tasks under bandit feedback. In all cases, the result is a minimally supervised learning algorithm that can learn effectively given access to sparse reward signals.
Examining Committee:

Chair: Dr. Hal Daumé III
Dean's rep: Dr. Philip Resnik
Members: Dr. Jordan Boyd-Graber, Dr. Soheil Feizi, Dr. Tom Goldstein, Dr. Yisong Yue