PhD Proposal: Learning to Learn by Reinforcement: Applications and Theory

Amr Sharaf
05.01.2019 13:00 to 15:00
IRB 4105

Can we design agents that learn to learn using reinforcement learning? In this proposal we study several meta-learning problems. We present a family of algorithms based on reinforcement and imitation learning to facilitate these tasks, and propose to extend the current literature by studying two novel meta-learning problems: learning to learn from different kinds of feedback, and learning an optimizer for training uncertainty estimators.

We show an example of how to use imitation learning when a simulator for the meta-learning task is available. We present MÊLÉE, a meta-learning algorithm for learning a good exploration policy in the interactive contextual bandit setting. MÊLÉE uses an imitation learning strategy to learn an exploration policy on simulated tasks; that policy can then be applied to true contextual bandit tasks at test time.

When a simulator is not available, the agent must learn directly from the reward signals provided by the environment. Such reward signals are usually extremely sparse. To tackle this, we introduce RESIDUAL LOSS PREDICTION (RESLOPE), a novel algorithm that automatically learns an internal representation of a denser reward function. RESLOPE operates as a reduction to contextual bandits, using its learned loss representation to solve the credit assignment problem, and a contextual bandit oracle to trade off exploration and exploitation.

Having established these techniques, we propose to study the meta-learning task of learning to learn from different kinds of feedback signals. First, we show the importance of combining different feedback signals in the bandit structured prediction task. We present BANDIT LEARNING TO SEARCH (BLS), an algorithm for structured prediction under online bandit feedback. We find that fine-grained feedback is necessary for strong empirical performance, because it allows for a robust variance-reduction strategy. Second, we propose to extend this work by learning a control policy that dynamically selects between the available feedback signals to minimize labeling cost.

Finally, we propose to study the problem of learning to learn for uncertainty estimation. Uncertainty estimation is the problem of quantifying the uncertainty of predictions from a machine learning model. We propose to meta-learn an optimizer that learns how to train calibrated probability models.
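
As an illustration of what "calibrated" means in practice, one common way to quantify miscalibration is the expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between mean confidence and empirical accuracy is averaged across bins. The sketch below is only an assumption about how calibration might be measured; the proposal does not name a metric, and the function name, the equal-width binning, and the default of 10 bins are illustrative choices, not part of the proposed work.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted mean of |confidence - accuracy| per bin.

    confidences: predicted probabilities in [0, 1] for the chosen labels.
    correct:     1 if the corresponding prediction was right, else 0.
    n_bins:      number of equal-width confidence bins (assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Assign each prediction to an equal-width bin; confidence 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Weight each bin's gap by the fraction of predictions it holds.
        ece += (len(members) / total) * abs(mean_confidence - accuracy)
    return ece


# A model that says 0.9 and is right 9 times out of 10 is well calibrated.
well_calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
# A model that says 1.0 but is right only half the time is overconfident.
overconfident = expected_calibration_error([1.0] * 4, [1, 1, 0, 0])
```

A perfectly calibrated model scores close to zero under this measure, while an overconfident one scores higher; a learned optimizer of the kind proposed would presumably aim to drive such a gap down, though the exact objective is not specified here.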

Examining Committee:

Chair: Dr. Hal Daumé III
Dept rep: Dr. Tom Goldstein
Members: Dr. Jordan Boyd-Graber, Dr. Soheil Feizi, Dr. Yisong Yue