PhD Defense: EXPERT-IN-THE-LOOP FOR SEQUENTIAL DECISIONS AND PREDICTIONS

Talk
Kianté Brantley
Time: 12.09.2021, 09:00 to 11:00
Location: IRB 4107

Sequential decisions and predictions are common problems in natural language processing, robotics, and video games. Essentially, an agent interacts with an environment to learn how to solve a particular problem. Research in sequential decisions and predictions has increased due in part to the success of reinforcement learning. However, this success has come at the cost of algorithms that are very data-inefficient, making learning in the real world difficult. Our primary goal is to make these algorithms more data-efficient using an expert-in-the-loop (e.g., imitation learning). Imitation learning is a technique for using an expert in sequential decision-making and prediction problems. Naive imitation learning suffers from a covariate shift problem (i.e., the training distribution differs from the test distribution). We propose methods and ideas to address this issue, as well as other issues that arise in different styles of imitation learning. In particular, we study three broad areas of using an expert-in-the-loop for sequential decisions and predictions.

First, we study the most popular category of imitation learning, interactive imitation learning. Although interactive imitation learning addresses the covariate shift problem of naive imitation learning, it does so with a trade-off: it assumes access to an online interactive expert, which is unrealistic. Instead, we propose a setting where this assumption is realistic and attempt to reduce the number of queries needed for interactive imitation learning.

We further introduce a new category of imitation learning algorithms called reward-learning imitation learning. Unlike interactive imitation learning, these algorithms address covariate shift using demonstration data instead of querying an online interactive expert. This category of imitation learning algorithms assumes access to an underlying reinforcement learning algorithm that can optimize a reward function learned from demonstration data. We benchmark all algorithms in this category and relate them to modern structured prediction problems in NLP.

Beyond reward-learning imitation learning and interactive imitation learning, some problems cannot be naturally expressed and solved using these two categories of algorithms; for example, an algorithm that must solve a task while satisfying safety constraints. We introduce expert-in-the-loop techniques that extend beyond traditional imitation learning paradigms, where an expert provides demonstration features or constraints instead of state-action pairs.

Examining Committee:

Chair: Dr. Hal Daumé III
Dean's Representative: Dr. John Baras
Members: Dr. Tom Goldstein, Dr. Philip Resnik, Dr. Geoff Gordon, Dr. Kyunghyun Cho