New Results in Statistical Reinforcement Learning
Recently, reinforcement learning (RL) has achieved inspiring success in game playing domains, including human-level control in Atari games and mastering the game of Go. Looking into the future, we expect to build machine learning systems that use RL to turn predictions into actions; applications include robotics, dialog systems, online education, adaptive medical treatment, to name but a few. In this talk, I show how theoretical insights from supervised learning can help understand RL, and better appreciate the unique challenges that arise from multi-stage decision making. The first part of the talk focuses on an interesting phenomenon, that a short planning horizon can produce better policies when there is limited data. I explain it by making a formal analogy to empirical risk minimization, and argue that a short planning horizon helps avoid overfitting. The second part of the talk concerns a core algorithmic challenge in state-of-the-art RL: sample-efficient exploration in large state spaces. I introduce a new complexity measure, the Bellman rank, which allows us to apply a unified algorithm to a number of important RL settings, in some cases obtaining polynomial sample complexity for the first time.