PhD Defense: Efficient Environment Sensing and Learning for Mobile Robots
Data-driven learning is becoming an integral part of many robotic systems. Robots can be used as mobile sensors to learn about the environment in which they operate. Robots can also seek to learn essential skills, such as navigation, within the environment. A critical challenge in both types of learning is sample efficiency. Acquiring samples with physical robots can be prohibitively time-consuming. As a result, when applying learning techniques in robotics that require physical interaction with the environment, minimizing the number of such interactions becomes a key. The key question we seek to answer is: How do we make robots learn efficiently with a minimal amount of physical interaction? We approach this question along two fronts: extrinsic learning and intrinsic learning. In extrinsic learning, we want the robot to learn about the external environment in which it is operating. In intrinsic learning, our focus is on the robot to learn a skill using reinforcement learning (RL) such as navigating in an environment. In this dissertation, we develop algorithms that carefully plan where the robots obtain samples in order to efficiently perform intrinsic and extrinsic learning. In particular, we exploit the structural properties of Gaussian Process (GP) regression to design efficient sampling algorithms.We study two types of problems under extrinsic learning. We start with the problem of learning a spatially varying field modeled by a GP efficiently. Our goal is to ensure that the GP posterior variance, which is also the mean square error between the learned and actual fields, is below a predefined value. By exploiting the underlying properties of GP, we present a series of constant-factor approximation algorithms for minimizing the number of stationary sensors to place, minimizing the total time taken by a single robot, and minimizing the time taken by a team of robots to learn the field. Here, we assume that the GP hyperparameters are known. We then study a variant where our goal is to identify the hotspot in an environment. Here we do not assume that hyperparameters are unknown. For this problem, we present Upper Confidence Bound (UCB) and Monte Carlo Tree Search (MCTS) based algorithms for a single robot and later extend them to decentralized multi-robot teams. We also validate their performance empirically on real-world datasets.For intrinsic learning, our aim is to reduce the number of physical interactions by leveraging simulations often known as Multi-Fidelity Reinforcement Learning (MFRL). In the MFRL framework, an agent uses multiple simulators of the real environment to perform actions. We present two MFRL framework versions, model-based and model-free, that leverage GPs to learn the optimal policy in a real-world environment. By incorporating GPs in the MFRL framework, we empirically observe a significant reduction in the number of samples for model-based and model-free learning.
Dr. Pratap Tokekar Dr. Derek PaleyDr. Furong Huang Dr. Dinesh Manocha Dr. Jordan Boyd-Graber