I'm interested in Reinforcement Learning, Informative Path Planning, Bayesian Optimization, and Multi-Armed Bandits.
Multi-Fidelity Reinforcement Learning with Gaussian Processes
We study the problem of Reinforcement Learning
(RL) using as few real-world samples as possible. A naive
application of RL can be inefficient in large and continuous state
spaces. We present two versions of Multi-Fidelity Reinforcement
Learning (MFRL), model-based and model-free, that leverage
Gaussian Processes (GPs) to learn the optimal policy in a real-world environment. In the MFRL framework, an agent uses
multiple simulators of the real environment to perform actions.
With increasing fidelity in a simulator chain, the number of
samples used in successively higher-fidelity simulators can be reduced.
By incorporating GPs in the MFRL framework, we empirically
observe up to 40% reduction in the number of samples for model-based RL and 60% reduction for the model-free version. We
examine the performance of our algorithms through simulations
and through real-world experiments for navigation with a ground
Learning a Spatial Field with Gaussian Process Regression in Minimum Time
We study an informative path planning problem where the
goal is to minimize the time required to learn a spatial field using Gaussian Process (GP) regression. Specifically, given parameters 0 < ε, δ < 1,
our goal is to ensure that the predicted value at all points in an environment lies within ±ε of the true value with probability at least δ. We
study two versions of the problem. In the sensor placement version, the
objective is to minimize the number of sensors placed. In the mobile sensing version, the objective is to minimize the total travel time required
to visit the sensing locations. The total time is given by the time spent
obtaining measurements as well as time to travel between measurement
locations. By exploiting the smoothness properties of GP regression, we
present constant-factor approximation algorithms for both problems that
make accurate predictions at each point. Our algorithm is deterministic and non-adaptive, and is based on the Traveling Salesperson Problem.
In addition to theoretical results, we also compare its empirical performance against baseline strategies on a real-world dataset.
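
To make the mobile sensing objective concrete, here is a minimal Python sketch of how the total time (travel plus measurement) can be computed for a non-adaptive tour over fixed sensing locations. It is an illustration only, not the algorithm from the paper: the grid placement, the greedy nearest-neighbor tour, and the names grid_locations, nearest_neighbor_tour, total_time, speed and measurement_time are all assumptions made for this example, and the greedy tour carries no constant-factor guarantee.

```python
import math


def grid_locations(width, height, spacing):
    """Hypothetical placement: sensing locations on a regular grid."""
    nx = int(math.floor(width / spacing)) + 1
    ny = int(math.floor(height / spacing)) + 1
    return [(i * spacing, j * spacing) for i in range(nx) for j in range(ny)]


def nearest_neighbor_tour(points):
    """Greedy heuristic: start at points[0], then repeatedly visit the
    closest unvisited location. This is a stand-in for a TSP solver."""
    remaining = list(points[1:])
    tour = [points[0]]
    while remaining:
        last = tour[-1]
        nxt = min(remaining, key=lambda p: math.dist(last, p))
        remaining.remove(nxt)
        tour.append(nxt)
    return tour


def total_time(tour, speed, measurement_time):
    """Travel time along the open path plus time spent measuring at each stop."""
    travel = sum(math.dist(a, b) for a, b in zip(tour, tour[1:]))
    return travel / speed + measurement_time * len(tour)


# Assumed example values: a 2 x 2 environment sampled every 1 unit.
locations = grid_locations(width=2.0, height=2.0, spacing=1.0)
tour = nearest_neighbor_tour(locations)
print(len(tour), total_time(tour, speed=1.0, measurement_time=0.5))
```

Because the sensing locations are fixed before any measurement is taken, the tour can be planned entirely offline, which is what non-adaptive means here. A finer grid spacing would add more measurement locations and lengthen the tour, so the spacing controls the trade-off between prediction accuracy and total time.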