Varun Suryan

I am a fourth-year Ph.D. student in the Department of Computer Science at the University of Maryland, College Park, working with Pratap Tokekar.

I completed my master's in Computer Engineering at Virginia Tech and my bachelor's in Mechanical Engineering at the Indian Institute of Technology Jodhpur. In the past, I have worked with Prof. Ankur Sinha of the Indian Institute of Management Ahmedabad and Kalyanmoy Deb of Michigan State University.

Email  /  Google Scholar  /  Social


I'm interested in Reinforcement Learning, Informative Path Planning, Bayesian Optimization, and Multi-Armed Bandits.

Multi-Fidelity Reinforcement Learning with Gaussian Processes
Varun Suryan, Nahush Gondhalekar, Pratap Tokekar
arXiv, 2020
arXiv / code

We study the problem of Reinforcement Learning (RL) using as few real-world samples as possible. A naive application of RL can be inefficient in large and continuous state spaces. We present two versions of Multi-Fidelity Reinforcement Learning (MFRL), model-based and model-free, that leverage Gaussian Processes (GPs) to learn the optimal policy in a real-world environment. In the MFRL framework, an agent uses multiple simulators of the real environment to perform actions. With increasing fidelity in a simulator chain, the number of samples used in successively higher simulators can be reduced. By incorporating GPs in the MFRL framework, we empirically observe up to 40% reduction in the number of samples for model-based RL and 60% reduction for the model-free version. We examine the performance of our algorithms through simulations and through real-world experiments for navigation with a ground robot.

Learning a Spatial Field with Gaussian Process Regression in Minimum Time
Varun Suryan, Pratap Tokekar
WAFR, 2018

We study an informative path planning problem where the goal is to minimize the time required to learn a spatial field using Gaussian Process (GP) regression. Specifically, given parameters 0 < ε, δ < 1, our goal is to ensure that the predicted value at all points in an environment lies within ±ε of the true value with probability at least δ. We study two versions of the problem. In the sensor placement version, the objective is to minimize the number of sensors placed. In the mobile sensing version, the objective is to minimize the total time required to visit the sensing locations, which comprises the time spent obtaining measurements and the time spent traveling between measurement locations. By exploiting the smoothness properties of GP regression, we present constant-factor approximation algorithms for both problems that make accurate predictions at each point. Our algorithm is deterministic and non-adaptive, and is based on the Traveling Salesperson Problem. In addition to theoretical results, we also compare its empirical performance against baseline strategies on a real-world dataset.
