Machine Learning Powered Query Optimization

Talk
Ryan Marcus
Time: 02.17.2022 11:00 to 12:00

Database management systems (DBMSes) depend on query optimizers to transform a user's declarative query into an efficient execution plan. Query optimizers are critical because a bad query plan can be orders of magnitude slower than the optimal plan. Modern query optimizers are complex and expensive to maintain, as they integrate a wide range of hand-tuned heuristics and manually engineered cost models, all of which must be updated for every new capability added to the DBMS. I will present two recent approaches to query optimization that leverage deep reinforcement learning to simultaneously improve query performance and reduce the maintenance burden. The first approach, Neo (VLDB '19), combines tree convolutional neural networks with a novel value iteration technique to fully replace a traditional query optimizer, yielding improvements of as much as 2x after just 36 hours of training on stable workloads. The second approach, Bao (SIGMOD '21), targets dynamic workloads and learns to "steer" an existing query optimizer by training an agent in a contextual multi-armed bandit framework. More broadly, both Neo and Bao highlight the huge potential impact of applying machine learning to systems problems, giving us a glimpse of what a fully learned system could do while also exposing several potential pitfalls along the way.
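
For readers unfamiliar with the contextual multi-armed bandit framing mentioned above, the following is a minimal illustrative sketch and not Bao's actual method. It assumes a tabular epsilon-greedy policy, which is a simplification chosen here for brevity. The arm names (standing in for ways of steering an optimizer), the two query contexts, and the simulated latencies are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical "steering" choices; the names are illustrative only.
ARMS = ["default", "no_nested_loop", "no_hash_join"]


class EpsilonGreedyBandit:
    """Tabular contextual bandit tracking mean latency per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def choose(self, context):
        # Try every arm once in each context before exploiting.
        untried = [a for a in self.arms if self.count[(context, a)] == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise pick the fastest arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return self.best_arm(context)

    def update(self, context, arm, latency):
        # Incremental running mean of the observed latency.
        key = (context, arm)
        self.count[key] += 1
        self.mean[key] += (latency - self.mean[key]) / self.count[key]

    def best_arm(self, context):
        return min(self.arms, key=lambda a: self.mean[(context, a)])


def simulated_latency(context, arm, rng):
    """Made-up latencies in seconds; the fastest arm depends on the context."""
    base = {
        ("few_joins", "default"): 1.0,
        ("few_joins", "no_nested_loop"): 2.0,
        ("few_joins", "no_hash_join"): 3.0,
        ("many_joins", "default"): 9.0,
        ("many_joins", "no_nested_loop"): 4.0,
        ("many_joins", "no_hash_join"): 12.0,
    }[(context, arm)]
    return base + rng.uniform(-0.2, 0.2)


def run(rounds=500, seed=0):
    rng = random.Random(seed)
    bandit = EpsilonGreedyBandit(ARMS, epsilon=0.1, seed=seed)
    for _ in range(rounds):
        context = rng.choice(["few_joins", "many_joins"])
        arm = bandit.choose(context)
        bandit.update(context, arm, simulated_latency(context, arm, rng))
    return bandit
```

Calling `run()` and then `best_arm("many_joins")` on the result shows the policy settling on a different arm than it does for `"few_joins"`, which is the sense in which the choice is contextual. A learned system would replace the lookup table with a model over query plan features.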