PhD Preliminary: Enhancing Modern Query Federation Systems: Execution Optimization, Performance Prediction, and Systems Assessment

Talk
Chujun Song
Time: 01.18.2024 10:00 to 12:00
Location: IRB-5165 or via Zoom https://umd.zoom.us/my/cjsong

Query federation engines typically need to execute queries against data stored in systems that have their own query processing capabilities, such as DBMSs. Given the same query, we could design a query federation engine to retrieve data directly from the DBMS and then execute the query within the query federation engine (a design we refer to as 'push data to code'). Alternatively, we could offload the majority of the query processing work to the DBMS and fetch only the results back (a design we refer to as 'push code to data'). However, these approaches are extreme designs in which one side handles most of the work while the other remains largely idle: the DBMS is idle in the 'push data to code' approach, and the query federation engine is idle in the 'push code to data' approach. Our work explores a middle ground, engaging both the query federation engine and the DBMS in query processing to maximize resource utilization. We focus specifically on join processing, as joins are typically the most resource-intensive part of query processing. To this end, we designed novel join algorithms that involve both sides in the query processing. These algorithms were implemented on the query federation engine Trino and evaluated against the 'push data to code' and 'push code to data' scenarios. We found that our new design achieves lower latency. Additionally, we developed dynamic algorithms that decide at runtime whether to execute the next part of the process on the query federation engine side or the DBMS side, adapting to changes in CPU pressure on both sides. Experiments showed that this dynamic approach outperforms the static algorithms in terms of query latency.
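
As a rough illustration of the runtime decision described above, the following minimal Python sketch picks, for each upcoming batch of join work, whichever side currently reports lower CPU utilization. It is not the algorithm presented in the talk and does not use any Trino API; the names SideLoad, choose_side, plan_batches, and the bias parameter are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class SideLoad:
    """Hypothetical snapshot of CPU pressure on one side (fraction in [0, 1])."""
    cpu_utilization: float


def choose_side(engine: SideLoad, dbms: SideLoad, bias: float = 0.0) -> str:
    """Pick where to run the next batch of join work.

    Assumption for this sketch: the less loaded side wins. A positive bias
    favors the federation engine; ties also go to the engine.
    """
    if engine.cpu_utilization <= dbms.cpu_utilization + bias:
        return "engine"
    return "dbms"


def plan_batches(load_samples):
    """Map (engine_cpu, dbms_cpu) samples to a per-batch placement decision."""
    return [choose_side(SideLoad(e), SideLoad(d)) for e, d in load_samples]


if __name__ == "__main__":
    # Three successive samples: DBMS busy, engine busy, then equal load.
    print(plan_batches([(0.2, 0.9), (0.8, 0.3), (0.5, 0.5)]))
```

A static 'push data to code' or 'push code to data' plan corresponds to always returning the same side; the sketch differs only in that the choice is re-evaluated per batch from the observed load.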