PhD Preliminary: High Performance Distributed Transactions for Multi-region Database Systems

Talk
Cuong Nguyen
Time: 
01.04.2024 14:30 to 16:00

The increased popularity of global applications has spurred the demand for a more advanced database system layer in multi-region cloud infrastructures. This layer should seamlessly abstract the distributed nature of the system behind a strong data consistency interface, enabling developers to focus on application logic without delving into complex data race issues. Simultaneously, it needs to exhibit resilience in the face of calamities, ranging from hardware failures to natural disasters that shut down entire data centers. However, these goals are usually at odds with the need for high performance. Guaranteeing a strong data consistency model necessitates additional coordination rounds between nodes to ensure the proper ordering of read and write operations. Enhancing availability and durability necessitates data replication across multiple locations. These measures, coupled with the potential for high-latency cross-region communication, can significantly degrade performance. Consequently, existing solutions typically either compromise on weaker consistency levels or incur performance penalties for achieving better guarantees.

In this dissertation, we propose a new deterministic concurrency control algorithm to address the performance limitation caused by cross-region network communication. While traditional database systems have to guard against non-deterministic failures by using expensive protocol like two-phase commit, exacerbated in a geographically distributed setting, deterministic concurrency control algorithms employ an upfront total ordering at a lower cost and eschew such a heavyweight protocol. Taking one step further, we devise a new deadlock resolution algorithm that reduces the number of cross-region network round-trips to an absolute minimum, significantly enhancing the efficiency and responsiveness of the system.The design of deterministic database systems often rests on certain assumptions, such as the read and write set of each transaction should be known in advance. Recent database systems, including non-deterministic ones, also leverage these assumptions for performance gains. Nevertheless, the validity of these assumptions has primarily been supported by anecdotal evidence or intuition. To bridge this gap and systematically assess the prevalence of these assumptions, we conducted a large-scale empirical study across a diverse corpus of open-source, real-world applications. This study not only provides quantitative evidence for the cases where these assumptions hold true, but also guides future efforts to address the cases where they do not.Finally, recognizing that deterministic concurrency control is still in its nascent stages for widespread adoption, we explore a general architectural design capable of equipping existing non-deterministic databases with strongly consistent multi-region transactions. We show that with minimal changes to a shared-storage database system, we can enable fast multi-region transactions while inheriting decades of expertly built and rigorously tested features from the original database system.