PhD Proposal: High-Throughput Network Distance Computations for Spatial

Talk
Shangfu Peng
Time: 
12.15.2016 10:00 to 11:30
Location: 

AVW 4424

In the past decades, shortest distance methods for road networks have been developed that focus on how to speed up the latency of a single source-target pair distance query. Large analytic applications on road networks including simulations (e.g., evacuation planning), logistics, location-based advertisement, and transportation planning require methods that provide high throughput (i.e., distance computations per second) and the ability to ``scale out'' by using large distributed computing clusters.
In the first part of this proposal, we present a new framework termed the All-Store Distance Oracle (ASDO) that efficiently computes the ASDO representation of any large road network in a distributed cluster, where the ASDO representation is a well-separated pair decomposition (WSPD) of a road network using network distance instead of Euclidean distance. The representation benefits from the small size of WSPD which enables the ASDO representation to retrieve $\epsilon$-approximate network distance in a high-throughput rate and can be easily embedded within any database system including RDBMS, Column-oriented DBMS, and key-value stores. The experimental results show that ASDO can compute the ASDO representation of the USA road network in a few hours using a modest size cluster. In comparison, previous database-centric approaches either do not scale to large road networks or are several orders of magnitude slower than the proposed ASDO for spatial queries.
In the second part of this proposal, we show how to use the ASDO representation in real applications. Two architectures, one is our ASDO architecture within PostgreSQL and the other one is a widely used hybrid architecture in industry, are evaluated on a variety of spatial analytic queries in common use such as KNN, distance matrix, and trajectory queries.

Embedding the ASDO representation inside PostgreSQL supports performing complex analytic queries on road networks using standard SQL. This makes the results of ASDO simple to use, yet considerably expressive, compared to traditional methods that require extensive development effort. Experimental results indicate that our ASDO architecture within PostgreSQL can compute more than 60K road distance operations per second on a large road network (e.g., USA), which achieves 20X more throughput compared to the state-of-the-art shortest distance computation methods.

In the third part of this proposal, a framework called SPDO is presented which implements an extremely fast distributed algorithm for computing spatial analytic queries on Apache Spark. The approach extends the ASDO representation which has now been adapted to use Spark's resilient distributed dataset (RDD).

SPDO improves the throughput by at least two orders of magnitude, which makes the approach suitable for applications that need to compute millions of network distances per second.

Examining Committee:

Chair: Dr. Hanan Samet

Dept rep: Dr. A. Udaya Shankar

Members: Dr. Larry Davis

Dr. David Mount