XPaSS: A Multi-query Streaming XPath Query Engine

Introduction

XPaSS is an extended publisher-subscriber (P-S) system. In a P-S system, users register their interests with a subscription server. A subscription is usually expressed as a query: either a simple one, such as a keyword list, or a more powerful one, such as the XPath queries we use. The subscription server organizes these queries (usually from a few thousand to hundreds of thousands or more), evaluates them over the data, and returns the data of interest to each user. In many cases the data arrive as a stream: stock market updates, real-time news feeds, weblogs, weather broadcasts, and so on. Such data may arrive at a very high rate, and the server usually needs to respond very promptly, if not in real time. Managing such a large number of queries is a challenging task.

Usually we need to group the queries to improve the performance of the query engine in the subscription server, both to raise throughput and to use less memory. Scalability is the most important issue in such systems. The simplest grouping, for keyword-list queries, is an inverted index in which each key is a keyword and each value is the list of users whose lists include that keyword. Grouping more advanced queries, such as XPath queries, is not as easy. P-S systems with XPath subscriptions have recently been a very active research topic. For more details, please see the references.
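The keyword-list grouping described above can be sketched in a few lines of Python. This is only an illustration of the inverted-index idea, not code from XPaSS; the function names, the sample subscriptions, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import defaultdict

def build_inverted_index(subscriptions):
    """Map each keyword to the set of users whose list contains it."""
    index = defaultdict(set)
    for user, keywords in subscriptions.items():
        for keyword in keywords:
            index[keyword.lower()].add(user)
    return index

def match(index, text):
    """Return the users that have at least one keyword in the text."""
    users = set()
    # Naive whitespace tokenization (an assumption for this sketch).
    for word in set(text.lower().split()):
        users |= index.get(word, set())
    return users

# Hypothetical subscriptions, for illustration only.
subscriptions = {
    "alice": ["stock", "nasdaq"],
    "bob": ["weather", "storm"],
    "carol": ["stock", "weather"],
}
index = build_inverted_index(subscriptions)
matched = match(index, "Storm warning issued as weather worsens")
print(sorted(matched))
```

Each incoming word costs one index lookup regardless of how many users subscribed, which is why this grouping scales well for keyword lists.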

Why XPaSS?

Unlike traditional P-S systems, which always return the whole document to the subscriber, XPaSS can pinpoint the target data the subscriber wants and return it in a timely fashion. Moreover, not all streaming data on the web (stock market updates, real-time news feeds, weblogs, weather broadcasts, and so on) can be segmented into individual documents, so XPaSS processes queries without requiring that the data be segmented into documents or other predefined units.
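To make the difference concrete, here is a minimal sketch of returning only the matching fragments from an XML stream that never ends. It uses XMLPullParser from the Python standard library and handles just one fixed pattern (a child element directly under a given parent); it is not how XPaSS is implemented, and the function name and the sample chunks are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def stream_matches(chunks, parent_tag, child_tag):
    """Yield the text of each <child_tag> directly under <parent_tag>
    as soon as its end tag arrives, without waiting for the stream to end."""
    parser = ET.XMLPullParser(events=("start", "end"))
    stack = []  # tags of the currently open elements
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                stack.append(elem.tag)
            else:
                stack.pop()
                if elem.tag == child_tag and stack and stack[-1] == parent_tag:
                    yield elem.text
                elem.clear()  # drop content that is no longer needed

# Hypothetical stream: chunks split mid-tag, and the root is never closed.
chunks = [
    "<feed><book><ti",
    "tle>XML</title><price>10</price></book>",
    "<store><title>Shop</title></store><book><title>SQL</title>",
]
results = list(stream_matches(chunks, "book", "title"))
print(results)
```

The subscriber receives only the two book titles, and each one is available as soon as its closing tag has been read, even though the enclosing feed element is still open.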

XPaSS also scales very well. In the best case, XPaSS can evaluate 500,000 XPath queries in less than 0.35 s, using around 32 MB of memory. XPaSS can scale to a very large number of queries because it groups them by their common segments, where a segment is a pair of consecutive node tests together with the axis that connects them. Most current approaches instead group queries by prefixes or suffixes.

An Example of Segment-wise Grouping

Consider the following four queries:
  1. //store[location]//book/title
  2. //book[price]/title
  3. //store[//book[price][title]]/name
  4. //store[name][location]//book[price]/title

In a prefix-sharing system, only the "//store//book/title" prefix can be shared, and only between queries 1 and 4. In contrast, XPaSS can share all the common segments in the queries. For example, the segment book/title is shared among all four queries.
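The grouping in this example can be reproduced with a short Python sketch. It splits each query into (node test, axis, node test) segments and records which queries contain each segment. The tiny parser below covers only the XPath subset used in the four queries (name tests, the / and // axes, and nested predicates); the "#root" marker, the function names, and the index layout are assumptions for illustration and do not describe the actual XPaSS data structures.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"//|/|\[|\]|[A-Za-z_][\w.-]*")

def segments(query):
    """Return the set of (parent test, axis, child test) segments of a query."""
    tokens = TOKEN.findall(query)
    pos = 0
    found = set()

    def parse_path(context):
        # Parse consecutive steps relative to `context` until ']' or the end.
        nonlocal pos
        while pos < len(tokens) and tokens[pos] != "]":
            axis = "/"  # a bare name inside a predicate means the child axis
            if tokens[pos] in ("/", "//"):
                axis = tokens[pos]
                pos += 1
            name = tokens[pos]
            pos += 1
            found.add((context, axis, name))
            while pos < len(tokens) and tokens[pos] == "[":
                pos += 1          # consume '['
                parse_path(name)  # predicate paths are relative to this step
                pos += 1          # consume ']'
            context = name

    parse_path("#root")  # hypothetical marker for the document root
    return found

def group_by_segment(queries):
    """Map each segment to the (1-based) ids of the queries that contain it."""
    groups = defaultdict(list)
    for query_id, query in enumerate(queries, start=1):
        for segment in segments(query):
            groups[segment].append(query_id)
    return groups

queries = [
    "//store[location]//book/title",
    "//book[price]/title",
    "//store[//book[price][title]]/name",
    "//store[name][location]//book[price]/title",
]
groups = group_by_segment(queries)
for segment, ids in sorted(groups.items()):
    print(segment, ids)
# The segment ("book", "/", "title") is listed for queries [1, 2, 3, 4].
```

In this sketch the segment book/title appears once in the index with all four query ids attached, while a prefix-based grouping would treat queries 2 and 3 separately because they start differently.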

People

Get XPaSS version 1.0

The code currently is available upon request. Please contact us.

Feedback

We welcome your comments and suggestions. We would be grateful if you could inform us of how you are using XPaSS. In particular, if you make some code modifications that you would like to share, we would be happy to incorporate them in the next version.

Acknowledgment

Our work on XPaSS is supported by National Science Foundation grants IIS-9984296 (CAREER) and IIS-0081860 (ITR). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Last modified: Fri May 30 21:40:20 EDT 2003
Unless otherwise noted, all material in the http://www.cs.umd.edu/projects/xsq/ hierarchy is Copyright © 2003 Feng Peng and Sudarshan S. Chawathe.
