The Cubetree Data Warehouse Project

In this project we developed the "cubetree", a storage abstraction of the data cube, and realized it with packed R-trees to support efficient range queries on the cube. We reduced the problem of creating and maintaining cube aggregates to sorting and bulk incremental merge-packing of cubetrees. The merge-pack writes the updated cubetrees to separate storage, so cube queries can continue during maintenance (no query downtime). We have also characterized the size of the delta increment needed to achieve good bulk update schedules for the cube.
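The idea of reducing an update to sorting plus a merge can be illustrated with a minimal sketch. It is not the project's implementation: the function name `merge_pack`, the use of SUM as the aggregate, and the in-memory lists standing in for on-disk sorted runs are all assumptions made for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_pack(old_run, delta):
    """Merge a sorted run of (key, aggregate) pairs with an unsorted delta.

    Hypothetical helper: keys are tuples of dimension values and the
    aggregate is assumed to be a SUM. The result is written to a new list,
    so `old_run` stays untouched and could still serve queries.
    """
    # Step 1: sort the delta increment by key.
    delta_sorted = sorted(delta, key=itemgetter(0))
    # Step 2: merge the two sorted runs in a single sequential pass.
    merged = heapq.merge(old_run, delta_sorted, key=itemgetter(0))
    # Step 3: combine entries that share a key into one aggregate point.
    new_run = []
    for key, group in groupby(merged, key=itemgetter(0)):
        new_run.append((key, sum(value for _, value in group)))
    return new_run


old = [((1, 1), 10), ((1, 2), 5), ((2, 1), 7)]
delta = [((2, 1), 3), ((1, 2), 1), ((3, 1), 4), ((1, 2), 2)]
new = merge_pack(old, delta)
```

Because the output is a separate structure, readers of `old` are never blocked while `new` is being built, which mirrors the separate-storage property described above.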

Experiments with various data sets showed that multi-dimensional range queries on the cubetrees are very fast. For the 3GB data set of the demo (see below), the query response time on a cold start of a cubetree is between one thousandth and one third of a second per thousand aggregate points retrieved. On a hot tree (follow-up queries), the time is between one tenth and one half of the cold-start time.
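To show what a multi-dimensional range query over packed aggregate points looks like, here is a small sketch. It is a one-level simplification of a packed R-tree, not the cubetree code: the names `pack_leaves` and `range_query`, the lexicographic sort order, and the leaf capacity are assumptions chosen for illustration.

```python
def pack_leaves(points, capacity=4):
    """Pack (coords, value) points into full leaves with bounding boxes.

    Hypothetical helper: points are sorted lexicographically by coordinates
    and cut into consecutive leaves of `capacity` entries each.
    """
    ordered = sorted(points, key=lambda p: p[0])
    leaves = []
    for i in range(0, len(ordered), capacity):
        entries = ordered[i:i + capacity]
        # Per-dimension minimum and maximum give the leaf's bounding box.
        dims = list(zip(*(coords for coords, _ in entries)))
        lo = tuple(min(d) for d in dims)
        hi = tuple(max(d) for d in dims)
        leaves.append((lo, hi, entries))
    return leaves


def range_query(leaves, q_lo, q_hi):
    """Return all points inside the inclusive query box [q_lo, q_hi]."""
    hits = []
    for lo, hi, entries in leaves:
        # Skip a leaf whose bounding box does not intersect the query box.
        if any(h < ql or l > qh for l, h, ql, qh in zip(lo, hi, q_lo, q_hi)):
            continue
        for coords, value in entries:
            if all(ql <= c <= qh for c, ql, qh in zip(coords, q_lo, q_hi)):
                hits.append((coords, value))
    return hits


points = [((x, y), x * 10 + y) for x in range(4) for y in range(4)]
leaves = pack_leaves(points, capacity=4)
result = range_query(leaves, (1, 1), (2, 2))
```

A full packed R-tree would add upper index levels over these leaves so that whole subtrees can be pruned; the sketch keeps a single level to stay short.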

The best feature of the cubetree organization, however, is its ability to perform very efficient bulk incremental updates. Reducing the cube update problem to sorting and merge-packing provides a scalable, industrial-strength solution. We believe it is the first such solution.

Demo of the cubetree storage organization