Cubetree: Organization of and Bulk Incremental Updates on the Data Cube

Nick Roussopoulos

Computer Science Department
University of Maryland
College Park

The complete paper is available in:

Abstract


The data cube is an aggregate operator which has been shown to be very powerful for On Line Analytical Processing (OLAP) in the context of data warehousing. It is, however, very expensive to compute, access, and maintain. In this paper we define the ``cubetree'' as a storage abstraction of the cube and realize it using packed R-trees for most efficient cube queries. We then reduce the problem of creation and maintenance of the cube to sorting and incremental bulk merge-packing of cubetrees. This merge-pack has been implemented to use separate storage for writing the updated cubetrees, therefore allowing cube queries to continue even during maintenance. Finally, we characterize the size of the delta increment for achieving good bulk update schedules for the cube. The paper includes experiments with various data sets measuring query and bulk update performance.


Last updated August 12, 1997