ACT Inc. has developed a unique storage organization for on-line analytical processing (OLAP) on existing relational database systems (RDBMS). The Cubetree Storage Organization (CSO) logically and physically clusters the data of materialized views, multi-dimensional indices on these data, and computed aggregate values in one compact, tight storage structure that uses a fraction of the space required by conventional table-based storage. This is a breakthrough technology for storing and accessing multi-dimensional data in terms of storage reduction, query performance and incremental bulk-update speed. A Cubetree structure is not tied to a specific materialized view or index. Instead, it acts as a placeholder that stores multiple, possibly unrelated, views and multi-dimensional indices without having to analyze and identify the dense and sparse areas of the underlying multi-dimensional data. In CSO, data is clustered regardless of the skew of the underlying data. Given a set of views that need to be supported, the CSO optimizer finds the best placement on one or more cubetrees for achieving maximum clustering. The underlying cost model is tailored to CSO and uses standard greedy algorithms to select a near-optimal configuration. The DBA can tune the optimization to place an upper bound on either the duration of the incremental update window or the maximum storage used.
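To illustrate the kind of budget-bounded greedy selection described above, here is a minimal sketch. It is not CSO's cost model. The `Candidate` type, its `benefit` and `cost` fields, and the figures in the usage example are hypothetical. The single `budget` parameter stands for whichever bound the DBA chooses: maximum storage or the length of the update window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical configuration choice (e.g. an extra cubetree or sort order)."""
    name: str
    benefit: float  # estimated query-time saving (hypothetical units)
    cost: float     # storage or update time it consumes (hypothetical units, > 0)


def greedy_select(candidates, budget):
    """Pick candidates by benefit per unit cost without exceeding the budget.

    Returns the chosen names in selection order and the total cost used.
    """
    chosen, used = [], 0.0
    # Consider the most cost-effective candidates first.
    for c in sorted(candidates, key=lambda c: c.benefit / c.cost, reverse=True):
        # Skip any candidate that would push us past the upper bound.
        if used + c.cost <= budget:
            chosen.append(c.name)
            used += c.cost
    return chosen, used


cands = [
    Candidate("A", benefit=100, cost=10),
    Candidate("B", benefit=60, cost=4),
    Candidate("C", benefit=30, cost=6),
    Candidate("D", benefit=5, cost=5),
]
print(greedy_select(cands, budget=15))  # (['B', 'A'], 14.0)
```

Greedy selection by benefit-to-cost ratio is fast and usually close to optimal, but it does not guarantee the optimum. That trade-off is typical of configuration search over large sets of views.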
CSO offers a scalable yet inexpensive alternative to existing, costly data warehousing and indexing tools. CSO is designed to be used as a stand-alone system, as a co-resident of an RDBMS, or tightly coupled with the extensible facilities of an Object-Relational DBMS.
On-Line Analytical Processing (OLAP) is critical for decision support in today's competitive, low-profit-margin era. The right information at the right time is a necessity for survival, and the data warehousing industry has seen explosive market growth compared with a stagnant market for database engines.
The data cube is a multi-dimensional aggregate operator and has proven very powerful for modeling multiple projections of the data in a data warehouse. It is, however, very expensive to compute: computing the data cube requires enormous hardware resources and takes a long time. Moreover, accessing some or all of these projections efficiently requires extensive multi-dimensional indexing, which adds an equal or even higher cost to data warehouse maintenance. Indexing can be done using either a multi-dimensional (MOLAP) approach or a relational (ROLAP) approach exemplified by star schemas and join bitmap indexes. Both approaches speed up queries but suffer enormously during index maintenance, which essentially takes the data warehouse offline for an extended period. This downtime window is critical for applications that need to be globally available for most of the day.
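Why the data cube is expensive becomes clear from a naive in-memory sketch: with d dimensions there are 2**d group-bys (projections), and each one aggregates the full fact data. The dimension names `part`, `supplier`, `customer` and the measure `qty` are hypothetical, loosely modeled on a TPC-D-style schema. Real systems share work between group-bys rather than recomputing each from scratch as this sketch does.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dims, measure):
    """Compute SUM(measure) for every subset of dims (all 2**len(dims) group-bys).

    Returns {group_by_dims: {group_key: total}}.
    """
    cube = {}
    for k in range(len(dims) + 1):
        # Every k-element subset of the dimensions is one projection of the cube.
        for group in combinations(dims, k):
            agg = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in group)
                agg[key] += row[measure]
            cube[group] = dict(agg)
    return cube


rows = [
    {"part": "p1", "supplier": "s1", "customer": "c1", "qty": 5},
    {"part": "p1", "supplier": "s2", "customer": "c1", "qty": 3},
    {"part": "p2", "supplier": "s1", "customer": "c2", "qty": 4},
]
cube = data_cube(rows, ("part", "supplier", "customer"), "qty")
print(len(cube))           # 8 group-bys for 3 dimensions
print(cube[("part",)])     # {('p1',): 8.0, ('p2',): 4.0}
```

Every one of these projections also needs an index to be queried efficiently, which is the maintenance cost the paragraph above refers to.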
CSO avoids the high cost of creation and maintenance by sorting and bulk incremental merge-packing of cubetrees. Updates to the data warehouse are sorted and merge-packed with the old cubetrees to generate updated versions of them. Sorting is the dominant cost in this incremental bulk-update mode of operation, while the merge-packing step achieves rates of over 5 GB per hour using a single disk. No other storage organization offers this bulk merge-packing technology, which is the ultimate scalable solution. Experiments with various data sets showed that multi-dimensional range queries are very fast. This performance is attributed to the packing of the storage and to the unique sort order, which permits sequential I/O both for reads during queries and for reads and writes during maintenance. Furthermore, the compression technique of CSO uses an order of magnitude less disk space than conventional relational storage with indexing. In an extensive test using the TPC-D benchmark data, the CSO implementation achieved at least a 2:1 storage reduction, 10:1 better OLAP query performance, and 100:1 faster updates than the conventional (relational) storage organization of materialized OLAP views indexed in the best possible way.
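The sort-then-merge update cycle can be sketched with in-memory lists standing in for on-disk runs. This is a simplified illustration under stated assumptions, not CSO's packing algorithm: it does not build cubetree pages, and the `(key, value)` pair layout with SUM as the aggregate is hypothetical. It shows the two phases the text describes: sorting the incoming updates, then a single sequential pass that merges them into the old sorted data and folds equal keys together.

```python
def merge_pack(old_run, delta):
    """Merge a sorted run of (key, value) aggregates with an unsorted update batch.

    old_run must be sorted by key with unique keys; delta may be unsorted and may
    repeat keys. Values for equal keys are summed. Returns a new sorted run.
    """
    delta = sorted(delta)  # sort phase: the dominant cost in the bulk update
    merged = []
    i = j = 0
    # Merge phase: one sequential pass over both inputs.
    while i < len(old_run) or j < len(delta):
        # Take the next smallest key from whichever input has it.
        if j == len(delta) or (i < len(old_run) and old_run[i][0] <= delta[j][0]):
            key, value = old_run[i]
            i += 1
        else:
            key, value = delta[j]
            j += 1
        # Fold equal keys together so each key appears once in the output.
        if merged and merged[-1][0] == key:
            merged[-1] = (key, merged[-1][1] + value)
        else:
            merged.append((key, value))
    return merged


old = [(("p1", "s1"), 5), (("p2", "s1"), 4)]
delta = [(("p3", "s2"), 2), (("p1", "s1"), 1), (("p1", "s2"), 3)]
print(merge_pack(old, delta))
# [(('p1', 's1'), 6), (('p1', 's2'), 3), (('p2', 's1'), 4), (('p3', 's2'), 2)]
```

Because both inputs are read front to back and the output is written in order, every access is sequential. The same sorted order is what lets a range query read a contiguous stretch of the result instead of seeking randomly.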