Today's applications require the ability to access and query multiple heterogeneous data sources. Heterogeneous databases (HDB) and heterogeneous agent systems provide the necessary means to attain this goal. Commercial heterogeneous database products (e.g. DataJoiner[GL94, VZ98], OLE-DB[Bla96, BP98]) have already become commonplace, leading to significant growth in the number of deployed heterogeneous database (HDB) applications. A common characteristics of such HDB and agent systems is that they simultaneously process large numbers of queries/requests. The ability to efficiently handle large volumes of simultaneous queries is critical in many such applications.
In this work, we present various query optimization techniques to improve the performance of such heavily loaded HDB and agent systems. Since cost models play a key role in query optimization, we first develop a cost model for heterogeneous data sources. Although several methods have been proposed to estimate the cost of a query that accesses diverse data sources, our approach is the first to adapt a traditional System-R style optimizer to perform costing in a heterogeneous environment. In this work, we propose a framework through which a heterogeneous system can obtain the necessary cost and cardinality information. We describe the implementation of this framework in Garlic[T+96], and through experiments, we show that cost-based optimization is crucial in a heterogeneous environment.
As the next step to optimize the performance of HDB and agent systems, we propose a set of cost-based multiple query optimization (MQO) algorithms which exploit the commonalities among multiple queries submitted to such systems. Unlike classical databases, as different data sources vary widely in how they are implemented and often use diverse query languages, no uniform notion of a common subexpression exists. Hence, we first provide an application independent framework for specifying common subexpressions in a heterogeneous system. Next, we provide two cost-based MQO algorithms (and various accompanying heuristics), which take the set of queries and the common subexpressions as input, and produce a single query plan for the set of queries involved. Then, we experimentally show that our MQO algorithms achieve substantial savings. Although we focus on HDB servers in our experiments, the algorithms are also applicable to agent systems. To show this, we are planning to implement our MQO algorithms in IMPACT, which is a platform for interacting agents.
Thusfar, we assumed that a set of queries is provided as input to the MQO algorithms. However, an HDB or an agent system may have a vast number of pending queries. Hence, the final step in optimizing the performance of such systems is to group a large number of queries into classes of manageable size, so that each class can be optimized by the MQO techniques we are developing. We are planning to address this problem.
Back to the Summer 2000 dbchat index.