Abstract

We consider a mediator architecture for querying multiple Web data sources in a wide area environment. These sources have limited query capability. We present a two-phase Web query optimizer that uses a capability-based pre-optimizer and an extended relational optimizer. The pre-optimizer generates a pre-plan for a mediator query. The pre-plan identifies Web access patterns relevant to the mediator query, as well as restrictions imposed by the capabilities of the WebSources. A relational optimizer utilizes the knowledge in the pre-plan in producing a good query execution plan. We will show that the choice of Web access patterns strongly impacts the cost of the query execution plan, and consider cost-based heuristics that the optimizer should use to make a good choice. We then present a novel optimization strategy to meet performance guarantees for queries in a noisy wide area environment. Using access cost distributions for WebSources, the optimizer determines a cost-delay utility for a query plan. The optimizer behavior can be more optimistic where it ignores the expected delay of accessing WebSources or it can be conservative and consider this delay.


Back to the Database Group Home Page index.