Real-world data --- especially when generated by distributed measurement infrastructures such as sensor networks ---
tends to be incomplete, imprecise, and erroneous, making it impossible to present it to users or feed it directly into
applications. The traditional approach to dealing with this problem is to first process the data using statistical or
probabilistic "models" that can provide more robust interpretations of the data. Current database systems, however, do
not provide adequate support for applying models to such data, especially when those models need to be frequently
updated as new data arrives in the system. Hence, most scientists and engineers who depend on models for managing
their data do not use database systems for archival or querying at all; at best, databases serve as a persistent raw
data store.
In this paper we define a new abstraction called "model-based views" and present the architecture of "MauveDB",
the system we are building to support such views. Just as traditional database views provide logical data independence,
model-based views provide independence from the details of the underlying data generating mechanism and hide the
irregularities of the data by using models to present a consistent view to the users. MauveDB supports a declarative
language for defining model-based views, allows declarative querying over such views using SQL, and supports several
different materialization strategies and techniques to efficiently maintain them in the face of frequent updates. We
have implemented a prototype in the Apache Derby open-source DBMS that currently supports views based on regression and
interpolation, and we present results that show the utility and performance benefits that can be obtained by
supporting several different types of model-based views in a database system.
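
To make the idea of a regression-based view concrete, the following is a minimal sketch in Python, not MauveDB's implementation or its view-definition language. The class name `RegressionView`, its `insert` and `query` methods, and the choice of a one-dimensional linear model are all assumptions made for illustration. Raw readings arrive at irregular times, and the view answers queries on a regular time grid from the fitted model. The model is refit lazily after new data arrives, which is only one of several possible materialization strategies.

```python
from typing import List, Optional, Tuple


def fit_line(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * t; returns (a, b)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    if sxx == 0.0:
        raise ValueError("all samples share the same time value")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    b = sxy / sxx
    a = mean_y - b * mean_t
    return a, b


class RegressionView:
    """Hypothetical model-based view over irregular (time, value) readings."""

    def __init__(self) -> None:
        self._samples: List[Tuple[float, float]] = []
        self._coeffs: Optional[Tuple[float, float]] = None

    def insert(self, t: float, y: float) -> None:
        """Add a raw reading and invalidate the fitted model."""
        self._samples.append((t, y))
        self._coeffs = None  # refit lazily on the next query

    def query(self, grid: List[float]) -> List[Tuple[float, float]]:
        """Return model predictions at the requested grid points."""
        if self._coeffs is None:
            self._coeffs = fit_line(self._samples)
        a, b = self._coeffs
        return [(t, a + b * t) for t in grid]


if __name__ == "__main__":
    view = RegressionView()
    # Readings taken at irregular times (illustrative values only).
    for t, y in [(0.0, 2.0), (0.7, 4.1), (2.5, 9.5)]:
        view.insert(t, y)
    # The user queries a regular grid and never sees the irregular raw data.
    for t, y in view.query([0.0, 1.0, 2.0]):
        print(f"t={t:.1f}  predicted={y:.2f}")
```

The design choice worth noting is that queries address the grid, not the raw readings, so the sampling pattern of the underlying sensors can change without affecting the queries.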