In recent years, there has been an explosion of information in a variety
of environments that pose significantly different data management
challenges from traditional database domains. Examples include sensor
networks, the World Wide Web, scientific domains, XML, and P2P networks.
In this course, we will explore a few topics related to data management in
some of these environments.
Distributed Measurement Networks:
Recent innovations in miniaturization technology have enabled
large-scale deployments of distributed measurement networks in a variety of
settings. "Wireless" sensor-actuator networks, especially, enable
highly cost-effective monitoring and control of physical environments
at unprecedented detail. Networks of larger sensing devices such as
web cameras (e.g., to monitor traffic), GPS devices, and RFID sensor networks have
also become ubiquitous. However, the potential of such networks has
barely been exploited, mainly because of the complexity of managing,
analyzing, and effectively using the huge amounts of data generated
in such distributed environments.
We will briefly review some such applications and the hardware trends in
sensor networks, and then discuss a variety of data management and processing
challenges.
Data Streams:
There is an increasing need for real-time processing, analysis, and
dissemination of data generated in environments such as sensor
networks, mobile devices, network monitors, financial data, and XML data.
Traditional database systems cannot handle the
requirements of such environments. As a result, there has
been much research on "data streams" in the last few years. We will
review and discuss the needs of such applications, the proposed
systems for management of streaming data, and algorithms for
processing such data in a real-time fashion.
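To make "real-time processing" concrete, here is a minimal sketch of one continuous query: a sliding-window average over a stream of numeric readings. It assumes a count-based window and float readings (illustrative choices, not something the course prescribes), and `sliding_window_avg` is a hypothetical name. The point is the stream setting itself: each reading is seen once, an answer is emitted on every arrival, and memory is bounded by the window size rather than by the length of the stream.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def sliding_window_avg(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the average of the most recent `window` readings after each arrival.

    Hypothetical helper for illustration. Single pass over the input,
    O(window) memory, O(1) work per reading.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf: Deque[float] = deque()
    total = 0.0
    for x in stream:
        buf.append(x)
        total += x
        if len(buf) > window:
            total -= buf.popleft()  # expire the oldest reading from the window
        yield total / len(buf)


# Example: temperature readings arriving one at a time, window of 3.
if __name__ == "__main__":
    for avg in sliding_window_avg([20.0, 22.0, 24.0, 30.0], window=3):
        print(round(avg, 2))
```

Keeping a running sum instead of re-scanning the window is what makes the per-reading cost constant; over very long streams, floating-point drift in `total` could matter, and periodically recomputing it from `buf` is one simple remedy.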
Topics of interest include query processing over data streams, query
Management of Uncertain, Imprecise Data / Probabilistic Databases:
Many of the challenges in data management in such environments stem
from the imprecision, inherent uncertainty, and incompleteness of
the generated data. This has brought the issue of effectively
managing such data to the forefront. There has been very little work
on this topic so far, thus opening up many exciting research
opportunities. Probabilistic and statistical modelling techniques in particular are
emerging as a promising approach to managing complexity in many such environments.
We will review some of the (older) work on probabilistic databases,
as well as recent proposals for dealing with such data, especially those that
use machine learning techniques.
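As a minimal illustration of what querying uncertain data involves, the sketch below uses one common model, the tuple-independent probabilistic database, in which each tuple is present independently with its own probability, and answers a yes/no query ("is any reading above 30?") in two ways. `prob_exists_naive` applies the possible-worlds semantics directly by summing over all 2^n worlds; `prob_exists` exploits independence to get the same answer in linear time. The relation, the sensor data, and both function names are hypothetical and chosen only for illustration; this is not presented as the specific model covered in the course.

```python
from itertools import product
from math import prod
from typing import Callable, Sequence, Tuple

# Hypothetical uncertain relation: (sensor_id, value, probability the tuple exists).
Relation = Sequence[Tuple[str, float, float]]
Predicate = Callable[[str, float], bool]


def prob_exists_naive(rel: Relation, pred: Predicate) -> float:
    """P(some present tuple satisfies pred), by enumerating all possible worlds."""
    total = 0.0
    for present in product([False, True], repeat=len(rel)):
        # Probability of this world: tuples are independent of one another.
        weight = prod(p if keep else 1.0 - p for (_, _, p), keep in zip(rel, present))
        if any(keep and pred(sid, val) for (sid, val, _), keep in zip(rel, present)):
            total += weight
    return total


def prob_exists(rel: Relation, pred: Predicate) -> float:
    """Same probability in O(n): 1 - P(no matching tuple is present)."""
    return 1.0 - prod(1.0 - p for sid, val, p in rel if pred(sid, val))


if __name__ == "__main__":
    readings = [("s1", 31.0, 0.9), ("s2", 28.0, 0.5), ("s3", 35.0, 0.4)]
    hot = lambda sid, val: val > 30.0
    print(round(prob_exists(readings, hot), 4))        # 1 - 0.1 * 0.6
    print(round(prob_exists_naive(readings, hot), 4))
```

The exponential cost of the naive version is why efficient query evaluation is a central question for probabilistic databases: the independence shortcut works for this simple query, but more complex queries (for example, joins) generally do not reduce so neatly.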