Ontology Management
Though there has been tremendous interest in ontologies in the Semantic Web community, and tremendous interest
in querying and integrating heterogeneous data sources, the two have not come together. We started by developing methods to
improve the quality of answers to database queries (in particular improving recall) by using ontologies. We subsequently developed
sophisticated algorithms to automatically infer ontologies. Our specific contributions fall into four categories:
1. Ontology-based Querying
a. We introduced the notion of an ontology extended data source, which consists of a data source with an associated ontology. We have extended the relational model of data to support querying ontology extended relational databases, as well as ontology extended XML databases. We have built a prototype system called HOME. This work was reported in a paper at the IEEE Intl. Conf. on Information Reuse and Integration (IRI-2003).
b. We have developed a framework called TOSS that answers queries to XML data sources by associating ontologies with them and using notions of similarity. We show how a notion of similarity can lead to the notion of a similarity extended ontology. We show how to develop an XML algebra taking ontologies and similarities into account; update operations are also developed. We demonstrate experimentally that TOSS provides higher quality answers than ordinary XML query engines. This work appears in the Proceedings of the 2004 ACM SIGMOD Conference.
c. We have developed the concept of a probabilistic ontology. A probabilistic ontology allows a given class to be decomposed in one or more ways. For a given decomposition of a class, a conditional probability statement specifies the probability of an arbitrary member of a superclass belonging to a given subclass. We show that one can associate, with any given object, a probability that the object should be in the answer to a given query. The object can be returned to the user if that probability exceeds a given threshold. We developed extensive algorithms to answer such queries efficiently and showed that when the threshold is around 75%, the answers to queries improve dramatically. This work will be reported as a keynote lecture at the 2005 Intl. Conf. on Ontologies, Databases, and Semantics, Cyprus, Oct/Nov 2005.
2. Ontology Inference
a. We have developed, as part of our STORY system, a multi-lingual ontology extractor software module that enables us to automatically extract RDF ontologies from any collection of text documents in one of three languages: English, Spanish, and Italian. This work was recognized in Computerworld's Sep. 12, 2005 edition with an honorable mention in their 2005 Horizon Awards for innovative software.
b. We have developed a probabilistic ontology extraction and refinement engine that can extract initial probabilistic ontologies from a mix of relational data sources and from existing classification hierarchies. Our algorithms allow a user to provide corrective feedback by using queries. We show that on certain document collections (such as the upper management Enron email archive, which has about 250K emails) we can rapidly iterate towards an ontology that a user deems acceptable.
3. Ontology Integration
a. We have developed a sophisticated algorithm called CROW to integrate RDF ontologies in the presence of a set of interoperation constraints that provide relationships between terms in the two ontologies. We also developed the ICI algorithm to infer such interoperation constraints directly. Our experiments on a set of ontologies from sources such as DAML and Ontobroker show that our algorithms are fast and accurate.
b. We are developing the MOO algorithm (Merging OWL Ontologies) for integrating multiple OWL ontologies. This work is still ongoing.
4. RDF-Databases
a. We have developed the concept of an RDF-database and developed sophisticated view maintenance algorithms that take the graph structure of RDF data into account in order to answer queries very efficiently. We have developed the notion of an aggregate operation for RDF data. Our work is reported in the 2005 IEEE Intl. Conf. on Data Engineering.
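The threshold-based answering described for probabilistic ontologies can be illustrated with a small sketch. It is not the algorithm reported in the work above. It assumes a tree-shaped hierarchy in which each subclass carries the conditional probability of membership given its parent, treats sibling branches as disjoint, and multiplies the conditional probabilities along the path between nested classes. The class names, probabilities, and function names are all hypothetical.

```python
# Hypothetical probabilistic ontology:
# child -> (parent, P(member of child | member of parent)).
# Tree-shaped for simplicity; the text allows several decompositions per class.
SUBCLASS_PROB = {
    "vehicle": ("thing", 0.5),
    "car": ("vehicle", 0.8),
    "sedan": ("car", 0.9),
    "truck": ("vehicle", 0.2),
}


def membership_probability(known_class, query_class):
    """P(object in query_class | object in known_class) for nested classes.

    Returns 1.0 if query_class is known_class or one of its ancestors,
    the product of conditional probabilities if it is a descendant,
    and 0.0 otherwise (sibling branches are assumed disjoint here).
    """
    # Walk up from known_class: membership in any ancestor is certain.
    cls = known_class
    while True:
        if cls == query_class:
            return 1.0
        if cls not in SUBCLASS_PROB:
            break
        cls = SUBCLASS_PROB[cls][0]
    # Walk up from query_class towards known_class, multiplying probabilities.
    prob = 1.0
    cls = query_class
    while cls in SUBCLASS_PROB:
        parent, p = SUBCLASS_PROB[cls]
        prob *= p
        if parent == known_class:
            return prob
        cls = parent
    return 0.0


def answer(objects, query_class, threshold):
    """Return the objects whose membership probability exceeds the threshold."""
    return [
        name
        for name, known in objects.items()
        if membership_probability(known, query_class) > threshold
    ]
```

With a threshold of 0.75, an object known only to be a "vehicle" would be returned for a query on "car" (probability 0.8) but not for a query on "sedan" (0.8 times 0.9).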
UMD Participants: Yu Deng, Amy Sliva, Octavian Udrea, V.S. Subrahmanian
Collaborators:
Univ. of Napoli, Federico II, Italy: Piero Bonatti, Pasquale Capasso, Massimiliano Albanese, Antonio Picariello
Univ. Simon Bolivar, Caracas, Venezuela: Edna Ruckhaus, Maria Vidal
Bar-Ilan University, Israel: Sarit Kraus
Hong Kong Polytechnic: Edward Hung
US Army Research Lab
IBM/Almaden
Hewlett Packard
RamoS: Reasoning about moving objectS
We have been developing a theoretical foundation for reasoning about large collections of moving objects. Such objects could range from airplanes to cars
to cell phones to birds. We are interested in tracking such moving objects, predicting where they are going and when they may get there, and in querying
and reasoning about such systems. Our accomplishments to date include the following:
1. We have defined the concept of a go-theory that allows us to reason about large collections of motion or route plans. A go-theory is a
set of statements of the form “object O will go from location L, leaving in the time interval [S1,S2] and will go to the location L’, arriving in the time
interval [E1,E2], traveling at an average velocity in the interval [V1,V2].” Go-theories mirror real life where we do not have 100% certainty about when we will
leave a given location, when we expect to reach a given destination, and how fast we expect to travel. We have developed a formal logic of go-theories and
developed very efficient algorithms to check consistency and to answer queries such as:
a. Find all objects that are guaranteed to be within a given region at time (or time interval) T.
b. Find all objects that are guaranteed to be within a given distance of another object at time T (or a time interval).
c. Find the nearest neighbor of object O at time (or time interval) T.
Our initial work on go-theories was published in the 2004 Intl. Conf. on
Knowledge Representation and Reasoning (KR-04), Whistler, Canada, June 04.
* We subsequently defined a Motion Closed World Assumption (MCWA) that extends go-theories so that we can additionally assume that objects do not make
arbitrary movements not explicitly described in the go-theory. We develop algorithms to efficiently answer queries under the motion closed world
assumption. This work appears in IJCAI-2005.
2. We have also defined a notion of a “far” query that guarantees that within a given time window, two objects will always be sufficiently far apart.
This is important in applications such as air-traffic control where it is important to maintain separation constraints. We developed very efficient
algorithms to maintain far-queries and reported them in a paper in IJCAI-2005.
3. We are developing a suite of algorithms to deconflict motion plans. This could occur, for example, when two planes violate a separation constraint.
We are looking at methods to identify the causes of conflict and find algorithms to remove such conflicts, while still trying to achieve the goals of the
plans involved.
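The interval-based statements of a go-theory can be made concrete with a minimal sketch. It checks only whether a single statement is internally feasible, that is, whether some departure time, arrival time, and speed within the stated intervals are compatible with the distance to cover. It assumes straight-line motion at constant speed between two points in the plane, and it is not the consistency-checking algorithm of the published work. The class and field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GoStatement:
    """One go-theory statement, simplified to straight-line motion."""
    obj: str
    origin: tuple        # (x, y) of location L
    destination: tuple   # (x, y) of location L'
    depart: tuple        # departure interval [S1, S2]
    arrive: tuple        # arrival interval [E1, E2]
    speed: tuple         # speed interval [V1, V2], with V1 > 0


def is_feasible(go):
    """True if some departure, arrival and speed in the intervals agree."""
    distance = math.dist(go.origin, go.destination)
    # Travel durations permitted by the speed interval.
    shortest = distance / go.speed[1]
    longest = distance / go.speed[0]
    # Travel durations permitted by the departure and arrival intervals.
    min_gap = max(0.0, go.arrive[0] - go.depart[1])
    max_gap = go.arrive[1] - go.depart[0]
    # Feasible exactly when the two duration ranges overlap.
    return max(shortest, min_gap) <= min(longest, max_gap)
```

A full go-theory contains many such statements per object, so checking the whole theory also requires that consecutive movements of the same object fit together, which this sketch does not attempt.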
Applications:
Jointly with the US Navy, Lockheed Martin, BBN, and several other organizations, we have built an application that uses sensor readings to predict where
and when an enemy submarine will be in the future.
Jointly with the US Army and BAE Systems, we have developed a system to track enemy vehicles moving across a road network, predict where and when they
will be in the future, and how best to deploy coalition assets to neutralize the threat.
SPONSORS AND COLLABORATORS: US Army, BAE Systems, NRL, Lockheed Martin, BBN
Reasoning about Cultures
There are numerous applications where we wish to reason about how a political organization or a tribal group or an economic group might behave under
certain circumstances. We are developing the theory and algorithms required to understand the context within which such entities function and to develop
models of behavior for such entities.
We have developed a multi-lingual opinion analysis system called OASYS, with which we can extract the intensity of opinion that people might have
on a given topic. We have developed a stochastic opponent modeling language called SOMA that can be used to develop behavioral models of third parties
in given situations. We have developed an ontology extractor that can automatically extract RDF and extended RDF ontologies from multiple
heterogeneous data sources including free text and relational data sources. We have applied this extractor to extract information about the Basques in
Spain, the Kikuyus in Kenya, and various tribes along the Pakistan/Afghanistan borderlands. We are developing algorithms to automatically extract
stochastic opponent models from a body of data (news events, etc.). We are developing partial-information game-theoretic models of opponent behavior to
understand how a given set of actions might cause a response and to understand what actions to take in order to elicit a desired response with high
probability.
Faculty Involved: V.S. Subrahmanian, Dana Nau, Jon Wilkenfeld, John Steinbruner
Students Involved: Massimiliano Albanese, Gerardo Simari, Amy Sliva
STORY
AIM: Filter huge amounts of data to deliver short, succinct, personalized stories to multiple users using diverse devices.
GOALS:
Extract stories about People, Places, Organizations, Events from multiple heterogeneous data sources: Text documents, Web sources, Relational databases, Object databases, Flat files, Proprietary formats
Automatically customize stories to fit user needs
Deliver stories across multiple access devices: Wireless PDA, Laptops, Cell phones
Key applications:
Stories about Greek characters (with Pompeii)
Stories about Pakistani nuclear scientists (with US Army)
Stories about tribes on Pakistan/Afghan border (with US Army)
RDF extraction
RDF is a World Wide Web Consortium ontology standard. The system can infer RDF extraction rules from examples and then apply these
rules to extract (Entity, Attribute, Value) triples from documents, relational DBs, XML sources, etc. Time-stamped values and set-valued types are also permitted.
STORY algorithms
Goal: have a story of size K (set by user) with high information content and good prose quality.
Defined as a multi-objective optimization problem - NP-hard to create an optimal story.
Story evaluation is a linear combination of: fact priority, story continuity and repetition.
Several algorithms:
OptSTORY - the optimal story
DynStory - dynamic programming
GenStory - genetic programming
Human subjects found that our system generates stories with highly valuable facts and that prose quality is acceptable.
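The idea of scoring a story as a linear combination of fact priority, continuity, and repetition can be sketched as follows. This is a simple greedy heuristic written for illustration; it is not OptSTORY, DynStory, or GenStory. The concrete definitions of continuity (adjacent facts about the same entity) and repetition (repeated entity and attribute pairs), the weights, and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    entity: str
    attribute: str
    value: str
    priority: float  # hypothetical importance score


def evaluate(story, w_priority=1.0, w_continuity=0.5, w_repetition=0.5):
    """Linear combination of priority, continuity and repetition terms."""
    priority = sum(f.priority for f in story)
    # Continuity (assumed): adjacent facts about the same entity read smoothly.
    continuity = sum(1 for a, b in zip(story, story[1:]) if a.entity == b.entity)
    # Repetition (assumed): repeated (entity, attribute) pairs are penalised.
    pairs = [(f.entity, f.attribute) for f in story]
    repetition = len(pairs) - len(set(pairs))
    return (w_priority * priority
            + w_continuity * continuity
            - w_repetition * repetition)


def greedy_story(facts, k):
    """Build a story of at most k facts by appending the best next fact."""
    story, remaining = [], list(facts)
    while remaining and len(story) < k:
        best = max(remaining, key=lambda f: evaluate(story + [f]))
        story.append(best)
        remaining.remove(best)
    return story
```

A greedy choice like this runs quickly but gives no optimality guarantee, which is consistent with the NP-hardness noted above.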
OASYS: An Opinion Analysis System
There are numerous applications where we wish to know the intensity of opinion on a given topic expressed in a collection of documents. For instance, a
company might wish to know what bloggers have to say about a given product. Alternatively, the US military might wish to know the intensity of opinion in
the Pakistani press about the Abu Ghraib scandal.
Our OASYS system can look at a collection of documents (in multiple languages) and assign an "intensity" of opinion of a given document on a given topic.
The intensity of opinion of document d with regard to topic t depends not only on the terms used in the document, but also on the perceptions of
the reader. As a consequence, we have developed statistical algorithms conditioned by human responses/input to create an intensity scoring model. Our
current (ongoing) prototype shows how the system behaves on English, Spanish and Italian documents.
GIDSTAR: Global Infectious Disease Surveillance Tracking and Analysis Repository
We are developing a software platform called GIDSTAR to gather information about diseases occurring around the world and track
them in real-time so as to provide alerts to relevant public health officials well before a widespread outbreak occurs. This requires:
Extracting information about outbreaks from a wide variety of text-rich sources such as DOC and PDF files
Culling news reports from various countries around the world for outbreak information
Geo-referencing outbreak or symptom data and correlating this with land cover data, population density data, drainage data, temperature and weather data, road map data, and other demographic data
Mining this data for possible correlations between outbreak occurrences and such phenomena - this can often serve as an effective predictor of future outbreaks
Providing appropriate notifications and suggested public health actions to take when an outbreak occurs
Our initial focus has been on diarrheal diseases in Kenya. We will shortly be looking at the avian flu.
Faculty Participants: V.S. Subrahmanian
Other Participants: Louise-Kelly Hope (NIH), Diego Reforgiato, Pasquale Capasso
IMPACT: Interactive Maryland Platform for Agents Collaborating Together
The rapid proliferation of data on the Internet
and the ability to harness
both data and Internet capabilities has made agent technology very attractive
for a wide variety of applications. Past definitions of agents never
specified what it means for a piece of software to be considered an agent
and how to allow legacy data sources and software modules to be leveraged
by agents. The IMPACT project rectified these shortcomings. It is the first
effort to describe how to "agentize" legacy pieces of code in both
a formal
way, as well as via a practical application. The basic theory of IMPACT
is described by our book "Heterogeneous Agent Systems" (MIT Press,
2000).
In this book and related papers, we developed algorithms to agentize
legacy software and data sources, gave a formal semantics for such agents,
and developed agents that can reason about other agents, agents that
can reason about time, and agents that can reason in uncertain domains.
We also developed a software platform for IMPACT supporting some of these
capabilities (but not all).
Current work focuses largely on how agents can scale up to large-scale
applications. Our approach is fourfold. First, we develop ways
by which agents merge multiple tasks to minimize computation. Second, we
develop ways to group similar tasks together so that these clusters of similar
tasks can be merged effectively. Third, we have developed methods to distribute
agent workloads to other agents capable of performing tasks or subtasks.
Fourth, we are currently studying ways to clone agents and use a mix of
cloning and data caching to scale agent performance.
Collaborators:
Univ. of Maryland
Univ. of Manchester (UK)
Technische Universität Wien (Austria)
Univ. di Napoli (Italy)
Univ. of Genoa (Italy)
Bar-Ilan University (Israel)
Applications:
Army Research Lab: Integrating logistical and tactical battlefield knowledge
Army Research Lab: Agentizing the Combat Information Processor (CIP) system
SAIC & US Army: Agent based implementation of the Army Flow Model
CoAX: Coalitions Agent Experiment jointly with many partners including Lockheed Martin, BBN, and others
SenseIT: Tasking and monitoring battlefield sensor data with many partners including BBN, BAE Systems, Fantastic Data, and many others
PASTA: Probabilistic and Spatio Temporal Agents
There are numerous applications where there is uncertainty
about where
and when certain events have occurred or will occur. For example, we
may be uncertain about when and where an enemy submarine will launch
an attack. Alternatively, in our everyday lives, there is uncertainty
about when and where there will be traffic jams. In an electricity
market, there is uncertainty about how much electricity will be required
by different utilities at different points in time and space, and what
the price of such electricity will be. We have already developed a formal
theoretical model of a heterogeneous temporal probabilistic (HTP) agent.
HTP agents support temporal probabilistic reasoning over heterogeneous
data and software sources - the first of their kind.
In current work, we are developing the concept of a spatio temporal
probabilistic agent, as well as a prototype implementation of the
HTP paradigm. Key research questions being addressed focus on
scalability.
Collaborators:
Univ. of Maryland
Univ. of Manchester
Bar-Ilan University
Applications:
CoAX: Coalitions Agent Experiment jointly with many partners including Lockheed Martin, BBN, and others
Multi-Agent Security and Survivability
As the sheer number of deployed multiagent applications
increases, there
is a growing need to ensure the security and survivability of both the
individual agents themselves, as well as the network of agents involved.
We are developing a suite of architectures and methods to protect individual
agents from being compromised, as well as methods to protect multiagent
systems from being rendered inoperable due to malicious attacks and/or
systems failures.
Our first contribution was a theoretical study of
how to protect agents from being compromised by external sources. Our
second contribution was how to ensure the survivability of a multiagent
system.
Current work focuses on different architectures to ensure distributed
survivability.
Collaborators:
Univ. of Maryland
Univ. of Manchester
Bar-Ilan University
HOME: Heterogeneous Ontology Management Engine
Though there has been tremendous interest in ontologies
in the Semantic Web community, and tremendous interest in querying and integrating heterogeneous data sources, the two have not come
together. We have been developing methods to associate an ontology with a data source. Our methods include algorithms and tools to infer ontologies from
data sources. Our ontologies are sets of directed acyclic graphs (such a set may, for example, contain a graph representing "isa"
relationships, another graph representing "partof" relationships, and yet another graph representing "affects" relationships). An
ontology extended data source consists of a data source with an associated ontology. We have extended the relational model of data to support querying
ontology extended relational databases, as well as ontology extended XML databases. We have built a prototype system called HOME.
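To illustrate how an ontology attached to a data source can improve recall, the following minimal sketch expands a selection over an "isa" graph before filtering rows. It is an illustration only and does not reproduce the HOME system or its extended relational algebra. The graph contents, the "kind" field, and the function names are hypothetical.

```python
# Hypothetical "isa" graph of an ontology: class -> direct subclasses (a DAG).
ISA_CHILDREN = {
    "publication": ["article", "book"],
    "article": ["journal_article", "conference_paper"],
}


def descendants(graph, root):
    """All classes reachable from root in the DAG, including root itself."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def select_by_class(rows, graph, cls):
    """Select rows whose 'kind' is cls or any subclass of cls."""
    wanted = descendants(graph, cls)
    return [row for row in rows if row["kind"] in wanted]
```

A plain equality test on "article" would miss rows tagged "journal_article" or "conference_paper"; expanding the query through the "isa" graph returns them as well.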
Current efforts focus on (i) extending the syntax of ontologies from
directed acyclic graphs to broader classes, (ii) supporting querying over
ontology extended RDF sources, and (iii) supporting ontology extended
agent interactions.
Participants:
Univ. of Maryland
Hewlett Packard
US Army Research Lab
US Naval Research Lab
Multimedia Knowledge Management
We have done extensive work over the years in the
creation, storage and querying of multimedia data. "Principles of Multimedia
Database Systems" and "Multimedia Database Systems: Issues and Research Directions"
are two of the books we have written.
We developed the first theory of multimedia database systems
in the early/mid
90s. Later, we developed the CHIMP system - one of the first systems to
automatically create multimedia presentations that showed different data
to different users based on context. More recently, we have built models
of databases for audio, video, multimedia presentations and PowerPoint data.
We are currently working on
scaling and summarizing audio video databases, and the
concept of multimedia stories in conjunction
with the archaeological department at Pompeii.
We are studying ways to manage massive amounts of multimedia knowledge
and to draw interesting inferences and aggregate activities
from video data.
Participants:
Univ. of Maryland
US Army Research Lab
Univ. of Naples (Italy)
Univ. of Turin (Italy)
Archaeological Dept. Pompeii
Probabilistic databases
Our lab is one of the pioneers in probabilistic
databases. In the mid-90s, we proposed the ProbView data model, the first probabilistic data model to remove hidden
independence assumptions and to let the end user ask queries that take into account their knowledge of the dependencies between events. Later, we
developed extensions of this model to handle probabilities in temporal databases, as well as object bases containing probabilities. We also developed
probabilistic models of XML databases.
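Why the assumed dependency between events matters can be seen from a short sketch that combines two probability intervals under different assumptions. The formulas are standard probability bounds for a conjunction; whether they coincide exactly with the operators of the ProbView model is not stated here and should be treated as an assumption. The function names are hypothetical, and each interval is a (lower, upper) pair.

```python
def conj_independent(a, b):
    """Events assumed independent: multiply the bounds."""
    return (a[0] * b[0], a[1] * b[1])


def conj_ignorance(a, b):
    """Nothing known about the dependency: Fréchet bounds."""
    return (max(0.0, a[0] + b[0] - 1.0), min(a[1], b[1]))


def conj_positive_correlation(a, b):
    """One event implies the other: the less likely event decides."""
    return (min(a[0], b[0]), min(a[1], b[1]))


def conj_mutually_exclusive(a, b):
    """The events can never occur together."""
    return (0.0, 0.0)
```

For the same two inputs the resulting intervals differ considerably, so a system that silently assumes independence can report probabilities that the user's knowledge of the domain does not support.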
More recently, we have been studying the problem of efficient
computation
of probabilistic aggregates. We are also developing the concept of
spatio temporal probabilistic databases.
Participants:
Univ. of Maryland
Univ. of Rome
Technical Univ. of Vienna, Austria
Probabilistic Logics
Our lab developed the now well-known annotated logics,
which are used extensively to reason about uncertainty of different types. We developed the first probabilistic logic programs and gave them
a syntax and semantics - many others have since implemented probabilistic LP systems. In addition, we were the first to develop temporal probabilistic
logic programs.
We are currently working on planning in uncertain domains, as
well as
probabilistic spatio temporal logics and agents. We are particularly
looking at the use of probabilistic reasoning in conjunction with image
and video analysis.
Participants:
Univ. of Maryland
Nonmonotonic Reasoning
Though we are not currently working extensively on nonmonotonic
logics, our work is some of the best known in this field. Almost all algorithms to compute stable models of logic programs are heavily
influenced by our classical algorithm, which combines branch and bound with well-founded model computation. We also developed the first methods
for nonground computation of stable models.
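For readers unfamiliar with stable models, the definition can be checked directly on small ground programs. The sketch below is a brute-force enumeration based on the Gelfond-Lifschitz reduct; it is exponential in the number of atoms and is not the branch-and-bound and well-founded-model algorithm mentioned above. The rule encoding (head, positive body, negative body) and the function names are assumptions made for illustration.

```python
from itertools import combinations


# A rule is (head, positive_body, negative_body), read as:
#   head :- positive_body, not negative_body.
def least_model(definite_rules):
    """Least model of a negation-free program by fixpoint iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model


def is_stable(rules, candidate):
    """Candidate is stable if it equals the least model of its reduct."""
    reduct = [
        (head, pos)
        for head, pos, neg in rules
        if not (set(neg) & candidate)  # drop rules blocked by the candidate
    ]
    return least_model(reduct) == candidate


def stable_models(rules):
    """Enumerate all stable models by testing every subset of the atoms."""
    atoms = set()
    for head, pos, neg in rules:
        atoms.add(head)
        atoms.update(pos)
        atoms.update(neg)
    models = []
    ordered = sorted(atoms)
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            if is_stable(rules, set(subset)):
                models.append(set(subset))
    return models
```

For the two-rule program "p :- not q" and "q :- not p", encoded as `[("p", [], ["q"]), ("q", [], ["p"])]`, the enumeration yields the two stable models {p} and {q}.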