Research Projects

Ontology Management


Though there has been tremendous interest in ontologies in the Semantic Web community, and tremendous interest in querying and integrating heterogeneous data sources, the two have not come together. We started by developing methods to improve the quality of answers to database queries (in particular improving recall) by using ontologies. We subsequently developed sophisticated algorithms to automatically infer ontologies. Our specific contributions fall into four categories:

1. Ontology-based Querying:

a. The notion of an ontology extended data source which consists of a data source with an associated ontology. We have extended the relational model of data to support querying ontology extended relational databases, as well as ontology extended XML databases. We have built a prototype system called HOME. This work was reported in a paper at the IEEE Intl. Conf. on Information Reuse and Integration (IRI-2003).

b. We have developed a framework called TOSS that answers queries to XML data sources by associating ontologies with them and using notions of similarity. We show how a notion of similarity can lead to the notion of a similarity extended ontology. We show how to develop an XML algebra taking ontologies and similarities into account – update operations are also developed. We demonstrate experimentally that TOSS provides higher quality answers than ordinary XML query engines. This work appears in the Proceedings of the 2004 ACM SIGMOD Conference.

c. We have developed the concept of a probabilistic ontology. A probabilistic ontology allows a given class to be decomposed in one or more ways. For a given decomposition of a class, a conditional probability statement specifies the probability of an arbitrary member of a superclass belonging to a given subclass. We show that one can associate, with any given object, a probability that that object should be in the answer to a given query. The object can be returned to the user if that probability exceeds a given threshold. We developed extensive algorithms to answer such queries efficiently and showed that when the threshold is around 75%, the answers to queries improve dramatically. This work will be reported as a keynote lecture at the 2005 Intl. Conf. on Ontologies, Databases, and Semantics, Cyprus, Oct/Nov 2005.
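The thresholding idea in item 1c can be illustrated with a minimal sketch. It assumes a tree-shaped class hierarchy in which each subclass edge carries a conditional probability, and it multiplies those probabilities along the path from the class an object is known to belong to down to the queried class. The class names, probabilities, and function names are hypothetical, and this is not the query algorithm from the work itself.

```python
from typing import Dict, List, Tuple

# child class -> (parent class, P(member of parent belongs to child)).
# All class names and probabilities are invented for illustration.
PARENT: Dict[str, Tuple[str, float]] = {
    "car": ("vehicle", 0.8),
    "truck": ("vehicle", 0.2),
    "sedan": ("car", 0.9),
    "coupe": ("car", 0.1),
}


def is_ancestor_or_self(ancestor: str, cls: str) -> bool:
    """True if `ancestor` equals `cls` or lies above it in the hierarchy."""
    while True:
        if cls == ancestor:
            return True
        if cls not in PARENT:
            return False
        cls = PARENT[cls][0]


def membership_probability(known_class: str, query_class: str) -> float:
    """P(object is in query_class | object is known to be in known_class)."""
    if is_ancestor_or_self(query_class, known_class):
        return 1.0  # every sedan is certainly a car
    prob = 1.0
    current = query_class
    # Walk up from the query class, multiplying conditional probabilities.
    while current != known_class:
        if current not in PARENT:
            return 0.0  # known_class is not above query_class
        parent, p = PARENT[current]
        prob *= p
        current = parent
    return prob


def answer_query(objects: Dict[str, str], query_class: str,
                 threshold: float) -> List[str]:
    """Return objects whose membership probability exceeds the threshold."""
    return [name for name, cls in objects.items()
            if membership_probability(cls, query_class) > threshold]


OBJECTS = {"o1": "sedan", "o2": "car", "o3": "vehicle", "o4": "truck"}
print(answer_query(OBJECTS, "sedan", 0.75))  # ['o1', 'o2']
```

Here `o3` is only known to be a vehicle, so its probability of being a sedan is 0.8 × 0.9 = 0.72, which falls below the 0.75 threshold.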

2. Ontology Inference

a. We have developed, as part of our STORY system, a multi-lingual ontology extractor software module that enables us to automatically extract RDF-ontologies from any collection of text documents in one of three languages – English, Spanish, and Italian. This work was recognized in ComputerWorld's Sep. 12, 2005 edition with an honorable mention in their 2005 Horizon Awards for innovative software.

b. We have developed a probabilistic ontology extraction and refinement engine that can extract initial probabilistic ontologies from a mix of relational data sources and from existing classification hierarchies. Our algorithms allow a user to provide corrective feedback by using queries. We show that on certain document collections (such as the upper management Enron email archive, which has about 250K emails) we can rapidly iterate towards an ontology that a user deems acceptable.

3. Ontology Integration

a. We have developed a sophisticated algorithm called CROW to integrate RDF ontologies together in the presence of a set of interoperation constraints that provide relationships between terms in the two ontologies. We also developed the ICI algorithm to infer such interoperation constraints directly. Our experiments on a set of ontologies from sources such as DAML and Ontobroker show that our algorithms are fast and accurate.

b. We are developing the MOO algorithm (Merging OWL Ontologies) for integrating multiple OWL ontologies together. This work is still ongoing.

4. RDF-Databases

a. We have developed the concept of an RDF-database and developed sophisticated view maintenance algorithms that take the graph structure of RDF data into account in order to answer queries very efficiently. We have developed the notion of an aggregate operation for RDF data. Our work is reported in the 2005 IEEE Intl. Conf. on Data Engineering.

UMD Participants: Yu Deng, Amy Sliva, Octavian Udrea, V.S. Subrahmanian

Collaborators:

Univ. of Napoli, Federico II, Italy – Piero Bonatti, Pasquale Capasso, Massimiliano Albanese, Antonio Picariello.

Univ. Simon Bolivar, Caracas, Venezuela – Edna Ruckhaus, Maria Vidal.

Bar-Ilan University, Israel – Sarit Kraus.

Hong Kong Polytechnic – Edward Hung.

US Army Research Lab

IBM/Almaden

Hewlett Packard

 

RamoS: Reasoning about moving objectS


We have been developing a theoretical foundation for reasoning about large collections of moving objects – such objects could range from airplanes to cars to cell phones to birds. We are interested in tracking such moving objects, predicting where they are going and when they may get there, and in querying and reasoning about such systems. Our accomplishments to date include the following:

1. We have defined the concept of a go-theory that allows us to reason about large collections of motion or route plans. A go-theory is a set of statements of the form “object O will go from location L, leaving in the time interval [S1,S2], and will go to the location L’, arriving in the time interval [E1,E2], traveling at an average velocity in the interval [V1,V2].” Go-theories mirror real life where we do not have 100% certainty about when we will leave a given location, when we expect to reach a given destination, and how fast we expect to travel. We have developed a formal logic of go-theories and developed very efficient algorithms to check consistency and to answer queries such as:

a. Find all objects that are guaranteed to be within a given region at time (or time interval) T.

b. Find all objects that are guaranteed to be within a given distance of another object at time T (or a time interval).

c. Find the nearest neighbor of object O at time (or time interval) T.

Our initial work on go-theories was published in the 2004 Intl. Conf. on Knowledge Representation and Reasoning (KR-04), Whistler, Canada, June 04.

* We subsequently defined a Motion Closed World Assumption (MCWA) that extends go-theories so that we can additionally assume that objects do not make arbitrary movements not explicitly described in the go-theory. We developed algorithms to efficiently answer queries under the motion closed world assumption. This work appears in IJCAI-2005.

2. We have also defined a notion of a “far” query that guarantees that within a given time window, two objects will always be sufficiently far apart. This is important in applications such as air-traffic control where it is important to maintain separation constraints. We developed very efficient algorithms to maintain far-queries and reported them in a paper in IJCAI-2005.

3. We are developing a suite of algorithms to deconflict motion plans. This could occur, for example, when two planes violate a separation constraint. We are looking at methods to identify the causes of conflict and find algorithms to remove such conflicts, while still trying to achieve the goals of the plans involved.
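Two of the ideas above can be sketched in a few lines under strong simplifying assumptions: objects move in a straight line in the plane at constant speed, a single go statement is checked in isolation, and the "far" check ignores uncertainty in the plans. The names `GoAtom`, `is_feasible`, and `always_far` are hypothetical, and this is not the logic or the algorithms from the papers cited above.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class GoAtom:
    obj: str
    origin: Point
    dest: Point
    depart: Tuple[float, float]  # [S1, S2]
    arrive: Tuple[float, float]  # [E1, E2]
    speed: Tuple[float, float]   # [V1, V2], assumed 0 < V1 <= V2


def is_feasible(atom: GoAtom) -> bool:
    """Can some departure time, arrival time, and speed satisfy the atom?"""
    d = math.dist(atom.origin, atom.dest)
    s1, s2 = atom.depart
    e1, e2 = atom.arrive
    v1, v2 = atom.speed
    # Travel durations permitted by the speed bounds.
    t_min, t_max = d / v2, d / v1
    # Travel durations permitted by the departure and arrival windows.
    w_min, w_max = e1 - s2, e2 - s1
    return max(t_min, w_min) <= min(t_max, w_max)


def always_far(p_a: Point, v_a: Point, p_b: Point, v_b: Point,
               window: Tuple[float, float], min_sep: float) -> bool:
    """True if two constant-velocity objects stay at least min_sep apart.

    p_a and p_b are the positions at time 0; v_a and v_b are velocities.
    """
    t0, t1 = window
    dpx, dpy = p_a[0] - p_b[0], p_a[1] - p_b[1]
    dvx, dvy = v_a[0] - v_b[0], v_a[1] - v_b[1]
    rel_speed_sq = dvx * dvx + dvy * dvy
    if rel_speed_sq == 0:
        t_star = t0  # separation is constant over the window
    else:
        # Unconstrained time of closest approach, clamped to the window.
        t_star = min(max(-(dpx * dvx + dpy * dvy) / rel_speed_sq, t0), t1)
    closest = math.hypot(dpx + dvx * t_star, dpy + dvy * t_star)
    return closest >= min_sep


atom = GoAtom("plane1", (0.0, 0.0), (100.0, 0.0), (0, 10), (20, 30), (4, 6))
print(is_feasible(atom))  # True
print(always_far((0, 0), (1, 0), (10, 5), (-1, 0), (0, 10), 6.0))  # False
```

Because the squared separation of two constant-velocity objects is a convex quadratic in time, clamping the unconstrained minimizer to the window gives the closest approach within that window.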

Applications:

Jointly with the US Navy, Lockheed Martin, BBN, and several other organizations, we have built an application that uses sensor readings to predict where and when an enemy submarine will be in the future. Jointly with the US Army and BAE Systems, we have developed a system to track enemy vehicles moving across a road network, predict where and when they will be in the future, and how best to deploy coalition assets to neutralize the threat.

SPONSORS AND COLLABORATORS

US Army

BAE Systems

NRL

Lockheed Martin

BBN

 

Reasoning about Cultures


There are numerous applications where we wish to reason about how a political organization or a tribal group or an economic group might behave under certain circumstances. We are developing the theory and algorithms required to understand the context within which such entities function and to develop models of behavior for such entities.

We have developed a multi-lingual opinion analysis system called OASYS, with which we can extract the intensity of opinion that people might have on a given topic.

We have developed a stochastic opponent modeling language called SOMA that can be used to develop behavioral models of third parties in given situations.

We have developed an ontology extractor that can automatically extract RDF and extended RDF ontologies from multiple heterogeneous data sources including free text and relational data sources. We have applied this extractor to extract information about the Basques in Spain, the Kikuyus in Kenya, and various tribes along the Pakistan/Afghanistan borderlands.

We are developing algorithms to automatically extract stochastic opponent models from a body of data (news events, etc).

We are developing partial-information game-theoretic models of opponent behavior to understand how a given set of actions might cause a response and to understand what actions to take in order to elicit a desired response with high probability.

Faculty Involved:

V.S. Subrahmanian

Dana Nau

Jon Wilkenfeld

John Steinbruner

Students Involved:

Massimiliano Albanese

Gerardo Simari

Amy Sliva

 

Adversarial Agents


   

STORY


AIM: Filter huge amounts of data to deliver short, succinct, personalized stories to multiple users using diverse devices.

GOALS: Extract stories about People, Places, Organizations, Events from multiple heterogeneous data sources: Text documents, Web sources, Relational databases, Object databases, Flat files, Proprietary formats; Automatically customize stories to fit user needs; Deliver stories across multiple access devices: Wireless PDA, Laptops, Cell phones

Key applications: Stories about Greek characters (with Pompeii), Stories about Pakistani nuclear scientists (with US Army), Stories about tribes on Pakistan/Afghan border (with US Army)

RDF extraction: RDF is a World Wide Web Consortium ontology standard. The system can infer RDF extraction rules from examples and then apply these rules to extract (Entity, Attribute, Value) triples from documents, relational DBs, XML sources, etc. Time-stamped values and set-valued types are also permitted.

STORY algorithms: The goal is to have a story of size K (set by the user) with high information content and good prose quality. This is defined as a multi-objective optimization problem; creating an optimal story is NP-hard. Story evaluation is a linear combination of fact priority, story continuity, and repetition.

Several algorithms: OptSTORY (the optimal story), DynStory (dynamic programming), GenStory (genetic programming).
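To make the evaluation function concrete, here is a minimal sketch that scores a story as a linear combination of fact priority, continuity, and repetition, and then builds a story of size K greedily. The greedy heuristic is a stand-in chosen for brevity; it is not OptSTORY, DynStory, or GenStory. The `Fact` fields, the weights, and the specific definitions of continuity (adjacent facts sharing a topic) and repetition (repeated attributes) are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Fact:
    attribute: str
    value: str
    priority: float
    topic: str  # hypothetical field used to measure continuity


def story_score(story: List[Fact], w_priority: float = 1.0,
                w_continuity: float = 0.5, w_repetition: float = 1.0) -> float:
    """Linear combination of priority, continuity, and a repetition penalty."""
    priority = sum(f.priority for f in story)
    # Continuity: adjacent facts that stay on the same topic.
    continuity = sum(1 for a, b in zip(story, story[1:]) if a.topic == b.topic)
    # Repetition: facts whose attribute was already covered.
    repetition = len(story) - len({f.attribute for f in story})
    return (w_priority * priority + w_continuity * continuity
            - w_repetition * repetition)


def greedy_story(facts: Sequence[Fact], k: int) -> List[Fact]:
    """Append, one at a time, the fact that yields the best-scoring story."""
    story: List[Fact] = []
    remaining = list(facts)
    while remaining and len(story) < k:
        best = max(remaining, key=lambda f: story_score(story + [f]))
        story.append(best)
        remaining.remove(best)
    return story


FACTS = [
    Fact("born", "Athens", 3.0, "life"),
    Fact("born", "Attica", 2.0, "life"),       # repeats the "born" attribute
    Fact("occupation", "sculptor", 2.0, "career"),
    Fact("died", "Rome", 2.5, "life"),
]
print([f.value for f in greedy_story(FACTS, 3)])
# ['Athens', 'Rome', 'sculptor']
```

In this toy run the second "born" fact is skipped because its repetition penalty outweighs the continuity bonus it would earn.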

Human subjects found that our system generates stories with highly valuable facts and that prose quality is acceptable.

   

OASYS: An Opinion Analysis System


There are numerous applications where we wish to know the intensity of opinion on a given topic expressed in a collection of documents. For instance, a company might wish to know what bloggers have to say about a given product. Alternatively, the US military might wish to know the intensity of opinion in the Pakistani press about the Abu Ghraib scandal.

Our OASYS system can look at a collection of documents (in multiple languages) and assign an "intensity" of opinion of a given document on a given topic. The intensity of opinion of document d with regard to topic t depends not only on the terms used in the document, but also on the perceptions of the reader. As a consequence, we have developed statistical algorithms conditioned by human responses/input to create an intensity scoring model. Our current (ongoing) prototype shows how the system behaves on English, Spanish and Italian documents.
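One very simple way to condition a scoring model on human input is sketched below. It assumes each training document carries a human-assigned intensity in [0, 1], gives every term the mean intensity of the training documents that contain it, and scores a new document as the mean weight of its known terms. This averaging scheme, the function names, and the sample data are all hypothetical; it is not the statistical model used in OASYS.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def learn_term_weights(scored_docs: List[Tuple[str, float]]) -> Dict[str, float]:
    """Give each term the mean human score of the documents containing it."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for text, human_score in scored_docs:
        for term in set(text.lower().split()):
            totals[term] += human_score
            counts[term] += 1
    return {term: totals[term] / counts[term] for term in totals}


def intensity(text: str, weights: Dict[str, float]) -> float:
    """Score a document as the mean weight of its known terms (0 if none)."""
    known = [weights[t] for t in text.lower().split() if t in weights]
    return sum(known) / len(known) if known else 0.0


# Invented training data: (document text, human-assigned intensity).
TRAINING = [
    ("terrible awful product", 0.9),
    ("fine product", 0.3),
]
WEIGHTS = learn_term_weights(TRAINING)
print(round(intensity("awful product", WEIGHTS), 2))  # 0.75
```

A realistic model would also need topic detection and language-specific processing, both of which this sketch leaves out.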

 

GIDSTAR: Global Infectious Disease Surveillance Tracking and Analysis Repository


We are developing a software platform called GIDSTAR to gather information about diseases occurring around the world and track them in real-time so as to provide alerts to relevant public health officials well before a wide-spread outbreak occurs. This requires:

  • Extracting information about outbreaks from a wide variety of text-rich sources such as DOC and PDF files
  • Culling news reports from various countries around the world for outbreak information
  • Geo-referencing outbreak or symptom data and correlating this with land cover data, population density data, drainage data, temperature and weather data, road map data, and other demographic data
  • Mining this data for possible correlations between outbreak occurrences and such phenomena - this can often serve as an effective predictor of future outbreaks.
  • Providing appropriate notifications and suggested public health actions to take when an outbreak occurs.

Our initial focus has been on diarrheal diseases in Kenya. We will shortly be looking at the avian flu.

Faculty Participants:

  • V.S. Subrahmanian

Other Participants:

  • Louise-Kelly Hope (NIH)
  • Diego Reforgiato
  • Pasquale Capasso

    IMPACT: Interactive Maryland Platform for Agents Collaborating Together


    The rapid proliferation of data on the Internet and the ability to harness both data and Internet capabilities has made agent technology very attractive for a wide variety of applications. Past definitions of agents never specified what it means for a piece of software to be considered an agent and how to allow legacy data sources and software modules to be leveraged by agents. The IMPACT project rectified these shortcomings. It is the first effort to describe how to "agentize" legacy pieces of code both in a formal way and via a practical application. The basic theory of IMPACT is described in our book "Heterogeneous Agent Systems" (MIT Press, 2000). In this book and related papers, we developed algorithms to agentize legacy software and data sources, gave a formal semantics for such agents, and developed agents that can reason about other agents, agents that can reason about time, and agents that can reason in uncertain domains. We also developed a software platform for IMPACT supporting some of these capabilities (but not all).

    Current work focuses largely on how agents can scale up to large-scale applications. Our approach to this is fourfold. First, we develop ways by which agents merge multiple tasks to minimize computation. Second, we develop ways to group similar tasks together so that these clusters of similar tasks can be merged effectively. Third, we have developed methods to distribute agent workloads to other agents capable of performing tasks or subtasks. Fourth, we are currently studying ways to clone agents and use a mix of cloning and data caching to scale agent performance.

    Collaborators:
    Univ. of Maryland
    Univ. of Manchester (UK)
    Technische Universität Wien (Austria)
    Univ. di Napoli (Italy)
    Univ. of Genoa (Italy)
    Bar-Ilan University (Israel)

    Applications:
    Army Research Lab: Integrating logistical and tactical battlefield knowledge
    Army Research Lab: Agentizing the Combat Information Processor (CIP) system
    SAIC & US Army: Agent based implementation of the Army Flow Model
    CoAX: Coalitions Agent Experiment jointly with many partners including Lockheed Martin, BBN, and others
    SenseIT: Tasking and monitoring battlefield sensor data with many partners including BBN, BAE Systems, Fantastic Data, and many others

     

    PASTA: Probabilistic and Spatio Temporal Agents


    There are numerous applications where there is uncertainty about where and when certain events have occurred or will occur. For example, we may be uncertain about when and where an enemy submarine will launch an attack. Alternatively, in our everyday lives, there is uncertainty about when and where there will be traffic jams. In an electricity market, there is uncertainty about how much electricity will be required by different utilities at different points in time and space, and what the price of such electricity will be. We have already developed a formal theoretical model of a heterogeneous temporal probabilistic (HTP) agent. HTP agents support temporal probabilistic reasoning over heterogeneous data and software sources - the first of their kind.

    In current work, we are developing the concept of a spatio temporal probabilistic agent, as well as a prototype implementation of the HTP paradigm. Key research questions being addressed focus on scalability.

    Collaborators:
    Univ. of Maryland
    Univ. of Manchester
    Bar-Ilan University

    Applications:
    CoAX: Coalitions Agent Experiment jointly with many partners including Lockheed Martin, BBN, and others

     

    Multi-Agent Security and Survivability


    As the sheer number of deployed multiagent applications increases, there is a growing need to ensure the security and survivability of both the individual agents themselves and the network of agents involved. We are developing a suite of architectures and methods to protect individual agents from being compromised, as well as methods to protect multiagent systems from being rendered inoperable due to malicious attacks and/or systems failures.

    Our first contribution was a theoretical study of how to protect agents from being compromised by external sources. Our second contribution was how to ensure the survivability of a multiagent system.

    Current work focuses on different architectures to ensure distributed survivability.

    Collaborators:
    Univ. of Maryland
    Univ. of Manchester
    Bar-Ilan University


    HOME: Heterogeneous Ontology Management Engine


    Though there has been tremendous interest in ontologies in the Semantic Web community, and tremendous interest in querying and integrating heterogeneous data sources, the two have not come together. We have been developing methods to associate an ontology with a data source. Our methods include algorithms and tools to infer ontologies from data sources. Our ontologies are sets of directed acyclic graphs (such a set may, for example, contain a graph representing "isa" relationships, another graph representing "partof" relationships, and yet another graph representing "affects" relationships). An ontology extended data source consists of a data source with an associated ontology. We have extended the relational model of data to support querying ontology extended relational databases, as well as ontology extended XML databases. We have built a prototype system called HOME.
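The following is a minimal sketch of how an ontology can raise recall for a selection query, assuming the ontology is stored as one parent map per relationship and that a condition such as "type isa vehicle" should match every term below "vehicle" in that graph. The data, the `ONTOLOGY` layout, and the `select` function are hypothetical and are not HOME's actual data model or query language.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # term -> its direct parents in one relationship

# One directed acyclic graph per relationship; contents are invented.
ONTOLOGY: Dict[str, Graph] = {
    "isa": {"sedan": ["car"], "car": ["vehicle"], "truck": ["vehicle"]},
    "partof": {"engine": ["car"], "piston": ["engine"]},
}


def descendants(graph: Graph, term: str) -> Set[str]:
    """All terms that reach `term` through the relationship, plus `term`."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parents in graph.items():
            if child not in found and any(p in found for p in parents):
                found.add(child)
                changed = True
    return found


def select(rows: List[dict], column: str, relation: str, term: str) -> List[dict]:
    """Keep rows whose column value is related to `term` in the ontology."""
    wanted = descendants(ONTOLOGY[relation], term)
    return [row for row in rows if row[column] in wanted]


ROWS = [
    {"id": 1, "type": "sedan"},
    {"id": 2, "type": "truck"},
    {"id": 3, "type": "boat"},
]
print([r["id"] for r in select(ROWS, "type", "isa", "vehicle")])  # [1, 2]
```

A plain equality test on `type = "vehicle"` would return no rows here, which is the recall gap the ontology closes.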

    Current efforts focus on (i) extending the syntax of ontologies from directed acyclic graphs to broader classes, (ii) supporting querying over ontology extended RDF sources, as well as (iii) supporting ontology extended agent interactions.

    Participants:
    Univ. of Maryland
    Hewlett Packard
    US Army Research Lab
    US Naval Research Lab

     

    Multimedia Knowledge Management


    We have done extensive work over the years in the creation, storage and querying of multimedia data. Principles of Multimedia Database Systems and Multimedia Database Systems: Issues and Research Directions are two of the books we have written.

    We developed the first theory of multimedia database systems in the early/mid 90s. Later, we developed the CHIMP system - one of the first systems to automatically create multimedia presentations that showed different data to different users based on context. More recently, we have built models of databases for audio, video, multimedia presentations and Powerpoint data.

    We are currently working on scaling and summarizing audio video databases, and the concept of multimedia stories in conjunction with the archaeological department at Pompeii. We are studying ways to manage massive amounts of multimedia knowledge and to draw interesting inferences and aggregate activities from video data.

    Participants:
    Univ. of Maryland
    US Army Research Lab
    Univ. of Naples (Italy)
    Univ. of Turin (Italy)
    Archaeological Dept. Pompeii

     

    Probabilistic databases


    Our lab is one of the pioneers in probabilistic databases. In the mid-90s, we proposed the ProbView data model - this is the first probabilistic data model that got rid of hidden independence assumptions and allowed the end user to ask queries that take into account the user's knowledge of the dependencies between events. Later, we developed extensions of this model to handle probabilities in temporal databases, as well as object bases containing probabilities. We also developed probabilistic models of XML databases.
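Why the independence assumption matters can be seen from a small sketch that computes the probability of a conjunction of two events under different stated dependencies. The bounds used are standard probability facts; the strategy names and the `conjunction` function are illustrative assumptions and are not claimed to match ProbView's own operators.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound) on a probability


def conjunction(p: float, q: float, dependency: str) -> Interval:
    """Bounds on P(A and B) given P(A) = p, P(B) = q and a stated dependency."""
    if dependency == "independence":
        return (p * q, p * q)
    if dependency == "ignorance":
        # Nothing is known about how A and B relate (Frechet bounds).
        return (max(0.0, p + q - 1.0), min(p, q))
    if dependency == "positive_correlation":
        # One event implies the other, so the overlap is as large as possible.
        return (min(p, q), min(p, q))
    if dependency == "mutual_exclusion":
        return (0.0, 0.0)
    raise ValueError(f"unknown dependency: {dependency}")


for name in ("independence", "ignorance", "positive_correlation",
             "mutual_exclusion"):
    print(name, conjunction(0.75, 0.5, name))
```

With P(A) = 0.75 and P(B) = 0.5, independence gives exactly 0.375, while ignorance only allows the interval [0.25, 0.5]; silently assuming independence would hide that uncertainty.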

    More recently, we have been studying the problem of efficient computation of probabilistic aggregates. We are also developing the concept of spatio temporal probabilistic databases.

    Participants:
    Univ. of Maryland
    Univ. of Rome
    Technical Univ. of Vienna, Austria

     

    Probabilistic Logics


    Our lab developed the now well known annotated logics used extensively nowadays to reason about uncertainty of different types. We developed the first probabilistic logic programs and gave them a syntax and semantics - many others have since implemented probabilistic LP systems. In addition, we were the first to develop temporal probabilistic logic programs.

    We are currently working on planning in uncertain domains, as well as probabilistic spatio temporal logics and agents. We are particularly looking at the use of probabilistic reasoning in conjunction with image and video analysis.

    Participants
    Univ. of Maryland

     

    Nonmonotonic Reasoning


    Though we are not currently working extensively in nonmonotonic logics, our work is some of the best known in this field. Almost all algorithms to compute stable models of logic programs are heavily influenced by our classical algorithm, which is based on a mix of branch and bound and well-founded model computation. We also developed the first methods for nonground computation of stable models.
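For readers unfamiliar with stable models, the sketch below checks the definition directly on a small ground program: a candidate set of atoms is stable if it equals the least model of the program's Gelfond-Lifschitz reduct with respect to that set. It enumerates every subset of atoms, so it is a brute-force illustration only and not the branch-and-bound algorithm mentioned above. The rule encoding and function names are assumptions.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

# A ground rule: (head, positive body atoms, negated body atoms).
Rule = Tuple[str, FrozenSet[str], FrozenSet[str]]


def least_model(positive_rules: List[Tuple[str, FrozenSet[str]]]) -> Set[str]:
    """Least model of a negation-free program, by fixpoint iteration."""
    model: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, body in positive_rules:
            if head not in model and body <= model:
                model.add(head)
                changed = True
    return model


def is_stable(program: List[Rule], candidate: Set[str]) -> bool:
    """Check the candidate against the least model of its reduct."""
    # Reduct: drop rules blocked by the candidate, then drop negated atoms.
    reduct = [(head, pos) for head, pos, neg in program if not (neg & candidate)]
    return least_model(reduct) == candidate


def stable_models(program: List[Rule]) -> List[Set[str]]:
    """Enumerate all subsets of atoms and keep the stable ones."""
    atoms = sorted({h for h, _, _ in program}
                   | {a for _, pos, neg in program for a in pos | neg})
    return [set(subset)
            for size in range(len(atoms) + 1)
            for subset in combinations(atoms, size)
            if is_stable(program, set(subset))]


# p :- not q.    q :- not p.
PROGRAM: List[Rule] = [
    ("p", frozenset(), frozenset({"q"})),
    ("q", frozenset(), frozenset({"p"})),
]
print(stable_models(PROGRAM))  # [{'p'}, {'q'}]
```

The enumeration is exponential in the number of atoms, which is exactly the cost that search-based algorithms aim to avoid.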