OCTAVIAN UDREA

Office: A.V. Williams 4468

Office phone: (301)-405-1765

Email: udrea [at] umiacs DOT umd DOT edu


"The opposite of a correct statement is a false statement. The opposite of a profound truth may well be another profound truth." - Niels Bohr (1885-1962)

 

My research is concerned with methods to extract, use and reason about the knowledge hidden within the massive amount of data on the Web. The ultimate purpose is that of achieving intelligent behavior as an emergent property of the highly complex Semantic Web. This poses natural problems both in artificial intelligence (automated reasoning, knowledge integration) and in databases (storage, query optimization, provenance). My thesis, Scalable Ontology Systems (in progress), describes new query, indexing and integration algorithms tailored for knowledge represented in the form of ontologies. In the near future, I am interested in researching cost-based models for query optimization in knowledge bases and combined semantics for deductive, inductive and abductive reasoning in ontologies. The following paragraphs describe my thesis research in more depth.

Knowledge representation and querying. The Resource Description Framework (RDF) is a method for representing knowledge in the form of (subject, property, object) triples. In practice, triples often need additional metadata such as the probability of being true, validity intervals, and so on. To represent this information, traditional RDF uses reification, a technique that makes queries inefficient both to formulate and to answer. To avoid reification in specific problem domains, extensions to RDF for temporal or fuzzy information have been previously defined. I developed a common semantics for extending RDF with any partially ordered information. The framework, called Annotated RDF (aRDF), includes algorithms to answer aRDF queries and maintain aRDF views.
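
As a rough illustration of the idea of annotating triples with partially ordered information, the sketch below attaches an annotation directly to each triple instead of reifying it. It is a toy under stated assumptions and not the aRDF semantics or its query algorithms: the class AnnotatedTriple, the functions leq and query, the sample data, and the choice of "set of years in which the triple holds" (ordered by set inclusion) as the annotation domain are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedTriple:
    subject: str
    prop: str
    obj: str
    # Assumed annotation domain: the set of years in which the triple holds.
    annotation: frozenset


def leq(a: frozenset, b: frozenset) -> bool:
    """Partial order on annotations: a <= b when a is a subset of b."""
    return a <= b


def query(store, subject=None, prop=None, obj=None, at_least=frozenset()):
    """Return triples matching the pattern whose annotation dominates `at_least`.

    A field left as None acts as a wildcard.
    """
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (prop is None or t.prop == prop)
        and (obj is None or t.obj == obj)
        and leq(at_least, t.annotation)
    ]


# Hypothetical sample data.
store = [
    AnnotatedTriple("alice", "worksFor", "umd", frozenset({2004, 2005, 2006})),
    AnnotatedTriple("alice", "worksFor", "acme", frozenset({2007})),
    AnnotatedTriple("bob", "worksFor", "umd", frozenset({2006, 2007})),
]

# Who worked for umd throughout both 2005 and 2006?
result = query(store, prop="worksFor", obj="umd", at_least=frozenset({2005, 2006}))
print([t.subject for t in result])  # prints ['alice']
```

Set inclusion is used here because it is a genuinely partial order: the annotations {2004, 2005, 2006} and {2006, 2007} are incomparable. A probability or fuzzy value would plug into the same query function by swapping leq for numeric comparison.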

Indexing RDF. Existing RDF storage systems such as Jena, Sesame, 3Store, etc. rely on relational indexing to access RDF data more efficiently. However, my results have shown that using relational indexes for RDF causes processing time to increase with the number of constraints in the query. This counterintuitive result is due to the fact that a relational index does not match RDF query access patterns well. I developed a new index structure called GRIN (Graph-based RDF INdex, patent pending) which can be used to answer queries two to three times faster than the leading RDF storage systems with comparable resource expenditure.
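
One way to picture why the cost grows with the number of constraints is the toy evaluator below, which answers a conjunctive query over a triple table using a relational-style index on the property column. Every constraint costs one more index lookup and one more join against the partial results. This is only an illustration under assumed names and data (by_property, answer, and the sample triples are hypothetical); it is neither the experimental setup behind the result above nor an implementation of GRIN.

```python
from collections import defaultdict

# Hypothetical sample data.
triples = [
    ("alice", "worksFor", "umd"),
    ("alice", "advisedBy", "carol"),
    ("bob", "worksFor", "umd"),
    ("carol", "worksFor", "umd"),
]

# Relational-style index: property -> list of (subject, object) rows.
by_property = defaultdict(list)
for s, p, o in triples:
    by_property[p].append((s, o))


def answer(patterns):
    """Evaluate a conjunctive query given as (subject, property, object) patterns.

    Terms starting with '?' are variables; properties are constants.
    Returns (list of variable bindings, number of index lookups performed).
    """
    bindings = [{}]
    lookups = 0
    for s, p, o in patterns:
        rows = by_property.get(p, [])  # one index lookup per constraint
        lookups += 1
        joined = []
        for b in bindings:  # join the rows with the partial results so far
            for row_s, row_o in rows:
                new = dict(b)
                ok = True
                for term, value in ((s, row_s), (o, row_o)):
                    if term.startswith("?"):
                        # Bind the variable, or check it against its earlier binding.
                        if new.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    joined.append(new)
        bindings = joined
    return bindings, lookups


# Who works for umd and is advised by someone who also works for umd?
bindings, lookups = answer([
    ("?x", "worksFor", "umd"),
    ("?x", "advisedBy", "?y"),
    ("?y", "worksFor", "umd"),
])
print(bindings, lookups)  # prints [{'?x': 'alice', '?y': 'carol'}] 3
```

The three constraints form a small connected subgraph, yet the evaluator touches the index three times and joins twice.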

Knowledge integration. One of the principal ways of building larger knowledge bases is that of integrating ontologies from different sources or organizations. Knowledge integration will always be a semi-automated process at best, because differences in vocabulary and interpretation of concepts will always exist. In my research, I started by developing an algorithm called CROW for integrating RDF ontologies under a given set of positive and negative constraints. More recently, I have developed an algorithm called ILIADS (Integrated Learning in Alignment of Data and Schema) that discovers integration constraints for ontologies in the Web Ontology Language (OWL) and produces a consistent integration of two OWL ontologies. ILIADS seamlessly combines statistical and logical inference methods to achieve better recall than leading algorithms such as FCA-Merge and COMA++ at comparable precision.

I have also had the opportunity to apply my interests in artificial intelligence and database research to other areas in Computer Science. My collaborations with researchers in Programming Languages and Computer Vision – resulting in several publications – are outlined below.

Code verification. Together with Prof. Jeffrey Foster and my fellow student Cristian Lumezanu, I developed and implemented a tool called Pistachio that detects vulnerabilities in network protocol implementations using a rule-based specification derived from natural language documents such as Requests for Comments (RFCs). We tested Pistachio on implementations of RCP and OpenSSH, on which it missed very few vulnerabilities (3–9%). We are currently looking into automatically translating warnings from code verification tools into suggested code repairs.

Activity detection in video. Algorithms in Computer Vision such as segmentation, object recognition, trajectory tracking, etc. are able to provide a labeling of a video that gives the state of each object in a scene with a certain level of accuracy. However, to detect complex activities over time such as package transfers or tarmac security violations at an airport, we must be able to define a model of an activity based on the primitive states in the labeling. We should also be able to identify portions of the video that match the activity model. Together with researchers from the University of Naples, Italy, we proposed two methods of defining an activity, based on labeled stochastic automata and on probabilistic Petri nets, and defined algorithms to answer two types of queries:

  1. detect the minimal subsequences of a video that match a given activity definition and
  2. detect the set of activity definitions that best explain a given video sequence.
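
The first query type can be sketched with a heavily simplified stochastic automaton. The brute-force search below is only an illustration under stated assumptions, not the algorithms from this work: the transition table, the frame labels, the threshold, and the functions match_probability and minimal_matches are all hypothetical, and each frame is assumed to carry exactly one label.

```python
def match_probability(transitions, start, end, labels):
    """Probability that `labels` drives the automaton from `start` to `end`.

    `transitions` maps (state, label) -> (next_state, probability).
    Returns 0.0 if a label has no transition or the run does not stop in `end`.
    """
    state, prob = start, 1.0
    for label in labels:
        if (state, label) not in transitions:
            return 0.0
        state, p = transitions[(state, label)]
        prob *= p
    return prob if state == end else 0.0


def minimal_matches(transitions, start, end, video, threshold):
    """Return the (first, last) frame windows that match the activity with
    probability >= threshold and contain no other matching window."""
    n = len(video)
    matches = []
    for i in range(n):
        for j in range(i, n):
            p = match_probability(transitions, start, end, video[i:j + 1])
            if p > 0.0 and p >= threshold:
                matches.append((i, j))
    return [
        (i, j) for (i, j) in matches
        if not any((a, b) != (i, j) and i <= a and b <= j for (a, b) in matches)
    ]


# Hypothetical "package transfer" activity.
package_transfer = {
    ("start", "person_with_package"): ("carrying", 1.0),
    ("carrying", "person_with_package"): ("carrying", 0.6),
    ("carrying", "handoff"): ("done", 0.4),
}

# Hypothetical per-frame labeling of a video.
video = ["empty", "person_with_package", "person_with_package", "handoff", "empty"]

print(minimal_matches(package_transfer, "start", "done", video, threshold=0.2))
# prints [(2, 3)]
```

Frames 1 to 3 also match the activity, but that window contains the shorter match over frames 2 to 3, so only the latter is reported as minimal. The second query type could reuse match_probability by scoring several activity definitions against the same sequence and keeping the highest-scoring ones.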

Furthermore, together with researchers from the University of Calabria, Italy, we have developed an index structure called MAGIC that can track multiple activity definitions simultaneously and answer the two types of queries above very quickly.

 

Last updated: January 6, 2008