| 
|
|
|
|
Office: A.V. Williams 4468
Office phone: (301)-405-1765
Email: udrea [at] umiacs DOT umd DOT edu
|
|
|
| "The opposite of a correct statement is a false statement. The opposite of a profound truth may well be another profound truth." - Niels Bohr (1885-1962) |
Conference Papers
| 1. |
Octavian Udrea, Yu Deng, Edna Ruckhaus, and V. S. Subrahmanian. A Graph Theoretical Foundation for Integrating RDF Ontologies. In AAAI, pages 1442–1450, 2005.
RDF ontologies are rapidly increasing in number. We study
the problem of integrating two RDF ontologies under a given set H of Horn clauses that specify semantic relationships between terms in the ontology, as well as under a given set of negative constraints. We formally define the notion of a witness to the integrability of two RDF ontologies under such constraints. A witness represents a way of integrating the ontologies together. We define a minimal witnesses and provide the polynomial CROW (Computing RDF Ontology Witness) algorithm to find a witness. We report on the performance of CROW both on DAML, SchemaWeb and Onto-Broker ontologies as well as on synthetically generated data. The experiments show that CROW works very well on reallife ontologies and scales to massive ontologies. |
|
| 2. |
Octavian Udrea, Yu Deng, Edward Hung, and V. S. Subrahmanian. Probabilistic Ontologies and Relational Databases. In ODBASE, pages 1–17, 2005.
The relational algebra and calculus do not take the semantics
of terms into account when answering queries. As a consequence, not all tuples that should be returned in response to a query are always returned, leading to low recall. In this paper, we propose the novel notion of a constrained probabilistic ontology (CPO). We developed the concept of a CPO-enhanced relation in which each attribute of a relation has an associated CPO. These CPOs describe relationships between terms occurring in the domain of that attribute. We show that the relational algebra can be extended to handle CPO-enhanced relations. This allows queries to yield sets of tuples, each of which has a probability of being correct. |
|
| 3. |
Octavian Udrea, Diego Reforgiato Recupero, and V. S. Subrahmanian. Annotated RDF. In ESWC, pages 487–501, 2006.
There are numerous extensions of RDF that support temporal
reasoning, reasoning about pedigree, reasoning about uncertainty, and so on. In this paper, we present Annotated RDF (or aRDF for short) in which RDF triples are annotated by members of a partially ordered set (with bottom element) that can be selected in any way desired by the user. We present a formal declarative semantics (model theory) for annotated RDF and develop algorithms to check consistency of aRDF theories and to answer queries to aRDF theories. We show that annotated RDF captures versions of all the forms of reasoning mentioned above within a single unied framework. We develop a prototype aRDF implementation and show that our algorithms work very fast indeed - in fact, in just a matter of seconds for theories with over 100,000 nodes. |
|
| 4. |
Octavian Udrea, Cristian Lumezanu, and Jeffrey S. Foster. Rule-based Static Analysis of Network Protocol Implementations. In USENIX-SS’06, pages 14–28, 2006.
Today's software systems communicate over the Internet using standard protocols that have been heavily scrutinized, providing some assurance of resistance to malicious attacks and general robustness. However, the software that implements those protocols may still contain
mistakes, and an incorrect implementation could lead to vulnerabilities even in the most well-understood protocol. The goal of this work is to close this gap by introducing a new technique for checking that a C implementation of a protocol matches its description in an RFC or similar standards document. We present a static (compile-time) source code analysis tool called Pistachio that checks C code against a rule-based specication of its behavior. Rules describe what should happen during each round of communication, and can be used to enforce constraints on ordering of operations and on data values. Our analysis is not guaranteed sound due to some heuristic
approximations it makes, but has a low false negative rate in practice when compared to known bug reports. We have applied Pistachio to implementations of SSH and RCP, and our system was able to nd many bugs, including security vulnerabilities, that we conrmed by
hand and checked against each project's bug databases. |
|
| 5. |
Octavian Udrea, V. S. Subrahmanian, and Zoran Majkic. Probabilistic RDF. In IEEE IRI, pages 172–177, 2006.
RDF is rapidly being adopted around the world as
a paradigm for knowledge representation. However, we are not aware of any version of RDF that can express probabilistic knowledge. In this paper, we develop a framework called Probabilistic RDF (pRDF for short) { we provide a syntax as well as a model theoretic and
xpoint semantics for acyclic pRDF theories. We then provide algorithms to eciently answer queries posed to pRDF ontologies. We have developed a prototype implementation that shows that our algorithms work very well in practice. |
|
| 6. |
Massimiliano Albanese, Vincenzo Moscato, Antonio Picariello, V. S. Subrahmanian, and Octavian Udrea. Detecting Stochastically Scheduled Activities in Video. In IJCAI, pages 1802–1807, 2007.
The ability to automatically detect activities in
video is of increasing importance in applications such as bank security, airport tarmac security, baggage area security and building site surveillance. We present a stochastic activity model composed of atomic actions which are directly observable through image understanding primitives. We focus on answering two types of questions: (i) what are the minimal sub-videos in which a given action is
identified with probability above a certain threshold and (ii) for a given video, can we decide which activity from a given set most likely occurred? We provide the MPS algorithm for the first problem, as well as two different algorithms (naiveMPA and MPA) to solve the second. Our experimental results on a dataset consisting of staged bank robbery videos (described in [Vu et al., 2003]) show that
our algorithms are both fast and provide high quality results when compared to human reviewers. |
|
| 7. |
Octavian Udrea, Lise Getoor, and Renée J. Miller. Leveraging Data and Structure in Ontology Integration. In SIGMOD, pages 449–460, 2007.
There is a great deal of research on ontology integration
which makes use of rich logical constraints to reason about the structural and logical alignment of ontologies. There is also considerable work on matching data instances from heterogeneous schema or ontologies. However, little work exploits the fact that ontologies include both data and structure. We aim to close this gap by presenting a new algorithm (ILIADS) that tightly integrates both data matching and logical reasoning to achieve better matching of ontologies. We evaluate our algorithm on a set of 30 pairs of OWL Lite ontologies with the schema and data matchings found by human reviewers. We compare against two systems -
the ontology matching tool FCA-merge [28] and the schema matching tool COMA++ [1]. ILIADS shows an average improvement of 25% in quality over FCA-merge and a 11% improvement in recall over COMA++. |
|
| 8. |
Octavian Udrea, Andrea Pugliese, and V. S. Subrahmanian. GRIN: A Graph Based RDF Index. In AAAI, pages 1465–1470, 2007.
RDF (Resource Description Framework) is now a widely
used World Wide Web Consortium standard. However, methods to index large volumes of RDF data are still in their infancy. In this paper, we focus on providing a very lightweight indexing mechanism for certain kinds of RDF queries, namely graph-based queries where there is a need to traverse edges in the graph determined by an RDF database. Our approach uses the idea of drawing circles around selected
“center” vertices in the graph where the circle would encompass those vertices in the graph that are within a given distance of the “center” vertex. We come up with methods of finding such center vertices and identifying the radius of the circles and then leverage this to build an index called GRIN. We compare GRIN with three existing RDF indexex: Jena, Sesame, and RDFBroker. We compared (i) the time to answer graph based queries, (ii) memory needed to store the index, and (iii) the time to build the index. GRIN outperforms
Jena, Sesame and RDFBroker on all three measures for graph based queries (for other types of queries, it may be worth building one of these other indexes and using it), at the expense of using a larger amount of memory when answering queries. |
|
| 9. |
Massimiliano Albanese, Andrea Pugliese, V. S. Subrahmanian, and Octavian Udrea. MAGIC: A Multi-Activity Graph Index for Activity Detection. In IEEE IRI, pages 267–272, 2007.
Suppose we are given a set A of activities of interest, a
set O of observations, and a probability threshold p. We are interested in finding the set of all pairs (a, O'), where
a is an activity and O' is a subset of O, that minimally validate the fact that an instance of activity a occurs in O with probability p or more. The novel contribution of this paper is the notion of the multi-activity graph index (MAGIC), which can index very large numbers of observations from interleaved activities and quickly retrieve completed instances of the monitored activities. We introduce two complexity reducing restrictions of the problem (which takes exponential time) and develop algorithms for each. We experimentally evaluate our exponential algorithm as well as the restricted algorithms on both synthetic data and a real (depersonalized) travel
data set consisting of 5.5 million observations. Our experiments show that MAGIC consumes reasonable amounts of memory and can retrieve completed instances of activities in just a few seconds. We also report appropriate statistical significance results validating our experimental hypotheses. |
|
| 10. |
Octavian Udrea, Zoran Majkic, and V. S. Subrahmanian. Aggregates in Generalized Temporally Indeterminate Databases. In Scalability in Uncertainty Management, pages 171–186, 2007.
Dyreson and Snodgrass as well as Dekhtyar et. al. have provided a
probabilistic model (as well as compelling example applications) for why there may be temporal indeterminacy in databases. In this paper, we first propose a formal model for aggregate computation in such databases when there is uncertainty not just in the temporal attribute, but also in the ordinary (non-temporal) attributes. We identify two types of aggregates: event correlated aggregates, and
non event correlated aggregations, and provide efficient algorithms for both of them. We prove that our algorithms are correct, and we present experimental results showing that the algorithms work well in practice. |
|
Journal Papers
| 11. |
V. S. Subrahmanian, Massimiliano Albanese, Maria Vanina Martinez, Dana Nau, Diego Reforgiato, Gerardo I. Simari, Amy Sliva, Octavian Udrea, and Jonathan Wilkenfeld. CARA: A Cultural-Reasoning Architecture. IEEE Intelligent Systems, 22(2):12–16, 2007.
There’s a constant need to reason about diverse cultures
all over the world. For example, the US military might wish to build cultural models of different tribes on the Pakistan-Afghanistan border to elicit their support in the hunt for Al-Qaeda terrorists. Or, a World Bank loan aimed at reducing opium cultivation along the Pakistan- Afghanistan border would greatly benefit from a model of the sociological, economic, political, and religious behaviors
of the tribes in opium-producing regions. By building such models and applying them, the World Bank might better understand how to target its loan dollars to better achieve its goals. Past cultural-reasoning research has focused primarily on techniques to organize, catalog, and reason about cultural and historical artifacts of the kind typically stored in a museum. This is extremely valuable. However, the term cultural reasoning as we use it in the previous examples (and in this article) focuses on understanding how different cultural groups today make decisions and what factors those decisions are based on. An architecture that supports cultural reasoning should, for example, be able to pinpoint characteristics that differentiate organizations engaging in political action within legitimate frameworks from those engaging in violence and terror. Key in all this is that cultural reasoning must go hand in hand with environmental reasoning—you can’t separate them.
|
|
| 12. |
Octavian Udrea, Cristian Lumezanu, and Jeffrey S. Foster. Rule-based Static Analysis of Network Protocol Implementations. To appear in Information and Computation Special Issue on Foundations and Automated Reasoning, 2007. Extended version of [4].
Today’s software systems communicate over the Internet using standard protocols that have been heavily scrutinized, providing some assurance of resistance to malicious attacks and general robustness. However, the software that implements those protocols may still contain mistakes, and an incorrect implementation could lead to vulnerabilities even in the most well-understood protocol. The goal of this work is to close this gap by introducing a new technique for checking that a C implementation of a protocol matches its description in an RFC or similar standards document. We present a static (compile-time) source code analysis tool called Pistachio that checks C code against a rule-based specification of its behavior. Rules describe what should happen during each round of communication, and can be used to enforce constraints on ordering of operations and on data values. Our analysis is not guaranteed sound due to some heuristic approximations it makes, but has a low false negative rate in practice when compared to known bug reports. We have applied Pistachio to two different implementations of SSH2 and an implementation of RCP. Pistachio discovered a multitude of bugs, including security vulnerabilities, that we confirmed by hand and checked against each project’s bug databases.
|
|
Workshop/demo papers
| 13. |
Octavian Udrea and Lise Getoor. Combining Statistical and Logical Inference for Ontology Alignment. In IJCAI-07 Workshop on Semantic Web for Collaborative Knowledge Acquisition, 2007.
In recent years, ontologies have proved a very
attractive medium for storing domain knowledge in a variety of organizations; a few example domains are medicine1 and genetics.
The rapid evolution of representation standards has led to the creation of distinct ontology bases in overlapping domains, making
it paramount for large systems to reconcile ontological data from multiple sources. A large body of work on ontology integration[Euzenat, 2004] has produced algorithms and heuristics that have proven successful in alleviating the high computation costs of the process. However, the effective use of ontology formalisms – such as rules and axioms – in the integration process remains an open question. In this paper, we present a novel ontology integration method that takes advantage of the logical inference capabilities in OWL Lite to improve the recall of the merged ontologies. Our CPI (Clustering with Partial Inference) algorithm uses (i) graph clustering techniques to improve the process of selecting candidate matching entities and (ii) a limited, adaptive reasoning process
to account for the effects of introducing relationships between such entities. Our initial experiments, performed on a set of 30 pairs of publicly available ontologies, show that CPI offers a 30% improvement in recall and a 25% improvement in answer quality compared to FCAmerge [Stumme and Maedche, 2001].
|
|
| 14. |
Octavian Udrea, Lise Getoor, and Renée J. Miller. Homer: Ontology Alignment Visualization and Analysis. Demo paper in ISWC 2007.
We present HOMER, an analysis and visualization tool for
ontology alignment. HOMER features a radial-graph display GUI, a complete execution trace that allows the user to override and navigate to any match decisions during at runtime and a comparison mode that displays multiple alignments in parallel. HOMER contains a builtin plugin for the ILIADS ontology alignment algorithm, but other algorithms can be plugged in as well.
|
|
Books
| 15. |
Stefan Trausan-Matu, Valentin Cristea and Octavian Udrea (eds.). Intelligent e-Learning (in Romanian). Ed. PolitehnicaPress, ISBN 973-8449-85-5, Bucharest, 2005. |
|
Papers under review
| 16. |
Octavian Udrea, Diego Reforgiato Recupero, and V. S. Subrahmanian. Annotated RDF. Submitted to ACM Transactions on Computational Logic. Extended version of [3].
There are numerous extensions of RDF that support reasoning about uncertainty, reasoning about pedigree, reasoning about time, and so on. In this paper, we present Annotated RDF (or aRDF for short) in which RDF triples are annotated by members of a partially ordered set (with bottom element) that can be selected in any way desired by the user. We present a formal declarative semantics (model theory) for annotated RDF and develop algorithms to check consistency of aRDF theories and to answer queries to aRDF theories. We show that annotated RDF captures versions of all the forms of reasoning mentioned above within a single unified framework. We develop a prototype aRDF implementation and show that our algorithms work efficiently even on real world data sets containing over 10 million triples.
|
|
| 17. |
Andrea Pugliese, Octavian Udrea, and V. S. Subrahmanian. Scaling RDF with Time. Submitted to WWW’08.
The World Wide Web Consortium's RDF standard primarily consists of (subject,property,object) triples that specify the value that a given subject has for a given property. However, it is frequently the case that even for a fixed subject and property, the value varies with time (examples of such properties include salary, weight, etc.). As a consequence, efforts have been made to annotate RDF triples with valid time intervals. However, to date, no proposals exist for
e±cient indexing of such temporal RDF databases. It is clearly beneficial to store RDF data in a relational DB - however, standard relational indexes are inadequately equipped to handle RDF's graph structure. In this paper, we propose the tGRIN index structure that builds a specialized index for temporal RDF that is physically stored in an RDBMS. Past efforts to store RDF in relational stores include Jena2 from HP, Sesame from OpenRDF.org, and 3store from the University of Southampton. We show that even when these
efforts are augmented with well known temporal indexes like R+ trees, SR-trees, ST-index, and MAP21, the tGRIN index exhibits superior performance. In terms of index build time, tGRIN takes two thirds or less of the time used by any other system, and it uses a comparable amount of memory and less disk space than Jena, Sesame and 3store. More importantly, tGRIN can answer queries three to six times faster for average query graph patterns and five to ten times faster for complex queries than these systems.
New! Accepted for publication 01/15/08. |
|
| 18. |
Massimiliano Albanese, Rama Chellappa, Vincenzo Moscato, Antonio Picariello, V.S. Subrahmanian, Pavan Turaga and Octavian Udrea. A Constrained Probabilistic Petri Net Framework for Human Activity Detection in Video. Submitted to IEEE Transactions on Multimedia Systems.
Recognition of human activities in restricted settings such as airports, parking lots and banks is of significant interest in security and automated surveillance systems. In such settings, data is usually in the form of surveillance videos with wide variation in quality and granularity. Interpretation and identification of human activities requires an activity model that is a) rich enough to handle complex multi-agent interactions, b) is robust to uncertainty in low-level processing and c) can handle ambiguities in the unfolding of activities. Efficient inference algorithms based on the activity model then provide means to accomplish the tasks of activity recognition and anomaly detection. Toward this end, we present a computational framework for human activity representation based on Petri nets. We propose an extension – Probabilistic Petri Nets (PPN)- and show how this model is well suited to address each of the above requirements in a wide variety of settings. Then, we focus on answering two types of questions: (i) what are the minimal sub-videos in which a given activity is identified with a probability above a certain threshold and (ii) for a given video, which activity from a given set occurred with the highest probability ? We provide the PPN-MPS algorithm for the first problem, as well as two different algorithms (naivePPN-MPA and PPN-MPA) to solve the second. Our experimental results on a dataset consisting of bank surveillance videos and an unconstrained TSA tarmac surveillance dataset show that our algorithms are both fast and provide high quality results.
|
|
|
Last updated: January 6 2008 |
|