Alexander Dekhtyar

Department of Computer Science 
University of Maryland, College Park, MD, 20-737 
email: dekhtyar@cs.umd.edu
web: http://www.cs.umd.edu/~dekhtyar/academ
 

Research Statement


1. Current Research

My current research deals with probabilistic information
in Databases and Logic Programming. 

1.1 Probabilistic Temporal Databases.
(joint work with Robert Ross and V.S. Subrahmanian)

A number of database applications (e.g., scheduling databases for
transportation providers, time-series stock databases, video extraction
applications)  require the ability to make statements of the
following kind: Data tuple d from relation R is true 
at some point of time in the interval [t,t'] 
with probability between p and p'.  For example, a tuple in
a scheduling database may contain information that a package shipped
from Washington, DC to New York via UPS arrives at the destination
between 12 and 36 hours after being sent with  probability 95% - 100%.
Additionally, the database may contain information that the probability
of arrival at a particular hour for that time interval
is distributed uniformly.
Until now, such information could be only represented in
either purely temporal or purely probabilistic databases.
Such representations had a number of problems:


 * Problem of compact representation.

   Some applications of temporal databases require the chronon 
   to be rather small (seconds, or even milliseconds). To represent 
   uncertain information in purely probabilistic databases, one 
   would need to include a tuple with an approapriate probability 
   for each  timepoint when an event could possibly occur. 
   In situations where chronons are small, this would lead to massive 
   amounts of data being stored.

 * Problem of keeping track of probabilistic distributions.

   Temporal databases allow compact representation of temporal data.
   However, once data is represented in a compact form as a temporal 
   tuple, the only probabilistic information accompanying it would 
   be the probability of the tuple being true at the time period 
   specified in the tuple. What is missing is the information about 
   finding the probabilities  of the tuple being true at any given 
   sub-period.


 We have introduced the concept of Probabilistic Temporal
Databases, which are designed to address these problems.
Our main contributions are summarized as follows.


 (*) We have introduced the concept of a temporal-probabilistic
     tuple or TP-tuple, for short. Intuitively, a TP-tuple allows 
     us to augment classical relational database tuples with 
     temporal-probabilistic data, as well as arbitrary probability 
     distributions.  For example, not only can we say "Data tuple d 
     is in relation r at some point of time in the interval [t,t'] 
     with probability between p and p' " but we can also say that 
     the probability mass is distributed over [t, t']$ according to 
     an arbitrary probability distribution.

 (*) We have shown how given any TP-tuple tp, we may ``flatten'' tp
     into a set of annotated tuples.  In general, the set of annotated
     tuples associated with a single TP-tuple can be very large --- hence,
     annotated tuples serve as a {\em purely theoretical device}.

  (*) We  have defined a Theoretical Annotated Temporal Algebra (TATA) 
      and showed how the classical relational algebra operations can 
      be extended to the case of annotated tuples.   Intuitively, 
      the Theoretical Annotated Temporal Algebra provides a theoretical 
      specification of how the TP-Algebra operations must be defined.
      We have introduced a Temporal-Probabilistic Algebra (TPA)
      which directly manipulates TP-tuples without converting them to
      annotated tuples.  This has a great advantage, as TP-tuples are
      very succinct objects.  We showed that for each operation in the
      Theoretical Annotated Temporal Algebra, there is a corresponding 
      operation in the Temporal-Probabilistic Algebra which precisely 
      captures it.  Thus, the Temporal-Probabilistic Algebra is a sound 
      and complete way of implementing the declarative semantics for 
      temporal probabilistic data prescribed by the Theoretical Annotated 
      Temporal Algebra.  Correctness results are formally proved for 
      each operation.

  (*) We have shown that each operator, whether in the TP-Algebra, 
      or in the Theoretical Annotated Temporal Algebra, can be 
      parameterized by the user's knowledge of the dependencies between 
      events.  This is important because, the probability of a complex 
      event like  "e1 or e2" depends upon our knowledge of the 
      dependencies between "e1" and "e2".

  (*) We have established a number of query equivalence and query 
      containment results in TATA and TPA. Work on query optimization 
      is currently underway.

  (*) A prototype Probabilistic Temporal Database System has been 
      implemented. The system has a web interface  
      (http://bester.cs.umd.edu/TP/index.htm) and communicates with a 
      number of commercial database systems, which store the actual
      data (Paradox, Oracle) via ODBC. Experiments showed the benefits of 
      storing and operating on data in compact (tp-) form as opposed to 
      the expanded (annotated) form.


1.2 Temporal Probabilistic Logic Programs.
(joint work with V.S. Subrahmanian)

Concurrently with our research on Probabilistic Temporal Databases,
we have been studying logical reasoning in situations involving 
temporal uncertainty. We have proposed the concept 
of a Temporal Probabilistic Logic Program (or TPLP for short).  
We have defined the syntax of TPLPs and provided a formal model theoretic 
and fixpoint semantics that are shown to coincide.  I
n this work we considered three types of statements:

 [-] statements about the events that can occur only once
     (such as letter/package deliveries)
 
 [-] statements about the events that can occur multiple times
     or have a duration (e.g., occurrence of a character in movie frames)

 [-] uncertain statements independent of time

 Our syntax allows the user to explicitly state the type of each statement
(formula) found in a TP-program and our model theory captures reasoning 
about the temporal probabilities of events of all three types. The fixpoint
semantics uses linear programming to compute the tightest
probability intervals possible.

 We have also studied a subset of the general language where only
statements about uniquely occurring events were allowed and established
a more efficient fixpoint procedure which does not employ
linear programming for this subclass of TP-programs. 


We are currently working on query processing procedures for
TP-programs, and on finding other subclasses of TP-programs for which
the fixpoint semantics will not need linear programming, therefore making
more efficient query processing possible.


1.4 Optimal Status Sets For Heterogeneous Active Agents
(joint work with Thomas Eiter, Thomas Schwentick and V.S. Subrahmanian)

This work is a part of the ongoing IMPACT (Interactive Maryland Platform 
for Agents Collaborating Together). A part of the framework is a
declarative language for Agent Programs which allows agents express 
their ``operating principles'' and to determine the actions they must 
take in order to satisfy them.


 The semantics of Agent Programs is given in terms of \textit{Status
Sets} - sets of actions which satisfy the rules of the program.
Different semantics are possible, and Eiter and Subrahmanian
describe a number of semantics.

 However, only in very simple cases are we  guaranteed that
the status set is unique (i.e., that the agent's choice of actions at 
each step will be deterministic).

 Our current work we proposes a way to overcome this.
We assume that an agent has an objective function which
allows it to evaluate each status set that satisfies its program for
a given semantics. Out of all such status sets, we will select the
one(s) which optimize this objective function. 

The goals of this work are:

 (*)  to provide a simple but powerful language for describing the 
      objective functions in IMPACT. This language should be compatible 
      with the current IMPACT framework.

 (*)  to give complexity analysis of the problem of computing
      an optimal status set, given an agent state and an objective 
      function. We expect the general case to be quite hard, in part 
      due to the arbitrary nature of the objective function.

 (*) to find a number of important classes of objective functions 
     for which the problem of computing an optimal status set is 
     solvable in polynomial time.

 (*) to construct the algorithms for finding optimal status sets
     both in the general case, and for all the classes of objective 
     functions for which we can show that the problem is solvable 
     in better time.

 
1.5 Hybrid Probabilistic Logic Programs.
(joint work with V.S. Subrahmanian and M.I. Dekhtyar)

Most current work on probabilistic logic programming,
assumes the existence of a simple  apiori relationship between
the events described (e.g., independence or ignorance). 
However, this severely limits the applicability of the frameworks.

We have proposed a hybrid probabilistic logic programming
language which was designed to allow the programmer to reason
about the probabilities of event under any assumption about
their relationship. In particular, our contributions are:


 (*) First, we have introduced a general  axiomatic notion of a 
     probabilistic strategy, as the means of  computing probabilities 
     under given assumption about the relationship between the events.  
     We showed that a number of well known probabilistic strategies 
     (e.g., independence, positive correlation, ignorance) are special 
     cases of our definition.

 (*) We have defined the concept of a  hybrid probabilistic program 
     (hp-program, for short). If the user selects a set of probabilistic 
     strategies r1,... ,rk  for use in an hp-program (s/he may select 
     these in any way, as long as these selections satisfy the axioms 
     defining probabilistic strategies), then this automatically defines a 
     set of conjunction and disjunction connectives used to construct 
     formulas from atoms.

 (*) We have defined a fixpoint semantics for hp-programs, a model
     theoretic semantics for hp-programs, and a proof procedure, and prove
     that the fixpoint theory, model theory, and proof theory all lead to
     equivalent characterizations. This applies to any selection of
     probabilistic strategies made by the user, as long as these
     selections satisfy the axioms defining probabilistic strategies.

 (*) We have defined a cache-based proof procedure that extends
     the well known tabling techniques in Logic Programming to handle
     hybrid probabilistic programs.  This procedure is also proved to
     be sound and complete.

 (*) We have studied the computational aspects of hp-programs.
     In particular, we have designed algorithms for computing
     the minimal model of an hp-program, determining whether an hp-program
     entails a formula and determining whether an hp-program is
     consistent. We have shown that entailment problem in general is
     NP-complete, however, when we restrict the heads of
     rules to be of size 2 or less, this problem becomes equivalent to 
     a generalization of a weighted matching problem on graphs.
     The computational complexity in this case depends on the
     set of probabilistic strategies used in the program.
     We have established that for a set of eight most common 
     probabilistic strategies (disjunctive and conjunctive strategies of
     independence, positive correlation, negative correlation and ignorance),
     the entailment problem is solvable in polynomial time.
     It is also solvable in polynomial time for  programs having
     atomic heads for any set of probabilistic strategies

 (*) We have also provided a sound and complete  rule-based inference 
     system for hp-programs. We showed that for all formulas derivable in 
     this system there exists a ``short'' (polynomially bound) derivation.

Though we have not implemented it, it should be mentioned that since
our original results have been published, two implementations of
Hybrid Probabilistic Logic programs appeared. One implementation,
by Terrence Swift at University of Maryland uses the XSB system
(developed at SUNY at Stony Brook) to implement hp-programs with
atomic heads. This implementation has been used to solve some data mining
problems. A somewhat different fragment of HPPs had been implemented by
Stoffel at the University of Neuchatel in Switzerland.


2. Previous Research

During my undergraduate study, my research had been in the area of
finite model theory and Linear Logic. I have studied 
Resource Transformation Nets (RT-nets) -  a model of computation in which
a number of resources can be stored, and transformed at the nodes of 
a graph and transferred from node to node. I showed that a logic based on
the fragment of Multiplicative Linear Logic describes exactly the behavior
of RT-nets. 


3. Future Research

My future research will proceed in the following areas:


(i) N-Dimensional Probabilistic Reasoning. 

    I am planning to extend our results for Probabilistic Temporal 
    Databases and Temporal Probabilistic Logic Programs to the case 
    when the probability is distributed over an N-dimensional 
    (and possibly continuous) set, such 2D or 3D space, or 2D/3D space 
    with time. This work will require  the generalization of many notions, 
    used in the TP-databases and  TP-programs. Database design will also 
    involve work with different spatial data structures such as 
    quadtrees, R-trees etc...


(ii) Intelligent Agents.

    I propose to continue working on the theory and implementation
    of heterogeneous software agents.  I intend to study how an
    agent can make decisions when it is reasoning about uncertain
    temporal events.  For instance, an agent may wish to predict
    certain events in the future.  While such predictions are
    (inevitably) likely to be uncertain, the agent may still base
    its actions and schedules on such predictions.  I propose to
    develop methods by which an agent can represent information
    about multiple possible uncertain future works, and can make
    decisions based on such information based on one or more 
    objective functions the agent has.


(iii) Application of probabilistic methods.

   -  Information Retrieval (IR) studies the problem of matching
      the documents stored in the database with the document
      specifications contained in a given query. A number of aspects of
      IR (document specifications, user query, document relevance) are 
      of a highly uncertain nature. Probabilistic methods, together with
      other formalisms for uncertain reasoning, such as Dempster-Schafer
      belief functions and Fuzzy Logic have been applied to IR. However,
      in most cases, just as in  Probabilistic Logic Programming,
      probabilistic methods make an apiori assumption about 
      the relationship between different document descriptors.   
      I would like to extend our framework of hybrid probabilistic 
      reasoning onto the domain of Information Retrieval.  


   - In Data Mining, a problem of finding different data dependencies 
     in a large body of data is studied. The data dependencies
     found in the data mining process may be interpreted probabilistically.
     In this area, I see an opportunity to combine our probabilistic 
     approach to reasoning with the  approaches to dealing with objective 
     functions, which can be used  to specify different data parameters 
     (such as the concept of an ``interesting'' data dependency) 
     in the data mining framework.


 (iv) Probabilistic Updates.
 
      Given a  body of probabilistic data (e.g., probabilistic database, 
      probabilistic deductive database), and a collection of ``newly
      discovered'' probabilistic information, the question of
      updating the data with the new information arises. While
      working on Probabilistic Temporal Databases and Hybrid Probabilistic
      Programs, we have found that 

     (a) there is no unique ``correct'' way to update the probabilistic 
         information, 
     
     (b) probabilistic updates are not local, i.e., 
         updating a probability of one event, may result in a chain 
         of updates of probabilities of other events in some cases and 

     (c) may require application of non-probabilistic methods
         to preserve consistency of the data. 

     My goal is to study the general properties of probabilistic updates 
     in detail.