HERMES: A Heterogeneous Reasoning and Mediator System

2.2. The Mediator Language

The HERMES mediator language can extract information from different domains, that is, from data sources and reasoning systems. An underlying assumption is that each domain internally provides a set of operations through which its functionality is accessed. Hence, a domain D may be viewed as an abstraction of databases and software packages, and is made up of three components: a set V of values; a set F of functions on V; and a set of relations on the data objects in V. The elements of V may be thought of as the data objects that are manipulated by the software in question. For example, in a numerical computation software package, V contains the real numbers. In general, values in V may be typed, so that V is composed of a collection of different universes of values. The functions in F take objects in V as input and return objects from their range as output. These functions may be thought of as the pre-defined functions existing in the package/domain D that we are seeking to integrate into a mediated system. An example is the function that performs numerical integration in the aforementioned software package. Lastly, the relations are functions over V whose output value is either true or false.

Given a domain D, a domain call is a syntactic expression

domainname:<domainfunction>(<arg1,...,argn>)

where <domainfunction> is the name of a function in F, and <arg1,...,argn> are the arguments that it takes. The informal reading of the call is: in the domain domainname, execute the function domainfunction defined therein on the arguments <arg1,...,argn>.
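
As an informal illustration of this definition, a domain can be modeled as a named collection of functions, and a domain call as a lookup followed by an application. The Python sketch below is not part of HERMES: the class `Domain`, the helper `domain_call`, and the toy `NUMERIC` domain with its `integrate` function are hypothetical names chosen only for illustration.

```python
from typing import Any, Callable, Dict


class Domain:
    """A toy model of a domain: a name plus a set F of named functions."""

    def __init__(self, name: str, functions: Dict[str, Callable[..., Any]]):
        self.name = name
        self.functions = functions


def domain_call(domains: Dict[str, Domain], domain_name: str,
                function_name: str, *args: Any) -> Any:
    """Evaluate domainname:function(args) by dispatching into the domain."""
    domain = domains[domain_name]
    function = domain.functions[function_name]
    return function(*args)


def integrate(f: Callable[[float], float], a: float, b: float,
              steps: int = 1000) -> float:
    """Midpoint-rule integration, standing in for a package routine."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Hypothetical registry containing a single numerical domain.
DOMAINS = {"NUMERIC": Domain("NUMERIC", {"integrate": integrate})}

# Informally corresponds to a call of the form NUMERIC:integrate(f, 0, 1).
area = domain_call(DOMAINS, "NUMERIC", "integrate", lambda x: x * x, 0.0, 1.0)
```

The caller names only the domain, the function, and the arguments; how the function is computed stays inside the domain.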

Before considering the finer details of the mediation language, we first present several examples of domains to ground our discussion in more concrete terms.

2.2.1. The Domain of Relational DBMSs

Consider the PARADOX database management system. The values in this domain consist of the collection of all tables, as well as the individual values that may be stored in the tables. We denote this domain by PARADOX. The functions F over PARADOX include the usual database operations project, select and join. Each of these operations takes tables as input and produces new tables as output. Other operations, for example aggregates, typically accept tables and attribute names as input, and produce real values as their output. Below are some specific examples.

The expression

PARADOX:project('parts',"partid")

invokes the execution of the project operation, projecting out the partid field of the relation called parts. Expressing selects is somewhat more complex. For each boolean condition C there is a corresponding operation select-C, which uses C to compare the attribute named by the second argument against the value given by the third argument. Thus, to select all tuples in the parts relation in PARADOX with a cost of over 50, the appropriate domain call may be expressed:

PARADOX:select>('parts',"cost",50).

Operations may be composed. Therefore, getting a list of the partids of parts that cost over 50 may be expressed through the domain call

PARADOX:project(select>('parts',"cost",50),"partid").

Joins may be similarly expressed. For instance, the statement

PARADOX:join('parts1','parts2','partname','partnom')

joins the relations parts1 and parts2 on the fields partname and partnom, respectively. Finally, an example domain call to an aggregate may be

PARADOX:sum(project(select=('parts',"color","green"),"cost"))

which finds the total cost of all parts that are green.

This syntax has been used uniformly in HERMES to access any relational database management system, including DBASE, INGRES, and PARADOX.
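
To make the composition of these operations concrete, the following Python sketch mimics select and project over an in-memory table. It is only an analogy under stated assumptions: the `TABLES` data, the helper `as_table`, and the convention that `project` on a single field returns a plain list of values are all hypothetical and are not taken from HERMES or PARADOX.

```python
import operator
from typing import Any, Callable, Dict, List, Union

Row = Dict[str, Any]
Table = List[Row]

# Hypothetical in-memory stand-in for the 'parts' relation.
TABLES: Dict[str, Table] = {
    "parts": [
        {"partid": 1, "color": "green", "cost": 60},
        {"partid": 2, "color": "red", "cost": 40},
        {"partid": 3, "color": "green", "cost": 75},
    ]
}


def as_table(source: Union[str, Table]) -> Table:
    """Resolve a table name such as 'parts' to its rows; pass tables through."""
    return TABLES[source] if isinstance(source, str) else source


def select(condition: Callable[[Any, Any], bool], source: Union[str, Table],
           field: str, value: Any) -> Table:
    """select-C: keep the rows whose `field` satisfies `condition` against `value`."""
    return [row for row in as_table(source) if condition(row[field], value)]


def project(source: Union[str, Table], field: str) -> List[Any]:
    """Project a single field (simplified here to a list of its values)."""
    return [row[field] for row in as_table(source)]


# Analogue of PARADOX:project(select>('parts',"cost",50),"partid")
expensive_ids = project(select(operator.gt, "parts", "cost", 50), "partid")

# Analogue of PARADOX:sum(project(select=('parts',"color","green"),"cost"))
green_total = sum(project(select(operator.eq, "parts", "color", "green"), "cost"))
```

Passing the comparison as a parameter plays the role of the family of select-C operations, one per boolean condition C.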

2.2.2. The Domain of Spatial DBMSs

In a spatial database, the set of values V consists of the collection of all coordinates, together with the values that may be contained in an ordinary (say, relational) database. The usual set of operations includes RANGE, which can be used to find all points within a specified distance of a given location, and horizontal and vertical slice queries, which can be used to find points for which one of the coordinates is within some specified distance of the corresponding coordinate of a given location. Some example domain calls are as follows.

SPATIAL:RANGE('map',given,distance)
is a range query on a spatial database called 'map', where given is the given pair of (x,y) coordinates.
SPATIAL:VERTSLICE('map',given,distance)
is a vertical slice query on 'map'. If given=(x,y) then this call returns the set of all points (x',y') represented in the 'map' database such that |x-x'| <= distance.
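
The two queries can be sketched in Python as simple filters over a set of points. This is a minimal illustration, not the HERMES implementation: the `MAPS` data and the function names are hypothetical, a linear scan replaces whatever spatial index a real system would use, and Euclidean distance is assumed for RANGE.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical spatial database 'map' holding a handful of points.
MAPS: Dict[str, List[Point]] = {
    "map": [(0.0, 0.0), (3.0, 4.0), (10.0, 1.0), (2.5, 9.0)]
}


def range_query(map_name: str, given: Point, distance: float) -> List[Point]:
    """All points within `distance` of `given` (Euclidean distance assumed)."""
    gx, gy = given
    return [(x, y) for (x, y) in MAPS[map_name]
            if math.hypot(x - gx, y - gy) <= distance]


def vert_slice(map_name: str, given: Point, distance: float) -> List[Point]:
    """All points (x', y') with |x - x'| <= distance, where given = (x, y)."""
    gx, _ = given
    return [(x, y) for (x, y) in MAPS[map_name] if abs(x - gx) <= distance]


near_origin = range_query("map", (0.0, 0.0), 5.0)
strip = vert_slice("map", (0.0, 0.0), 3.0)
```

With this data, `near_origin` contains the two points at distance 0 and 5 from the origin, while `strip` also admits (2.5, 9.0), since only the x coordinate is constrained.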

Note that the above syntax completely abstracts away the details of internal representation in the databases. Though this is a useful and important software engineering technique, we will see later that the activities of domain integration cannot be done in complete independence from these internal details. However, in many cases, the tools that are provided in the domain integration toolkit may make these tasks less painstaking (cf. Section 2.4.1).

2.2.3. A Return to the Mediator Language

We use the domains discussed above to elaborate on the mediator language. The language is rule-based, with Prolog-like syntax. Access to the various domains integrated in HERMES is achieved through a small but fairly general set of special predicates. These predicates take as input various domain calls, whose syntax was introduced in the previous subsections.

(=) This is the usual binary equality relation that takes two arguments. It succeeds in the case that both of its arguments are identical. Note that either of the arguments may be a domain call, in which case their output will be compared. The = predicate may take complex types (e.g. records, arrays, array of records, etc.) as input.
(in) This is a binary predicate symbol that takes two arguments as input. The second argument is a domain call, whose output is assumed to be a set of values. The first argument may be either a variable or a constant. In the former case, the relation always succeeds, with the side-effect that the variable is substituted by one of the values from the domain call. In the latter case, the relation in holds just in case the constant represented by the first argument is one of the elements in the set returned by the domain call.
In the Paradox example, the query
succeeds if the parts database contains at least one object whose color is green.
(is) This is a binary predicate symbol that takes two arguments, both sets. The relation is succeeds if the first argument is either equal to, or can be instantiated to something that is equal to the second argument.
For instance, the atom
succeeds just in case the parts database contains only green objects.
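
The behavior of in and is described above can be sketched in Python, with each predicate written as a generator that yields one variable binding per way of succeeding. This is a loose model under stated assumptions, not the HERMES evaluator: the class `Var`, the functions `pred_in` and `pred_is`, and the sample list `colors` (standing for the result of some domain call on the parts relation) are hypothetical, and in this sketch an empty result set simply yields no bindings.

```python
from typing import Any, Dict, Iterable, Iterator


class Var:
    """An unbound logical variable, identified by its name."""

    def __init__(self, name: str):
        self.name = name


def pred_in(first: Any, call_result: Iterable[Any]) -> Iterator[Dict[str, Any]]:
    """in(first, call): bind a variable to each element, or test membership."""
    values = list(call_result)
    if isinstance(first, Var):
        for value in values:
            yield {first.name: value}   # one solution per element
    elif first in values:
        yield {}                        # constant found: succeed, no bindings


def pred_is(first: Any, call_result: Iterable[Any]) -> Iterator[Dict[str, Any]]:
    """is(first, call): both sides are sets; bind a variable or test equality."""
    result = frozenset(call_result)
    if isinstance(first, Var):
        yield {first.name: result}
    elif frozenset(first) == result:
        yield {}


# Hypothetical output of a domain call returning the colors of all parts.
colors = ["green", "red", "green"]

some_green = list(pred_in("green", colors))     # succeeds once
all_bindings = list(pred_in(Var("X"), colors))  # one binding per element
only_green = list(pred_is({"green"}, colors))   # fails: a red part exists
```

An empty list of solutions corresponds to failure of the atom, and a non-empty one to success.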

The formal syntax of a mediator can now be explained. We refer to each relation constructed using one of =, in or is as a domain call atom. An annotation is a pair [M,T] where M is an expression representing a value between 0 and 1 inclusive, and T is an expression representing a set of non-negative real numbers. Given an atom A, the expression A:[M,T] is called an annotated atom, where A is the atomic part. A mediatory clause (or mediatory rule) is a statement of the form

A0:[M0,T0] <- A1 & ... & An

where each Ai, for i = 1,...,n, is either an annotated atom, or is a domain call atom. The first annotated atom A0:[M0,T0] is called the head of the rule, while the conjunction to the right of the symbol <- is called the body.

For those readers familiar with annotated logics, the annotations extend ordinary Prolog clauses with the capability of reasoning over uncertainty and time. Informally, to assert an annotated atom A:[M,T] is to say that the relation A is true with certainty at least M at all time points in T. Thus, the annotated atom at("john","office"):[0.9,[0800,1700]] says that between the time points 0800-1700 hours, there is a certainty of at least 90% that John is in his office. The reading of the mediatory clause above is then: Suppose that for each annotated atom Ai:[Mi,Ti] in the body, Ai is true with certainty at least Mi at all time points in Ti. Suppose in addition that each domain call atom in the body holds. Then conclude that A0 is true with certainty at least M0 at all time points in T0. The formal semantics of these annotations has been studied in [9,15].
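
One consequence of the informal reading is that an annotated atom A:[M,T] also supports any weaker claim about A, namely one with a lower certainty bound over a time set contained in T. The Python sketch below checks this for closed time intervals. It is based only on the informal reading given here, not on the formal semantics of [9,15]; the class `Annotated` and the function `supports` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Annotated:
    """An annotated atom A:[M,T], with T restricted to one closed interval."""
    atom: Tuple[str, ...]
    certainty: float
    interval: Tuple[float, float]


def supports(fact: Annotated, query: Annotated) -> bool:
    """True if `fact` guarantees `query` under the informal reading."""
    fact_start, fact_end = fact.interval
    query_start, query_end = query.interval
    same_atom = fact.atom == query.atom
    certain_enough = query.certainty <= fact.certainty
    within_time = fact_start <= query_start and query_end <= fact_end
    return same_atom and certain_enough and within_time


# at("john","office"):[0.9,[0800,1700]]
fact = Annotated(("at", "john", "office"), 0.9, (800, 1700))

weaker = supports(fact, Annotated(("at", "john", "office"), 0.8, (900, 1200)))
too_certain = supports(fact, Annotated(("at", "john", "office"), 0.95, (900, 1200)))
too_early = supports(fact, Annotated(("at", "john", "office"), 0.8, (700, 900)))
```

Here `weaker` holds, while `too_certain` and `too_early` do not, because they ask for more certainty or for time points outside the asserted interval.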

We remark that the language for expressing uncertainty in annotations contains, as constants, the real numbers between 0 and 1. The language also contains variables and functions interpreted over the interval [0,1]. Likewise, the second component of each annotation is a term in a first-order language for expressing temporal information. The constants in this language are sets of non-negative real numbers. The intersection of the temporal language, the uncertainty language, and the logical language need not be empty. This way, a variable belonging to all three languages may be used to integrate temporal information with uncertainty information, logical information, and information gleaned from different databases.

Example 1. Suppose we are given data stored in a Paradox database called DB1, a DBASE database called DB2 and a spatial database called DB3. Suppose DB1 is a relation containing fields "qty" and "name", DB2 is a relation containing fields "name" and "location", and DB3 is a spatial data structure whose nodes have a field "location", in which two subfields, "x" and "y", are specified. Using the following rule, the query query1(Supplier, Part, Quantity, Factory) retrieves any Supplier that lies within 50 units of distance from a given Factory, provided that the Supplier has enough of the component Part to satisfy Factory's request for a given Quantity of the component.

query1(Supplier, Part, Quantity, Factory):[1,R] <-: in(Supplier,PARADOX:project(select>('db1',"qty",Quantity),"name")) &
in(Loc1,DBASE:project(select=('db2',"name",Supplier),"location")) &
in(Loc2,DBASE:project(select=('db2',"name",Factory),"location")) &
in(Loc1,SPATIAL:RANGE('db3',Loc2.x,Loc2.y,50)).

According to the rule, once the Part and Quantity desired by a particular Factory are specified, any supplier is either satisfactory or unsatisfactory. However, in some cases, it may be desirable to evaluate the "goodness" of how well a supplier matches the needs of the Factory, based on the distance of the supplier from the Factory and the extra stock that the supplier has available. The following modified rule accomplishes this through the annotation variables.

Example 2. Let query2 be defined similarly to query1, augmented with a goodness of fit between a supplier and the factory it supplies, obtained by evaluating the quantity of overstock the supplier has and the supplier's distance from the factory. The evaluation is performed by the annotation function EVAL.

query2(Supplier, Part, Quantity, Factory):[EVAL(Dist,Over),R] <-: in(Supplier,PARADOX:project(select>('db1',"qty",Quantity),["name","qty"])) &
in(Loc1,DBASE:project(select=('db2',"name",Supplier),"location")) &
in(Loc2,DBASE:project(select=('db2',"name",Factory),"location")) &
in(Dist,SPATIAL:DIST('db3',Loc2.x,Loc2.y,Loc1.x,Loc1.y)) &
=(Over,SQ.qty - Quantity).

Here, Dist and Over are both annotation variables, as well as variables in the logical language. The complex annotation term EVAL(Dist,Over) appearing in the annotation of the head of the clause is assumed to return a value between 0 and 1.
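
The text does not define EVAL beyond requiring a result between 0 and 1. Purely as an illustration, the Python sketch below shows one function with that property. The name `eval_goodness`, the equal weighting of the two factors, and the normalizing constants `max_dist` and `max_over` are all assumptions made for this example, not the definition used in HERMES.

```python
def eval_goodness(dist: float, over: float,
                  max_dist: float = 50.0, max_over: float = 100.0) -> float:
    """Hypothetical annotation function: closer suppliers with more
    overstock score higher. The result always lies in [0, 1]."""
    # 1.0 when the supplier is at the factory, 0.0 at max_dist or beyond.
    closeness = min(1.0, max(0.0, 1.0 - dist / max_dist))
    # 0.0 with no overstock, 1.0 at max_over units of overstock or more.
    surplus = min(1.0, max(0.0, over / max_over))
    # Equal weighting of the two factors (an arbitrary illustrative choice).
    return 0.5 * closeness + 0.5 * surplus


best = eval_goodness(0.0, 100.0)    # adjacent supplier, ample overstock
worst = eval_goodness(50.0, 0.0)    # at the distance limit, no overstock
middle = eval_goodness(25.0, 50.0)  # halfway on both measures
```

Clamping both factors keeps the annotation within [0,1] even for inputs outside the expected ranges, which is the only property the text requires of EVAL.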

We present a few more examples of domains that are currently integrated in HERMES.

2.2.4. The Domain of Text Databases

Text database systems can be used to index large amounts of text data. A text database may be regarded as a domain whose set of values consists of characters and words.

A simple example of a rule involving a text database is as follows.

news(Word, Article):[1,R] <-: in(P,TEXTDB:headline('usatoday.idx', Word)) &
=(P.filename, Article).

This rule defines a predicate that accesses a text database called textdb through the headline function, which has been implemented in textdb. The function takes two arguments: 'usatoday.idx' is the index file used to index a body of text data (in our implementation, it indexes a body of data from on-line versions of the USA Today newspaper), and Word is the keyword on which to search. A query to this rule through which a user may find, for example, the name of the spouse of the person whose taxes are reported in USA Today is:

<- news("taxes",Article) & news(Person,Article) & in(Spouse,PARADOX:project(select=('spouse',Spouse1,Person),Spouse)).

This query assumes that the information on spouses is kept in a relational Paradox table called 'spouse'.

2.2.5. The Domain of Pictorial Databases

Pictorial databases are repositories of images. Suppose a module exists for querying these databases to determine the features present in a picture. A general architecture for such queries, and a formal theoretical framework for it, have been studied by Marcus and Subrahmanian in [16]. For instance, consider the predicate p(Person,Rank,Picture) defined below, which succeeds just in case Picture is a picture of George Bush together with the spouse of a person whose tax dealings have been reported in the newspaper. This may be expressed as follows:

p(Person, Rank, Picture):[1,R] <-: in("George Bush",PICTUREDB:feature(File)) &
in(OtherPerson,PICTUREDB:feature(File)) &
in(Spouse,PARADOX:project(select=('spouse',Spouse1,OtherPerson), Spouse))&
news(Spouse,Article):[1,R] &
news("taxes",Article).

Note that the mediator author need not be concerned with how the pre-defined function called feature is implemented within the pictorial database. A variety of implementations are possible: the function may be computed by an image processing program or a face recognition program, or its results may have been created by a human annotating the pictorial data. The above example has been implemented in HERMES using the last approach. In addition, we have incorporated a face recognition system developed at the Vision Lab of the University of Maryland (we are currently testing how well this face recognition algorithm works).
