PARKA-DB: A Scalable Knowledge Representation System

Database PARKA

Dr. James Hendler, Dr. Kilian Stoffel, Merwyn Taylor,
David Rager, and Dr. Brian Kettler

Parallel Understanding Systems Group
Computer Science Department, University of Maryland at College Park



Traditionally, knowledge representation systems have been implemented using main memory as the primary run-time storage medium. As a result, the size of a computer's internal memory has placed an upper bound on the size of the knowledge bases that could be loaded. Database management systems, on the other hand, use external storage (disk) as the primary run-time storage medium. In contrast to knowledge representation systems, the upper bound on the size of a database is not limited by the size of main memory, but rather by the size of external storage. This allows databases to grow to nearly unbounded size given the cost-space trade-offs of disk vs. main memory.

Knowledge representation systems, however, have strengths unavailable to normal database systems. They allow a complex structural representation of the data (knowledge) that allows inferencing and complex query evaluation to be performed. In addition, relations between data can be recorded as ``abstract'' relations, allowing knowledge about functional relationships, relations between groups of entities, and other such higher-level connections between the ``raw'' data elements. The PARKA-DB project is an effort to maintain and exploit these strengths of KR's, while recovering the advantages of of DBMS's, in particular the efficient storage and retrieval of massive amounts of data.

PARKA-DB is a project designed to develop a high performance knowledge representation system that deviates from the norm by using a database-management system to provide run-time storage advantages. As a KR system, PARKA-DB supports the same representational functionality as its memory-based predecessor. However, PARKA-DB goes one step further. It supports larger knowledge bases while consuming less internal memory. In fact, PARKA-DB can support knowledge bases that are as large as the available external storage. Further, since PARKA-DB claims less internal memory, it is a cost effective alternative to its predecessor since it primarily relies on the cheaper disk memory, via a DBMS, while consuming less of the more expensive internal memory. PARKA-DB is projected to be a useful KR technology for computer systems limited by main memory, running on todays smaller personal computers and individual workstations. (As a contrast, the fastest versions of Parka run on supercomputers, and take advantage of the massive amounts of memory available on these large parallel machines - see PARKA publications).

As a functional knowledge representation system, PARKA-DB is designed to support several applications including CaPER, a memory-intensive case-based reasoning system, ForMAT, a case-based logistics planning system developed by Mitre Corporation, and a set of medical informatics programs being developed at the University of Maryland (by Merwyn Taylor). One particular application class which we are now exploring is the merging of information from scientific and medical databases with associated domain knowledge, what we call a hybrid KB/DB.

Through the use of a hybrid KB/DB, the user can issue various complex and interesting queries against a database in a manner much simpler than that required to issue queries of the same meaning using a stand-alone DBMS. In particular, the complex inferencing capabilities of PARKA-DB can be utilized to formulate interesting and potentially useful conjunctive queries. In addition, data mining, the search for knowledge in databases, is another application that PARKA-DB is quite suitable for. Since data mining requires the verification of hypotheses against the data, PARKA-DB query facilities can be used to verify the strengths of hypotheses. Furthermore, by merging a data- and knowledge-based technologies, the data-mining system can find relevant relationships in the DB using taxonomical and other semantic knowledge present in the KB.

We are currently working on porting PARKA-DB to parallel architectures. The parallel implementations will provide suprerior performance while maintaining the ability to manage nearly arbitrarily large knowledge bases. Parallel implementations are planned for the SP2, T3D, Paragon, CM5, and for a distributed processing system currently being developed at the Hebrew University in Jerusalem.

Performance

The PARKA-DB system, cannot perform as well as the earlier, memory-based PARKA system, since it makes use of slower secondary storage. Current research focuses on testing the system, identifying bottlenecks to faster implementation, and working on optimization techniques to remove these bottlenecks. In addition, future work will concententrate on parallelization and distribution of this version, based on our previous results on the parallelization of the in-memory version.

To test the system loading time we compared it with the previous version of PARKA (we expected PARKA-DB to be faster, since (i) a large portion of the KB stays in secondary storage and (ii) the in-memory version requires the building of a number of data-structures which are now explicitely created by the DBMS and stored efficiently). We tested the system on three case bases used to test the CaPER memory-intensive case-based planning system. (See the PARKA KB page)

KBLoading TimeFramesStructural LinksAssertions
20 .241 11103 26404 131817
100 1.288 56516 130526 872743
200 2.714 226297 266100 1768832

The table above shows the sizes and load times for the three case bases. The structural links cover all ``isa'', ``instance'', ``instanceOf'', and ``subCat'' assertions (i.e. the class ontology). The assertions column counts all other types of assertions. Times to load these knowledge bases were about 300 times as fast as in the in-memory version.

To test the decrement in query times, we evaluated three queries previously created to test CaPER using the UM-TRANSLOG planning domain. The first query, which contains 10 conjuncts concerning 5 variables, corresponds to the query "Show all plans in which a train was used to move a package through Hub-TS", the second query, with 6 variables and 11 conjuncts, asks "Show all the instances of plan steps in which a regular-truck was used as the vehicle to move a package," and the final query, with 8 variables and 15 conjuncts asks to "show all plans in which the package was a liquid package, was moved in a tanker truck, and was taken to a location that is of type 'city-location'." The graphical form of these queries (used as inputs to our system) are shown below.

The following table shows the results of the test. The values in the columns represent the pure CPU time and the CPU + IO time in seconds respectively. While these times are significantly faster than any other system we know of (except our own previous version), we believe they show the need for better optimization in our algorithms.

Query20 Cases100 Cases200 Cases
1 .153 .342 3.417 4.203 12.462 14.070
2 .082 .275 2.820 3.85 11.691 13.871
3 2.812 3.346 26.691 29.049 61.801 66.287

Query 1Query 2Query 3


Return to PARKA page