The article "Science and The Semantic Web" discusses some of the ways in which Semantic Web technologies change scientific use of the web. In this demonstration, we concentrate on one important aspect of this work: the dynamic composition of workflows of web services, and how these can be applied to Grid services.
Service composition can be used to link Web (and Semantic Web) concepts to services provided in other network-based environments. As an example, consider a network environment that includes two types of services: data-producing services and data-processing services. A data-producing service might be one associated with a sensor that is producing data; a data-processing service could be one that runs a FIR filter on that sensor data. At a more complex level, this could include data produced by a biological sampling or sensing device, which would be processed by sequencing or visualization software or some other complex program. The concept of a "semantic grid," linking databases to processors on the computational grid (represented in the Open Grid Services Architecture) by use of Semantic Web information management techniques, is a special case of this, and one we are developing tools for. We have already developed techniques for composing other services, including device services described in Universal Plug and Play (UPnP) and services described in the web standard Web Services Description Language (WSDL). One of the unique features of our research is the ability to combine all of these by use of an extended version of the DAML-S language. We have already done full groundings for WSDL and UPnP and have a partial grounding for OGSA. This funded research will enable us to extend our WSDL tool set to OGSA, and to work on the further automation of service composition.
Our work is built on top of the DARPA Agent Markup Language Service ontologies (DAML-S). DAML-S partitions a semantic description of a web service into three components: the service profile, the process model, and the grounding. The "ServiceProfile" describes what the service does by specifying the input and output types, preconditions, and effects. The "Process Model" describes how the service works; each service is either an AtomicProcess that is executed directly or a CompositeProcess that is a combination of other subprocesses. The "Grounding" contains the details of how an agent can access a service by specifying a communications protocol, the parameters to be used in the protocol, and the serialization techniques to be employed for the communication. Work at UMCP has demonstrated the grounding of DAML-S in correct WSDL and in UPnP, as discussed above.
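The three-part description above can be pictured as a small data model. The following is a minimal sketch, not the DAML-S ontology itself: the class and field names are illustrative assumptions, and a CompositeProcess is simplified here to an ordered sequence of subprocesses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class ServiceProfile:
    """What the service does: typed inputs/outputs, preconditions, effects."""
    inputs: List[str]
    outputs: List[str]
    preconditions: List[str] = field(default_factory=list)
    effects: List[str] = field(default_factory=list)


@dataclass
class Grounding:
    """How an agent reaches the service (field names are hypothetical)."""
    protocol: str                 # e.g. "WSDL" or "UPnP"
    parameters: Dict[str, str]
    serialization: str


@dataclass
class AtomicProcess:
    """A process that is executed directly."""
    name: str
    profile: ServiceProfile
    grounding: Grounding


@dataclass
class CompositeProcess:
    """A combination of subprocesses (assumed here to run in sequence)."""
    name: str
    steps: List[Union[AtomicProcess, "CompositeProcess"]]

    def atomic_steps(self) -> List[AtomicProcess]:
        """Flatten nested composites into an ordered list of atomic processes."""
        flat: List[AtomicProcess] = []
        for step in self.steps:
            if isinstance(step, CompositeProcess):
                flat.extend(step.atomic_steps())
            else:
                flat.append(step)
        return flat
```

Because a CompositeProcess may itself contain composites, a previously built workflow can be nested inside a new one and still be flattened into the atomic steps that are actually executed.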
The composer creates a workflow of services that can solve the user's need in a goal-driven way. The user starts the composition process by selecting one of the services registered with the composer and specifying some input to that process. For example, the user could choose "FIR filter" as a service and provide as input either "a sensor service" (meaning the system would be free to choose one) or some specific sensor. Similarly, the user could specify a particular visualizer or analysis device and a specific dataset, or could specify (using an ontology) any dataset meeting certain characteristics. To narrow the candidate services, the system uses a filtering technique based on the "non-functional attributes" of the service -- that is, the ontological properties that are not directly inputs and outputs of the service. In the case of a sensor, these would be features such as sensor location, type, deployment date, sensitivity, etc. The system is also extensible, which we believe will be extremely important for use by scientists: any composition generated by the user and the system can be automatically realized as a DAML-S CompositeProcess, allowing it to be reused at a later time or used by the system for composition with other services.
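One step of this goal-driven selection can be sketched as follows. This is an illustration under stated assumptions rather than the composer's actual implementation: the `Service` class, the `candidates_for_input` function, and the attribute names are hypothetical, and plain string equality stands in for the ontology-based type matching a real system would perform.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Service:
    name: str
    input_types: List[str]
    output_type: str
    # Non-functional attributes: properties that are not inputs or outputs.
    attributes: Dict[str, str] = field(default_factory=dict)


def candidates_for_input(
    registry: List[Service],
    input_type: str,
    constraints: Optional[Dict[str, str]] = None,
) -> List[Service]:
    """Return registered services whose output can feed `input_type`,
    keeping only those whose non-functional attributes match `constraints`."""
    constraints = constraints or {}
    matches = []
    for svc in registry:
        if svc.output_type != input_type:
            continue  # output type does not fit the selected input
        if all(svc.attributes.get(k) == v for k, v in constraints.items()):
            matches.append(svc)
    return matches


# Hypothetical registry: two sensors and the FIR filter chosen by the user.
registry = [
    Service("roof-sensor", [], "SensorData", {"location": "roof", "type": "acoustic"}),
    Service("lab-sensor", [], "SensorData", {"location": "lab", "type": "acoustic"}),
    Service("fir-filter", ["SensorData"], "FilteredData"),
]

fir = registry[2]
any_sensor = candidates_for_input(registry, fir.input_types[0])
lab_only = candidates_for_input(registry, fir.input_types[0], {"location": "lab"})
```

With no constraints the system is free to choose any sensor that produces the required type; adding a constraint such as a location narrows the choice using attributes that never appear in the service's inputs or outputs.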
We are currently extending our tools to work directly with Grid computing, focusing particularly on the Open Grid Services Architecture (OGSA). OGSA is an extended version of WSDL, which defines specific "service description elements" and "ports" that make the services available on the grid. We are developing a version of DAML-S that can reason directly about the properties of OGSA services and that can extend the OGSA capabilities into new choreographies managed by the composer. This would allow the service composer to generate a plan for achieving a goal and to produce a workflow. This workflow would then be handed to an analysis system that examines it for cost, quality of service, efficiency, resource use, etc.
We are also starting a more research-based activity in service composition management and execution monitoring. When the workflow developed above is run against a set of scientific data, many problems can arise. These can be computational problems (a needed resource is unavailable, perhaps because of a crashed server or an attempt to access a device without proper authority) or problems relating to the scientific process itself (lack of provenance on the data, data range issues, unexpected dataset noise, etc.). These latter problems also include issues of later reuse of the data -- for an extreme example, consider those papers based on the Lucent Laboratory materials reports that are now considered to be fraudulent. Many results based on that data will now need to be reexplored. The service composition and monitoring system could track the derivation and sourcing of data to help with these problems, as well as with more mundane scientific matters such as date of publication, authorship, etc.
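To make the idea of tracking derivation concrete, the following minimal sketch records which datasets each result was computed from and then finds everything downstream of a source that is later called into question. The `ProvenanceLog` class and the dataset names are hypothetical, chosen only for illustration; a monitoring system of the kind described would also need to record authorship, dates, and the services used.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


class ProvenanceLog:
    """Records which outputs were derived from which inputs."""

    def __init__(self) -> None:
        # Maps a source dataset to the outputs derived directly from it.
        self._derived: Dict[str, Set[str]] = defaultdict(set)

    def record(self, output: str, inputs: Iterable[str]) -> None:
        """Note that `output` was computed from each dataset in `inputs`."""
        for src in inputs:
            self._derived[src].add(output)

    def affected_by(self, source: str) -> Set[str]:
        """Return every result derived, directly or indirectly, from `source`."""
        seen: Set[str] = set()
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for child in self._derived.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


log = ProvenanceLog()
log.record("filtered-run-1", ["raw-sensor-data"])
log.record("figure-3", ["filtered-run-1", "calibration-table"])
log.record("unrelated-plot", ["other-data"])

to_recheck = log.affected_by("raw-sensor-data")
```

Here `to_recheck` contains the filtered run and the figure built from it, but not the unrelated plot, which is the kind of answer needed when source data turns out to be unreliable.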