I am interested in experimental parallel and distributed systems, and aspects of my work touch on the more general areas of systems, networking, and databases. I approach my work from different perspectives, namely runtime systems, programming models, and applications. The following sections give overview descriptions of my dissertation and the other projects I have worked on during my graduate study at the University of Maryland.
Summary of DissertationMy dissertation work involves data-intensive applications that must execute in a wide-area Grid environment. The term Grid refers to the range of current meta-computing work by many researchers to collectively and transparently make use of distributed resources by an application. My thesis is that an appropriate runtime system can leverage knowledge about an application to enable optimizations that transparently improve the execution of data-intensive applications. I have proposed, implemented, and evaluated a component-based programming model and runtime system that use this approach to improving the efficiency and execution time of several data-intensive applications.
ApplicationsThe data-intensive applications I have targeted, are generally recognized as important by the research community (including the Grid Forum), and exhibit some similar characteristics. First, they need to read and process very large datasets. Second, these datasets may and probably will reside at multiple remote locations. Third, the applications repeatedly process discrete pieces of work (e.g., queries), each in a somewhat regular data-flow style. Finally, the applications oftentimes exhibit predictable resource usage (memory, cpu, etc) while processing work. One such application is the Virtual Microscope system. A client program is used to interactively explore a large collection of high resolution digital microscopy slides, and operates by repeatedly issuing queries for image data. Processing of the data can include typical image processing functions such as format decoding, clipping, zooming, etc., and more specialized higher level operations such as 3D reconstruction or abnormal cell detection.
Component-based programming model semanticsWhen an application is designed for execution at a single host, the programming model is fairly simple and well understood. In contrast, distributed systems require an application to be broken into sub-units, and executed in a collective manner to perform its computation. Component and object based models have emerged as a promising choice to deal with the difficulties of designing applications for wide-area environments. They are natural for the type of computation typically performed, and lend to software reuse by combining components together using well-defined interfaces.
I have proposed the use of a component-based model, which leverages the targeted application characteristics to provide useful information to the runtime system. The model is called filter-stream programming, where components are referred to as filters, and communication between filters is over named uni-directional streams. The advantage of this model is derived from the well-defined semantics of filter execution, which are used to perform optimizations. Filters may be executed on any host, repeatedly process application-defined units of work, and may be restarted in another location at any unit-of-work boundary. Since communication is over named streams, as opposed to hostnames, the filter is largely location independent. An application makes use of filters by specifying a group of filters and their stream connectivity, which can then be used by sending work descriptions to them. A running filter may also be time-shared between concurrent filter groups. These model semantics are used in concert with the targeted application characteristics to enable various runtime optimizations:
Placement [HCW'2000]A filter group is annotated with information about the location of required disk-resident data and the primary data-flow pattern. I use this to optimize the choice of locations where components execute to avoid performance problems. For example, if an input dataset is located across a slow network, a component that reads a large amount of data and produces a much smaller output should be placed near its source instead of at opposite ends of the slow network.
Concurrent Instance SharingAn optimization is possible if one considers cpu and memory usage of filters. We assume that multiple filter groups are to execute concurrently, and they contain some subset of identical filters. If the placement decision settles on identical filters on a particular host, the idea is to share or multiplex one filter instance among several filter groups. The basic tradeoff is between the savings in resource usage because only a single filter is running vs. the increased latency caused by the time-sharing of a single filter. This technique can reduce the chance of swapping by sharing instances.
Parallel Copies [CCGRID'2001]In most component or object systems for a wide-area environment, when an application needs access, a single entity is created. In my thesis, I have also considered creating more than one instance when there are extra resources at a host. This depends on the filters being able to handle out of order receipt and delivery of data, which many of our targeted applications do. I have shown that for various mixes of computation and communication ratios, more than one instance will typically perform better, and there is no good static choice for all cases.
Other Research ProjectsEarlier in my research, I was involved in the parallelization and characterization of data-intensive applications. The goal of this project was to determine how to achieve good I/O performance for real-life data-intensive tasks on parallel machines. We implemented a user-level library that operated in client-server or peer-to-peer modes to perform parallel I/O, and used it with modified versions of various I/O-intensive applications. The results indicated that various restructured applications could get good performance from simple I/O APIs, by making larger requests, and being aware of data layout. This was joint work with Anurag Acharya, Mustafa Uysal, and others from our research group.
The Virtual Microscope [AMIA'98]I am part of the effort to build a software system that emulates a physical microscope. We collaborated with clinical pathologists at Johns Hopkins Hospital, and developed a client-server system that delivers high resolution microscopy images of patient pathology slides to a specialized Java client. The server structure involved dealing with very large image datasets containing 5-10GB slides. The slides are tiled to allow in-core query processing, and are not required to have uniform resolution over the entire slide. Applying performance improving techniques from our parallel I/O work, a parallel server was implemented that declusters data across nodes of an IBM SP. This has become an important data-intensive driving application for other projects in our group as well as my own research, and has recently been funded through the HUBS collaboration of pathology institutions. We have derived abstract client workload models based on the usage patterns of pathologists using the system. This was joint work with others in my research group: Chialin Chang, Renato Ferreira, Tahsin Kurc, and Alan Sussman.
Application Proxies [ICS'99]I worked on improving the scalability and performance of client-server data-intensive applications over wide-area connections. Motivating applications include the Virtual Microscope, and other applications our group have been working with from the same general class. First, I experimented with a transparent caching proxy server for the Virtual Microscope. This approach, as in web proxies, has the benefit that the server and client can remain unchanged. The resulting benefits depended greatly on the degree of temporal commonality between concurrent clients. I found that the although unchanged clients and servers are an understandable goal, this also places a hard upper limit on what could be done to improve performance. This realization led to my current thesis work, where the entire application from the client to the server is decomposed to enable various optimizations not possible with proxies alone.