You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format. However, this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the Department of Computer Science of the University of Maryland at College Park under terms that include this permission. All other rights are reserved by the author(s).
DataCutter and A Client Interface for the Storage Resource Broker with. Tahsin Kurc. Michael Beynon. Alan Sussman. Joel Saltz. May 2000.
The continuing increase in the capabilities of high performance computers and continued decreases in the cost of secondary and tertiary storage systems is making it increasingly feasible to generate and archive very large (e.g. petabyte and larger) datasets. Applications are also increasingly likely to make use of archived data obtained by different types of sensors. Such sensors include imaging devices deployed on satellites and aircraft, microscopy related imagery and radiology related imagery. Simulation or sensor datasets generated or acquired by one group may need to be accessed over a wide-area network by other groups. Datasets frequently describe data associated with collections of very large structured or unstructured grids where each grid point is associated with several variables. Applications frequently need only to obtain portions of a dataset. Required data may correspond to a particular region in a multidimensional space. The application may need to access all data associated in a multidimensional region or it may need only certain variable values at a subsampled set of spatial locations. In addition, in some cases, applications may require data products obtained by aggregating data in one way or another. For instance, a user might require time or space averaged data. This document describes the design of a middleware infrastructure, called DataCutter, that enables subsetting and user-defined filtering of multi-dimensional datasets stored in archival storage systems across a wide-area network. We also describe a client API for Storage Resource Broker (SRB) clients, which allows SRB clients to carry out subsetting and filtering of datasets stored through the SRB. This API uses a prototype implementation of the DataCutter indexing and filtering services. (Also cross-referenced as UMIACS-TR-2000-26) University of Maryland Institute for Advamced Computer Studies, Department of Computer Science, University of Maryland,
Design of a Framework for Data-Intensive Wide-Area Applications. Michael D. Beynon. Tahsin Kurc. Alan Sussman. Joel Saltz. February 2000.
Applications that use collections of very large, distributed datasets have become an increasingly important part of science and engineering. With high performance wide-area networks becoming more pervasive, there is interest in making collective use of distributed computational and data resources. Recent work has converged to the notion of the Grid, which attempts to uniformly present a heterogeneous collection of distributed resources. Current Grid research covers many areas from low level infrastructure issues to high level application concerns. However, providing support for efficient exploration and processing of very large scientific datasets stored in distributed archival storage systems remains a challenging research issue. We have initiated an effort that focuses on developing efficient data-intensive applications in a Grid environment. In this paper, we present a framework, called filter-stream programming, that represents the processing units of a data-intensive application as a set of filters, which are designed to be efficient in their use of memory and scratch space. We describe a prototype infrastructure that supports execution of applications using the proposed framework. We present the implementation of two applications using the filter-stream programming framework, and discuss experimental results demonstrating the effects of heterogeneous resources on application performance. (Also cross-referenced as UMIACS-TR-2000-04) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Optimizing Retrieval and Processing of Multi-dimensional Scientific. Chialin Chang. Tahsin Kurc. Alan Sussman. Joel Saltz. February 2000.
Exploring and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We have been developing the Active Data Repository (ADR), an infrastructure that integrates storage, retrieval, and processing of large multi-dimensional scientific datasets on distributed memory parallel machines with multiple disks attached to each node. In earlier work, we proposed three strategies for processing range queries within the ADR framework. Our experimental results show that the relative performance of the strategies changes under varying application characteristics and machine configurations. In this work we investigate approaches to guide and automate the selection of the best strategy for a given application and machine configuration. We describe analytical models to predict the relative performance of the strategies when input data elements are uniformly distributed in the attribute space of the output dataset, restricting the output dataset to be a regular $d$-dimensional array. We present an experimental evaluation of these models for various synthetic datasets and for several driving applications on a 128-node IBM SP. (Also cross-referenced as UMIACS-TR-2000-03) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Querying Very Large Multi-dimensional Datasets in ADR - Extended. Tahsin Kurc. Chialin Chang. Renato Ferreira. Alan Sussman. Joel Saltz. May 1999.
This paper addresses optimizing the execution of range queries into multi-dimensional datasets on distributed memory parallel machines within the Active Data Repository framework. ADR is an infrastructure that integrates storage, retrieval and processing of large multi-dimensional datasets on distributed memory parallel architectures with multiple disks attached to each node. We describe three potential strategies for efficient execution of such queries that employ different tiling and workload partitioning approaches. We evaluate scalability of these strategies for different application scenarios, varying both the number of processors and the input dataset size on a 128 processor IBM SP multicomputer. Also cross-referenced as UMIACS-TR-99-29 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Query Planning for Range Queries with User-defined Aggregation on. Chialin Chang. Tahsin Kurc. Alan Sussman. Joel Saltz. February 1999.
Applications that make use of very large scientific datasets have become an increasingly important subset of scientific applications. In these applications, the datasets are often multi-dimensional, i.e., data items are associated with points in a multi-dimensional attribute space. The processing is usually highly stylized, with the basic processing steps consisting of (1) retrieval of a subset of all available data in the input dataset via a range query, (2) projection of each input data item to one or more output data items, and (3) some form of aggregation of all the input data items that project to the each output data item. We have developed an infrastructure, called the Active Data Repository (ADR), that integrates storage, retrieval and processing of multi-dimensional datasets on shared-nothing architectures. In this paper we address query planning and execution strategies for range queries with user-defined processing. We evaluate three potential query planning strategies within the ADR framework under several application scenarios, and present experimental results on the performance of the strategies on a multiprocessor IBM SP2. (Also cross-refereced as UMIACS-TR-99-15) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Performance Impact of Proxies in Data Intensive Client-Server Parallel. Michael D. Beynon. Alan Sussman. Joel Saltz. November 1998.
Large client-server data intensive applications can place high demands on system and network resources. This is especially true when the connection between the client and server spans a wide-area internet link. In this paper, we consider changing the typical client-server architecture of a class of data intensive applications. We show that given sufficient common interest among multiple clients, our enhancements reduce the response time per-client and reduce the amount of data sent across the wide-area link. In addition, we also see a reduction in server utilization which helps to improve server scalability as more clients are added to the system. (Also cross-referenced as UMIACS-TR-98-70) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Performance Prediction Framework for Data Intensive Applications on. Mustafa Uysal. Tahsin M. Kurc. Alan Sussman. Joel Saltz. July 1998.
This paper presents a simulation-based performance prediction framework for large scale data-intensive applications on large scale machines. Our framework consists of two components: application emulators and a suite of simulators. Application emulators provide a parameterized model of data access and computation patterns of the applications and enable changing of critical application components (input data partitioning, data declustering, processing structure, etc.) easily and flexibly. Our suite of simulators model the I/O and communication subsystems with good accuracy and execute quickly on a high-performance workstation to allow performance prediction of large scale parallel machine configurations. The key to efficient simulation of very large scale configurations is a technique called loosely-coupled simulation where the processing structure of the application is embedded in the simulator, while preserving data dependencies and data distributions. We evaluate our performance prediction tool using a set of three data-intensive applications. (Also cross-referenced as UMIACS TR # 98-39) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Infrastructure for Building Parallel Database Systems for. Chialin Chang. Alan Sussman. Joel Saltz. April 1998.
As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset as well as the result being computed have underlying multi-dimensional grids. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. In this paper, we present the design of T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for common operations including index generation, data retrieval, memory management, scheduling of processing across a parallel machine and user interaction. It achieves its primary advantage from the ability to seamlessly integrate data retrieval and processing for a wide variety of applications and from the ability to maintain and jointly process multiple datasets with different underlying grids. We also present some preliminary performance results comparing the implementation of a remote-sensing image database using the T2 services with a custom-built integrated implementation. (Also cross-referenced as UMIACS-TR-98-24) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
T2: A Customizable Parallel Database For Multi-dimensional Data. Chialin Chang. Anurag Acharya. Alan Sussman. Joel Saltz. January 1998.
As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Several database research groups and vendors have developed object-relational database systems to provide some support for managing and/or visualizing multi-dimensional datasets. These systems, however, provide little or no support for analyzing or processing these datasets -- the assumption is that this is too application-specific to warrant common support. As a result, applications that process these datasets are analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Several database research groups and vendors have developed object-relational database systems to provide some support for managing and/or visualizing multi-dimensional datasets. These systems, however, provide little or no support for analyzing or processing these datasets -- the assumption is that this is too application-specific to warrant common support. As a result, applications that process these datasets are usually decoupled from data storage and management, resulting in inefficiency due to copying and loss of locality. Furthermore, every application developer has to implement complex support for managing and scheduling the processing. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset as well as the result being computed have underlying multi-dimensional grids. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. In this paper, we present the design of T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for common operations including index generation, data retrieval, memory management, scheduling of processing across a parallel machine and user interaction. It achieves its primary advantage from the ability to seamlessly integrate data retrieval and processing for a wide variety of applications and from the ability to maintain and jointly process multiple datasets with different underlying grids. (Also cross-referenced as UMIACS-TR-98-04) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Virtual Microscope. Renato Ferreira. Bongki Moon. Jim Humphries. Alan Sussman. Joel Saltz. Robert Miller. Angelo Demarzo. April 1997.
We present the design of the Virtual Microscope, a software system employing a client/server architecture to provide a realistic emulation of a high power light microscope. We discuss several technical challenges related to providing the performance necessary to achieve rapid response time, mainly in dealing with the enormous amounts of data (tens to hundreds of gigabytes per slide) that must be retrieved from secondary storage and processed. To effectively implement the data server, the system design relies on the computational power and high I/O throughput available from an appropriately configured parallel computer. (Also cross-referenced as UMIACS-TR-97-35) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Titan A High-Performance Remote-Sensing Database. Chialin Chang. Bongki Moon. Anurag Acharya. Carter Shock. Alan Sussman. Joel Saltz. August 1996.
There are two major challenges for a high-performance remote-sensing database. First, it must provide low-latency retrieval of very large volumes of spatio-temporal data. This requires effective declustering and placement of a multi-dimensional dataset onto a large disk farm. Second, the order of magnitude reduction in data-size due to post-processing makes it imperative, from a performance perspective, that the postprocessing be done on the machine that holds the data. This requires careful coordination of computation and data retrieval. This paper describes the design, implementation and evaluation of {\em Titan}, a parallel shared-nothing database designed for handling remote-sensing data. The computational platform for Titan is a 16-processor IBM SP-2 with four fast disks attached to each processor. Titan is currently operational and contains about 24~GB of data from the Advanced Very High Resolution Radiometer (AVHRR) on the NOAA-7 satellite. The experimental results show that Titan provides good performance for global queries, and interactive response times for local queries. (Also cross-referenced as UMIACS-TR-96-67) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Interoperability of Data Parallel Runtime Libraries with Meta-Chaos. Guy Edjlali. Alan Sussman. Joel Saltz. May 1996.
This paper describes a framework for providing the ability to use multiple specialized data parallel libraries and/or languages within a single application. The ability to use multiple libraries is required in many application areas, such as multidisciplinary complex physical simulations and remote sensing image database applications. An application can consist of one program or multiple programs that use different libraries to parallelize operations on distributed data structures. The framework is embodied in a runtime library called Meta-Chaos that has been used to exchange data between data parallel programs written using High Performance Fortran, the Chaos and Multiblock Parti libraries developed at Maryland for handling various types of unstructured problems, and the runtime library for pC++, a data parallel version of C++ from Indiana University. Experimental results show that Meta-Chaos is able to move data between libraries efficiently, and that Meta-Chaos provides effective support for complex applications. (Also cross-referenced as UMIACS-TR-96-30) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Runtime Coupling of Data-parallel Programs. M. Ranganathan. Anurag Acharya. Guy Edjlali. Alan Sussman. Joel Saltz. November 1995.
We consider the problem of efficiently coupling multiple data-parallel programs at runtime. We propose an approach that establishes a mapping between data structures in different data-parallel programs and implements a user specified consistency model. Mappings are established at runtime and new mappings between programs can be added and deleted while the programs are in execution. Mappings, or the identity of the processors involved, do not have to be known at compile-time or even link-time. Programs can be made to interact with different granularities of interaction without requiring any re-coding. A priori knowledge of data movement requirements allows for buffering of data and overlap of computations between coupled applications. Efficient data movement is achieved by pre-computing an optimized schedule. We describe our prototype implementation and evaluate its performance for a set of synthetic benchmarks that examine the variation of performance with coupling parameters. We demonstrate that the cost of the added flexibility gained by our coupling method is not prohibitive when compared with a monolithic code that does the same computation. (Also cross-referenced as UMIACS-TR-95-116) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Compiler and Runtime Support for Programming in Adaptive. Guy Edjlali. Gagan Agrawal. Alan Sussman. Jim Humphries. Joel Saltz. July 1995.
For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at runtime. In this paper, we discuss runtime support for data parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a runtime library to provide this support. We discuss how the runtime library can be used by compilers of HPF-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of non-dedicated workstations, which are likely to be an important resource for parallel programming in the future. (Also cross-referenced as UMIACS-TR-95-83) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Framework for Optimizing Parallel I/O. Robert Bennett. Kelvin S. Bryant. Alan Sussman. Raja Das. Joel Saltz. January 1996.
There has been a great deal of recent interest in parallel I/O. This paper discusses issues in the design and implementation of a portable I/O library designed to optimize the performance of multiprocessor architectures that include multiple disks or disk arrays. The major emphasis of the paper is on optimizations that are made possible by the use of collective I/O, so that I/O requests for multiple processors can be combined to improve performance. Performance measurements from benchmarking our implementation of an I/O library that currently performs collective local optimizations, called Jovian, on three application templates are also presented. Dept. of Computer Science, Univ. of Maryland,
Chialin Chang. Alan Sussman. Joel Saltz. Support for Distributed Dynamic Data Structures in C++. February 1995.
Traditionally, applications executed on distributed memory architectures in single-program multiple-data (SPMD) mode use distributed (multi-dimensional) data arrays. Good performance has been achieved by applying runtime techniques to such applications executing in a loosely synchronous manner. However, many applications utilize language constructs such as pointers to synthesize dynamic complex data structures, such as linked lists, trees and graphs, with elements consisting of complex composite data types. Existing runtime systems that rely on global indices cannot be used for these applications, as no global names or indices are imposed upon the elements of these data structures. A portable object-oriented runtime library is presented to support applications that use dynamic distributed data structures, including both arrays and pointer-based data structures. In particular, CHAOS++ deals with complex data types and pointer-based data structures by providing {\em mobile objects} and {\em globally addressable objects}. Preprocessing techniques are used to analyze communication patterns, and data exchange primitives are provided to carry out efficient data transfer. Performance results for applications taken from three distinct classes are also included to demonstrate the wide applicability of the runtime library. (Also cross-referenced as UMIACS-TR-95-19) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Guy Edjlali. Gagan Agrawal. Alan Sussman. Joel Saltz. Data Parallel Programming in an Adaptive Environment. September 1994.
For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at runtime. In this paper, we discuss runtime support for data parallel programming in such an adaptive environment. Executing data parallel programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a runtime library to provide this support. We discuss how the runtime library can be used by compilers to generate code for an adaptive environment. We also present performance results for a multiblock Navier-Stokes solver run on a network of workstations using PVM for message passing. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computations. (Also cross-referenced as UMIACS-TR-94-109) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Last Generated Fri Aug 11 04:01:01 EDT 2000