Maryland Applications for Measurement and Benchmarking of I/O on Parallel Computers


Motivation

Until recently, most applications developed for parallel machines avoided I/O as much as possible (distributed databases have been a notable exception). Typical parallel applications (usually scientific programs) would perform I/O only at the beginning and the end of execution, with the possible exception of infrequent checkpoints. This has been changing: I/O-intensive parallel programs have emerged as one of the leading consumers of cycles on parallel machines. This change has been driven by two trends. First, parallel scientific applications are being used to process larger datasets that do not fit in memory. Second, a large number of parallel machines are being used for non-scientific applications such as databases, data mining, and web servers for busy web sites (e.g., AltaVista and NCSA). Characterizing these I/O-intensive applications is an important problem, because their behavior strongly affects the design of I/O subsystems, operating systems and filesystems.

To this end, we have traced seven parallel I/O-intensive applications. These applications were run on eight nodes of an IBM SP-2. We used the AIX trace utility to trace I/O-related system calls (open, close, read, write and seek). We also captured all message-passing activity and context switches. This allowed us to accurately compute the inter-arrival times of I/O requests and to better understand application behavior. Some characteristics of these traces are described in the following University of Maryland technical report:

Mustafa Uysal, Anurag Acharya, Joel Saltz. Requirements of I/O Systems for Parallel Machines: An Application-driven Study. Technical Report, CS-TR-3802, University of Maryland, College Park, May 1997.

We are making these traces available for the use of other researchers. The traces are in ASCII. We provide a description of the trace format; utility programs to convert to/from a binary format; and library routines to access the trace records in binary format. For each of the applications, we provide a brief description of the application itself, the input dataset and the workload.
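
As an illustration of the inter-arrival computation mentioned above, the following is a minimal Python sketch. It does not read the actual trace format, which is described in the utility files; the record layout (node, timestamp in seconds, system call name) and the function name are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# The I/O-related system calls that were traced, as listed on this page.
IO_CALLS = {"open", "close", "read", "write", "seek"}


def inter_arrival_times(records):
    """Return, per node, the gaps between consecutive I/O requests.

    `records` is an iterable of (node, timestamp_seconds, call) tuples.
    This layout is an assumption made for illustration; it is not the
    layout of the distributed trace files.
    """
    stamps_by_node = defaultdict(list)
    for node, timestamp, call in records:
        # Ignore non-I/O events such as message passing or context switches.
        if call in IO_CALLS:
            stamps_by_node[node].append(timestamp)

    gaps = {}
    for node, stamps in stamps_by_node.items():
        stamps.sort()  # records may be interleaved across nodes
        gaps[node] = [later - earlier
                      for earlier, later in zip(stamps, stamps[1:])]
    return gaps
```

Grouping by node before taking differences keeps requests issued by different processors from being treated as consecutive; whether a per-node or a global view is more appropriate depends on the question being asked of the traces.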

Non-scientific applications

Scientific applications

Utilities

These files describe the trace formats and provide small utilities for working with the trace files, such as programs to convert to/from the binary format and a library of routines to manipulate trace records.

People

----------------------------------------------------------------------
Last updated on Tue May 27 12:37:44 EDT 1997 by Mustafa Uysal (uysal@cs.umd.edu).