High Performance Software Laboratory CHAOS: Tools


  • PARTI Primitives for Unstructured and Block Structured Problems

    Published in Computing Systems in Engineering in 1992.

    Alan Sussman, Joel Saltz, Raja Das, S. Gupta, Dimitri Mavriplis, and Ravi Ponnusamy

    This paper describes a set of primitives (PARTI) developed to efficiently execute unstructured and block structured problems on distributed memory parallel machines. We present experimental data from a 3-D unstructured Euler solver run on the Intel Touchstone Delta to demonstrate the usefulness of our methods.
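
    Below is a minimal, self-contained sketch of the inspector/executor pattern that primitives of this kind support. The function and variable names are illustrative only (they are not the PARTI interface), and the off-processor "gather" is simulated by reading a plain array where a real run would use message passing.

        /* Illustrative sketch of the inspector/executor pattern behind
         * PARTI-style primitives; names, layout, and the simulated gather
         * are hypothetical. */
        #include <stdio.h>

        #define N_GLOBAL 16   /* global array size                */
        #define N_LOCAL   8   /* elements owned by each processor */
        #define MY_PROC   0   /* pretend we are processor 0       */

        /* block distribution: owner of global index g */
        static int owner(int g) { return g / N_LOCAL; }

        typedef struct {         /* "communication schedule" built by the inspector */
            int n_off;           /* number of off-processor references   */
            int global_idx[16];  /* global elements that must be fetched */
        } schedule;

        /* Inspector: scan the indirection array once and record every
         * off-processor reference, assigning it a ghost-buffer slot. */
        static schedule inspect(const int *ia, int n) {
            schedule s = { 0 };
            for (int i = 0; i < n; i++)
                if (owner(ia[i]) != MY_PROC)
                    s.global_idx[s.n_off++] = ia[i];
            return s;
        }

        int main(void) {
            double x[N_GLOBAL];
            for (int g = 0; g < N_GLOBAL; g++) x[g] = (double)g;

            int ia[4] = { 1, 3, 9, 12 };       /* indirection array */
            schedule s = inspect(ia, 4);

            /* Executor, phase 1: gather off-processor data into a ghost
             * buffer (simulated; a real gather sends/receives messages). */
            double ghost[16];
            for (int k = 0; k < s.n_off; k++)
                ghost[k] = x[s.global_idx[k]];

            /* Executor, phase 2: the computation reads owned data directly
             * and off-processor data from the ghost buffer. */
            double sum = 0.0;
            for (int i = 0, k = 0; i < 4; i++)
                sum += (owner(ia[i]) == MY_PROC) ? x[ia[i]] : ghost[k++];

            printf("sum of referenced elements = %g\n", sum);
            return 0;
        }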

  • Applying the CHAOS/PARTI Library to Irregular Problems in Computational Chemistry and Computational Aerodynamics

    Published in Scalable Parallel Libraries Conference, Mississippi State University in 1993.

    Raja Das, Yuan-Shin Hwang, Mustafa Uysal, Joel Saltz, Alan Sussman

    This paper describes a number of optimizations that can be used to support the efficient execution of irregular problems on distributed memory parallel machines. We describe software primitives that (1) coordinate interprocessor data movement, (2) manage the storage of, and access to, copies of off-processor data, (3) minimize interprocessor communication requirements and (4) support a shared name space. The performance of the primitives is characterized by examination of kernels from real applications and from a full implementation of a large unstructured adaptive application (the molecular dynamics code CHARMM).
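
    One way to picture the shared name space mentioned in point (4) is a translation table that maps each global index to the processor owning it and its local offset there. The sketch below is hypothetical (CHAOS itself uses distributed translation tables; a flat table is shown only to keep the example self-contained).

        /* Sketch of a translation table supporting a shared (global) name
         * space over distributed data; names and the flat table layout are
         * illustrative only. */
        #include <stdio.h>

        #define N_GLOBAL 12
        #define N_PROCS   3

        typedef struct { int proc; int local; } location;

        /* Fill the table for a block distribution; an irregular distribution
         * would store an arbitrary (proc, local) pair for each element. */
        static void build_table(location table[N_GLOBAL]) {
            int block = N_GLOBAL / N_PROCS;
            for (int g = 0; g < N_GLOBAL; g++) {
                table[g].proc  = g / block;
                table[g].local = g % block;
            }
        }

        int main(void) {
            location table[N_GLOBAL];
            build_table(table);

            /* Translate the global references of an indirection array into
             * (processor, local offset) pairs, as a dereference step would. */
            int ia[4] = { 2, 5, 7, 11 };
            for (int i = 0; i < 4; i++)
                printf("global %2d -> proc %d, local %d\n",
                       ia[i], table[ia[i]].proc, table[ia[i]].local);
            return 0;
        }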

  • Runtime Support and Compilation Methods for User-Specified Data Distributions

    Submitted to IEEE Transactions on Parallel and Distributed Systems in Nov 1993.

    Ravi Ponnusamy, Joel Saltz, Alok Choudhary, Yuan-Shin Hwang, Geoffrey Fox

    This paper describes two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user-specified mapping procedure via a set of compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements and computational load. The second mechanism is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g. communication schedules, loop iteration partitions, information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.
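
    The schedule-reuse idea can be pictured as the simple guard below: the inspector is re-run only when the compiler cannot prove that the indirection array is unchanged since the saved schedule was built. The names and bookkeeping are hypothetical, not the Fortran 90D compiler's actual mechanism.

        /* Sketch of conservative inspector-result (schedule) reuse. */
        #include <stdio.h>
        #include <stdlib.h>

        typedef struct { int n_off; /* gather lists, buffer map, ... */ } schedule;

        static schedule *saved      = NULL; /* schedule from a previous inspector */
        static int       ia_changed = 1;    /* set conservatively by the compiler */
                                            /* whenever the indirection array may */
                                            /* have been modified                 */

        static schedule *run_inspector(const int *ia, int n) {
            schedule *s = malloc(sizeof *s);
            s->n_off = 0;                   /* real code builds gather/scatter lists */
            (void)ia; (void)n;
            printf("inspector executed\n");
            return s;
        }

        /* Rebuild the schedule only if the indirection array may have changed. */
        static schedule *get_schedule(const int *ia, int n) {
            if (saved == NULL || ia_changed) {
                free(saved);
                saved = run_inspector(ia, n);
                ia_changed = 0;
            } else {
                printf("reusing saved schedule\n");
            }
            return saved;
        }

        int main(void) {
            int ia[3] = { 4, 1, 7 };
            for (int t = 0; t < 3; t++)     /* time-step loop: inspector runs once */
                get_schedule(ia, 3);
            free(saved);
            return 0;
        }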

  • Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures

    Submitted to Journal of Parallel and Distributed Computing in Oct 1993.

    Raja Das, Mustafa Uysal, Joel Saltz, and Yuan-Shin Hwang

    This paper describes a number of optimizations that can be used to support the efficient execution of irregular problems on distributed memory parallel machines. We describe software primitives that (1) coordinate interprocessor data movement, (2) manage the storage of, and access to, copies of off-processor data, (3) minimize interprocessor communication requirements and (4) support a shared name space. We present a detailed performance and scalability analysis of the communication primitives. This performance and scalability analysis is carried out using a workload generator, kernels from real applications and a large unstructured adaptive application (the molecular dynamics code CHARMM).
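
    One concrete example of minimizing communication requirements, point (3) above, is fetching each distinct off-processor element only once no matter how many references are made to it. The duplicate-removal sketch below is only illustrative and does not reproduce the CHAOS interface.

        /* Sketch of duplicate removal while building a gather list: every
         * distinct off-processor index is communicated once, and each
         * reference is pointed at the gather-buffer slot that holds it. */
        #include <stdio.h>

        #define MAX_REFS 32

        static int remove_duplicates(const int *refs, int n,
                                     int *unique, int *slot_of_ref) {
            int n_unique = 0;
            for (int i = 0; i < n; i++) {
                int j;
                for (j = 0; j < n_unique; j++)     /* linear search; a real     */
                    if (unique[j] == refs[i])      /* implementation would hash */
                        break;
                if (j == n_unique)
                    unique[n_unique++] = refs[i];
                slot_of_ref[i] = j;
            }
            return n_unique;
        }

        int main(void) {
            /* off-processor references made by a loop nest */
            int refs[8] = { 9, 12, 9, 30, 12, 9, 30, 12 };
            int unique[MAX_REFS], slot[MAX_REFS];

            int n_unique = remove_duplicates(refs, 8, unique, slot);
            printf("8 references, %d elements actually communicated\n", n_unique);
            for (int i = 0; i < 8; i++)
                printf("reference to %2d served from gather slot %d\n",
                       refs[i], slot[i]);
            return 0;
        }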

  • Supporting Irregular Distributions in FORTRAN 90D/HPF Compilers

    Submitted to IEEE Parallel and Distributed Technology.

    Ravi Ponnusamy, Yuan-Shin Hwang, Joel Saltz, Alok Choudhary, and Geoffrey Fox

    We present methods that make it possible to efficiently support an important subclass of irregular problems using data parallel languages. The approach we describe involves the use of a portable, compiler-independent, runtime support library called CHAOS. The CHAOS runtime support library contains procedures that support static and dynamic distributed array partitioning, partition loop iterations and indirection arrays, remap arrays from one distribution to another, and carry out index translation, buffer allocation and communication schedule generation.

    The CHAOS runtime procedures are used by a prototype Fortran 90D compiler as runtime support for irregular problems. We present performance results of compiler-generated and hand-parallelized versions of two stripped-down application codes. The first code is derived from an unstructured mesh computational fluid dynamics flow solver and the second is derived from the molecular dynamics code CHARMM.

    A method is described that makes it possible to emulate irregular distributions in HPF by reordering elements of data arrays and renumbering indirection arrays. We present results that suggest that an HPF compiler could use reordering and renumbering extrinsic functions to obtain performance comparable to that achieved by a compiler for a language (such as Fortran 90D) that directly supports irregular distributions.
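
    The reordering/renumbering idea can be illustrated with the small sketch below: a partitioner-chosen permutation moves each data element to the position a plain BLOCK distribution would need, and the indirection array is renumbered to follow it. The names are illustrative and do not correspond to the actual extrinsic functions.

        /* Sketch of emulating an irregular distribution by reordering data
         * and renumbering indirection arrays.  new_index[g] is the position
         * of old element g after reordering. */
        #include <stdio.h>

        #define N 8

        int main(void) {
            double x_old[N]  = { 0, 1, 2, 3, 4, 5, 6, 7 };
            /* permutation chosen so that, after reordering, a BLOCK
             * distribution places each element on its assigned processor */
            int new_index[N] = { 3, 0, 6, 1, 4, 7, 2, 5 };

            /* reorder the data array */
            double x_new[N];
            for (int g = 0; g < N; g++)
                x_new[new_index[g]] = x_old[g];

            /* renumber the indirection array to refer to the new positions */
            int ia[4] = { 2, 5, 0, 7 };
            for (int i = 0; i < 4; i++)
                ia[i] = new_index[ia[i]];

            for (int i = 0; i < 4; i++)
                printf("reference %d now reads x_new[%d] = %g\n",
                       i, ia[i], x_new[ia[i]]);
            return 0;
        }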

  • Runtime Support and Dynamic Load Balancing Strategies for Structured Adaptive Applications

    To appear in: Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing in Feb 1995.

    Bongki Moon, Gopal Patnaik, Robert Bennett, David Fyfe, Alan Sussman, Craig Douglas, Joel Saltz, K. Kailasanath

    One class of scientific and engineering applications involves structured meshes. One example of a code in this class is a flame modeling code developed at the Naval Research Laboratory (NRL). The numerical model used in the NRL flame code is predominantly based on structured finite volume methods. The chemistry process of the reactive flow is modeled by a system of ordinary differential equations which is solved independently at each grid point. Thus, though the model uses a mesh structure, the workload at each grid point can vary considerably. It is this feature that requires the use of both structured and unstructured methods in the same code. We have applied the Multiblock PARTI and CHAOS runtime support libraries to parallelize the NRL flame code with minimal changes to the sequential code. We have also developed parallel algorithms to carry out dynamic load balancing. It has been observed that the overall performance scales reasonably up to 256 Paragon processors and that the total runtime on a 256-node Paragon is about half that of a single-processor Cray C90.
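
    The flavor of the dynamic load balancing step can be conveyed with the greedy sketch below: per-block chemistry work is measured, blocks are sorted by cost, and each is assigned to the currently least-loaded processor before data are remapped. This is only a hypothetical illustration, not the algorithm used in the NRL flame code.

        /* Greedy remapping sketch for dynamic load balancing. */
        #include <stdio.h>
        #include <stdlib.h>

        #define N_BLOCKS 8
        #define N_PROCS  3

        typedef struct { int id; double work; } block;

        static int by_work_desc(const void *a, const void *b) {
            double d = ((const block *)b)->work - ((const block *)a)->work;
            return (d > 0) - (d < 0);
        }

        int main(void) {
            block blocks[N_BLOCKS] = {        /* measured work per block */
                {0, 4.0}, {1, 9.5}, {2, 1.2}, {3, 7.3},
                {4, 3.3}, {5, 6.1}, {6, 2.2}, {7, 5.0}
            };
            double load[N_PROCS] = { 0 };
            int    assign[N_BLOCKS];

            /* place the costliest blocks first */
            qsort(blocks, N_BLOCKS, sizeof blocks[0], by_work_desc);

            for (int b = 0; b < N_BLOCKS; b++) {
                int best = 0;                 /* least-loaded processor */
                for (int p = 1; p < N_PROCS; p++)
                    if (load[p] < load[best]) best = p;
                assign[blocks[b].id] = best;
                load[best] += blocks[b].work;
            }

            for (int p = 0; p < N_PROCS; p++)
                printf("processor %d load = %.1f\n", p, load[p]);
            /* a real code would now migrate the blocks whose owner changed */
            (void)assign;
            return 0;
        }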




    last updated 5/23/95
    Questions about the system or webserver: webmaster@cs.umd.edu
    Problems with publications homepage: wes@cs.umd.edu