
HLAM - Hierarchical High-Level Application Modeling Framework

 

Critical to our project will be a hierarchical high-level application modeling framework (HLAM), which is based on existing Rutgers/UCSB research but will be substantially generalized in this project, as well as being integrated into a powerful simulation system. We believe there are several advantages in using simplified application abstractions rather than the full application. First, it is expensive to simulate full applications - especially large problems on future high performance (PetaFlop) systems. Second, a well-chosen abstraction can lead to better understanding of the key features that determine performance. Finally, abstractions can be generic and so represent a broader class of applications than any one full application. This is not to say detailed simulations of full applications are not of value - rather we argue that our high-level abstractions have intrinsic complementary value.

HLAM is illustrated in Figure 2, which shows that we first hierarchically divide an application (sometimes called in this context a meta-problem) into modules. A sophisticated parallel application may be composed of several coupled parallel programs, and parallel programs can themselves be viewed as collections of parallel modules. These modules may be explicitly defined by a user, or they may be generated by a semi-automatic process, such as an HPF compiler. Modules that represent distinct programs may execute on separate nodes of a networked meta-computer. An individual module may be sequential or data parallel; for example, we might use a data parallel module to represent a multi-threaded task that runs on a multiprocessor node. HLAM will cover a wide range of applications, including data-intensive applications (with I/O from remote sites) and migratory Web programs. Some key features of the proposed HLAM are:

  1. A hierarchical graph representation.
  2. Symbolic specification of problems, so that the system can be used to test arbitrarily large problem instances.
  3. Use of aggregates as building blocks for specifying application modules in a hierarchical/multi-level manner that includes both task and data parallelism (see the sketch after this list).
  4. Choice of each aggregate as the largest unit of data parallelism that can specify the problem at the level needed to model performance with the required precision and grain size.
  5. Modeling of the various types of data interaction (loose synchronization and other dependencies such as pipelining) between program components, with explicit support for the loosely synchronous structure present in essentially all large-scale data parallel applications.
  6. Modeling of dynamic relationships between program components for runtime adaptive prediction/optimization.
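To make features 1-4 concrete, the following is a minimal sketch of how a hierarchical graph of modules and aggregates with symbolic problem sizes might be represented. All class and attribute names here are illustrative assumptions for this proposal, not part of the existing Rutgers/UCSB framework.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Aggregate:
        """A data-parallel building block: the largest unit of data
        parallelism that still resolves performance at the desired grain."""
        name: str
        # Symbolic extents (e.g. {"N": "mesh cells"}) so arbitrarily large
        # problem instances can be modeled without enumerating data.
        extents: Dict[str, str] = field(default_factory=dict)

    @dataclass
    class Module:
        """A node in the hierarchical graph: a sequential or data-parallel
        module, possibly containing sub-modules (task parallelism at outer
        levels, data parallelism via aggregates at the leaves)."""
        name: str
        kind: str = "data-parallel"              # or "sequential", "task-parallel"
        aggregates: List[Aggregate] = field(default_factory=list)
        children: List["Module"] = field(default_factory=list)
        # Dependencies on sibling modules: loose synchronization, pipelining, ...
        depends_on: Dict[str, str] = field(default_factory=dict)

    # Hypothetical meta-problem: two coupled parallel programs, one of
    # which carries a symbolic mesh aggregate of size N.
    flow = Module("flow_solver", aggregates=[Aggregate("mesh", {"N": "cells"})])
    structures = Module("structural_model",
                        aggregates=[Aggregate("grid", {"M": "nodes"})],
                        depends_on={"flow_solver": "pipelined surface loads"})
    meta_problem = Module("coupled_simulation", kind="task-parallel",
                          children=[flow, structures])

A performance tool can walk such a graph top-down, expanding only the modules whose grain size matters for the prediction at hand; the symbolic extents are bound to concrete values only when a specific problem instance is evaluated.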

One initial focus will be to develop a hierarchical behavioral representation for MPI-based SPMD parallel code. Previous research on representing parallelism in sequential code will benefit our project. For example, control and data dependence can be modeled through program dependence graphs [22]. Hierarchical task graphs [33] were developed for modeling functional parallelism and loop parallelism, and a similar graphical structure has been studied in the SISAL project for functional parallelism [20]. For modeling MPI-based parallel programs, we will abstract not only hierarchical control structures, but also important multiprocessing events such as message sends and receives, reduction operations, global communication, and barrier synchronization. Thus the graphical structure of an MPI program will consist of basic computation components, communication and I/O primitives, and multi-level control over these components. A basic computation is a code segment that involves no I/O or interprocessor communication. Basic computation blocks are modeled at as coarse a grain as possible so that the performance impact of the multi-level memory hierarchy can be studied at the block level. Computation primitives from software building blocks such as the BLAS and LAPACK math subroutines can be used to abstract basic computations.
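The sketch below illustrates, under the same assumptions as before, one way such a behavioral graph for an MPI SPMD code could be expressed: basic computation blocks with symbolic costs, communication/I-O primitives, and multi-level control nodes over them. The node types and cost expressions are hypothetical examples, not a fixed interface.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        name: str

    @dataclass
    class Compute(Node):
        """Basic computation: no I/O or communication; cost may come from a
        building-block model such as a BLAS/LAPACK routine."""
        kernel: str = "dgemm"
        symbolic_flops: str = "2*n**3"

    @dataclass
    class Comm(Node):
        """Communication or I/O primitive: send/recv, reduction, global
        exchange, barrier, or remote I/O."""
        op: str = "send"                 # "recv", "allreduce", "barrier", ...
        symbolic_bytes: str = "8*n"

    @dataclass
    class Control(Node):
        """Multi-level control (loop or conditional) over child nodes."""
        trip_count: str = "iters"
        body: List[Node] = field(default_factory=list)

    # One outer iteration of a loosely synchronous solver step.
    step = Control("time_loop", trip_count="n_steps", body=[
        Compute("local_update", kernel="stencil", symbolic_flops="9*n**2"),
        Comm("halo_send", op="send", symbolic_bytes="8*4*n"),
        Comm("halo_recv", op="recv", symbolic_bytes="8*4*n"),
        Comm("residual", op="allreduce", symbolic_bytes="8"),
    ])

Keeping the computation blocks coarse, as in the single stencil node above, is what allows memory-hierarchy effects to be attached at the block level rather than to individual statements.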

