Target Machine Specification

There is a close relationship between performance modeling and the description of applications needed to allow either the user or the compiler to map an application properly onto parallel systems. In general, reliable performance estimates need the same level of machine description as is needed to specify parallel programs in a way that allows good performance on the target machine. This machine description can be explicit (as in MPI [57]) or implicit, as in an automatic parallelizing compiler, which must essentially use such a machine description to define its internal optimizations of data placement and movement. Therefore, to be effective in estimating performance on a target machine, PetaSIM must take as input an architectural description at the same level needed by parallel programming environments.

The PetaSoft meetings identified such architectural descriptions as essential in defining future extensions to parallel languages, whether message-passing or data-parallel. MPI and HPF [42] implicitly treat parallel systems as a three-level memory hierarchy (local processor memory, remote memory and disk). This model is inadequate for some current and nearly all expected future high-performance systems. Thus an important product of our project will be a machine description that targets both today's (distributed shared memory) machines and future designs typified by those examined in the PetaFlop process. That process looked at extrapolated conventional, superconducting and Processor in Memory (PIM) designs, and our proposed specification is appropriate for these three alternatives [23].

As discussed earlier, this machine description will be helpful in developing future parallel programming environments. We expect experience from our project to drive new developments in this field, as we will determine which features of application and machine are performance critical and will use (reliable) models of expected complex memory hierarchies rather than waiting for new hardware to become available.

Our proposed machine description in HLAM will allow specification of the number of levels in the memory hierarchy, their sizes, and their data movement times (latency and bandwidth). The machine operations specified will include collective as well as primitive operations and will cover both data movement and data replication (as in messaging and cache operation).
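To make the shape of such a description concrete, the following is a minimal sketch in Python. It is not HLAM syntax and not part of PetaSIM. The class names, field names and numeric values are hypothetical and chosen only for illustration. It assumes the simplest cost model consistent with the text: the time to move data from a level is that level's latency plus the data size divided by its bandwidth, and replication is modeled naively as repeated sequential transfers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MemoryLevel:
    """One level of the memory hierarchy (hypothetical structure)."""
    name: str
    size_bytes: int
    latency_s: float               # fixed start-up cost per transfer
    bandwidth_bytes_per_s: float   # sustained transfer rate

    def transfer_time(self, nbytes: int) -> float:
        # Assumed cost model: latency + size / bandwidth.
        return self.latency_s + nbytes / self.bandwidth_bytes_per_s


@dataclass(frozen=True)
class MachineDescription:
    """Ordered levels, from closest to the processor to farthest."""
    levels: Tuple[MemoryLevel, ...]

    def level(self, name: str) -> MemoryLevel:
        for lvl in self.levels:
            if lvl.name == name:
                return lvl
        raise KeyError(name)

    def move_time(self, level_name: str, nbytes: int) -> float:
        # Primitive operation: one transfer of nbytes from the named level.
        return self.level(level_name).transfer_time(nbytes)

    def replicate_time(self, level_name: str, nbytes: int, copies: int) -> float:
        # Naive collective operation (assumption): `copies` sequential
        # transfers, with no overlap and no tree-structured broadcast.
        return copies * self.move_time(level_name, nbytes)


# Illustrative values only; they do not describe any real machine.
machine = MachineDescription(levels=(
    MemoryLevel("local", 2**27, 1e-7, 1e9),
    MemoryLevel("remote", 2**33, 1e-5, 1e8),
    MemoryLevel("disk", 2**40, 1e-2, 1e7),
))

remote_fetch = machine.move_time("remote", 1000)
remote_replicate = machine.replicate_time("remote", 1000, 4)
```

The example instantiates the three-level hierarchy (local processor memory, remote memory and disk) mentioned above. A deeper hierarchy is expressed by adding entries to `levels`, which is the flexibility the proposed description is meant to provide.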

Wes Stevens
Fri Jul 11 15:07:44 EDT 1997