Next: Emulating Families of High Up: Estimation of Performance Cost Previous: Analytic Estimation and Compiler

Use of Hardware Modeling and Simulation

For a large-scale tera/petaflop-level application, we do not intend to conduct complete simulations for entire applications on current or future parallel machines. Instead, we plan to use detailed simulation and/or runtime profiling only for the performance-critical segments of the application. We plan to use the results of detailed simulation for two purposes: (1) to develop the cost functions used by PetaSIM and (2) to verify the predictions generated by PetaSIM.
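The second purpose, verifying predictions against detailed simulation of the performance-critical segments, can be sketched as a simple per-segment comparison. The following is a minimal illustration only, not PetaSIM code: the `SegmentResult` type, the segment names, the timings and the 10% tolerance are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentResult:
    """Timings for one performance-critical segment (hypothetical type)."""
    name: str
    predicted_s: float  # time predicted by the coarse-grain estimator
    simulated_s: float  # time from detailed simulation or runtime profiling


def relative_error(result: SegmentResult) -> float:
    """Relative error of the prediction, taking the detailed result as reference."""
    return abs(result.predicted_s - result.simulated_s) / result.simulated_s


def segments_to_recalibrate(results: List[SegmentResult],
                            tolerance: float = 0.10) -> List[str]:
    """Names of segments whose prediction misses by more than `tolerance`.

    The 10% default is an arbitrary illustrative threshold.
    """
    return [r.name for r in results if relative_error(r) > tolerance]


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    results = [
        SegmentResult("io_read", predicted_s=10.5, simulated_s=10.0),
        SegmentResult("halo_exchange", predicted_s=12.0, simulated_s=10.0),
    ]
    for r in results:
        print(f"{r.name}: relative error {relative_error(r):.1%}")
    print("needs recalibration:", segments_to_recalibrate(results))
```

Segments flagged this way are the ones whose cost functions would need to be revisited using further detailed simulation.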

This is similar to the approach we used in Howsim [59], a coarse-grain simulator for I/O-intensive tasks on workstation clusters developed at Maryland. We developed Howsim to evaluate architectural and OS policy alternatives for I/O-intensive tasks. Accordingly, Howsim simulates I/O devices (storage and network) and the corresponding OS software at a fairly detailed level and the processor at a fairly coarse level. To obtain the hardware and operating system cost functions needed for Howsim, we profiled a small set of micro-applications that exercised specific hardware and OS functionality on the IBM SP-2 and a cluster of Digital 4/2100 multiprocessor workstations. This approach has worked well for Howsim. For example, for the SP-2, Howsim was able to model the application-level network bandwidth across message sizes spanning seven orders of magnitude. The error for most message sizes was 2-6%. Howsim was able to model the application-level network bandwidth within an error of 10%.

We do not intend to develop or significantly extend detailed hardware simulators. Instead, we plan to use (and possibly integrate) existing hardware and complete system simulators such as SimOS [54], Proteus [11], Mint [60], Howsim and Trojan [49]. We expect to start with SimOS and integrate other simulators as needed. SimOS provides detailed models for shared-memory multiprocessors and can simulate highly realistic application workloads with acceptable slowdown. It provides a fairly good interface for adjusting the hardware configuration, such as the number of processors, clock speed, cache parameters, and memory and disk system parameters. SimOS is geared towards shared-memory machines and does not emphasize network simulation. Other simulators, however, do provide good network models (e.g., Proteus provides detailed simulation of k-ary n-cube networks and Howsim simulates point-to-point networks).
We plan to integrate these simulators (and possibly others) on an as-needed basis.
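To illustrate how micro-application profiling of the kind described for Howsim can yield a cost function, the sketch below fits a two-parameter message cost model, t(n) = latency + n / bandwidth, to timings for message sizes spanning seven orders of magnitude, and then checks the modeled application-level bandwidth against the measurements. This is not Howsim's actual model or data. The linear form, the function names and the latency and bandwidth constants are assumptions made for the example, and the "measurements" are generated synthetically.

```python
def fit_cost_function(sizes, times):
    """Weighted least-squares fit of t(n) = latency + n / bandwidth.

    Weights of 1/t^2 make the fit minimize relative rather than absolute
    error, so small messages are not swamped by large ones.
    Returns (latency in seconds, bandwidth in bytes per second).
    """
    sw = sx = sy = sxx = sxy = 0.0
    for n, t in zip(sizes, times):
        w = 1.0 / (t * t)
        sw += w
        sx += w * n
        sy += w * t
        sxx += w * n * n
        sxy += w * n * t
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    latency = (sy - slope * sx) / sw
    return latency, 1.0 / slope


def modeled_bandwidth(n, latency, bandwidth):
    """Application-level bandwidth (bytes/s) the model predicts for n bytes."""
    return n / (latency + n / bandwidth)


def max_relative_error(sizes, times, latency, bandwidth):
    """Worst-case relative error of modeled vs. measured bandwidth."""
    worst = 0.0
    for n, t in zip(sizes, times):
        measured = n / t
        modeled = modeled_bandwidth(n, latency, bandwidth)
        worst = max(worst, abs(modeled - measured) / measured)
    return worst


if __name__ == "__main__":
    # Hypothetical machine parameters, NOT measurements of any real system.
    ASSUMED_LATENCY_S = 50e-6
    ASSUMED_BANDWIDTH_BPS = 35e6

    # Message sizes from 1 byte to 10 MB: seven orders of magnitude.
    sizes = [10 ** k for k in range(8)]
    # Synthetic "measurements": the assumed model with a +/-2% perturbation.
    times = [
        (ASSUMED_LATENCY_S + n / ASSUMED_BANDWIDTH_BPS)
        * (1.02 if i % 2 == 0 else 0.98)
        for i, n in enumerate(sizes)
    ]

    latency, bandwidth = fit_cost_function(sizes, times)
    print(f"fitted latency:   {latency * 1e6:.1f} us")
    print(f"fitted bandwidth: {bandwidth / 1e6:.1f} MB/s")
    print(f"max relative error: "
          f"{max_relative_error(sizes, times, latency, bandwidth):.1%}")
```

The relative-error weighting is a design choice for this sketch: with message sizes this far apart, an unweighted fit would be dominated by the largest messages and could misestimate the latency term badly.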



Wes Stevens
Fri Jul 11 15:07:44 EDT 1997