4. Measuring and Reporting Performance of Benchmarks


Benchmark programs should be derived from how actual applications will execute. However, performance is often the result of the combined characteristics of a given computer architecture and its system software and hardware components, not of the microprocessor alone. Other factors such as the operating system, compilers, libraries, memory design, and I/O subsystem characteristics can also affect the results and make comparisons difficult.


4.1 Measuring Performance

Two ways to measure performance are:

  1. The speed measure, which measures how fast a computer completes a single task. For example, SPECint95 is used to compare the ability of a computer to complete single tasks.

  2. The throughput measure, which measures how many tasks a computer can complete in a certain amount of time. SPECint_rate95 measures the rate at which a machine carries out a number of tasks. The sketch below illustrates the difference between the two measures.
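
As a rough illustration, here is a minimal Python sketch that computes both measures from invented timing data; the task times and counts are hypothetical and are not taken from any SPEC run.

    # Minimal sketch contrasting the two measures; all numbers are
    # hypothetical and not taken from any real benchmark run.

    single_task_seconds = 120.0   # wall-clock time to finish one task
    copies = 8                    # identical tasks run in a throughput test
    batch_seconds = 200.0         # wall-clock time for the whole batch

    # Speed measure: how fast one task completes (less time is better).
    speed = 1.0 / single_task_seconds      # tasks per second, single task

    # Throughput measure: how many tasks complete per unit of time.
    throughput = copies / batch_seconds    # tasks per second, whole batch

    print("speed:      %.4f tasks/s" % speed)
    print("throughput: %.4f tasks/s" % throughput)

A machine with modest single-task speed can still show high throughput if it can overlap many tasks, which is why the two measures are reported separately.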


4.2 Interpreting Results

There are three important guidelines to remember when interpreting benchmark results:

1. Be aware of what is being measured. When making critical purchasing decisions based on results from standard benchmarks, it is very important to know what is actually being measured. Without knowing, it is difficult to tell whether the measurements obtained are even relevant to the applications that will run on the system being purchased. A question to consider: does the benchmark measure the overall performance of the system, or just components of the system, such as the CPU or memory?

2. Representativeness is key. How close is the benchmark to the actual application being executed? The closer it is, the better it will predict performance. For example, a component-level benchmark would be a poor predictor of performance for an application that exercises the entire system. Likewise, application benchmarks are the most accurate predictors of performance for individual applications.

3. Avoid single-measure metrics. Application performance should not be reduced to a single number. No single numerical measurement can completely describe the performance of a complex device like a CPU, let alone an entire system. Also, be wary of benchmarks that average several results into a single measurement, since important information may be lost in the average; the sketch below shows one example. Evaluating all the results from the different benchmarks relevant to the application may give a more accurate picture than relying on one benchmark alone.
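
To see how an average can hide important detail, consider this minimal Python sketch; the machine names and per-benchmark scores are invented for illustration.

    # Sketch of how a single averaged score can mislead; the machines and
    # scores below are hypothetical.

    scores = {
        "machine_A": [10.0, 10.0, 10.0, 10.0],  # uniform across benchmarks
        "machine_B": [37.0, 1.0, 1.0, 1.0],     # one outlier dominates
    }

    for name, results in scores.items():
        mean = sum(results) / len(results)
        print("%s: mean = %.1f, per-benchmark = %s" % (name, mean, results))

    # Both machines average 10.0, yet machine_B scores ten times lower on
    # three of the four benchmarks. If your workload resembles those three,
    # the averaged score is actively misleading.

Both machines earn the same single-number score, yet they would behave very differently on most workloads; only the per-benchmark results reveal this.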


4.3 Reporting Performance

There are a few points to remember when reporting results obtained from running benchmarks.


