4. Measuring and Reporting Performance of Benchmarks


Benchmark programs should be derived from how actual applications will execute. However, performance is often the result of the combined characteristics of a given computer architecture and its system software and hardware components, not of the microprocessor alone. Other factors such as the operating system, compilers, libraries, memory design, and I/O subsystem characteristics can also affect the results and make comparisons difficult.


4.1 Measuring Performance

Two ways to measure performance are:

  1. The speed measure, which measures how fast a computer completes a single task. For example, SPECint95 is used to compare the ability of a computer to complete single tasks.

  2. The throughput measure, which measures how many tasks a computer can complete in a certain amount of time. SPECint_rate95 measures the rate at which a machine carries out a number of tasks. The sketch below illustrates the difference between the two measures.
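
As a rough illustration, here is a minimal Python sketch that computes both measures from invented timing data; the task times and counts are hypothetical and are not taken from any SPEC run.

    # Minimal sketch contrasting the two measures; all numbers are
    # hypothetical and not taken from any real benchmark run.

    single_task_seconds = 120.0   # wall-clock time to finish one task
    copies = 8                    # identical tasks run in a throughput test
    batch_seconds = 200.0         # wall-clock time for the whole batch

    # Speed measure: how fast one task completes (less time is better).
    speed = 1.0 / single_task_seconds      # tasks per second, single task

    # Throughput measure: how many tasks complete per unit of time.
    throughput = copies / batch_seconds    # tasks per second, whole batch

    print("speed:      %.4f tasks/s" % speed)
    print("throughput: %.4f tasks/s" % throughput)

A machine with modest single-task speed can still show high throughput if it can overlap many tasks, which is why the two measures are reported separately.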


4.2 Interpreting Results

There are three important guidelines to remember when interpreting benchmark results:

1. Be aware of what is being measured. When making critical purchasing decisions based on results from standard benchmarks, it is very important to know what is actually being measured. Without knowing, it is difficult to tell whether the measurements obtained are even relevant to the applications that will run on the system being purchased. A question to consider: does the benchmark measure the overall performance of the system, or just components of the system, such as the CPU or memory?

2. Representativeness is key. How close is the benchmark to the actual application being executed? The closer it is, the better it will predict performance. For example, a component-level benchmark would be a poor predictor of performance for an application that exercises the entire system. Likewise, application benchmarks are the most accurate predictors of performance for individual applications.

3. Avoid single-measure metrics. Application performance should not be reduced to a single number. No single numerical measurement can completely describe the performance of a complex device like a CPU, let alone an entire system. Also, be wary of benchmarks that average several results into a single measurement, since important information may be lost in the average; the sketch below shows one example. Evaluating all the results from the different benchmarks relevant to the application may give a more accurate picture than relying on one benchmark alone.
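
To see how an average can hide important detail, consider this minimal Python sketch; the machine names and per-benchmark scores are invented for illustration.

    # Sketch of how a single averaged score can mislead; the machines and
    # scores below are hypothetical.

    scores = {
        "machine_A": [10.0, 10.0, 10.0, 10.0],  # uniform across benchmarks
        "machine_B": [37.0, 1.0, 1.0, 1.0],     # one outlier dominates
    }

    for name, results in scores.items():
        mean = sum(results) / len(results)
        print("%s: mean = %.1f, per-benchmark = %s" % (name, mean, results))

    # Both machines average 10.0, yet machine_B scores ten times lower on
    # three of the four benchmarks. If your workload resembles those three,
    # the averaged score is actively misleading.

Both machines earn the same single-number score, yet they would behave very differently on most workloads; only the per-benchmark results reveal this.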


4.3 Reporting Performance

There are a few points to remember when reporting results obtained from running benchmarks.


