3 Performance Metrics

Dr A. P. Shanthi

The objectives of this module are to identify and evaluate the performance metrics for a processor and also discuss the CPU performance equation.

In the computer engineering methodology, technology trends and improvements give rise to newer and newer architectures. You have to evaluate the existing systems for bottlenecks, come up with better architectures, and then repeat the process. To evaluate existing systems for bottlenecks, you need metrics and benchmarks on which to base the evaluation.

You should basically be able to

measure performance
report performance and
summarise performance.

These steps are necessary because they help you make intelligent choices about the computer systems that you want to purchase. They help you see through the marketing hype: there is a great deal of hype about computer systems, and unless you know the basics of computer performance you will not be able to make a judicious choice when purchasing systems. Understanding performance measures is also key to understanding the underlying organizational motivation, that is, the factors that lead designers to introduce particular modifications and innovations so that performance improves. While discussing performance, you should be able to answer questions like these:

• Why is some hardware better than others for different programs?

• What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?)

• How does the machine’s instruction set affect performance?

Performance is important from both the purchaser’s perspective and the designer’s perspective. As a purchaser, given a collection of machines, you have to decide which has the best performance, the least cost, and the best cost/performance ratio. Similarly, as a designer, you are faced with several design options and have to decide which gives the best performance improvement, the least cost, and the best cost/performance ratio. In both cases you need some knowledge of performance metrics in order to make the comparison.

Our goal is to understand what factors in the architecture contribute to the overall system performance, and the relative importance and cost of these factors. Performance means different things to different people. Take an analogy from the airline industry. If you have to choose between different types of aircraft, what are the various factors that you have to consider? Do you worry only about the cruising speed (how fast the aircraft flies), about the flight range (how far the aircraft will fly), or about the capacity (how big the aircraft is and how many people can be transported at one time from one place to another)? These are different factors that need to be considered, and you cannot expect a particular aircraft to satisfy all of these requirements. You have to decide which one is more important than the others. All three factors are important, no doubt about it, but they may not be equally important; you may give more weight to certain factors than to others. The criteria of performance evaluation differ among users and designers. The same holds good in the computer industry. You have different classes of computer systems, and certain performance criteria are important for certain types of applications, whereas they may not be so important for other types of applications. You should be able to decide which is important for which type of processor. You should also never let an engineer get away with simply presenting the data; always insist that he or she lead off with the conclusions to which the data led and justify why the data came out that way. Only when you understand the internal architecture of the processor will you be able to make a judicious choice.

There are different things that affect the performance of a computer system. The instructions that you use and the implementation of these instructions, the memory hierarchy, the way the I/O is handled – all this may contribute to your performance. The primary factor when you’re looking at computer performance is time. All of us are worried about how fast the program executes. So the most important performance factor is the time. When you’re looking at time being the most important factor, are you looking at response time, or are you looking at something else? What we mean by response time is the latency – you ask the processor to execute a particular task and how fast you get a response from the processor – that is basically what is called the response time.

How long does it take for my job to run?
How long does it take to execute a job?
How long must I wait for the database query?

The other important time factor is throughput. It is the total amount of work done in a given time.

• How many jobs can the machine run at once?

• What is the average execution rate?

• How much work is getting done?

Response time (execution time), the time between the start and the completion of a task, is important to individual users. Throughput (bandwidth), the total amount of work done in a given time, is important to data center managers. We need different performance metrics, as well as different sets of applications, to benchmark embedded and desktop computers, which are more focused on response time, versus servers, which are more focused on throughput.

If we have to maximize performance, we obviously need to minimize our execution time. Performance is inversely related to execution time.

Performance = 1/ Execution time

If a processor X is n times faster than Y, then

Performance of X / Performance of Y = Execution time of Y / Execution time of X = n

Decreasing response time almost always improves throughput.

As an example, if computer A runs a program in 10 seconds and computer B runs the same program in 20 seconds, how much faster is A than B?

Speedup of A over B = Execution time of B / Execution time of A = 20 / 10 = 2, indicating that A is two times faster than B.
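The relationship between execution time and relative performance can be checked with a short Python sketch. The function name `speedup` and its parameter names are illustrative choices, not something defined in this module.

```python
def speedup(time_slow: float, time_fast: float) -> float:
    """Return how many times faster the quicker machine is.

    Performance = 1 / execution time, so the performance ratio
    fast/slow equals the execution time ratio slow/fast.
    """
    if time_slow <= 0 or time_fast <= 0:
        raise ValueError("execution times must be positive")
    return time_slow / time_fast


# Computer A takes 10 s and computer B takes 20 s for the same program.
n = speedup(time_slow=20.0, time_fast=10.0)
print(f"A is {n} times faster than B")  # A is 2.0 times faster than B
```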

Execution time here means the time the CPU spends working on the task; it does not include the time spent waiting for I/O or running other programs. The processor does not run only your program: when there is an I/O transfer, it may block this program and switch over to a different one. We do not consider the time taken for I/O operations and are concerned only with the CPU execution time, that is, the time that the CPU spends on a particular program.

To determine the CPU execution time for a program, you can find the total number of clock cycles that the program takes and multiply it by the clock cycle time. Each program is made up of a number of instructions, and each instruction takes a number of clock cycles to execute. If you know the total number of clock cycles for the program and the clock cycle time, the CPU execution time is simply their product:

CPU execution time = CPU clock cycles for a program × Clock cycle time

Because the clock cycle time and the clock rate are inversely related, this can also be written as:

CPU execution time = CPU clock cycles for a program / Clock rate

Since the CPU execution time is a product of these two factors, you can improve performance by reducing either the clock cycle time or the number of clock cycles required for a program. A clock cycle is the basic unit of time in which one operation, pipeline stage, etc. is executed. The clock rate (clock cycles per second, in MHz or GHz) is the inverse of the clock cycle time (clock period): CC = 1 / CR.

The clock rate depends on the specific CPU organization (whether it is pipelined or non-pipelined) and on the hardware implementation technology (the VLSI technology that is used). A 10 ns clock cycle corresponds to a 100 MHz clock rate, a 5 ns clock cycle to a 200 MHz clock rate, and a 250 ps clock cycle to a 4 GHz clock rate. The higher the clock frequency, the shorter the clock cycle.
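As a rough illustration, the inverse relation between clock cycle time and clock rate, and the computation of CPU execution time from a cycle count, can be sketched in Python. The function names are illustrative, and the 8 × 10^9 cycle program is a hypothetical value chosen only for this example.

```python
def clock_rate_hz(cycle_time_s: float) -> float:
    """Clock rate is the inverse of the clock cycle time (CR = 1 / CC)."""
    return 1.0 / cycle_time_s


def cpu_time_s(clock_cycles: float, rate_hz: float) -> float:
    """CPU execution time = clock cycles for the program / clock rate."""
    return clock_cycles / rate_hz


# The three cycle times quoted in the text: 10 ns, 5 ns and 250 ps.
for cycle_time in (10e-9, 5e-9, 250e-12):
    rate_mhz = clock_rate_hz(cycle_time) / 1e6
    print(f"{cycle_time * 1e12:.0f} ps cycle -> {rate_mhz:.0f} MHz")

# Hypothetical program: 8e9 clock cycles on a 4 GHz processor.
print(f"{cpu_time_s(8e9, 4e9):.1f} s")
```

The loop reproduces the 100 MHz, 200 MHz and 4 GHz (4000 MHz) figures from the paragraph above.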

As an example, consider the following problem:

A program runs on computer A with a 2 GHz clock in 10 seconds. What clock rate must a computer B run at to run this program in 6 seconds? Unfortunately, to accomplish this, computer B will require 1.2 times as many clock cycles as computer A to run the program.

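Clock cycles of A = Execution time of A × Clock rate of A = 10 s × 2 GHz = 20 × 10^9 cycles

Clock rate of B = (1.2 × 20 × 10^9 cycles) / 6 s = 4 GHz

So the second processor has to run at a clock rate of 4 GHz, twice that of computer A, to finish the program in 6 seconds.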
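The arithmetic behind this answer can be sketched in Python as follows. The function and parameter names are illustrative assumptions; the numbers are the ones given in the problem statement.

```python
def required_clock_rate_hz(
    base_rate_hz: float,
    base_time_s: float,
    cycle_factor: float,
    target_time_s: float,
) -> float:
    """Clock rate needed to hit target_time_s when the new machine
    needs cycle_factor times as many clock cycles as the base machine."""
    base_cycles = base_time_s * base_rate_hz      # cycles on computer A
    target_cycles = cycle_factor * base_cycles    # cycles on computer B
    return target_cycles / target_time_s


# Computer A: 2 GHz, 10 s. Computer B: 1.2x the cycles, 6 s target.
rate_b = required_clock_rate_hz(2e9, 10.0, 1.2, 6.0)
print(f"{rate_b / 1e9:.1f} GHz")  # 4.0 GHz
```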

When you compute the total execution time as the total number of clock cycles multiplied by the clock cycle time, the difficulty lies in calculating the total number of clock cycles. Not all instructions take the same amount of time to execute, so you have to know the number of clock cycles that each instruction takes and add them all up. Another way to think about execution time is that it equals the number of instructions multiplied by the average time per instruction. If we can find the average time per instruction, we can calculate the execution time. A machine (ISA) instruction is composed of a number of elementary or micro operations, which vary in number and complexity depending on the instruction and the exact CPU organization (design). A micro operation is an elementary hardware operation that can be performed during one CPU clock cycle. This corresponds to one micro-instruction in microprogrammed CPUs. Examples are register operations (shift, load, clear, increment) and ALU operations (add, subtract, etc.). Thus, a single machine instruction may take one or more CPU cycles to complete; this number is termed the Cycles Per Instruction (CPI). The average (or effective) CPI of a program is the average CPI of all instructions executed in the program on a given CPU design.

Example problem:

• Computers A and B implement the same ISA. Computer A has a clock cycle time of 250 ps and an effective CPI of 2.0 for some program and computer B has a clock cycle time of 500 ps and an effective CPI of 1.2 for the same program. Which computer is faster and by how much?

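Each computer executes the same number of instructions, I, so

CPU time of A = I × 2.0 × 250 ps = 500 × I ps

CPU time of B = I × 1.2 × 500 ps = 600 × I ps

Performance of A / Performance of B = CPU time of B / CPU time of A = (600 × I ps) / (500 × I ps) = 1.2

Computer A is therefore 1.2 times faster than computer B for this program.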
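The comparison of computers A and B can be worked through numerically with a small Python sketch. Because both machines execute the same number of instructions, it is enough to compare the average time per instruction (CPI × clock cycle time). The function name is an illustrative assumption.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time_ps


# Values from the example problem.
time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)

# The instruction count I is the same on both machines, so it cancels
# out of the ratio CPU time of B / CPU time of A.
ratio = time_b / time_a
faster = "A" if ratio > 1 else "B"
print(f"Computer {faster} is faster by a factor of {max(ratio, 1 / ratio):.2f}")
```

With the values above, the sketch reports that A is faster by a factor of about 1.2, even though A has the higher CPI.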

The overall effective CPI is computed by looking at the different classes of instructions and their individual cycle counts and taking a weighted average:

Overall effective CPI = Σ (CPIi × ICi), summed over i = 1 to n

where ICi is the count (expressed as a fraction or percentage) of instructions of class i executed, CPIi is the (average) number of clock cycles per instruction for that instruction class and n is the number of instruction classes.

The overall effective CPI varies with the instruction mix, which is a measure of the dynamic frequency of instructions across one or many programs.
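A minimal Python sketch of the weighted-average CPI calculation is given here. The instruction classes, fractions and cycle counts in `example_mix` are assumed values chosen purely for illustration, and the function name `effective_cpi` is likewise hypothetical.

```python
def effective_cpi(mix: dict[str, tuple[float, float]]) -> float:
    """Weighted-average CPI for an instruction mix.

    mix maps an instruction class name to (fraction of executed
    instructions, cycles per instruction for that class).
    """
    total_fraction = sum(fraction for fraction, _ in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix.values())


# Assumed (illustrative) instruction mix: class -> (fraction, cycles).
example_mix = {
    "alu": (0.5, 1),
    "load": (0.2, 5),
    "store": (0.1, 3),
    "branch": (0.2, 2),
}
print(f"Effective CPI = {effective_cpi(example_mix):.2f}")
```

Changing the fraction or the cycle count of a single class and recomputing shows directly how sensitive the overall CPI is to that class.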

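To look at an example, consider an instruction mix with an overall CPI of 2.2, in which loads make up 20% of the executed instructions and take 5 cycles each, and branches make up another 20% and take 2 cycles each.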

How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?

– Load → 20% × 2 cycles = 0.4

– Total CPI 2.2 → 1.6

– Relative performance is 2.2 / 1.6 = 1.38

How does this compare with reducing the branch instruction to 1 cycle?

– Branch → 20% × 1 cycle = 0.2

– Total CPI 2.2 → 2.0

– Relative performance is 2.2 / 2.0 = 1.1

We can now write the basic performance equation as:

CPU time = Instruction count × CPI × Clock cycle time

CPU time = Instruction count × CPI / Clock rate

These equations separate the three key factors that affect performance:

We can measure the CPU execution time by running the program.
The clock rate is usually given.
We can measure the overall instruction count by using profilers or simulators, without knowing all of the implementation details.
CPI varies by instruction type and ISA implementation, for which we must know the implementation details.
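The way the three factors combine can be sketched in Python. The sketch computes CPU time as instruction count × CPI / clock rate and then repeats the two what-if comparisons from the instruction-mix example (CPI 2.2 reduced to 1.6 or to 2.0). The instruction count and clock rate used here are hypothetical; since they are the same in every case, they cancel out of the performance ratios.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program size and clock rate, identical for all cases.
INSTRUCTION_COUNT = 1_000_000_000
CLOCK_RATE_HZ = 2e9

baseline = cpu_time_s(INSTRUCTION_COUNT, 2.2, CLOCK_RATE_HZ)
faster_loads = cpu_time_s(INSTRUCTION_COUNT, 1.6, CLOCK_RATE_HZ)
faster_branches = cpu_time_s(INSTRUCTION_COUNT, 2.0, CLOCK_RATE_HZ)

# Relative performance = old CPU time / new CPU time.
load_gain = baseline / faster_loads
branch_gain = baseline / faster_branches
print(f"Faster loads:    {load_gain:.3f}x")
print(f"Faster branches: {branch_gain:.3f}x")
```

With only the CPI changing, the gains reduce to the CPI ratios 2.2 / 1.6 and 2.2 / 2.0, so improving loads helps more than improving branches for this mix.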

To conclude, if you look at the aspects of the CPU execution time, you have three factors which affect the CPU execution time – the clock cycle time, the average number of clock cycles per instruction which is your CPI value and the instruction count. The various factors that affect these three parameters are:

Instruction count depends first on the way the program is written: a skilled programmer who uses a crisp algorithm and codes it appropriately will use fewer instructions. So the first factors are the algorithm that is used and the skill of the programmer who writes the code. Second, once the code is written, the compiler is responsible for translating it into machine instructions. An optimizing compiler translates the code into fewer machine instructions. The compiler definitely has a role to play in reducing the instruction count, but remember that the compiler can only use the instructions that are supported in the instruction set architecture, so the ISA also plays a role. In the previous module, we looked at how the same operation can be implemented as different sequences of instructions depending upon the ISA. With the help of the ISA, the compiler is able to generate code that uses fewer machine instructions.
Clock cycle time depends upon the CPU organization and upon the technology that is used. By organization, we mean whether the instruction unit is implemented as a pipelined unit or a non-pipelined unit. Pipelining breaks instruction execution into multiple shorter stages, which reduces the clock cycle time. This will be dealt with in detail in the subsequent modules.
CPI, the average number of clock cycles per instruction, depends upon the program, because the program may use complicated instructions that consist of a number of elementary operations, or simple instructions. Similarly, the compiler may translate the program using complicated instructions instead of simpler ones, so the compiler also has a role to play, and because the compiler can use only the instructions in the ISA, the ISA has a role to play as well. Finally, the CPU organization also has a role in deciding the CPI values.

Having identified the various parameters that will affect the three factors constituting the CPU performance equation, computer designers should strive to take appropriate design measures to reduce these factors, thereby reducing the execution time and thus improving performance.

To summarize, we’ve looked at how the performance of a processor can be defined and why performance evaluation is necessary for a computer system. We have pointed out different performance metrics, and looked at the CPU performance equation and the factors that affect it. This module also provided different examples which illustrate the calculation of the CPU execution time using the CPU performance equation.

Web Links / Supporting Materials

Computer Architecture – A Quantitative Approach, John L. Hennessy and David A. Patterson, 5th Edition, Morgan Kaufmann, Elsevier, 2011.
Computer Organization and Design – The Hardware / Software Interface, David A. Patterson and John L. Hennessy, 4th Edition, Morgan Kaufmann, Elsevier, 2009.
Computer Organization, Carl Hamacher, Zvonko Vranesic and Safwat Zaky, 5th Edition, McGraw-Hill Higher Education, 2011.