Prediction

Playing the odds...


When a RISC machine encounters a branch instruction, there are only two possible outcomes, either the branch is taken, or the branch is not taken. So, if you do not want to wait until the branch instruction completes, you must guess which instruction you will execute next. This is prediction.

There are multiple methods of branch prediction, predict-taken, predict-not-taken, and dynamic-hardware-prediction will be covered below. Predict-taken and predict-not-taken are just flip sides of the same coin. They both assume that more often than not, a program will do the same behavior. With predict-not-taken, the hardware assumes that the next instruction to be executed will be the instruction following the branch instruction in memory. This (possible) next instruction is placed in the pipe and execution begins. If after the branch is finished executing, it is found that we really wanted to take the branch, then the (possible) next instruction that we had been executing is thrown out by the hardware, and we being executing the correct instruction, the branch target.

To help facilitate the above method, Hennesy and Patterson suggest that the original version of DLX be modified so that the outcome of branches is computed in the ID stage. This increases the hardware and complexity, but cuts down on the number of wasted cycles when we predict incorrectly. Note that using this modification with the freezing strategy would also cut down the number of stalls, from 3 per loop iteration, to 1. However, as long as we are modifying the hardware, we may as well add some simple logic so we at least get rid of that one stall some of the time.

Predict-not-taken has some problems though. In most programs, it would actually be more probable that a branch is taken, so by predicting not-taken, we are knowingly predicting wrong most of the time. That is bad. So, why not predict taken? In the DLX architecture, the address of the branch target is not known until too late in the pipeline cycle, so there is no benefit gained over the freezing method. In other computers however, if the address of the branch target was computed earlier in the pipeline, then there would be some gain in using the predict-taken strategy.

The final method of branch prediction that we will discuss is dynamic prediction. This method uses 1 (or more) bits in the hardware to keep track of whether the last branch was taken, or not-taken, and assumes that the next branch will have the same outcome. If one bit is used, then the bit is turned on when a branch is taken, and turned off when a branch is not taken. Then, when the next branch is encountered, the bit is checked, if it is on, we act as for a predict-taken strategy, if it is off, we act as for a predict-not-taken strategy. This 1-bit method will usually predict incorrectly when entering and leaving a loop, but will help while inside a loop. Once again, regular DLX cannot really benefit from this, but actual machines will.

An improvement on the 1-bit predictor, is a 2-bit predictor. This method uses 2 bits, that act like a 4 stage state diagram, to predict taken, or not-taken. The two bits require 2 incorrect quesses (instead of one) to change their prediction scheme. So, if we are in a loop, predicting taken, and we exit the loop (1 incorrect guess). When we start to enter the next loop, we will again guess taken, and probably be right, in which case, the 2-bit predictor saved us from a stall that the 1-bit predictor would have failed to avoid.

Even more bits could possibly be used, but generally they offer little or no improvement over the 2-bit predictor, which is the most common implementation of dynamic prediction.


Predict Not-Taken Example:

LOOP:
LW R1, 0(R2)
SUB R3, R3, R1
BNEZ R3 LOOP
SW R4, 0(R3)


Assuming that the first time through the loop we take the branch, and the second time through we do not, we will get the following code being executed:

 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
LW R1, 0(R2)
IF
ID
EX
Mem
WB
 
 
 
 
 
 
 
 
 
 
 
 
SUB R3,R3,R1
 
IF
ID
EX
Mem
WB
 
 
 
 
 
 
 
 
 
 
 
BNEZ R3,Loop
 
 
IF
ID
EX
Mem
WB
 
 
 
 
 
 
 
 
 
 
SW R4, 0(R3)
 
 
 
IF
clear
clear
clear
clear
 
 
 
 
 
 
 
 
 
LW R1, 0(R2)
 
 
 
 
IF
ID
EX
Mem
WB
 
 
 
 
 
 
 
 
SUB R3,R3,R1
 
 
 
 
 
IF
ID
EX
Mem
WB
 
 
 
 
 
 
 
BNEZ R3,Loop
 
 
 
 
 
 
IF
ID
EX
Mem
WB
 
 
 
 
 
 
SW R4, 0(R3)
 
 
 
 
 
 
 
IF
ID
EX
Mem
WB
 
 
 
 
 

With two iterations through the loop, our predict-not-taken strategy, with the modified DLX, gave us only one stall (the IF which was followed by no-ops when it was found in the BNEZ ID stage that we were executing the wrong instruction). If the address of the branch target was known in the IF stage, then predict-taken would have caused the same total number of stalls, it just would have occured after the second BNEZ instead of the first.


 

2-bit Dynamic Prediction Example:

LOOP:
LW R1, 0(R2)
SUB R3, R3, R1
BNEZ R3 LOOP
SW R4, 0(R3)


We are executing the code under a few assumptions, first, that we know the branch target address after the IF stage completes. We are entering the code with our 2-bit predictor predicting taken, and after one false prediction. And we are going to go through our loop three times (ie, take the branch twice).

 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
bits
LW R1, 0(R2)
IF
ID
EX
Mem
WB
 
 
 
 
 
 
 
 
 
 
 
01
SUB R3,R3,R1
 
IF
ID
EX
Mem
WB
 
 
 
 
 
 
 
 
 
 
01
BNEZ R3,Loop
 
 
IF
ID
EX
Mem
WB
 
 
 
 
 
 
 
 
 
01
LW R1, 0(R2)
 
 
 
IF
ID
EX
Mem
WB
 
 
 
 
 
 
 
 
00
SUB R3,R3,R1
 
 
 
 
IF
ID
EX
Mem
WB
 
 
 
 
 
 
 
00
BNEZ R3,Loop
 
 
 
 
 
IF
ID
EX
Mem
WB
 
 
 
 
 
 
00
LW R1, 0(R2)
 
 
 
 
 
 
IF
ID
EX
Mem
WB
 
 
 
 
 
00
SUB R3,R3,R1
             
IF
ID
EX
Mem
WB
       
00
BNEZ R3,Loop
               
IF
ID
EX
Mem
WB
     
00
LW R1, 0(R2)
 
 
 
 
 
 
 
 
 
IF
clear
clear
clear
clear
 
 
01
SW R4, 0(R3)
 
 
 
 
 
 
 
 
 
 
IF
ID
EX
Mem
WB
 
01


From the above example, we see that even though our prediction was incorrect before coming into our loop, we still predicted taken the first time through, and our prediction was correct. So, the 2-bit predictor was reset after the first BNEZ instruction was completed (actually, after the ID stage of BNEZ). The second time through, we once again guessed correctly, and so our bits did not change. On the third pass, we again guessed taken, but were incorrect. So, our bits cycle into the next stage. Now, when the next branch instruction comes along, we will still guess taken. If we are correct, we will rever to the 00 state. If we are wrong, and that branch is also not-taken, then our bits will cycle into the 11 (binary) state and will then predict not-taken until they once again guess wrong twice in a row. This method generally works better than a 1-bit predictor since we must always guess incorrectly once to exit a loop, and so a 1-bit predictor will almost always guess wrong when entering a loop, while the 2-bit predictor will not.

PREVIOUS: Freezing
NEXT: Delayed Branch