Prediction

Playing the odds...

When a RISC machine encounters a branch instruction, there are only two possible outcomes, either the branch is taken, or the branch is not taken. So, if you do not want to wait until the branch instruction completes, you must guess which instruction you will execute next. This is prediction.

There are multiple methods of branch prediction, predict-taken, predict-not-taken, and dynamic-hardware-prediction will be covered below. Predict-taken and predict-not-taken are just flip sides of the same coin. They both assume that more often than not, a program will do the same behavior. With predict-not-taken, the hardware assumes that the next instruction to be executed will be the instruction following the branch instruction in memory. This (possible) next instruction is placed in the pipe and execution begins. If after the branch is finished executing, it is found that we really wanted to take the branch, then the (possible) next instruction that we had been executing is thrown out by the hardware, and we being executing the correct instruction, the branch target.

To help facilitate the above method, Hennesy and Patterson suggest that the original version of DLX be modified so that the outcome of branches is computed in the ID stage. This increases the hardware and complexity, but cuts down on the number of wasted cycles when we predict incorrectly. Note that using this modification with the freezing strategy would also cut down the number of stalls, from 3 per loop iteration, to 1. However, as long as we are modifying the hardware, we may as well add some simple logic so we at least get rid of that one stall some of the time.

Predict-not-taken has some problems though. In most programs, it would actually be more probable that a branch is taken, so by predicting not-taken, we are knowingly predicting wrong most of the time. That is bad. So, why not predict taken? In the DLX architecture, the address of the branch target is not known until too late in the pipeline cycle, so there is no benefit gained over the freezing method. In other computers however, if the address of the branch target was computed earlier in the pipeline, then there would be some gain in using the predict-taken strategy.

The final method of branch prediction that we will discuss is dynamic prediction. This method uses 1 (or more) bits in the hardware to keep track of whether the last branch was taken, or not-taken, and assumes that the next branch will have the same outcome. If one bit is used, then the bit is turned on when a branch is taken, and turned off when a branch is not taken. Then, when the next branch is encountered, the bit is checked, if it is on, we act as for a predict-taken strategy, if it is off, we act as for a predict-not-taken strategy. This 1-bit method will usually predict incorrectly when entering and leaving a loop, but will help while inside a loop. Once again, regular DLX cannot really benefit from this, but actual machines will.

An improvement on the 1-bit predictor, is a 2-bit predictor. This method uses 2 bits, that act like a 4 stage state diagram, to predict taken, or not-taken. The two bits require 2 incorrect quesses (instead of one) to change their prediction scheme. So, if we are in a loop, predicting taken, and we exit the loop (1 incorrect guess). When we start to enter the next loop, we will again guess taken, and probably be right, in which case, the 2-bit predictor saved us from a stall that the 1-bit predictor would have failed to avoid.

Even more bits could possibly be used, but generally they offer little or no improvement over the 2-bit predictor, which is the most common implementation of dynamic prediction.

Predict Not-Taken Example:	LOOP:	LW R1, 0(R2) SUB R3, R3, R1 BNEZ R3 LOOP SW R4, 0(R3)
Assuming that the first time through the loop we take the branch, and the second time through we do not, we will get the following code being executed:
		1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17
LW R1, 0(R2)		IF	ID	EX	Mem	WB
SUB R3,R3,R1			IF	ID	EX	Mem	WB
BNEZ R3,Loop				IF	ID	EX	Mem	WB
SW R4, 0(R3)					IF	clear	clear	clear	clear
LW R1, 0(R2)						IF	ID	EX	Mem	WB
SUB R3,R3,R1							IF	ID	EX	Mem	WB
BNEZ R3,Loop								IF	ID	EX	Mem	WB
SW R4, 0(R3)									IF	ID	EX	Mem	WB

With two iterations through the loop, our predict-not-taken strategy, with the modified DLX, gave us only one stall (the IF which was followed by no-ops when it was found in the BNEZ ID stage that we were executing the wrong instruction). If the address of the branch target was known in the IF stage, then predict-taken would have caused the same total number of stalls, it just would have occured after the second BNEZ instead of the first.

2-bit Dynamic Prediction Example:	LOOP:	LW R1, 0(R2) SUB R3, R3, R1 BNEZ R3 LOOP SW R4, 0(R3)
We are executing the code under a few assumptions, first, that we know the branch target address after the IF stage completes. We are entering the code with our 2-bit predictor predicting taken, and after one false prediction. And we are going to go through our loop three times (ie, take the branch twice).
		1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	bits
LW R1, 0(R2)		IF	ID	EX	Mem	WB												01
SUB R3,R3,R1			IF	ID	EX	Mem	WB											01
BNEZ R3,Loop				IF	ID	EX	Mem	WB										01
LW R1, 0(R2)					IF	ID	EX	Mem	WB									00
SUB R3,R3,R1						IF	ID	EX	Mem	WB								00
BNEZ R3,Loop							IF	ID	EX	Mem	WB							00
LW R1, 0(R2)								IF	ID	EX	Mem	WB						00
SUB R3,R3,R1									IF	ID	EX	Mem	WB					00
BNEZ R3,Loop										IF	ID	EX	Mem	WB				00
LW R1, 0(R2)											IF	clear	clear	clear	clear			01
SW R4, 0(R3)												IF	ID	EX	Mem	WB		01

From the above example, we see that even though our prediction was incorrect before coming into our loop, we still predicted taken the first time through, and our prediction was correct. So, the 2-bit predictor was reset after the first BNEZ instruction was completed (actually, after the ID stage of BNEZ). The second time through, we once again guessed correctly, and so our bits did not change. On the third pass, we again guessed taken, but were incorrect. So, our bits cycle into the next stage. Now, when the next branch instruction comes along, we will still guess taken. If we are correct, we will rever to the 00 state. If we are wrong, and that branch is also not-taken, then our bits will cycle into the 11 (binary) state and will then predict not-taken until they once again guess wrong twice in a row. This method generally works better than a 1-bit predictor since we must always guess incorrectly once to exit a loop, and so a 1-bit predictor will almost always guess wrong when entering a loop, while the 2-bit predictor will not.

PREVIOUS: Freezing

NEXT: Delayed Branch

Prediction

Playing the odds...

LOOP:

LW R1, 0(R2) SUB R3, R3, R1 BNEZ R3 LOOP SW R4, 0(R3)

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

LW R1, 0(R2)

IF

ID

EX

Mem

WB

SUB R3,R3,R1

IF

ID

EX

Mem

WB

BNEZ R3,Loop

IF

ID

EX

Mem

WB

SW R4, 0(R3)

IF

clear

clear

clear

clear

LW R1, 0(R2)

IF

ID

EX

Mem

WB

SUB R3,R3,R1

IF

ID

EX

Mem

WB

BNEZ R3,Loop

IF

ID

EX

Mem

WB

SW R4, 0(R3)

LW R1, 0(R2)
SUB R3, R3, R1
BNEZ R3 LOOP
SW R4, 0(R3)