Many coded loops perform only a few instructions, or are very ordered. For example, this loop in C:

for(int i=0;i<1000;i++)
a[i] = b[i] + c[i];

might be compiled into this code in DLX

-- assume R1 contains the base address for the 'a' array
and R2 has the base address for the 'b' array and
R3 has the base address for the 'c' array and R4 starts at 1000

Loop: LW R5, 0(R2) ; element of b
LW R6, 0(R3) ; element of c
ADD R7, R6, R5 ; make next a
SW 0(R1), R7 ;
ADDI R1, R1, #4 ;
ADDI R2, R2, #4 ; increment addresses
ADDI R3, R3, #4
SUBI R4, R4, #1 ; decrement loop var
BNEZ R4, Loop

In that DLX fragment, fully half of the instructions in the loop are merely loop overhead -- needed to control the flow of the loop, and access, but adding commands. Also, you are required (in a simple DLX pipelike pipe) to stall for the branch control hazard every ninth instruction. Futhermore, as the load, add, and store form a partial ordering, you will be forced to stall waiting for each command to reach a certain point (with forwarding) or to finish (without) before executing the next instruction.

The goal, then, is to limit the amount of loop overhead as a proportion of the commands, to decrease the number of control hazards in the total run of the loop, and to fill the unavoidable stall spots with independent instructions. The first two can be done somewhat through loop unrolling, and the last through rescheduling.


Prev Next