1. This problem addresses how a common vector loop runs on a variety of pipeline versions of the DLX. The loop is the so-called SAXPY loop, shown below, which implements the vector operation Y = a X + Y for a vector of length 100.

foo: LD F2, 0(R1) ; load X(i)
MULTD F4, F2, F0 ; multiply a*X(i)
F6, 0(R2) ; load Y(i)
ADDD F6, F4, F6 ; add a*X(i) + Y(i)
SD 0(R2), F6 ; store new Y(i)
ADDI R1,R1,#8 ; increment X index
ADDI R2,R2,#8 ; increment Y index
SUBI R3,R1,#800 ; test if done
BEQZ R3, foo ; loop if not done

Assume that all of the functional units are fully bypassed and that the FP functional units are fully pipelined except for the divide. Other relevant information is given in the tables below.

FU type Time in EX
Integer 1
FP adder 3
FP multiplier 4
FP divider 12
FU type # of FUs
Integer 2
FP adder 2
FP multiplier 1
FP divider 1

Producer Consumer Latencies
FP MULTIPLY any FP ALU op 4
FP add any FP ALU op 3
FP MULTIPLY FP store 3
FP add FP store 2
FP load FP store 0
FP load any FP ALU op 1
I ALU op any I ALU op 0
I load any I ALU op 1


a. Using the single issue DLX pipeline (IF/ID/EX/MEM/WB), fill in the table below showing when each instruction begins execution (i.e., enters its first EX cycle), the number of stall cycles for each instruction, and why a stall occurs for the first iteration of the loop. Assume that contention for MEM or WB is resolved by allowing the instruction that issued earliest to complete. Ignore the branch delay.


Instruction First EX Cycle # Stall Cycles Why Stall Occured
LD F2, 0(R1)
3 0  
MULTD F4,F2,F0
5 1 RAW
LD F6, 0(R2)
6 1 structural hazard
ADDD F6,F4,F6
8 2 RAW
SD 0(R2), F6
9 1 structural hazard in decode stage
ADDI R1,R1,#8
10 1 structural hazard in fetch stage
ADDI R2,R2,#8
12 1 WAR
SUBI R3,R1,#800
13 1 structural hazard in decode stage
BEQZ R3,foo
14 1 structural hazard in fetch stage
LD F2, 0(R1)
15 0  
MULTD F4,F2,F0
18 1 RAW



Using the DLX with scoreboarding

Instruction Unit IS RO EX WB Comments
LD F2, 0(R1)
INT1 1 2 3 4  
MULTD F4,F2,F0
MULT 2 4 5-8 9 stalls in cc 3, waiting for F2
LD F6, 0(R2)
INT2 3 4 5 6  
ADDD F6,F4,F6
ADD1 4 9 10-12 13 stalls in cc 5-8, waiting for F4,F6
SD 0(R2), F6
INT1 5 13 14 15 stalls in cc 6-12, waiting for F6
ADDI R1,R1,#8
INT2 6 7 8 9  
ADDI R2,R2,#8
INT2 9 10 11 14 stall 12-13
SUBI R3,R1,#800
INT2 14 15 16 17  
BEQZ R3,foo
INT1 15 16 17 18  


Using the DLX with Tomasulo's Algoritm

Instruction UNIT IS EX WR Comments
LD F2, 0(R1)
INT1 1 2-3 4  
MULTD F4,F2,F0
MULT1 2 4-8 9 waiting for F2
LD F6, 0(R2)
INT2 3 4-5 6  
ADDD F6,F4,F6
ADD1 4 9-12 13 waiting for F4, F6, on cc 5-8
SD 0(R2), F6
INT1 5 13-14 15 waiting for F6 on cc 6-12
ADDI R1,R1,#8
INT2 6 7-8 9 on cc 9 there is a structural hazard
ADDI R2,R2,#8
INT2 10 11-12 14 structural hazard bus on wb on cc 13
SUBI R3,R1,#800
INT2 14 15-16 17 structural hazard Int on 11-13
BEQZ R3,foo
           
LD F2,0(R1)
INT1 15 16-17 18 structural hazard int
MULTD F4,F2,F0
INT2 17 19-23 24 wait for f2