EXAMPLES

1. This problem addresses how a common vector loop runs on a variety of pipeline versions of the DLX. The loop is the so-called SAXPY loop, shown below, which implements the vector operation Y = a X + Y for a vector of length 100.

foo:	LD F2, 0(R1)	; load X(i)
	MULTD F4, F2, F0	; multiply a*X(i)
	F6, 0(R2)	; load Y(i)
	ADDD F6, F4, F6	; add a*X(i) + Y(i)
	SD 0(R2), F6	; store new Y(i)
	ADDI R1,R1,#8	; increment X index
	ADDI R2,R2,#8	; increment Y index
	SUBI R3,R1,#800	; test if done
	BEQZ R3, foo	; loop if not done

Assume that all of the functional units are fully bypassed and that the FP functional units are fully pipelined except for the divide. Other relevant information is given in the tables below.

FU type	Time in EX
Integer	1
FP adder	3
FP multiplier	4
FP divider	12

FU type	# of FUs
Integer	2
FP adder	2
FP multiplier	1
FP divider	1

Producer	Consumer	Latencies
FP MULTIPLY	any FP ALU op	4
FP add	any FP ALU op	3
FP MULTIPLY	FP store	3
FP add	FP store	2
FP load	FP store	0
FP load	any FP ALU op	1
I ALU op	any I ALU op	0
I load	any I ALU op	1

a. Using the single issue DLX pipeline (IF/ID/EX/MEM/WB), fill in the table below showing when each instruction begins execution (i.e., enters its first EX cycle), the number of stall cycles for each instruction, and why a stall occurs for the first iteration of the loop. Assume that contention for MEM or WB is resolved by allowing the instruction that issued earliest to complete. Ignore the branch delay.

Instruction

First EX Cycle

# Stall Cycles

Why Stall Occured

F2, 0(R1)

MULTD

F4,F2,F0

RAW

F6, 0(R2)

structural hazard

ADDD

F6,F4,F6

RAW

0(R2), F6

structural hazard in decode stage

ADDI

R1,R1,#8

structural hazard in fetch stage

ADDI

R2,R2,#8

WAR

SUBI	R3,R1,#800

structural hazard in decode stage

BEQZ

R3,foo

structural hazard in fetch stage

F2, 0(R1)

MULTD

F4,F2,F0

RAW

Using the DLX with scoreboarding

Instruction

Unit

Comments

F2, 0(R1)

INT1

MULTD

F4,F2,F0

MULT

5-8

stalls in cc 3, waiting for F2

F6, 0(R2)

INT2

ADDD

F6,F4,F6

ADD1

10-12

stalls in cc 5-8, waiting for F4,F6

0(R2), F6

INT1

stalls in cc 6-12, waiting for F6

ADDI

R1,R1,#8

INT2

ADDI

R2,R2,#8

INT2

stall 12-13

SUBI	R3,R1,#800

INT2

BEQZ

R3,foo

INT1

Using the DLX with Tomasulo's Algoritm

Instruction

UNIT

Comments

F2, 0(R1)

INT1

2-3

MULTD

F4,F2,F0

MULT1

4-8

waiting for F2

F6, 0(R2)

INT2

4-5

ADDD

F6,F4,F6

ADD1

9-12

waiting for F4, F6, on cc 5-8

0(R2), F6

INT1

13-14

waiting for F6 on cc 6-12

ADDI

R1,R1,#8

INT2

7-8

on cc 9 there is a structural hazard

ADDI

R2,R2,#8

INT2

11-12

structural hazard bus on wb on cc 13

SUBI	R3,R1,#800

INT2

15-16

structural hazard Int on 11-13

BEQZ

R3,foo

F2,0(R1)

INT1

16-17

structural hazard int

MULTD

F4,F2,F0

INT2

19-23

wait for f2