Dynamic Scheduling


Scoreboard

The tables below show the status of the scoreboard after the first ADDD has finished writing its result back to the register file.

Instruction status

Instruction          Issue   Read operands   Execution   Write result
LD    F2,0(R1)       x       x               x           x
ADDD  F6,F2,F4       x       x               x           x
MULTD F8,F6,F0       x       -               -           -
LD    F10,-8(R1)     x       x               x           x
ADDD  F12,F10,F4     x       x               x           -
MULTD F14,F12,F0     -       -               -           -
SUBI  R1,R1,#16      -       -               -           -
BNEZ  R1,Loop        -       -               -           -

Functional unit status

Name    Busy   Op     Fi    Fj    Fk    Qj     Qk    Rj    Rk
Int1    No     -      -     -     -     -      -     -     -
Int2    No     -      -     -     -     -      -     -     -
Add1    No     -      -     -     -     -      -     -     -
Add2    Yes    Add    F12   F10   F4    -      -     No    No
Mult1   Yes    Mult   F8    F6    F0    Add1   -     Yes   Yes
Div1    No     -      -     -     -     -      -     -     -

Register result status

F0    F2    F4    F6    F8      F10   F12    F14
-     -     -     -     Mult1   -     Add2   -

Note that even though the integer functional units are not busy, no more instructions can issue: issue is in order, the next un-issued instruction is a multiply, and the only multiply unit is busy. The second add unit is almost finished; it is one clock cycle behind the first add unit because the two units could not both read the F4 register from the register file in the same cycle. Also, Rj and Rk are now both Yes for the multiply unit, which means it can read its operands on the next clock cycle.
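The two checks described above can be sketched in Python. This is only a minimal illustration, assuming the usual scoreboard rules: an instruction issues only if a unit of the right type is free and no active instruction is going to write the same destination register, and a unit reads its operands only when Rj and Rk are both Yes. The dictionary layout, the "type" field, and the function names can_issue and can_read_operands are hypothetical, chosen for this example; the state values are copied from the tables above.

```python
# Functional unit status after the first ADDD has written back.
# Values come from the tables above; the dict layout and the "type"
# field are assumptions made for this sketch.
FU_STATUS = {
    "Int1":  {"type": "int",  "busy": False},
    "Int2":  {"type": "int",  "busy": False},
    "Add1":  {"type": "add",  "busy": False},
    "Add2":  {"type": "add",  "busy": True, "Fi": "F12", "Rj": False, "Rk": False},
    "Mult1": {"type": "mult", "busy": True, "Fi": "F8",  "Rj": True,  "Rk": True},
    "Div1":  {"type": "div",  "busy": False},
}

# Register result status: the unit that will write each register
# (None means no write is pending).
REG_RESULT = {
    "F0": None, "F2": None, "F4": None, "F6": None,
    "F8": "Mult1", "F10": None, "F12": "Add2", "F14": None,
}


def can_issue(unit_type, dest):
    """Issue check: a free unit of the right type exists (no structural
    hazard) and no active unit is going to write dest (no WAW hazard)."""
    free_unit = any(
        fu["type"] == unit_type and not fu["busy"] for fu in FU_STATUS.values()
    )
    no_pending_write = REG_RESULT.get(dest) is None
    return free_unit and no_pending_write


def can_read_operands(unit):
    """Read-operands check: the unit is busy and both source flags are Yes."""
    fu = FU_STATUS[unit]
    return fu["busy"] and fu["Rj"] and fu["Rk"]


print(can_issue("mult", "F14"))    # MULTD F14,F12,F0: Mult1 is busy
print(can_read_operands("Mult1"))  # MULTD F8,F6,F0: Rj and Rk are both Yes
print(can_read_operands("Add2"))   # ADDD F12,F10,F4: operands already read
```

Under these assumptions, can_issue("mult", "F14") is False because Mult1 is the only multiply unit and it is busy. can_issue("int", "R1") would return True on its own, but SUBI still cannot issue, since issue is in order and the MULTD ahead of it is stalled.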

From the time the operands are read, assume loads take 4 cycles to complete, integer ALU operations 3 cycles, floating-point adds 6 cycles, and floating-point multiplies 11 cycles. Then the scoreboard machine will take 33 cycles to complete the above instructions. Note that for the most part, the scoreboard machine can execute two iterations of the loop in parallel, since there are no loop-carried dependences. The only thing limiting this is the lack of a second floating-point multiply unit.
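The independence of the two iterations can be checked with a small dependence analysis. The sketch below assumes that the first three instructions in the listing form one iteration and the next three form the other (the loop unrolled twice), and it only tracks read-after-write (RAW) dependences through registers. The tuple encoding of the program and the function name raw_dependences are hypothetical, introduced for this example.

```python
# Each entry: (text, destination register or None, source registers).
# The encoding is an assumption for this sketch; the instructions are
# the ones in the instruction status table above.
PROGRAM = [
    ("LD    F2,0(R1)",    "F2",  ["R1"]),
    ("ADDD  F6,F2,F4",    "F6",  ["F2", "F4"]),
    ("MULTD F8,F6,F0",    "F8",  ["F6", "F0"]),
    ("LD    F10,-8(R1)",  "F10", ["R1"]),
    ("ADDD  F12,F10,F4",  "F12", ["F10", "F4"]),
    ("MULTD F14,F12,F0",  "F14", ["F12", "F0"]),
    ("SUBI  R1,R1,#16",   "R1",  ["R1"]),
    ("BNEZ  R1,Loop",     None,  ["R1"]),
]


def raw_dependences(program):
    """Return (producer, consumer) index pairs for RAW dependences,
    where the producer is the most recent earlier writer of a source."""
    last_writer = {}
    deps = []
    for i, (_, dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], i))
        if dest is not None:
            last_writer[dest] = i
    return deps


DEPS = raw_dependences(PROGRAM)

# Indices of the two assumed iterations (0-based).
FIRST, SECOND = {0, 1, 2}, {3, 4, 5}
CROSS = [(p, c) for p, c in DEPS if p in FIRST and c in SECOND]

for producer, consumer in DEPS:
    print(PROGRAM[producer][0], "->", PROGRAM[consumer][0])
print("RAW dependences between the two iterations:", CROSS)
```

With this encoding, the only RAW chains are LD to ADDD to MULTD within each iteration, plus SUBI to BNEZ, and CROSS is empty. Both MULTD instructions still need the single multiply unit, which is the structural limit mentioned above.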




Author: Allan Tong
Contact actong@wam.umd.edu