CMSC 411 Fall 1998 Midterm Solutions - set 2
Courtesy of a fellow student ...
Problem 1: True Lies
--------------------
A: False: RISC has very few register addressing modes to simplyify
the instructions.
B: True
C: False: The MIPS rating Meas`ns absolutely squat. The overhead
on one machine may be totally different than on another.
D: False: There is a byte alignment error. 100 is not evenly
divisable by 8
E: False: This really depends on the instructions and the hardware.
F: True
G: False: The RISC Arcitechture is proff that this is false. The RISC
Arcitechure uses computer hardware to solve the hazards, thus
increasing the CPI.
H: False. In actual fact this question is false, since the next
instruction on a sequential machine (CISC) waits til the last
instruction is finished, and the PC is then waiting. In a RISC
machine, however, this is ALWAYS true.
I: False:
J: True
K: False: Latency is the number of clocks from when it enters the
FPADD, til the sum is ready.
L: False: The length of the clock cycle is determined by the SLOWEST
Stage.
M: False: DIV is not fully Pipelined, but multiply, add, and subtract
are.
Problem 2: Feirce Creatures
----------------------------
2.1: This question is similar to the one on the last Quiz.
The following information is used to solve the problem:
The clock RATE was DECREASED by 20%.
CRnew = .8CRold
The CPI is INCREASED by 30%
CPInew = 1.3 CPIold
The IC is DECREASED by 40%
ICnew = .6ICold
So the equation is:
ICold CPIold .8CRold .8 .8
------- * --------- * ------- = -------- = ----- = 1.02
.6ICold 1.3CPIold CRold .6 * 1.3 .78
2.2 The question uses percentages of the execution time.
Amdahls law uses execution time.
Unoptimized Percentages of execution time:
ALU = 50%
Loads = 20%
Other = 30%
Optimized Percentages of Execution Time:
ALU = 25%
Loads = 10%
Other = 65% ==> This is the remaining time not used
in the other instructions already listed.
since this is a percentage of program
run-time.
Using the picture method of Amdahl's Law, we get the total
execution time of the unmodifed process to be "A", and the
total execution time of the Modified process to be "B".
Also the Unmodified potrion was the portion that did not
change. Based on this information, the following is true:
x x
.65 = --- ; .30 = ---
B A
The question asks for effective speed up, which is A/B.
well above we have enough information to figure out what
A and B are. Minor Algebra makes the Above equations trivial:
.65B = x ; .30A = x; .65B = .30A; By this we can say:
A .65
- = --- This is the effective Speed up.
B .30
2.3 This is the Normalization question. The setup is simple:
Original program:
ALU ops: 40%
Loads: 20%
Stores: 15%
All others: 25% ==> Not given but not all instructions are of the
Types above
With the updates, 1/4 of the ALU ops do not have loads, so 10%
of the loads disappear, 10% of the ALU ops disappear, but 10% of
the instructions become ALU-L ops. The rest are unaffected. So
the new IC breakdown looks like this:
Upgraded Program:
ALU ops: 30
ALU-L ops: 10
Loads: 10
Stores: 15 ==> No Changes
Other: 25 ==> No Changes
This breakdown, however, does not add up to 100%. So it must be
normalized. The total of the the above is 90, not 100. So it
must be divides out on each of the catagories.
The answer to the question, then, is:
ALU ops: 30/.9 = 33.3%
ALU-L ops: 10/.9 = 11.1%
Loads: 10/.9 = 11.1%
Stores: 15/.9 = 16.7%
Other: 25/.9 = 27.8% ==> Not asked for, but helps to make
sure Total is 100%
Problem 3: Trading Places:
---------------------------
3.1 This question is looking for the type of architechure in terms
of the GPR measure. DLX is (0,3) architechture, and since the
only change was in the Loads, and stores, not in the ALU
instructions, it too is a (0,3) as well.
3.2 This is coding in the new architechture. The only difference
between this and the original DLX is the memory references.
So here is one version of the code:
ADDI R1, R0, #100
LW R2, (R1)
ADDI R1, R1, #4
LW R3, (R1)
ADDI R1, R1, #4
LW R4, (R1)
ADDI R1, R1, #4
LW R5, (R1)
ADD R2, R2, R3
ADD R5, R5, R4
ADDI R1, R0, #200
ADD R2, R2, R5
SW (R1), R2
3.3 Because you need to compute whether or not the branch condition
is true or false to determine whether the branch is taken, you
either need to wait til the EX2 phase is complete, to use the
ALU hardware, or put in new hardware somewhere in the Mem1 phase.
putting condition checking hardware in the MEM1 phase would allow
use of the PC after the clock pulse of Mem1. If you use the
existing hardware the PC would be ready after EX2, or perhaps EX1.
3.4 The hazards are similar to the original DLX model. The potential
hazards are as follows:
Control Hazards
Data Hazards
RAW
WAW
Structural Hazards (Resource Contention)
3.5 CPI will vary in this architechture. Depending on the type of
instructions executed.
IC will be higher in the DLX-LAX, since you do not have the
use of the offset. So instead of using an immediate offset, you
must have an additional instruction to initiate a register with
the offset.
Clock Cycle time will be shorter due to the splitting of the
memory stage.
Problem 4: Forever Young:
-------------------------
4.1 The problem is looking fo rthe Consumer-Producer pairs. Those
pairs are:
Pair Code Fragment
(Ex2, Mem1) ADD R1, R0, R3
SW (R2), R1
3 stalls neccessary.
(Ex2, Ex1) This doesnt happen in this arcitechture
since the ALU Writeback is finished
in time for the Decode phase.
(Mem2, Mem1) LW R2, (R1)
SW (R3), R2
One Stall Neccessary, before entering
Mem1 stage
(Mem2, Ex1) LW R2, (R1)
ADD R2, R2, R3
No Stalls Neccessary.
4.2 This question is fairly simple in terms of the hints given.
With a sequential machine, the clock cycle time is the total of
all the steps, if it is to complete in one clock cycle anyway.
The clock cycle of the piped version is the length of the longest
cycle, plus 1 ns for overhead. Therefore the clock cycle times
for each machine are as follows:
CCseq = 44 ns
CCpipe = 9 ns (ID phase takes 8 ns, plus 1 for overhead)
Next is the execution time. Well the equation for execution time
is the same for each machine, and is:
EQseq = IC * CCseq * CPIseq
EQpipe = IC * CCpipe * CPIpipe
In the hint, it says that the CPI for the sequential machine is 1,
and in the piped version, the CPI approaches its ideal value,
which is also 1. So as the CPI of the piped machine approaches
its ideal state, and with the above info, we get teh following:
CCseq = 44 ns ==> From above
CCpipe = 9 ns ==> From above
CPIseq = 1
CPIpipe = 1 ==> it is approaching this value
ICseq = ICpipe ==> There is no change in the language, just
the machine platform.
With this information, and given the rules for effective speed up,
the following is the answer to the question:
EQseq IC CCseq CPIseq 1 44 1 44
------ = -- * ------ * ------- = - * -- * - = --
EQpipe IC CCpipe CPIpipe 1 9 1 9
Back