Question 1: Amdahl’s Law

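$$\text{Speedup}_{\text{overall}} = \frac{\text{Exec.Time}_{\text{old}}}{\text{Exec.Time}_{\text{new}}} = \frac{1}{(1 - \text{fraction}_{\text{enhanced}}) + \dfrac{\text{fraction}_{\text{enhanced}}}{\text{speedup}_{\text{enhanced}}}}$$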
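As a quick numerical companion to the formula above, here is a minimal Python sketch of Amdahl's Law. The function name `amdahl_speedup` and the sample inputs are illustrative assumptions, not part of the original problem.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of execution time is enhanced."""
    unenhanced_part = 1.0 - fraction_enhanced
    enhanced_part = fraction_enhanced / speedup_enhanced
    return 1.0 / (unenhanced_part + enhanced_part)


# Hypothetical inputs, chosen only for illustration:
# half of the execution time is made twice as fast.
print(amdahl_speedup(0.5, 2.0))  # about 1.33
```

Note that the result stays below 2 even though the enhanced part runs twice as fast, because the unenhanced half still takes its full time.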

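$\text{Speedup}_{\text{overall}} = 1.7$

$\text{speedup}_{\text{enhanced}} = x$

$$1.7 = \frac{1}{(1 - \text{fraction}_{\text{enhanced}}) + \dfrac{\text{fraction}_{\text{enhanced}}}{x}}$$

$$1.7 = \frac{x}{x - x \cdot \text{fraction}_{\text{enhanced}} + \text{fraction}_{\text{enhanced}}}$$

$$1.7x - 1.7x \cdot \text{fraction}_{\text{enhanced}} + 1.7 \cdot \text{fraction}_{\text{enhanced}} = x$$

$$-1.7x \cdot \text{fraction}_{\text{enhanced}} + 1.7 \cdot \text{fraction}_{\text{enhanced}} = x - 1.7x$$

$$-x \cdot \text{fraction}_{\text{enhanced}} + \text{fraction}_{\text{enhanced}} = \frac{x - 1.7x}{1.7}$$

$$\text{fraction}_{\text{enhanced}} = \frac{x - 1.7x}{1.7(1 - x)}$$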

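$\text{fraction}_{\text{enhanced}} = 0.50$

$\text{Speedup}_{\text{overall}} = 1.25$

$\text{speedup}_{\text{enhanced}} = x$

$$1.25 = \frac{1}{(1 - 0.50) + \dfrac{0.50}{x}}$$

$$1.25 = \frac{x}{0.50x + 0.50}$$

$$x = \frac{1.25(0.50)}{1 - 1.25(0.50)}$$

$$x = 1.667$$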
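The last step above can be checked by solving Amdahl's Law for the enhanced speedup directly. The sketch below is one way to do that. The helper name `required_enhanced_speedup` is a hypothetical choice, and the closed form is obtained by rearranging the general formula.

```python
def required_enhanced_speedup(overall: float, fraction: float) -> float:
    """Solve Amdahl's Law for speedup_enhanced.

    Rearranged from overall = 1 / ((1 - fraction) + fraction / x):
        x = overall * fraction / (1 - overall * (1 - fraction))
    """
    denominator = 1.0 - overall * (1.0 - fraction)
    if denominator <= 0:
        raise ValueError("this overall speedup cannot be reached with this fraction")
    return overall * fraction / denominator


x = required_enhanced_speedup(overall=1.25, fraction=0.50)
print(round(x, 3))  # 1.667
```

The guard reflects the upper bound of Amdahl's Law: with a given enhanced fraction, the overall speedup can never reach 1 / (1 - fraction).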

Question 2: CPI

Information Given:

Perfect Cache:

Type	Frequency	Number of Clock Cycles
ALU Operations	30%	1
Loads	30%	2
Stores	20%	2
Branches	20%	2

Imperfect Cache:

Instruction Fetch Miss Rate = 5%

Load/Store Miss Rate = 90%

Miss Penalty = 40 clock cycles

(a) CPI for Each Instruction Type:

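$\text{CPI} = \text{CPI}_{\text{perfect}} + \text{CPI}_{\text{stall}}$

$\text{CPI} = \text{CPI}_{\text{perfect}} + (\text{Miss Rate} \times \text{Miss Penalty})$

Every instruction is subject to the instruction fetch miss rate. Loads and stores also access data memory, so their miss rate is the sum of the instruction fetch and load/store miss rates.

$\text{CPI}_{\text{ALUops}} = 1 + (0.05 \times 40) = 3$

$\text{CPI}_{\text{Loads}} = 2 + [(0.05 + 0.90) \times 40] = 40$

$\text{CPI}_{\text{Stores}} = 2 + [(0.05 + 0.90) \times 40] = 40$

$\text{CPI}_{\text{Branches}} = 2 + (0.05 \times 40) = 4$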
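The four per-type values in part (a) can be reproduced with a short Python sketch. The dictionary layout and the names `base_cpi`, `accesses_data`, and `cpi_with_misses` are assumptions made for illustration. The numbers are the ones given in the problem.

```python
MISS_PENALTY = 40          # clock cycles
IFETCH_MISS_RATE = 0.05    # applies to every instruction
DATA_MISS_RATE = 0.90      # applies to loads and stores only

# Cycles per instruction with a perfect cache.
base_cpi = {"alu": 1, "load": 2, "store": 2, "branch": 2}
# Which instruction types also make a data memory access.
accesses_data = {"alu": False, "load": True, "store": True, "branch": False}


def cpi_with_misses(kind: str) -> float:
    """Perfect-cache CPI plus the stall cycles caused by cache misses."""
    miss_rate = IFETCH_MISS_RATE
    if accesses_data[kind]:
        miss_rate += DATA_MISS_RATE
    return base_cpi[kind] + miss_rate * MISS_PENALTY


cpi = {kind: cpi_with_misses(kind) for kind in base_cpi}
for kind, value in cpi.items():
    print(f"{kind}: {value:.1f}")
```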

(b) CPI for the Overall Machine:

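$$\text{CPI}_{\text{overall}} = \sum_{i=1}^{4} \text{CPI}_i \times \frac{\text{IC}_i}{\text{IC}_{\text{total}}}$$

$$\text{CPI}_{\text{overall}} = \sum_{i=1}^{4} \text{CPI}_i \times (\text{Relative Frequency})_i$$

$\text{CPI}_{\text{overall}} = (\text{CPI}_{\text{ALUops}} \times 0.30) + (\text{CPI}_{\text{Loads}} \times 0.30) + (\text{CPI}_{\text{Stores}} \times 0.20) + (\text{CPI}_{\text{Branches}} \times 0.20)$

$\text{CPI}_{\text{overall}} = (3 \times 0.30) + (40 \times 0.30) + (40 \times 0.20) + (4 \times 0.20)$

$\text{CPI}_{\text{overall}} = 21.70$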
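The weighted sum in part (b) is a frequency-weighted average, which the following Python sketch reproduces. The function name `weighted_cpi` and the dictionary keys are illustrative assumptions. The CPI values and frequencies are taken from the solution above.

```python
def weighted_cpi(cpi_by_type: dict, freq_by_type: dict) -> float:
    """Average CPI: each type's CPI weighted by its relative frequency."""
    total_freq = sum(freq_by_type.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("relative frequencies must sum to 1")
    return sum(cpi_by_type[kind] * freq_by_type[kind] for kind in cpi_by_type)


freq = {"alu": 0.30, "load": 0.30, "store": 0.20, "branch": 0.20}
cpi_imperfect = {"alu": 3, "load": 40, "store": 40, "branch": 4}

cpi_overall = weighted_cpi(cpi_imperfect, freq)
print(f"{cpi_overall:.2f}")  # 21.70
```

The same function applies to any other set of per-type CPI values, as long as the frequencies describe the same instruction mix.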

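(c) Speed of Machine with NO Cache Misses:

$$\text{CPI}_{\text{perfect}} = \sum_{i=1}^{4} \text{CPI}_{i,\text{perfect}} \times (\text{Relative Frequency})_i$$

$\text{CPI}_{\text{perfect}} = 1 \times 0.30 + 2 \times 0.30 + 2 \times 0.20 + 2 \times 0.20$

$\text{CPI}_{\text{perfect}} = 1.7$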

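$$\text{Speedup}_{\text{with perfect cache}} = \frac{\text{Exec.Time}_{\text{overall}}}{\text{Exec.Time}_{\text{perfect}}} = \frac{\text{IC} \times \text{CPI}_{\text{overall}} \times \text{CCT}}{\text{IC} \times \text{CPI}_{\text{perfect}} \times \text{CCT}}$$

$$\text{Speedup}_{\text{with perfect cache}} = \frac{\text{IC} \times 21.7 \times \text{CCT}}{\text{IC} \times 1.7 \times \text{CCT}} = 12.76$$

Question 3: Tradeoffs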

Information Given: A tradeoff takes place between the Instruction Count (IC) and the Cycles per Instruction (CPI).
Decreasing the number of instructions increases the CPI, since 1-2 clock cycles are added per instruction.
$\text{CPI}_{\text{perfect}} = 1.7$ from the previous question.

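$\text{CPI}_{\text{ALUops}} = 0.20(1) + 0.15(2) + 0.10(1) + 1 = 1.6$ cycles

$\text{CPI}_{\text{Loads}} = 2$ cycles

$\text{CPI}_{\text{Stores}} = 2$ cycles

$\text{CPI}_{\text{Branches}} = (2 + n)$ cycles

$\text{IC}_{\text{ALUops}} = 0.30$

$\text{IC}_{\text{Loads}} = 0.30 - 0.30(0.20)(1) - 0.30(0.15)(2) = 0.15$

$\text{IC}_{\text{Stores}} = 0.20 - 0.30(0.10)(1) = 0.17$

$\text{IC}_{\text{Branches}} = 0.20$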
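The new ALU CPI and the reduced load and store counts above follow one pattern, which this Python sketch makes explicit. The reading of each term is an assumption inferred from the arithmetic: a share of ALU operations absorbs one or two memory operations, each of which adds one cycle and removes one load or store. All variable names are hypothetical.

```python
# Original instruction mix, as fractions of the original instruction count.
alu_freq, load_freq, store_freq, branch_freq = 0.30, 0.30, 0.20, 0.20

# Assumed reading: (share of ALU ops affected, memory ops folded into each).
# Each folded memory op adds one cycle to the ALU op and removes one
# load or store from the instruction stream.
folded_loads = [(0.20, 1), (0.15, 2)]
folded_stores = [(0.10, 1)]

# Average extra cycles per ALU operation, added to its base CPI of 1.
extra_cycles = sum(share * k for share, k in folded_loads + folded_stores)
cpi_alu = 1 + extra_cycles

# Instruction counts relative to the original total.
ic_alu = alu_freq
ic_load = load_freq - alu_freq * sum(share * k for share, k in folded_loads)
ic_store = store_freq - alu_freq * sum(share * k for share, k in folded_stores)
ic_branch = branch_freq

print(round(cpi_alu, 2), round(ic_load, 2), round(ic_store, 2))  # 1.6 0.15 0.17
```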

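Must find n = the maximum tolerable increase in the clock cycle count for branches that still yields a design performing at least as well as the original one.

$\text{CPU Time}_{\text{perfect}} = \text{CPU Time}_{\text{tradeoff}}$

$$1.70 \times \text{IC}_{\text{total}} = (\text{CPI}_{\text{ALUops}} \times \text{IC}_{\text{ALUops}}) + (\text{CPI}_{\text{Loads}} \times \text{IC}_{\text{Loads}}) + (\text{CPI}_{\text{Stores}} \times \text{IC}_{\text{Stores}}) + (\text{CPI}_{\text{Branches}} \times \text{IC}_{\text{Branches}})$$

$$1.70 \times \text{IC}_{\text{total}} = [(0.30 \times 1.6) + (0.15 \times 2) + (0.17 \times 2) + (0.20 \times (2 + n))] \times \text{IC}_{\text{total}}$$

$$1.70 = 1.52 + 0.20n$$

$$n = 0.9$$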
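The break-even value of n can be checked by isolating the branch term in the equation above. This is a minimal Python sketch. The variable names are illustrative, and the frequencies are fractions of the original instruction count, as in the solution.

```python
CPI_PERFECT = 1.70  # original design, from the previous question

# (relative instruction count, CPI) for the types that do not depend on n.
fixed_types = {"alu": (0.30, 1.6), "load": (0.15, 2), "store": (0.17, 2)}
branch_freq, branch_base_cpi = 0.20, 2

# Cycles per original instruction contributed by everything except the extra n.
fixed_cycles = sum(freq * cpi for freq, cpi in fixed_types.values())
branch_cycles = branch_freq * branch_base_cpi

# Break-even: CPI_PERFECT = fixed_cycles + branch_freq * (branch_base_cpi + n)
n = (CPI_PERFECT - fixed_cycles - branch_cycles) / branch_freq
print(round(n, 2))  # 0.9
```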

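Must determine x = the maximum tolerable percent increase in clock cycle time that still yields a design performing no worse than the original design.

$\text{CPU Exec Time}_{\text{old}} = \text{CPU Exec Time}_{\text{new}}$

$$1.7 \times \text{IC}_{\text{total}} \times \text{CCT} = [(\text{CPI}_{\text{ALUops}} \times \text{IC}_{\text{ALUops}}) + (\text{CPI}_{\text{Loads}} \times \text{IC}_{\text{Loads}}) + (\text{CPI}_{\text{Stores}} \times \text{IC}_{\text{Stores}}) + (\text{CPI}_{\text{Branches}} \times \text{IC}_{\text{Branches}})] \times \text{CCT} \times (1 + x)$$

$$1.7 \times \text{IC}_{\text{total}} \times \text{CCT} = [(0.30 \times 1.6) + (0.15 \times 2) + (0.17 \times 2) + (0.20 \times 2)] \times \text{IC}_{\text{total}} \times \text{CCT} \times (1 + x)$$

$$1.7 = 1.52 \times (1 + x)$$

$$x = 11.84\%$$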
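The same break-even reasoning gives x as a ratio of the two cycle counts per original instruction. The sketch below assumes, as the solution does, that the instruction count and the old clock cycle time cancel on both sides. The variable names are illustrative.

```python
cpi_old = 1.70

# Cycles per original instruction for the new design (branch CPI unchanged at 2).
cycles_new = (0.30 * 1.6) + (0.15 * 2) + (0.17 * 2) + (0.20 * 2)

# Break-even: cpi_old = cycles_new * (1 + x)
x = cpi_old / cycles_new - 1
print(f"{x:.2%}")  # 11.84%
```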

Question 4: More Tradeoffs

PART (a)

Ideal Machine:

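$\text{CPU Execution Time} = (\text{CPU Clock Cycles} + \text{Memory Stall Cycles}) \times \text{Clock Cycle}$

$= (\text{IC} \times \text{CPI} + 0) \times \text{Clock Cycle}$

$= \text{IC} \times 2 \times \text{Clock Cycle}$

$\text{Memory Stall Cycles} = \text{IC} \times \text{Memory References per Instruction} \times \text{Miss Rate} \times \text{Miss Penalty}$

$\text{Memory Stall Cycles} = \text{IC} \times (1 \times 0.025 \times 40 + 0.4 \times 0.01 \times 40)$

$= \text{IC} \times 1.16$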

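$\text{CPU Execution Time}_{\text{cache}} = (\text{CPU Clock Cycles} + \text{Memory Stall Cycles}) \times \text{Clock Cycle}$

$= \text{IC} \times (2 + 1.16) \times \text{Clock Cycle}$

$= \text{IC} \times 3.16 \times \text{Clock Cycle}$

Performance Ratio:

$$\text{Performance Ratio} = \frac{\text{CPU Execution Time}_{\text{cache}}}{\text{CPU Execution Time}_{\text{ideal}}} = \frac{3.16}{2} = 1.58$$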
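Part (a) can be reproduced numerically with a small Python sketch. The helper name `stall_cycles_per_instruction` and the tuple layout are assumptions for illustration. The rates and penalties are the ones used in the solution above.

```python
def stall_cycles_per_instruction(streams) -> float:
    """Sum of references per instruction * miss rate * miss penalty."""
    return sum(refs * miss_rate * penalty for refs, miss_rate, penalty in streams)


BASE_CPI = 2

# (references per instruction, miss rate, miss penalty in cycles)
instruction_fetch = (1.0, 0.025, 40)
data_access = (0.4, 0.01, 40)

stalls = stall_cycles_per_instruction([instruction_fetch, data_access])
cpi_with_cache = BASE_CPI + stalls
performance_ratio = cpi_with_cache / BASE_CPI

print(round(stalls, 2), round(cpi_with_cache, 2), round(performance_ratio, 2))
# 1.16 3.16 1.58
```

Since the instruction count and clock cycle are the same for both machines, the execution time ratio reduces to the ratio of the two CPI values.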

PART (b)

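Number of Non Loads/Stores: $\text{IC}_{\text{old}} \times 0.6$

Number of Loads/Stores: $\text{IC}_{\text{old}} \times 0.4(1 - 0.3) = 0.28 \times \text{IC}_{\text{old}}$

Total $\text{IC}_{\text{new}} = 0.88 \times \text{IC}_{\text{old}}$

$\text{Memory Stall Cycles} = \text{Instruction Misses} + \text{Data Misses}$

$= [\text{IC}_{\text{new}} \times \text{Instruction Miss Rate} \times \text{Instruction Miss Penalty}] + [\text{Number of Loads/Stores} \times \text{Data Miss Rate} \times \text{Data Miss Penalty}]$

$= \text{IC}_{\text{old}} \times 0.88 \times 0.025 \times 40 + \text{IC}_{\text{old}} \times 0.28 \times 0.01 \times 40$

$= 0.88 \times \text{IC}_{\text{old}} + 0.112 \times \text{IC}_{\text{old}}$

$= 0.992 \times \text{IC}_{\text{old}}$

$\text{CPU Execution Time}_{\text{new}} = (0.88 \times 2 \times \text{IC}_{\text{old}} + 0.992 \times \text{IC}_{\text{old}}) \times \text{CC}_{\text{old}} \times 1.05$

$= \text{IC}_{\text{old}} \times 2.8896 \times \text{CC}_{\text{old}}$

$$\text{Speedup} = \frac{\text{CPU Execution Time}_{\text{old}}}{\text{CPU Execution Time}_{\text{new}}} = \frac{\text{IC}_{\text{old}} \times 3.16 \times \text{CC}_{\text{old}}}{\text{IC}_{\text{old}} \times 2.8896 \times \text{CC}_{\text{old}}} = 1.094$$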
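The chain of steps in part (b) can be verified with the Python sketch below, with every quantity expressed per original instruction. The variable names are hypothetical. The interpretation that 30% of loads and stores are removed and that the clock cycle grows by 5% is inferred from the factors (1 - 0.3) and 1.05 in the solution.

```python
BASE_CPI = 2
LOAD_STORE_FRACTION = 0.4     # share of loads/stores in the old mix
REMOVED_FRACTION = 0.3        # assumed: share of loads/stores eliminated
CLOCK_CYCLE_FACTOR = 1.05     # assumed: new clock cycle is 5% longer
INSTR_MISS_RATE, DATA_MISS_RATE, MISS_PENALTY = 0.025, 0.01, 40

# Instruction counts relative to the old instruction count.
loads_stores_new = LOAD_STORE_FRACTION * (1 - REMOVED_FRACTION)
ic_new = (1 - LOAD_STORE_FRACTION) + loads_stores_new

# Memory stall cycles per old instruction.
stalls_new = (ic_new * INSTR_MISS_RATE * MISS_PENALTY
              + loads_stores_new * DATA_MISS_RATE * MISS_PENALTY)

# Execution times in units of (old instruction count * old clock cycle).
time_new = (ic_new * BASE_CPI + stalls_new) * CLOCK_CYCLE_FACTOR
time_old = BASE_CPI + 1.16    # 3.16, from part (a)

speedup = time_old / time_new
print(round(ic_new, 2), round(stalls_new, 3), round(time_new, 4), round(speedup, 3))
# 0.88 0.992 2.8896 1.094
```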