Question 1: Amdahl’s Law

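$$\text{Speedup}_{\text{overall}} = \frac{\text{Exec. Time}_{\text{old}}}{\text{Exec. Time}_{\text{new}}} = \frac{1}{(1 - \text{fraction}_{\text{enhanced}}) + \dfrac{\text{fraction}_{\text{enhanced}}}{\text{speedup}_{\text{enhanced}}}}$$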
 
 

    a.

    $\text{Speedup}_{\text{overall}} = 1.7$

    $\text{speedup}_{\text{enhanced}} = x$
     
     

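    $$1.7 = \frac{1}{(1 - \text{fraction}_{\text{enhanced}}) + \dfrac{\text{fraction}_{\text{enhanced}}}{x}}$$

    $$1.7 = \frac{x}{x - x \cdot \text{fraction}_{\text{enhanced}} + \text{fraction}_{\text{enhanced}}}$$

    $$1.7x - 1.7x \cdot \text{fraction}_{\text{enhanced}} + 1.7 \cdot \text{fraction}_{\text{enhanced}} = x$$

    $$-1.7x \cdot \text{fraction}_{\text{enhanced}} + 1.7 \cdot \text{fraction}_{\text{enhanced}} = x - 1.7x$$

    $$-x \cdot \text{fraction}_{\text{enhanced}} + \text{fraction}_{\text{enhanced}} = \frac{x - 1.7x}{1.7}$$

    $$\text{fraction}_{\text{enhanced}} = \frac{x - 1.7x}{1.7(1 - x)} = \frac{0.7x}{1.7(x - 1)}$$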
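
As a quick numerical check of the closed form above, the following minimal Python sketch plugs the solved fraction back into Amdahl's Law. The function names and the sample value x = 10 are illustrative assumptions, not part of the original problem.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup from Amdahl's Law."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def fraction_for_target(overall: float, speedup_enhanced: float) -> float:
    """Fraction that must be enhanced to reach `overall`, solved from Amdahl's Law."""
    x = speedup_enhanced
    if x == 1.0:
        raise ValueError("an enhancement with speedup 1 cannot change the overall speedup")
    return (x - overall * x) / (overall * (1.0 - x))


# x = 10 is a hypothetical enhancement speedup chosen only for illustration.
fraction = fraction_for_target(1.7, 10.0)
print(round(fraction, 4))
# Substituting the fraction back should reproduce the target overall speedup.
print(round(amdahl_speedup(fraction, 10.0), 4))
```

Substituting the result back into the original formula is a cheap way to catch sign or coefficient slips in the algebra.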

    b.

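$\text{fraction}_{\text{enhanced}} = 0.50$

$\text{Speedup}_{\text{overall}} = 1.25$

$\text{speedup}_{\text{enhanced}} = x$

$$1.25 = \frac{1}{(1 - 0.50) + \dfrac{0.50}{x}}$$

$$1.25 = \frac{x}{0.50x + 0.50}$$

$$x = \frac{1.25(0.50)}{1 - 1.25(0.50)}$$

$$x = 1.667$$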
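
The same rearrangement can be sketched in Python for any fraction and target speedup. The helper name `required_speedup` is an assumption introduced for illustration; it solves Amdahl's Law for the enhancement speedup and rejects targets beyond the Amdahl limit 1 / (1 - fraction).

```python
def required_speedup(overall: float, fraction_enhanced: float) -> float:
    """Enhancement speedup x needed to reach `overall`, from Amdahl's Law."""
    denominator = 1.0 - overall * (1.0 - fraction_enhanced)
    if denominator <= 0:
        # The target meets or exceeds the limit 1 / (1 - fraction_enhanced).
        raise ValueError("target overall speedup is unreachable for this fraction")
    return overall * fraction_enhanced / denominator


x = required_speedup(1.25, 0.50)
print(round(x, 3))
```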
 
 

Question 2: CPI

Information Given:

Perfect Cache:

Type             Frequency   Number of Clock Cycles
ALU Operations   30%         1
Loads            30%         2
Stores           20%         2
Branches         20%         2

Imperfect Cache:

Instruction Fetch Miss Rate = 5%

Load/Store Miss Rate = 90%

Miss Penalty = 40 clock cycles

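(a) CPI for Each Instruction Type:

$$\text{CPI} = \text{CPI}_{\text{perfect}} + \text{CPI}_{\text{stall}}$$

$$\text{CPI} = \text{CPI}_{\text{perfect}} + (\text{Miss Rate} \times \text{Miss Penalty})$$

$$\text{CPI}_{\text{ALUops}} = 1 + (0.05 \times 40) = 3$$

$$\text{CPI}_{\text{Loads}} = 2 + [(0.05 + 0.90) \times 40] = 40$$

$$\text{CPI}_{\text{Stores}} = 2 + [(0.05 + 0.90) \times 40] = 40$$

$$\text{CPI}_{\text{Branches}} = 2 + (0.05 \times 40) = 4$$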
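
The per-type values above can be reproduced with a short Python sketch. The dictionary layout and names are illustrative assumptions; the miss rates and penalty are the ones given in this question, with the data miss rate applied only to loads and stores.

```python
MISS_PENALTY = 40         # clock cycles per miss
FETCH_MISS_RATE = 0.05    # applies to every instruction fetch
DATA_MISS_RATE = 0.90     # applies to loads and stores only, as given above

base_cpi = {"alu": 1, "load": 2, "store": 2, "branch": 2}
accesses_data = {"alu": False, "load": True, "store": True, "branch": False}


def cpi_with_misses(kind: str) -> float:
    """Perfect-cache CPI plus the average stall cycles per instruction."""
    miss_rate = FETCH_MISS_RATE + (DATA_MISS_RATE if accesses_data[kind] else 0.0)
    return base_cpi[kind] + miss_rate * MISS_PENALTY


cpi = {kind: cpi_with_misses(kind) for kind in base_cpi}
for kind, value in cpi.items():
    print(kind, round(value, 2))
```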
 
 

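(b) CPI for the Overall Machine:

$$\text{CPI}_{\text{overall}} = \sum_{i=1}^{4} \frac{\text{CPI}_i \times \text{IC}_i}{\text{IC}_{\text{total}}}$$

$$\text{CPI}_{\text{overall}} = \sum_{i=1}^{4} \text{CPI}_i \times (\text{Relative Frequency})_i$$

$$\text{CPI}_{\text{overall}} = (\text{CPI}_{\text{ALUops}} \times 0.30) + (\text{CPI}_{\text{Loads}} \times 0.30) + (\text{CPI}_{\text{Stores}} \times 0.20) + (\text{CPI}_{\text{Branches}} \times 0.20)$$

$$\text{CPI}_{\text{overall}} = (3 \times 0.30) + (40 \times 0.30) + (40 \times 0.20) + (4 \times 0.20)$$

$$\text{CPI}_{\text{overall}} = 21.70$$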
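
The weighted sum can be checked with a few lines of Python. This is a minimal sketch; the function name `weighted_cpi` is an assumption, and the per-type CPI values are the results of part (a).

```python
cpi_by_type = {"alu": 3, "load": 40, "store": 40, "branch": 4}
frequency = {"alu": 0.30, "load": 0.30, "store": 0.20, "branch": 0.20}


def weighted_cpi(cpi: dict, freq: dict) -> float:
    """Overall CPI as the frequency-weighted average of per-type CPIs."""
    if abs(sum(freq.values()) - 1.0) > 1e-9:
        raise ValueError("relative frequencies must sum to 1")
    return sum(cpi[kind] * freq[kind] for kind in cpi)


cpi_overall = weighted_cpi(cpi_by_type, frequency)
print(round(cpi_overall, 2))
```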
 
 

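(c) Speed of Machine with NO Cache Misses

$$\text{CPI}_{\text{perfect}} = \sum_{i=1}^{4} (\text{CPI}_{i,\text{perfect}} \times \text{Relative Frequency}_i)$$

$$\text{CPI}_{\text{perfect}} = 1 \times 0.30 + 2 \times 0.30 + 2 \times 0.20 + 2 \times 0.20$$

$$\text{CPI}_{\text{perfect}} = 1.7$$

$$\text{Speedup}_{\text{with perfect cache}} = \frac{\text{Exec. Time}_{\text{overall}}}{\text{Exec. Time}_{\text{perfect}}} = \frac{\text{IC} \times \text{CPI}_{\text{overall}} \times \text{CCT}}{\text{IC} \times \text{CPI}_{\text{perfect}} \times \text{CCT}}$$

$$\text{Speedup}_{\text{with perfect cache}} = \frac{\text{IC} \times 21.7 \times \text{CCT}}{\text{IC} \times 1.7 \times \text{CCT}} = 12.76$$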
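
Because IC and CCT cancel, the speedup reduces to a ratio of CPIs. A minimal Python sketch of that ratio follows, with variable names chosen for illustration and the overall CPI taken from part (b).

```python
frequency = {"alu": 0.30, "load": 0.30, "store": 0.20, "branch": 0.20}
cpi_perfect_by_type = {"alu": 1, "load": 2, "store": 2, "branch": 2}

# Perfect-cache CPI: frequency-weighted average with no stall cycles.
cpi_perfect = sum(cpi_perfect_by_type[k] * frequency[k] for k in frequency)

cpi_overall = 21.7  # result of part (b), with cache misses included

# IC and clock cycle time are identical for both machines, so they cancel.
speedup = cpi_overall / cpi_perfect
print(round(cpi_perfect, 2), round(speedup, 2))
```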

Question 3: Tradeoffs

Information Given: A tradeoff takes place between the Instruction Count (IC) and the Cycles per Instruction (CPI).
Decreasing the number of instructions increases the CPI, since 1-2 clock cycles are added per instruction.
$\text{CPI}_{\text{perfect}} = 1.7$ from the previous question.
 
 
 
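$\text{CPI}_{\text{ALUops}} = 0.20(1) + 0.15(2) + 0.10(1) + 1 = 1.6$ cycles

$\text{CPI}_{\text{Load}} = 2$ cycles

$\text{CPI}_{\text{Store}} = 2$ cycles

$\text{CPI}_{\text{Branches}} = (2 + n)$ cycles

$\text{IC}_{\text{ALUops}} = 0.30$

$\text{IC}_{\text{Load}} = 0.30 - 0.30(0.20)(1) - 0.30(0.15)(2) = 0.15$

$\text{IC}_{\text{Stores}} = 0.20 - 0.30(0.10)(1) = 0.17$

$\text{IC}_{\text{Branches}} = 0.20$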
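
The arithmetic above can be re-derived with a small Python sketch. The coefficients are copied from the worked lines; their interpretation (a share of ALU operations absorbing one or two memory operations, each costing one extra cycle) is an assumption, since the problem statement is not restated here, and all names are illustrative.

```python
ALU_SHARE = 0.30  # ALU operations as a fraction of the original instruction count

# (share of ALU ops, memory ops absorbed per ALU op, kind of memory op).
# Assumption: each absorbed memory operation adds one cycle to the ALU op.
absorbed = [
    (0.20, 1, "load"),
    (0.15, 2, "load"),
    (0.10, 1, "store"),
]

# New ALU CPI: base cycle plus the average number of added cycles.
cpi_alu = 1 + sum(share * count for share, count, _ in absorbed)


def removed(kind: str) -> float:
    """Memory instructions of `kind` removed, as a fraction of the original IC."""
    return ALU_SHARE * sum(s * c for s, c, k in absorbed if k == kind)


ic_load = 0.30 - removed("load")
ic_store = 0.20 - removed("store")
print(round(cpi_alu, 2), round(ic_load, 2), round(ic_store, 2))
```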
 
 

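    1. Find n, the maximum tolerable increase in the clock cycle count for branches such that the new design still performs no worse than the original one.

      $$\text{CPU Time}_{\text{perfect}} = \text{CPU Time}_{\text{tradeoff}}$$

      $$1.70 \times \text{IC}_{\text{total}} = (\text{CPI}_{\text{ALUops}} \times \text{IC}_{\text{ALUops}}) + (\text{CPI}_{\text{Loads}} \times \text{IC}_{\text{Loads}}) + (\text{CPI}_{\text{Stores}} \times \text{IC}_{\text{Stores}}) + (\text{CPI}_{\text{Branches}} \times \text{IC}_{\text{Branches}})$$

      $$1.70 \times \text{IC}_{\text{total}} = [(0.30 \times 1.6) + (0.15 \times 2) + (0.17 \times 2) + (0.20 \times \{2 + n\})] \times \text{IC}_{\text{total}}$$

      $$1.70 = 1.52 + 0.20n$$

      $$n = 0.9$$

    2. Find x, the maximum tolerable percent increase in clock cycle time such that the new design still performs no worse than the original design.

      $$\text{CPU Exec Time}_{\text{old}} = \text{CPU Exec Time}_{\text{new}}$$

      $$1.7 \times \text{IC}_{\text{total}} \times \text{Clock Cycle} = [(\text{CPI}_{\text{ALUops}} \times \text{IC}_{\text{ALUops}}) + (\text{CPI}_{\text{Loads}} \times \text{IC}_{\text{Loads}}) + (\text{CPI}_{\text{Stores}} \times \text{IC}_{\text{Stores}}) + (\text{CPI}_{\text{Branches}} \times \text{IC}_{\text{Branches}})] \times \text{Clock Cycle} \times (1 + x)$$

      $$1.7 \times \text{IC}_{\text{total}} = [(0.30 \times 1.6) + (0.15 \times 2) + (0.17 \times 2) + (0.20 \times 2)] \times \text{IC}_{\text{total}} \times (1 + x)$$

      $$1 + x = \frac{1.7}{1.52} = 1.1184$$

      $$x = 11.84\%$$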
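
Both break-even values can be checked with a minimal Python sketch. The variable names are illustrative assumptions; the instruction mix and CPI values are the ones computed earlier in this question, with instruction counts expressed as fractions of the original count.

```python
cpi_original = 1.7  # perfect-cache CPI of the original design

# kind: (instruction count as a fraction of the original IC, CPI without n)
mix = {
    "alu": (0.30, 1.6),
    "load": (0.15, 2),
    "store": (0.17, 2),
    "branch": (0.20, 2),
}

# Cycles per original instruction in the new design, before any branch penalty.
cycles_new = sum(ic * cpi for ic, cpi in mix.values())

# Break-even extra branch cycles: cpi_original = cycles_new + ic_branch * n.
n = (cpi_original - cycles_new) / mix["branch"][0]

# Break-even clock cycle stretch: cpi_original = cycles_new * (1 + x).
x = cpi_original / cycles_new - 1

print(round(cycles_new, 2), round(n, 2), f"{x:.2%}")
```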
 
 

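Question 4: More Tradeoffs

PART (a)

Ideal Machine:

$$\text{CPU Execution Time} = (\text{CPU Clock Cycles} + \text{Memory Stall Cycles}) \times \text{Clock Cycle}$$

$$= (\text{IC} \times \text{CPI} + 0) \times \text{Clock Cycle}$$

$$= \text{IC} \times 2 \times \text{Clock Cycle}$$

$$\text{Memory Stall Cycles} = \text{IC} \times \text{Memory References per Instruction} \times \text{Miss Rate} \times \text{Miss Penalty}$$

$$\text{Memory Stall Cycles} = \text{IC} \times (1 \times 0.025 \times 40 + 0.4 \times 0.01 \times 40)$$

$$= \text{IC} \times 1.16$$

$$\text{CPU Execution Time (Cache)} = (\text{CPU Clock Cycles} + \text{Memory Stall Cycles}) \times \text{Clock Cycle}$$

$$= (\text{IC} \times [2 + 1.16]) \times \text{Clock Cycle}$$

$$= \text{IC} \times 3.16 \times \text{Clock Cycle}$$

Performance Ratio:

$$\text{Performance Ratio} = \frac{\text{CPU Execution Time (Cache)}}{\text{CPU Execution Time}} = \frac{3.16}{2} = 1.58$$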
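
The stall-cycle calculation can be sketched in Python as follows. The function and parameter names are assumptions made for illustration; the rates and penalty are the values used in the lines above, with one instruction fetch and 0.4 data references per instruction.

```python
def stall_cycles_per_instruction(
    instr_miss_rate: float,
    data_refs_per_instr: float,
    data_miss_rate: float,
    miss_penalty: float,
) -> float:
    """Average memory stall cycles per instruction (one fetch per instruction)."""
    instruction_stalls = 1.0 * instr_miss_rate * miss_penalty
    data_stalls = data_refs_per_instr * data_miss_rate * miss_penalty
    return instruction_stalls + data_stalls


base_cpi = 2
stalls = stall_cycles_per_instruction(0.025, 0.4, 0.01, 40)
effective_cpi = base_cpi + stalls

# IC and clock cycle are the same on both machines, so the ratio is CPI-only.
performance_ratio = effective_cpi / base_cpi
print(round(stalls, 2), round(effective_cpi, 2), round(performance_ratio, 2))
```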

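PART (b)

Number of Non Loads/Stores: $\text{IC}_{\text{old}} \times 0.6$

Number of Loads/Stores: $\text{IC}_{\text{old}} \times 0.4 \times (1 - 0.3) = 0.28 \times \text{IC}_{\text{old}}$

Total new instruction count: $\text{IC}_{\text{new}} = 0.88 \times \text{IC}_{\text{old}}$

$$\text{Memory Stall Cycles} = \text{Instruction Misses} + \text{Data Misses}$$

$$= [\text{IC}_{\text{new}} \times \text{Instruction Miss Rate} \times \text{Instruction Miss Penalty}] + [\text{Number of Loads/Stores} \times \text{Data Miss Rate} \times \text{Data Miss Penalty}]$$

$$= \text{IC}_{\text{old}} \times 0.88 \times 0.025 \times 40 + \text{IC}_{\text{old}} \times 0.28 \times 0.01 \times 40$$

$$= 0.88 \times \text{IC}_{\text{old}} + 0.112 \times \text{IC}_{\text{old}}$$

$$= 0.992 \times \text{IC}_{\text{old}}$$

$$\text{CPU Execution Time} = (0.88 \times 2 \times \text{IC}_{\text{old}} + 0.992 \times \text{IC}_{\text{old}}) \times \text{CC}_{\text{old}} \times 1.05$$

$$= \text{IC}_{\text{old}} \times 2.8896 \times \text{CC}_{\text{old}}$$

$$\text{Speedup} = \frac{\text{CPU Execution Time}_{\text{old}}}{\text{CPU Execution Time}_{\text{new}}} = \frac{\text{IC}_{\text{old}} \times 3.16 \times \text{CC}_{\text{old}}}{\text{IC}_{\text{old}} \times 2.8896 \times \text{CC}_{\text{old}}} = 1.094$$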
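
A minimal Python sketch of the part (b) calculation follows, with everything normalized to the old instruction count and old clock cycle. The variable names are illustrative assumptions; the 30% reduction in loads/stores and the 5% longer clock cycle are the figures used in the lines above.

```python
MISS_PENALTY = 40
BASE_CPI = 2

# Instruction counts as fractions of the old instruction count.
non_memory = 0.6
loads_stores_new = 0.4 * (1 - 0.3)       # 30% of loads/stores removed
ic_new = non_memory + loads_stores_new   # 0.88 of the old count

# Stall cycles per old instruction: fetch misses plus data misses.
stalls_new = ic_new * 0.025 * MISS_PENALTY + loads_stores_new * 0.01 * MISS_PENALTY

# Execution time per old instruction, in units of the old clock cycle.
time_new = (ic_new * BASE_CPI + stalls_new) * 1.05  # clock cycle is 5% longer
time_old = 3.16                                      # result of part (a)

speedup = time_old / time_new
print(round(stalls_new, 3), round(time_new, 4), round(speedup, 3))
```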