ANSWER 1:

1.1] Compute the CPI of one iteration of the following code fragment. Assume a standard DLX with no optimizations. The pipeline performs the floating-point ADD in the Execution stage.

Code for DLX, not scheduled for the pipeline (from Hennessy and Patterson):

Loop: LD    F0, 0(R1)     ; F0 = array element
      ADDD  F4, F0, F2    ; add scalar in F2
      SD    0(R1), F4     ; store result
      SUBI  R1, R1, #8    ; decrement pointer by 8 bytes (one double word)
      BNEZ  R1, Loop      ; iterate if R1 is not zero
end.

ANSWER: 5 instructions take 20 clock cycles. CPI = 20/5 = 4.

1.2] Now compute the CPI of the unrolled code and decide whether it is more efficient than the original code.

The unrolled loop after it has been rescheduled (assume we unroll it 4 times):

Loop: LD    F0, 0(R1)
      LD    F6, -8(R1)
      LD    F10, -16(R1)
      LD    F14, -24(R1)
      ADDD  F4, F0, F2
      ADDD  F8, F6, F2
      ADDD  F12, F10, F2
      ADDD  F16, F14, F2
      SD    0(R1), F4
      SD    -8(R1), F8
      SD    -16(R1), F12
      SUBI  R1, R1, #32
      BNEZ  R1, Loop
      SD    8(R1), F16    ; branch delay slot; 8 - 32 = -24
end.

ANSWER: 14 instructions take 21 clock cycles. CPI = 21/14 = 1.5.

A CPI of 1.5 is much better than the unoptimized CPI of 4 and is closer to the ideal CPI of 1, so the unrolled, rescheduled code is more efficient. Each pass through the unrolled loop also processes four array elements instead of one.
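
The arithmetic behind both answers can be checked with a short Python sketch. The cycle and instruction counts are taken directly from the answers above and are not re-derived from a pipeline model. The helper names (`cpi`, `cycles_per_element`) are illustrative assumptions, and the per-element comparison is an extra view that follows from unrolling 4 times.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction for one pass through the loop body."""
    return cycles / instructions


def cycles_per_element(cycles: int, elements: int) -> float:
    """Cycles spent per array element processed in one pass."""
    return cycles / elements


# Counts as stated in the answers (assumed correct, not simulated).
ORIGINAL = {"cycles": 20, "instructions": 5, "elements": 1}
UNROLLED = {"cycles": 21, "instructions": 14, "elements": 4}

cpi_original = cpi(ORIGINAL["cycles"], ORIGINAL["instructions"])
cpi_unrolled = cpi(UNROLLED["cycles"], UNROLLED["instructions"])

per_elem_original = cycles_per_element(ORIGINAL["cycles"], ORIGINAL["elements"])
per_elem_unrolled = cycles_per_element(UNROLLED["cycles"], UNROLLED["elements"])

# Speedup in time per array element, original over unrolled.
speedup = per_elem_original / per_elem_unrolled

if __name__ == "__main__":
    print(f"CPI original: {cpi_original}")
    print(f"CPI unrolled: {cpi_unrolled}")
    print(f"Cycles per element, original: {per_elem_original}")
    print(f"Cycles per element, unrolled: {per_elem_unrolled}")
    print(f"Per-element speedup: {speedup:.2f}")
```

CPI alone understates the gain: the unrolled loop executes fewer instructions per element (14/4 = 3.5 instead of 5) and also spends fewer cycles on each instruction, so cycles per element is the more direct measure of efficiency here.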