More Sample Problems

  • Use register renaming to eliminate all but true dependencies the following loop:

    1. Loop: LD F0,0(R1)
    2. ADDD F4,F0,F2
    3. MULTD F2,F6,F8
    4. ADDD F4,F10,F12
    5. SUBI R1,R1,8
    6. ADDD F8,F4,F0

  • We are going to consider prediction schemes related to the loop S2. Assume that all prediction schemes are initialized to taken.
    1. S1 for(i=0;i<2;i++)
    2. S2 for(j=0;j<4;j++)
    3. {
    4. a += b(i)*c(j);
    5. }

    a) write assembly pseudocode corresponding to this loop

    b) assume that we have a one bit prediction scheme - for which iterations do we mispredict S2?

    c) assume that we have a two bit prediction scheme - for which iterations do we mispredict S2?

    3) Consider the following loop:

    for(i=0; i < N ;i++) {

    S1: for(j=0;j<8;j++)

    {

    x(8*j) = x(8*j) + y(4*j);

    } }

    Assume that each array element is 4 bytes and that memory addresses are interleaved between banks at the word level (e.g bank 0 gets bytes 0-3, bank 1 gets bytes 4-7, etc).

    a) Assume that we have 8 independently accessible memory banks. Each bank can produce a datum every 50ns. What is a lower bound on the amount of time required to carry out an iteration of loop S1?

    b) How could you rewrite the code to improve this situation? What is your new lower time bound?

    4) Assume single instruction issue, but assume that there is a separate floating point add unit which has two stages (A1,A2), a floating point multiply unit with stages (M1,M2,M3), and a pipelined cache which takes 3 cycles to get data from cache but can produce data every cycle - pipelined cache stages (C1,C2,C3).

    1. LD F0,0(R2)
    2. MULTD F6,F4,F2
    3. LD F8,0(R3)
    4. ADDD F10,F8,F0
    5. ADDD F12,F6,F0
    6. MULTD F14,F6,F2
    7. LD F8,0(R4)
    8. ADDD F16,F8,F0

    a) Write a pipeline diagram for this set of instructions

    b) Schedule (reorder) the instructions to improve performance (you can also rename registers) and produce a pipeline diagram