
Why the hell do I need to know this?

Because Meesh loves to put it on exams - and it's a very simple technique used for lowering CPI when branches are involved.

How are cycles wasted?

In section 3.5 of the book, they go into gloriously dense detail about the joys of control hazards. Basically, when you encounter a conditional branch instruction, there is always a one-cycle delay between the issue of the branch instruction and the branch test (assumed here to happen in the execute phase). The instruction slot right after the branch is the branch delay slot. It will be filled with a NOP by a dumb compiler, with the next instruction after the branch (assuming branch-not-taken) by a smarter compiler, and with ANYTHING by the smartest compilers. Not just anything, mind you. The branch delay slot should contain a "friendly" non-destructive short (one-cycle) instruction such as a load or store. But there's nothing wrong with filling it with something destructive: it's the compiler's job to make sure that the result of the delay-slot instruction doesn't affect the code that follows.

Okay, but that's only one cycle. So?

The other "wasted cycles" come from loop overhead: bookkeeping code, like updating array index registers or counters, is executed once for each pass through the loop. This means tight loops are a no-no: one pass may have three instructions in total, so 2/3 of the loop is overhead and 1/3 is actual work. However, if you do more iterations' worth of work before branching, you can reduce this overhead to a smaller percentage. Say we do five iterations' worth of work per pass: that is five work instructions plus the same two overhead instructions. This increases the work fraction to 5/7 and cuts the overhead to 2/7.
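To make the overhead argument concrete, here is a minimal C sketch. The function names (sum_rolled, sum_unrolled5, work_fraction) and the unroll factor of 5 are made up for illustration, and the instruction counts are the idealized ones from the tight-loop example (one work instruction and two overhead instructions per pass), not what any particular compiler will actually emit.

```c
#include <stddef.h>

/* Rolled loop: every pass does one add of real work, plus the
   index update and the branch test (the overhead). */
long sum_rolled(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Unrolled by 5 (an illustrative factor): five adds of real work
   for each index update and branch test. */
long sum_unrolled5(const int *a, size_t n) {
    long total = 0;
    size_t i = 0;
    for (; i + 5 <= n; i += 5) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
        total += a[i + 4];
    }
    /* Cleanup loop for the leftover elements when n is not a multiple of 5. */
    for (; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Idealized model: fraction of the instructions in one pass that do real
   work, given work instructions per original iteration, overhead
   instructions per pass, and the unroll factor. */
double work_fraction(int work_per_iter, int overhead_per_pass, int unroll) {
    int work = work_per_iter * unroll;
    return (double)work / (double)(work + overhead_per_pass);
}
```

Calling work_fraction(1, 2, 1) models the tight loop, and work_fraction(1, 2, 5) models the same loop unrolled five times. Both sum functions return the same result; only the amount of bookkeeping per element differs.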

Oh no, not that Amdahl guy again!

Loop unrolling reduces the proportion of overhead to work in the loop. What we are really after is a decrease in CPI. Plug the new numbers into your CPU execution time equation (CPU time = instruction count × CPI × clock cycle time), and you can find the speedup.
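As a rough sketch of that calculation, the two helpers below compute CPU time as instruction count times CPI times clock cycle time, and speedup as old time over new time. The function names and any numbers you feed them are illustrative assumptions, not figures from the book.

```c
/* CPU time = instruction count x CPI x clock cycle time. */
double cpu_time(double instr_count, double cpi, double cycle_time) {
    return instr_count * cpi * cycle_time;
}

/* Speedup = execution time before the change / execution time after. */
double speedup(double time_before, double time_after) {
    return time_before / time_after;
}
```

For example, with made-up numbers: if the rolled loop runs 3000 instructions at a CPI of 1.5 and the unrolled loop runs 1500 instructions at a CPI of 1.2 on the same clock, then speedup(cpu_time(3000, 1.5, 1.0), cpu_time(1500, 1.2, 1.0)) works out to 4500 / 1800 = 2.5.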

But we will still have stalls!

There's only one hope left: after we unroll the loop, we can play God with our code and reschedule it, moving the instructions around to minimize (hopefully eliminate) stalls. There is no fixed recipe for this: you need to see what kinds of instructions you have to play with, how long they take, how many instructions you *can* move around without breaking a dependence, and how many other instructions you can stuff between multicycle operations. Yes, the EVIL multicycle pipeline will rear its ugly head once again.
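Real rescheduling happens at the instruction level, but the idea can be pictured in C. The sketch below is a hypothetical example (the function name, the add-a-scalar loop, and the unroll factor of 4 are all assumptions): after unrolling, the four independent loads are grouped first, then the four adds, then the four stores, so each add has other work sitting between it and the load it depends on. A C compiler is free to reorder these statements anyway, so treat this as a picture of the schedule, not a guarantee of it.

```c
#include <stddef.h>

/* Adds s to every element of x. Unrolled by 4, with the body arranged
   as loads, then adds, then stores, to mimic a rescheduled loop. */
void add_scalar_scheduled(double *x, size_t n, double s) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Four independent loads issued back to back. */
        double t0 = x[i];
        double t1 = x[i + 1];
        double t2 = x[i + 2];
        double t3 = x[i + 3];

        /* By the time t0 is needed, the other loads have filled the gap. */
        t0 += s;
        t1 += s;
        t2 += s;
        t3 += s;

        /* Stores go last, after the multicycle adds have had time to finish. */
        x[i] = t0;
        x[i + 1] = t1;
        x[i + 2] = t2;
        x[i + 3] = t3;
    }
    /* Cleanup loop for the leftover elements when n is not a multiple of 4. */
    for (; i < n; i++) {
        x[i] += s;
    }
}
```

Using a separate temporary for each element matters: if every copy of the body reused the same temporary, the copies would depend on each other and there would be nothing to move around.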
