A better solution, and the one used in modern processors, is to require that instructions complete in order. In other words, instructions are issued in order, executed out of order, and then committed to the register file in order. To anything outside it, the microprocessor therefore appears to execute instructions in order: the first instruction that comes in is the first that is completed.
In-order completion is accomplished by placing a reorder buffer immediately before the register write stage. The reorder buffer is an ordered buffer that holds the results of instructions waiting to be committed. Its entries are kept in program order, so the entry at the top of the buffer belongs to the first instruction that should be committed. If an instruction completes while earlier instructions are still executing, its result is held in the reorder buffer until all earlier instructions have been committed.
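The behavior described above can be sketched in a few lines of Python. This is only a toy model under simplifying assumptions, not a description of any real processor: the class name `ReorderBuffer`, its methods `issue`, `complete`, and `commit`, and the instruction tags `i1` and `i2` are all hypothetical names chosen for illustration.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer (hypothetical model): entries are allocated in
    program order and retire only from the head."""

    def __init__(self):
        self.entries = deque()   # oldest instruction sits at the left (head)
        self.registers = {}      # stands in for the architectural register file

    def issue(self, tag, dest):
        # Allocate an entry at the tail, in program order.
        self.entries.append({"tag": tag, "dest": dest, "value": None, "done": False})

    def complete(self, tag, value):
        # An execution unit has finished; results may arrive in any order.
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"] = value
                entry["done"] = True
                return
        raise KeyError(tag)

    def commit(self):
        # Retire finished instructions from the head only, stopping at the
        # first instruction that is still executing.
        committed = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.registers[entry["dest"]] = entry["value"]
            committed.append(entry["tag"])
        return committed


rob = ReorderBuffer()
rob.issue("i1", "r1")    # older instruction, e.g. a slow divide
rob.issue("i2", "r2")    # younger instruction, e.g. a fast add
rob.complete("i2", 7)    # the younger instruction finishes first
first = rob.commit()     # nothing retires: i1 is still executing
rob.complete("i1", 3)
second = rob.commit()    # both retire, in program order
```

In this sketch the result of `i2` waits in the buffer until `i1` has finished, so the register file is only ever updated in program order.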
When used in conjunction with dynamic branch prediction, the reorder buffer can be used to perform speculative execution. Going back to the example with the branch, if a prediction is found to be in error, the reorder buffer can be flushed starting at the branch, since no instruction past the branch has permanently changed the state of the machine. In essence, with the reorder buffer we can now do loop unrolling completely in hardware. Note that the reorder buffer also allows for precise exception handling, since instructions are completed in order. If an instruction raises an exception during execution, a flag is set in its reorder buffer entry; when the instruction goes to commit, the flag is detected and the exception is handled.
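Both uses of the reorder buffer, squashing speculative work after a misprediction and detecting an exception flag at commit, can be illustrated with another small Python sketch. Everything here is an assumption made for illustration: the helper names `new_entry`, `flush_after`, `commit`, and `finish`, the instruction tags, and the choice to discard the faulting instruction and everything younger when the flag is detected. Real processors differ in many details.

```python
from collections import deque


def new_entry(tag, dest=None):
    # One reorder buffer entry; "exception" is the flag checked at commit.
    return {"tag": tag, "dest": dest, "value": None, "done": False, "exception": None}


def flush_after(rob, branch_tag):
    """Discard every entry younger than the mispredicted branch."""
    keep = [e["tag"] for e in rob].index(branch_tag) + 1
    while len(rob) > keep:
        rob.pop()   # squashed results never reach the register file


def commit(rob, registers):
    """Retire from the head in program order; stop at an exception flag."""
    committed = []
    while rob and rob[0]["done"]:
        entry = rob[0]
        if entry["exception"] is not None:
            # Every older instruction has retired and no younger one has,
            # so the machine state is precise at this point.
            fault = (entry["tag"], entry["exception"])
            rob.clear()
            return committed, fault
        rob.popleft()
        if entry["dest"] is not None:
            registers[entry["dest"]] = entry["value"]
        committed.append(entry["tag"])
    return committed, None


registers = {}
rob = deque(new_entry(t, d) for t, d in
            [("i1", "r1"), ("br", None), ("i3", "r3"), ("i4", "r4")])


def finish(tag, value=None, exception=None):
    # Mark an instruction as executed, possibly with an exception flag.
    for entry in rob:
        if entry["tag"] == tag:
            entry.update(value=value, done=True, exception=exception)
            return
    raise KeyError(tag)


# Misprediction: i3 ran speculatively, then the branch turns out wrong.
finish("i3", 30)
flush_after(rob, "br")
finish("br")
finish("i1", 10)
after_flush, _ = commit(rob, registers)

# Exception: i6 faults, although the younger i7 has already finished.
rob.extend([new_entry("i5", "r5"), new_entry("i6", "r6"), new_entry("i7", "r7")])
finish("i7", 70)
finish("i6", exception="divide by zero")
finish("i5", 50)
before_fault, fault = commit(rob, registers)
```

In the first scenario the speculative result of `i3` is dropped without ever touching `registers`. In the second, `i5` retires normally, the flag on `i6` is detected at the head of the buffer, and the already finished `i7` is discarded, which is what makes the exception precise in this model.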
Many of today's processors implement some form of out-of-order execution. The IBM PowerPC 604, the MIPS R10000, and the HP PA-8000 all include an out-of-order execution unit with in-order completion. Even Intel's P6 line (Pentium Pro, Pentium II, Celeron) implements out-of-order execution.