In the ever-changing microprocessor industry, Transmeta has been genuinely innovative in creating a chip that relies on both software and hardware. The engineers at Transmeta have devised a method of surrounding a hardware engine with a software layer. This software-hardware blend aims to combine robust performance with minimal power consumption. The hardware engine is a VLIW (Very Long Instruction Word) CPU, while the software layer is called Code Morphing. The Code Morphing software dynamically translates x86 instructions into VLIW instructions. To maintain integrity, every x86 instruction must pass through the Code Morphing layer before it reaches the hardware engine.
In essence, the Code Morphing software compiles x86 instructions into VLIW instructions on the fly, which are then executed by the CPU. It is the software layer’s responsibility to pack multiple x86 instructions into a molecule (a VLIW instruction). This translation does not come without a cost, though: precious clock cycles must be spent running the Code Morphing software itself. That cost can be offset, however, because a software layer has many advantages over a strictly hardware design.
The biggest advantage over traditional chips is that the Crusoe processors don’t require as many transistors, since some of the hardware logic is replaced by the software layer. This in turn lowers the cost of the Crusoe chips because they don’t have to be as large.
The Code Morphing software is also smart. It recognizes which parts of the code are executed most often. Much like the Splay Tree in CMSC420, the Code Morphing software amortizes the cost of translation over many executions. It then optimizes those frequently executed blocks, so that each subsequent pass through them is more efficient. It also keeps track of branches in a similar fashion.
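The amortization argument can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name and the cycle counts are hypothetical and are not measurements of any Crusoe chip. It assumes a block is translated once and that every later execution pays only the cost of running the translated code.

```python
def average_cost_per_run(translation_cost, native_cost, runs):
    """Average cycles per execution when a block is translated once and reused.

    All three inputs are hypothetical numbers supplied by the caller.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # One-time translation cost plus the cost of every execution,
    # spread evenly over the number of executions.
    return (translation_cost + native_cost * runs) / runs


# Hypothetical figures: 1000 cycles to translate, 10 cycles per execution.
for runs in (1, 10, 1000):
    print(runs, average_cost_per_run(1000, 10, runs))
```

With these made-up numbers, the average cost per execution falls toward the cost of the translated code itself as the block is executed more often, which is why translating rarely executed code is the expensive case.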
The VLIW engine is composed of one floating-point unit, two integer arithmetic logic units, one memory (load/store) unit and one branch unit. Transmeta calls each of its VLIW instructions a molecule. Each molecule is composed of atoms (32-bit RISC instructions) and can be either 64 bits or 128 bits long.
By compacting atoms into molecules, as many as four instructions can be completed in one clock cycle. All molecules are executed in order, and all atoms within a molecule are executed in parallel. The strength of this engine lies in the fact that the software layer packs each molecule as full as possible for maximum efficiency. This is possible only because the software can translate entire groups of instructions at once rather than one at a time.
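To illustrate how atoms might be packed into molecules, here is a minimal sketch. It is not Transmeta’s packing algorithm: it is a simple greedy, in-order packer that assumes the atoms it receives are already independent of one another, and it takes the unit counts and the four-atom limit from the description above. The names UNIT_SLOTS, MAX_ATOMS and pack are hypothetical.

```python
# Functional units available to one molecule, as described in the text.
UNIT_SLOTS = {"fpu": 1, "alu": 2, "mem": 1, "branch": 1}
MAX_ATOMS = 4  # a 128-bit molecule holds at most four 32-bit atoms


def pack(atoms):
    """Greedily pack (name, unit) atoms, in program order, into molecules.

    Assumes the atoms are mutually independent; dependence checking is
    deliberately left out of this sketch.
    """
    molecules = []
    current, used = [], {}
    for name, unit in atoms:
        if unit not in UNIT_SLOTS:
            raise ValueError(f"unknown functional unit: {unit}")
        full = len(current) == MAX_ATOMS
        unit_busy = used.get(unit, 0) == UNIT_SLOTS[unit]
        # Close the current molecule when it is full or the unit is taken.
        if current and (full or unit_busy):
            molecules.append(current)
            current, used = [], {}
        current.append(name)
        used[unit] = used.get(unit, 0) + 1
    if current:
        molecules.append(current)
    return molecules


atoms = [("add", "alu"), ("sub", "alu"), ("ld", "mem"),
         ("fmul", "fpu"), ("br", "branch"), ("or", "alu")]
print(pack(atoms))
```

In this example the first four atoms fill one molecule, and the remaining two start a second one. A real scheduler would also reorder atoms to fill the empty slots, which this in-order sketch does not attempt.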
The software layer was created in order to move translation, branch prediction and out-of-order execution from hardware to software. As stated above, the additional Code Morphing layer adds many benefits to the translation. The Translator, which resides in the software, recompiles x86 instructions into optimized VLIW instructions.
The Translator exploits “locality of reference”: it remembers the translations that it has completed and stores them in the Translation cache for easy access. As more translations occur, the software learns about the program and schedules VLIW instructions more efficiently. Since, by the common rule of thumb, about 90% of execution time is spent in 10% of the code, this really pays dividends in terms of clock cycles saved. If the Translator runs into an instruction that has already been translated, it skips the translation process and just uses the one in the Translation cache.
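The idea behind the Translation cache can be sketched as a simple lookup table. This is an assumption-laden toy, not the real Code Morphing data structure: the class TranslationCache, the choice to key the cache on the block’s address, and the stand-in toy_translate function are all hypothetical.

```python
class TranslationCache:
    """Toy translation cache: translate a block once, then reuse the result."""

    def __init__(self, translate):
        self._translate = translate  # function that does the real work
        self._cache = {}             # block address -> translated block
        self.translations = 0        # how many times the translator ran

    def lookup(self, address, x86_block):
        # Translate only on a miss; a hit skips the translator entirely.
        if address not in self._cache:
            self._cache[address] = self._translate(x86_block)
            self.translations += 1
        return self._cache[address]


def toy_translate(x86_block):
    """Stand-in for the Translator; it only tags each instruction."""
    return tuple("vliw:" + insn for insn in x86_block)


cache = TranslationCache(toy_translate)
for _ in range(5):
    translated = cache.lookup(0x1000, ("mov", "add"))
print(translated, cache.translations)
```

Although the block is requested five times, the translator runs only once; the other four requests are served from the cache.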
The Translator also collects information on block execution frequencies and branch history. This comes in very handy for branch prediction. Based on the previous history of a branch, the Code Morphing layer can make an educated prediction about whether the branch will be taken or not. It can even “decide to speculatively execute code from both paths and select the correct result later.” The flexibility of the software is remarkable: where a strictly hardware design has many limitations, the software can implement practically any algorithm its designers choose.
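As one example of predicting a branch from its history, the sketch below uses a two-bit saturating counter per branch. This is a classic textbook predictor chosen only for illustration; the text does not say which prediction scheme the Code Morphing software actually uses, and the class and method names here are hypothetical.

```python
class BranchHistory:
    """Two-bit saturating counter per branch (a textbook scheme, not Crusoe's)."""

    def __init__(self):
        self._counters = {}  # branch address -> counter in the range 0..3

    def predict(self, address):
        # Counters 2 and 3 mean "taken"; unseen branches start at 1.
        return self._counters.get(address, 1) >= 2

    def record(self, address, taken):
        counter = self._counters.get(address, 1)
        if taken:
            counter = min(3, counter + 1)
        else:
            counter = max(0, counter - 1)
        self._counters[address] = counter


history = BranchHistory()
for outcome in (True, True, False):
    history.record(0x40, outcome)
print(history.predict(0x40))
```

Because the counter saturates, a single branch that goes the other way does not immediately flip the prediction for a branch that is usually taken.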
The Scheduler, which also resides in the software, reorders the atoms and packs them into molecules. Each atom within a molecule must be independent of the other atoms; in other words, they cannot have dependences on one another. Reordering gives the effect of out-of-order execution, done in software for the sake of efficiency. Essentially, the Code Morphing layer orders atoms as it sees fit, using complicated algorithms, while checking for dependences. Much like Tomasulo’s algorithm, the Scheduler aggressively renames registers to avoid stalls from WAR and WAW hazards.
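A rough sketch of how renaming and dependence checking fit together is given below. It is a simplified illustration, not the actual Scheduler: it ignores functional-unit limits and memory dependences, represents each atom as a hypothetical (destination, sources) pair, and assumes architectural register names such as r1 never collide with the generated names p0, p1, and so on.

```python
from itertools import count

MAX_ATOMS = 4  # at most four atoms per molecule


def rename(atoms):
    """Give every write a fresh register so only true (RAW) dependences remain."""
    fresh = count()
    current = {}  # architectural register -> latest renamed register
    renamed = []
    for dest, srcs in atoms:
        # Read the sources first, using the most recent names.
        new_srcs = tuple(current.get(s, s) for s in srcs)
        new_dest = f"p{next(fresh)}"
        current[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


def schedule(atoms):
    """Place each renamed atom in the earliest molecule after its producers."""
    ready_at = {}  # renamed register -> index of the molecule that writes it
    molecules = []
    for dest, srcs in rename(atoms):
        slot = max((ready_at[s] + 1 for s in srcs if s in ready_at), default=0)
        # Skip molecules that are already full.
        while slot < len(molecules) and len(molecules[slot]) == MAX_ATOMS:
            slot += 1
        if slot == len(molecules):
            molecules.append([])
        molecules[slot].append((dest, srcs))
        ready_at[dest] = slot
    return molecules


atoms = [
    ("r1", ("r2", "r3")),  # writes r1
    ("r4", ("r1",)),       # RAW: reads the first r1
    ("r1", ("r5",)),       # WAW with atom 0 and WAR with atom 1
    ("r6", ("r1",)),       # RAW: reads the second r1
]
for molecule in schedule(atoms):
    print(molecule)
```

Without renaming, the reuse of r1 would force these four atoms to run one after another. After renaming, only the two true dependences remain, so the four atoms fit into two molecules, and the third atom moves ahead of the second.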