NetBurst Architecture

Execution Time = IC * CPI * CCT (instruction count * cycles per instruction * clock cycle time)

There is not much that hardware can do about the instruction count (IC), which depends mostly on the compiler. Intel's Pentium 4 processor uses several different techniques to improve CPI and CCT.
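To make the formula concrete, here is a minimal sketch in Python. The function name, the workload size and both CPI values are hypothetical and chosen only for illustration; they are not measurements of any real processor.

```python
def execution_time(instruction_count, cpi, clock_cycle_time_s):
    """Return execution time in seconds: IC * CPI * CCT."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical workload: the same program runs on both chips,
# so the instruction count (IC) is identical.
IC = 1_000_000_000

# Chip A: 1.0 GHz clock (CCT = 1 ns) with a CPI of 1.0.
time_a = execution_time(IC, cpi=1.0, clock_cycle_time_s=1 / 1.0e9)
# Chip B: 1.5 GHz clock (shorter CCT) but a worse CPI of 1.3.
time_b = execution_time(IC, cpi=1.3, clock_cycle_time_s=1 / 1.5e9)

print(f"Chip A: {time_a:.3f} s")
print(f"Chip B: {time_b:.3f} s")
```

With these made-up numbers the faster clock wins even though CPI gets worse, which is the kind of trade the rest of this page describes.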

Intel calls the new architecture of the Pentium 4 'NetBurst'. The main parts of the NetBurst architecture are described in the sections below.

The following diagram of the Pentium 4, from tomshardware.com, will help you understand its architecture.

Hyper Pipeline

The hyper pipeline is a new 20-stage branch prediction/recovery pipeline. By comparison, the pipeline of the P6 micro-architecture has 10 stages and the Athlon's has 11. The reason for the longer pipeline is Intel's wish for the Pentium 4 to deliver the highest clock rates. With more stages, the CPU does less work per clock cycle, so the clock rate can be raised, which leaves more processing headroom. If more stages allow a higher clock rate, why not use 50 stages instead of 20? The reason is that as soon as it turns out at the end of the pipeline that the software will branch to an address that was not predicted, the whole pipeline needs to be flushed and refilled. The longer the pipeline, the more 'in-flight' instructions are lost and the longer it takes until the pipeline is filled again.
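The trade-off between clock rate and flush cost can be sketched with a toy model. Everything here is an assumption made for illustration: the total logic delay, the per-stage latch overhead, the branch fraction, the misprediction rate, and the simplification that one mispredicted branch costs a full pipeline refill. None of these numbers come from Intel.

```python
def time_per_instruction_ns(stages,
                            logic_delay_ns=10.0,
                            latch_overhead_ns=0.5,
                            branch_fraction=0.2,
                            mispredict_rate=0.1):
    """Average time per instruction for a pipeline with `stages` stages."""
    # Splitting the same logic over more stages shortens the cycle,
    # but every stage adds a fixed latch overhead.
    cycle_time_ns = logic_delay_ns / stages + latch_overhead_ns
    # Assumption: a mispredicted branch costs one full pipeline refill.
    flush_penalty_cycles = stages
    cpi = 1.0 + branch_fraction * mispredict_rate * flush_penalty_cycles
    return cpi * cycle_time_ns


# Search a range of depths for the one with the lowest time per instruction.
depths = range(5, 101)
best_depth = min(depths, key=time_per_instruction_ns)

for stages in (10, 20, 50, 100):
    print(stages, "stages:", round(time_per_instruction_ns(stages), 3), "ns")
print("best depth in this toy model:", best_depth)
```

The exact optimum depends entirely on the made-up parameters. The point is only the shape of the curve: deeper pipelines help at first, then the flush penalty starts to outweigh the shorter cycle.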

Execution Trace Cache

Because of the higher clock rate, the cache design had to be improved for the Pentium 4 to reach its ideal performance. One special thing about the Pentium 4's L1 cache is that its size has been reduced to 8 KB, which is half the size of the Pentium III's L1 cache and only an eighth of the Athlon's. The reason for the small cache is to enable an extremely low latency of only 2 clock cycles. This latency is less than half that of the Pentium III's L1 cache.

While the L1 cache of the Pentium 4 is 4-way set associative, the L2 cache is 8-way set associative. The L2 cache, also called the Advanced Transfer Cache, is 256 KB in size (the same as the Pentium III's L2 cache) but delivers much higher data throughput between the L2 cache and the processor core. The Advanced Transfer Cache has a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.4 GHz Pentium 4 processor can deliver a data transfer rate of 44.8 GB/s. This is almost 3 times the transfer rate of a Pentium III at 1 GHz.
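The 44.8 GB/s figure follows directly from the interface width and the core clock. A short sketch of the arithmetic (the function name and its parameters are hypothetical, introduced only for this example):

```python
def peak_bandwidth_gb_per_s(interface_bits, clock_hz, transfers_per_clock=1):
    """Peak transfer rate in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = interface_bits // 8
    return bytes_per_transfer * clock_hz * transfers_per_clock / 1e9


# 256-bit (32-byte) interface, one transfer per core clock, 1.4 GHz core.
l2_bandwidth = peak_bandwidth_gb_per_s(interface_bits=256, clock_hz=1.4e9)
print(f"L2 peak bandwidth: {l2_bandwidth:.1f} GB/s")
```

This is a peak rate. It assumes the interface moves a full 32 bytes on every single clock, which real workloads will not sustain.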

Advanced Dynamic Execution

The Pentium 4 processor has a highly efficient out-of-order speculative execution engine that keeps the execution units busy. Also new is an enhanced branch prediction capability that keeps the processor executing along the correct program flow and reduces the misprediction penalty associated with deeper pipelines.

Rapid Execution Engine

The Pentium 4's two ALUs (Arithmetic Logic Units) run at twice the frequency of the processor core, so effectively 3 GHz for a 1.5 GHz Pentium 4. This allows two things. First, the processor can execute certain instructions in half a core clock cycle, which gives higher execution speeds and reduced latency. Second, the processor can re-calculate bad cycles in a timely fashion without losing execution time: if it miscalculates during a cycle, it still has the second half of the cycle to retrace its steps and correct the error.
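The clock doubling is easy to put into numbers. The sketch below only restates the arithmetic for the 1.5 GHz example; the variable names are made up for illustration.

```python
core_clock_hz = 1.5e9             # 1.5 GHz Pentium 4 core
alu_clock_hz = 2 * core_clock_hz  # the ALUs run at twice the core frequency

core_cycle_ns = 1e9 / core_clock_hz  # length of one core clock cycle
alu_cycle_ns = 1e9 / alu_clock_hz    # length of one ALU cycle (half a core cycle)

print(f"ALU clock:  {alu_clock_hz / 1e9:.1f} GHz")
print(f"Core cycle: {core_cycle_ns:.3f} ns")
print(f"ALU cycle:  {alu_cycle_ns:.3f} ns")
```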

However, this doesn't help much in integer-based applications, such as business applications, even with all these new features to recover from miscalculated cycles and keep processing flowing at a pace higher than that of the Athlon. Branches in integer applications generally can't be predicted well, meaning the pipeline on the Pentium 4 will often have to start over. Even with the Rapid Execution Engine to speed up instruction processing, the Pentium 4 pays a very high penalty for each misprediction. Nothing is perfect, and success has its price: in the Pentium 4's case, a longer pipeline in exchange for a higher clock speed.

Intel i850 Chipset

As with any new processor, a new chipset is needed to go along with it. The Pentium 4 features a 400 MHz NetBurst system bus with 3.2 GB/s of system bandwidth, nearly three times the 1.06 GB/s of the Pentium III on its 133 MHz system bus. This gives the processor plenty of headroom for complex, bandwidth-hungry applications such as e-commerce.
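The two bandwidth figures can be checked with a quick calculation. The sketch assumes a 64-bit (8-byte) data bus on both platforms, which is not stated above, and uses a hypothetical helper function.

```python
BUS_WIDTH_BYTES = 8  # assumption: 64-bit data bus on both platforms


def bus_bandwidth_gb_per_s(effective_clock_hz, width_bytes=BUS_WIDTH_BYTES):
    """Peak system bus bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return effective_clock_hz * width_bytes / 1e9


p4_bandwidth = bus_bandwidth_gb_per_s(400e6)  # 400 MHz NetBurst system bus
p3_bandwidth = bus_bandwidth_gb_per_s(133e6)  # 133 MHz Pentium III system bus
ratio = p4_bandwidth / p3_bandwidth

print(f"Pentium 4:   {p4_bandwidth:.2f} GB/s")
print(f"Pentium III: {p3_bandwidth:.2f} GB/s")
print(f"Ratio:       {ratio:.2f}x")
```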

Furthermore, the 850 chipset's enhanced I/O Controller Hub (ICH2) adds features such as an additional USB controller for four ports and 24 Mbps of USB bandwidth, twice as much as any other bridge architecture. The 850 chipset also supports ATA100 for the best hard drive performance, offering a cost-effective way to use the latest HDD technology without losing performance.

