


 
 

 A Bus Analogy

Imagine that you are a guest at a hotel whose valet parking service operates like a Pentium processor bus. You make READ requests: "Please fetch my car" and WRITE requests: "Please park my car." When you are the only guest staying at the hotel (single processor making sequential requests) then this is an efficient way to run valet parking. But when more people stay at the hotel (multiprocessing) then the single attendant fetching and parking cars will cause many of the guests to wait. The situation is even worse if the guests can make multiple requests, as a Pentium Pro processor with Dynamic Execution will.

Using a Pentium Pro processor bus is like running this valet parking with 8 attendants, all of whom can service multiple, overlapped requests to fetch or park cars; guests will have to wait a long time only if more than 8 requests are made. A protocol must be defined so that cars do not collide at the single door into and out of the garage (simultaneous write and read are not permitted). Note too that the cars don't move any faster in the Pentium Pro processor garage than in the Pentium processor garage, but they do move more efficiently. Imagine five cars being fetched and arriving bumper-to-bumper rather than with five minutes between each one!

When one or more processors are capable of making multiple requests, a bus that is capable of servicing multiple requests is needed.
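The difference between the two garages can be sketched with a small timing model. This is only an illustration: the limit of 8 overlapped requests comes from the analogy above, while the fixed latency, the one-clock issue interval, and the function names are assumptions made for the example.

```python
def serial_completion_times(n_requests, latency):
    """One attendant: each request starts only after the previous one finishes."""
    return [(i + 1) * latency for i in range(n_requests)]


def pipelined_completion_times(n_requests, latency,
                               max_outstanding=8, issue_interval=1):
    """Up to max_outstanding overlapped requests (hypothetical timing model).

    A new request is issued issue_interval clocks after the previous one,
    unless max_outstanding requests are already in flight, in which case
    it waits for the oldest of them to complete.
    """
    issued, done = [], []
    for i in range(n_requests):
        t = 0 if i == 0 else issued[-1] + issue_interval
        if i >= max_outstanding:
            # All slots busy: wait for the request issued max_outstanding ago.
            t = max(t, done[i - max_outstanding])
        issued.append(t)
        done.append(t + latency)
    return done


# Five requests, each taking 10 clocks to service.
serial = serial_completion_times(5, 10)        # spaced a full latency apart
pipelined = pipelined_completion_times(5, 10)  # arrive "bumper-to-bumper"
```

In this model each individual request still takes the same 10 clocks (the cars move no faster), but the overlapped results arrive one clock apart instead of ten.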
 
 

The Bus Protocol

The approximately 150 signal pins on the Pentium Pro processor are grouped broadly into Request and Response. An agent must own the request bus before initiating a request; this is accomplished using the arbitration bus. A request is issued across two adjacent clocks; the first contains the address, memory type, etc., and the second contains a unique transaction ID, request length, byte enables etc. Three clocks after the request is issued, error status is sampled to guard against transmission errors or protocol violations. An error will cause the Pentium Pro processor to retry this request and a second error on the same request will cause the Pentium Pro processor to take a machine check exception.
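The request and retry behavior described above can be sketched as follows. This is a simplified model, not the actual signal-level protocol: the `Request` fields mirror the items named in the text, while the class names, the `error_sampled` callback, and the dictionary layout of the two clocks are hypothetical.

```python
from dataclasses import dataclass


class MachineCheckException(Exception):
    """Raised when the same request fails twice (modeled after the text)."""


@dataclass
class Request:
    address: int
    memory_type: str
    transaction_id: int
    length: int
    byte_enables: int


def request_clocks(req):
    """Split a request into the two adjacent clocks on which it is issued."""
    first = {"address": req.address, "memory_type": req.memory_type}
    second = {"transaction_id": req.transaction_id,
              "length": req.length,
              "byte_enables": req.byte_enables}
    return [first, second]


def issue_with_retry(req, error_sampled):
    """Issue req; retry once on error, raise on a second error.

    error_sampled is a hypothetical callback taking the attempt number
    (1 or 2) and returning True if error status was sampled for that attempt.
    Returns the attempt number on which the request succeeded.
    """
    for attempt in (1, 2):
        request_clocks(req)  # the request occupies two adjacent clocks
        if not error_sampled(attempt):
            return attempt
    raise MachineCheckException(
        f"transaction {req.transaction_id} failed twice")
```

For example, `issue_with_retry(req, lambda attempt: attempt == 1)` models a transmission error on the first attempt followed by a successful retry.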

[Pentium Pro Processor Transaction graphic]
Bus transactions are divided into multiple overlapping phases

All agents decide if they should respond and, if necessary, drive completion codes onto the bus during the completion phase. Other Pentium Pro processors (and a cluster bridge if present) will drive HIT# and HITM# during this phase so that the owner of the response can be determined. Note that an agent that cannot respond within the assigned four clocks should drive HIT# and HITM# to stretch the completion phase in increments of two clocks. Cache hits will allow a Pentium Pro processor to respond in preference to memory; cache-to-cache transfers are faster than memory transfers, but memory must snarf (capture) the cache data on a HITM#.
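The stretching rule can be worked through with a little arithmetic. The sketch below assumes only what the paragraph states (a four-clock completion phase extended in two-clock increments); the function name and its parameters are hypothetical.

```python
import math


def completion_phase_clocks(clocks_needed, base=4, stretch=2):
    """Length of the completion phase for an agent needing clocks_needed clocks.

    The phase lasts base clocks and, if that is not enough, is extended
    in whole increments of stretch clocks until the agent can respond.
    """
    if clocks_needed <= base:
        return base
    extra = clocks_needed - base
    return base + stretch * math.ceil(extra / stretch)
```

Under these assumptions, an agent that needs five clocks stretches the phase to six, and one that needs seven stretches it to eight.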

Referring to the chart above, multiple requests and responses can be seen on the bus at the same time. A logic analyzer looking at, say, clock 14 will observe data from response 2, completion codes for response 3, request 4 started and request 5 arbitrating for use of the bus. Intel is working with logic analyzer vendors so that their tools will contain software that displays this array of 1's and 0's as discrete, time-coded transactions.
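The kind of snapshot a logic analyzer would see can be illustrated with a toy model of overlapping phases. The phase names, their durations, and the four-clock spacing between transactions are all hypothetical; they are chosen for the example and are not taken from the chart, so the clock numbers will not match it.

```python
# Hypothetical phases and durations in bus clocks (not from the chart).
PHASES = [
    ("arbitration", 2),
    ("request", 2),
    ("error", 2),
    ("completion", 4),
    ("data", 4),
]


def phase_at(start_clock, clock):
    """Return the phase a transaction is in at clock, or None if not active."""
    offset = clock - start_clock
    if offset < 0:
        return None
    for name, length in PHASES:
        if offset < length:
            return name
        offset -= length
    return None  # transaction already finished


def snapshot(clock, n_transactions, start_interval=4):
    """Map each active transaction number to its phase at the given clock."""
    active = {}
    for txn in range(1, n_transactions + 1):
        phase = phase_at((txn - 1) * start_interval, clock)
        if phase is not None:
            active[txn] = phase
    return active
```

With these assumed timings, `snapshot(12, 5)` shows four transactions on the bus at once, each in a different phase, which is the kind of overlap the chart depicts.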
 

 
 

Pentium(R) Pro Processor System Bus

[Pentium Pro Processor System Bus]

Let's move our attention to a higher level now. Where does this design fit in a systems environment? This diagram looks essentially like a standard block diagram of a computer system except for the L2 cache, which is in the same package as the CPU. The first thing to note is that when we got to this point in designing the Pentium Pro processor, we saw how to design a fast CPU, but if we left the rest of the machine alone we would have taken a relatively balanced system and unbalanced a piece of it.

The problem with an unbalanced system is that you can't predict its performance, and it generally has unpleasant surprises in other ways. So we set out to explore what other things needed attention. That resulted in the following bus:

The Pentium Pro processor uses a 64-bit data bus. It has 36 bits of physical addressing. It is transaction based, which means that any access that is looking for data gets on the bus with the request and then gets off the bus until the data comes back. In the meantime, other agents on the bus can use that bandwidth. The bus runs at 1/2, 1/3, or 1/4 of the CPU clock speed, and it has snooping support built in for multiprocessing.
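These figures translate into some simple arithmetic. The bus width, address width, and clock ratios below come from the paragraph above; the 200 MHz CPU clock in the usage note and the assumption of one 64-bit transfer per bus clock (which gives a theoretical peak, not a sustained rate) are illustrative only.

```python
DATA_BUS_BITS = 64          # from the text
ADDRESS_BITS = 36           # from the text
ALLOWED_RATIOS = (2, 3, 4)  # bus runs at 1/2, 1/3, or 1/4 of the CPU clock


def addressable_bytes():
    """Size of the physical address space in bytes."""
    return 2 ** ADDRESS_BITS


def bus_clock_mhz(cpu_clock_mhz, ratio):
    """Bus clock for a given CPU clock and divide ratio."""
    if ratio not in ALLOWED_RATIOS:
        raise ValueError(f"ratio must be one of {ALLOWED_RATIOS}")
    return cpu_clock_mhz / ratio


def peak_bandwidth_mb_per_s(cpu_clock_mhz, ratio):
    """Theoretical peak, assuming one 64-bit transfer on every bus clock."""
    return bus_clock_mhz(cpu_clock_mhz, ratio) * (DATA_BUS_BITS // 8)
```

For a hypothetical 200 MHz CPU with a ratio of 4, the bus clock is 50 MHz and the peak under this assumption is 400 MB/s; 36 address bits give 2^36 bytes, or 64 GB, of physical address space.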
 


Summary

The Pentium Pro processor will initially be used for today's most demanding applications where its 32-bit software performance is most needed -- from professional desktops (workstations and performance desktops) to enterprise application servers. Still, the Pentium Pro processor bus was designed to ultimately support the full range of systems. This bus enables new classes of scalable systems design which will accelerate the "innovation spiral". The result is new advantages for system buyers -- from more cost-effective dual-processing in professional desktops, to improved scalability in servers.

[Pentium Pro Processor Desktop Graphic]

The Pentium Pro processor enables new classes of scalable designs
 