Hardware Implementation of AGP

The most critical aspect of 3D graphics is the processing of texture maps, the bitmaps that describe in detail the surfaces of three-dimensional objects. Texture map processing consists of fetching one, two, four, or eight texels (texture elements) from a bitmap, averaging them together based on a mathematical approximation of the location in the bitmap (or multiple bitmaps) needed for the final image, and then writing the resulting pixel to the frame buffer.
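The four-texel case can be illustrated with bilinear filtering, in which the four texels surrounding a fractional texture coordinate are averaged with weights given by the distance to each. The following is a minimal sketch only; the function name bilinear_sample, the grayscale list-of-rows texture layout, and the clamping at the bitmap edges are assumptions made for illustration, not a description of any particular graphics controller.

```python
def bilinear_sample(texture, u, v):
    """Return the weighted average of the four texels surrounding (u, v).

    texture is a list of rows of grayscale values (an assumed layout);
    u is the column coordinate and v the row coordinate, both possibly
    fractional.
    """
    height, width = len(texture), len(texture[0])
    # Clamp the coordinate so every neighbour stays inside the bitmap.
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = u - x0, v - y0  # fractional position between texels
    # Blend horizontally along the upper and lower rows, then vertically.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


checker = [[0, 100],
           [100, 200]]
centre = bilinear_sample(checker, 0.5, 0.5)  # equal weight on all four texels
```

Each output pixel in this scheme costs four texel reads, which is why texture fetch bandwidth dominates the discussion that follows.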

In pre-AGP PCI systems, there are five basic steps involved in processing textures:

1. Prior to their usage, texture maps are read from the hard drive and loaded into system memory. The data travels via the IDE bus and chipset before being loaded into memory.
2. When a texture map must be used for a scene, it is read from system memory into the processor. The processor performs point-of-view transformations upon the texture map, then caches the results.
3. Lighting and viewpoint transforms are then applied to the cached data. The results of this operation are subsequently written back to system memory.
4. The graphics controller then reads the transformed textures from system memory and writes them into its local video memory (also called graphics controller memory, the frame buffer, or off-screen RAM). In present-day systems, this data must travel to the graphics controller over the PCI bus.
5. The graphics controller next reads the textures plus 2D color information from its frame buffer. This data is used to render a frame that can be displayed on the 2D monitor screen. The result is written back into the frame buffer. The system's digital-to-analog converter then reads the frame and converts it to an analog signal that drives the display.

The reader may notice a number of problems with the way texture maps are currently handled. First, textures must be stored in both system memory and the frame buffer; redundant copies are an inefficient use of memory resources. Second, storing the textures in the frame buffer, even temporarily, places a ceiling on the size of the textures. There is a demand for textures with greater and greater detail, pressuring hardware manufacturers to put more frame buffer memory in their systems. However, this type of memory is quite expensive, so this is not an optimal solution. Finally, the 132 MBytes/s bandwidth of the PCI bus limits the rate at which texture maps can be transferred to the graphics subsystem. Furthermore, in typical systems several I/O devices on the PCI bus must share the available bandwidth. The introduction of other high-speed devices, such as Ultra DMA disk drives and 100 MByte/s LAN cards, makes the congestion even worse. It is easy to see how congestion on the PCI bus can limit 3D graphics performance on a PC.

AGP relieves the graphics bottleneck by adding a new dedicated high-speed bus directly between the chipset and the graphics controller. This removes bandwidth-intensive 3D and video traffic from the constraints of the PCI bus. In addition, AGP allows textures to be accessed directly from system memory during rendering rather than being pre-fetched to local graphics memory. Segments of system memory can be dynamically reserved by the OS for use by the graphics controller; this memory is termed AGP memory or non-local video memory. The net result is that the graphics controller is required to keep fewer texture maps in local memory. Smaller local memory requirements mean lower overall system cost. This innovation also eliminates the size constraint that local graphics memory places on texture maps, enabling applications to use much larger texture maps and further improving realism and image quality. Finally, off-loading graphics and video data from the PCI bus leaves more bandwidth available for other high-speed devices.

AGP is implemented with a connector similar to that used for PCI, with 32 lines for multiplexed address and data. There are an additional 8 lines for sideband addressing.

Local video memory is usually more expensive than general-purpose system memory, and the OS cannot use it for other purposes when the running applications do not need it for graphics. The graphics controller needs fast access to local video memory for screen refresh, Z-buffers, and pixel data (front and back buffers). For these reasons, programmers can always expect to have more texture memory available via AGP system memory. Keeping textures out of the frame buffer allows a larger screen resolution, or permits Z-buffering at a given large screen size. Most applications could use 2-16 MB for texture storage. By using AGP, they can get it.
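The trade-off between screen resolution and local texture storage can be made concrete with some rough arithmetic. The sketch below is illustrative only: the 4 MB local memory size, the 16-bit color and Z depths, and the function names frame_buffer_bytes and texture_room are all assumptions chosen for the example, not figures taken from any specific graphics card.

```python
def frame_buffer_bytes(width, height, bytes_per_pixel=2, z_bytes=2,
                       double_buffered=True):
    """Local memory consumed by the color buffers plus the Z-buffer."""
    color_buffers = 2 if double_buffered else 1  # front and back buffers
    return width * height * (color_buffers * bytes_per_pixel + z_bytes)


def texture_room(local_bytes, width, height, **options):
    """Local memory left over for textures; negative means it does not fit."""
    return local_bytes - frame_buffer_bytes(width, height, **options)


LOCAL_MEMORY = 4 * 1024 * 1024  # hypothetical 4 MB of local video memory

for width, height in [(640, 480), (800, 600), (1024, 768)]:
    spare = texture_room(LOCAL_MEMORY, width, height)
    print(f"{width}x{height}: {spare} bytes left for textures")
```

Under these assumptions the spare local memory shrinks quickly as resolution rises and goes negative at 1024x768, which is the situation in which moving textures into AGP memory frees local memory for the frame and Z-buffers.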

While the PCI bus supports a maximum of 132 MBytes/s, AGP at 66 MHz runs at 533 MBytes/s peak. It gets this speed increase by transferring data on both the rising and falling edges of the 66 MHz clock and through the use of more efficient data transfer modes. (Actual throughput varies among systems and applications, but sustained real-world transfers usually reach about 50-80% of the peak value.)
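The peak figures follow from clock rate, bus width, and the number of transfers per clock. The sketch below reproduces them; the function names are hypothetical, and the AGP clock is taken as the nominal 66.67 MHz (200/3) rather than exactly 66 MHz, an assumption that yields the quoted 533 MBytes/s (exactly 66 MHz would give 528).

```python
def peak_mbytes_per_s(clock_mhz, bus_width_bits, transfers_per_clock):
    """Peak bandwidth = clock rate x bytes per transfer x transfers per clock."""
    return clock_mhz * (bus_width_bits / 8) * transfers_per_clock


def sustained_range(peak, low=0.5, high=0.8):
    """Apply the 50-80% real-world estimate quoted in the text."""
    return peak * low, peak * high


# PCI: 33 MHz, 32 bits wide, one transfer per clock.
pci_peak = peak_mbytes_per_s(33, 32, 1)        # 132.0

# AGP: nominal 66.67 MHz clock (assumed), 32 bits, both clock edges.
agp_peak = peak_mbytes_per_s(200 / 3, 32, 2)   # about 533.3

agp_low, agp_high = sustained_range(agp_peak)
```

The factor of four between the two peaks comes from doubling the clock and then doubling the transfers per clock; the bus width is the same 32 bits in both cases.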

AGP provides two modes for the graphics controller to directly access texture maps in system memory: pipelining and sideband addressing. In pipelining, AGP overlaps the memory or bus access time for a request ("n") with the issuing of the following requests ("n+1", "n+2", and so on). On the PCI bus, request "n+1" does not begin until the data transfer of request "n" finishes. While both AGP and PCI can "burst" (transfer multiple data items continuously in response to a single request), such bursting only partly alleviates the non-pipelined nature of PCI. The depth of AGP pipelining depends on the implementation and remains transparent to application software.
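The benefit of pipelining can be seen in a simple timing model. The sketch below is a deliberately idealized illustration: it assumes every request has the same access latency and transfer time, and that the pipeline is deep enough to hide the latency of every request after the first. The function names and the cycle counts in the example are hypothetical.

```python
def non_pipelined_cycles(requests, latency, transfer):
    """Each request waits for the previous one to finish completely."""
    return requests * (latency + transfer)


def pipelined_cycles(requests, latency, transfer):
    """Latency of later requests overlaps with earlier data transfers.

    Idealized: only the first request's latency is exposed, after which
    the data bus stays busy until the last transfer completes.
    """
    if requests == 0:
        return 0
    return latency + requests * transfer


# Hypothetical numbers: 8 requests, 10 cycles of access latency each,
# 4 cycles of data transfer each.
serial = non_pipelined_cycles(8, 10, 4)   # 112 cycles
overlapped = pipelined_cycles(8, 10, 4)   # 42 cycles
```

In this model bursting corresponds to raising the transfer time per request, which amortizes the latency but never hides it; pipelining removes the latency from all but the first request.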

With sideband addressing, AGP utilizes 8 extra "sideband" address lines which allow the graphics controller to issue new addresses and requests simultaneously while data continues to move from previous requests on the main 32 data/address wires.

So-called AGP memory consists simply of dynamically allocated areas of system memory that the graphics controller can access quickly. The access speed comes from built-in hardware in the 440LX chipset that translates addresses, allowing the graphics controller and its software to see a contiguous space in main memory when in fact the pages are scattered. Thus the graphics controller can access large data structures such as texture bitmaps (typically 1 KByte to 128 KBytes) as a single entity. The built-in chipset hardware is called the GART (Graphics Address Remapping Table) and is similar in function to the paging hardware in the CPU.

The processor's "linear" virtual addresses are translated by its paging hardware into physical addresses. These physical addresses are used to access system memory, the local frame buffer, and AGP memory. CPU accesses to the local frame buffer and AGP memory use the same addresses as the graphics controller does. The operating system therefore sets up the CPU paging hardware for a straight 1:1 mapping of virtual to physical addresses. For accesses to AGP memory, the graphics controller and CPU use a contiguous aperture of several megabytes, but the GART translates these addresses to various, possibly scattered, 4 KByte page addresses in system memory. PCI devices that access the AGP memory aperture (for example, for live video capture) also go through the GART.
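The remapping performed by the GART can be sketched as a table lookup indexed by page number within the aperture. This is an illustrative model only: the class name Gart, the aperture base address, and the page frame addresses below are hypothetical, and real chipsets define their own table entry formats and programming interfaces.

```python
PAGE_SIZE = 4096  # 4 KByte pages, as described in the text


class Gart:
    """Toy model of a Graphics Address Remapping Table."""

    def __init__(self, aperture_base, page_frames):
        self.aperture_base = aperture_base
        # page_frames[i] is the physical base address of the system
        # memory page backing the i-th 4 KByte page of the aperture.
        self.page_frames = list(page_frames)

    def translate(self, address):
        """Map a contiguous aperture address to a physical address."""
        offset = address - self.aperture_base
        if not 0 <= offset < len(self.page_frames) * PAGE_SIZE:
            raise ValueError("address outside the AGP aperture")
        page_index, within_page = divmod(offset, PAGE_SIZE)
        return self.page_frames[page_index] + within_page


# Three scattered physical pages presented as one contiguous 12 KByte
# region (all addresses are made up for the example).
gart = Gart(0xE0000000, [0x01234000, 0x00400000, 0x07FF0000])
physical = gart.translate(0xE0001010)  # second page, offset 0x10
```

A texture that spans several aperture pages is addressed by the graphics controller with consecutive addresses, while each page may resolve to an unrelated location in system memory.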


Copyright © 1998, Hank Kuo and Adam Labelson.