THE HELL OF CACHES

improving the performance of caches

Questions:

1. What are some ways to reduce cache misses?

2. What are some ways to reduce the penalty when we miss?

3. What are some ways to reduce the time required to check for a cache hit?




1. What are some ways to reduce cache misses?

Method: Larger block size
Purpose: Blocks contain more surrounding data, so code with high spatial locality gets more hits per block.
Pitfall: Block replacement consumes more memory bandwidth; too large a block size may increase misses by reducing the number of blocks in a given cache size.

Method: Higher associativity
Purpose: Reduces conflict discards.
Pitfall: May increase hit check time.

Method: Victim caches
Purpose: Tiny caches which hold only discarded blocks; they help minimize the impact of conflict-induced discards.
Pitfall: Benefit primarily direct-mapped caches; when there are many misses, victim caches cannot help.

Method: Pseudo-associativity
Purpose: Attempts to combine set-associative miss rates with direct-mapped hit times.
Pitfall: Complex and slower to implement.

Method: Hardware prefetch
Purpose: The cache grabs extra blocks on a miss, on the assumption that subsequent blocks will be referenced soon, thereby avoiding a miss for each prefetched block.
Pitfall: Increases memory bandwidth consumed when servicing a miss; may interfere with demand (miss) fetches, lowering actual performance.

Method: Software prefetch
Purpose: The compiler can instruct the CPU to preload data into a register or touch a memory block (load it into the cache). Two varieties -- faulting and nonfaulting -- which either cause virtual memory page faults or become no-ops if a page fault would occur (there is no point in prefetching through an expensive VM page fault).
Pitfall: Adds complexity to the compiler, and compilers cannot prefetch perfectly. Ties the binary more closely to the underlying hardware implementation.

Method: Compiler optimization
Purpose: Less code = fewer instruction misses; better data memory organization = fewer data (and conflict-induced) misses; etc.
Pitfall: Obvious compiler complexity, but portable and beneficial for more reasons than just cache interaction.




2. What are some ways to reduce the penalty when we miss?

Method: Priority reads
Purpose: Service reads before servicing writes, under the theory that reads are needed to resume processing.
Pitfall: Implemented by write buffers or similar techniques (see previous section).

Method: Sub-block placement
Purpose: Breaks blocks up into smaller virtual blocks, reducing the overhead when only a portion of a block changes.
Pitfall: May require a search of the block for locations and increase hit check time.

Method: Early restart
Purpose: When fetching, resume processing as soon as the required word of the missed block arrives.
Pitfall: Low benefit for small block sizes, since the overhead of transferring the whole block is relatively small compared to the overhead for a single word, due to memory access delays and setup time.

Method: Critical word first
Purpose: Loads the required word before loading the rest of the block.
Pitfall: Same as early restart.

Method: Deeper cache hierarchy
Purpose: Adds another layer of cache below the current level, making this cache's apparent access to memory faster.
Pitfall: Complicates performance analysis; adds hardware complexity and expense; not really worth it for small main memories.

Method: Non-blocking caches
Purpose: Services cache hits while handling memory transfers in the background, so that cache misses do not block the CPU.
Pitfall: Adds a great deal of complexity, since the cache management must handle multiple outstanding memory requests while still servicing hits.




3. What are some ways to reduce the time required to check for a cache hit?

Method: Small, simple cache design
Purpose: Simple caches (direct mapped or low associativity) have easy hit detection due to reduced possibilities for block location; small caches reduce the possible number of blocks.
Pitfall: Obvious performance tradeoffs in terms of cache size and conflict-induced misses.

Method: Virtually addressed caches
Purpose: Index the cache with virtual addresses, removing address translation from the hit-check path.
Pitfall: Aliases (two virtual addresses mapping to the same physical address) must be handled, and the cache may need to be flushed on a process switch.

Method: Pipelined cache hardware
Purpose: Parallelizes tag searches on associative caches, reducing the time necessary to search for a block.
Pitfall: Complicates tag memory design.








Body text Copyright 1997 Robert Rodgers. Graphics images Copyright 1997 Robert Rodgers. HTML code Copyright 1997 Elizabeth Rodgers and Robert Rodgers. The tables in section IV were derived from several graphs and tables in Hennessy and Patterson's COMPUTER ARCHITECTURE: A QUANTITATIVE APPROACH. Permission is granted to provide these pages in their original, unaltered form online for educational purposes. They may not be republished or included on a CDROM, disk or other media or with any published work without the express permission of the copyright holders (this includes FAQ books).