Fully Associative Cache

Introduction

A cache consists of a number of slots. Let's call it N. N is almost always some power of 2. Each slot contains M bytes which is the cache line, and this is usually a power of 2, as well.

We'll assume, for our example, that our cache consists of 27 = 128 slots.

Each slot will contain 25 = 32 bytes in the cache line.

Parking Lot Analogy

Assume you have a parking lot where they have handed out many parking permits. In fact, there's more parking permits than parking spots. This is not uncommon at a college. When a lot fills up, the students park in an overflow lot.

Suppose there's 1000 parking spots, but 5000 students. With a fully associative scheme, a student can park in any of the 1000 parking spots.

The advantage of such a scheme is that it makes full use of the parking lot.

Fully Associative Scheme

Suppose we are trying to access a byte at address A31-0. We know that the address can be split into two parts: A31-5, which is the tag, and A4-0, which is the offset.

We generate 32 addresses by keeping the upper 27 bits, and making all 32 5-bit bitstrings for B4-0.

Which slot should the cache line go to, assuming the data is not already in the cache? In a fully associative scheme, you can pick any slot. However, there are some intelligent choices you can make.

Finding the Slot

How would you determine whether the cache line you are looking for is in the cache? Assume we're trying to load or store at address B31-0.

Since the cache line can be in any slot, we will have to look at every slot. Hardware has one advantage over software when it comes to searching the slots. You can do searches in parallel, instead of examining each slot one at a time.

To do the search, first get the tag bits out of the address: these are bits B31-5. You must simultaneously compare the address tag bits (which is B31-5) to the tag of each slot. This can be done using a comparator, which can simply be a bunch of XNOR gates (XNOR2 is true when two bits have the same value), which compare the address tag to the slot tag.

You must also determine whether the slot's valid bit is 1. For a match to occur, the address tag must match the slot tag, and V = 1.

At most, one slot should match. If there is more than one slot that matches, then you have a faulty fully-associative cache scheme. You should never have more than one copy of the cache line in any slot of a fully-associative cache. It's hard to maintain multiple copies, and doesn't make sense. The slots could be used for other cache lines.

The hardware for finding the right slot, then picking the slot if more than one choice is available is rather large, so fully associative caches are not used in practice. The complexity of the fully associative hardware actually slows down the overall speed of the cache.

Getting the Data

Once you find a valid slot whose tag bits match, you use bits B4-0 to access the element in the cache line (as if it were a 32 element array). Bits B4-0 are the offset.

Deciding How Many Bits for Tag and Offset

In general, if you have M bytes of data in the cache line, and M is a power of 2, then this means lg M bits are used for the offset. Thus, A(lg M - 1)--0 are the offset bits. Thus, there are 32 - lg M bits for the tag. These are bits A31 -- lg M.

Summary

In a fully associative scheme, any slot can store the cache line. The hardware for finding whether the desired data is in the cache requires comparing the tag bits of the address to the tag bits of every slot (in parallel), and making sure the valid bit is set.

If there is a cache miss, the initial goal is to pick a slot that's not valid to place the data being searched for. If there are no valid slots, the hardware must pick a slot to evict based on some eviction policy. If the evicted slot's dirty bit is 1, then the data in the slot must be copied back to RAM. If it's 0, there's no need to copy back, since the data in the cache line should be identical to the data in the corresponding memory locations in RAM.

Then, the new data block must be copied from RAM to the slot, and V set to 1, D set to 0, and the slot tag copied from the address tag.

The hardware for a fully-associative cache can be rather complex, which is why you don't see fully-associative caches (except for translation lookaside buffers).