Set Associative Cache
Introduction
A set-associative scheme is a hybrid between a fully associative
cache and a direct-mapped cache. It's considered a reasonable compromise
between the complex hardware needed for fully associative caches (which
require parallel searches of all slots) and the simplistic direct-mapped
scheme, which may cause collisions of addresses to the same slot (similar
to collisions in a hash table).
Let's assume, as we did for fully associative caches, that we have:
- 128 slots
- 32 bytes per slot
Furthermore, let's assume that we can group slots together into
sets. In particular, we will assume that we have 8 slots per set.
Parking Lot Analogy
Suppose we have 1000 parking spots. This time, instead of
using a 3 digit number for each parking spot, we use 2 digits.
Thus, the parking spots are numbered 00 up to 99.
However, instead of one parking spot per number, we have
10 for each number. Thus, there are ten parking spots numbered
00, ten numbered 01, ..., and ten numbered 99.
Your parking spot is based on the first 2 digits of your student
ID number. Those 2 digits pick out a group of 10 parking spots, and
you can park in any of them. This gives you some flexibility about
where to park.
In effect, the various parking permits on a large commuter campus
work just like that. There are many lots, each with its own letter
or number. You are given a permit for a particular lot, but you can
park anywhere within this lot. The advantage is that you only have to
search for a spot in one large lot, as opposed to searching for a
parking spot in all of campus.
Set Associative Scheme
Like the direct mapped scheme, we still treat the slots like
an array. The slots are still numbered 0000000 up to 1111111 (there
are 128 slots).
However, we group the slots into sets, and the key is to
keep track of the sets, instead of the slots.
How many sets do we have? 128 slots divided by 8 slots per
set gives us 16 sets.
We need to specify the set number, instead of the slot number,
and that takes lg 16 = 4 bits.
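The arithmetic above can be checked with a short Python sketch. The variable names are made up for illustration, and the 32-bit address width is an assumption taken from the bit ranges used in the rest of these notes.

```python
# Cache parameters from the running example.
NUM_SLOTS = 128
BYTES_PER_SLOT = 32
SLOTS_PER_SET = 8
ADDRESS_BITS = 32  # assumed address width

num_sets = NUM_SLOTS // SLOTS_PER_SET

# For a power of 2, (n - 1).bit_length() equals lg n.
offset_bits = (BYTES_PER_SLOT - 1).bit_length()
set_bits = (num_sets - 1).bit_length()
tag_bits = ADDRESS_BITS - set_bits - offset_bits

print(num_sets, offset_bits, set_bits, tag_bits)  # 16 5 4 23
```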
Here's how the bits of the address break down. It's very
similar to direct mapped, except we use 4 bits for the set, instead
of the slot.
Bits A_{4-0} are still the offset. The set
number is the next 4 bits, A_{8-5}. The
remaining bits, A_{31-9}, are the tag.
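As a rough illustration, the breakdown can be written with shifts and masks. The function name split_address and the sample address 0x12345678 are made up for this sketch.

```python
OFFSET_BITS = 5  # A_{4-0}
SET_BITS = 4     # A_{8-5}

def split_address(addr):
    """Split a 32-bit address into (tag, set_number, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    set_number = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)  # A_{31-9}
    return tag, set_number, offset

tag, set_number, offset = split_address(0x12345678)
print(hex(tag), set_number, offset)  # 0x91a2b 3 24
```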
Finding the Slot
Finding a slot is more complex than in direct-mapped caches.
Suppose you have address B_{31-0}.
- Use bits B_{8-5} to find the set.
- This should specify 8 slots (since we said there were 8 slots
per set). The slots should have the following slot indexes:
- B_{8-5}000
- B_{8-5}001
- B_{8-5}010
- B_{8-5}011
- B_{8-5}100
- B_{8-5}101
- B_{8-5}110
- B_{8-5}111
In effect, the set number specifies the upper 4 bits of the
slot index, and the bottom 3 bits range over all possible 3-bit
values.
- Search in all 8 slots to see if the tag B_{31-9}
matches the tag in the slot.
- If it matches one of the slots, get the byte at
offset B_{4-0}.
- If not, decide which slot should be used (possibly evicting a slot),
fetch the 32 bytes from memory into that slot, and update the valid bit,
dirty bit, and tag as needed.
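The steps above can be sketched in Python as a small read-only simulation. Several things here are assumptions, not part of the scheme described in these notes: the class and method names are hypothetical, the victim is chosen with a simple round-robin pointer per set (the notes don't say which replacement policy to use), and the 8 slots are checked in a loop, whereas real hardware compares the tags in parallel. Writes and write-back of dirty slots are left out.

```python
OFFSET_BITS = 5    # 32 bytes per slot
SET_BITS = 4       # 16 sets
SLOTS_PER_SET = 8  # 8-way
NUM_SLOTS = 128

class Slot:
    def __init__(self):
        self.valid = False
        self.dirty = False
        self.tag = None
        self.data = bytes(1 << OFFSET_BITS)

class SetAssociativeCache:
    def __init__(self, memory):
        self.memory = memory  # hypothetical backing store (bytes-like)
        self.slots = [Slot() for _ in range(NUM_SLOTS)]
        # Assumed replacement policy: round-robin pointer for each set.
        self.next_victim = [0] * (1 << SET_BITS)

    def read_byte(self, addr):
        """Return (byte, hit) for the given address."""
        offset = addr & ((1 << OFFSET_BITS) - 1)
        set_number = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
        tag = addr >> (OFFSET_BITS + SET_BITS)

        # Slot indexes are the set number followed by 000 through 111.
        first = set_number * SLOTS_PER_SET
        candidates = range(first, first + SLOTS_PER_SET)

        # Hit: a valid slot in the set holds a matching tag.
        for i in candidates:
            slot = self.slots[i]
            if slot.valid and slot.tag == tag:
                return slot.data[offset], True

        # Miss: prefer an unused slot, otherwise evict round-robin.
        victim = next((i for i in candidates if not self.slots[i].valid), None)
        if victim is None:
            victim = first + self.next_victim[set_number]
            self.next_victim[set_number] = (
                self.next_victim[set_number] + 1) % SLOTS_PER_SET

        # Fetch the whole 32-byte line and update the bookkeeping bits.
        base = addr & ~((1 << OFFSET_BITS) - 1)
        slot = self.slots[victim]
        slot.data = bytes(self.memory[base:base + (1 << OFFSET_BITS)])
        slot.valid, slot.dirty, slot.tag = True, False, tag
        return slot.data[offset], False

memory = bytes(i % 251 for i in range(8192))  # made-up memory contents
cache = SetAssociativeCache(memory)
first_read = cache.read_byte(0x0203)   # miss, line is fetched
second_read = cache.read_byte(0x0203)  # hit, same line
print(first_read, second_read)  # (13, False) (13, True)
```

Address 0x0203 falls in set 0, so the fetched line lands in one of slots 0 through 7. A later read of 0x0003 also maps to set 0 but has a different tag, and it can take another slot in the same set without evicting the first line.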
This is called an 8-way set-associative cache, since each set contains
8 slots. You can have N-way set-associative caches, where each
set contains N slots (where N is typically a power of 2).
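To see how N changes the address breakdown, here is a small sketch for the same 128-slot cache. The function name geometry is made up. With N = 1 each set holds a single slot, which is the direct-mapped scheme. With N = 128 there is only one set, which is the fully associative scheme.

```python
NUM_SLOTS = 128

def geometry(n_way):
    """Return (number of sets, bits needed for the set number)."""
    num_sets = NUM_SLOTS // n_way
    set_bits = (num_sets - 1).bit_length()  # lg of a power of 2
    return num_sets, set_bits

for n_way in (1, 2, 4, 8, 128):
    num_sets, set_bits = geometry(n_way)
    print(f"{n_way}-way: {num_sets} sets, {set_bits} set bits")
```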
Compromises
This scheme is a compromise. You only have to use the complex
comparison hardware (to find the correct slot) on a small set of
slots, instead of over all the slots. Presumably, such comparison
hardware is more than linear in the number of slots, so the
fewer the slots you need to search through, the less overall
hardware is needed.
Yet, you gain the flexibility of allowing up to N cache lines
per set for an N-way set-associative scheme.
Summary
A set-associative cache scheme is a combination of
fully associative and direct mapped schemes. You group slots
into sets. You find the appropriate set for a given address (which
is like the direct mapped scheme),
and within the set you find the appropriate slot (which is like
the fully associative scheme).
This scheme has fewer collisions because you have more slots to
pick from, even when cache lines map to the same set.
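The effect on collisions can be illustrated with a tiny miss counter. This is only a sketch: the function name count_misses is made up, the least-recently-used eviction policy is an assumption (the notes don't pick one), and the access pattern is chosen so that both addresses land in the same slot of a direct-mapped cache.

```python
from collections import OrderedDict

def count_misses(addresses, num_slots=128, ways=8, line_bytes=32):
    """Count misses for a sequence of addresses, assuming LRU eviction."""
    num_sets = num_slots // ways
    # One OrderedDict per set, holding the tags in least-recently-used order.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        set_number = line % num_sets
        tag = line // num_sets
        lines = sets[set_number]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict least recently used
            lines[tag] = None
    return misses

# 0x0000 and 0x1000 are 128 * 32 bytes apart, so they share a slot
# in a direct-mapped cache (ways=1) and a set in the 8-way cache.
pattern = [0x0000, 0x1000] * 4
direct_mapped_misses = count_misses(pattern, ways=1)
eight_way_misses = count_misses(pattern, ways=8)
print(direct_mapped_misses, eight_way_misses)  # 8 2
```

In the direct-mapped case the two lines keep evicting each other, so every access misses. In the 8-way case both lines fit in the same set, so only the first access to each one misses.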