Example Snooping Protocol

- Snooping coherence protocol is usually implemented by incorporating a finite-state controller in each node (cache).
- Logically, think of a separate controller associated with each cache block.
  - That is, snooping operations or cache requests for different blocks can proceed independently.
- In real implementations, a single controller allows multiple operations to distinct blocks to proceed in interleaved fashion.
  - Meaning one operation may be initiated before another is completed, even though only one cache access or one bus access is allowed at a time.

Write-through Invalidate Protocol

- Two states per block in each cache: Valid and Invalid
  - As in a uniprocessor
  - Across the system, the state of a block is a p-vector of states, one entry per cache (p caches).
  - Hardware state bits are kept only for blocks that are in the cache.
  - Other blocks can be seen as being in the invalid (not-present) state in that cache.
- Writes invalidate all other cache copies.
  - Can have multiple simultaneous readers of block, but write invalidates them.
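The two-state behavior above can be sketched in a few lines of Python. This is only an illustrative model, not part of the original slides: the class names `Bus` and `Cache`, the dictionary-based memory, the default memory value of 0, and the no-write-allocate choice are all assumptions.

```python
class Bus:
    """Hypothetical shared bus: holds memory and the list of snooping caches."""
    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, writer, addr, value):
        # Write-through: every write updates memory via the bus...
        self.memory[addr] = value
        # ...and every other cache snoops it and invalidates its copy.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)

    def read(self, addr):
        return self.memory.get(addr, 0)   # assumed initial memory value: 0


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}   # addr -> value; present means Valid, absent means Invalid
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:        # Invalid: read miss goes to the bus
            self.lines[addr] = self.bus.read(addr)
        return self.lines[addr]           # Valid: read hit, no bus traffic

    def write(self, addr, value):
        if addr in self.lines:            # update own copy if present (no write-allocate)
            self.lines[addr] = value
        self.bus.write(self, addr, value)


bus = Bus()
p1, p2 = Cache(bus), Cache(bus)
p1.read("A")           # both caches load A (multiple simultaneous readers)
p2.read("A")
p1.write("A", 7)       # the write invalidates P2's copy
print(p2.read("A"))    # P2 misses and re-reads the new value from memory
```

After the write, only the writer still holds a valid copy; every other reader has to go back to the bus.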

Administrivia

- Finish reading Chapter 4
- Exam 2 answers posted – questions?
  - Mean: 60  Median: 60  StdDev: 12
- Cache simulator project due tomorrow – questions?
- Course evaluations open at http://www.CourseEvalUM.umd.edu

Is the Example 2-state Protocol Coherent?

Assume bus transactions and memory ops atomic, and a one-level cache

- all phases of one bus transaction complete before next one starts
- processor waits for memory operation to complete before issuing next
- with one-level cache, assume invalidations applied during bus transaction

Processors only observe state of memory through reads....

Writes only observable by other processors if on bus...

- All writes go to bus! (in this example protocol, not all others)
- Writes serialized by order in which they appear on bus (bus order)
- invalidations applied to caches in bus order

- How to insert reads in this order?
  - Important since processors see writes through reads, so this determines whether write serialization is satisfied
  - But read hits may happen independently and do not appear on the bus or enter directly in bus order

Writes establish a partial order

- Doesn’t constrain ordering of reads, though shared-medium (bus) will order read misses too
  - any order among reads between writes is fine

- Writes serialized, reads and writes not interchanged, so coherent!
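The argument above can be illustrated with a small checker. It is a sketch under stated assumptions: the function name `reads_see_latest_write`, the tuple encoding of events, and the initial value 0 are hypothetical, and the event list stands for one memory location with reads already placed into the bus order of the writes.

```python
def reads_see_latest_write(events, initial=0):
    """Check one location's history, given in a hypothetical total order.

    events: list of ("W", proc, value) or ("R", proc, value).
    Returns True if every read returns the most recent preceding write.
    """
    current = initial
    for kind, proc, value in events:
        if kind == "W":
            current = value       # writes are serialized by bus order
        elif value != current:    # a read must observe the latest write
            return False
    return True


in_order = [("W", "P1", 1), ("R", "P2", 1), ("R", "P1", 1),
            ("W", "P2", 2), ("R", "P1", 2)]
# Swapping the two reads between the writes changes nothing.
reads_swapped = [("W", "P1", 1), ("R", "P1", 1), ("R", "P2", 1),
                 ("W", "P2", 2), ("R", "P1", 2)]
# A read that returns an overwritten value breaks the order.
stale_read = [("W", "P1", 1), ("W", "P2", 2), ("R", "P1", 1)]

print(reads_see_latest_write(in_order),
      reads_see_latest_write(reads_swapped),
      reads_see_latest_write(stale_read))
```

The first two histories pass because any order among reads between two writes is acceptable; the third fails because a read crosses a write.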
Outline

• Review
• Coherence
• Write Consistency
• Administrivia
• Snooping
• Building Blocks
• Snooping protocols and examples
• Coherence traffic and Performance on MP
• Directory-based protocols and examples
• Conclusion

Example Write Back Snoopy Protocol

• Invalidation protocol, write-back cache
  – Each cache snoops every address on the bus
  – If a cache has a dirty copy of the requested block, it provides that block in response to the read request and aborts the memory access
• Each memory block is in one state:
  – Clean in all caches and up-to-date in memory (Shared)
  – OR Dirty in exactly one cache (Exclusive)
  – OR Not in any caches
• Each cache block is in one state (track these):
  – Shared: block can be read
  – OR Exclusive: cache has the only copy; it's writable and dirty
  – OR Invalid: block contains no data (in uniprocessor cache too)
• Read misses: cause all caches to snoop bus
• Writes to clean blocks are treated as misses
  – write-allocate
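The relation between the per-cache states and the state of a memory block can be written down as a small invariant check. This is a sketch only: the `State` enum and the function `memory_block_state` are hypothetical names introduced for illustration.

```python
from enum import Enum


class State(Enum):
    INVALID = "Invalid"
    SHARED = "Shared"
    EXCLUSIVE = "Exclusive"


def memory_block_state(cache_states):
    """Derive a block's global state from its state in each cache."""
    owners = cache_states.count(State.EXCLUSIVE)
    sharers = cache_states.count(State.SHARED)
    # A dirty block lives in exactly one cache and nowhere else.
    if owners > 1 or (owners == 1 and sharers > 0):
        raise ValueError("protocol invariant violated")
    if owners == 1:
        return "Exclusive"    # dirty in exactly one cache
    if sharers > 0:
        return "Shared"       # clean in all caches, memory up to date
    return "Not cached"       # not in any cache


print(memory_block_state([State.SHARED, State.INVALID, State.SHARED]))
print(memory_block_state([State.INVALID, State.EXCLUSIVE]))
print(memory_block_state([State.INVALID, State.INVALID]))
```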

Write-Back State Machine – CPU Events

• State machine for CPU requests for each cache block
• Non-resident blocks invalid

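Transitions (state → next state: CPU event / bus action):

• Invalid → Shared (read-only): CPU read / place read miss on bus
• Invalid → Exclusive (read/write): CPU write / place write miss on bus
• Shared → Shared: CPU read hit
• Shared → Exclusive: CPU write / place write miss on bus
• Exclusive → Exclusive: CPU read hit, CPU write hit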
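The CPU-side state machine can be sketched as a lookup table keyed by (state, CPU operation). The table name `CPU_TRANSITIONS`, the helper `cpu_access`, and the string encodings are assumptions made for illustration; replacement misses are not modeled here.

```python
# (current state, CPU op) -> (next state, message placed on the bus or None)
CPU_TRANSITIONS = {
    ("Invalid", "read"): ("Shared", "read miss"),
    ("Invalid", "write"): ("Exclusive", "write miss"),
    ("Shared", "read"): ("Shared", None),             # read hit
    ("Shared", "write"): ("Exclusive", "write miss"), # write to a clean block
    ("Exclusive", "read"): ("Exclusive", None),       # read hit
    ("Exclusive", "write"): ("Exclusive", None),      # write hit
}


def cpu_access(state, op):
    """Return (next_state, bus_message) for a CPU access to this block."""
    return CPU_TRANSITIONS[(state, op)]


state = "Invalid"
for op in ["read", "write", "read"]:
    state, bus_msg = cpu_access(state, op)
    print(op, "->", state, "| bus:", bus_msg)
```

Only the transitions that leave Invalid or upgrade Shared to Exclusive generate bus traffic; hits stay local.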

Write-Back State Machine – Bus Events

• State machine for bus requests for each cache block
• Shared (read-only) → Invalid: write miss for this block
• Exclusive (read/write) → Invalid: write miss for this block / write back block (abort memory access)
• Exclusive (read/write) → Shared: read miss for this block / write back block (abort memory access)
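The bus-side reactions can be sketched as a single snoop function. The name `snoop`, the string encodings, and the boolean write-back flag are hypothetical choices for this illustration.

```python
def snoop(state, bus_msg):
    """Return (next_state, write_back) when a bus message hits this block.

    write_back is True when this cache holds the dirty copy and must
    supply it, which also aborts the memory access.
    """
    if state == "Shared" and bus_msg == "write miss":
        return "Invalid", False      # another cache wants to write
    if state == "Exclusive" and bus_msg == "write miss":
        return "Invalid", True       # hand over the dirty block, then invalidate
    if state == "Exclusive" and bus_msg == "read miss":
        return "Shared", True        # hand over the dirty block, keep a clean copy
    return state, False              # all other cases: nothing to do


for state in ["Invalid", "Shared", "Exclusive"]:
    for msg in ["read miss", "write miss"]:
        print(state, msg, "->", snoop(state, msg))
```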
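
Block-replacement

• State machine for CPU requests, including misses that replace the resident block (state → next state: CPU event / bus action)
• Invalid → Shared (read-only): CPU read / place read miss on bus
• Invalid → Exclusive (read/write): CPU write / place write miss on bus
• Shared → Shared: CPU read hit
• Shared → Exclusive: CPU write miss / place write miss on bus
• Exclusive → Shared: CPU read miss / write back cache block, place read miss on bus
• Exclusive → Exclusive: CPU read hit, CPU write hit
• Exclusive → Exclusive: CPU write miss / write back cache block, place write miss on bus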
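The replacement case can be sketched as a helper that lists the bus actions for a miss caused by a different block occupying the cache frame. The function name `cpu_miss_actions` and the string encodings are assumptions for illustration.

```python
def cpu_miss_actions(state, op):
    """Bus actions when a CPU access misses on a frame holding another block.

    state: state of the resident (victim) block.
    op: "read" or "write" for the newly requested block.
    Returns (state of the new block, list of bus actions in order).
    """
    actions = []
    if state == "Exclusive":
        actions.append("write back")   # dirty victim must reach memory first
    actions.append("read miss" if op == "read" else "write miss")
    next_state = "Shared" if op == "read" else "Exclusive"
    return next_state, actions


print(cpu_miss_actions("Exclusive", "read"))
print(cpu_miss_actions("Exclusive", "write"))
print(cpu_miss_actions("Shared", "write"))
```

A clean (Shared) victim can simply be dropped, so only the Exclusive case adds a write-back.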

Example

<table>
<thead>
<tr>
<th>Step</th>
<th>P1 State</th>
<th>P1 Addr</th>
<th>P1 Value</th>
<th>P2 State</th>
<th>P2 Addr</th>
<th>P2 Value</th>
<th>Bus Action</th>
<th>Bus Proc.</th>
<th>Bus Addr</th>
<th>Bus Value</th>
<th>Memory Addr</th>
<th>Memory Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>1. P1: Write 10 to A1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>2. P1: Read A1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>3. P2: Read A1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4. P2: Write 20 to A1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>5. P2: Write 40 to A2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Assumes A1 and A2 map to same cache block, initial cache state is invalid

Write-Back State Machine – III

• Shared (read-only) → Invalid: write miss for this block (bus event)
• Invalid → Shared: CPU read / place read miss on bus
• Invalid → Exclusive (read/write): CPU write / place write miss on bus
• Exclusive → Shared: CPU read miss / write back cache block, place read miss on bus
• Exclusive → Exclusive: CPU read hit, CPU write hit
• Exclusive → Exclusive: CPU write miss / write back cache block, place write miss on bus

### Example

<table>
<thead>
<tr>
<th>Step</th>
<th>P1 State</th>
<th>P1 Addr</th>
<th>P1 Value</th>
<th>P2 State</th>
<th>P2 Addr</th>
<th>P2 Value</th>
<th>Bus Action</th>
<th>Bus Proc.</th>
<th>Bus Addr</th>
<th>Bus Value</th>
<th>Memory Addr</th>
<th>Memory Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>P1: Write 10 to A1</td>
<td>Excl.</td>
<td>A1</td>
<td>10</td>
<td></td>
<td></td>
<td></td>
<td>Write miss</td>
<td>P1</td>
<td>A1</td>
<td></td>
<td>A1</td>
<td>0</td>
</tr>
<tr>
<td>P1: Read A1</td>
<td>Excl.</td>
<td>A1</td>
<td>10</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>P2: Read A1</td>
<td>Sh.</td>
<td>A1</td>
<td>10</td>
<td></td>
<td></td>
<td></td>
<td>Write back</td>
<td>P1</td>
<td>A1</td>
<td></td>
<td>A1</td>
<td>0</td>
</tr>
<tr>
<td>P2: Write 20 to A1</td>
<td>Inv.</td>
<td></td>
<td></td>
<td>Excl.</td>
<td>A1</td>
<td>20</td>
<td>Write miss</td>
<td>P2</td>
<td>A1</td>
<td></td>
<td>A1</td>
<td>10</td>
</tr>
<tr>
<td>P2: Write 40 to A2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>


Assumes A1 and A2 map to same cache block, but A1 != A2
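
The five-step example can be replayed with a small simulation. This is a sketch under stated assumptions rather than the slides' own model: each cache is a single frame (so A1 and A2 conflict), memory starts at 0, the class names `Cache` and `System` and the log format are hypothetical, and on replacement the write-back is logged before the new miss.

```python
class Cache:
    """One-frame write-back cache with Invalid / Shared / Exclusive states."""
    def __init__(self, name, system):
        self.name, self.system = name, system
        self.state, self.addr, self.value = "Invalid", None, None
        system.caches.append(self)

    def _evict_if_needed(self, addr):
        # Replacement: a dirty block for a different address is written back first.
        if self.state == "Exclusive" and self.addr != addr:
            self.system.write_back(self)

    def read(self, addr):
        if self.state != "Invalid" and self.addr == addr:
            return self.value                       # read hit
        self._evict_if_needed(addr)
        self.value = self.system.read_miss(self, addr)
        self.state, self.addr = "Shared", addr
        return self.value

    def write(self, addr, value):
        if not (self.state == "Exclusive" and self.addr == addr):
            self._evict_if_needed(addr)
            self.system.write_miss(self, addr)      # writes to clean blocks count as misses
        self.state, self.addr, self.value = "Exclusive", addr, value

    def snoop(self, kind, addr):
        if self.state == "Invalid" or self.addr != addr:
            return
        if self.state == "Exclusive":
            self.system.write_back(self)            # supply the dirty block
        self.state = "Invalid" if kind == "write miss" else "Shared"


class System:
    """Bus plus memory; every bus message is appended to self.log."""
    def __init__(self):
        self.memory, self.caches, self.log = {}, [], []

    def write_back(self, cache):
        self.memory[cache.addr] = cache.value
        self.log.append(("write back", cache.name, cache.addr, cache.value))

    def _broadcast(self, kind, requester, addr):
        self.log.append((kind, requester.name, addr))
        for cache in self.caches:
            if cache is not requester:
                cache.snoop(kind, addr)

    def read_miss(self, requester, addr):
        self._broadcast("read miss", requester, addr)
        return self.memory.get(addr, 0)             # assumed initial memory value: 0

    def write_miss(self, requester, addr):
        self._broadcast("write miss", requester, addr)


system = System()
p1, p2 = Cache("P1", system), Cache("P2", system)
p1.write("A1", 10)
p1.read("A1")
p2.read("A1")
p2.write("A1", 20)
p2.write("A2", 40)

for entry in system.log:
    print(entry)
print("P1:", p1.state, "| P2:", p2.state, p2.addr, p2.value, "| memory:", system.memory)
```

In this model the last step forces P2 to write back its dirty copy of A1 before it can take A2, which is why memory ends up holding 20 for A1.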