Solutions - Homework - due April 22 1997

 

Problem 1

 

Assume that we have three scenarios - #1 is a fully associative cache, #2 is a two way set associative cache and #3 is a direct mapped cache. The cache size is 256 bytes. The cache line size is 8 bytes. All variables are 4 bytes. Assume that we have separate instruction and data caches. How many data cache read misses would we see from scenarios A) and B). State any assumptions that you make and justify your answer.

 

A:

 

1.for(i=0;i<16;i++)

2.{

3.b += q*a[i];

4.}

 

5.for(i=0;i<16;i++)

6.{

7.c += r*a[i];

8.}

 

 

B:

 

1.for(i=0;i<16;i++)

2.{

3.b += q*a[64*i];

4.}

 

5.for(i=0;i<16;i++)

6.{

7.c += r*a[64*i];

8.}

 

Hennessy and Patterson - Problem 5.1,

parts A,B

 

Solution

 

Assumption -- assign a[1024] to memory locations byte 0 through 4096. Assign b to byte 4096 – 4099, c to byte 4100 to 4103, q to byte 4104-4107, r to byte 4108-4111, i to byte 4112-4115.

 

There is a 256 byte cache – cache line 8 bytes; thus there are 32 cache lines. Assume that all data in a cache line is fetched when a cache line is accessed. Assume line replacement is LRU

 

Fully associative cache –

 

A: Loop A accesses the first 16 elements of a, these first 16 elements are stored in 8 cache lines. b and c are stored on a single cache line as are q and r . Variable i is stored in its own cache line. We thus need 11 cache lines to store all data. There will be 11 compulsory misses in loop A (each cache line is fetched a single time) – no further misses will occur as there will be no capacity misses (not enough data for capacity misses) and no conflict misses.

 

Direct mapped cache – Since no cache line has more than one variable mapping to the line, there will be no conflict misses so there are also only 11 cache misses in this case.

 

Two way set associative – also only 11 misses. There are only 16 sets (each set has 2 cache lines).

Consequently, each set has only one variable that maps to the set. Consequently no matter what the replacement policy, there will be no conflict misses.

 

B: These loops access elements a[0], a[64], a[128], … , a[1024]. These elements fall on 16 distinct cache lines. Consequently, in this loop, we access 19 distinct cache lines.

 

Fully associative – since there are 32 cache lines, there will be no conflict misses. However, we have 19 cache misses here as we only get a single element per cache miss.

 

Direct mapped – a[0] is located at byte 0 which is located at cache line 0, a[64] is located at byte 256, which is in cache line 0, a[64*i] is located at byte 256*i which is also at cache line 0. Consequently, in case B, the loop beginning with statement 5 misses on every access of a. We thus pick up an additional 16 conflict misses. Since b,q,c,r and i do not map to cache line 0, we get a single cache miss for these variables. (Total 35 misses)

 

Two way set associative – In this case, we have two lines associated with set 0. This does not help – we still get 35 misses as a[0], a[64], a[128], …., all map into the same set.

 

 

Solution to exercise 5.1 a,b

 

A program that is bad for cache B would access the cache in a way that would force the cache to frequently write back cache blocks. A program that is good for cache B would repeatedly read and write data in one or more cache lines and never miss.

 

Each line is 32 bytes (total size of cache – 8192 bytes). Assume that each element of array a is 4 bytes, that the address of a[0] is 0 and that the variable i is stored in register.

 

 

Program that is bad for cache B

 

 

For(i=0;i<100;i++)

{

a[2048*i] = i;

}

 

This gives cache B to write back every loop iteration as all writes map to the same cache block. The cost is equivalent to 1000 read cache hits

 

Cache A writes through 100 times – this costs the equivalent of 500 cache read misses

 

Reads:

 

for(i=0;i<100;i++)

{

b = a[0];

c = a[2048];

}

 

Cache A will hit cache every time. Machine B will miss every time.

 

Another unpleasant thing to do to Machine B

 

for(i=0;i<100;i++)

{

a[0] = i;

c = a[2048];

}

 

Here we have cache misses every time (in machine B) and we have to

write back a[2048] every loop iteration.

 

Machine A will hit cache every time, although it will have to write

through.

 

Here is a loop that is good for cache B

 

for(i=0;i<100;i++) {

a[0] += i;}

 

All iterations write to the same block so if the machine had a separate instruction and data cache, there would never be a write miss (after the first iteration). The above loop could actually access many elements of a, as long as there is never a cache miss. Cache A still has to write through every time.