Problem 1

 

Consider the following sequence of instructions:

 

S1 LD F0,0(R1)

S2 ADDD F2,F0,F4

S3 MULTD F6,F8,F10

S4 MULTD F2,F2,F4

S5 ADDD F12,F6,F10

 

Assume that we have an architecture that makes use of Tomasulo’s algorithm. The architecture has 3 add units, 3 multiply units and 4 load buffers and 4 store buffers. It turns out however, that instruction S1 will never find the data it seeks as the 0(R1) mentioned in the load command is stored on a page resident in a Zip drive disk, and the removable disk has been found and chewed on by a visiting pet dog. Assume that the operating system does not recognize that there is a problem and the processor does as much work as it can do and then hangs. What would be the state of the machine at that point (i.e. after everything that can execute completes)?

Note: You do not have to show the state of the tables at each timestep, I am only looking for the state of the machine after it hangs (e.g. hours after the offending load is issued and the requested data has not appeared).

 

Instruction Status

 

Name

Busy

Function

Vj

Vk

Qj

Qk

Statement

add1

Yes

add

regs[ F4]

Load 1

S2

add2

add3

mult1

Yes

mult

regs[F4]

add1

S4

mult2

mult3

 

 

Register Status

 

Field

F0

F2

F4

F6

F8

F10

F12

Load 1

mult1

 

 

Load Buffers

 

Field

Load 1

Load 2

Load 3

Load 4

Address

O(R1)

Busy

yes

 

 

 

Store Buffers

 

Field

Store 1

Store 2

Store 3

Store 4

Qi

Busy

Address

 

 

Problem 2

 

Assume that we have three scenarios - #1 is a fully associative cache, #2 is a four way set associative cache and #3 is a direct mapped cache. The cache size is 1K bytes (2**10 bytes). The cache line size is 16 bytes. All variables are 8 bytes. Assume that the LRU policy is used to determine which block (or line) is replaced on a cache miss.

Assume that we have separate instruction and data caches and assume that the cache is empty when the problem begins. Also assume that variables i, b,c,q,r,s and t are in registers. How many data cache read misses would we take in scenarios A,B and C. State any assumptions that you make and justify your answer.

 

Cache Statistics –

Cache – 2**10 bytes, 2**4 bytes/line, 2**6 (64) lines, 2 variables/line

 

A:

 

for(i=0;i<16;i++)

{

b += q*a[i];

}

 

for(i=15;i>=0;i--)

{

c += r*a[i];

}

 

Cache – 2**10 bytes, 2**4 bytes/line, 2**6 (64) lines, 2 variables/line

In A, we read 16 consecutive elements of array a, first forward from a[0] to a[15] and then backwards from a[15] to a[0]. These elements of a fit into 8 consecutive cache lines (assuming a[0] is aligned with the beginning of a cache line). There are 8 compulsory cache misses for fully associative, four way set associative and direct mapped cache. If you assumed that a[0] mapped to the middle of a cache line, you would get 9 compulsory misses.

 

 

 

 

 

Scenario B:

 

 

for(i=0;i<16;i++)

{

b += q*a[64*i];

}

 

for(i=15;i>=0;i--)

{

c += r*a[64*i];

}

 

Cache – 2**10 bytes, 2**4 bytes/line, 2**6 (64) lines, 2 variables/line

 

 

Fully associative cache – Each variable access lies in a separate cache line. The cache is fully associative so we have 16 cache misses from the first loop and no misses from the second.

 

Four way set associative – We have 16 sets of cache lines. All references map to the same set of cache lines. Thus we end up with 16 cache misses from the first loop. At the end of the first loop, we have a[64*15], a[64*14], a[64*13], a[64*12] in cache; this gives us 4 cache hits in the second loop (note that the second loop goes from i=15 down to i=0). Consequently, we have a total of 16+12 = 28 cache misses.

 

Direct mapped cache –

Now 64 is 2**6 so if a[0] maps to the first variable of the first cache line, a[64] maps to byte (2**6)*(2**3) = 2**9 (512) of the cache. This is the first variable of the 2**5th (32nd) cache line, as each cache line has 2**16 bytes. a[128] is (2**7)*(2**3) bytes from the beginning of the array, so a[128] lies on the first variable of the first cache line. By similar reasoning, each consecutively accessed element of maps to the first variable in either the first, or the 32nd cache line. For the first of the two loops, we obtain 2 compulsory misses and 14 conflict misses. Assuming LRU replacement, first two references in the second loop hit cache (the second loop goes from i=15 down to i=0), then all subsequent references miss. Thus we have 2 hits and 30 misses.

 

 

Scenario C:

 

for(i=0;i<16;i++)

{

b += a[i] + a[i+ 2**8] + a[i+2**9] + a[i+2**10];

}

 

for(i=0;i<15;i++)

{

c += q*a[i] + r*a[i+2**8] + s*a[i+2**9] + t*a[i+2**10]

}

 

Cache – 2**10 bytes, 2**4 bytes/line, 2**6 (64) lines, 2 variables/line

 

 

Fully associative – the sixteen consecutive elements a[i], i=0,,15 occupy 8 lines, and the three sets of the sixteen consecutive elements a[i+2**8], i=0,,15; a[i+2**9] i=0,,15; a[i+2**10] i=0,,15, each occupy eight lines as well. Together, the array elements accessed occupy 32 cache lines. Consequently, we take 32 compulsory misses in the first loop but no misses in the second loop – total of 32 misses.

 

Four way set associative – all accesses in each loop iteration map onto the same cache set. Assume that a[0] is at address 0. Array references from loop iteration 0 occupies the first 8 bytes of each line in cache set 0. Array references from loop iteration 1 occupies the 2nd 8 bytes of each line in cache set 0. Array references from loop iteration 2 occupies the 1st 8 bytes of each line in cache set 1, etc. Since there are 16 loop iterations, the array references occupy all four lines of the first eight cache sets. We thus get 32 compulsory cache misses in the first loop. There are no cache misses in the second loop – total of 32 misses

 

Direct mapped –a[i+2**8] is 2**11 bytes from a[i], a[i+2**9] is 2**12 bytes from a[0], and a[i+2**10] is 2**13 bytes from a[0]. Thus for each iteration i, each of these four references map to the same cache line. Since there are 16 loop iterations, we have 64 cache misses corresponding to the first loop. In the second loop, we also have 64 cache misses – total of 128 cache misses.

 

 

Problem 3

 

Design a virtual memory system that has the following characteristics:

 

 

1) A page size of 128 K (2**17) bytes and an address word of 64 bits. Assume 8 byte page table entries.

 

page size – 2**17 bytes. Lower 17 bits addresses page. 8 = 2**3 byte table entries – 2**14 entries per page table. Thus: we have 4 levels of page tables (56 bits of address), plus the pages (17 bits of address).

 

  1. What is the maximum number of disk accesses that could occur on a page fault?
  2.  

    Worst case would be 5 disk accesses – 4 to access each level of page table, one to access page.

     

  3. Consider scenarios where the following sequences of memory references (reads) occur. Assume that a is a character array with elements of size 1 byte. Assume that memory is large enough to hold 1000 pages. Consider the following reference strings:

 

How many page faults would each reference string cause – for each of the four reference strings, assume that the system starts out with no pages in memory.

 

Solution:

 

address bits 0 to 16 – page

17 to 30 – 1st level table

31 to 44 – 2nd level table

45 to 58 – 3rd level table

59 to 64 – 4th level table

 

 

  1. a(0), a(1), a(2**10),a(2**16), a(3)
  2.  

    All these references are on the same page – 5 page faults to bring in page tables and page

     

     

  3. a(0), a(1), a(2**20), a(1+2**20)
  4.  

    These references lie on two pages: {a(0),a(1)} and {a(2**20),a(1+2**20}.

     

    3 page faults to bring in appropriate 4th, 3rd, 2nd level tables (the same 4th,3rd,2nd level tables needed for all these pages. 2 faults needed to bring in 1st level table and the page with a(0), a(1), 2 faults needed to bring in 1st level table and page needed for a(2**20), a(1+2**20).

     

     

  5. a(0), a(2**62), a(2**32), a(2**61), a(1+2**32)
  6.  

    These references lie on 4 pages: {a(0)}, {a(2**62)}, {a(2**32), a(1+2**32)}, {a(2**61)}.

     

    5 page faults needed to bring tables and page for a(0). Another 4 page faults are needed to bring in 3rd , 2nd and 1st level table plus page with a(2**62). a(2**32) shares 4th and 3rd level page table entry with a(0) but it has a different 2nd, 1st page table entry and a different page – another 3 page faults. a(2**61) shares a 4th level page table entry with a(2**62) but 3rd, 2nd ,1st level page table and page are different – 4 more page faults. Finally, a(1+2**32) does not cause a page fault as a(1+2**32) is on the same page as a(2**32).

  7. a(0),a(2**32), a(81+2**12), a(0)

 

 

These references lie on 2 pages: {a(0),a(81+2**12)}, {a(2**32)}.

 

5 page faults needed to bring in tables and page for a(0). Different 1st level table and page for a(2**32) – 2 additional faults. a(81+2**12) and a(0) do not cause any additional page faults.