Virtual Memory

Introduction

A cache stores a subset of the addresss space of RAM. An address space is the set of valid addresses. Thus, for each address in cache, there is a corresponding address in RAM. This subset of addresses (and corresponding copy of data) changes over time, based on the behavior of your program.

Cache is used to keep the most commonly used sections of RAM in the cache, where it can be accessed quickly. This is necessary because CPU speeds increase much faster than speed of memory access. If we could access RAM at 3 GHz, there wouldn't be any need for cache, because RAM could keep up. Because it can't keep up, we use cache.

What if we wanted more RAM than we had available. For example, we might have 1 M of RAM, what if we wanted 10 M? How could we manage?

One way to extend the amount of memory accessible by a program is to use disk. Thus, we can use 10 Megs of disk space. At any time, only 1 Meg resides in RAM.

In effect, RAM acts like cache for disk.

This idea of extending memory is called virtual memory. It's called "virtual" only because it's not RAM. It doesn't mean it's fake.

The real problem with disk is that it's really, really slow to access. If registers can be accessed in 1 nanosecond, and cache in 5 ns and RAM in about 100 ns, then disk is accessed in fractions of seconds. It can be a million times slower to access disk than a register.

The advantage of disk is it's easy to get lots of disk space for a small cost.

Still, becaues disk is so slow to access, we want to avoid accessing disk unnecessarily.

Uses of Virtual Memory

Virtual memory is an old concept. Before computers had cache, they had virtual memory. For a long time, virtual memory only appeared on mainframes. Personal computers in the 1980s did not use virtual memory. In fact, many good ideas that were in common use in the UNIX operating systems didn't appear until the mid 1990s in personal computer operating systems (pre-emptive multitasking and virtual memory).

Initially, virtual memory meant the idea of using disk to extend RAM. Programs wouldn't have to care whether the memory was "real" memory (i.e., RAM) or disk. The operating system and hardware would figure that out.

Later on, virtual memory was used as a means of memory protection. Every program uses a range of addressed called the address space.

The assumption of operating systems developers is that any user program can not be trusted. User programs will try to destroy themselves, other user programs, and the operating system itself. That seems like such a negative view, however, it's how operating systems are designed. It's not necessary that programs have to be deliberately malicious. Programs can be accidentally malicious (modify the data of a pointer pointing to garbage memory).

Virtual memory can help there too. It can help prevent programs from interfering with other programs.

Occasionally, you want programs to cooperate, and share memory. Virtual memory can also help in that respect.

How Virtual Memory Works

When a computer is running, many programs are simulataneously sharing the CPU. Each running program, plus the data structures needed to manage it, is called a process.

Each process is allocated an address space. This is a set of valid addresses that can be used. This address space can be changed dynamically. For example, the program might request additional memory (from dynamic memory allocation) from the operating system.

If a process tries to access an address that is not part of its address space, an error occurs, and the operating system takes over, usually killing the process (core dumps, etc).

How does virtual memory play a role? As you run a program, it generates addresses. Addresses are generated (for RISC machines) in one of three ways:

A load instruction
A store instruction
Fetching an instruction

Load/store create data addresses, while fetching an instruction creates instruction addresses. Of course, RAM doesn't distinguish between the two kinds of addresses. It just sees it as an address.

Each address generated by a program is considered virtual. It must be translated to a real physical address. Thus, address tranlation is occuring all the time. As you might imagine, this must be handled in hardware, if it's to be done efficiently.

You might think translating each address from virtual to physical is a crazy idea, because of how slow it is. However, you get memory protection from address translation, so it's worth the hardware needed to get memory protection.

Paging

In a cache, we fetched quantities called data blocks or cache lines. Those are typically somewhere between, say, 4 and 64 bytes.

There is a corresponding terminology in virtual memory to a cache line. It's called a page.

A page is a sequence of N bytes where N is a power of 2.

These days, page sizes are at least 4K in size and maybe as large as 64 K or more.
Let's assume that we have 1M of RAM. RAM is also called physical memory. We can subdivide the RAM into 4K pages. Thus 1M / 4K = 256 pages. Thus, our RAM has 256 physical pages, weach holding 4K.
Let's assume we have 10 M of disk. Thus, we have 2560 disk pages.
In principle, each program may have up to 4 G of address space. Thus, it can, in principle, access 2²⁰ virtual pages. In reality, many of those pages are considered invalid pages.
Page Tables
How is an address translated from virtual to physical? First, like the cache, we split up a 32 bit virtual address into a virtual page (which is like a tag) and a page offset.

If this looks a lot like a fully-associative cache, but whose offset is much much larger, it's because that's basically what it is.
We must convert the virtual page number to a physical page number. In our example, the virtual page consists of 20 bits. A page table is a data structure which consists of 2²⁰ page table entries (PTEs). Think of the page table as an array of page table entries, indexed by the virtual page number.
The page table's index starts at 0, and ends at 2²⁰ - 1.
Here's how it looks:

Suppose your program generates a virtual address. You'd extract bits B_31-12 to get the virtual page number. Use that as an index into the above page table to access the page table entry (PTE).
Each PTE consists of a valid bit and a 20 bit physical page (it's 20 bits, because we assume we have 1M of RAM, and 1M of RAM requires 20 bits to access each byte). If the valid bit is 1, then the virtual page is in RAM, and you can get the physical page from the PTE. This is called a page hit, and is basically the same as a cache hit.
If the valid bit is 0, the page is not in RAM, and the 20 bit physical page is meaningless. This means, we must get the disk page corresponding to the virtual page from disk and place it into a page in RAM. This is called a page fault.
Because disk access is slow, slow, slow, we want to minimize the number of page faults.
In general, this is done by making RAM fully associative. That is, any disk page can go into any RAM page (disk, RAM, and virtual pages all have the same size).
In practice, some pages in RAM are reserved for the operating system to make the OS run efficiently.
Translation
Suppose your program generated the following virtual address F0F0F0F0_hex (which is 1111 0000 1111 0000 1111 0000 1111 0000 _two). How would you translate this to a physical address? First, you would split the address into a virtual page, and a page offset (see below).
Then, you'd see if the virtual page had a corresponding physical page in RAM using the page table. If the valid bit of the PTE is 1, then you'd translate the virtual page to a physical page, and append the page offset. That would give you a physical address in RAM.

Huge Page Tables
Page tables can be very large. If every virtual page was valid, our page table would be 2²⁰ X 21 bits. This is about 3 Megs just for one program's page table. If there are many programs, there are many tables, each occupying a lot of memory.
What's worse, the page tables we've been talking about are incomplete. If we have a page fault, we need to find the page on disk. Where is it located? That information is kept in another page table, which is indexed by the virtual page (same as the page table we talked about), and tells you where on disk to find it. Then, we have to copy that page to RAM, and update the first page table. Thus, we need two page table tables!
These page tables are basically just data. Thus, they occupy memory as any data occupies memory. When we switch from one process to another, we need to load its page table in RAM for easy access. It's useful to keep it located in certain parts of RAM for just such a purpose. If RAM is suitable large, we can have several processes' page tables in RAM at the same time.
A page table register can hold the physical address of the page table that's currently active to get quick access. Still, these are large, and we may want to find ways to speed things up.
Inverted Page Tables
There are many schemes to reduce the size of a page table. One way is to use a hierarchy. Thus, we might have two layers of pages. Bits B_31-22 might tell you the first layer, while B_21-13 might tell you the second layer.
Another idea is to use a kind of closed hash table. The hash table's size is based on the number of physical pages. The number of physical pages is usually a lot smaller than the number of all virtual pages put together.
A hash function takes a virtual page number as input, and produces an index into the hash table as the result. Each entry of the hash table consists of a virtual page number and a physical page number. You check to see if the virtual page number matched, and if so, then you use the physical page.
If it missed, then you must resolve the collision based on the hash table. In practice, you may need the number of entries of the hash table to be a few times larger than the number of physical pages, to avoid excessive collisions.
An inverted page table takes longer to access because you may have collisions, but it takes up a lot less memory. It helps with page hits. However, if you have a page fault, you still need a page table that maps virtual pages to disk pages, and that will be large.
Translation Lookaside Buffer (TLB)
What's the cost of address translation? For each virtual address, we must access the page table to find the PTE corresponding to the virtual page. We look up the physical page from the PTE, and construct a physical address. Then, we access RAM at the physical address. That's two memory acccess: one to access the PTE, one more to access the data in RAM. Cache helps us cut down the amount of time to access memory, but that's only if we have cache hits.
The idea of a TLB is to create a special cache for translations. Here's one example of a TLB.

Each row in the TLB is like one slot of a cache. Assume we have 64 rows. When you have a virtual address, you can split it into a virtual page and an offset.
In parallel, compare the virtual page to all of the entries of the TLB (say, 64). There should be, at most, one match. Just like a fully associative cache, you want to check if the TLB entry is valid.
If a TLB hit occurs, replace the virtual page with a physical page to create a physical address.
If there's a TLB miss, then it's still possible that the virtual page resides in RAM. You must now look up the PTE (page table entry) to see if this is the case. If the PTE says the virtual page is in RAM, then you can update the TLB, so that it has a correct virtual to physical page translation.
The TLB is designed to only store a limited subset of virtual to physical page translation. It is really just a cache for the page table, storing only the most frequently used translations.
The TLB can be kept small enough that it can be fully associative. However, some CPU designers make larger TLBs that are direct mapped or set associative.
Memory Protection
How does virtual addresses give us memory protection? Suppose you work at a post office, which assigns post boxes to individuals. You have a person who comes in and says they want three post office boxes: 100, 101, and 102.
They insist on using those numbers. Another customer comes in, and insists on using those numbers too. How can you accomodate both customers?
Basically, you cheat. You tell the first customer you have boxes 100, 101, and 102, but you assign him boxes 200, 201, and 202. Similarly, you tell the second customer that you also have boxes 100, 101, and 102, but you assign her boxes 320, 321, and 322.
Whenever customer 1 wants the mail in box 100, you translate it to box 200. Whenever customer 2 wants mail in box 100, you translate it to box 320. Thus, your two customers get to use the box numbers they want, and through the magic of translation, they two customers avoid using each other's boxes.
If the post office wanted to reserve its own boxes for its own use, it could reserve boxes 1 through 100 to itself, and never assign those boxes, directly or indirectly to a customer. Even if a customer wants box 50, they can be assigned box 150, safely outside the range of boxes reserved for the post office.
This same analogy applies to real programs. Each program can assume it uses the same set of 32 bit virtual addresses. We just make sure that those virtual pages do not map to the same disk page, nor to the same physical page.
Who's job is it to assign the pages? It's the operating system. When a program starts up, it will want a certain range of addresses. The operating system creates a page table for the program, making sure the disk pages it uses do not conflict with the disk pages of other programs.
Invalid Pages
Sometimes you don't really want a program to access all possible 32 bit addresses. This helps reduce the total size of the page table. One way to prevent a user program from accessing invalid pages is make certain virtual pages entries invalid. Thus, an attempt to translate virtual to physical page will fail, and even looking up the virtual page on disk fails.
Page Replacement Schemes
Like cache, you can have page replacement schemes based on FIFO, LRU, LFU, etc. In general, page replacement schemes can be more sophisticated because getting a page off disk is really slow, so you can afford to take more time to make a better choice.
Dirty Bit
In reality, caches usually don't have dirty bits. This means that you must always write back if a cache line is evicted.
However, because disk access is slower, it makes sense to use dirty bits for pages. Thus, if a page hasn't been modified (maybe because it's read only), there's no reason to copy it back to disk. Just toss it out.
Cache
Virtual memory works with caching. Basically, once the virtual address is translated to a physical address, then the physical address is passed to the cache, which checks to see if there is a cache hit.
The Oblivious Programmer
As with cache, assembly lanugage programmers don't have to worry about virtual memory. They just see "memory". They don't do anything in particular whether there's virtual memory or not. Virtual memory is handled partly by hardware (translation mechanism) and partly by the operating system (sets up page table, handles page faults, etc).
This is good because at one point, programmers had to worry very much about whether a chunk of memory resided on the disk or in RAM. Programmers had to spend a great deal of effort managing this, and it distracted them from coding. With virtual memory, the management of disk as an extension of RAM is handled automatically.
Shared Memory
You can share memory between two processes by mapping the virtual page to the same disk page. When that disk page is resident in physical memory, then both processes can access the same location.
There may be issues of synchronization to handle, but that's a topic that's best left to a course in operating systems. Suffice it to say that we do have a way to map virtual pages to the same disk page to allow for sharing.
Sharing is available when you want two processes to collaborate in a somewhat safe manner. Both processes, for the most part, have their own memory. They only share a small region between the two of them.
Terminology
Some definitions before we summarize:

A physical page is 2^k consecutive bytes in memory. The page is "super word-aligned". That is, it's first byte must be at an address divisible by 2^k.
A disk page is 2^k consecutive bytes in disk (also, super word-aligned) Disks can be partitioned up non-contiguously, so it doesn't have to be truly consecutive, as long as the CPU "thinks" it is. Each physical page should have a corresponding disk page. However, there are usually far more disk pages than physical pages.
A virtual page is 2^k consecutive bytes, which is super word-aligned. This is the page the program thinks it is dealing with. This page can either be in RAM (physical memory) and/or disk.
The physical page number, disk page number, and virtual page number can all be different. The virtual page number is the page used by the program. The physical page number is the page in RAM, if it is currently in RAM. The disk page number is the page in disk. Recall the post office analogy where the person has a virtual post office box number, and the postal workers know the mapping for each person from virtual box number to real box number.
Summary
Virtual memory serves two purposes. First, it allows us to extend the use of physical memory by using disk. Second, it allows us to have memory protection, because each virtual address is translated to a physical address.
The two are somewhat orthogonal (independent) of each other. For example, we could use the disk as an extension of the address space accessible to the programmer, without providing memory protection. Thus, a programmer might be able to access all disk pages, if the operating system allowed it when setting up page tables.
It's also possible to have memory protection without any disks. Provided each process only needs a small number of pages, we could allow all of those virtual pages to reside in RAM. For example, suppose each process has only 2 virtual pages. Then, since (in our example) RAM stores 256 pages, we could allow 128 processes to have virtual page 0 and 1, and they would not interfere with each other in virtual memory.
Web Accessibility