A cache keeps the most commonly used sections of RAM where they can be accessed quickly. This is necessary because CPU speeds have increased much faster than memory access speeds. If we could access RAM at 3 GHz, there wouldn't be any need for cache, because RAM could keep up. Because it can't keep up, we use cache.
What if we wanted more RAM than we had available? For example, we might have 1 M of RAM but want 10 M. How could we manage?
One way to extend the amount of memory accessible to a program is to use disk. Thus, we can use 10 M of disk space, while at any time only 1 M of it resides in RAM.
In effect, RAM acts like cache for disk.
This idea of extending memory is called virtual memory. It's called "virtual" only because it's not RAM. It doesn't mean it's fake.
The real problem with disk is that it's really, really slow to access. If registers can be accessed in 1 nanosecond, and cache in 5 ns and RAM in about 100 ns, then disk is accessed in fractions of seconds. It can be a million times slower to access disk than a register.
The advantage of disk is it's easy to get lots of disk space for a small cost.
Still, because disk is so slow to access, we want to avoid accessing disk unnecessarily.
Initially, virtual memory meant the idea of using disk to extend RAM. Programs wouldn't have to care whether the memory was "real" memory (i.e., RAM) or disk. The operating system and hardware would figure that out.
Later on, virtual memory was used as a means of memory protection. Every program uses a range of addresses called the address space.
The assumption of operating system developers is that no user program can be trusted. User programs will try to destroy themselves, other user programs, and the operating system itself. That seems like such a negative view; however, it's how operating systems are designed. Programs don't have to be deliberately malicious. They can be accidentally malicious (say, by writing through a pointer that points to garbage memory).
Virtual memory can help there too. It can help prevent programs from interfering with other programs.
Occasionally, you want programs to cooperate, and share memory. Virtual memory can also help in that respect.
Each process is allocated an address space. This is a set of valid addresses that can be used. This address space can be changed dynamically. For example, the program might request additional memory (from dynamic memory allocation) from the operating system.
If a process tries to access an address that is not part of its address space, an error occurs, and the operating system takes over, usually killing the process (core dumps, etc).
How does virtual memory play a role? As you run a program, it generates addresses. Addresses are generated (for RISC machines) in one of three ways: fetching an instruction (using the program counter), executing a load instruction, or executing a store instruction.
Each address generated by a program is considered virtual. It must be translated to a real physical address. Thus, address translation is occurring all the time. As you might imagine, this must be handled in hardware if it's to be done efficiently.
You might think translating each address from virtual to physical is a crazy idea, because of how slow it is. However, you get memory protection from address translation, so it's worth the hardware needed to get memory protection.
There is a corresponding terminology in virtual memory to a cache line. It's called a page.
A page is a sequence of N bytes where N is a power of 2.
These days, page sizes are at least 4K in size and maybe as large as 64 K or more.
Let's assume that we have 1M of RAM. RAM is also called physical memory. We can subdivide the RAM into 4K pages. Thus, 1M / 4K = 256 pages, so our RAM has 256 physical pages, each holding 4K.
Let's assume we have 10 M of disk. Thus, we have 2560 disk pages.
In principle, each program may have up to 4 G of address space, and can thus access 2^20 virtual pages. In reality, many of those pages are considered invalid pages.
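The arithmetic above can be checked with a short sketch. All the sizes come from the running example in the text:

```python
# Worked numbers from the running example: 1M of RAM, 10M of disk,
# 4K pages, and a 4G (32-bit) virtual address space.
PAGE_SIZE = 4 * 1024            # 4K bytes per page
RAM_SIZE = 1 * 1024 * 1024      # 1M of physical memory
DISK_SIZE = 10 * 1024 * 1024    # 10M of disk
VIRTUAL_SIZE = 4 * 1024 ** 3    # 4G of virtual address space

physical_pages = RAM_SIZE // PAGE_SIZE      # 256 physical pages
disk_pages = DISK_SIZE // PAGE_SIZE         # 2560 disk pages
virtual_pages = VIRTUAL_SIZE // PAGE_SIZE   # 1048576 = 2^20 virtual pages
```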
If this looks a lot like a fully associative cache with a much, much larger offset, that's because it basically is one.
We must convert the virtual page number to a physical page number. In our example, the virtual page number consists of 20 bits. A page table is a data structure that consists of 2^20 page table entries (PTEs). Think of the page table as an array of page table entries, indexed by the virtual page number.
The page table's index starts at 0 and ends at 2^20 - 1.
Suppose your program generates a virtual address. You'd extract bits B31-12 to get the virtual page number. Use that as an index into the above page table to access the page table entry (PTE).
Each PTE consists of a valid bit and a physical page number. In our example, the physical page number needs only 8 bits, since 1M of RAM divided into 4K pages gives 256 physical pages; the 8-bit page number combined with the 12-bit offset forms a 20-bit physical address, enough to access each byte of the 1M of RAM. If the valid bit is 1, then the virtual page is in RAM, and you can get the physical page from the PTE. This is called a page hit, and is basically the same as a cache hit.
If the valid bit is 0, the page is not in RAM, and the physical page number is meaningless. We must then fetch the page corresponding to the virtual page from disk and place it into a page in RAM. This is called a page fault.
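The hit-or-fault check can be sketched in Python. This is a minimal model, not real MMU hardware; the `PTE` class and the dict-based page table are illustrative assumptions, using the example's 4K pages (12-bit offset):

```python
PAGE_OFFSET_BITS = 12  # 4K pages: the low 12 bits are the page offset

class PTE:
    """One page table entry: a valid bit plus a physical page number."""
    def __init__(self, valid=False, physical_page=0):
        self.valid = valid
        self.physical_page = physical_page

def translate(page_table, virtual_address):
    """Translate a virtual address to a physical one, raising on a page fault."""
    virtual_page = virtual_address >> PAGE_OFFSET_BITS        # bits 31-12
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)  # bits 11-0
    pte = page_table[virtual_page]
    if not pte.valid:  # page fault: the page is not in RAM
        raise RuntimeError("page fault on virtual page %d" % virtual_page)
    return (pte.physical_page << PAGE_OFFSET_BITS) | offset   # page hit
```

For instance, if virtual page 1 maps to physical page 5, then virtual address 0x1ABC translates to physical address 0x5ABC: the page number is swapped, and the offset 0xABC is kept.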
Because disk access is slow, slow, slow, we want to minimize the number of page faults.
In general, this is done by making RAM fully associative. That is, any disk page can go into any RAM page (disk, RAM, and virtual pages all have the same size).
In practice, some pages in RAM are reserved for the operating system to make the OS run efficiently.
To translate a virtual address, you'd see if the virtual page has a corresponding physical page in RAM using the page table. If the valid bit of the PTE is 1, you'd translate the virtual page to a physical page and append the page offset. That gives you a physical address in RAM.
What's worse, the page tables we've been talking about are incomplete. If we have a page fault, we need to find the page on disk. Where is it located? That information is kept in another page table, which is indexed by the virtual page (same as the page table we talked about) and tells you where on disk to find the page. Then we have to copy that page to RAM and update the first page table. Thus, we need two page tables!
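The two tables and the page-fault handling they support can be sketched as follows. Dicts stand in for the real table structures, and all the names here are illustrative:

```python
# Two tables: one maps a virtual page to its physical page while it is
# resident in RAM; the other maps a virtual page to its disk page.
ram_table = {}   # virtual_page -> physical_page (only resident pages)
disk_table = {}  # virtual_page -> disk_page (every valid page)

def handle_page_fault(virtual_page, free_physical_page, ram, disk):
    """Copy the faulting page in from disk and update the first table."""
    disk_page = disk_table[virtual_page]           # find the page on disk
    ram[free_physical_page] = disk[disk_page]      # copy the page into RAM
    ram_table[virtual_page] = free_physical_page   # record the translation
```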
These page tables are basically just data. Thus, they occupy memory as any data occupies memory. When we switch from one process to another, we need to load its page table into RAM for easy access. It's useful to keep it located in certain parts of RAM for just such a purpose. If RAM is suitably large, we can have several processes' page tables in RAM at the same time.
A page table register can hold the physical address of the page table that's currently active to get quick access. Still, these are large, and we may want to find ways to speed things up.
Another idea is to use a kind of closed hash table, called an inverted page table. The hash table's size is based on the number of physical pages, which is usually much smaller than the total number of virtual pages.
A hash function takes a virtual page number as input and produces an index into the hash table. Each entry of the hash table consists of a virtual page number and a physical page number. You check whether the virtual page number matches, and if so, you use the physical page.
If it doesn't match, you must resolve the collision according to the hash table's scheme. In practice, you may need the number of entries in the hash table to be a few times larger than the number of physical pages, to avoid excessive collisions.
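A minimal sketch of the hashed lookup, using linear probing to resolve collisions. The modulo hash, the factor-of-four sizing, and the tuple layout are all assumptions for illustration:

```python
NUM_PHYSICAL_PAGES = 256
TABLE_SIZE = 4 * NUM_PHYSICAL_PAGES  # oversize the table to keep collisions rare

table = [None] * TABLE_SIZE  # each slot: None or (virtual_page, physical_page)

def _hash(virtual_page):
    return virtual_page % TABLE_SIZE  # toy hash function

def insert(virtual_page, physical_page):
    i = _hash(virtual_page)
    while table[i] is not None:       # collision: probe the next slot
        i = (i + 1) % TABLE_SIZE
    table[i] = (virtual_page, physical_page)

def lookup(virtual_page):
    """Return the physical page, or None if the page is not in RAM."""
    i = _hash(virtual_page)
    while table[i] is not None:
        stored_vp, stored_pp = table[i]
        if stored_vp == virtual_page:  # match: use this physical page
            return stored_pp
        i = (i + 1) % TABLE_SIZE       # collision: keep probing
    return None
```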
An inverted page table takes longer to access because you may have collisions, but it takes up a lot less memory. It helps with page hits. However, if you have a page fault, you still need a page table that maps virtual pages to disk pages, and that will be large.
The idea of a TLB (translation lookaside buffer) is to create a special cache for translations.
Each row in the TLB is like one slot of a cache. Assume we have 64 rows. When you have a virtual address, you can split it into a virtual page and an offset.
In parallel, compare the virtual page to all of the entries of the TLB (say, 64). There should be, at most, one match. Just like a fully associative cache, you want to check if the TLB entry is valid.
If a TLB hit occurs, replace the virtual page with a physical page to create a physical address.
If there's a TLB miss, then it's still possible that the virtual page resides in RAM. You must now look up the PTE (page table entry) to see if this is the case. If the PTE says the virtual page is in RAM, then you can update the TLB, so that it has a correct virtual to physical page translation.
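The hit-then-miss logic can be sketched like this. A dict stands in for the parallel hardware comparison; the 64-entry capacity matches the example above, and evicting an arbitrary entry on overflow is a simplifying assumption:

```python
TLB_CAPACITY = 64
tlb = {}  # virtual_page -> physical_page (cached translations)

def translate_page(virtual_page, page_table):
    """Return (physical_page, tlb_hit); raise on a page fault."""
    if virtual_page in tlb:                # TLB hit: skip the page table
        return tlb[virtual_page], True
    pte = page_table.get(virtual_page)     # TLB miss: consult the PTE
    if pte is None or not pte["valid"]:
        raise RuntimeError("page fault")   # not in RAM either
    if len(tlb) >= TLB_CAPACITY:           # make room: evict some entry
        tlb.pop(next(iter(tlb)))
    tlb[virtual_page] = pte["physical_page"]  # update the TLB
    return pte["physical_page"], False
```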
The TLB is designed to only store a limited subset of virtual to physical page translation. It is really just a cache for the page table, storing only the most frequently used translations.
The TLB can be kept small enough that it can be fully associative. However, some CPU designers make larger TLBs that are direct mapped or set associative.
Consider an analogy. A post office customer insists on renting boxes 100, 101, and 102, and only those numbers will do. Another customer comes in and insists on using those numbers too. How can you accommodate both customers?
Basically, you cheat. You tell the first customer you have boxes 100, 101, and 102, but you assign him boxes 200, 201, and 202. Similarly, you tell the second customer that you also have boxes 100, 101, and 102, but you assign her boxes 320, 321, and 322.
Whenever customer 1 wants the mail in box 100, you translate it to box 200. Whenever customer 2 wants the mail in box 100, you translate it to box 320. Thus, your two customers get to use the box numbers they want, and through the magic of translation, the two customers avoid using each other's boxes.
If the post office wanted to reserve boxes for its own use, it could reserve boxes 1 through 100 for itself and never assign those boxes, directly or indirectly, to a customer. Even if a customer wants box 50, they can be assigned box 150, safely outside the range of boxes reserved for the post office.
This same analogy applies to real programs. Each program can assume it uses the same set of 32 bit virtual addresses. We just make sure that those virtual pages do not map to the same disk page, nor to the same physical page.
Whose job is it to assign the pages? It's the operating system's. When a program starts up, it will want a certain range of addresses. The operating system creates a page table for the program, making sure the disk pages it uses do not conflict with the disk pages of other programs.
Because disk access is so slow, it makes sense to use dirty bits for pages, just as write-back caches do. If a page hasn't been modified (maybe because it's read-only), there's no reason to copy it back to disk when it's evicted. Just toss it out.
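The eviction decision a dirty bit enables can be sketched in a few lines. The page and disk representations here are illustrative assumptions:

```python
def evict(page, disk):
    """Free a RAM page, writing it back to disk only if it was modified."""
    if page["dirty"]:
        disk[page["disk_page"]] = page["data"]  # copy the new contents back
    # A clean page's disk copy is already current: just toss the page out.
    page["resident"] = False
```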
This is good because at one point, programmers had to worry very much about whether a chunk of memory resided on the disk or in RAM. Programmers had to spend a great deal of effort managing this, and it distracted them from coding. With virtual memory, the management of disk as an extension of RAM is handled automatically.
There may be issues of synchronization to handle, but that's a topic that's best left to a course in operating systems. Suffice it to say that we do have a way to map virtual pages to the same disk page to allow for sharing.
Sharing is available when you want two processes to collaborate in a somewhat safe manner. Both processes, for the most part, have their own memory. They only share a small region between the two of them.
These two uses of virtual memory, extending memory onto disk and providing memory protection, are somewhat orthogonal (independent) of each other. For example, we could use the disk as an extension of the address space accessible to the programmer without providing memory protection. Thus, a programmer might be able to access all disk pages, if the operating system allowed it when setting up page tables.
It's also possible to have memory protection without any disk. Provided each process only needs a small number of pages, we could allow all of those virtual pages to reside in RAM. For example, suppose each process has only 2 virtual pages. Then, since (in our example) RAM holds 256 pages, we could allow 128 processes to each have virtual pages 0 and 1 resident, and they would not interfere with each other.