Memory Management in Windows - Part 2: Address Translation
Note: We’re going to have to break this into three parts, not two, because I wrote so much on the address translation piece I’ve hit the character limit for HMN posts. And I didn't even get to talk about TLBs or anything!
While virtual memory address spaces can be as large as the implementation supports, 128 terabytes on modern Windows [https://learn.microsoft.com/en-us/windows/win32/memory/memory-limits-for-windows-releases#memory-and-address-space-limits], physical memory is limited by how much we’ve put in the computer. Windows Server 2022 supports up to 48 terabytes of RAM [https://www.microsoft.com/en-us/windows-server] (oh the things I would do for access to a Windows server like that!), but most of us will likely be working with much less. So, how is 128 terabytes of virtual address space per process (and that’s just the user mode portion, there’s another 128 terabytes for kernel mode) squeezed into a comparatively minuscule amount of physical memory?
Let’s start by getting a feel for what the journey from virtual to physical memory looks like. This process is called translation and it is so performance-critical that special hardware has been invented to assist in carrying it out [https://en.wikipedia.org/wiki/Memory_management_unit]. A page of virtual memory’s location in physical memory is noted within a page table, in a page table entry or PTE. These page tables and PTEs must be in a specific format to work with the available hardware. Page tables themselves can take up a fair chunk of physical memory, and this cost can add up quickly. Since a page table must always be resident in physical memory for the translation to work (it would be hard to find where a page is in physical memory if the thing that tells you where the page is isn’t there!), we would end up taking up a large amount of physical memory just for the page tables themselves. To get around this, many systems (Windows included) create page tables of page tables. That way, only the top-level page table need stay in memory.
The top-level page table in Windows is the Page Directory Pointer Table (PDPT). The PDPT is always guaranteed to be resident in physical memory, and its physical address is stored within the process’ data structures and loaded into a processor register whenever one of the process’ threads is executing there. From the PDPT, one can find a PDPE (E standing for Entry), which will lead to the physical address of a Page Directory. From the Page Directory, one can find a PDE pointing to a Page Table. From the Page Table, one can find a PTE and finally the physical page of memory where the information resides. Once we have found the physical page, we can, at long last, get the content at the physical address using the offset of that address from the start of the process’ virtual address space.
But how do we account for this massive four levels of indirection efficiently? How do we go from table spanning 128 TB of virtual address space to a single address in physical memory? The answer is absolutely brilliant: We need nothing more than the virtual memory address itself! By assigning specific portions of the address to specific indices in each stage, we can go down this entire chain with just one number. I can’t help but mention that this was (in my opinion) the single coolest thing I learned in the entire jam. I’ve attached an image from Windows Internals showing this breakdown for x86 PAE virtual addresses (hopefully this won’t cause any DMCA notices to be sent to HMN). There is also a fantastic overview of virtual memory, which includes discussing this translation in particular (in a more general sense) linked in a previous post in this project.
So, we can now translate a virtual address to a physical address, assuming the virtual address indeed resides in physical memory. But we already know it may very well not. It might be a file on disk, or it might be a page that has been sent out to the page file. When this happens, it is called a page fault. When a page fault occurs, the memory manager is summoned to deal with the problem. These faults can be as easy as giving the process another page of memory or expanding a thread stack, to so catastrophic it will crash the system. I’ve attached a table of the possible faults and their consequences to this post. Since these faults can occur for many reasons, the “page fault” counts in tools like Process Explorer can get quite high (Discord was sitting over one million after around 40 minutes of uptime on a system with ample physical memory). This is not necessarily something to be alarmed about, and does not indicate resource starvation. Most of these faults are harmless and occurring by design.
Now that we know how and when this translation is carried out, let’s look at how Windows manages physical memory.