Handmade Network»AnnLikesWindows


Memory Management in Windows - Part 4: Conclusion

Well, that's it, 15,000 characters' worth of me trying to summarize how memory management works in Windows! There was a ton that I didn't cover: copy-on-write, hardware permissions on physical pages, translation lookaside buffers and caches, and much more. My hope is that this series has piqued your interest and made you want to learn more about this fascinating OS. I know there are some very knowledgeable folks around here, so if there's something I've got wrong or misunderstood, please don't hesitate to let me know.

Thank you so much to Ben and Asaf for putting the jam together and helping with some technical difficulties I had when making these posts. And thank you to the entire HMN community, for being such welcoming and encouraging folks. If anyone finds this useful, please let me know here or on Discord. Cheers!

Memory Management in Windows – Part 3: Physical Memory

Just like virtual memory, processes also have a limited amount of physical memory, referred to as the process’ Working Set. There are also System and Session Working Sets, which contain subsets of pageable code and data scoped to those spaces. While it is possible for a process to specify a minimum and maximum working set size, these limits are almost always ignored in practice. The default minimum and maximum per process is 50 and 345 pages, respectively. You can self-impose a hard cap on maximum working set if you’d like to do that for some reason. Without these self-imposed restrictions, working sets will grow or shrink beyond these limits as demand requires and resources allow.

When memory is getting low, working sets are trimmed by the working set manager, taking into account a number of factors including minimum working set sizes and the last time pages were accessed. Under these conditions, page faults that would otherwise bring another page into the working set are handled by replacing existing pages in the working set and paging out old ones, rather than allocating new pages. It is possible to manually trigger the trimming of a process’ working set at any time by setting the maximum working set size while the program is running. When memory is plentiful, the working set manager instead spends its time calculating how many pages could be removed from working sets should the need ever arise.

On a system-wide scale, the total memory available for allocation is referred to as the “system commit limit.” This comprises the sum of all physical memory available to the system plus the size of any page files. It is possible to run Windows with no page file at all (though this is almost universally discouraged), in which case the commit limit is simply the amount of physical memory available to Windows. Note that it is likely that this number does not coincide exactly with the amount of physical memory installed, as certain hardware reserves memory for itself independent of the operating system.

Any memory allocated against the system commit limit is considered the “system commit charge” and represents everything that must be kept either in RAM, or in the page file (so basically, anything that isn’t a file on disk other than the page file). When the system commit limit is reached, the system will attempt to increase the size of the page file. If this does not succeed (or, seemingly, if it can’t be carried out fast enough), memory allocations will fail. Each process also has a process page file quota, which tracks its contributions to the system commit charge. It’s worth noting that the commit charge and process page file quotas reflect the maximum theoretical, rather than actual, usage. Windows will not allocate any memory that it could not actually provide if necessary, even if many of those allocations have not and may never take place.

As you can imagine, pages added to a process’ working set are not chosen at random (well, they kind of are, but ASLR is another topic). Windows keeps track of every physical page of memory in the Page Frame Number database. These pages can be in one of nine states: Free, Zeroed, Modified, Modified No-Write, Standby, Transition, Active, ROM, or Bad. Active, or Valid, pages are either part of a working set or in use by some other means and typically have a valid PTE pointing to them. Transition pages are currently undergoing I/O, not part of a working set, and not on a page list. Modified no-write is a special case of the modified state where the page won’t be written to disk. This state is only used by file system drivers in specific scenarios. The other six states’ pages are each tracked in their own linked list for quick access by the memory manager.

Every page in the system starts out on the free page list, and returns there when it is no longer in use. These free pages are zeroed by the zero page thread and placed on the zero page list. Memory is (typically) pulled from the zero page list into a working set. When memory is trimmed from a working set, it goes onto either the modified or standby list. Modified pages have been changed in some way since they were last written to disk, and therefore must have their contents saved by the modified page writer before becoming standby pages. Standby pages can be reused immediately by whatever was using them previously, because their content has not changed since the last time it was written to disk. ROM pages are read-only, and bad pages have failed a consistency check and should not be used. Most new allocations happen from the zero page list. Kernel mode processes are permitted to pull directly from the free page list in some cases, so long as the memory’s content has been overwritten before it makes it to user mode.

Memory Management in Windows - Part 2: Address Translation

Note: We’re going to have to break this into three parts, not two, because I wrote so much on the address translation piece I’ve hit the character limit for HMN posts. And I didn't even get to talk about TLBs or anything!

While virtual memory address spaces can be as large as the implementation supports, 128 terabytes on modern Windows [https://learn.microsoft.com/en-us/windows/win32/memory/memory-limits-for-windows-releases#memory-and-address-space-limits], physical memory is limited by how much we’ve put in the computer. Windows Server 2022 supports up to 48 terabytes of RAM [https://www.microsoft.com/en-us/windows-server] (oh the things I would do for access to a Windows server like that!), but most of us will likely be working with much less. So, how is 128 terabytes of virtual address space per process (and that’s just the user mode portion, there’s another 128 terabytes for kernel mode) squeezed into a comparatively minuscule amount of physical memory?

Let’s start by getting a feel for what the journey from virtual to physical memory looks like. This process is called translation and it is so performance-critical that special hardware has been invented to assist in carrying it out [https://en.wikipedia.org/wiki/Memory_management_unit]. A page of virtual memory’s location in physical memory is noted within a page table, in a page table entry or PTE. These page tables and PTEs must be in a specific format to work with the available hardware. Page tables themselves can take up a fair chunk of physical memory, and this cost can add up quickly. Since a page table must always be resident in physical memory for the translation to work (it would be hard to find where a page is in physical memory if the thing that tells you where the page is isn’t there!), we would end up taking up a large amount of physical memory just for the page tables themselves. To get around this, many systems (Windows included) create page tables of page tables. That way, only the top-level page table need stay in memory.

The top-level page table in Windows is the Page Directory Pointer Table (PDPT). The PDPT is always guaranteed to be resident in physical memory, and its physical address is stored within the process’ data structures and loaded into a processor register whenever one of the process’ threads is executing on that processor. From the PDPT, one can find a PDPE (E standing for Entry), which will lead to the physical address of a Page Directory. From the Page Directory, one can find a PDE pointing to a Page Table. From the Page Table, one can find a PTE and finally the physical page of memory where the information resides. Once we have found the physical page, we can, at long last, get the content at the physical address using the offset of that address from the start of the process’ virtual address space.

But how do we account for this massive four levels of indirection efficiently? How do we go from tables spanning 128 TB of virtual address space to a single address in physical memory? The answer is absolutely brilliant: We need nothing more than the virtual memory address itself! By assigning specific portions of the address to specific indices in each stage, we can go down this entire chain with just one number. I can’t help but mention that this was (in my opinion) the single coolest thing I learned in the entire jam. I’ve attached an image from Windows Internals showing this breakdown for x86 PAE virtual addresses (hopefully this won’t cause any DMCA notices to be sent to HMN). There is also a fantastic overview of virtual memory, which discusses this translation in particular (in a more general sense), linked in a previous post in this project.

So, we can now translate a virtual address to a physical address, assuming the virtual address indeed resides in physical memory. But we already know it may very well not. It might be a file on disk, or it might be a page that has been sent out to the page file. When this happens, it is called a page fault. When a page fault occurs, the memory manager is summoned to deal with the problem. These faults range from ones as easy to resolve as giving the process another page of memory or expanding a thread stack, to ones so catastrophic they will crash the system. I’ve attached a table of the possible faults and their consequences to this post. Since these faults can occur for many reasons, the “page fault” counts in tools like Process Explorer can get quite high (Discord was sitting over one million after around 40 minutes of uptime on a system with ample physical memory). This is not necessarily something to be alarmed about, and does not indicate resource starvation. Most of these faults are harmless and occur by design.

Now that we know how and when this translation is carried out, let’s look at how Windows manages physical memory.

Memory Management in Windows - Part 1: Virtual Memory

Windows implements virtual memory via pages. A page is (typically) a 4 kilobyte [https://devblogs.microsoft.com/oldnewthing/20210510-00/?p=105200] chunk of memory, though on some architectures it is larger [https://devblogs.microsoft.com/oldnewthing/20040908-00/?p=37923]. Pages in a virtual address space can be in one of four states at any given time: free, reserved, committed, or shareable.

Free, as the name suggests, means that the page is not currently in use and is available for reservation and committal. Reserved pages have been earmarked for use, but are not yet backed by physical memory or the page file (together referred to as the “commit charge”, more on that when we discuss physical memory). Committed, often called private, pages are private to their process and allocated in physical memory or the page file. Finally, shareable pages are also allocated to physical memory or the page file, but these may be shared between processes. It is important to note that they are not shared by default, however. This is often noted by the distinction between shareable and shared.

Windows works on a demand-paged basis, which effectively means it isn’t going to give anything (even itself) any committed or shareable pages until it absolutely must do so. There are a few ways it finds out when this must be done, such as special guard pages present in a thread’s stack. Initially, a thread stack is just one page, with the next page being reserved and marked as a guard page. The number of pages allocated to the stack grows as this guard page is accessed. Likewise, when reserved memory is first accessed, a demand-zero fault is triggered, signaling that a reserved page is now in use and must be committed. When a page is committed, it is always in a zeroed state for anything in user mode. Occasionally, a non-zeroed page is given to something in kernel mode so as not to reduce the number of zeroed physical pages (more on this later).

While free, reserved, and committed pages are somewhat self-explanatory once one is familiar with the terminology, shareable pages are more complex. While it’s fairly easy to envision a programmer designating some portion of their virtual address space as a shared scratch space for multiple processes, much of shared memory is devoted to section objects. Section objects, or file mapping objects as they are called in Win32, are shareable portions of memory that reside in physical memory, the page file, or a file on disk. It’s important to note that a section object is not necessarily shared, as it may only be mapped into one process’ address space. Section objects backed by files on disk are called mapped files, and those backed by memory are called page-file-backed sections (even if there is no page file enabled). Pieces of files can be mapped selectively using a view, specified via a range.

This mapped file mechanism is obviously powerful, and quite heavily used. It allows many applications to take advantage of shared resources (DLL files for example), in addition to accessing their own image. However, it has an interesting side effect when it comes to getting a sense for the “true” memory usage of a process. Measuring private memory is often sufficient, but it may not tell the full story. If a process is the only process using a DLL, is that not part of the “memory usage” of that process? That shareable memory will go unaccounted for in the general “memory usage” section of most tools (task manager, for example). No doubt the easily obtainable measurement of memory private to a process is sufficient in most cases.

“Large” pages are also available to processes that request them. A large page is 2 megabytes on x64. Large pages have a few restrictions that normal pages do not: They must be reserved and committed in one operation, are always present in physical memory (there’s no file system support for paging large pages to disk), and are not considered part of the working set. They are, however, accounted for as part of the process’ private bytes. There are also huge pages, 1 gigabyte in size, available with newer versions of Windows 10. Per Windows Internals, huge pages are given automatically when more than 1 gigabyte of large pages is requested. It also appears to be possible to request huge pages explicitly via VirtualAlloc2.

Sources:
Mark Russinovich’s talk on virtual memory: https://www.youtube.com/watch?v=TrFEgHr72Yg
Windows Internals 7th Edition, Part 1

!til &windowsmm Page tables are a big deal. Such a big deal, in fact, that there are special caches and hardware devoted entirely to working with them. Also, you can have page table entries that point to other page tables, so you can page out page tables. This video series has been tremendously helpful and I'd highly recommend it to anyone trying to grok virtual memory. https://www.youtube.com/watch?v=qcBIvnQt0Bw&list=PLiwt1iVUib9s2Uo5BeYmwkDFUh70fJPxX