https://cdn.discordapp.com/attachments/985018469907038209/1221637785962676345/image.png?ex=66134dd9&is=6600d8d9&hm=03acfe6bffd63753468a710743e35ca2f154ddd738640f2e9d2fb8e7fe8b4c5f&
https://cdn.discordapp.com/attachments/985018469907038209/1221637534124347474/image.png?ex=66134d9d&is=6600d89d&hm=bb31ec02e9892db438d2ea7b2218b1e09914f0cf970d7b6807d12abbc2ec50cc&
Minimal D3D11 sprite renderer NEO:
Ultra-compact sprite rendering code with example frame animation logic
This release contains tech bits from the upcoming SuperNeo™ 2D game engine and includes rotation, anchor/pivot point, color filtering, alpha blending and built-in antialiased point sampling. As usual: complete, runnable single-function app. ~150 LOC. No modern C++, OOP or (other) obscuring cruft.
Next steps are to make state server-authoritative, implement client-side prediction, rollback, and interpolation. This has been a great jam. I learned a ton and have prototype code to build off of (cough cough refactor cough). Thank you @bvisness and all of the admin team for setting everything up!
These two clients are running on the same Windows machine, but they're exchanging packets with a server in VA. The server forwards complementary packets to each client (client A gets client B's data, client B gets client A's data).
I can also throw in a third Linux client (A gets B and C data, B gets A and C data, C gets A and B data).
Next step is to hook this up to the renderer I've been working on and let some players run around.
recvfrom and sendto. Linux sends and copies buffers of void *, but Windows sends and copies buffers of char *. Linux uses size_t for the sizes of incoming and outgoing buffers; Windows uses int. Separate UDP socket implementations for the two OSes might be worthwhile. That way you could include one or the other based on preprocessor directives rather than having preprocessor conditionals sprinkled over one implementation.
Well, that's it, 15,000 characters' worth of me trying to summarize how memory management works in Windows! There was a ton that I didn't cover. Copy-on-write, hardware permissions on physical pages, translation buffers and caches, and much more. My hope with the below is that it has piqued your interest and made you want to learn more about this fascinating OS. I know there are some very knowledgeable folks around here, so if there's something I've got wrong or misunderstood, please don't hesitate to let me know.
Thank you so much to Ben and Asaf for putting the jam together and helping with some technical difficulties I had when making these posts. And thank you to the entire HMN community, for being such welcoming and encouraging folks. If anyone finds this useful, please let me know here or on Discord. Cheers!
Just as with virtual memory, each process also has a limited amount of physical memory, referred to as the process’ Working Set. There are also System and Session Working Sets, which contain subsets of pageable code and data scoped to those spaces. While it is possible for a process to specify a minimum and maximum working set size, these limits are almost always ignored in practice. The default minimum and maximum per process are 50 and 345 pages, respectively. You can self-impose a hard cap on the maximum working set if you’d like to do that for some reason. Without these self-imposed restrictions, working sets will grow or shrink beyond these limits as demand requires and resources allow.
When memory is getting low, working sets are trimmed by the working set manager, which takes into account a number of factors, including minimum working set sizes and the last time pages were accessed. Page faults under these conditions that require another page for the working set are handled by replacing pages in the working set and paging out old ones, rather than by allocating new pages. It is possible to manually trigger the trimming of a process’ working set at any time by setting the maximum working set size while the program is running. When memory is plentiful, the working set manager instead spends its time calculating how many pages could be removed from working sets should the need ever arise.
On a system-wide scale, the total memory available for allocation is referred to as the “system commit limit.” This comprises the sum of all physical memory available to the system plus the size of any page files. It is possible to run Windows with no page file at all (though this is almost universally discouraged), in which case the commit limit is simply the amount of physical memory available to Windows. Note that it is likely that this number does not coincide exactly with the amount of physical memory installed, as certain hardware reserves memory for itself independent of the operating system.
Any memory allocated against the system commit limit is counted toward the “system commit charge” and represents everything that must be kept either in RAM or in the page file (so, basically, anything that isn’t a file on disk other than the page file). When the system commit limit is reached, the system will attempt to increase the size of the page file. If this does not succeed (or, seemingly, if it can’t be carried out fast enough), memory allocations will fail. Each process also has a process page file quota, which tracks its contributions to the system commit charge. It’s worth noting that the commit charge and process page file quotas reflect maximum theoretical, rather than actual, usage: Windows will not allocate any memory that it could not actually provide if required, even though many of those allocations have not taken place and may never do so.
As you can imagine, pages added to a process’ working set are not chosen at random (well, they kind of are, but ASLR is another topic). Windows keeps track of every physical page of memory in the Page Frame Number database. These pages can be in one of nine states: Free, Zeroed, Modified, Modified No-Write, Standby, Transition, Active, Rom, or Bad. Active, or Valid, pages are either part of a working set or in use by some other means and typically have a valid PTE pointing to them. Transition pages are currently undergoing I/O, not part of a working set, and not on a page list. Modified no-write is a special case of the modified state where the page won’t be written to disk. This state is only used by file system drivers in specific scenarios. The other six states’ pages are each tracked in their own linked list for quick access by the memory manager.
Every page in the system starts out on the free page list, and returns there when it is no longer in use. These free pages are zeroed by the zero page thread and placed on the zero page list. Memory is (typically) pulled from the zero page list into a working set. When memory is trimmed from a working set, it goes onto either the modified or the standby list. Modified pages have been changed in some way since they were last written to disk, and therefore must have their contents saved by the modified page writer before becoming standby pages. Standby pages can be reused immediately by whatever was using them previously, because their content has not changed since the last time it was written to disk. Rom pages are read-only, and bad pages have failed a consistency check and should not be used. Most new allocations happen from the zero page list. Kernel mode processes are permitted to pull directly from the free page list in some cases, so long as the memory’s content has been overwritten before it makes it to user mode.
If you try it I'd be very interested in your feedback, which you could share right here, or in a DM, or on the Prizm Discord.
Note: We’re going to have to break this into three parts, not two, because I wrote so much on the address translation piece I’ve hit the character limit for HMN posts. And I didn't even get to talk about TLBs or anything!
While virtual memory address spaces can be as large as the implementation supports, 128 terabytes on modern Windows [https://learn.microsoft.com/en-us/windows/win32/memory/memory-limits-for-windows-releases#memory-and-address-space-limits], physical memory is limited by how much we’ve put in the computer. Windows Server 2022 supports up to 48 terabytes of RAM [https://www.microsoft.com/en-us/windows-server] (oh the things I would do for access to a Windows server like that!), but most of us will likely be working with much less. So, how is 128 terabytes of virtual address space per process (and that’s just the user mode portion, there’s another 128 terabytes for kernel mode) squeezed into a comparatively minuscule amount of physical memory?
Let’s start by getting a feel for what the journey from virtual to physical memory looks like. This process is called translation and it is so performance-critical that special hardware has been invented to assist in carrying it out [https://en.wikipedia.org/wiki/Memory_management_unit]. A page of virtual memory’s location in physical memory is noted within a page table, in a page table entry or PTE. These page tables and PTEs must be in a specific format to work with the available hardware. Page tables themselves can take up a fair chunk of physical memory, and this cost can add up quickly. Since a page table must always be resident in physical memory for the translation to work (it would be hard to find where a page is in physical memory if the thing that tells you where the page is isn’t there!), we would end up taking up a large amount of physical memory just for the page tables themselves. To get around this, many systems (Windows included) create page tables of page tables. That way, only the top-level page table need stay in memory.
The top-level page table in Windows is the Page Directory Pointer Table (PDPT). The PDPT is always guaranteed to be resident in physical memory, and its physical address is stored within the process’ data structures and loaded into a processor register whenever one of the process’ threads is executing. From the PDPT, one can find a PDPE (E standing for Entry), which will lead to the physical address of a Page Directory. From the Page Directory, one can find a PDE pointing to a Page Table. From the Page Table, one can find a PTE and finally the physical page of memory where the information resides. Once we have found the physical page, we can, at long last, get the content we were after using the low-order bits of the virtual address, which give the offset of the target byte within that page.
But how do we account for this massive four levels of indirection efficiently? How do we go from tables spanning 128 TB of virtual address space to a single address in physical memory? The answer is absolutely brilliant: We need nothing more than the virtual memory address itself! By assigning specific portions of the address to specific indices in each stage, we can go down this entire chain with just one number. I can’t help but mention that this was (in my opinion) the single coolest thing I learned in the entire jam. I’ve attached an image from Windows Internals showing this breakdown for x86 PAE virtual addresses (hopefully this won’t cause any DMCA notices to be sent to HMN). There is also a fantastic overview of virtual memory, which includes discussing this translation in particular (in a more general sense) linked in a previous post in this project.
So, we can now translate a virtual address to a physical address, assuming the virtual address indeed resides in physical memory. But we already know it may very well not. It might be a file on disk, or it might be a page that has been sent out to the page file. When this happens, it is called a page fault. When a page fault occurs, the memory manager is summoned to deal with the problem. These faults range from ones as easy to resolve as giving the process another page of memory or expanding a thread stack, to ones so catastrophic they crash the system. I’ve attached a table of the possible faults and their consequences to this post. Since these faults can occur for many reasons, the “page fault” counts in tools like Process Explorer can get quite high (Discord was sitting over one million after around 40 minutes of uptime on a system with ample physical memory). This is not necessarily something to be alarmed about, and does not indicate resource starvation. Most of these faults are harmless and occurring by design.
Now that we know how and when this translation is carried out, let’s look at how Windows manages physical memory.
<( ) prints a path instead of going right to stdin (this is handy when you want to forward more than one command to another):

❯ echo <(cat .gitconfig)
/dev/fd/11
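Because the substitution expands to a path, any command that expects filenames can consume another command's output directly. A small sketch (bash/zsh only, since <( ) is process substitution, not POSIX sh):

```shell
# diff wants two filenames, not stdin, so process substitution lets us
# compare two command outputs without temp files.
diff <(printf 'b\na\n' | sort) <(printf 'a\nb\n')
echo "exit=$?"   # identical inputs, so diff exits 0
```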