Why is Letting the OS Clean Up Resources Faster?

https://devblogs.microsoft.com/oldnewthing/20120105-00/?p=8683

In the above article by Raymond Chen, he mentions how, when a process exits, it is better to let the OS just clean up the resources, as opposed to manually calling free on everything. He claims that an application he regularly uses takes minutes at nearly 100% CPU usage just to delete every last byte of memory manually. I know that Windows will do the work of updating its internal structures to mark the pages/handles used by the process as free on process exit, but I am unsure why this would be any faster than manually freeing the memory.

If Windows needs to free the memory instead of us, isn't it still going to take a similar amount of time to do the same operations that we would do manually? For example, wouldn't it still need to go through our Windows handle table and delete all the outstanding kernel objects, like files and windows? Similarly, if we allocate with VirtualAlloc, won't it still need to go through all the pages in our process's page table and mark them as not-in-use? I may be misunderstanding, but Raymond Chen's article makes it sound like the savings from just calling exit can be huge (for processes that allocate a lot), and I don't see why that is.

My thought is that this may matter if you have complicated data structures that you need to traverse to free memory (a simple example would be a 2D array with separately allocated rows), or if you make heavy use of the heap as opposed to big block allocations like arenas. I suppose things like closing handles also matter, since that requires multiple calls into the kernel to free them one by one, versus letting the kernel clean up every handle in ExitProcess.
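To make the contrast concrete, here is a minimal sketch (the Node type and the per-node handle are made up for illustration; this assumes Windows and the C runtime):

```c
#include <windows.h>
#include <stdlib.h>

// Hypothetical linked structure; every node is a separate malloc
// and owns an open kernel object.
typedef struct Node {
    struct Node *next;
    HANDLE file;
} Node;

// Manual cleanup: one user-to-kernel transition per handle,
// one heap operation per allocation.
static void shutdown_manually(Node *head)
{
    while (head) {
        Node *next = head->next;
        CloseHandle(head->file); // a syscall per handle
        free(head);              // touches allocator metadata per node
        head = next;
    }
}

// OS cleanup: the kernel tears down the whole handle table and
// address space in bulk, without ever walking our data structures.
static void shutdown_via_os(void)
{
    ExitProcess(0);
}
```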

Let's take a worst-case scenario:

A big, RAM-hungry process that has been up for most of the day and internally uses malloc with a lot of small allocations in a big spiderweb of an object graph. And because you don't have quite enough RAM, most of the virtual memory is paged out.

Now for a process to clean that up in the "traditional" way, it needs to visit most of those allocations to find more objects to free. This means that pages that will end up freed anyway get paged in and pulled into cache for no reason other than to free them. Don't forget that some implementations of free() will also write to the freed location, to link it into a data structure so the allocator can figure out what to release back to the backing memory.
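A sketch of that traversal, assuming for simplicity a tree-shaped graph and a made-up Object type; every visit faults the node's page back in just so free() can discard it:

```c
#include <stdlib.h>

// Hypothetical spiderweb node: each object points at children
// that also need freeing.
typedef struct Object {
    struct Object *children[4];
    int child_count;
    char payload[200]; // small allocation, likely cold and paged out
} Object;

// Reading obj->children faults the node's page back in; free() may
// then WRITE free-list metadata into the block, dirtying the page,
// all just to throw the memory away.
static void free_graph(Object *obj)
{
    if (!obj) return;
    for (int i = 0; i < obj->child_count; i++)
        free_graph(obj->children[i]);
    free(obj);
}
```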

If, on the other hand, you let the OS clean that up on process exit, it only needs to mark those pages in the page file as available, which it can do without ever touching the pages themselves.

There is a middle ground: if you structure your program in a way that lets you bulk free, then you don't need to read the memory at all and can just VirtualFree the big blocks of memory.
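For example, a minimal bump-style arena over VirtualAlloc (a sketch, not production code); releasing it is a single VirtualFree, no matter how many allocations were carved out of it:

```c
#include <windows.h>

// A minimal linear arena: one big allocation, bump-pointer carving,
// and a single VirtualFree to release everything at once.
typedef struct Arena {
    char  *base;
    size_t used;
    size_t capacity;
} Arena;

static Arena arena_create(size_t capacity)
{
    Arena a = {0};
    a.base = VirtualAlloc(NULL, capacity,
                          MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    a.capacity = a.base ? capacity : 0;
    return a;
}

static void *arena_push(Arena *a, size_t size)
{
    if (a->used + size > a->capacity) return NULL;
    void *p = a->base + a->used;
    a->used += size;
    return p;
}

// Bulk free: no traversal, no touching the allocations themselves.
static void arena_release(Arena *a)
{
    VirtualFree(a->base, 0, MEM_RELEASE); // size must be 0 with MEM_RELEASE
    a->base = NULL;
    a->used = a->capacity = 0;
}
```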
An operating system can just remap memory pages to extend a contiguous space of virtual memory addresses for each program's heap of non-shared allocations. On exit it can then recursively deallocate the owned page tables in the page directory, removing the heap one large block at a time without iterating over each tiny allocation.

Using an arena allocator can make this fast in release mode by just linking to a reusable memory allocator, but the point of traversing every node in debug builds is to verify that you don't have a data structure that leaks memory and eventually crashes. For the user experience, you can appear very responsive by closing the window before the application has begun cleaning up the other resources.
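One way to picture that split (a hypothetical sketch; the counter and free_all_data_structures are made up) is to do the full traversal only in debug builds and assert that nothing leaked:

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical instrumentation: count live allocations so a debug
// build can verify everything was freed, while a release build
// skips the traversal and lets the OS reclaim memory in bulk.
static long g_live_allocations;

static void *tracked_malloc(size_t size)
{
    g_live_allocations++;
    return malloc(size);
}

static void tracked_free(void *p)
{
    if (p) g_live_allocations--;
    free(p);
}

static void app_shutdown(void)
{
#ifndef NDEBUG
    // Debug: walk and free everything, then check for leaks.
    // free_all_data_structures(); // hypothetical full traversal
    assert(g_live_allocations == 0);
#endif
    // Release: just exit and let the OS tear the address space down.
}
```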
Getting told to shut down a process is kind of like getting told to chop down a tree. Making a single cut at the stump of the tree is a lot faster than cutting off every single leaf, followed by all the twigs and branches.

So what Raymond Chen is getting at is that the reason the OS tells you the process is shutting down is so you can tie some ropes and yell timber.
