I have a theory. I have not validated this theory. I might have some details wrong.
When I write 'image', I mean a PE image (EXE or DLL).
In PE, the loader patches images in two ways: rebasing (applying base relocations) and resolving imports.
Rebasing happens if any of the following is true:
- ASLR is enabled for the image
- the image's preferred base address range is already in use by something else (e.g. another DLL, a VirtualAlloc allocation, or whatever)
Rebasing is relatively expensive because the relocation fixups are scattered throughout the image, so applying them touches (and dirties) pages all over it.
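To make "touch pages all over the image" concrete, here is a toy sketch in C. It is not the real Windows loader. It adds the base delta to every absolute address listed in the relocations and counts the distinct 4 KiB pages it writes. Assumptions: a 64-bit image, and relocations given as a flat, sorted list of RVAs of 8-byte absolute addresses. Real PE files group relocations per page in IMAGE_BASE_RELOCATION blocks with a type per entry. The name `rebase_image` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Toy rebase: add (actual_base - preferred_base) to each 64-bit absolute
 * address at the RVAs in reloc_rvas, and return how many distinct pages were
 * written (counting the page that holds each fixup's first byte).
 * Simplification: reloc_rvas is a flat list sorted by RVA, not real
 * IMAGE_BASE_RELOCATION blocks. */
size_t rebase_image(uint8_t *image, size_t image_size,
                    const uint32_t *reloc_rvas, size_t reloc_count,
                    uint64_t preferred_base, uint64_t actual_base)
{
    uint64_t delta = actual_base - preferred_base; /* unsigned wrap handles a lower base */
    size_t pages_touched = 0;
    size_t last_page = SIZE_MAX;

    for (size_t i = 0; i < reloc_count; i++) {
        uint32_t rva = reloc_rvas[i];
        assert((size_t)rva + sizeof(uint64_t) <= image_size);

        uint64_t value;
        memcpy(&value, image + rva, sizeof value); /* alignment-safe read */
        value += delta;
        memcpy(image + rva, &value, sizeof value); /* alignment-safe write */

        size_t page = rva / PAGE_SIZE;
        if (page != last_page) { /* relies on reloc_rvas being sorted */
            pages_touched++;
            last_page = page;
        }
    }
    return pages_touched;
}
```

With fixups on pages 0 and 2 of a three-page image, two pages get dirtied. A real image has fixups in code, data and vtables, so the count grows with the image.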
Rebasing can be cached. I assume it is cached by Windows. I assume this caching is the sharing mmozeiko mentioned. In theory, the cache doesn't need to be evicted when a process exits; the cache can be reused for a new process, even if no existing process has the image loaded.
Separate from rebasing is the import table. Patching imports is relatively cheap because it only writes the import address table (IAT), a small contiguous array, instead of touching pages all over the image.
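A matching toy sketch of import patching: resolve each imported function and write its address into its slot in the import address table (IAT). `import_ref`, `export_entry`, `resolve` and `patch_iat` are hypothetical names. The linear name lookup stands in for the real loader's export lookup. The point is only that every write lands in one contiguous array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: one imported function, named by DLL and function name. */
typedef struct {
    const char *dll;
    const char *func;
} import_ref;

/* Hypothetical: one export of an already-loaded DLL and its address. */
typedef struct {
    const char *dll;
    const char *func;
    uint64_t address;
} export_entry;

/* Linear lookup standing in for the loader's export resolution. */
static uint64_t resolve(const export_entry *exports, size_t n, const import_ref *imp)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(exports[i].dll, imp->dll) == 0 &&
            strcmp(exports[i].func, imp->func) == 0)
            return exports[i].address;
    return 0; /* unresolved; a real loader would fail the load instead */
}

/* Fill one 8-byte IAT slot per import. All writes go to the contiguous
 * iat array, so only the page(s) holding the IAT get dirtied.
 * Returns how many imports were resolved. */
size_t patch_iat(uint64_t *iat, const import_ref *imports, size_t import_count,
                 const export_entry *exports, size_t export_count)
{
    assert(iat != NULL || import_count == 0);
    size_t resolved = 0;
    for (size_t i = 0; i < import_count; i++) {
        iat[i] = resolve(exports, export_count, &imports[i]);
        if (iat[i] != 0)
            resolved++;
    }
    return resolved;
}
```

Call sites in the image are never written. They call through their IAT slot, so the code pages stay untouched.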
Imports can be cached. I have no clue if Windows caches imports or not. I can imagine common cases where caching the IAT might not be worth it. (The cache key for rebasing is the image plus its base address. The cache key for imports is the image plus the identity and base address of every DLL it imports from.)
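The two cache keys can be written down as a small sketch in C. This is only a model of the speculation above, not anything Windows is known to do. `module_instance`, `rebase_key`, `import_key`, `MAX_DEPS` and the `*_cache_hit` functions are hypothetical names, and `image_id` stands in for whatever identifies an image (e.g. a file hash).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPS 16 /* arbitrary cap for the sketch */

/* Hypothetical identity of one loaded module. */
typedef struct {
    uint64_t image_id; /* which image, e.g. a hash of the file */
    uint64_t base;     /* address it was actually loaded at */
} module_instance;

/* Rebasing: the rebased pages depend only on the image and its own base. */
typedef module_instance rebase_key;

/* Imports: the IAT holds addresses inside the imported DLLs, so it depends on
 * which DLLs were bound and where each was loaded, not on the importing
 * image's own base. Deps are assumed to be listed in the same order. */
typedef struct {
    uint64_t image_id;
    size_t dep_count;
    module_instance deps[MAX_DEPS];
} import_key;

static bool same_module(module_instance a, module_instance b)
{
    return a.image_id == b.image_id && a.base == b.base;
}

bool rebase_cache_hit(rebase_key cached, rebase_key wanted)
{
    return same_module(cached, wanted);
}

bool import_cache_hit(const import_key *cached, const import_key *wanted)
{
    assert(cached->dep_count <= MAX_DEPS && wanted->dep_count <= MAX_DEPS);
    if (cached->image_id != wanted->image_id || cached->dep_count != wanted->dep_count)
        return false;
    for (size_t i = 0; i < cached->dep_count; i++)
        if (!same_module(cached->deps[i], wanted->deps[i]))
            return false; /* one DLL changed or moved: the cached IAT is stale */
    return true;
}
```

The import key is wider, so it misses more often. If any one imported DLL lands at a different address, the whole cached IAT is useless. That is one way caching the IAT might not be worth it.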
For the EXE (the process's entry point), consider two approaches for EXE-to-DLL calls: the import table (what PE implements) and direct calls (your proposal). If we ignore ASLR and assume that the EXE rarely relocates, my mental model of the two approaches says the following:
- The import table approach is faster to load and uses less physical memory but is slower to execute, and
- The direct calls approach is faster to execute but slower to load and uses more physical memory.
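The load-time half of that comparison can be modeled as a rough sketch in C. It counts how many addresses the loader must write and how many distinct pages those writes dirty. The names `load_cost`, `cost_import_table`, `cost_direct_calls` and the 4 MiB `MAX_PAGES` limit are hypothetical. The execution-time half is not modeled: with an import table, each call loads its target from the IAT slot (on x64, `call qword ptr [rip+disp32]`) instead of jumping straight to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define MAX_PAGES 1024u /* toy limit: images up to 4 MiB */

typedef struct {
    size_t fixups;      /* addresses the loader must write */
    size_t dirty_pages; /* distinct pages those writes land on */
} load_cost;

/* Import table: one 8-byte IAT slot per imported function, stored contiguously
 * from iat_rva. Call sites go through their slot and are never patched. */
load_cost cost_import_table(uint32_t iat_rva, size_t import_count)
{
    load_cost c = {import_count, 0};
    if (import_count > 0) {
        size_t first = iat_rva / PAGE_SIZE;
        size_t last = (iat_rva + 8 * import_count - 1) / PAGE_SIZE;
        c.dirty_pages = last - first + 1;
    }
    return c;
}

/* Direct calls: every call site gets its own patched target, so the writes
 * land wherever the calls sit in the code (counted by the call's page). */
load_cost cost_direct_calls(const uint32_t *call_site_rvas, size_t call_count)
{
    unsigned char seen[MAX_PAGES] = {0};
    load_cost c = {call_count, 0};
    for (size_t i = 0; i < call_count; i++) {
        size_t page = call_site_rvas[i] / PAGE_SIZE;
        assert(page < MAX_PAGES);
        if (!seen[page]) {
            seen[page] = 1;
            c.dirty_pages++;
        }
    }
    return c;
}
```

For example, three imports in an IAT at RVA 0x2000 cost 3 fixups on 1 page. Five direct calls at RVAs 0x1010, 0x1800, 0x5004, 0x9abc and 0x9f00 cost 5 fixups on 3 pages. Dirtied pages become private copies, which is where the extra physical memory in the direct-calls column comes from.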
Again, my comment is pure speculation. I haven't done any measurements or read any rationale from the designers of PE.