
Why are DLL Functions Accessed via an Import Address Table instead of Load-Time Relocation?

On both Windows and Linux, DLL function addresses are resolved through a secondary table, i.e. the IAT on Windows or the PLT/GOT on Linux. On Windows this may be done via an indirect call instruction through the IAT; however, I was wondering why the toolchain couldn't instead just generate a direct call instruction that gets patched up by the loader when the DLL loads. Initially, I thought it might be so that the address only needs to be fixed up in one place, but even with the indirect call solution that Windows uses, you would need to fix up the address of the IAT entry in all those indirect calls, so that doesn't seem relevant. Basically, the question boils down to whether there is a technical reason for doing everything via some indirect table as opposed to just load-time relocation.
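To make the two schemes in the question concrete, here is a minimal toy model in Python. It is only a sketch: addresses are plain integers, the function names and addresses are made up for illustration, and `load_with_table` / `load_with_patching` are hypothetical helpers, not anything the real Windows loader exposes. It just counts how many writes a loader would perform under each scheme.

```python
# Toy model only: addresses are integers and the "loader" is a Python function.
# All names and addresses below are hypothetical.

def load_with_table(call_sites, resolved):
    """Table scheme: the loader fills one slot per imported function."""
    table = {}
    writes = 0
    for name, address in resolved.items():
        table[name] = address  # one write per imported function
        writes += 1
    # Call sites are left untouched; each one reads its slot at call time.
    targets = [table[name] for name in call_sites]
    return targets, writes


def load_with_patching(call_sites, resolved):
    """Patching scheme: the loader rewrites the target at every call site."""
    targets = []
    writes = 0
    for name in call_sites:
        targets.append(resolved[name])  # one write per call site
        writes += 1
    return targets, writes


# Five call sites that use three distinct imports.
call_sites = ["CreateFileW", "ReadFile", "ReadFile", "CloseHandle", "ReadFile"]
resolved = {
    "CreateFileW": 0x7FF810001000,
    "ReadFile": 0x7FF810002000,
    "CloseHandle": 0x7FF810003000,
}

table_targets, table_writes = load_with_table(call_sites, resolved)
patch_targets, patch_writes = load_with_patching(call_sites, resolved)
print(table_writes, patch_writes)
```

Both schemes end up calling the same targets; the difference in this model is only how many locations the loader has to write and where those locations live.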

If I understand the question: are you suggesting that the OS loader patch up all call sites instead? You don't want that, because you want the OS to be able to share code segments between processes. As long as code pages are not modified, the OS can map the same physical memory pages into the virtual address space of different processes. That saves memory.
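A rough way to picture the sharing argument is to count which pages the loader has to write to. The sketch below assumes a copy-on-write model in which any page that gets written becomes private to the process; the image layout (64 code pages, a call site in every 4th page, a 16-slot import table in one page) is invented purely for illustration.

```python
PAGE_SIZE = 4096  # typical x86/x64 page size


def pages_touched(write_offsets):
    """Return the set of page indices containing at least one written offset."""
    return {offset // PAGE_SIZE for offset in write_offsets}


# Hypothetical layout: 64 code pages with one call site in every 4th page,
# followed by an import table of 16 eight-byte slots.
code_pages = 64
call_site_offsets = [page * PAGE_SIZE + 0x123 for page in range(0, code_pages, 4)]
table_offsets = [code_pages * PAGE_SIZE + 8 * slot for slot in range(16)]

dirty_if_patching = pages_touched(call_site_offsets)
dirty_if_table = pages_touched(table_offsets)

# Under the copy-on-write assumption, every dirtied page is private per process.
processes = 10
print(len(dirty_if_patching) * processes, len(dirty_if_table) * processes)
```

With these made-up numbers, patching call sites privatizes 16 pages per process while the table approach privatizes one, so the gap grows with the number of processes that load the same image.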

I think what I am asking is a bit different. I was wondering more why the code in the executable calling into the DLL (which doesn't need to be shared) calls into the DLL via an indirect pointer in a table, instead of just being patched by the OS on load.

I think what you were answering was why the Shared Library needs to make calls via a table, which is basically to facilitate Position Independent Code for physical memory sharing. I think for executables, the OS patches up absolute addresses regardless, since the executable can be loaded anywhere in the Virtual Address Space, so sharing between multiple instances of that executable is probably not possible anyway.


Edited by Draos on
Replying to mmozeiko (#25589)

I have a theory. I have not validated this theory. I might have some details wrong.

When I write 'image', I mean a PE image (EXE or DLL).

In PE, we have two types of patching in images: rebasing and imports.

Rebasing happens if any of the following is true:

  • ASLR is enabled
  • the image's preferred virtual address range is already allocated by something else (e.g. another DLL, VirtualAlloc, or whatever)

Rebasing is relatively expensive because it needs to touch pages all over the image.
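As a sketch of why rebasing touches pages all over the image: conceptually, the loader walks a list of locations that hold absolute addresses and adds the load delta to each one. The model below is a simplification, not the real PE relocation format; the sparse `image` dict, the `rebase` helper, and the addresses are all hypothetical.

```python
PAGE_SIZE = 4096


def rebase(image, reloc_offsets, preferred_base, actual_base):
    """Add the load delta to every recorded absolute address.

    Returns the set of page indices that were written.
    """
    delta = actual_base - preferred_base
    touched_pages = set()
    for offset in reloc_offsets:
        image[offset] += delta
        touched_pages.add(offset // PAGE_SIZE)
    return touched_pages


preferred_base = 0x140000000
actual_base = 0x7FF600000000

# Sparse stand-in for the image bytes: offset -> absolute address stored there.
image = {
    0x1010: preferred_base + 0x5000,
    0x3020: preferred_base + 0x5008,
    0x9040: preferred_base + 0x6000,
}

touched = rebase(image, list(image), preferred_base, actual_base)
print(sorted(touched))
```

Even with only three fixups, three different pages get written, because the addresses needing adjustment sit wherever the code and data happen to reference them.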

Rebasing can be cached. I assume it is cached by Windows. I assume this caching is the sharing mmozeiko mentioned. In theory, the cache doesn't need to be evicted when a process exits; the cache can be reused for a new process, even if no existing process has the image loaded.

Separate from rebasing is the import table. Patching imports is relatively cheap because it doesn't need to touch pages all over the image.

Imports can be cached. I have no clue if Windows caches imports or not. I can imagine common cases where caching the IAT might not be worth it. (The cache key for rebasing is image + base address. The cache key for imports is image + image and base address of every imported DLL.)
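The two cache keys described in the parenthetical can be sketched as tuples. This only illustrates the speculation above, not how Windows actually caches anything; `LoadedImage`, its `identity` field (a stand-in for whatever would identify a file, such as a path plus timestamp), and both key functions are hypothetical.

```python
from typing import NamedTuple


class LoadedImage(NamedTuple):
    identity: str  # hypothetical stand-in for a file hash or path + timestamp
    base: int      # address the image was loaded at


def rebase_cache_key(image):
    """Rebased pages depend only on the image and where it was loaded."""
    return (image.identity, image.base)


def import_cache_key(image, dependencies):
    """A filled import table also depends on where every imported DLL landed."""
    deps = tuple(sorted((dep.identity, dep.base) for dep in dependencies))
    return (image.identity, deps)


exe = LoadedImage("app.exe", 0x140000000)
dll_boot_1 = LoadedImage("kernel32.dll", 0x7FF810000000)
dll_boot_2 = LoadedImage("kernel32.dll", 0x7FF8A0000000)  # same DLL, new base

same_rebase_key = rebase_cache_key(exe) == rebase_cache_key(exe)
same_import_key = import_cache_key(exe, [dll_boot_1]) == import_cache_key(exe, [dll_boot_2])
print(same_rebase_key, same_import_key)
```

In this model the import key changes whenever any dependency moves, which is one way to see why an import cache would hit less often than a rebase cache.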


For the entry point EXE, consider two approaches for EXE-to-DLL calls: import table (what PE implements) and direct calls (your proposal). If we ignore ASLR and assume that the EXE rarely relocates, my mental model of the two approaches says the following:

  • The import table approach is faster to load and uses less physical memory but is slower to execute, and
  • The direct calls approach is faster to execute but slower to load and uses more physical memory.

Again, my comment is pure speculation. I haven't done any measurements or read any rationale from the designers of PE.