
Question about calling conventions and registers

I'm currently learning about calling conventions on Windows, both x64 and x86. As far as I know, a calling convention specifies how you pass the arguments, how you get the return value, and who cleans up the stack frame.

Most x86 conventions push the arguments on the stack from right to left (except for __fastcall, which passes the first two arguments in the ecx and edx registers), and the return value goes in the eax register if it's 32 bits or smaller (or the edx:eax pair if it's 64 bits). If the return value is larger than 64 bits, the caller passes a hidden pointer to space that the callee fills with the result. Most of the time, the callee cleans up the stack (except for __cdecl).

The x64 calling convention looks like a __fastcall that works with 64-bit registers. My question is: after reading through this page, I see that rax, rcx, rdx, and r8-r11 are volatile, while rbx, rbp, rdi, rsi, and r12-r15 are nonvolatile. What does volatile mean in this case? Also, am I understanding the idea of a calling convention correctly?

Volatile/non-volatile is explained in the calling convention: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#callercallee-saved-registers "Consider volatile registers destroyed on function calls unless otherwise safety-provable by analysis such as whole program optimization."

Volatile means the register is considered clobbered once a call is made, so the caller must stash its value (for example on the stack) before the call if it still needs it afterwards. That suits local temporaries in code blocks that rarely have to survive a call. Non-volatile means the callee is responsible for preserving the content, so the register is persistent across function calls, which suits values that must stay live across calls.

So a function copies all the non-volatile register content to the stack at the beginning and restores it at the end, and it copies and restores volatile registers when it calls other functions? I assume it saves a register's content by pushing it to the stack. Most registers used for argument passing (and RAX for the return value) are considered volatile. If these registers get destroyed on function calls, how do they pass arguments or take a return value?


Replying to Dawoodoz (#26994)

So a function copies all the non-volatile register content to the stack at the beginning and restores it at the end, and it copies and restores volatile registers when it calls other functions?

It will copy only those non-volatile registers which it wants to use after function. If caller does not have need to use register afterwards, then there's no need to save it.

If these registers get destroyed on function calls, how do they pass argument or take return value?

Contents of register don't get automatically destroyed by calling function. The point of assuming the values are destroyed is because function will modify them, because they are defined to be volatile. If function does not modify, no problem. But caller does not know that. That's why it must assume values in volatile registers are changed after calling function. It's just part of how the compiler knows which registers it can use in codegen to store values and which it cannot (and needs to save on the stack around function calls).

Return value location is also part of ABI calling conventions. On x64 integers/pointers go into rax, floats into xmm0. Larger values go on the stack and are passed by address as an extra argument. For MSVC you can read about them in the same link: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#return-values
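
As a rough C illustration of the "passed by address as an extra argument" rule, the second function below is a hand-written approximation of what a compiler may generate for the first. The names Big, make_big, make_big_lowered, and sum_big are made up for this sketch, and the real lowering depends on the ABI and the compiler.

```c
#include <assert.h>

/* Hypothetical struct that is too large to come back in rax. */
typedef struct Big {
    long long a, b, c, d;
} Big;

/* What you write: return the struct by value. */
Big make_big(long long seed)
{
    Big r = { seed, seed + 1, seed + 2, seed + 3 };
    return r;
}

/* Roughly what it may be turned into: the caller reserves space for
   the result and passes its address as a hidden extra argument; the
   callee fills it in and hands the same address back. */
Big *make_big_lowered(Big *result, long long seed)
{
    result->a = seed;
    result->b = seed + 1;
    result->c = seed + 2;
    result->d = seed + 3;
    return result;
}

/* A small integer result needs none of this and is returned directly. */
long long sum_big(const Big *b)
{
    return b->a + b->b + b->c + b->d;
}
```

Both versions produce the same struct contents; the only difference is who owns the storage for the result.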


Edited by Mārtiņš Možeiko on
Replying to longtran2904 (#26995)

It will copy only those non-volatile registers which it wants to use after function. If caller does not have need to use register afterwards, then there's no need to save it.

But how can a callee know which registers the caller used? Also, why does it do the copy "after function"? What does the "function" mean: the caller or the callee?

Contents of register don't get automatically destroyed by calling function. The point of assuming the values are destroyed is because function will modify them, because they are defined to be volatile. If function does not modify, no problem. But caller does not know that.

But doesn't calling a function mean that the caller needs to put the arguments in registers? That means the previous value in each of those registers has been destroyed.

Return value location is also part of ABI calling conventions. On x64 integers/pointers go into rax, floats into xmm0. Larger values go on the stack and are passed by address as an extra argument.

What about arguments that are bigger than 64 bits? Are they getting pushed to the stack, and the caller only needs to pass a pointer to it?


Replying to mmozeiko (#26996)

Callee does not know and does not care which registers caller uses. Caller itself knows which registers (volatile ones) it uses and saves them by putting them to stack before call, and restoring afterwards.

When you call a function you know which registers you (as caller) need to use to pass arguments. That means you (as caller) know that you cannot keep other variables you use in these registers. It is your (the caller's) responsibility to generate code that moves values into the registers expected for arguments and does not use them for other purposes at the call site.

Basically caller saves volatile registers to stack or to non-volatile ones if it expects to use values in them after call. And callee must save non-volatile ones if it modifies them.

Here's simple example in pseudocode - it assumes r0 is where you pass argument to function (volatile register), and r1 is non-volatile register:

mov r0, 5     ; put 5 in r0, think of 5 coming from some
              ; other place somewhere in code above
mov r1, r0    ; save value 5 in r1
call Foo      ; calls function and passes argument in r0
add r1, ...   ; can use r1, because it was saved by mov above

In this example the "compiler" decides to store the value 5 in r1 because it wants to use it later, after the function call. It cannot rely on r0 because that is used for the argument and is volatile, so the called function may have modified it.

And in case function Foo is big and wants to modify the r1 register, it will need to save it first, for example on the stack:

Foo:
  push r1      ; save r1 to stack
  mov r1, 10   ; put 10 in r1
  ...          ; rest of function that can modify r1 as much as it wants
  pop r1       ; restore r1
  ret          ; and return
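
The same two snippets can be played out in C with a toy "register file". This is only a model of the agreement, not real register allocation: the Regs struct and the foo and caller functions are invented for this sketch, with r0 standing in for a volatile argument register and r1 for a non-volatile one.

```c
#include <assert.h>

/* Toy register file: r0 is volatile, r1 is non-volatile. */
typedef struct Regs {
    int r0;
    int r1;
} Regs;

/* Callee: free to clobber r0, but must hand r1 back unchanged. */
void foo(Regs *regs)
{
    int saved_r1 = regs->r1;   /* push r1 */
    regs->r1 = 10;             /* mov r1, 10 */
    regs->r0 = regs->r1 * 2;   /* r0 is volatile: overwrite freely */
    regs->r1 = saved_r1;       /* pop r1 */
}

/* Caller: wants the argument value after the call, so it copies it
   into r1 before calling. (A real caller would in turn have to save
   r1 on behalf of its own caller.) */
int caller(Regs *regs, int value)
{
    regs->r0 = value;          /* mov r0, value */
    regs->r1 = regs->r0;       /* mov r1, r0 */
    foo(regs);                 /* call Foo */
    return regs->r1 + 1;       /* add r1, 1 */
}
```

After caller returns, r0 holds whatever foo left in it, while r1 still holds the value the caller put there.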

Arguments bigger than 64 bits are passed the way the ABI specifies. Sometimes you put them on the stack and pass a pointer (MSVC does this). Sometimes you split them into 64-bit pieces and pass them in multiple registers (SysV does this on x64). Check the ABI specs for the calling convention: they explain exactly what the rules are and how to pass larger values.
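
A small C sketch of the "put it on the stack and pass a pointer" scheme, written out by hand. Vec4, scale_and_sum, scale_and_sum_lowered, and call_lowered are hypothetical names; the lowered pair only approximates what a compiler might emit, and the exact rules come from the ABI document.

```c
#include <assert.h>

/* Hypothetical 32-byte struct, too big for one 64-bit register. */
typedef struct Vec4 {
    double x, y, z, w;
} Vec4;

/* What you write: a by-value parameter. The callee works on its own
   copy, however the ABI chooses to deliver it. */
double scale_and_sum(Vec4 v, double k)
{
    v.x *= k;
    v.y *= k;
    v.z *= k;
    v.w *= k;
    return v.x + v.y + v.z + v.w;
}

/* Lowered callee: receives the address of a copy made by the caller
   and may modify that copy freely. */
double scale_and_sum_lowered(Vec4 *v, double k)
{
    v->x *= k;
    v->y *= k;
    v->z *= k;
    v->w *= k;
    return v->x + v->y + v->z + v->w;
}

/* Lowered call site: make a temporary copy, pass its address. */
double call_lowered(const Vec4 *original, double k)
{
    Vec4 temp = *original;
    return scale_and_sum_lowered(&temp, k);
}
```

In both forms the caller's original struct stays untouched, which is what by-value semantics require.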


Edited by Mārtiņš Možeiko on
Replying to longtran2904 (#26997)

Thanks for the detailed explanation. I think I get it now. I have some remaining questions:

  1. Does the x86 architecture have any concept of volatile registers? I was reading through this page and didn't see anything related to it.
  2. What other things does an ABI specify (except for calling convention)?

Edited by longtran2904 on
Replying to mmozeiko (#26998)

Volatile / non-volatile registers are a concept of the calling convention specified in the ABI. It is simply an agreement between caller and callee on how to communicate. The CPU itself does not care about it; to the CPU all registers are equal.

ABI specifies many other things - sizes of C types, alignment of C types, stack frame layout, stack alignment, initial state of stack/registers when entry point is called, TLS, relocations, memory model, unwinding, exceptions.
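
For the "sizes and alignment of C types" part, here is a small C probe. The Sample struct and the three helper functions are made up for illustration. On 64-bit Windows with MSVC I would expect a size of 16, the count field at offset 4, and pointer alignment of 8, but those numbers are exactly the kind of thing the ABI decides, so treat them as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct mixing a 1-byte, a 4-byte and a pointer-sized field. */
typedef struct Sample {
    char  tag;
    int   count;
    void *ptr;
} Sample;

/* Total size, including any padding the ABI's alignment rules add. */
size_t sample_size(void)
{
    return sizeof(Sample);
}

/* Where `count` lands after `tag` plus padding. */
size_t count_offset(void)
{
    return offsetof(Sample, count);
}

/* Required alignment of a pointer on this target (C11 _Alignof). */
size_t ptr_alignment(void)
{
    return _Alignof(void *);
}
```

Two compilers that follow the same ABI have to agree on all three values, otherwise they could not share this struct across a function call.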


Replying to longtran2904 (#26999)

Volatile / non-volatile registers are a concept of the calling convention specified in the ABI. It is simply an agreement between caller and callee on how to communicate. The CPU itself does not care about it; to the CPU all registers are equal.

I meant I couldn't see any volatile-related topic on the x86 calling convention page. So is, for example, the eax register volatile (according to any x86 calling conventions)?

ABI specifies many other things - sizes of C types, alignment of C types, stack frame layout, stack alignment, initial state of stack/registers when entry point is called, TLS, relocations, memory model, unwinding, exceptions.

What is a "stack frame"? Is cleaning up a stack frame expensive? From what I saw in the disassembler, the callee "cleans up" the stack frame with something like ret n, which pops n extra bytes off the stack (I think n is usually the size of all arguments). If the caller needs to clean the stack, it does add esp, n after the call, while the callee just does ret 0. So in both cases, the callee always needs a ret instruction.


Edited by longtran2904 on
Replying to mmozeiko (#27001)

volatile registers are listed here:

Integer arguments are passed in registers RCX, RDX, R8, and R9. Floating point arguments are passed in XMM0L, XMM1L, XMM2L, and XMM3L. 16-byte arguments are passed by reference. Parameter passing is described in detail in Parameter passing. These registers, and RAX, R10, R11, XMM4, and XMM5, are considered volatile, or potentially changed by a callee on return.

A stack frame is the layout of the stack when a function is called: where arguments are placed, where the return address is placed, any extra space to allocate (shadow store), alignment requirements, and where to place local/temporary variables.

Who cleans up the stack is up to the ABI. It's different for different calling conventions: sometimes it is the caller, sometimes it is the callee.


Edited by Mārtiņš Možeiko on
Replying to longtran2904 (#27002)

What is a shadow store? Is it for return values that are greater than 64 bits so the callee can write the result value?

What are local/temporary variables? Do you mean variables you declare on the stack?


Replying to mmozeiko (#27004)

Shadow store is explained in here: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160#calling-convention-defaults

The caller must always allocate sufficient space to store four register parameters, even if the callee doesn't take that many parameters.

That space for parameters is "shadow store" or "shadow space". It is used for implementing varargs functionality.
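
To see why a varargs callee wants its arguments sitting in memory, here is a minimal C varargs function (sum_ints is a made-up name). va_arg steps through the arguments one after another; my understanding is that on Windows x64 the callee can dump the four register arguments into the shadow space so that they sit right next to the stack arguments and can be walked the same way, but that is the implementation's business and not something this code depends on.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` int arguments passed after `count`. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);               /* start right after `count` */
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);      /* fetch the next int argument */
    }
    va_end(args);

    return total;
}
```

A call such as sum_ints(6, 1, 2, 3, 4, 5, 6) passes more arguments than there are argument registers, so some arrive in registers and some on the stack, yet the loop treats them all alike.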

"local/temporary" variables is space on stack where compiler decided to spill registers. They do not always match variables in your code. In debug builds they will, but in optimized builds it depends on situation - compiler optimizes code by placing variables into registers as much as possible.


Edited by Mārtiņš Možeiko on
Replying to longtran2904 (#27005)

In the "Calling convention defaults" section:

The caller must always allocate sufficient space to store four register parameters, even if the callee doesn't take that many parameters. This convention simplifies support for unprototyped C-language functions and vararg C/C++ functions.

Any parameters beyond the first four must be stored on the stack after the shadow store before the call

While in the "Parameter passing" section:

The callee is responsible for dumping the register parameters into their shadow space if needed.

So can I understand it like this: Sometimes, functions need to dump their register parameters on the stack, and the space for that is called "shadow space" which gets allocated by the caller? If that's the case, then:

  1. Why is it the caller's job and not the callee's job? It makes sense for unprototyped functions and vararg, but not for prototyped ones. Can't the compiler look at the current function and know if it needs the shadow space?
  2. What does a, for example, vararg function need to do that it needs to dump register parameters onto a preallocated space on the stack by the caller?

Another thing that I don't understand is in the "Varargs" section:

It's the callee's responsibility to dump arguments that have their address taken.

What does "to have their address taken" mean?


Edited by longtran2904 on
Replying to mmozeiko (#27006)

Sometimes, functions need to dump their register parameters on the stack, and the space for that is called "shadow space" which gets allocated by the caller?

Yes

Why is it the caller's job and not the callee's job?

Because if callee does not need to access them then it would be waste of time for caller to do it on every call. And caller does not know what callee will actually do, thus it delegates to callee to dump registers if it really wants that.

Can't the compiler look at the current function and know if it needs the shadow space?

It could "look at" it, but that's not what the ABI requires. The ABI requires the space because you can call C functions without knowing the prototype, so the caller does not know whether it is calling a varargs or a non-varargs function. Before C99 it was completely legal to call something like printf(...) or strcmp(...) without including any headers.

What does a, for example, vararg function need to do that it needs to dump register parameters onto a preallocated space on the stack by the caller?

To move a register to the stack, you simply "mov" it. Nothing special.

It's the callee's responsibility to dump arguments that have their address taken.

If you write &arg in C code and then use the pointer for some other operations, then you might need the address of this argument to put in opcodes (for load or store). If the argument was passed in a register, then you cannot take its address directly. There's no pointer that encodes the location of a register. Thus you put it on the stack, and then you have an actual pointer to memory.
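
A tiny C example of an argument that has its address taken. add_one, bump, and twice are hypothetical names; whether the compiler really spills arg to memory depends on optimization (it may inline add_one and keep everything in registers), so read the comments as the unoptimized picture.

```c
#include <assert.h>

/* Needs a real memory address to write through. */
void add_one(int *p)
{
    *p += 1;
}

/* `arg` may arrive in a register, but &arg needs a memory location,
   so the callee first stores it to the stack (on Windows x64 the
   shadow space is available for exactly this). */
int bump(int arg)
{
    add_one(&arg);
    return arg;
}

/* Address never taken: `arg` can live in a register the whole time. */
int twice(int arg)
{
    return arg * 2;
}
```

Only bump forces its parameter into memory; twice gives the compiler no reason to.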


Edited by Mārtiņš Možeiko on
Replying to longtran2904 (#27007)

Because if callee does not need to access them then it would be waste of time for caller to do it on every call. And caller does not know what callee will actually do, thus it delegates to callee to dump registers if it really wants that.

But doesn't the caller need to allocate the space, not the callee?

If you write &arg in C code and then use the pointer for some other operations, then you might need the address of this argument to put in opcodes (for load or store). If the argument was passed in a register, then you cannot take its address directly. There's no pointer that encodes the location of a register. Thus you put it on the stack, and then you have an actual pointer to memory.

This seems to be a very common thing, and not specific to varargs at all. Why does the document specifically say it in the Varargs section?


Replying to mmozeiko (#27009)