Call Stack in C++ (ENG)

The Call Stack

The call stack, also known as the execution stack, and is often shortened to simply the "stack". Which is a fundamental component in how your program runs. It refers to a particular segment of memory within a process's virtual memory layout, used to manage function calls and their execution context. Before going deep into this topic, it's important to understand the role stack plays in a process's memory layout. And if you're unfamiliar with it, I've linked some resources above for you to check out.

To put this simply: your program runs on the stack. Say, when a function is called, the system sets aside some space in the stack memory for the function to do its necessary work. These chunks of space or memory are called "stack frame" or "function frame". The call stack's primary purpose is to manage how functions call one another and how parameters are passed between them.

Each thread has its own call stack, also referred to as stack memory.

Although this data structure is normally abstracted away by the high-level programming languages, understanding its structure and behavior is still quite useful, especially when exploring some advanced features like RVO (Return Value Optimization) and copy elision in C++. These optimization offered by compiler is often involving stack frames and function return mechanics.

How Function Calls Work on the Stack

Firstly, let's start by understanding how a function stack frame is formed and what it contains. In the code below, we define a function called add, which is then called inside the main function. So what happens during the execution? Since there are two functions involved, the system creates two stack frames (one for the main function and another for add). Because the stack is a LIFO data structure, you can imagine how these function frames are constructed and then destroyed in reverse order.

int add(int x, int y){
	return x + y;
}
int main(){
	int a = 32;
	int b = 64;
	int sum = add(a, b);
}

Here's what happens step by step:

  1. The main() function is called first, and its function frame is created.
  2. The add() function is called by main(), and its function frame is pushed onto the stack.
  3. After the add() function finishes executing, its function frame is popped off the stack, and control returns to main().
  4. Once main() finishes, its stack frame is removed, and the program terminates.

call_stack.png

Stack Frame Under the Hood

We now understand how stack frames are pushed and popped. Next, let's delve into what is inside the stack frame. We'll compile the code above into assembly language to examine the details. For this purpose, I'll use GCC with the -m32 option here to generate 32-bit machine assembly code, because on a 64-bit machine, parameters are typically passed using registers rather than the stack.

Here's the assembly:

_Z3addii:
    pushl   %ebp
    movl    %esp, %ebp
    movl    8(%ebp), %edx
    movl    12(%ebp), %eax
    addl    %edx, %eax
    popl    %ebp
    ret
main:
    pushl   %ebp
    movl    %esp, %ebp

	subl    $16, %esp

	; Construct local variables in the stack frame
    movl    $32, -12(%ebp)
    movl    $64, -8(%ebp)

	; Push local variable as argument for function call
	pushl   -8(%ebp)
    pushl   -12(%ebp)

	; Call add() function
    call    _Z3addii
    addl    $8, %esp
    movl    %eax, -4(%ebp)   ; <- Next instruction address
    movl    $0, %eax
    leave
    ret

Okay, in assembly, there are two important registers help define the boundaries of a stack frame: the base pointer register and the stack pointer register.

  • The base pointer register (ebp in this case) always point to the bottom of the current stack frame.
  • The stack pointer register (esp in this case) always points to the top of the stack frame..

Because the stack grows from high memory addresses to low memory addresses, esp

Before calling add(), main() forms its stack frame using ebp and esp. It first saves the old base pointer using a pushl instruction, and then sets the new ebp to point to the old ebp, creating the base of the stack frame. After that, space is allocated on the stack for local variables. Then, the passing parameters are pushed, and add() is called.

When the call instruction is executed, a return address is automatically pushed into the stack frame. In the add() function, a new stack frame is formed using pushl %ebp and movl %esp, %ebp. After the calculation is done, the return value is passed back via eax. The frame is then cleaned up with popl %ebp.

After the frame is cleaned up, the return address is instantly passed to eip, which points to the next instruction address of call _Z3addii. The stack space allocated for passing parameters is then cleaned up. Finally, the return value 0 is set, and the stack is further cleaned up.

Copy and Reference

From the above, we have seen that passing parameters involves copying them to the stack (32-bit machine). If the object is large, this copying can be quite costly in terms of CPU time. To mitigate this, we can pass a pointer or a reference to the object instead, making the copy much smaller. You can consider references in C++ as syntactic sugar for pointers, they act the same way.

For large objects, it's always recommended to pass it by a reference. You might be concerned that directly operating on the original object could lead to unintended modifications. To avoid this, you can use the const keyword to ensure the object is not altered. This approach is exactly what we use in a copy constructor and copy assignment operator.

Pass by Value or Pass by Const Reference?

It is generally recommended to pass by const reference. On modern 64-bit machines, a pointer requires 8 bytes to store. Whether you use pass-by-pointer or pass-by-reference, you will always copy an 8-byte pointer to somewhere, typically a register (or the stack when there are many parameters).

When you pass a easy type no bigger than 8 bytes (e.g., int, which typically stores 4 bytes on most machines) by value, you only copy 4 bytes. However, if you use a pointer or reference, it would take 8 bytes. In this case, will pass this type no bigger than 8 bytes by value be better?

It looks so, because you copy 4 bytes must be two times faster than copy a 8 bytes pointer. But in most cases, these parameters will be passed to a register (the rule is in the next section). Copying a type no larger than 8 bytes to a register doesn't make much of a difference in terms of performance. But it's a good practice to do so.

It seems so, because copying 4 bytes must be two-times faster than copying an 8-byte pointer, right? However, in most cases, these parameters will be passed to a register. Copying a type no larger than 8 bytes to a register doesn't make much of a difference in terms of performance. But it's a good practice to do so.

Parameter Passing Rules in 64-Bit Machine

On 64-bit machines, function parameters are usually passed through specific registers, and only when the number of parameters exceeds the available registers do parameters get passed on the stack. Here are the typical rules for x86-64 (System V ABI):

  1. Integer and Pointer Parameters:
    Passed through registers: RDI, RSI, RDX, RCX, R8, R9 (for the first six parameters).
    Any additional parameters are passed on the stack.
  2. Floating-point Parameters:
    Passed through registers: XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7 (for the first eight parameters).
    Any additional parameters are passed on the stack.

For custom complex types:

  • If passed by value and they do not fit into available registers, they are stored on the stack.
  • If passed by reference (pointer), the reference itself is passed through a register if there are enough available.

Following these rules ensures efficient and consistent parameter passing, helping to optimize function calls.