tags:
- Cpp
Call Stack in C++ (ENG)
Learn more at: 计算机系统基础(一):第七周过程调用概述中国大学MOOC(慕课)
Wanna learn more about process? Check out: 6. Processing The Processes.
The call stack, also known as the execution stack, and is often shortened to simply the "stack". Which is a fundamental component in how your program runs. It refers to a particular segment of memory within a process's virtual memory layout, used to manage function calls and their execution context. Before going deep into this topic, it's important to understand the role stack plays in a process's memory layout. And if you're unfamiliar with it, I've linked some resources above for you to check out.
To put this simply: your program runs on the stack. Say, when a function is called, the system sets aside some space in the stack memory for the function to do its necessary work. These chunks of space or memory are called "stack frame" or "function frame". The call stack's primary purpose is to manage how functions call one another and how parameters are passed between them.
Each thread has its own call stack, also referred to as stack memory.
Although this data structure is normally abstracted away by the high-level programming languages, understanding its structure and behavior is still quite useful, especially when exploring some advanced features like RVO (Return Value Optimization) and copy elision in C++. These optimization offered by compiler is often involving stack frames and function return mechanics.
Firstly, let's start by understanding how a function stack frame is formed and what it contains. In the code below, we define a function called add
, which is then called inside the main
function. So what happens during the execution? Since there are two functions involved, the system creates two stack frames (one for the main
function and another for add
). Because the stack is a LIFO data structure, you can imagine how these function frames are constructed and then destroyed in reverse order.
int add(int x, int y){
return x + y;
}
int main(){
int a = 32;
int b = 64;
int sum = add(a, b);
}
Here's what happens step by step:
main()
function is called first, and its function frame is created. add()
function is called by main()
, and its function frame is pushed onto the stack.add()
function finishes executing, its function frame is popped off the stack, and control returns to main()
.main()
finishes, its stack frame is removed, and the program terminates.We now understand how stack frames are pushed and popped. Next, let's delve into what is inside the stack frame. We'll compile the code above into assembly language to examine the details. For this purpose, I'll use GCC with the -m32
option here to generate 32-bit machine assembly code, because on a 64-bit machine, parameters are typically passed using registers rather than the stack.
Here's the assembly:
_Z3addii:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %eax
addl %edx, %eax
popl %ebp
ret
main:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
; Construct local variables in the stack frame
movl $32, -12(%ebp)
movl $64, -8(%ebp)
; Push local variable as argument for function call
pushl -8(%ebp)
pushl -12(%ebp)
; Call add() function
call _Z3addii
addl $8, %esp
movl %eax, -4(%ebp) ; <- Next instruction address
movl $0, %eax
leave
ret
Okay, in assembly, there are two important registers help define the boundaries of a stack frame: the base pointer register and the stack pointer register.
ebp
in this case) always point to the bottom of the current stack frame.esp
in this case) always points to the top of the stack frame..Because the stack grows from high memory addresses to low memory addresses, esp
Before calling add()
, main()
forms its stack frame using ebp
and esp
. It first saves the old base pointer using a pushl
instruction, and then sets the new ebp
to point to the old ebp
, creating the base of the stack frame. After that, space is allocated on the stack for local variables. Then, the passing parameters are pushed, and add()
is called.
When the call
instruction is executed, a return address is automatically pushed into the stack frame. In the add()
function, a new stack frame is formed using pushl %ebp
and movl %esp, %ebp
. After the calculation is done, the return value is passed back via eax
. The frame is then cleaned up with popl %ebp
.
After the frame is cleaned up, the return address is instantly passed to eip
, which points to the next instruction address of call _Z3addii
. The stack space allocated for passing parameters is then cleaned up. Finally, the return value 0
is set, and the stack is further cleaned up.
From the above, we have seen that passing parameters involves copying them to the stack (32-bit machine). If the object is large, this copying can be quite costly in terms of CPU time. To mitigate this, we can pass a pointer or a reference to the object instead, making the copy much smaller. You can consider references in C++ as syntactic sugar for pointers, they act the same way.
For large objects, it's always recommended to pass it by a reference. You might be concerned that directly operating on the original object could lead to unintended modifications. To avoid this, you can use the const
keyword to ensure the object is not altered. This approach is exactly what we use in a copy constructor and copy assignment operator.
It is generally recommended to pass by const reference. On modern 64-bit machines, a pointer requires 8 bytes to store. Whether you use pass-by-pointer or pass-by-reference, you will always copy an 8-byte pointer to somewhere, typically a register (or the stack when there are many parameters).
When you pass a easy type no bigger than 8 bytes (e.g., int
, which typically stores 4 bytes on most machines) by value, you only copy 4 bytes. However, if you use a pointer or reference, it would take 8 bytes. In this case, will pass this type no bigger than 8 bytes by value be better?
It looks so, because you copy 4 bytes must be two times faster than copy a 8 bytes pointer. But in most cases, these parameters will be passed to a register (the rule is in the next section). Copying a type no larger than 8 bytes to a register doesn't make much of a difference in terms of performance. But it's a good practice to do so.
It seems so, because copying 4 bytes must be two-times faster than copying an 8-byte pointer, right? However, in most cases, these parameters will be passed to a register. Copying a type no larger than 8 bytes to a register doesn't make much of a difference in terms of performance. But it's a good practice to do so.
On 64-bit machines, function parameters are usually passed through specific registers, and only when the number of parameters exceeds the available registers do parameters get passed on the stack. Here are the typical rules for x86-64 (System V ABI):
For custom complex types:
Following these rules ensures efficient and consistent parameter passing, helping to optimize function calls.