Article directory
- Creation and destruction of function stack frames
- What are registers?
- Register classification
- Register purpose
- The concept of function stack frame
- The process of function stacking
- Sample code and main function assembly instructions (part)
- Assembly instruction: build function stack frame preparation (1)
- Assembly instruction: build function stack frame preparation (2)
- Assembly instructions: build the scope of the function stack frame
- Assembly instruction: put into three non-volatile registers
- Assembly instruction: load stack frame effective space
- Assembly instructions: generate function local variables
- Summary of this article
- Conclusion
- View the next article
Creation and destruction of function stack frames
In the previous articles we learned the basic grammar and usage of C programs. But have you ever wondered about questions like these?
- How is the scope of a function formed?
- How are local variables created?
- Why are the values of uninitialized local variables random or garbled?
- How are parameters passed to a function? In what order are they passed?
- What is the relationship between formal parameters and actual parameters?
- How is a function call implemented?
- How does a function return once the call ends?
- Why is there a maximum depth for function recursion? What does the stack overflow error raised at the maximum depth mean?
Once you understand how function stack frames are created and destroyed, these questions answer themselves one by one. With them in mind, let's step into the function stack frame!
Because of its length, this series is split into two parts. This article is the first, and mainly covers:
- What are registers?
- What is a stack?
- The formation process of the function stack frame
- The process of forming function variables
If you have any questions, please reply in the comment area. You can also go straight to the next article to continue with the second part.
Understanding the function stack frame requires reading disassembly, so the explanation follows the relevant assembly instructions.
Time is ticking, so let's get to the point!
What are registers?
The first thing you need to know: what is a register?
Which pieces of computer hardware store data? In order: hard disk --> memory --> cache --> registers. Access speed increases from left to right, while capacity decreases from left to right. A register, at the far end, may hold only 4 bytes (on a 32-bit x86 CPU), but it is also the fastest to access, because registers are built into the CPU itself and form a storage space separate from memory. As the saying goes: for the fastest network, play the game sitting on the server; for the fastest reads, read sitting on the CPU. That is why registers are read so quickly.
Register classification
There are many kinds of registers in a computer:
- General registers: EAX, EBX, ECX, EDX (ax: accumulator register, bx: base register, cx: count register, dx: data register)
- Index registers: ESI, EDI (si: source index register, di: destination index register)
- Stack and base registers: ESP, EBP (sp: stack pointer register, bp: base pointer register); these two are also the most important registers for the function stack frame
Where:
- EAX, EBX, ECX, EDX: extensions of ax, bx, cx, dx, each 32 bits
- ESI, EDI, ESP, EBP: extensions of si, di, sp, bp, each 32 bits
- EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP are the names of the general-purpose registers of the CPU in x86 assembly language; all of them are 32-bit registers.
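To make "extension" concrete: the 16-bit name (for example ax) refers to the low half of the 32-bit register (eax), and on x86 that half is further split into a high byte (ah) and a low byte (al). The sketch below only models this relationship with masks and shifts on an ordinary integer; the function names are hypothetical and it does not touch real registers.

```c
#include <stdint.h>

/* Low 16 bits of a 32-bit register value: what AX is to EAX. */
uint16_t low16(uint32_t reg32) {
    return (uint16_t)(reg32 & 0xFFFFu);
}

/* Bits 8..15 of the value: what AH is to EAX. */
uint8_t high_byte_of_low16(uint32_t reg32) {
    return (uint8_t)((reg32 >> 8) & 0xFFu);
}

/* Bits 0..7 of the value: what AL is to EAX. */
uint8_t low_byte(uint32_t reg32) {
    return (uint8_t)(reg32 & 0xFFu);
}
```

For instance, if eax held 0x12345678, then ax would read as 0x5678, ah as 0x56, and al as 0x78.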
Register purpose
So what are they used for in a program?
Each of these 32-bit registers has its own "specialty":
- EAX is the "accumulator"; it is the default register for many add and multiply instructions.
- EBX is the "base address" register; it stores the base address during memory addressing.
- ECX is the counter; it is the default counter for instructions with the repeat (REP) prefix and for LOOP instructions.
- EDX holds the remainder of integer division (it also holds the high half of the dividend and of a multiplication result).
- ESI/EDI are called the "source/destination index registers", because in many string instructions DS:ESI points to the source string and ES:EDI points to the destination string.
- EBP is the "base pointer"; it is most often used as the "frame pointer" for function calls in high-level languages. When reverse engineering software, you will often see this standard function prologue:
push ebp ; save the current ebp
mov ebp, esp ; set EBP to the current stack pointer
sub esp, xxx ; reserve xxx bytes for the function's temporary variables
...
This way EBP forms the frame of the function: above EBP are the original EBP, the return address, and the parameters; below EBP are the temporary variables. When the function returns, mov esp, ebp / pop ebp / ret is all that is needed.
- ESP is used exclusively as the stack pointer, and it always points to the top of the stack. The top of the stack is at the low-address end: the more data is pushed, the smaller ESP becomes. On a 32-bit platform, each push decreases ESP by 4 bytes.
That is all we need about registers in general. In practice, a register holds a value, and for EBP and ESP that value is an address. What really matters for forming the function stack frame are the two addresses held in EBP and ESP.
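The standard prologue and its matching epilogue can be modelled in a few lines of C. This is only a sketch under stated assumptions: the `Machine` struct, its 64-dword memory, and the function names are hypothetical, esp and ebp are byte offsets into that small array rather than real addresses, and there is no bounds checking.

```c
#include <stdint.h>

#define MEM_WORDS 64  /* hypothetical size of the simulated stack, in dwords */

typedef struct {
    uint32_t mem[MEM_WORDS]; /* mem[i] models the dword at byte offset 4*i */
    uint32_t esp;            /* byte offset of the stack top */
    uint32_t ebp;            /* byte offset of the current frame base */
} Machine;

/* push: lower esp by 4, then store the value at the new top. */
static void push(Machine *m, uint32_t value) {
    m->esp -= 4u;
    m->mem[m->esp / 4u] = value;
}

/* pop: read the value at the top, then raise esp by 4. */
static uint32_t pop(Machine *m) {
    uint32_t value = m->mem[m->esp / 4u];
    m->esp += 4u;
    return value;
}

/* Models: push ebp / mov ebp, esp / sub esp, local_bytes */
void enter_frame(Machine *m, uint32_t local_bytes) {
    push(m, m->ebp);        /* save the caller's frame base */
    m->ebp = m->esp;        /* the new frame base is the current top */
    m->esp -= local_bytes;  /* reserve room for temporaries below ebp */
}

/* Models: mov esp, ebp / pop ebp (the `ret` itself is not modelled) */
void leave_frame(Machine *m) {
    m->esp = m->ebp;        /* discard the temporaries */
    m->ebp = pop(m);        /* restore the caller's frame base */
}
```

Calling `enter_frame` and then `leave_frame` leaves esp and ebp exactly where they started, which is why a function's frame disappears cleanly when it returns.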
What is a "stack"?
Before going further, there is one more keyword to look at: what is a "stack"? A stack is a data structure. This article will not go into how it is implemented; you only need one of its properties: elements are placed in one after another, and they come out in last-in, first-out order. For example, if you pile books into a wooden barrel and need the one at the bottom, you have to take out the books above it first. The stack area discussed in this article lives in system memory.
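The last-in, first-out behaviour, combined with the fact that the x86 stack grows toward lower addresses, can be sketched as a tiny simulated stack. The type `SimStack`, its capacity, and the function names are assumptions made for illustration only.

```c
#include <stdint.h>

#define STACK_WORDS 16  /* hypothetical capacity for the demo, in dwords */

typedef struct {
    uint32_t slots[STACK_WORDS]; /* simulated stack memory, one dword per slot */
    uint32_t esp;                /* byte offset of the top; starts at the high end */
} SimStack;

/* An empty stack has its top at the highest offset. */
void stack_init(SimStack *s) {
    s->esp = STACK_WORDS * 4u;
}

/* push: move the top 4 bytes toward lower addresses, then store.
   Returns 0 if the stack is full, 1 otherwise. */
int stack_push(SimStack *s, uint32_t value) {
    if (s->esp == 0u) {
        return 0;
    }
    s->esp -= 4u;
    s->slots[s->esp / 4u] = value;
    return 1;
}

/* pop: load from the top, then move the top 4 bytes toward higher addresses.
   Returns 0 if the stack is empty, 1 otherwise. */
int stack_pop(SimStack *s, uint32_t *out) {
    if (s->esp == STACK_WORDS * 4u) {
        return 0;
    }
    *out = s->slots[s->esp / 4u];
    s->esp += 4u;
    return 1;
}
```

Pushing 1, 2, 3 and then popping yields 3 first, and every push lowers `esp` by 4 while every pop raises it by 4.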
The concept of function stack frame
Among the registers, EBP and ESP both store addresses, and these two addresses are what maintain the function stack frame.
Every time a function is called, a block of space has to be created in the stack area. Whichever function is currently running, the addresses in EBP and ESP delimit the memory space of that function, and this space is the function's stack frame. For example, while the main function is running, esp points to the top of main's stack frame and ebp points to its bottom.
Put that way it may still be hard to picture, so let's draw it!
The process of function stacking
To make the demonstration easier to follow, I will use VS2013 to show how a function's stack frame is pushed. We will read the program's assembly instructions step by step, I will explain what each one does, and at the end I will summarize the whole sequence. Different compilers may generate and wrap the code differently, and newer compilers wrap programs more elaborately, which makes observation harder. Also, the addresses shown in the assembly below change from run to run (they are assigned when the program is loaded), so if you debug locally, expect different addresses and compare values within the same debugging session. The principle is the same in every case.
Note that in VS2013 (as in earlier versions), if you view the call stack while debugging, you will find that the main function is itself called by other functions.
They are the __tmainCRTStartup and mainCRTStartup functions, with mainCRTStartup at the bottom of the call stack.
The calling chain is mainCRTStartup --> __tmainCRTStartup --> main
Stack frame of the __tmainCRTStartup function
Stack frame of the mainCRTStartup function
From this, we can picture the memory stack at this moment as
Observing in the VS2013 environment, we find that while a function runs, the stack top pointer esp and the stack bottom pointer ebp delimit a memory region that forms the function's stack frame. So what exactly does the program do? We can study the stack-pushing process by reading the program's disassembly.
Below is part of the disassembly of the main function. Let's see how it works in detail.
Sample code and main function assembly instructions (part)
This is the program code; the article uses it as the running example to introduce the creation and destruction of function stack frames, local variables, and function calls.
The following assembly instructions are part of what will be discussed below.
I will explain these instructions with the help of the documentation on x86 code generation for C. The following are the assembly statements that will be used most often.
We just mentioned that main is itself called by other functions. So when the program enters main, has the function that called it already created its own stack frame? The answer is yes. At this moment, esp and ebp are maintaining the top and bottom of the stack frame of the calling function, __tmainCRTStartup.
At the very start, the stack area should look as shown in the figure
Assembly instruction: build function stack frame preparation (1)
Next, let's look at the first assembly instruction executed on entering the main function.
The first instruction is push ebp. It places the current value of ebp on the top of the stack.
Can we then assume the following? Since esp maintains the top of the stack, after the push esp has moved to the new top, and the address in esp now points to the saved value of ebp, as the picture shows
How do we justify this hypothesis?
Open the Watch window and monitor esp, and you can see its value change.
This is the initial value of the stack top pointer esp
After push ebp completes, esp should have moved from a higher address to a lower one, so its value should have decreased, right?
Single-stepping in the debugger confirms it: the low byte of esp goes from a8 to a4, a decrease of 4 bytes
And does the memory at esp now hold the value of ebp? Open the Memory window and look up the new esp address; the answer is clear at a glance!
What was the value of ebp just now? 008ffbf4. The value stored at the address in esp is now 008ffbf4 as well, so the assumption holds.
esp maintains the top of the stack. It has moved to the new top, and the address in the new esp points to the saved value of ebp
As for whose ebp was just pushed, we will explain that below.
Assembly instruction: build function stack frame preparation (2)
Now let's look at the second assembly instruction: mov ebp, esp copies the value of esp into ebp.
Is that really what happens? Run the next debugging step, and the Watch window shows the following
At this point the stack diagram should be
Assembly instructions: build the scope of the function stack frame
Let's look at the third instruction: sub esp, 0E4h subtracts 0E4h from the address in esp. (sub is short for subtract; add, likewise, means add.)
Note that the value is subtracted from esp, not from ebp, and that 0E4h is a hexadecimal number (the trailing h marks hexadecimal), not an octal one. To see what number 0E4h is, you can enter it in the Watch window and view its decimal value: 0E4h is 228.
So esp has 0E4h subtracted from it. Has esp changed? Step through and watch the result
The value of esp is now 0x008ffac0. Its address has become smaller, so it has moved up: it no longer points to the original location but to a position 0E4h bytes above it.
Have you noticed that, after entering the main function, the new stack top pointer esp and the stack bottom pointer ebp now delimit a new region, and that esp and ebp no longer maintain the space of the calling function? That's right: this new region is the stack frame area opened in advance for the main function, and the operand of sub is the number of bytes reserved for it.
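The arithmetic of this step can be checked with a one-line helper. This is only a worked calculation, assuming ebp (and therefore esp just before the sub) is 0x008ffba4, the stack bottom value observed later in this article; the function name `reserve_frame` is hypothetical.

```c
#include <stdint.h>

/* Models `sub esp, imm`: reserving `bytes` of stack space means
   lowering the stack-top address by that many bytes. */
uint32_t reserve_frame(uint32_t esp, uint32_t bytes) {
    return esp - bytes;
}
```

With those values, `reserve_frame(0x008ffba4u, 0xE4u)` gives 0x008ffac0, the same esp value the debugger shows after the instruction.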
The schematic diagram of the stack area can be understood as the following picture
Assembly instruction: put into three non-volatile registers
ebx, esi, and edi are the base, source index, and destination index registers introduced earlier; here the three are collectively called non-volatile registers. This comes from the calling convention used for C on the x86 platform: a called function must leave these registers holding the same values they had before the call. The function therefore pushes them onto the stack on entry, so that the caller's data stays saved for the whole duration of the call and can be restored before returning.
They are pushed onto the stack here. Don't forget that with every push, the stack top pointer esp changes as well.
The details of the pushing process are as follows:
Observe the values of esp and ebx in the Watch window
Does esp change when ebx is pushed onto the stack? Yes: the value of esp decreases by 4, and the stack top moves up
In the Memory window, you will find that the address in esp now holds the value of ebx, 0x007e5000
Similarly, when esi is pushed next, esp changes as follows
When edi is pushed, esp changes as follows
To sum up, after three pushes of 4 bytes each, the stack top pointer esp has gone from 008ffac0 to 008ffab4: the address keeps decreasing, and the stack top keeps moving up.
The schematic diagram of the current stack area can be understood as
Assembly instruction: load stack frame effective space
At this point, to make things easier to follow, we turn on the display of symbol names in the disassembly.
The seventh instruction is lea, whose full name is load effective address. It computes an address and stores it in a register without reading memory. From here on, the program starts preparing the usable stack frame area of the current function. Let's see how it goes.
lea edi, [ebp-0E4h]: does 0E4h look familiar? That's right, it is the size reserved a moment ago for the stack frame of main. The instruction stores the address ebp-0E4h in edi. Note that edi is the destination index register, not the stack bottom pointer; it now simply marks the lowest address of the reserved area. From the stack diagram we can see that the range from ebp-0E4h up to ebp is exactly the stack space of the current main function.
How do we verify this? Recall the address esp held just before the three non-volatile registers were pushed
Now open the Watch window and check ebp-0E4h and edi; the answer is obvious! Both are 0x008ffac0. This is the slot three positions below the current esp on the stack diagram, and it is exactly the address esp held before the three non-volatile registers were pushed.
mov ecx, 39h and mov eax, 0CCCCCCCCh place 39h in the ecx register and 0CCCCCCCCh in the eax register.
Now the next instruction: rep stos dword ptr es:[edi]. This one is very interesting, because it is what finally forms the usable space of the function's stack frame. Its meaning is: starting at the address in edi, store the content of eax, advance edi, and repeat ecx times. Note that dword means double word: a word is 2 bytes, so a double word is 4 bytes.
What does that look like in detail? Starting from ebp-0E4h (the address in edi), the instruction writes toward higher addresses, 4 bytes at a time. The content written is the content of eax (0CCCCCCCCh), and it is written 39h (57) times. Since 57 x 4 = 228 = 0E4h bytes, the writing stops exactly at the stack bottom pointer ebp.
According to this, the program fills memory from 008ffac0 toward higher addresses until it reaches the stack bottom pointer ebp. Open the Memory window and you can confirm it.
The fill starts at 008ffac0 and proceeds toward higher addresses
It ends at the stack bottom pointer, 008ffba4
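The effect of `rep stos dword ptr es:[edi]` can be sketched as an ordinary C loop. This is a simplified model, not what the CPU literally executes: it ignores the segment register and the direction flag and assumes the forward direction seen here. The function name and the parameter names borrowed from the registers are for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Models `rep stos dword ptr es:[edi]` in the forward direction:
   store `eax` into `ecx` consecutive dwords starting at `edi`,
   walking toward higher addresses.
   Returns the pointer just past the last dword written. */
uint32_t *rep_stos_dword(uint32_t *edi, uint32_t eax, size_t ecx) {
    while (ecx > 0) {
        *edi++ = eax;  /* write 4 bytes, then advance the destination */
        --ecx;         /* one repetition consumed */
    }
    return edi;
}
```

Called with a count of 0x39 and the value 0xCCCCCCCC, it writes 57 dwords, that is 228 (0E4h) bytes, and leaves the dword that follows untouched, just as the real fill stops at ebp.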
You may wonder what this cccccccc means. The fill value may differ from compiler to compiler. When we write a program and print a variable that was never given an initial value, the output is often the garbled text "烫烫烫" ("hot hot hot"). That is not the computer telling us its temperature :) but simply the 0CCCCCCCCh bytes placed in memory here: in the GBK encoding, the byte pair 0xCC 0xCC is the character 烫 ("hot").
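The same fill pattern explains the odd number that shows up when an uninitialized int is printed in such a build. The sketch below does not read an uninitialized variable (that would be undefined behaviour); it only reinterprets the pattern 0xCCCCCCCC as a signed 32-bit integer. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32-bit fill pattern as a signed integer, the way an
   `int` local sitting in the filled area would be read.
   memcpy copies the bits unchanged, so no integer conversion rules apply. */
int32_t fill_pattern_as_int32(void) {
    uint32_t pattern = 0xCCCCCCCCu;
    int32_t value;
    memcpy(&value, &pattern, sizeof value);
    return value;
}
```

The result is -858993460, which is the value many people remember seeing when they printed an uninitialized int in a Visual Studio debug build.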
In summary, the schematic diagram of the stack area can be as follows
Up to this point the program has gone through five steps, and the stack frame opened for the main function is officially complete. The region delimited by esp and ebp is the scope of the function, and the space filled with 0CCCCCCCCh is where local variables will be stored. Next, the program starts executing the code we actually wrote.
Assembly instructions: generate function local variables
After the operations above, the usable stack frame area of the function has been formed, and the program now starts executing the code we wrote. According to that code, local variables are created as soon as the program enters the function. How are local variables created in the stack frame area?
First, look at the assembly instructions. Does the syntax look familiar? They mean: store 0Ah, 14h, and 0 at ebp - 8, ebp - 14h, and ebp - 20h, in that order.
ebp - 8, ebp - 14h, and ebp - 20h are addresses obtained by moving from the stack bottom pointer toward lower addresses. A slot is assigned here to each of 0Ah, 14h, and 0, which are the hexadecimal representations of 10, 20, and 0.
Now let's prove it. First, keep watching the memory around the stack bottom pointer ebp to see whether the variables are stored at lower addresses;
Stepping through the statements makes the answer obvious: 8 bytes below the stack bottom pointer 008ffba4, the stored value is 0ah. At this point the local variable a = 10 has been created
Continue to the next step, which creates the local variable b = 20; the variable c that follows works the same way.
Observing how local variables are created in the stack frame, we find that once the usable stack frame space has been formed, local variables are placed one after another from higher addresses toward lower addresses. If a variable is not given an initial value, the program still sets aside a slot for it, and that slot simply keeps the 0CCCCCCCCh fill.
The schematic diagram of the stack area at this time can be as follows
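The layout described above can be modelled by treating the 0E4h-byte frame as an array of dwords. This is a sketch under stated assumptions: the array indexing scheme and the names `slot_index` and `create_locals` are made up for illustration, and the real frame lives on the stack rather than in an array you pass around.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_BYTES  0xE4u               /* size reserved by `sub esp, 0E4h` */
#define FRAME_DWORDS (FRAME_BYTES / 4u)  /* 39h = 57 dwords */
#define FILL_PATTERN 0xCCCCCCCCu

/* frame[i] models the dword at address ebp - FRAME_BYTES + 4*i,
   so frame[FRAME_DWORDS - 1] is the dword at ebp - 4. */
size_t slot_index(uint32_t offset_below_ebp) {
    return (FRAME_BYTES - offset_below_ebp) / 4u;
}

/* Fill the frame, then store the three locals at the offsets
   seen in the disassembly. */
void create_locals(uint32_t frame[FRAME_DWORDS]) {
    for (size_t i = 0; i < FRAME_DWORDS; ++i) {
        frame[i] = FILL_PATTERN;          /* what rep stos left behind */
    }
    frame[slot_index(0x08u)] = 0x0Au;     /* mov dword ptr [ebp-8],   0Ah : a = 10 */
    frame[slot_index(0x14u)] = 0x14u;     /* mov dword ptr [ebp-14h], 14h : b = 20 */
    frame[slot_index(0x20u)] = 0u;        /* mov dword ptr [ebp-20h], 0   : c = 0  */
}
```

After `create_locals` runs, the three slots hold 10, 20, and 0, while the dwords between them still hold the fill pattern: each variable sits 0Ch bytes below the previous one, so 8 bytes of padding separate neighbouring variables.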
Summary of this article
In this article, we briefly walked through the basic process of building a function's stack frame, using the main function as the example. We learned that:
- The stack frame of a function is a piece of memory jointly maintained by the stack top pointer esp and the stack bottom pointer ebp;
- When a function starts building its stack frame, it first pushes the stack bottom pointer ebp of the previous function;
- As the stack frame is built and grows, every push of new content or of a register moves the stack top pointer esp upward (toward lower addresses);
- When the program needs to locate something in the frame, it takes the stack bottom pointer ebp as the reference point and offsets from it toward lower addresses;
- The program fills the area from ebp-0E4h up to the stack bottom pointer ebp, and this area is the scope of the function. Within it, the program creates variables in the upward direction (toward lower addresses) from the ebp pointer.
In the next article, we will introduce how functions are called and how they return, and answer the questions raised at the beginning.
If you have any questions, please ask in the comment area.
Conclusion
That concludes part (1) of the creation and destruction of function stack frames. A second part on the same topic follows, with more content and plenty of substance. If you found this article useful, don't forget to like it and follow!
Writing takes effort, and your attention and appreciation are the author's biggest encouragement. I will keep sharing knowledge about learning C/C++ and practical uses of Python. Your support motivates me to publish more high-quality articles, and to level up and fight monsters together with you!
View the next article
The next article is expected to be published within this week. This series is released first on the WeChat public account: 01 Programming Cabin. You are welcome to follow it to get new articles as soon as they appear; follow the Cabin and you won't get lost learning to program.