Illustrated C/C++ internals: the creation and destruction of function stack frames during a function call (Part 1)



​ —— POWERED BY CAIXYPROMISE


Creation and destruction of function stack frames

In the previous articles we learned the basic syntax and usage of C programs. But have you ever wondered about questions like these?

For example:

  • How is the scope of a function formed?

  • How are local variables created?

  • Why are the values of uninitialized local variables random or garbled?

  • How are parameters passed to a function? In what order are they passed?

  • What is the relationship between formal parameters and actual parameters?

  • How is the function call implemented?

  • How does the function return after the end of the call?

  • Why is there a maximum depth of function recursion? What does the stack overflow error raised by reaching the maximum depth mean?

When you understand the creation and destruction of function stack frames, these doubts will be solved one by one! With these questions in mind, let's enter the function stack frame!

Because of its length, this series is split into two parts. This article is the first part and mainly covers:


  • What are registers?
  • What is a stack?
  • How a function stack frame is formed
  • How a function's local variables are created

Understanding function stack frames requires reading disassembly, so the explanation follows the relevant assembly instructions.

Without further ado, let's get to the point!


What are registers?

The first thing you need to know is what a register is.

Which pieces of computer hardware can store data? From slowest to fastest they are: hard disk --> memory --> cache --> registers. Access speed increases from left to right, while capacity decreases from left to right. A single register, at the end of this chain, may hold only 4 bytes (on a 32-bit x86 CPU), but it is also the fastest to access, because registers are built into the CPU itself and form a storage space separate from memory. As the saying goes, the fastest network connection is playing the game while sitting on the server; likewise, the fastest read is the one made while sitting on the CPU. This is why registers are so fast to read.

image-20220330211304684

Register classification

There are many kinds of registers:

  • General registers: EAX, EBX, ECX, EDX

    ax: accumulator register, bx: base register, cx: count register, dx: data register

  • Index register: ESI, EDI

    si: source index register, di: destination index register

  • Stack and base pointer registers: ESP, EBP

    sp: stack pointer register, bp: base pointer register; these two are also the two most important registers for the function stack frame

Where:

  • EAX, EBX, ECX, EDX: extensions of ax, bx, cx, dx, each 32 bits
  • ESI, EDI, ESP, EBP: extensions of si, di, sp, bp, each 32 bits
  • EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, etc. are the names of the CPU's general-purpose registers in x86 assembly language; all of them are 32-bit registers.

Register purpose

So what are they used for in a program?

Each of these 32-bit registers has its own specialty.

  • EAX is the "accumulator", which is the default register for many add and multiply instructions.

  • EBX is the "base address" (base) register, which stores the base address during memory addressing.

  • ECX is a counter (counter), which is the default counter for repeat (REP) prefix instructions and LOOP instructions.

  • EDX holds the remainder after an integer division instruction (DIV/IDIV), among other uses.

  • ESI/EDI are called "source/target index registers", because in many string manipulation instructions, DS:ESI points to the source string, and ES:EDI points to the target string.

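  • EBP is the "base pointer", most often used as the "frame pointer" for function calls in high-level languages. When reverse-engineering software, you will often see this standard function prologue:

    push ebp        ; save the current ebp
    mov ebp,esp     ; set EBP to the current stack pointer
    sub esp, xxx    ; reserve xxx bytes for the function's temporary variables
    ...
    In this way, EBP forms a frame for the function. The slot EBP points to holds the caller's saved EBP, and above it are the return address and the parameters; below EBP are the temporary variables. When the function returns, mov esp,ebp / pop ebp / ret is all that is needed.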
    
  • ESP is dedicated to serving as the stack pointer, aptly called the stack-top pointer. The top of the stack is at the low-address end: the more data is pushed onto the stack, the smaller ESP becomes. On a 32-bit platform, each push decreases ESP by 4 bytes.

That covers the concept of registers. In practice, a register holds a value, and for EBP and ESP that value is an address. What really matters for building a function stack frame is these two addresses, the ones held in EBP and ESP.

What is a "stack"?

Before going further, note one more keyword: what is a "stack"? A stack is a kind of data structure. This article will not go into how it is implemented; you only need to understand one property: elements placed on a stack in sequence are taken out in the reverse order, so the first one in is the last one out. For example, stack some books in a wooden barrel: to get the bottom book, you must first take out the ones above it. The stack area discussed in this article lives in system memory.

The concept of function stack frame

The two registers EBP and ESP hold addresses, and these two addresses are used to maintain the function stack frame.

Every function call needs its own space in the stack area. Whichever function is currently running, EBP and ESP maintain that function's memory space, which is its stack frame. For example, while the main function is running, esp and ebp point to the top and the bottom of main's stack frame respectively.

That may still be hard to picture, so let's draw it!

image-20220110184503700

The process of function stacking

To make the demonstration easier to follow, I will use VS2013 to show how a function's frame is pushed onto the stack. We will read the program's assembly instructions step by step, I will explain what each one does, and at the end I will summarize the whole sequence. Different compilers may generate and organize the assembly differently, and newer compilers tend to wrap the program in more layers, which makes it harder to observe. Also, the addresses shown with the assembly instructions below change from one build or run to the next (memory locations are not fixed), so if you are debugging locally, expect different values and stay within a single build and debugging session. The principle, however, is the same.

Note that in VS2013 and earlier versions, if you view the call stack while debugging, you will find that the main function is itself called by other functions.

They are __tmainCRTStartup and mainCRTStartup, with mainCRTStartup at the bottom of the call stack.

The calling order is mainCRTStartup --> __tmainCRTStartup --> main

__tmainCRTStartup pushed onto the stack

image-20220110211617884

image-20220110211729352

mainCRTStartup pushed onto the stack

image-20220110212352354

From this, we can understand that the memory stack at this time is expressed as

image-20220110220522449

By observing in the VS2013 environment, we can see that while a function is running, the stack-top pointer esp and the stack-bottom pointer ebp delimit a region of memory that forms the function's stack frame. So what exactly does the program do? We can study how the frame is pushed by looking at the program's disassembly.

The following is part of the disassembly code of the main function, now let's see how its specific principle works.

Sample code and main function assembly instructions (part)

This is the program code. The article uses it as the running example to show how function stack frames and local variables are created and destroyed, and how function calls work.

image-20220111163038299

The following assembly instructions are part of what will be discussed below.

image-20220110221517882

I will explain the assembly instructions in this article with reference to documentation on the details of x86 code generation for C. The following are the assembly instructions that will be used most often.

image-20220113123135672

We just mentioned that main is itself called by other functions. So when the program enters main, has the function that called it already created its own stack frame? The answer is yes. At this moment esp and ebp, the stack-top and stack-bottom pointers, are maintaining the frame of the caller, __tmainCRTStartup.

At the very start, the stack area should look as shown in the figure:

image-20220110223836717

Assembly instruction: preparing to build the function stack frame (1)

Next, let's look at the first assembly instruction executed on entering the main function

image-20220110223528221

The first instruction is push ebp. It pushes the value of ebp onto the top of the stack.

image-20220110224216553

Can we then assume the following: since esp tracks the top of the stack, esp has now moved to the new top, and the address in esp points to the slot holding the value of ebp? As the picture shows:

image-20220110224316200

How can we verify this hypothesis?

Open the Watch window and watch esp: you will see its value change.

This is the initial value of the stack-top pointer esp:

image-20220110224605878

The stack grows from high addresses to low addresses, so after push ebp completes, the address in esp should decrease, right?

Single-stepping with the Watch window open confirms it: the address goes from ...a8 to ...a4, a decrease of 4 bytes

image-20220111000052774

So will the memory at esp hold the value of ebp? Open the Memory window and look up the new esp address: the answer is clear at a glance!

What was the value of ebp just now? 008ffbf4. The value now stored at the address in esp is also 008ffbf4, so the assumption holds.

esp tracks the top of the stack. It has now moved to the new top, and the new esp points to the slot holding the value of ebp

image-20220110225136909

Which function this pushed ebp belongs to will be explained below.

Assembly instruction: preparing to build the function stack frame (2)

Now the second assembly instruction: mov ebp, esp copies the value of esp into ebp.

image-20220110225937687

Is this really the case? Step once in the debugger, and the Watch window shows the following

image-20220111000207900

At this point the stack diagram should be

image-20220111000344361

Assembly instruction: building the scope of the function stack frame

The third assembly instruction is sub esp, 0E4h: it subtracts 0E4h from the address in esp. (sub is short for subtract; likewise, add means add.)

image-20220111000446712

The value subtracted from esp here is 0E4h. The trailing h marks a hexadecimal number, so 0E4h is E4 in hexadecimal. To check what number it is, put it in the Watch window, which shows its hexadecimal value, and then switch to the decimal display, which shows 228

image-20220111001133085

image-20220111001200430

So the instruction simply subtracts 0E4h from esp. Has esp changed as a result? Step once and check the Watch window

image-20220111001414197

The value of esp is now 0x008ffac0. The address in esp has become smaller: it has moved up in the diagram and no longer points to its previous location, but to a position above it.

image-20220111002520582

Have you noticed that, after entering main, the new stack-top pointer esp and the stack-bottom pointer ebp delimit a new region, and that esp and ebp no longer maintain the caller's frame? That's right: this new region is the stack frame reserved in advance for main, and the operand of sub is the number of bytes reserved for it.

The schematic diagram of the stack area can be understood as the following picture

image-20220111002230000

Assembly instruction: pushing the three non-volatile registers

The ebx, esi, and edi here are the base, source index, and destination index registers mentioned earlier; together they are referred to here as non-volatile registers. This comes from the calling convention that C compilers follow on x86: a function that uses these three registers must preserve the caller's values, so it pushes them on entry to save the data from before the call and keeps the saved copies on the stack for the duration of the call.

They are pushed onto the stack here. Don't forget that with every push, the stack-top pointer esp changes.

image-20220111003749180

The push process in detail:

Observe the values of esp and ebx in the Watch window

image-20220111005406787

How does esp change when ebx is pushed onto the stack? As expected, the value of esp decreases and the stack top moves up

image-20220111005510565

Open the Memory window and you will find that the address in esp holds the value of ebx, 0x007e5000

image-20220111005640832

Similarly, when esi is pushed next, esp changes as follows

image-20220111005902323

image-20220111010033722

When edi is pushed, esp changes as follows

image-20220111010112098

image-20220111010131509

To sum up, the stack-top pointer esp has gone from 008ffac0 to 008ffab4: the address keeps decreasing and the stack top keeps moving up.

The schematic diagram of the current stack area can be understood as

image-20220111004521750

Assembly instruction: loading the effective stack frame space

At this point, to make things more intuitive, we turn on the display of symbol names in the disassembly.

The seventh instruction is lea, whose full name is load effective address. As the name implies, it computes an address and loads that address (not the data stored there) into a register. From here on, the program starts to set up the usable stack frame area of the current function. Let's see how it goes.

image-20220111024656812

lea edi, [ebp-0E4h]: does the 0E4h here look familiar? That's right, it is the size reserved earlier for the stack frame of main. The instruction stores the address ebp-0E4h in edi (note that edi is the destination index register, not the stack-bottom pointer). From the stack diagram we can see that ebp-0E4h is the lowest address of the space reserved for main, so edi now marks where that space begins.

image-20220111015833316

How can we verify this? Recall the address of esp before the three non-volatile registers were pushed onto the stack

image-20220111001414197

Now open the Watch window and compare ebp-0E4h with edi: the answer is obvious! ebp-0E4h equals the value in edi, and it is also the address esp held before the three non-volatile registers were pushed onto the stack (008ffac0). The current esp (008ffab4) is 12 bytes lower and points at the slot of the third push, the saved edi.

image-20220115204906978

mov ecx, 39h and mov eax, 0CCCCCCCCh place 39h and 0CCCCCCCCh in the ecx and eax registers respectively.

Now the next instruction: rep stos dword ptr es:[edi]. This one is interesting, because it is what finally forms the usable space of the function's stack frame. In words: starting at the address held in edi, store the content of eax, advance edi, and repeat ecx times, which ends exactly at the stack-bottom pointer ebp. Note that dword means double word: a word is 2 bytes, so a double word is 4 bytes.

What is the specific process? Starting from ebp-0E4h (the address in edi), the fill proceeds toward higher addresses, 4 bytes at a time. The value written is the content of eax (0CCCCCCCCh), and it is written 39h (57) times. Since 57 * 4 = 228 = 0E4h bytes, the fill stops at the stack-bottom pointer ebp.

Applied to the current values, the program fills memory from 008ffac0 toward higher addresses until it reaches the stack-bottom pointer ebp; opening the Memory window at this point confirms it.

Filling toward higher addresses starts at 008ffac0

image-20220111113038428

It ends at the stack-bottom pointer, 008ffba4

image-20220111113353635

You may wonder what this cccccccc means. The fill value can differ from one compiler to another. When we write a program and print a variable that was never given an initial value, the output is the garbled text "烫烫烫" ("hot hot hot"). That is not the computer telling you its temperature :) It is simply the 0CCCCCCCCh bytes placed in memory being interpreted as characters.

In summary, the schematic diagram of the stack area can be as follows

image-20220111031852347

By this point the program has gone through five steps, and the stack frame for main is complete. The region between esp and ebp is the function's scope, and the space filled with 0CCCCCCCCh is where local variables will be stored. Next, the program starts executing the function's actual code.

Assembly instructions: generate function local variables

After the steps above, the function's stack frame area is ready, and the program now starts executing the function's actual code. According to the source code written earlier, local variables are created as soon as the function body starts. How are local variables created in the stack frame area?

image-20220111161949940

First, look at the assembly instructions: does the syntax look familiar? They mean: store 0Ah, 14h, and 0 at ebp - 8, ebp - 14h, and ebp - 20h, in that order.

image-20220111135625127

ebp - 8, ebp - 14h, and ebp - 20h are addresses at successively lower offsets from the stack-bottom pointer. Each one is the slot given to 0Ah, 14h, and 0 respectively, and 0Ah, 14h, and 0 are the hexadecimal forms of 10, 20, and 0.

Now let's verify this. First, note the value of the stack-bottom pointer ebp, so we can check whether the variables are stored at lower addresses;

image-20220111160123356

Step through the statement and the answer is obvious: 8 bytes below the stack-bottom pointer 008ffba4, the stored value is 0ah. The local variable a = 10 has now been created

image-20220111160512467

The next step creates the local variable b = 20 in the same way, and then c.

image-20220111161648219

From observing how local variables are created in the stack frame, we can see that once the frame space is ready, local variables are placed one after another from high addresses toward low addresses. If a variable is not given an initial value, the program still sets aside a region of the frame as that variable's address.

The schematic diagram of the stack area at this time can be as follows

image-20220111165749903

Summary of this article

In this article, we briefly introduced the basic process of building a function's stack frame, using the main function as the example. We learned that:

  • The stack frame of a function is a piece of memory jointly maintained by the stack-top pointer esp and the stack-bottom pointer ebp;
  • When a function starts to build its stack frame, it first pushes the previous function's stack-bottom pointer ebp;
  • As the stack frame grows, every push of new content or of a register moves the stack-top pointer esp upward (toward lower addresses);
  • When a location inside the frame has to be determined, it is expressed as an offset from the stack-bottom pointer ebp toward lower addresses;
  • After pushing the 3 non-volatile registers, the program fills the region that extends from the stack-bottom pointer ebp toward lower addresses; this region is the function's scope. Within it, the program creates the local variables at offsets above ebp in the diagram (that is, at lower addresses).

In the next article, we will introduce the calling and returning process of functions, and make a summary of the questions we raised at the beginning.

If you have any questions, please ask in the comment area.

Conclusion

This concludes Part 1 on the creation and destruction of function stack frames. A second part on the same topic follows, with more content and plenty of substance. If you found this article useful, don't forget to like, share, and follow!

Writing takes effort, and your attention and appreciation are the author's greatest encouragement! The author will keep sharing knowledge about learning C/C++ and practical Python. Your support motivates the author to publish more high-quality articles, so we can keep leveling up together!

View the next article

The next article is expected to be published within this week. This series is published first on the WeChat public account 01 Programming Cabin. Follow it to get new articles as soon as they appear.


Origin blog.csdn.net/weixin_43654363/article/details/123858937