Scenario Analysis of the Go Scheduler Source Code, Part 7: The Function Call Process

The following content is reproduced from  https://mp.weixin.qq.com/s/3RUjui-q6bgRnUW7TgOjmA

Awa love to write original programs Zhang  source Travels  2019-04-22

In the previous sections we covered the basics of CPU registers, memory, assembly instructions, and the stack. To consolidate that knowledge, in this section we apply it all together and look at how functions are executed and called.

The issues we need to focus on in this section are:

  • How does the CPU jump from the caller to the called function?

  • How are the parameters passed from the caller to the called function?

  • How is the memory occupied by function local variables allocated on the stack?

  • How is the return value returned from the called function to the caller?

  • What cleanup work needs to be done after the function is executed?

Once we have answered these questions, we will have a general picture of how a computer executes a program, which is essential for understanding goroutine scheduling.

Compared with Go, C is closer to the hardware, and its compiled assembly code is simpler and more intuitive, which makes it easier to grasp the basic principles of function calls. So let's first look at how C function calls are implemented at the assembly instruction level, and then analyze the Go function call process on that basis.

C language function call process

Let's start the analysis with a simple example program.

#include <stdio.h>

// sum the parameters a and b
int sum(int a, int b)
{
        int s = a + b;

        return s;
}

// main function: program entry
int main(int argc, char *argv[])
{
        int n = sum(1, 2); // call the sum function to sum

        printf("n: %d\n", n); //output the value of n on the screen

        return 0;
}

Compile this program with gcc to get an executable named call, then debug it with gdb. In gdb, disass main disassembles the main function and shows that its first instruction is at address 0x0000000000400540. We then use b *0x0000000000400540 to set a breakpoint at that address and r to run the program:

bobo@ubuntu:~/study/c$ gdb ./call
(gdb) disass main
Dump of assembler code for function main:
   0x0000000000400540 <+0>:   push   %rbp
   0x0000000000400541 <+1>:   mov    %rsp,%rbp
   0x0000000000400544 <+4>:   sub    $0x20,%rsp
   0x0000000000400548 <+8>:   mov    %edi,-0x14(%rbp)
   0x000000000040054b <+11>:  mov    %rsi,-0x20(%rbp)
   0x000000000040054f <+15>:  mov    $0x2,%esi
   0x0000000000400554 <+20>:  mov    $0x1,%edi
   0x0000000000400559 <+25>:  callq  0x400526 <sum>
   0x000000000040055e <+30>:  mov    %eax,-0x4(%rbp)
   0x0000000000400561 <+33>:  mov    -0x4(%rbp),%eax
   0x0000000000400564 <+36>:  mov    %eax,%esi
   0x0000000000400566 <+38>:  mov    $0x400604,%edi
   0x000000000040056b <+43>:  mov    $0x0,%eax
   0x0000000000400570 <+48>:  callq  0x400400 <printf@plt>
   0x0000000000400575 <+53>:  mov    $0x0,%eax
   0x000000000040057a <+58>:  leaveq
   0x000000000040057b <+59>:  retq
End of assembler dump.
(gdb) b *0x0000000000400540
Breakpoint 1 at 0x400540
(gdb) r
Starting program: /home/bobo/study/c/call
Breakpoint 1, 0x0000000000400540 in main ()

The program stops at our breakpoint, which is the first instruction of the main function. Disassemble main again and look at the first three instructions:

(gdb) disass
Dump of assembler code for function main:
=> 0x0000000000400540 <+0>:   push   %rbp
   0x0000000000400541 <+1>:   mov    %rsp,%rbp
   0x0000000000400544 <+4>:   sub    $0x20,%rsp
   ......

These 3 instructions are generally called the function prologue. Almost every function starts with one. Its main purpose is to save the caller's rbp register and reserve stack space for the current function. We will go through these 3 instructions in detail later. First, let's explain the format of the disassembly printed by gdb. Each line consists of three parts:

  • Instruction address

  • The offset of the instruction relative to the start address of the current function in bytes

  • The instruction itself

For example, the first line, 0x0000000000400540 <+0>: push %rbp, means that the first instruction of main, push %rbp, is located at address 0x0000000000400540 in memory, and its offset is 0 (because it is the first instruction of main). The parts of this line are shown in the following figure:

[Figure: one line of gdb disassembly split into instruction address, offset, and instruction]

Note that the instruction address and the offset in the gdb disassembly output are added by gdb only to make the code easier to read. What is stored in memory and executed by the CPU is only the instruction part in the figure above.

Note the => symbol to the left of the first line in the disassembly above. It marks the next instruction the CPU will execute, which means the current value of the rip register is 0x0000000000400540: the previous instruction has finished, but this one has not yet started. Use i r rbp rsp rip to check the values of the rbp, rsp, and rip registers:

(gdb) i r rbp rsp rip
rbp            0x400580            0x400580 <__libc_csu_init>
rsp            0x7fffffffe518      0x7fffffffe518
rip            0x400540            0x400540 <main>

Based on these register values, the state of the function call stack and of rbp, rsp, and rip at this moment, and the relationship between them, is shown in the following figure:

[Figure: function call stack and the rbp, rsp, and rip registers on entry to main]

Because rbp, rsp, and rip hold addresses, each of them acts as a pointer. In the figure above, rip points to the first instruction of main and rsp points to the top of the current call stack. The rbp register does not yet point to any stack location or instruction we care about, so the figure shows only its value rather than what it points to.

To understand the execution flow more clearly, we now play the role of the CPU and step through main from its first instruction until the function completes.

Start with the first instruction:

0x0000000000400540 <+0>:push %rbp # Save the value of the caller's rbp register

This instruction saves the value of the stack base register rbp in the stack frame of main. main needs rbp to hold its own stack base address, but the caller stored its own stack base address in the same register before calling main. So main must first save the value in this register and then restore it when main returns. If it were not restored, the caller's code would resume after main returns with rbp still pointing into main's stack frame, when it should point into the caller's own.

Before this instruction, the code is still using the caller's stack frame. After it executes, the stack frame of main comes into use; for now, that frame holds only the caller's rbp value. The state of the stack and registers before the next instruction executes is shown in the figure below. The instruction marked in red in the figure is the one that has just been executed. The values of rsp and rip have both changed and point to new locations: rsp points to the start of main's stack frame, and rip points to the second instruction of main.

[Figure: stack and register state after push %rbp]

In the section on assembly instructions, we said that executing push modifies the rsp register but not rip. So why has rip changed here? The CPU updates it automatically. The CPU knows how many bytes each instruction it executes occupies. The push %rbp instruction here is only 1 byte long, so when the CPU starts executing it, rip is increased by 1. rip was 0x400540 before this instruction and becomes 0x400541 afterwards, which is the address of the second instruction of main.
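
The way rip advances can be checked with a few lines of Go. This is only a sketch: nextRIP is a hypothetical helper, and the instruction lengths (1, 3, and 4 bytes) are inferred from the gaps between consecutive addresses in the gdb listing above, not read from the machine code.

```go
package main

import "fmt"

// nextRIP returns the address of the instruction that follows one located at
// addr and encoded in length bytes. It models how rip advances for an
// instruction that does not jump. (Hypothetical helper for illustration.)
func nextRIP(addr, length uint64) uint64 {
	return addr + length
}

func main() {
	// Address of the first instruction of main, taken from the gdb listing.
	rip := uint64(0x400540)

	// Assumed lengths of push %rbp, mov %rsp,%rbp and sub $0x20,%rsp,
	// inferred from the offsets <+0>, <+1>, <+4> and <+8>.
	for _, length := range []uint64{1, 3, 4} {
		rip = nextRIP(rip, length)
		fmt.Printf("rip = %#x\n", rip)
	}
}
```

Running it prints 0x400541, 0x400544, and 0x400548, the addresses of the second, third, and fourth instructions of main in the listing.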

Then execute the second instruction,

0x0000000000400541 <+1>:mov %rsp,%rbp # Adjust the rbp register to point to the starting position of the main function stack frame

This instruction copies the value of rsp into rbp, making rbp point to the start of main's stack frame. After it executes, rsp and rbp hold the same value, and both point to the start of main's stack frame, as shown in the figure below:

[Figure: stack and register state after mov %rsp,%rbp]

Then execute the third instruction,

0x0000000000400544 <+4>:sub $0x20,%rsp # Adjust the value of the rsp register to reserve stack space for local and temporary variables

This instruction subtracts 32 (0x20 in hexadecimal) from rsp, making it point to a lower address in the stack space. It looks like a simple change to the value of rsp, but in effect it reserves 32 (0x20) bytes of stack space for the local and temporary variables of main. Why "reserve" rather than "allocate"? Because the stack itself is allocated automatically by the operating system: when the program starts, the operating system gives it a large block of memory to use as the function call stack. How much of that stack memory the program is using is determined by the stack top register rsp.
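
To make the effect of the three prologue instructions concrete, the following Go sketch replays them on a toy model. The machine type and its methods are hypothetical names invented for this illustration; the starting values of rsp and rbp are the ones printed by i r rbp rsp rip above.

```go
package main

import "fmt"

// machine is a hypothetical toy model: the two stack registers plus a sparse
// stack memory keyed by address. It only mirrors the walkthrough in the text.
type machine struct {
	rsp, rbp uint64
	mem      map[uint64]uint64
}

func newMachine(rsp, rbp uint64) *machine {
	return &machine{rsp: rsp, rbp: rbp, mem: map[uint64]uint64{}}
}

// push mimics the push instruction: rsp -= 8, then store v at the new top.
func (m *machine) push(v uint64) {
	m.rsp -= 8
	m.mem[m.rsp] = v
}

// prologue mimics push %rbp; mov %rsp,%rbp; sub $size,%rsp.
func (m *machine) prologue(size uint64) {
	m.push(m.rbp)   // save the caller's rbp
	m.rbp = m.rsp   // rbp now marks the start of the new frame
	m.rsp -= size   // reserve space for local and temporary variables
}

func main() {
	m := newMachine(0x7fffffffe518, 0x400580) // values shown by gdb on entry to main
	m.prologue(0x20)
	fmt.Printf("rbp = %#x, rsp = %#x\n", m.rbp, m.rsp)
	fmt.Printf("saved caller rbp = %#x\n", m.mem[m.rbp])
}
```

With these inputs the model ends with rbp = 0x7fffffffe510 and rsp = 0x7fffffffe4f0, and the caller's rbp value (0x400580) is stored at the address rbp points to.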

After this instruction executes, the stack memory from the location rsp points to up to the location rbp points to makes up the complete stack frame of main. Its size is 40 bytes (8 bytes hold the caller's rbp, and 32 bytes are for main's local and temporary variables), as shown below:

[Figure: complete stack frame of main after sub $0x20,%rsp]

Let's execute the next 4 instructions together:

0x0000000000400548 <+8>:mov   %edi,-0x14(%rbp)
0x000000000040054b <+11>:mov   %rsi,-0x20(%rbp)
0x000000000040054f <+15>:mov $0x2,%esi #The second parameter of the sum function is placed in the esi register
0x0000000000400554 <+20>:mov $0x1,%edi #The first parameter of the sum function is placed in the edi register

The first two instructions store the two parameters received by main into main's stack frame. As you can see, stack memory is accessed as rbp plus an offset. The parameters have to be saved because the caller passed argc (an integer) and argv (a pointer) to main in the edi and rsi registers, and main now needs the same two registers to pass parameters to sum. To avoid overwriting argc and argv, main first saves them on the stack and only then puts 1 and 2, the two parameters for sum, into these registers.

The next two instructions prepare the parameters for sum. The first parameter is placed in the edi register and the second in esi. You may ask how the called function knows that its parameters are in these two registers. This is simply a convention: when calling a function, the caller puts the first parameter in rdi and the second in rsi, and the called function reads them directly from those registers. One more detail: the two parameters are passed to sum in edi and esi rather than rdi and rsi. The reason is that int is 32 bits wide in C here, while rdi and rsi are 64-bit registers; edi and esi are the names for the lower 32 bits of rdi and rsi respectively.
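
The relationship between edi and rdi can be imitated in Go with integer conversions. This is a sketch, not generated code: low32 and writeLow32 are hypothetical helpers, and the starting register value is arbitrary. writeLow32 also reflects a rule of x86-64 that the text does not mention: writing a 32-bit register such as edi clears the upper 32 bits of the corresponding 64-bit register.

```go
package main

import "fmt"

// low32 returns the lower 32 bits of a 64-bit register value, which is what
// the name edi refers to when the full register is rdi.
func low32(reg uint64) uint32 {
	return uint32(reg)
}

// writeLow32 models "mov $v,%edi" on x86-64: the lower 32 bits receive v and
// the upper 32 bits are cleared, so the old register value does not matter.
func writeLow32(old uint64, v uint32) uint64 {
	_ = old // the previous contents are discarded entirely
	return uint64(v)
}

func main() {
	rdi := uint64(0xdeadbeef00000007) // arbitrary example value
	fmt.Printf("edi = %#x\n", low32(rdi))

	rdi = writeLow32(rdi, 1) // mov $0x1,%edi
	fmt.Printf("rdi = %#x\n", rdi)
}
```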

Back to the topic. The figure below shows the state of the stack and registers after these 4 instructions have executed (note that argc occupies the upper 4 bytes of its 8-byte stack unit in the figure; the lower 4 bytes are unused):

[Figure: stack and register state after the parameters of sum are prepared]

After the parameters are ready, execute the call instruction to call the sum function,

0x0000000000400559 <+25>:callq 0x400526 <sum> #Call the sum function

The call instruction is a bit special. When it starts executing, rip already points to the instruction after the call, so the value of rip is 0x40055e. While executing, the call instruction pushes this current value of rip (0x40055e) onto the stack and then sets rip to its operand, here 0x400526, which is the address of the first instruction of sum. As a result, the CPU jumps to sum and continues execution there.
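
The two steps performed by call can be modeled in Go as follows. The cpu type is a hypothetical toy model; the 5-byte length of the call instruction is inferred from the listing (0x40055e - 0x400559), and the starting rsp of 0x7fffffffe4f0 is derived from the rsp value shown by gdb earlier, minus the 8 bytes pushed by push %rbp and the 0x20 bytes reserved by sub.

```go
package main

import "fmt"

// cpu is a hypothetical toy model with just rip, rsp and a sparse stack.
type cpu struct {
	rip, rsp uint64
	stack    map[uint64]uint64
}

// call mimics "call target" for an instruction located at addr and encoded
// in length bytes: push the address of the following instruction, then jump.
func (c *cpu) call(addr, length, target uint64) {
	c.rip = addr + length  // rip already points past the call instruction
	c.rsp -= 8             // make room on the stack
	c.stack[c.rsp] = c.rip // save the return address
	c.rip = target         // jump to the called function
}

func main() {
	c := &cpu{rsp: 0x7fffffffe4f0, stack: map[uint64]uint64{}}
	c.call(0x400559, 5, 0x400526) // callq 0x400526 <sum>
	fmt.Printf("rip = %#x, rsp = %#x, return address = %#x\n",
		c.rip, c.rsp, c.stack[c.rsp])
}
```

After the call, the model has rip = 0x400526, rsp = 0x7fffffffe4e8, and the return address 0x40055e stored at the top of the stack.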

The state of the stack and registers after the call instruction executes is shown in the figure below. rip now points to the first instruction of sum, and 0x40055e, the address of the instruction to execute after sum returns, has been saved in the stack frame of main.

[Figure: stack and register state after callq 0x400526 <sum>]

Now that main has executed the call instruction, the CPU jumps to the sum function and starts executing it:

0x0000000000400526 <+0>:   push   %rbp
0x0000000000400527 <+1>:   mov    %rsp,%rbp
0x000000000040052a <+4>:   mov    %edi,-0x14(%rbp)
0x000000000040052d <+7>:   mov    %esi,-0x18(%rbp)
0x0000000000400530 <+10>:  mov    -0x14(%rbp),%edx
0x0000000000400533 <+13>:  mov    -0x18(%rbp),%eax
0x0000000000400536 <+16>:  add    %edx,%eax
0x0000000000400538 <+18>:  mov    %eax,-0x4(%rbp)
0x000000000040053b <+21>:  mov    -0x4(%rbp),%eax
0x000000000040053e <+24>:  pop    %rbp
0x000000000040053f <+25>:  retq

The first two instructions of the sum function are exactly the same as the first two instructions of the main function.

0x0000000000400526 <+0>:push %rbp # Prologue of sum: save the caller's rbp
0x0000000000400527 <+1>:mov %rsp,%rbp # Prologue of sum: make rbp point to the start of the stack frame

Both functions save the caller's rbp and then set rbp to point to the start of the current function's stack frame. Here, sum saves main's rbp value (0x7fffffffe510) and makes rbp point to the start of its own stack frame (address 0x7fffffffe4e0).

Unlike main, the prologue of sum does not adjust rsp to reserve stack space for local and temporary variables. Does this mean sum does not keep local variables on the stack? No. As the analysis below shows, the local variable s of sum is still stored on the stack. How can sum use stack memory it has not reserved? As mentioned earlier, stack memory does not need to be allocated by application code: the operating system has already allocated it, so it can simply be used. (On AMD64 Linux, the System V ABI additionally guarantees a 128-byte "red zone" below rsp that a function may use without adjusting rsp.) main has to adjust rsp because it uses the call instruction to call sum, and call automatically subtracts 8 from rsp and stores the return address at the stack location rsp then points to. If main did not adjust rsp, call would overwrite a local or temporary variable when saving the return address. After its prologue, sum executes no instruction that implicitly stores data on the stack through rsp, so it has no need to adjust rsp.

The next 4 instructions,

0x000000000040052a <+4>:mov %edi,-0x14(%rbp) # Put the first parameter a into a temporary variable
0x000000000040052d <+7>:mov %esi,-0x18(%rbp) # Put the second parameter b into a temporary variable
0x0000000000400530 <+10>:mov -0x14(%rbp),%edx # read the first one from the temporary variable to the edx register
0x0000000000400533 <+13>:mov -0x18(%rbp),%eax # read the second from the temporary variable to the eax register

These instructions save the parameters passed from main to sum in the current stack frame, addressed as rbp plus an offset, and then read them back into registers. This is redundant, but we did not specify an optimization level when compiling, and gcc performs no optimization by default, so the code looks verbose.

The next few instructions

0x0000000000400536 <+16>:add %edx,%eax # execute a + b and save the result to eax register
0x0000000000400538 <+18>:mov %eax,-0x4(%rbp) # Assign the addition result to the variable s
0x000000000040053b <+21>:mov -0x4(%rbp),%eax # read the value of the s variable to the eax register

The add instruction performs the addition and leaves the result, 3, in the eax register. The second instruction stores the value of eax into the memory of variable s, and the third reads the value of s back into eax. As you can see, the compiler has placed the local variable s at address rbp-0x4.

At this point, the main work of the sum function is done. Before executing the last two instructions, let's look at the state of the registers and stack:

[Figure: stack and register state after the body of sum has executed]

One point in the figure above needs explanation:

  • The two parameters and the return value of sum are of type int, which occupies only 4 bytes in memory. In our diagrams, each stack unit occupies 8 bytes and is aligned on an 8-byte address boundary, which is why they appear as they do in the figure.

Let's continue with the pop %rbp instruction, which performs two operations:

  1. It loads the value at the stack location rsp currently points to into rbp. This restores rbp to the value it had before the first instruction of sum executed, so it once again points to the start of main's stack frame.

  2. It adds 8 to rsp, so that rsp points to the stack unit holding the value 0x40055e. That value was pushed by the call instruction when main called sum, and it is the address of the instruction immediately after that call instruction.

Let's take a look at the schematic diagram:

[Figure: stack and register state after pop %rbp]

Next comes the retq instruction. It pops the value 0x40055e from the stack unit rsp points to into rip and adds 8 to rsp. rip now holds the address of the instruction following the call to sum in main, so execution returns to main and continues there. Note that eax holds 3, the return value of sum. The state now is:

[Figure: stack and register state after retq returns to main]

Continue to execute in the main function

mov %eax,-0x4(%rbp) # Assign the return value of the sum function to the variable n

This instruction stores the value in eax (3) into the memory at rbp-4, which is where the variable n lives, so it assigns the return value of sum to n. The state at this point is:

[Figure: stack and register state after the return value is stored in n]

The next few instructions

0x0000000000400561 <+33>:  mov    -0x4(%rbp),%eax
0x0000000000400564 <+36>:  mov    %eax,%esi
0x0000000000400566 <+38>:  mov    $0x400604,%edi
0x000000000040056b <+43>:  mov    $0x0,%eax
0x0000000000400570 <+48>:  callq  0x400400 <printf@plt>
0x0000000000400575 <+53>:  mov    $0x0,%eax

These instructions first prepare the parameters for printf and then call it. We will not analyze them here, because calling printf works much like calling sum. We let the CPU run through these instructions quickly and pause at the second-to-last instruction of main, leaveq. The stack and register state at this point is as follows:

[Figure: stack and register state before leaveq]

The mov $0x0,%eax instruction just before leaveq puts main's return value, 0, in the eax register, where the function that called main can read it after main returns. Now execute the leaveq instruction:

0x000000000040057a <+58>:leaveq

This instruction is equivalent to the following two instructions:

mov %rbp, %rsp
pop %rbp

The leaveq instruction first copies the value of rbp into rsp, so that rsp points to the stack unit rbp points to, and then pops the value in that unit into rbp. This restores rbp and rsp to the values they had when main was entered. See the figure:

[Figure: stack and register state after leaveq]

Only the retq instruction of main remains. We analyzed retq in detail when going through sum: once it executes, control returns to the function that called main, which continues executing.
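
The epilogue of main can be replayed on a toy model as well. In this Go sketch the cpu type, newMainFrame, and the return address callerRet are hypothetical (the real return address into main's caller does not appear in the text); the stack addresses and the saved rbp value follow from the register values shown earlier.

```go
package main

import "fmt"

// callerRet is a made-up return address into main's caller, used only so the
// model has something to pop. It is not taken from the gdb session.
const callerRet = 0x1000

// cpu is a hypothetical toy model: three registers and a sparse stack.
type cpu struct {
	rip, rsp, rbp uint64
	stack         map[uint64]uint64
}

// pop mimics the pop instruction: read the value at rsp, then rsp += 8.
func (c *cpu) pop() uint64 {
	v := c.stack[c.rsp]
	c.rsp += 8
	return v
}

// leave mimics leaveq: mov %rbp,%rsp followed by pop %rbp.
func (c *cpu) leave() {
	c.rsp = c.rbp
	c.rbp = c.pop()
}

// ret mimics retq: pop the return address into rip.
func (c *cpu) ret() {
	c.rip = c.pop()
}

// newMainFrame builds the state of main just before leaveq executes.
func newMainFrame() *cpu {
	return &cpu{
		rip: 0x40057a,
		rsp: 0x7fffffffe4f0,
		rbp: 0x7fffffffe510,
		stack: map[uint64]uint64{
			0x7fffffffe510: 0x400580,  // caller's rbp, saved by push %rbp
			0x7fffffffe518: callerRet, // return address pushed by the caller's call
		},
	}
}

func main() {
	c := newMainFrame()
	c.leave()
	fmt.Printf("after leaveq: rsp = %#x, rbp = %#x\n", c.rsp, c.rbp)
	c.ret()
	fmt.Printf("after retq:   rsp = %#x, rip = %#x\n", c.rsp, c.rip)
}
```

After leave, rsp and rbp are back at 0x7fffffffe518 and 0x400580, the values gdb showed on entry to main; ret then loads the (assumed) return address into rip and moves rsp up by another 8 bytes.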

Go language function call process

We spent a lot of time on the C function call process, covering parameter passing, the call and ret instructions, and how the return value gets from the called function back to the caller. With this foundation, let's look at the function call process in Go. The principle is the same in both languages; only some details differ. Again we use a simple example.

package main

// sum returns the sum of the squares of a and b
func sum(a, b int) int {
        a2 := a * a
        b2 := b * b
        c := a2 + b2

        return c
}

func main() {
        sum(1, 2)
}

Compile the program with go build. Note the -gcflags "-N -l" option, which turns off compiler optimization and inlining; otherwise the compiler may optimize away the call to the sum function.

bobo@ubuntu:~/study/go$ go build  -gcflags "-N -l" sum.go

After compilation, the binary executable program sum is obtained. First, let's look at the disassembly code of the main function:

Dump of assembler code for function main.main:
  0x000000000044f4e0 <+0>: mov %fs:0xfffffffffffffff8,%rcx # Ignore for now
  0x000000000044f4e9 <+9>: cmp 0x10(%rcx),%rsp # Ignore for now
  0x000000000044f4ed <+13>: jbe 0x44f51d <main.main+61> # Ignore for now
  0x000000000044f4ef <+15>: sub $0x20,%rsp # Reserve 32 bytes of stack space for the main function
  0x000000000044f4f3 <+19>: mov %rbp,0x18(%rsp) # Save the caller's rbp register
  0x000000000044f4f8 <+24>: lea 0x18(%rsp),%rbp # Adjust rbp to point to the start address of the main function stack frame
  0x000000000044f4fd <+29>: movq $0x1,(%rsp) # Place the first parameter of sum (1) on the stack
  0x000000000044f505 <+37>: movq $0x2,0x8(%rsp) # Place the second parameter of sum (2) on the stack
  0x000000000044f50e <+46>: callq 0x44f480 <main.sum> # Call the sum function
  0x000000000044f513 <+51>: mov 0x18(%rsp),%rbp # Restore rbp to the caller's rbp
  0x000000000044f518 <+56>: add $0x20,%rsp # Adjust rsp to point to the stack unit holding the return address
  0x000000000044f51c <+60>: retq # Return to the caller
  0x000000000044f51d <+61>: callq 0x447390 <runtime.morestack_noctxt> # Ignore for now
  0x000000000044f522 <+66>: jmp 0x44f4e0 <main.main> # Ignore for now
End of assembler dump.

The first three and last two instructions of main are inserted by the Go compiler to check for stack overflow; we can ignore them for now. The rest is similar to the C version, with one difference: when a Go function is called, the parameters are placed on the stack (the 7th and 8th instructions do this). The fourth instruction shows that the compiler reserves 32 bytes for main. These hold the caller's saved rbp and the two parameters for the call to sum, each 8 bytes, for a total of 24 bytes. What are the other 8 bytes for? As the sum function below shows, they hold the return value of sum.

Dump of assembler code for function main.sum:
  0x000000000044f480 <+0>: sub $0x20,%rsp # Reserve 32 bytes of stack space for the sum function
  0x000000000044f484 <+4>: mov %rbp,0x18(%rsp) # Save the rbp of the main function
  0x000000000044f489 <+9>: lea 0x18(%rsp),%rbp # Set the rbp of the sum function
  0x000000000044f48e <+14>: movq $0x0,0x38(%rsp) # Initialize the return value to 0
  0x000000000044f497 <+23>: mov 0x28(%rsp),%rax # Read the first parameter a (1) from memory into rax
  0x000000000044f49c <+28>: mov 0x28(%rsp),%rcx # Read the first parameter a (1) from memory into rcx
  0x000000000044f4a1 <+33>: imul %rax,%rcx # Calculate a * a and put the result in rcx
  0x000000000044f4a5 <+37>: mov %rcx,0x10(%rsp) # Assign the value of rcx (a * a) to variable a2
  0x000000000044f4aa <+42>: mov 0x30(%rsp),%rax # Read the second parameter b (2) from memory into rax
  0x000000000044f4af <+47>: mov 0x30(%rsp),%rcx # Read the second parameter b (2) from memory into rcx
  0x000000000044f4b4 <+52>: imul %rax,%rcx # Calculate b * b and put the result in rcx
  0x000000000044f4b8 <+56>: mov %rcx,0x8(%rsp) # Assign the value of rcx (b * b) to variable b2
  0x000000000044f4bd <+61>: mov 0x10(%rsp),%rax # Read a2 from memory into rax
  0x000000000044f4c2 <+66>: add %rcx,%rax # Calculate a2 + b2 and save the result in rax
  0x000000000044f4c5 <+69>: mov %rax,(%rsp) # Assign rax to variable c, c = a2 + b2
  0x000000000044f4c9 <+73>: mov %rax,0x38(%rsp) # Copy the value of rax (a2 + b2) to the return value
  0x000000000044f4ce <+78>: mov 0x18(%rsp),%rbp # Restore the rbp of the main function
  0x000000000044f4d3 <+83>: add $0x20,%rsp # Adjust rsp to point to the stack unit holding the return address
  0x000000000044f4d7 <+87>: retq # Return to the main function
End of assembler dump.

The assembly code of the sum function is fairly intuitive; it is basically a direct translation of the Go source. Note that sum reads its parameters from main's stack frame via the rsp register, and it also writes the return value into main's stack frame via rsp.
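
The offsets used by sum can be translated back into main's frame with a little arithmetic, sketched here in Go. inMainFrame is a hypothetical helper; the two constants come from the listings above (callq pushes an 8-byte return address, and sum then subtracts 0x20 from rsp).

```go
package main

import "fmt"

const (
	retAddrSize  = 8    // bytes pushed by callq for the return address
	sumFrameSize = 0x20 // sub $0x20,%rsp at the start of main.sum
)

// inMainFrame converts an offset from rsp as seen inside main.sum into the
// offset from rsp that main.main used before executing callq.
// (Hypothetical helper for illustration.)
func inMainFrame(sumOffset int) int {
	return sumOffset - sumFrameSize - retAddrSize
}

func main() {
	// Offsets that main.sum uses for a, b and the return value.
	for _, off := range []int{0x28, 0x30, 0x38} {
		fmt.Printf("%#x(%%rsp) in sum is %#x(%%rsp) in main\n", off, inMainFrame(off))
	}
}
```

It reports that 0x28(%rsp), 0x30(%rsp), and 0x38(%rsp) in sum are the slots main addresses as (%rsp), 0x8(%rsp), and 0x10(%rsp): the two parameters main stored before the call, followed by the 8 bytes set aside for the return value.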

The following figure shows the stack and the stack registers at the moment when sum has executed the instruction 0x000000000044f4c9 <+73>: mov %rax,0x38(%rsp) but not yet the next one. Together with the assembly code above, it should help clarify where the parameters, the return value, and the local variables sit on the stack during a function call, and how they relate to each other.

[Figure: stack and stack registers of main.main and main.sum after the return value is written]

Summary

Finally, let's summarize the function call process:

  1. Parameter passing. C/C++ code compiled by gcc generally passes parameters in registers. On AMD64 Linux, the first six integer or pointer parameters of a function call are passed in rdi, rsi, rdx, rcx, r8, and r9 respectively. The Go compiler used here passes parameters on the stack instead (since Go 1.17, the compiler on amd64 passes arguments and results in registers). The parameters are laid out as if the last parameter were pushed first and the first parameter last, so the first parameter ends up at the lowest address. They live in the caller's stack frame, and the called function reads them by adding an offset to rsp;

  2. The call instruction pushes the value of rip at the time it executes (the return address) onto the stack;

  3. gcc accesses local and temporary variables by adding an offset to the rbp, while the go compiler uses the rsp register and an offset to access them;

  4. The ret instruction pops the return address that call pushed from the stack into rip, so that execution returns from the called function to the caller;

  5. gcc returns the return value of a function call in the rax register, while Go, as compiled here, returns it on the stack.

     

 

