Foreword

In C, C++, Java and similar languages, code executes in units of functions (methods). A C program needs an int main() function as its entry point, and a Java program needs a public static void main(String[] args) method. Starting from that entry point, more and more functions are called to accomplish all kinds of tasks.

This series looks at how a function call works at the CPU level. It involves a fair amount of assembly, so some assembly knowledge helps, but I will also explain every assembly instruction that is used.

This article is based on the x86_64 instruction set. When downloading software for Linux you will often see an x86_64 build, which is the one most people choose. x86_64 is one of the instruction sets that a CPU can implement.

The explanation is based on a deliberately simple piece of C code, because complex code would only make things harder to follow. The code is as follows:

#include <stdio.h>
static int add(int i, int j) {
    int x = 3;
    int y = 4;
    int sum = x + y + i + j;
    return sum;
}

int main() {
    int x = 1;
    int y = 2;
    int z = add(x, y);
    return z;
}

The code has two functions, main() and add(). It is simple enough that anyone with basic experience in any programming language can follow it.

How to compile the code

We have written the source code, but it cannot be run directly because the CPU does not understand source text. The code must first be compiled into machine code, the binary data that the CPU recognizes and can execute.

I use gcc to compile on Ubuntu.

Installing the compiler

The compiler is gcc.

To install gcc, run the following command:

sudo apt-get install gcc

gcc is often installed by default. If you are not sure, run the following command to check:

gcc --version

If a version number is printed, gcc is installed.

How to compile

To compile with gcc, run the gcc command:

gcc main.c -g -o main

Here main.c is the path to the source file, which can be relative or absolute. The -g option adds debugging information and -o main names the output file, so this command produces an executable called main.

The file main (marked by the red box in the screenshot) is the executable generated by the command. We will run it later.

How to debug code

We have compiled the code, but we do not yet have a tool to run and debug it step by step, so we also need a debugger. I will use plain gdb here. Many other tools would also work, such as nemiver, which has a graphical interface, or cgdb, an enhanced front end for gdb.

Installing the debugger

Installing the debugger is just as simple. Run the following command:

sudo apt-get install gdb

gdb is often installed by default as well. Run the following command to check:

gdb --version

If a version number is printed, gdb is installed.

How to debug

Debugging involves quite a few commands. I list the gdb commands we need here first, with an explanation of each. If you come across an unfamiliar command while reading, you can look it up here.

gdb main: Starts gdb on the main executable.
start: Loading the executable does not yet run it. This command starts execution and pauses at the beginning of main().
next: Single-step execution. Each time it is entered, one line of code is executed. Can be abbreviated as n.
disassemble /rm: Displays the disassembly, so we can see the assembly code generated for our C code. /m shows the source code together with the assembly code, and /r also shows the raw instruction bytes in hexadecimal. disassemble can be abbreviated as disas.
info register: Shows the current values of the registers. info can be abbreviated as i and register as r.
step: Also single-step execution, but when a function call is encountered it steps into the function, whereas next steps over it. Can be abbreviated as s.
list: Shows the source code. Can be abbreviated as l.
x /nfu addr: This command is more complicated. It examines the memory starting at address addr.
- n is the number of memory units to display, given as a decimal number
- f is the display format and can take the following values
  - x: Display in hexadecimal
  - d: Display in decimal
  - u: Display as unsigned decimal integers
  - o: Display in octal
  - t: Display in binary
  - .......
- u is the size of one memory unit and can take the following values
  - b: One byte
  - h: Two bytes
  - w: Four bytes
  - g: Eight bytes
p var: Prints the value of a variable, where var is the name of the variable. To see the address of the variable, use p &var.
q: Exits the debugger.

Method call process analysis

I will analyze the code step by step with gdb, focusing on the assembly code. Along the way I will explain what each assembly instruction does, so that you do not need to look it up yourself.

First, we run gdb main to enter the debugger, and then run start to begin debugging:

At this point we can start debugging with the commands mentioned earlier.

Local variables

First, let's take a look at the assembly code of main(). I will explain it instruction by instruction:

The blue box marks the offset of each instruction from the start of the function.

The first line of code is push %rbp, where %rbp is a register. There are many registers in the CPU:

The picture is from Section 3.4 of "In-depth Understanding of Computer Systems", Third Edition.

As the figure shows, the CPU has 16 general-purpose registers, each of which can hold up to 64 bits of data. Each register has several names, and the name we use selects which part of the register is accessed. For example, %rax refers to all 64 bits of the first register, %eax to its lower 32 bits, and %ax to its lower 16 bits. An access can be a read or a write.

What does the push instruction do? It takes one operand and pushes the value of that operand onto the stack. The stack is a contiguous region of memory that grows downward: data is pushed starting at high addresses and proceeding toward low addresses.

So how does the CPU know where the stack is? Through the %rsp register, which holds the address of the top of the stack. %rsp is 64 bits wide and addresses on our 64-bit system are 64 bits, so it holds exactly one memory address. To store data on the stack, the CPU first changes the address in %rsp: because the stack grows toward lower addresses, it subtracts the size of the data, and then it stores the data in the newly claimed memory.

For example, to store the two-byte value 0x0123 on the stack, first set %rsp = %rsp - 2. The data occupies two bytes and one memory unit is one byte, so the pointer moves by two memory units. Then 0x0123 is written into the two newly claimed memory units.

The first instruction, push %rbp:

From the above, we know that this instruction pushes the value in %rbp onto the stack. In each stack frame, %rbp holds the address of the bottom (base) of the frame, and each function call corresponds to one stack frame on the stack. So this instruction saves the base address of the previous stack frame (the caller's) on the stack. Does main() have a caller? Yes: main() is called by startup code that the toolchain adds automatically when the program is built (on Linux with glibc, the entry point _start calls __libc_start_main(), which in turn calls main()).

The second instruction, mov %rsp,%rbp:

The mov instruction takes two operands, a source and a destination, written mov src, dest. Here it copies the value in %rsp into %rbp; in a high-level language this would be the assignment %rbp = %rsp. Note that the operand order depends on the assembly syntax: gdb prints AT&T syntax by default, with the source first, while Intel syntax, which most 8086 textbooks use, writes mov dest, src. For details, please refer to the instruction set manual.

As we just saw, %rsp holds the address of the top of the stack, and this instruction copies that address into %rbp. This is why the value in %rbp was pushed onto the stack first: otherwise the original value would be overwritten and lost. With the old value saved on the stack, %rbp can be restored from the stack after main() has finished.

We use i r to view the registers after this instruction has executed:

The left column is the register name and the middle column is the value stored in the register, displayed in hexadecimal. The right column does not mean that the register holds two values: it is the same value shown in another format (decimal for most registers). We generally only look at the middle column.

After this instruction executes, %rbp and %rsp both hold 0x7fffffffde50, which means the top of the stack is at address 0x7fffffffde50.

The third instruction, sub $0x10,%rsp:

sub is the subtraction instruction, the counterpart of add, and it also takes two operands. In a high-level language this instruction would read %rsp = %rsp - 0x10: the value in %rsp is reduced by 16. As mentioned, the stack grows from high addresses toward low addresses and %rsp holds the address of the top of the stack, so to use more stack space the address in %rsp must be made smaller. This instruction therefore reserves 16 bytes of stack memory for later use.

From the value after the previous instruction, we can calculate that %rsp should now be 0x7fffffffde50 - 0x10 = 0x7fffffffde40. Let's verify with i r:

We first print the assembly code. The small arrow on the left shows that execution has stopped at line <+8>. The blue arrow and the green arrow point to %rbp and %rsp respectively, and their values are exactly what we calculated.

The fourth instruction, movl $0x1,-0xc(%rbp):

This instruction looks very similar to mov, and in fact movl is mov with a size suffix: the l means that a double word (4 bytes) is moved. Here the destination is memory, so exactly 4 bytes are written. When the destination of movl is a register, the lower 32 bits are written and the upper 32 bits of the 64-bit register are set to 0. The suffixed variants are:

instruction	description
movb	Move a byte (1 byte)
movw	Move a word (2 bytes)
movl	Move a double word (4 bytes); with a register destination, the upper 32 bits are set to 0
movq	Move a quad word (8 bytes)
movabsq	Move an absolute quad word: loads a full 64-bit immediate into a register (movq accepts only a 32-bit immediate, which is sign-extended)

We said above that %rbp holds an address, but here the register is wrapped in parentheses. The parentheses have a special meaning: instead of using the value of the register directly, the CPU treats that value as an address and accesses the memory at that address. Anyone who has learned C will recognize this as a pointer dereference: if %rbp is viewed as a pointer addr, then (%rbp) is *addr.

So what does the number in front of the parentheses do? Rewriting the operand makes it clear:

-0xc(%rbp) => (%rbp + -0xc)

That is, the displacement is first added to the address in the register (here, 0xc is subtracted) to obtain a new address, and then the memory at that address is accessed. From the register values shown after the third instruction, %rbp holds 0x7fffffffde50, so the new address is 0x7fffffffde50 - 0xc = 0x7fffffffde44. The whole instruction therefore stores 0x00000001 in the 4 memory units starting at the new address.

We can examine the memory at 0x7fffffffde44 to verify this:

I printed the memory just above and below that address as well, for easy comparison.

Pay attention to the red box, which marks the memory units we just wrote. One thing needs special attention here: x86-64 is a little-endian architecture, so the bytes of a multi-byte value are stored with the low-order byte at the lowest address and the high-order bytes at successively higher addresses. For 0x00000001, the byte 0x01 is therefore stored at 0x7fffffffde44 and the three 0x00 bytes at 0x7fffffffde45 through 0x7fffffffde47. This byte order is a property of the CPU and is independent of the direction in which the stack grows.

So this instruction corresponds to the assignment statement int x = 1; in the C code.

The fifth instruction, movl $0x2,-0x8(%rbp):

This instruction works the same way as the fourth. The new address is 0x7fffffffde50 - 0x8 = 0x7fffffffde48.

We can see that the value at this address is 0x00000002.

From the fourth and fifth instructions we can conclude that the local variables of a function are stored on the stack, because these two instructions correspond to the statements int x = 1; and int y = 2;, and x and y are both local variables.

The sixth to ninth instructions are simple mov operations. They are not analyzed in detail here; you can work them out from what was explained above.

Parameter passing

What needs explaining here is that these four instructions (the sixth to ninth) are used for parameter passing.

The tenth instruction is callq 0x5555555545fa <add>, whose job is to call a function. It is explained in detail later.

From the C code we know that main() calls add(int i, int j), which takes two parameters. So when add() is called, how are the arguments passed to it?

Look at the first two of these instructions and their source operands: they are exactly the locations where the local variables are stored. Our code does pass the local variables to add(), so here the value of y is loaded into %edx and the value of x into %eax.

The next two instructions simply copy the values of %edx and %eax into %esi and %edi. Why copy them into other registers? Because of the calling convention (the System V AMD64 ABI used on Linux): the first six integer or pointer arguments are passed in the registers rdi, rsi, rdx, rcx, r8 and r9, in that order. There are only two parameters here, so they are passed in rdi and rsi; since they are 32-bit ints, the lower halves %edi and %esi are used.

What if there are more than 6 parameters? Then the excess is passed on the stack. Note that only the excess goes on the stack: the first six parameters are still passed in registers.

Function call

After the parameters have been stored in registers or on the stack, the function call can be made.

The assembly instruction that follows is callq 0x5555555545fa <add>.

This is a new instruction, callq, which takes one operand: a memory address.

First, how does the CPU execute instructions one after another? Besides the 16 registers shown in the figure above, the CPU has some special-purpose registers. One of them is %rip (commonly called the PC register, or program counter), which holds the address of the next instruction.

How does this register come to hold the address of the next instruction? Let us look at a more detailed listing produced by disas /r, which also prints the bytes of each instruction and therefore shows how many bytes it occupies.

When the CPU fetches an instruction, it does not execute it immediately. It first adds the length of the fetched instruction to the address in the PC register: address of the next instruction = current address in the PC register + number of bytes in the fetched instruction. Only after the PC register has been updated does the CPU execute the fetched instruction.

For example:

Suppose the PC register currently holds 0x0000555555554642. The CPU goes to this address and fetches the instruction mov -0x8(%rbp),%edx, which the screenshot shows to be 3 bytes long. After fetching it, the CPU first updates the PC register to 0x0000555555554642 + 3 = 0x0000555555554645, so the PC register now holds the address of the next instruction to execute. Then the CPU executes the fetched instruction.

Back to the callq instruction. From the above, we know that after the CPU fetches callq 0x5555555545fa <add>, it first updates the PC register to the address of the instruction that follows the call, and only then executes the call. The callq instruction is at 0x000055555555464c and is 5 bytes long, so the PC register becomes 0x0000555555554651.

What does the CPU do when it executes this instruction? We can break the instruction down:

sub $0x8,%rsp
mov %rip,(%rsp)
mov $0x00005555555545fa,%rip

When the CPU executes the call instruction, it first moves the top-of-stack pointer 8 bytes toward lower addresses, which reserves 8 bytes. It then stores the current value of %rip, which is the return address, in those 8 bytes. Finally it sets %rip to the value of the operand. (These three lines are pseudo-instructions for illustration; %rip cannot really be used as an ordinary mov operand.)

0x00005555555545fa is the address of the first instruction of add(). We can print the assembly of the function to verify:

The red box shows the assembly code of add(). The address of its first instruction is exactly the operand of the call instruction.

Now another question: why save the address in %rip on the stack first? After the called function finishes, execution must return to the calling function and continue there. So the address of the calling function's next instruction is saved on the stack, and when the called function finishes, that address is loaded from the stack back into %rip, so that the remaining instructions of the calling function continue to execute.

Here is a supplement to the earlier analysis of push. The instruction push %rbp can be understood as the following two instructions:

sub $0x8,%rsp
mov %rbp,(%rsp)

First the stack is extended. The size of the extension depends on the width of the operand: %rbp is 64 bits wide, so the stack is extended by 8 bytes (in 32-bit code, push %ebp would extend it by only 4 bytes). Then the contents of %rbp are stored at the new top of the stack.

Subsequent instructions are similar to the previous instructions. You can analyze it yourself.

Function returns

What matters here is how the called function returns to its caller and how the return value is passed back.

The pop %rbp instruction pairs with the first instruction, push %rbp: it pops data from the stack into a register. Notice that add() never lowers %rsp to reserve space, yet its local variables are still stored in stack memory, at negative offsets from %rbp. This works because add() calls no other function, and the System V AMD64 ABI lets such a leaf function use the 128 bytes below %rsp (the red zone) without adjusting %rsp. As a result, %rsp and %rbp hold the same address for the whole body of add().

pop %rbp can be understood as the following instructions:

mov (%rsp),%rbp
add $0x8,%rsp

So what is restored here is the base address of the calling function's stack frame, which push %rbp saved at the start of the function.

The return value of a function is passed back in the %rax register, so the value that follows return is placed in %rax (for an int, in its lower half %eax). If the return value is too large for registers, such as a large struct, it is written to memory instead, and the address of that memory is returned in %rax.

Finally, let's look at the retq instruction, which can be broken down as follows:

mov (%rsp),%rip
add $0x8,%rsp

This instruction loads the return address previously saved on the stack into the PC register, so that execution returns to the calling function.

Stack structure

We have said a lot above without ever showing the complete stack layout, so let's draw the stack structure:

Extension

Because my main language at the moment is Java, I will compare this with the Java Virtual Machine (JVM), which can itself be regarded as a virtual computer.

The JVM also has a PC register, which likewise holds the address of the next instruction, exactly like %rip. So once you understand how the CPU steps through instructions, the role of the PC register in the JVM is easy to understand.

So what is the difference between a real computer and a JVM?

From the above, we know that when a real computer executes a function, arithmetic and logical operations take their source data from registers and store their results in registers. This is a register-based execution model.

The JVM is not register-based but stack-based. Anyone who has read "In-depth Understanding of the Java Virtual Machine" knows that each JVM thread has an area called the virtual machine stack. It serves the same purpose as the stack described above: when a method is called, a stack frame is created for it, and the frame stores the information related to that call. However, the JVM does not fetch and store the operands of arithmetic and logical operations through registers. It uses a structure inside the stack frame called the operand stack, and all data must pass through the operand stack to be operated on.

In the end, however the upper layers change, the underlying principles stay the same: the packaging differs but the substance does not. That is why mastering the underlying principles is so helpful when learning new technologies. It may not pay off in the short term, but in the long term it certainly will, so it is still worth learning.

References

[1] Wang Shuang. "Assembly Language". Tsinghua University Press.

[2] Randal E. Bryant, David R. O'Hallaron. "Computer Systems: A Programmer's Perspective", 3rd edition (Chinese edition: "In-depth Understanding of Computer Systems"). China Machine Press.

Analyze the method calling process from the CPU level