Analysis of C language procedure calls from the perspective of assembly

The original text comes from [Tingyun Technology Blog]: http://blog.tingyun.com/web/article/detail/1132

Definition of basic terms

1. The system stack is a memory area located near the high-address end of the process address space.

2. When data is pushed onto the stack, the stack grows from top to bottom, that is, toward lower addresses. This memory area holds the local variables of functions and is also used to pass parameters when functions are called.

3. When procedure calls are nested, the stack keeps growing downward, and each call adds a new activation record (stack frame) that holds all the data the procedure needs.

4. The activation record of the currently executing procedure is delimited by the frame pointer, which marks its top, and the stack pointer, which marks its bottom.

5. While a procedure is executing, the top of its activation record is fixed, but the bottom can be extended downward when more memory is needed.

Analyzing stack frames

The second stack frame in the figure above can be analyzed as follows:

1. At the top of the stack frame are the return address and the saved old frame pointer. The return address specifies where control flow continues when the current procedure ends, and the saved old frame pointer is the frame pointer of the previous activation record. Its value is used to restore the caller's stack frame after the current procedure ends, which also matters when a debugger produces a call stack backtrace.

2. The main part of the activation record is the memory allocated for the procedure's local variables. In C, such variables are also called automatic variables.

3. When a function is called, the values passed to it as parameters are stored at the bottom of the stack, that is, at the low-address end of the caller's activation record.

4. IA-32, like many common architectures, provides two stack manipulation instructions:

  • The push instruction places a value on the stack and subtracts the number of bytes the value occupies from the stack pointer esp. The end of the stack moves down to a lower address.

  • The pop instruction removes a value from the stack and increments the stack pointer esp by the same amount, so the end of the stack moves up.

5. The architecture provides two further instructions for calling and leaving functions (returning to the calling procedure), which also operate on the stack automatically:

  • The call instruction pushes the current value of the instruction pointer (the address of the instruction after the call) onto the stack and jumps to the start address of the called function. In AT&T assembly, call foo (where foo is a label) behaves like the pseudo-instructions pushl %eip; movl $foo, %eip. These are conceptual only, because eip cannot be named as an operand on IA-32.

  • The ret instruction pops the return address from the stack and jumps to that address. A procedure must execute ret as its last instruction, and at that point the return address placed on the stack by call must be at the bottom of the stack (strictly speaking, it sits just below the previous activation record, at the top of the current one). In AT&T assembly, ret behaves like the pseudo-instruction popl %eip.

A procedure call consists of two steps

1. Build the parameter list on the stack. Arguments are pushed from right to left, so the first argument passed to the called function is pushed last. This lets C pass a variable number of arguments, which the callee can then read off the stack one by one.

2. Execute the call instruction, which pushes the current value of the instruction pointer (the address of the instruction following the call) onto the stack and transfers control to the called function. The called procedure is responsible for managing the frame pointer ebp and performs the following steps:

  • Push the previous frame pointer onto the stack, which moves the stack pointer down.

  • Copy the current value of the stack pointer into the frame pointer, marking the start of the stack area of the currently executing function.

  • Execute the code of the current function.

  • At the end of the function, once the space for local variables has been released, the saved old frame pointer is at the bottom of the stack. Pop it into the frame pointer register (ebp) so that ebp again points to the start of the previous function's stack area. The return address that call pushed for the current function is now at the bottom of the stack.

  • Execute ret to pop the return address from the stack. The CPU jumps to the return address, and control returns to the calling function.

Specific C language example analysis

At first glance, this approach may seem confusing, so let's start with a simple C example:

GCC emits IA-32 assembly in AT&T notation by default, so that is the notation used here.

AT&T assembly syntax can be summed up in the following five rules, which are all that is needed here.

1. Registers are referenced by prefixing the name with a percent sign (%). For example, the eax register is written %eax in assembly code. (In inline assembly within C code, two percent signs must be written to produce a single percent sign in the output passed to the assembler.)

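2. The source operand is always specified before the destination operand. For example, mov a, b copies the contents of register a into register b.

3. The operand size is given by a suffix on the instruction: b stands for byte, w for word, and l for long. On IA-32, moving a long from the eax register to the ebx register is written movl %eax, %ebx.

4. Indirect memory references (pointer dereferences) are written by enclosing the register in parentheses. For example, movl (%eax), %ebx copies the long stored at the memory address held in eax into the ebx register.

5. offset(register) combines a register value with an offset, which is added to the register's value. For example, 8(%eax) uses the memory at address eax + 8 as an operand. This notation is mainly used for memory access, for instance to specify offsets from the stack pointer or frame pointer in order to reach particular local variables.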

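Let's analyze the main.s assembly code:

1. Start with the main function. On IA-32, the ebp register serves as the frame pointer (marking the top of the stack frame). pushl %ebp pushes the value of ebp onto the lowest position of the system stack, which moves the stack pointer down by 4 bytes, because a pointer occupies 4 bytes on IA-32 (the l suffix of pushl denotes a long in AT&T assembly).

2. Line 3, movl %esp, %ebp, copies the value of the esp (stack pointer) register into the ebp (frame pointer) register, so the current stack pointer becomes the frame pointer of this function.

3. Line 4, subl $24, %esp, subtracts 0x18 bytes from the stack pointer. The stack pointer moves down, and the stack grows by 0x18 = 24 bytes.

  • This adjusts the stack pointer to reserve space for local variables, which must be placed on the stack. In the C code, a and b are both integer local variables, and each needs 4 bytes of memory.

  • The top 4 bytes of the frame hold the old frame pointer (of the previous activation record), so the compiler places the local variables in the 4-byte slots below it.

  • ebp - 0xC holds the value of local variable a (3), and ebp - 0x8 holds the value of local variable b (4). The order of local variables within the frame is chosen by the compiler and is unrelated to the order in which arguments are passed.

4. Lines 5 and 6, movl $0x3, -0xC(%ebp) and movl $0x4, -0x8(%ebp), store the initial values into the allocated memory (corresponding to the initialization of the local variables in C). For this the compiler uses the processor's pointer dereferencing (indirect addressing). The first instruction tells the processor to reference the memory location at "frame pointer minus 12" and uses mov to write the value 3 to it.

  • The compiler handles the second local variable in the same way. It sits slightly higher in the stack, at ebp - 0x8 (8 bytes below ebp), and receives the value 4.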

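5. Lines 7 and 8 set up the second argument (b), and lines 9 and 10 set up the first argument (a): movl -8(%ebp), %eax; movl %eax, 4(%esp); movl -12(%ebp), %eax; movl %eax, (%esp).

  • The local variables a and b must be used as the arguments of the upcoming call to add. The compiler builds the parameter list by placing the appropriate values at the end of the stack.

  • As mentioned earlier, the first argument is at the lowest position. The stack pointer is used to find the end of the stack.

  • The corresponding memory locations are determined by pointer dereferencing. The values of the two local variables on the stack are each read into the eax register, and the value of eax is then written to the corresponding position in the parameter list. (This is the typical pattern.)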

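6. The figure above shows the state of the stack before and after the call to add(). The call instruction can now be used to call add(). It pushes eip (the instruction pointer register, which at this point holds the return address) onto the stack, and execution continues at the start of the add routine.

  • Following the calling convention, the routine first pushes the previous frame pointer (ebp) onto the stack and then assigns the stack pointer (esp) to the frame pointer (ebp).

  • The procedure's parameters can be located relative to the frame pointer (ebp). The compiler knows that the parameters lie at the end of the caller's activation record, and that two 4-byte values (the return address and the old frame pointer) are stored at the start of the current activation record. The parameters can therefore be accessed by dereferencing ebp+8 and ebp+12.

  • The add instruction performs the addition, with the eax register used as the working register. The result stays in that register so that it can be passed back to the calling function (here main()).

  • To return to the calling function, two operations are needed: (a) pop restores the saved frame pointer from the stack into the ebp register, so the top of the stack frame is again the one set up for main(); (b) ret pops the return address from the stack into the eip (instruction pointer) register, and control transfers to that address.

7. Because main() uses another local variable (ret) to store the return value of add(), the value of the eax register must be copied to ret's location on the stack after the return.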

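Summary

About AT&T assembly

The enter instruction
In AT&T assembly, enter $0, $0 is equivalent to the following instructions:
pushl %ebp # push %ebp onto the stack
movl %esp, %ebp # save %esp in %ebp; these two steps are the standard function prologue
The leave instruction
In AT&T assembly, leave is equivalent to the following instructions:
movl %ebp, %esp
popl %ebp
The call instruction
In AT&T assembly, call foo (where foo is a label) behaves like the following pseudo-instructions:
pushl %eip
movl $foo, %eip
The ret instruction
In AT&T assembly, ret behaves like the following pseudo-instruction:
popl %eip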

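(Personal view) Assembly can be summed up in one sentence: it moves data back and forth between registers, or between registers and memory. In other words, data flows back and forth between memory and registers, and the more frequent that flow, the more complex the program, as in large software such as Office.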

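From the C language perspective:

EBP-xx is usually a local variable.

EBP+xx (from EBP+8 upward) is usually a parameter.

EBP+4 is the return address. It is a critical location: many attacks target it, and antivirus software pays particular attention to it.

Stack space allocated to a C function is not zeroed, so local variables must always be initialized when writing C code.

The form and order of parameter passing, and the way the stack is balanced (which side removes the arguments), are not fixed; they depend on the calling convention.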

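On the difference between registers and memory:

Registers are located inside the CPU. They are fast to access but comparatively expensive.

Memory is comparatively slow but cheap, so it can be made very large.

Registers and memory do not differ in essence: both are containers for storing data, and both have a fixed width.

The eight commonly used general-purpose registers are EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI.

Some common units of measurement in computing are BYTE, WORD, and DWORD: BYTE = 8 bits; WORD = 16 bits; DWORD (double word) = 32 bits.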

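Memory cells are far too numerous to give each one a name, so they are identified by numbers (addresses) instead.

We call a CPU 32-bit or 64-bit. Many books say that a 32-bit computer is so called because its registers are 32 bits wide, but this is not accurate, because many registers are wider than 32 bits.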
