x86-64 registers and stack frame (reproduced)

Overview

Any discussion of x86-64 has to mention AMD's comeback. x86-64 is the culmination of the x86 family and inherits its fine tradition of backward compatibility. It was first proposed by AMD under the name AMD64, and precisely because it was backward compatible, AMD staged a successful comeback and Intel had to switch to producing AMD64-compatible CPUs. It is a classic David-and-Goliath story of the IT industry. For continuity of naming, however, we are more used to calling this architecture x86-64.

x86-64 is backward compatible, but more importantly it adds new features. In particular, x86-64 has two operating modes: a 32-bit OS can run in legacy mode, using the CPU as if it were an i386, while a 64-bit OS runs in long mode, whose compatibility sub-mode lets 32-bit applications run unmodified on the 64-bit OS. With something this good on offer, users were bound to buy it.

It is worth mentioning that x86-64 opened a new era for compilers. In the previous era, the transistor count of Intel CPUs grew exponentially in line with Moore's Law and new features arrived one after another, for example the conditional move instruction cmovg, the SSE instructions, and so on. But GCC could only conservatively assume that the target CPU was a 1985 i386. The efficiency of code compiled that way is easy to imagine. GCC does provide many extra optimization options, but they place high demands on application developers, and few can use them well. The arrival of x86-64 gave GCC a great opportunity: on the new x86-64 machines it can drop the conservative assumptions and take advantage of x86-64 features. For example, procedure calls pass arguments in registers rather than on the traditional stack, and conditional move instructions are used where possible in place of control-flow jumps.

About registers

To be clear, this article focuses on the general-purpose registers (referred to below simply as registers). Because they are general-purpose, the hardware does not restrict how they are used; the usage rules or conventions described below are simply the rules that GCC (G++) follows. Since we want to analyze C (C++) programs compiled by GCC, it helps to know these rules.

In computer architecture textbooks, the registers are usually called the register file. It is really just a small storage area on the CPU, but one that is referred to by identifiers rather than by addresses.

In x86-64 all registers are 64 bits wide, and relative to 32-bit x86 their identifiers have changed, for example from %ebp to %rbp. For backward compatibility %ebp can still be used, but it refers to the low 32 bits of %rbp.
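The relationship between %rbp and %ebp can be modelled in C with plain integer truncation. This is only a sketch: the function name low32 is made up for illustration, and the argument merely stands in for the contents of a register.

```c
#include <stdint.h>

/* Hypothetical helper: treat r as the contents of a 64-bit register such as
 * %rbp and return what its 32-bit alias (%ebp) would read, that is, the low
 * 32 bits. */
uint32_t low32(uint64_t r)
{
    return (uint32_t)(r & 0xFFFFFFFFu);
}
```

For example, low32(0x1122334455667788) evaluates to 0x55667788.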

The register changes in x86-64 are not only in width but also in number. The newly added registers are %r8 through %r15; together with the original 8 from x86, that makes 16 registers in total.
As just noted, registers are integrated on the CPU and are far faster to access than memory. With more registers available, GCC can use registers in place of the stack memory it used before, which improves performance considerably.

To put the registers to good use you have to understand their roles, which are tied to function calls. x86-64 has 16 64-bit registers:

%rax, %rbx, %rcx, %rdx, %rsi, %rdi, %rbp, %rsp, %r8, %r9, %r10, %r11, %r12, %r13, %r14, %r15.

Among them:

%rax holds the function return value.
%rsp is the stack pointer register and points to the top of the stack.
%rdi, %rsi, %rdx, %rcx, %r8, %r9 hold function arguments, corresponding in order to the first argument, the second argument, and so on.
%rbx, %rbp, %r12, %r13, %r14, %r15 are used for data storage and follow the callee-saved rule: simply put, a function must save the original value before using one of them and restore it before returning.

%r10, %r11 are used for data storage and follow the caller-saved rule: simply put, a function may use them freely, but if it still needs the value after calling a subroutine it must back the register up before the call, in case the subroutine modifies it.
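A small C sketch of why the two groups matter. The function names are hypothetical, and the register choices in the comments describe what GCC typically does rather than a guarantee; compiling with gcc -O1 -S shows what your compiler actually picks.

```c
/* Hypothetical callee; noinline keeps a real call instruction in the caller. */
__attribute__((noinline))
long twice(long v)
{
    return v * 2;           /* argument arrives in %rdi, result leaves in %rax */
}

long keep_across_call(long a)
{
    /* `a` is needed again after the call. A caller-saved register such as
     * %r10 may be overwritten by twice(), so the compiler usually keeps `a`
     * in a callee-saved register such as %rbx (after saving the old %rbx)
     * or spills it to the stack. */
    long r = twice(a + 1);
    return a + r;
}
```

Here keep_across_call(3) returns 3 + 2 * 4 = 11, whichever register the compiler ends up choosing.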

Stack frame

Stack frame structure

C is a procedural language, and its most important feature is that it breaks a program into procedures (functions): the entry function is main, which then calls the various subroutines. In the corresponding machine code, GCC turns each procedure into a stack frame; simply put, every procedure invocation has its own stack frame. In the typical x86-32 stack frame layout, %ebp points to the start of the frame and %esp points to the top of the stack.
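One way to watch frames being stacked is to compare frame addresses from C. The sketch below relies on __builtin_frame_address, a GCC builtin (Clang supports it too); the function names are made up, and the "lower address" result assumes a downward-growing stack as on x86 and x86-64.

```c
#include <stdint.h>

/* Returns the frame address of this (callee) function. noinline makes sure
 * it really gets a frame of its own. */
__attribute__((noinline))
static uintptr_t callee_frame(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}

/* Returns 1 if the callee's frame sits at a lower address than ours, which
 * is what a downward-growing stack predicts. */
__attribute__((noinline))
int callee_frame_is_lower(void)
{
    uintptr_t mine = (uintptr_t)__builtin_frame_address(0);
    uintptr_t theirs = callee_frame();
    return theirs < mine;
}
```

The comparison after the call also keeps the compiler from turning the call into a tail call, which would reuse the caller's frame.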

Function entry and return

Function entry and return are accomplished by the call and ret instructions. For example:

#include <stdio.h>
#include <stdlib.h>

int foo ( int x )
{
    int array[] = {1,3,5};
    return array[x];
} /* ----- end of function foo ----- */

int main ( int argc, char *argv[] )
{
    int i = 1;
    int j = foo(i);
    fprintf(stdout, "i=%d,j=%d\n", i, j);
    return EXIT_SUCCESS;
} /* ---------- end of function main ---------- */

Invoke gcc from the command line to generate the assembly:

Shell> gcc -S -o test.s test.c

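The instruction call foo on line 40 of main actually does two things, conceptually:

pushq %rip // save the address of the next instruction (the code on line 41) so that execution can continue there when the function returns
jmp foo // jump to function foo

The ret instruction on line 19 of foo is equivalent to:

popq %rip // restore the instruction pointer register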
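The push/jump and pop pairing can be played out on a toy model in C. This is a sketch only: struct toy_cpu and its functions are invented for illustration and model nothing beyond the instruction pointer, the stack pointer and a small stack.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Toy model of the two registers involved; not a real CPU. */
struct toy_cpu {
    uint64_t rip;                 /* address of the current instruction */
    uint64_t rsp;                 /* byte offset of the stack top; grows downward */
    uint64_t stack[STACK_WORDS];
};

/* call target: push the address of the next instruction, then jump. */
void toy_call(struct toy_cpu *c, uint64_t target, uint64_t next_insn)
{
    c->rsp -= 8;
    c->stack[c->rsp / 8] = next_insn;   /* "pushq %rip" */
    c->rip = target;                    /* "jmp target" */
}

/* ret: pop the saved address back into the instruction pointer. */
void toy_ret(struct toy_cpu *c)
{
    c->rip = c->stack[c->rsp / 8];      /* "popq %rip" */
    c->rsp += 8;
}

/* Returns 1 if a call followed by a ret resumes at next_insn with the
 * stack pointer back where it started. */
int call_ret_round_trip(uint64_t target, uint64_t next_insn)
{
    struct toy_cpu c = { .rip = 0, .rsp = STACK_WORDS * 8 };
    uint64_t rsp0 = c.rsp;

    toy_call(&c, target, next_insn);
    if (c.rip != target || c.rsp != rsp0 - 8)
        return 0;
    toy_ret(&c);
    return c.rip == next_insn && c.rsp == rsp0;
}
```

The addresses passed in are arbitrary; what matters is that ret lands on whatever address call saved.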

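Setting up and tearing down a stack frame

Using the same example, let us see how a stack frame is set up and torn down.

As an aside, the lines that start with a dot are directives for the assembler. They do not help in understanding the program and can all be ignored, for example line 31.

The two most important things in a stack frame are the frame pointer %rbp and the stack pointer %rsp (%ebp and %esp in 32-bit code); with these two pointers we can describe a complete stack frame.

Lines 30 to 32 of main show how the previous frame's frame pointer is saved and the current frame pointer is set.
The leave instruction on line 49 is equivalent to:

movq %rbp, %rsp // release the stack space by rolling %rsp back

popq %rbp // restore the previous frame's %rbp

There are many ways to do the same thing; GCC weighs them and makes a choice. It most likely picked leave because the instruction takes less space and fewer clock cycles.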
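To see that leave really undoes the frame setup, here is a toy model in C. It is a sketch under stated assumptions: struct machine and its helpers are invented for illustration, and the 0x20 bytes of locals are just an example size.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Toy model: only the two frame registers and a small stack. */
struct machine {
    uint64_t rsp;                 /* byte offset of the stack top; grows downward */
    uint64_t rbp;
    uint64_t stack[STACK_WORDS];
};

static void push(struct machine *m, uint64_t v)
{
    m->rsp -= 8;
    m->stack[m->rsp / 8] = v;
}

static uint64_t pop(struct machine *m)
{
    uint64_t v = m->stack[m->rsp / 8];
    m->rsp += 8;
    return v;
}

/* Typical frame setup at function entry. */
void prologue(struct machine *m, uint64_t locals)
{
    push(m, m->rbp);              /* pushq %rbp            */
    m->rbp = m->rsp;              /* movq  %rsp, %rbp      */
    m->rsp -= locals;             /* subq  $locals, %rsp   */
}

/* leave = movq %rbp, %rsp followed by popq %rbp. */
void leave(struct machine *m)
{
    m->rsp = m->rbp;
    m->rbp = pop(m);
}

/* Returns 1 if the frame had the expected size and leave restored both
 * registers to their values from before the prologue. */
int frame_round_trip(void)
{
    struct machine m = { .rsp = STACK_WORDS * 8, .rbp = 0x1234 };
    uint64_t rsp0 = m.rsp, rbp0 = m.rbp;

    prologue(&m, 0x20);
    int gap_ok = (m.rbp - m.rsp == 0x20);
    leave(&m);
    return gap_ok && m.rsp == rsp0 && m.rbp == rbp0;
}
```

Whatever the size of the locals, leave needs no knowledge of it: %rbp alone is enough to find the way back.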

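You will find nearly the same pattern in every function. Let us use gdb to observe main's stack frame before entering foo, the stack frame inside foo, and the stack frame after leaving foo.

Shell> gcc -g -o test test.c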

Shell> gdb --args test

Gdb > break main

Gdb > run

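Before entering foo:

You will find that rbp - rsp = 0x20; this is caused by line 11 of the code.
The stack frame after entering foo:

Back in main's stack frame, rbp and rsp are restored to the state they had before entering foo, as if nothing had happened.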

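The optional frame pointer

Having just figured out the frame pointer, you are probably eager to put it to use, but you may be disappointed: most programs are compiled with the optimization option -O2, which is the near-universal choice. At that optimization level, and even at the lower level -O1, the frame pointer is already gone; %rbp (%ebp in 32-bit code) no longer holds the frame pointer and is used for other purposes.

In the x86-32 era, the current stack frame always began by saving %ebp, its size was decided at run time, and the frame grew and shrank through repeated push and pop. Starting with x86-64, GCC has a new choice: with the optimization option -O1, GCC can stop using the frame pointer. To quote the gcc manual: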

-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.

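This way, all the space is pre-allocated at the start of the function and no frame pointer is needed; every local variable can be reached through an offset from %rsp. Enough talk, let us look at an example. The same example, with the -O1 option added:

Shell> gcc -O1 -S -o test.s test.c

Looking at main, GCC determines that the stack frame needs only 8 bytes, so the first instruction after entering main allocates that space (line 23):

subq $8, %rsp

Then, before returning to the previous stack frame, it reclaims the space (line 34):

addq $8, %rsp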

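Wait, why does main never reference the space it allocated? This is a deliberate arrangement by GCC to satisfy the stack alignment requirement: the ABI expects %rsp to be a multiple of 16 at every call instruction, and the return address pushed on the way into main takes 8 bytes, so subtracting another 8 restores that alignment before main calls other functions. Now look at foo, where you can see how %rsp is used to reference stack space. Wait, doesn't the space have to be pre-allocated first? Why is there no pre-allocation here, with addresses beyond the top of the stack referenced directly? That brings us to a powerful feature introduced with x86-64.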
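The alignment argument is plain arithmetic and can be checked in C. A minimal sketch, assuming the System V x86-64 rule that %rsp is 16-byte aligned immediately before a call instruction; the function names and the sample address are made up.

```c
#include <stdint.h>

/* 1 if addr is a multiple of 16. */
int is_16_aligned(uint64_t addr)
{
    return (addr & 15u) == 0;
}

/* %rsp as seen by the first instruction of a function: the call that got
 * us here pushed an 8-byte return address. */
uint64_t rsp_at_entry(uint64_t rsp_before_call)
{
    return rsp_before_call - 8;
}

/* %rsp after the function's own `subq $bytes, %rsp`. */
uint64_t rsp_after_sub(uint64_t rsp, uint64_t bytes)
{
    return rsp - bytes;
}
```

Starting from an aligned value such as 0x7fffffffe000, rsp_at_entry gives 0x7fffffffdff8, which is not 16-byte aligned, and rsp_after_sub(..., 8) gives 0x7fffffffdff0, which is aligned again.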

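Accessing beyond the top of the stack

Use readelf to view the header information of the executable:

The part marked in red shows the version of the ABI that x86-64 follows. The ABI defines a set of conventions that conforming implementations must satisfy; among other things, it specifies that a program may use the 128 bytes beyond the top of the stack (the area known as the red zone).

This is easy to say, but the implementation behind it is quite involved and beyond the scope of this article; see the material on virtual memory for details. Setting that aside and continuing the example above, we find that GCC takes advantage of this feature and simply does not allocate any stack frame space for foo, using the space beyond the stack frame directly instead. @恨少 said this amounts to an inline function; I would say: this is the power of compiler optimization.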
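Whether a given stack reference falls inside that 128-byte area is easy to express in C. This is an illustrative sketch; in_red_zone is a made-up helper, and disp is the displacement in an operand of the form disp(%rsp).

```c
/* Size of the area below %rsp that the ABI lets a function use without
 * adjusting %rsp (128 bytes, as stated above). */
#define RED_ZONE_BYTES 128

/* Hypothetical helper: does an access of `size` bytes at disp(%rsp) lie
 * entirely inside [%rsp - 128, %rsp)? */
int in_red_zone(long disp, long size)
{
    return size > 0 && disp >= -RED_ZONE_BYTES && disp + size <= 0;
}
```

For instance, an 8-byte access at -24(%rsp) is inside the area, while one at -136(%rsp) is not.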

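Register saving conventions

In a procedure call, the caller's stack frame needs registers to hold temporary data, and so does the callee's. If the caller is using %rbx, the callee must save %rbx before using it and restore %rbx before returning to the caller's frame. Registers that follow this rule are callee-saved registers; from the caller's point of view, %rbx is non-volatile.

Conversely, suppose the caller uses %r10 to hold a local variable. In order to still be able to use %r10 after calling a subroutine, the caller saves %r10 first and restores it after the subroutine returns. Registers that follow this rule are caller-saved registers; from the caller's point of view, %r10 is volatile. For example: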

#include <stdio.h>
#include <stdlib.h>

void sfact_helper ( long int x, long int * resultp)
{
    if (x<=1)
        *resultp = 1;
    else {
        long int nresult;
        sfact_helper(x-1,&nresult);
        *resultp = x * nresult;
    }
} /* ----- end of function sfact_helper ----- */

long int
sfact ( long int x )
{
    long int result;
    sfact_helper(x, &result);
    return result;
} /* ----- end of function sfact ----- */

int
main ( int argc, char *argv[] )
{
    int sum = sfact(10);
    fprintf(stdout, "sum=%d\n", sum);
    return EXIT_SUCCESS;
} /* ---------- end of function main ---------- */

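Invoke gcc from the command line to generate the assembly:

Shell> gcc -O1 -S -o test2.s test2.c

The function sfact_helper uses the registers %rbx and %rbp. Before overwriting them, GCC saves their values, as lines 6 to 9 show. Before the function returns, GCC restores them in turn, as lines 27 and 28 show.

This code may puzzle you: why is %rbx saved at -16(%rsp) on function entry but found at 32(%rsp) on exit? Recall the important feature introduced above, accessing space beyond the stack frame: GCC does not have to allocate space before using it; it uses the stack space first and allocates it at a suitable moment. Line 11 shows the allocation; after it the stack pointer has changed, so the offset used to reference the same address is adjusted accordingly.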
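The offset change is simple arithmetic, sketched below in C. The helper names are made up, and the figure of 48 bytes is inferred from the two offsets quoted above, not read from the listing.

```c
/* After `subq $bytes, %rsp` the stack pointer is `bytes` lower, so the same
 * memory location has to be addressed with a displacement `bytes` larger. */
long disp_after_alloc(long disp_before, long bytes)
{
    return disp_before + bytes;
}

/* Inverse view: how many bytes were allocated, given the two displacements
 * that name the same location before and after the allocation. */
long bytes_allocated(long disp_before, long disp_after)
{
    return disp_after - disp_before;
}
```

With the offsets from the text, bytes_allocated(-16, 32) is 48, and disp_after_alloc(-16, 48) gives back 32.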

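In the x86 era, arguments were passed by pushing them onto the stack. Memory access is very slow compared with the CPU, so function calls were not efficient. In the x86-64 era there are more registers, and GCC can use up to 6 of them to hold arguments; arguments beyond the sixth are still pushed onto the stack. Knowing this helps when writing code, and there are at least two lessons:

Keep parameter lists to 6 parameters or fewer, so as not to make things hard for GCC.
Pass large objects by pointer or reference: a register is only 64 bits wide and these registers hold only integer values, so a large object does not fit in one.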
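The six-register rule can be written down as a small lookup in C. This is a sketch: the helper names are invented, and it covers only integer and pointer arguments, in the register order listed earlier in this article.

```c
/* Registers used for the first six integer or pointer arguments, in order. */
static const char *const int_arg_regs[] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
};

/* Hypothetical helper: where does the n-th (1-based) integer argument go? */
const char *int_arg_location(int n)
{
    if (n < 1)
        return "(invalid)";
    if (n <= 6)
        return int_arg_regs[n - 1];
    return "stack";
}

/* How many of `nargs` integer arguments end up on the stack. */
int stack_arg_count(int nargs)
{
    return nargs > 6 ? nargs - 6 : 0;
}
```

A function with seven int parameters therefore has one stack argument: int_arg_location(7) is "stack" and stack_arg_count(7) is 1.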

Let us look at how arguments are actually passed:

#include <stdio.h>
#include <stdlib.h>

int foo ( int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7 )
{
    int array[] = {100,200,300,400,500,600,700};
    int sum = array[arg1]+ array[arg7];
    return sum;
} /* ----- end of function foo ----- */

int
main ( int argc, char *argv[] )
{
    int i = 1;
    int j = foo(0,1,2, 3, 4, 5,6);
    fprintf(stdout, "i=%d,j=%d\n", i, j);
    return EXIT_SUCCESS;
} /* ---------- end of function main ---------- */

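Invoke gcc from the command line to generate the assembly:

Shell> gcc -O1 -S -o test1.s test1.c

In main, lines 31 to 37 prepare the arguments for foo, starting from argument 7, which is stored on the stack at the location %rsp points to. Argument 6 goes in register %r9d, argument 5 in %r8d, argument 4 in %ecx, argument 3 in %edx, argument 2 in %esi and argument 1 in %edi.

In foo, lines 14 and 15 fetch argument 7 and argument 1 respectively and use them in the computation. The array references use the classic addressing mode -40(%rsp,%rdi,4) = %rsp + %rdi*4 + (-40), where %rsp together with the displacement -40 gives the base address of the array, %rdi is the array index, and the number 4 is sizeof(int) = 4.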
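The address computation behind that operand form can be reproduced in C. A minimal sketch: effective_address and same_as_indexing are names invented here, and the numeric values in the usage note are arbitrary.

```c
#include <stdint.h>

/* disp(base, index, scale) = base + index * scale + disp */
uint64_t effective_address(int64_t disp, uint64_t base,
                           uint64_t index, uint64_t scale)
{
    return base + index * scale + (uint64_t)disp;
}

/* The same computation on a real C array: element i of an int array lives
 * i * sizeof(int) bytes past the array's first byte. Returns 1 if the
 * computed address matches &array[i]. */
int same_as_indexing(const int *array, uint64_t i)
{
    uint64_t ea = effective_address(0, (uint64_t)(uintptr_t)array,
                                    i, sizeof(int));
    return ea == (uint64_t)(uintptr_t)&array[i];
}
```

With a base of 1000, an index of 3, a scale of 4 and a displacement of -40, the result is 1000 + 12 - 40 = 972.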

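Passing structs as arguments

At @桂南's request, here is one more section. I am sure everyone would also like to know how a struct is stored and referenced, how it is passed when used as an argument, and how it is returned when used as a return value.

Consider the following example: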

#include <stdio.h>
#include <stdlib.h>

struct demo_s {
    char var8;
    int var32;
    long var64;
};

struct demo_s foo (struct demo_s d)
{
    d.var8=8;
    d.var32=32;
    d.var64=64;
    return d;
} /* ----- end of function foo ----- */

int
main ( int argc, char *argv[] )
{
    struct demo_s d, result;
    result = foo (d);
    fprintf(stdout, "demo: %d, %d, %ld\n", result.var8,result.var32, result.var64);
    return EXIT_SUCCESS;
} /* ---------- end of function main ---------- */

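We use the default compile options here; what happens with optimization enabled is left for you to think about.

Shell> gcc -S -o test.s test.c

Some comments have been added to the code above to make it easier to follow.
Question 1: how is the struct passed? It is split into two parts: var8 and var32 are merged into 8 bytes and placed in register %rdi, and var64 is placed in register %rsi. In other words, the struct is decomposed.
Question 2: how is the struct stored? Look at lines 15 to 17 of foo: a reference to a struct member becomes an access at an offset. This is much like an array, except that the element sizes vary.

Question 3: how is the struct returned? Originally %rax played the role of the return value; now a second return register, %rdx, is added. Again, GCC uses two registers to represent the struct.
So even with default options, GCC does its best to use registers. As structs get larger and the registers run out, the stack has to be used.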
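The split into two 8-byte halves follows from the struct's layout, which can be inspected from C. This sketch assumes x86-64 Linux (little-endian, 8-byte long); first_eightbyte is a made-up helper that rebuilds the half that would travel in %rdi, with padding bytes zeroed for a repeatable result (in a real call the padding is unspecified).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_s {
    char var8;    /* offset 0, followed by 3 bytes of padding */
    int  var32;   /* offset 4 */
    long var64;   /* offset 8: the second 8-byte half */
};

/* Rebuild the first 8 bytes of a demo_s from its two small members. */
uint64_t first_eightbyte(char v8, int v32)
{
    unsigned char bytes[8] = {0};       /* padding zeroed on purpose */
    uint64_t word;

    memcpy(bytes + offsetof(struct demo_s, var8),  &v8,  sizeof v8);
    memcpy(bytes + offsetof(struct demo_s, var32), &v32, sizeof v32);
    memcpy(&word, bytes, sizeof word);
    return word;
}
```

Under those assumptions sizeof(struct demo_s) is 16, so the struct fits exactly two registers, and first_eightbyte(8, 32) is 0x0000002000000008: var8 in the lowest byte and var32 in the upper four bytes.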

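Summary

Understanding the relationship between registers and stack frames is very helpful when debugging with gdb; in the coming days I will find a suitable example to share with everyone.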


Reposted from: http://ju.outofmemory.cn/entry/769

https://blog.csdn.net/u013982161/article/details/51347944