How HelloWorld Works (continuously updated)

I have always wanted to understand how printf("HelloWorld!\n") ends up printing on the screen, so I am using the Mid-Autumn Festival holiday to dig as deep as I can. This article will be updated continuously.

1. Compilation and linking

1.1 Preprocessing

First, we write the following program named main.c.

#include <stdio.h>
int main(int argc,char **argv){
    printf("HelloWorld!\n");
    return 0;
}

Run gcc -E main.c -o main.i to perform the preprocessing step. The "-o" option names the output file, here main.i.

The preprocessing rules are:

  • Remove all "#define" directives and expand all macro definitions.
  • Process all conditional compilation directives, such as "#if", "#ifdef", "#elif", "#else", and "#endif".
  • Process all "#include" directives by inserting the contents of the included file at the position of the directive.
  • Remove all comments, both "//" and "/* */".
  • Add file name and line number markers, so that the compiler can generate line number information for debugging and report line numbers in errors and warnings.
  • Keep all "#pragma" directives, because the compiler needs them.

If we open main.i, we can see that the stdio.h header has been expanded in place, including a number of type definitions and function declarations. Among them is the extern declaration of printf:

extern int printf (const char *__restrict __format, ...);

The call format of printf() is: printf("format string", output list). For example: printf("%2c-%2c-%2c-%2c\n",'D','e','m','o');

1.2 Compilation (generating the assembly file main.s)

In this stage GCC takes the preprocessed file and performs lexical analysis, syntax analysis, semantic analysis, and optimization to generate the corresponding assembly code. The command gcc -S main.i -o main.s compiles the preprocessed main.i into the assembly file main.s:

        .file   "main.c"
        .text
        .section        .rodata
.LC0:
        .string "HelloWorld!"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movl    %edi, -4(%rbp)
        movq    %rsi, -16(%rbp)
        leaq    .LC0(%rip), %rdi
        call    puts@PLT
        movl    $0, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0"
        .section        .note.GNU-stack,"",@progbits

We find that the printf call has been converted into call puts rather than call printf, which seems a bit unexpected. Do not worry: this is a compiler optimization for printf. In practice, when the argument to printf is a plain string ending in '\n', GCC replaces printf with puts and drops the trailing '\n' from the string. Otherwise, a normal call printf instruction is generally generated.

The puts() function has two characteristics:

  • puts() automatically appends a newline after the string it displays.
  • puts() stops output at the first null character, so the string must be null-terminated.

Note that an assembly program consists of three distinct kinds of elements:

Directives begin with a dot and give structural information that is useful to the assembler, linker, and debugger. A directive is not itself an assembly instruction. For example, .file simply records the name of the original source file. .data marks the start of the data section, and .text marks the start of the actual program code. .string defines a string constant in the current data section. .globl main marks the label main as a global symbol that code in other modules can access. You can ignore the remaining directives for now.

Labels end with a colon and associate a name with the location where the label appears. For example, the label .LC0: names the string that follows it, and the label main: marks the pushq %rbp instruction as the first instruction of the main function. By convention, labels that begin with a dot are temporary local labels generated by the compiler, while the other labels are user-visible function and global variable names.

Instructions are the actual assembly code (pushq %rbp). They are usually indented to set them apart from directives and labels.

Tips: AT&T syntax and Intel syntax
Note that the GNU tools use the traditional AT&T syntax. On Unix-like operating systems, AT&T syntax is used across a variety of processors. Intel syntax is generally used on DOS and Windows systems. Here is an instruction in AT&T syntax:
movl %esp, %ebp
movl is the instruction name. The percent signs indicate that %esp and %ebp are registers. In AT&T syntax, the first operand is the source and the second is the destination.
Elsewhere, for example in the Intel manuals, you will see Intel syntax, which has no % prefix and reverses the operand order. Here is the same instruction in Intel syntax:
mov ebp, esp
When reading manuals and web pages, you can tell AT&T syntax from Intel syntax by whether the registers carry a % prefix.

Historically, each register had its own special purpose, and not every instruction could be applied to every register. As the design progressed, new instructions and addressing modes were added, and the registers became nearly equivalent. A few leftover instructions, particularly those for string handling, still require %rsi and %rdi. In addition, two registers are reserved as the stack pointer (%rsp) and the base pointer (%rbp). The final eight registers are numbered and have no particular restrictions. Over the years the architecture was widened from 8 to 16 to 32 and finally 64 bits, so each register has some internal structure:

The lowest 8 bits of %rax form the 8-bit register %al, and the next 8 bits form %ah. The low 16 bits are %ax, the low 32 bits are %eax, and the entire 64 bits are %rax.

Registers %r8 through %r15 have the same structure, but slightly different naming: for example, the low 8, 16, and 32 bits of %r8 are %r8b, %r8w, and %r8d.

1.3 Assembly (generating the main.o file)

The assembler translates the assembly code into an intermediate object file. The assembly step can be run with the following command:

gcc -c main.s -o main.o

main.o is already a binary file, and opening it directly in a text editor shows only garbage. We can inspect it by disassembling it with gdb or objdump (for example, objdump -d main.o).

?? What is an object file?

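-- An object file is the binary file that the compiler generates after compiling source code. The linker then links it with resource files to produce the executable. An OBJ file gives only the [relative addresses] of the program, while an executable has [absolute addresses]. The obj binary code that corresponds to a CPP file has not been relocated yet! The following is excerpted from 《程序员的自我修养》 (The Programmer's Self-Cultivation).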

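The executable file formats popular on PC platforms today are mainly PE (Portable Executable) on Windows and ELF (Executable and Linkable Format) on Linux. Both are variants of COFF (Common Object File Format). COFF is a file specification first proposed and used by Unix System V Release 3. Microsoft later defined the PE format standard based on COFF and used it in the Windows NT system of the time. System V Release 4 introduced the ELF format on the basis of COFF, and today's popular Linux systems also use ELF as their basic executable file format. This also explains why PE and ELF are so similar: both derive from the same executable file format, COFF. Object files are the intermediate files that result from compiling source code but have not yet been linked (.obj files on Windows, .o files on Linux). Their content and structure closely resemble those of executables, so they are generally stored in the same format as executables. Broadly speaking, object files and executable files have almost exactly the same format, so we can loosely treat them as the same type of file. On Windows, object files and executables are collectively called PE-COFF files. On Linux, they are collectively called ELF files.
Of course, things are not that simple! It is not only executable files (.exe files on Windows and ELF executables on Linux) that are stored in the executable file format. Dynamic link libraries (DLL, dynamic linking library) [.dll files on Windows and .so files on Linux] and static link libraries (static linking library) [.lib files on Windows and .a files on Linux] are stored in it as well. On Windows they are stored in the PE-COFF format, and on Linux in the ELF format.
The ELF standard groups the files that use the ELF format into the following four categories: relocatable files, executable files, shared object files, and core dump files.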

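Taking ELF as the example format: an ELF file begins with a "file header" that describes the attributes of the whole file, including whether the file is executable, whether it is statically or dynamically linked, the entry address (if it is an executable), the target hardware, the target operating system, and so on. The file header also includes a section table (Section Table), which is in fact an array describing the sections in the file. The section table records each section's offset within the file and its attributes, so all information about every section can be obtained from it. After the file header come the contents of the sections: for example, the code section holds the program's instructions, and the data section holds the program's static variables.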

1.4 Linking (generating the executable program)

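The linker ld is responsible for joining the program's object files with all the additional object files they need, including static libraries and dynamic libraries. Linking is the step in which ld combines the intermediate object files with the corresponding libraries into an executable file.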

gcc main.o -o main

If we had used the command $ gcc main.c earlier, an executable named a.out would be produced by default. Run it with the command ./a.out.

?? Why is a.out used as the name?

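-- Expert C Programming mentions that it is short for "assembler output". Using a.out as the default name is an example of the UNIX mindset of "there is no particular reason, but this is how we do it".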

Origin www.cnblogs.com/acewzj/p/11519495.html