[Compile, link, load 1] Preprocessing, compiling, assembling, linking

Introduction

For ordinary application development, we rarely need to pay attention to compilation and linking, because we usually work in an integrated development environment (IDE) such as Visual Studio. Such IDEs generally perform compiling and linking in one step, and this combined process is usually called a build. Even when a source file is compiled from the command line, a simple "gcc hello.c" command hides a very complicated process.

The default configuration and the compilation and linking parameters provided by the IDE and compiler are sufficient for most application development. However, the powerful features of these complex integrated tools also hide a great deal from us. The operating mechanisms of much system software are covered up, so inexplicable errors leave us at a loss, and we can do nothing about performance bottlenecks that appear at run time. We see the symptoms of these problems but find it hard to see their essence, which lies in the mechanisms behind the running software and in the platforms and tools that support it. If we understand these mechanisms deeply, we can solve such problems with ease.

1. The hidden process

We use the classic "Hello World" program in C as the example. I ran it through cpp, but the preprocessed file is too long to show in full.

#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}

Under Linux, when we use GCC to compile the Hello World program, we only need to use the simplest command (the source code file name is hello.c):

$gcc hello.c
$./a.out

Hello World

In fact, the above process can be decomposed into four steps: preprocessing, compilation, assembly, and linking, as shown in the figure.
[Figure: the four steps of the GCC build process]

2. Preprocessing - cpp

  • Some articles also call preprocessing "precompilation".

1. Preprocessing command

  • The preprocessor cpp turns a .c file into a .i file:
$cpp hello.c > hello.i

2. The preprocessing process

Preprocessing mainly handles the directives that start with "#" in the source files, such as "#include" and "#define". The main processing rules are as follows:

  • Delete all "#define" directives and expand all macros.
  • Process all conditional directives, such as "#if", "#ifdef", "#elif", "#else", and "#endif".
  • Process each "#include" directive by inserting the included file at the position of the directive. This process is recursive, because an included file may itself include other files.
  • Remove all comments, both "//" and "/* */".
  • Add line number and file name markers, such as # 2 "hello.c" 2, so that the compiler can generate line number information for debugging and report line numbers in errors and warnings.
  • Keep all #pragma directives, because the compiler needs them.

The preprocessed .i file contains no macro definitions, because all macros have been expanded, and the included files have been inserted into it. So when we cannot tell whether a macro is defined correctly or whether a header file is included correctly, we can inspect the preprocessed file to find the problem.

3. The generated hello.i file

[Figure: contents of the generated hello.i]

3. Compilation - cc

1. Compile command

Compilation performs lexical analysis, syntax analysis, semantic analysis, and optimization on the preprocessed file to produce the corresponding assembly code file. This is the core of the whole build process and also one of its most complicated parts. The specific steps are only outlined in the next section; they involve compiler theory, which is not the core content of the book, so only an introduction is given.

This part is currently relevant to my company's products, because they have their own language. However, I am not responsible for the lexical and syntax analysis at the bottom of the product. If I get the chance later, I will continue to learn more about it.

The above compilation process is equivalent to the following command:

$cc hello.i -S -o hello.s

This produces the assembly output file hello.s.

2. The compilation process

  • Lexical analysis
  • Syntax analysis
  • Semantic analysis
  • Intermediate language generation
  • Object code generation and optimization

3. The generated assembly file (shown below)

A later article will explain in detail what the instructions below mean.
[Figure: contents of the generated hello.s]

4. Assembly - as

1. Assembly command

The assembler converts assembly code into instructions that the machine can execute, and almost every assembly statement corresponds to one machine instruction. The assembler's job is therefore relatively simple compared with the compiler's: there is no complicated syntax or semantics and no instruction optimization; it only translates statements one by one according to a table that maps assembly instructions to machine instructions. This is where the name "assembly" comes from.

  • We can call the assembler as to carry out the assembly step:
$as hello.s -o hello.o
  • In practice
    The object file hello.o produced by the assembler is shown in the figure below.
    [Figure: the generated hello.o]

2. View the object file generated by assembly

$objdump -h hello.o


5. Linking - ld

1. Linker

  • A linker is a program that packages the object files generated by the compiler into an executable file, a library file, or another object file.
    In other words, the linker is itself a program: its input is the object files generated by the compiler, and its output is a library, an executable file, or another object file.

Object file: the .o file is linked into the .out file.

Link command

ld /usr/lib64/crt1.o /usr/lib64/crti.o /usr/lib64/crtn.o /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtbeginT.o /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtend.o -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib64 -L/usr/lib --start-group -lgcc -lgcc_eh -lc --end-group hello.o -o hello.out

or

ld /usr/lib64/crt1.o /usr/lib64/crti.o /usr/lib64/crtn.o /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtbeginT.o /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtend.o -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib64 -L/usr/lib -lstdc++ -lm -lgcc_s -lc -lgcc hello.o -o hello.out

2. View the executable produced by linking

$objdump -h hello.out
  • As the figures below show, the number of sections grows from 6 in the object file to 23 after linking.

[Figures: objdump -h output for hello.o and hello.out]

6. Execute hello.out

$./hello.out


  • If execution fails with the following error:

bash: ./test.out: /lib/ld64.so.1: bad ELF interpreter: No such file or directory

see the separate article "/lib/ld64.so.1: bad ELF interpreter: No such file or directory". The problem is not easy to deal with, so the process of solving it is recorded there.

7. What gcc can do: preprocessing, compiling, assembling, linking

Above we used cpp, cc, as, and ld for preprocessing, compiling, assembling, and linking.
In fact, gcc by itself can carry out all four steps.

  • 1. Preprocess (-E means preprocess only).
$gcc -E hello.c -o hello.i
  • 2. Compile
$gcc -S hello.i -o hello.s

Or use the following command:

$gcc -S hello.c -o hello.s

  • 3. Assemble
$gcc -c hello.s -o hello.o 

Or start from the C source file and let gcc output the object file directly, after preprocessing, compilation, and assembly:

$gcc -c hello.c -o hello.o

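Remarks:
gcc -c hello.c -o hello.o: lowercase -c compiles and assembles but does not link.
Uppercase -C is a different option: it is a preprocessor flag that keeps comments in the preprocessed output and is meant to be used together with -E, so it cannot be used in place of -c.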

  • 4. Link
$gcc hello.o -o hello.out
or
$gcc hello.o -o hello
  • 5. Execute the resulting program (hello.out or hello).
  • 6. View the gcc options
[dev1@localhost test01]$ gcc --help
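Usage: gcc [options] file...
Options:
  -pass-exit-codes         Exit with highest error code from a phase
  --help                   Display this information
  --target-help            Display target specific command line options
  --help={common|optimizers|params|target|warnings|[^]{joined|separate|undocumented}}[,...]
                           Display specific types of command line options
  (Use '-v --help' to display command line options of sub-processes)
  --version                Display compiler version information
  -dumpspecs               Display all of the built in spec strings
  -dumpversion             Display the version of the compiler
  -dumpmachine             Display the compiler's target processor
  -print-search-dirs       Display the directories in the compiler's search path
  -print-libgcc-file-name  Display the name of the compiler's companion library
  -print-file-name=<lib>   Display the full path to library <lib>
  -print-prog-name=<prog>  Display the full path to compiler component <prog>
  -print-multiarch         Display the target's normalized GNU triplet, used as
                           a component in the library path
  -print-multi-directory   Display the root directory for versions of libgcc
  -print-multi-lib         Display the mapping between command line options and
                           multiple library search directories
  -print-multi-os-directory Display the relative path to OS libraries
  -print-sysroot           Display the target libraries directory
  -print-sysroot-headers-suffix Display the sysroot suffix used to find headers
  -Wa,<options>            Pass comma-separated <options> on to the assembler
  -Wp,<options>            Pass comma-separated <options> on to the preprocessor
  -Wl,<options>            Pass comma-separated <options> on to the linker
  -Xassembler <arg>        Pass <arg> on to the assembler
  -Xpreprocessor <arg>     Pass <arg> on to the preprocessor
  -Xlinker <arg>           Pass <arg> on to the linker
  -save-temps              Do not delete intermediate files
  -save-temps=<arg>        Do not delete intermediate files
  -no-canonical-prefixes   Do not canonicalize paths when building relative
                           prefixes to other gcc components
  -pipe                    Use pipes rather than intermediate files
  -time                    Time the execution of each subprocess
  -specs=<file>            Override built-in specs with the contents of <file>
  -std=<standard>          Assume that the input sources are for <standard>
  --sysroot=<directory>    Use <directory> as the root directory for headers
                           and libraries
  -B <directory>           Add <directory> to the compiler's search paths
  -v                       Display the programs invoked by the compiler
  -###                     Like -v but options quoted and commands not executed
  -E                       Preprocess only; do not compile, assemble or link
  -S                       Compile only; do not assemble or link
  -c                       Compile and assemble, but do not link
  -o <file>                Place the output into <file>
  -pie                     Create a position independent executable
  -shared                  Create a shared library
  -x <language>            Specify the language of the following input files
                           Permissible languages include: c c++ assembler none
                           'none' means revert to the default behavior of
                           guessing the language based on the file's extension
Options starting with -g, -f, -m, -O, -W, or --param are automatically
 passed on to the various sub-processes invoked by gcc.  In order to pass
 other options on to these processes the -W<letter> options must be used.

For bug reporting instructions, please see:
<http://bugzilla.redhat.com/bugzilla>.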

8. Other knowledge points

1. cc, cc1, ccl, gcc

1.1 cc

  • cc is generally a symbolic link to gcc. cc is the traditional compiler command on UNIX, while Linux uses gcc. Programs written on UNIX often name cc as the compiler in their build commands, so on Linux cc is pointed at gcc. They are in fact the same thing.
  • From the figure below, we can see that cc and gcc link against the same libraries, so it is fair to say they are the same thing, although here cc is not literally a link to gcc.
    [Figure: libraries linked by cc and gcc]

1.2 cc1

cc1 is the C compiler proper: the gcc driver invokes it internally to translate C source into assembly. On Windows you can see it after installing gcc through MinGW.

It also exists under Linux, in /usr/libexec/gcc/x86_64-redhat-linux/4.8.5/cc1

1.3 ccl

  • Many books and articles say that the compile step uses ccl, but no ccl command can be found (unless you rename gcc to ccl yourself).
  • Some compilers named CCL do exist; the ones I found are listed below.

a:
Compiling Directly via the CCL Compiler: the website shows that the file it generates is a .ccx file rather than a .s file, and, as the excerpt below says, the program is named c8_compiler.

This section describes the CCL compiler, and how to invoke it on different platforms.
The CCL compiler is a separate program, which can be invoked by the user. The name of the program is c8_compiler (or c8_compiler.exe on Microsoft Windows)

b:
CCL
https://github.com/Clozure/ccl

Because CCL is written in itself, you need an already-working version of CCL to compile it.

This CCL (Clozure CL) is a programming language implementation, but it is not a compiler for the C language.

c:
https://www.researchgate.net/figure/CCL-compiler-and-interpreter_fig1_228539904

To sum up: the ccl compile command mentioned in many books does not exist. "ccl" is most likely a misreading of cc1 (with the digit 1), and the correct commands are cc1, cc, and gcc.

1.4 gcc

GCC: the full name is GNU Compiler Collection. It is the default compiler on widely used Linux systems and is the compiler used to build the Linux kernel. GCC supports processors of many architectures, and its cross-platform support is excellent. However, the GNU project requires all of the code to be written by the project itself (to avoid copyright issues), so although GCC supports a wide range of hardware, it is not the best-performing compiler on every hardware platform.

1.5 icc

ICC: the full name is Intel C++ Compiler, a C/C++/Fortran compiler package developed by Intel. It is available for Linux, Microsoft Windows, and Mac OS X. There is no version for non-IA instruction sets (that is, it targets x86-architecture CPUs only). ICC is widely used in high-performance computing, distributed computing, and other commercial computing fields. Its vectorization and parallelization are regarded as an industry benchmark, and it makes full use of the features of modern processors.

2. cpp

Reference:
Oracle_manpages section_usercommands_usercommandscpp

  • NAME
    cpp - C language preprocessor
  • Usage Summary
/usr/lib/cpp [-BCHMpPRT] [-undef] [-Dname] [-Dname=def]
     [-Idirectory] [-Uname] [-Ydirectory]
     [input-file [output-file]]
  • Description
    cpp is the C language preprocessor. cpp is also used as a preprocessor for the first pass of other Sun compilers.
    cpp can accept two filenames as arguments. input-file and output-file are the input and output files for the preprocessor, respectively. By default, these are standard input and standard output.

3. gcc, g++

What are gcc and g++, and what is the difference between them?

As of June 2020, GCC had reached version 10.1.0, and it had grown from a compiler for C only into one that compiles multiple programming languages, including C++.

In addition, GCC now also compiles Go, Objective-C, Objective-C++, Fortran, Ada, D, and BRIG (HSAIL), and GCC 6 and earlier versions could even compile Java. However, this tutorial mainly explains how to compile and run C and C++ programs with GCC, so compiling other languages is not covered in detail.

So, once the C or C++ code has been written, how do we call GCC to compile it? Very simply: GCC provides commands for this. For C or C++ programs, you invoke GCC by running the gcc or g++ command.

In practice we usually compile C programs with gcc and C++ code with g++. It should be emphasized that this habit is not the difference between the two: gcc can also compile C++ programs, and g++ can also compile C programs.

So, what is the difference between gcc and g++? A detailed explanation follows.

In fact, any program that GCC can compile can be compiled with the gcc command. gcc can be seen as the general compile command of GCC, because it determines the programming language of a source file from the file's suffix, for example:

  • xxx.c: compiled as a C program by default;
  • xxx.cpp: compiled as a C++ program by default;
  • xxx.m: compiled as an Objective-C program by default;
  • xxx.go: compiled as a Go program by default.

Of course, gcc also lets the user specify the language manually with the -x option. For example, gcc -xc xxx compiles the file xxx as C code, and gcc -xc++ xxx compiles it as C++ code. Specific examples of the -x option will be given later.

However, the g++ command compiles a file as C++ code no matter what the suffix of the source file is. In other words, a .c file is treated as C code by gcc and as C++ code by g++, whereas a .cpp file is compiled as C++ code by both gcc and g++.

4. Asking ChatGPT: which tools implement the four steps of preprocessing, compiling, assembling, and linking C on Linux?

ChatGPT replied as follows:

On the Linux system, the four processes of C language preprocessing, compiling, assembling and linking can be implemented using the following tools:

  1. Preprocessing - Preprocessing can be done using the GNU C Preprocessor (cpp) tool. For example, enter the following command in a terminal to generate preprocessed code:
   cpp hello.c > hello.i
  2. Compilation - Compilation can be done with a C language compiler such as GCC (GNU Compiler Collection) or Clang. For example, enter the following command in the terminal to compile the preprocessed code into an object file:
   gcc -c hello.i -o hello.o
  3. Assembly - The compiled object file can be converted to assembly code using GNU Assembler (gas) or another assembler. For example, enter the following command in the terminal to assemble the object file into assembly code:
   objdump -d hello.o > hello.s
  4. Linking - Multiple object files can be linked into a single executable using the GNU Linker (ld) or other linkers. For example, enter the following command in the terminal to link multiple object files into an executable file:
   gcc -o hello hello.o

Please note that different Linux systems may use different tools and commands for C language compilation and development. The above examples are for reference only, and should be adjusted according to your own needs and situations during actual operation.

Steps 2 and 3 of this reply are inaccurate: gcc -c both compiles and assembles, and objdump -d disassembles an object file rather than assembling one. The commands that match the earlier sections are gcc -S hello.i -o hello.s for compilation and as hello.s -o hello.o for assembly.

5. Asking ChatGPT: besides gcc, how else can the preprocessed hello.i file be compiled into hello.s?

In addition to using the GCC compiler, you can use other C language compilers or preprocessing tools to compile the preprocessed "hello.i" file into assembly code (".s" file). For example:

  1. Clang - Clang is another popular open source C compiler that, like GCC, can compile preprocessed C source files into assembly code. You can generate an assembly code file with the following command:

    clang -S hello.i

  2. LLVM IR - LLVM IR is an intermediate representation that can be converted to assembly code by the LLVM compiler. You can compile preprocessed C source files to LLVM IR with:

    clang -emit-llvm -S hello.i

    You can then compile LLVM IR to assembly code with:

    llc -march=x86-64 hello.ll -o hello.s

  3. CPP - Instead of using a C language compiler, you can also use the preprocessing tool CPP to convert preprocessed C source files into assembly code. You can generate an assembly code file with the following command:

    cpp hello.i > hello.cpp
    g++ -S hello.cpp

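Note that different compilers and tools may produce different assembly code, so the resulting ".s" files may vary.

Option 3 of this reply is misleading: cpp only preprocesses and does not generate assembly. In those two commands the assembly is produced by g++, which compiles hello.cpp as C++ rather than C.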

Reference
1. "Programmer's Self-Cultivation Link Loading and Library"
2. In-depth understanding of computer systems Chapter 7 Link Knowledge Arrangement
3. The four stages of C compiler translation of C programs
4. Programmers quickly get started with C language - engineering articles (12)
5. GCC command (preprocess cpp --> compile ccl --> assemble as --> link ld) (reproduced)
6. GCC\LD

Origin blog.csdn.net/junxuezheng/article/details/130068109