Header file processing
A compiler reads the source program as a character stream, performs lexical and syntax analysis, and converts the high-level language statements into functionally equivalent assembly code. The assembler then converts that assembly code into machine language, and the resulting object files must be linked to produce an executable program.
C source program and header files -> preprocessing (cpp) -> compiler proper -> optimizer -> assembler -> linker -> executable file
1. Preprocessing
The preprocessor reads the C source program and processes its pseudo-instructions (directives beginning with #) and special symbols. What it handles falls mainly into the following four categories:
(1) Macro definition directives, such as #define Name TokenString and #undef. For #define, the preprocessor replaces every later occurrence of Name in the program with TokenString, except where Name appears inside a string constant. For #undef, the definition of the named macro is cancelled, so later occurrences of that name are no longer replaced.
(2) Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, and #endif. These directives let programmers decide which code the compiler will process by defining different macros. The preprocessor filters out the unneeded code according to which macros are defined.
(3) Header file inclusion directives, such as #include "FileName" or #include <FileName>. A header file typically defines a large number of macros with #define (most commonly symbolic constants) and also contains declarations of various external symbols. The main purpose of header files is to make certain definitions available to multiple C source programs: a source program that needs these definitions only has to add an #include line instead of repeating them. The preprocessor copies the entire contents of the header file into the output file it produces for the compiler.
The header files included in a C source program can be provided by the system. These are generally placed in the /usr/include directory and are included with angle brackets (< >). Developers can also define their own header files, which are generally placed in the same directory as the C source program and are included with double quotation marks ("").
(4) Special symbols. The preprocessor recognizes some special symbols. For example, the identifier __LINE__ in the source program is replaced by the current line number (a decimal number), and __FILE__ is replaced by the name of the C source file currently being compiled. The preprocessor substitutes the appropriate value wherever these identifiers appear in the source program.
What the preprocessor does is essentially textual "replacement" of the source program. The result is an output file with no macro definitions, no conditional compilation directives, and no special symbols. This file has the same meaning as the unpreprocessed source file, but its content is different. In the next step, this output file serves as the input of the compiler and is translated into machine instructions.
2. Compilation stage
The output file produced by preprocessing contains only constants such as numbers and strings, definitions of variables, C language keywords such as if, else, for, and while, identifiers such as main, and punctuation and operators such as {, }, +, -, *, and /. The compiler performs lexical analysis and syntax analysis on this file and, after confirming that all statements comply with the grammar rules, translates them into an equivalent intermediate representation or assembly code.
3. Optimization stage
Optimization is one of the more difficult parts of a compilation system. The problems involved relate not only to compilation technology itself but also closely to the hardware environment of the machine. One kind of optimization works on the intermediate code and does not depend on the specific computer. The other kind mainly targets the generation of object code. The pipeline shown at the start places the optimization stage after the compiler, which is a simplified, general representation.
For the former kind, the main work is common subexpression elimination, loop optimization (code hoisting, strength reduction, changing loop control conditions, constant folding, etc.), copy propagation, and the removal of useless assignments.
The latter kind of optimization is closely tied to the hardware structure of the machine. The most important consideration is how to make full use of the variable values held in hardware registers so as to reduce the number of memory accesses. In addition, how to arrange instructions according to the way the hardware executes them (pipelining, RISC, CISC, VLIW, etc.) so that the object code is shorter and runs more efficiently is also an important research topic.
The optimized assembly code must be assembled by the assembler and converted into corresponding machine instructions before it can be executed by the machine.
4. Assembly process
The assembly process translates assembly language code into machine instructions for the target machine. For each C source program processed by the translation system, this step finally produces a corresponding object file. The object file stores the target machine's machine language code that is equivalent to the source program.
An object file is composed of sections. Usually there are at least two sections in an object file:
Code section: this section mainly contains the program instructions. It is generally readable and executable, but not writable.
Data section: this section mainly stores the global variables and static data used by the program. It is generally readable and writable; on most modern systems it is not executable.
There are three main types of object files in the UNIX environment:
(1) Relocatable file: contains code and data suitable for linking with other object files to create an executable or shared object file.
(2) Shared object file: stores code and data suitable for linking in two contexts. In the first, the linker processes it together with other relocatable files and shared object files to create another object file. In the second, the dynamic linker combines it with an executable file and other shared object files to create a process image.
(3) Executable file: contains a program that the operating system can load to create a process and execute.
What the assembler generates is the first type, the relocatable file. The latter two require further processing, which is the work of the linker.
5. Linking
The object file generated by the assembler cannot be executed immediately, because it may still contain unresolved references. For example, a function in one source file may reference a symbol (such as a variable or a function) defined in another source file, or the program may call a function from a library file. All of these references can only be resolved by the linker.
The main job of the linker is to connect related object files to each other, that is, to connect each symbol referenced in one file with the definition of that symbol in another file, so that all these object files become a unified whole that the operating system can load and execute.
Depending on how the developer chooses to link the functions of a given library, linking can be divided into two types:
(1) Static linking: in this mode, the code of the function is copied from its static library into the final executable program, so that the code is loaded into the virtual address space of the process when the program runs. A static library is in fact a collection of object files, each containing the code of one function or a group of related functions in the library.
(2) Dynamic linking: in this mode, the code of the function is placed in an object file called a dynamic link library or shared object. The linker records only the name of the shared object and a small amount of other bookkeeping information in the final executable program. When the executable runs, the entire content of the dynamic link library is mapped into the virtual address space of the corresponding process, and the dynamic linker finds the function code according to the information recorded in the executable.
Function calls in an executable file can use either dynamic or static linking. Dynamic linking makes the final executable file smaller and saves memory when a shared object is used by multiple processes, because only one copy of the shared object's code needs to be kept in memory. However, dynamic linking is not necessarily better than static linking. In some cases it carries a performance cost.
Makefile compilation
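A makefile is used to automate compilation and linking. A project consists of many files, and a change to any one of them means the project has to be relinked.
However, not every file needs to be recompiled. The makefile records information about the files and determines which of them need to be recompiled before linking.
On UNIX systems, a makefile is used together with the make command.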