C language compilation process: header file processing

Header file processing

 

The compiler reads the source program (a character stream), performs lexical and syntax analysis, and converts the high-level language instructions into functionally equivalent assembly code. The assembler then converts that assembly code into machine language, and linking is required to produce an executable program in the executable file format.
C source program and header files -> preprocessing (cpp) -> compiler proper -> optimizer -> assembler -> linker -> executable file
1. Preprocessing
The preprocessor reads the C source program and processes the pseudo-instructions (directives beginning with #) and the special symbols in it. The pseudo-instructions mainly cover the following four aspects:
(1) Macro definition directives, such as #define Name TokenString and #undef. For #define, the preprocessor replaces every occurrence of Name in the program with TokenString, except where Name appears inside a string constant. For #undef, the definition of the macro is cancelled, so later occurrences of the name are no longer replaced.
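As a minimal sketch of the behaviour described in (1), the fragment below uses a hypothetical macro called NAME (it is not taken from any real header). It shows replacement outside a string constant, no replacement inside one, and the effect of #undef.

```c
#define NAME 42   /* hypothetical macro, used only for this sketch */

/* Outside a string constant, NAME is replaced by 42 before compilation. */
int macro_value(void)
{
    return NAME;
}

/* Inside a string constant, NAME is left untouched by the preprocessor. */
const char *macro_in_string(void)
{
    return "NAME";
}

#undef NAME       /* from here on, NAME is no longer a macro */

/* After #undef, NAME is an ordinary identifier and can name a variable. */
int after_undef(void)
{
    int NAME = 7;
    return NAME;
}
```

Calling macro_value() yields 42, macro_in_string() yields the four characters NAME, and after_undef() yields 7.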

(2) Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, and #endif. These directives let the programmer decide which code the compiler will process by defining different macros. The preprocessor filters out the unneeded code according to which macros are defined.
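The following sketch illustrates (2). The macro names DEBUG_BUILD, LOG_LEVEL, and BUFFER_SIZE are assumptions made up for this example. Defining DEBUG_BUILD, for instance on the compiler command line, would select the other branch.

```c
/* DEBUG_BUILD is a hypothetical macro; it is not defined in this sketch. */
#ifdef DEBUG_BUILD
#define LOG_LEVEL 2          /* kept only when DEBUG_BUILD is defined */
#else
#define LOG_LEVEL 0          /* kept otherwise */
#endif

/* Provide a default only if nothing defined BUFFER_SIZE earlier. */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 256
#endif

int log_level(void)
{
    return LOG_LEVEL;
}

int buffer_size(void)
{
    return BUFFER_SIZE;
}
```

Only one of the two LOG_LEVEL definitions ever reaches the compiler; the other is removed by the preprocessor.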

(3) Header file inclusion directives, such as #include "FileName" or #include <FileName>. Header files generally define a large number of macros with #define (most commonly symbolic constants) and also contain declarations of various external symbols. The main purpose of header files is to make certain definitions available to multiple C source programs: a source program that needs these definitions only has to add an #include directive instead of repeating the definitions itself. The preprocessor adds all the definitions in the header file to the output file it produces for the compiler to process.

The header files included in a C source program can be provided by the system; these are generally placed in the /usr/include directory and are included with angle brackets (< >). Developers can also define their own header files, which are generally placed in the same directory as the C source program; these are included with double quotation marks ("").
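A small sketch of what a header supplies, using two standard system headers. The function names bits_per_char and name_length are made up for this illustration. A project's own header would be included with double quotes instead, as noted in the comment.

```c
#include <limits.h>   /* system header: defines macros such as CHAR_BIT */
#include <string.h>   /* system header: declares functions such as strlen */

/* A header of your own in the source directory would be written as
 *     #include "window.h"
 * (not done here, because this sketch has no such file). */

/* Uses a macro that limits.h defines with #define. */
int bits_per_char(void)
{
    return CHAR_BIT;
}

/* Uses an external function that string.h only declares;
 * its code lives in the C library and is found at link time. */
size_t name_length(const char *s)
{
    return strlen(s);
}
```

Neither CHAR_BIT nor strlen is defined in this file; both become visible only because the preprocessor pastes the header contents in.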

(4) Special symbols. The preprocessor recognizes some special symbols. For example, the identifier __LINE__ appearing in the source program is interpreted as the current line number (a decimal number), and __FILE__ is interpreted as the name of the C source file currently being compiled. The preprocessor replaces these symbols with the appropriate values.
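The sketch below shows one common use of these symbols: tagging a message with its location. REPORT and line_here are hypothetical names introduced for this example only.

```c
#include <stdio.h>

/* Returns the number of the source line that this return statement is on. */
int line_here(void)
{
    return __LINE__;
}

/* Hypothetical helper macro: formats "file:line: message" into buf.
 * Because it is a macro, __FILE__ and __LINE__ expand at the place
 * where REPORT is used, not where it is defined. */
#define REPORT(buf, size, msg) \
    snprintf((buf), (size), "%s:%d: %s", __FILE__, __LINE__, (msg))
```

A call such as REPORT(buf, sizeof buf, "bad input") fills buf with the current file name, the line of the call, and the message.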


What the preprocessor does is essentially a "replacement" pass over the source program. The result is an output file with no macro definitions, no conditional compilation directives, and no special symbols. This file means the same thing as the original source file, but its content is different. In the next step, this output file serves as the input to the compiler, which translates it into machine instructions.
2. Compilation stage

The output file produced by preprocessing contains only constants such as numbers and strings, variable definitions, and C language keywords, identifiers, and operators such as main, if, else, for, while, {, }, +, -, *, /. The compiler's job is to perform lexical analysis and syntax analysis and, once it has confirmed that all statements conform to the grammar, to translate them into an equivalent intermediate representation or into assembly code.
3. Optimization stage
Optimization is one of the more difficult techniques in a compilation system. The problems involved relate not only to compilation technology itself but also closely to the hardware environment of the machine. One kind of optimization works on the intermediate code and does not depend on the specific computer. The other kind mainly targets the generation of object code. In the pipeline shown above, the optimization stage is placed after the compiler, which is the more general representation.

For the former kind, the main work includes common subexpression elimination, loop optimization (loop-invariant code motion, strength reduction, rewriting loop control conditions, constant folding, and so on), copy propagation, and elimination of useless assignments.
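To make two of these transformations concrete, here is a hand-written before-and-after pair. It is only an illustration of what loop-invariant code motion and strength reduction do; the function names are made up, and an actual compiler applies such changes to its intermediate code, not to the C source.

```c
/* Straightforward version: recomputes x * y + 1 and i * 4 on every pass. */
int sum_plain(const int *a, int n, int x, int y)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        int scale = x * y + 1;          /* does not change inside the loop */
        total += a[i] * scale + i * 4;  /* multiplication by the loop index */
    }
    return total;
}

/* Same result after applying the two optimizations by hand. */
int sum_optimized(const int *a, int n, int x, int y)
{
    int total = 0;
    int scale = x * y + 1;   /* loop-invariant code motion: hoisted out */
    int offset = 0;          /* strength reduction: i * 4 becomes repeated addition */
    for (int i = 0; i < n; i++) {
        total += a[i] * scale + offset;
        offset += 4;
    }
    return total;
}
```

Both functions return the same value for the same arguments; the second simply does less work per iteration.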

The latter kind of optimization is closely tied to the hardware structure of the machine. The most important consideration is how to make full use of the variable values held in the machine's hardware registers so as to reduce the number of memory accesses. How to arrange instructions according to the way the hardware executes them (pipelining, RISC, CISC, VLIW, and so on), so that the target code is shorter and runs more efficiently, is also an important research topic.

The optimized assembly code must be assembled by the assembler and converted into corresponding machine instructions before it can be executed by the machine.
4. Assembly process

The assembly process is the process of translating assembly language code into target machine instructions. For each C source program processed by the translation system, this process finally produces the corresponding object file. The object file holds the target machine language code equivalent to the source program.

An object file is composed of segments (sections). Usually there are at least two segments in an object file:

Code segment: this segment mainly contains program instructions. It is generally readable and executable, but not writable.

Data segment: this segment mainly stores the global variables and static data used by the program. It is generally readable and writable; on older systems it was often executable as well, while modern systems usually mark it non-executable.
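As a rough sketch of what ends up where, consider the fragment below. The names are hypothetical, and the exact placement is decided by the toolchain, so the comments describe the typical case only.

```c
/* Initialized global: typically stored in the data segment. */
int counter = 5;

/* Static variable without an initializer: zero-initialized static data. */
static int calls;

/* The machine instructions generated for bump() go into the code segment.
 * The function can modify counter and calls because the data segment is
 * writable; the instructions themselves are not modified at run time. */
int bump(void)
{
    calls++;
    return ++counter;
}
```

The first call to bump() returns 6, since counter starts at 5 in the data segment.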

There are three main types of object files in the UNIX environment:

(1) Relocatable file: contains code and data suitable for linking with other object files to create an executable or shared object file.

(2) Shared object file: stores code and data suitable for linking in two contexts. In the first, the linker processes it together with other relocatable files and shared object files to create another object file. In the second, the dynamic linker combines it with an executable file and other shared object files to create a process image.

(3) Executable file: contains a program that the operating system can run by creating a process for it.

What the assembler generates is actually the first type of object file. For the latter two, some other processing is needed. This is the work of the linker.

5. Linking

The object file generated by the assembler cannot be executed immediately, because it may still contain unresolved references. For example, a function in one source file may reference a symbol (such as a variable or a function) defined in another source file, or the program may call a function from a library. All of these can only be resolved by the linker.
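The sketch below shows what such a reference looks like in source form. All names are hypothetical. In a real project the definitions at the bottom would live in a different source file; they are kept in the same file here only so that the fragment compiles on its own.

```c
/* Declarations only: they tell the compiler that these symbols exist
 * somewhere, without saying where. This is all the compiler needs. */
extern int shared_total;
int add_to_total(int n);

/* This function references both symbols. In its object file the
 * references stay unresolved until the linker finds the definitions. */
int use_total(void)
{
    return add_to_total(3);
}

/* Definitions. Normally these would be in another .c file, and the
 * linker would connect the references above to them. */
int shared_total = 10;

int add_to_total(int n)
{
    shared_total += n;
    return shared_total;
}
```

If the definitions were missing from every object file and library, compilation would still succeed and the failure would only appear at link time as an undefined symbol.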

The main job of the linker is to connect related object files to each other, that is, to connect the symbols referenced in one file with the definitions of those symbols in another file, so that all these object files become a unified whole that the operating system can load and execute.

Depending on how the developer specifies that library functions are to be linked, linking can be divided into two types:

(1) Static linking. In this mode, the code of the function is copied from the static link library that contains it into the final executable program, so that it is loaded into the virtual address space of the process when the program runs. A static link library is actually a collection of object files, each of which contains the code for one function or a group of related functions in the library.

(2) Dynamic linking. In this mode, the code of the function is placed in an object file called a dynamic link library or shared object. The linker only records the name of the shared object and a small amount of other bookkeeping information in the final executable. When the executable runs, the entire contents of the dynamic link library are mapped into the virtual address space of the corresponding process, and the dynamic linker finds the function code according to the information recorded in the executable.

A function call in an executable can use either dynamic or static linking. Dynamic linking makes the final executable smaller and saves memory when the shared object is used by multiple processes, because only one copy of the shared object's code needs to be kept in memory. Dynamic linking is not necessarily better than static linking, however; in some cases it brings a performance penalty.



Makefile compilation

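A makefile is used to automate compiling and linking. A project is made up of many files, and a change to any one of them means the project has to be relinked.

Not every file needs to be recompiled, however. The makefile records information about the files, which lets make decide which of them must be recompiled before linking.

On UNIX systems, a makefile is used together with the make command.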
 
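As an example, suppose there are four .c files (main.c, window.c, model.c, data.c) and three .h files (window.h, model.h, data.h).

main.c is the main program and contains the main() function. The others are modules.

Producing the final executable takes the following steps:

1. Compile window.c, model.c, data.c, and main.c separately, which yields four object files: window.o, model.o, data.o, and main.o.

2. Link these four .o files (.obj on Windows) to get main.out (main.exe on Windows).

The relationships between these files therefore have to be spelled out; otherwise the build tool does not know how to compile them.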

                    

all: main.out

main.out: main.o window.o model.o data.o
	gcc -o main.out main.o window.o model.o data.o

                    

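# The lines above mean the following:

# all: main.out

To build everything, run make all; this produces the main.out executable.

# main.out: main.o window.o model.o data.o

Producing main.out depends on four files: main.o, window.o, model.o, and data.o.

#	gcc -o main.out main.o window.o model.o data.o

This line invokes the compiler, here to link the object files (VC uses cl). In a makefile the command line must begin with a tab. Many options can be added when compiling, such as macro definitions and library search paths.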

                    

Of course, that is not all: where do the .o files that main.out depends on come from?

                    

window.o: window.c window.h
	gcc -c window.c

model.o: model.c model.h
	gcc -c model.c

data.o: data.c data.h
	gcc -c data.c

                    

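The -c option above tells the compiler to stop once it has produced a .o file, without going on to look for main() and link. main.o has no rule of its own here; make can build it from main.c with its built-in rule for .c files.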

                    

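Put together, these rules make up a makefile. It still does very little, and many other items can be added, but the basic idea is this:

tell make which other files each file depends on. When one of those dependencies changes, make automatically detects that the final output is out of date and recompiles the affected modules.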
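The out-of-date check can be sketched in a few lines of C. This is a simplified model, not make's actual implementation: needs_rebuild is a hypothetical helper, timestamps are plain numbers supplied by the caller, and 0 stands for a file that does not exist. Real make reads modification times from the file system and first brings each dependency up to date itself.

```c
#include <stddef.h>

/* Returns 1 if the target must be rebuilt, 0 if it is up to date.
 * target_mtime: modification time of the target, 0 if it is missing.
 * dep_mtimes:   modification times of the files the target depends on. */
int needs_rebuild(long target_mtime, const long *dep_mtimes, size_t ndeps)
{
    if (target_mtime == 0) {
        return 1;                       /* target does not exist yet */
    }
    for (size_t i = 0; i < ndeps; i++) {
        if (dep_mtimes[i] > target_mtime) {
            return 1;                   /* a dependency is newer than the target */
        }
    }
    return 0;                           /* nothing changed since the last build */
}
```

For window.o, the dependency list would hold the times of window.c and window.h: editing either file makes its time newer than window.o, so only that module is recompiled before main.out is relinked.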

                    

Today's VC++ is really convenient: there is no need to type out a makefile character by character.


Origin blog.csdn.net/geggegeda/article/details/4205977