Compilation process of C/C++ application

1. The stages that turn C/C++ source code into an executable are as follows

Source program -> preprocessing -> compilation -> optimization -> assembly -> linking -> executable file

In the preprocessing stage, the source program is read and the pseudo-instructions (directives beginning with #) and special symbols are processed. In other words, the preprocessor scans the source code, performs a preliminary transformation on it, and generates new source code for the compiler. Preprocessing happens before the compiler proper sees the code. The following uses VC (Visual C++) compilation as an example; some of the content is not supported in gcc.

Preprocessing stage

Although preprocessors are included in the vast majority of compilers today, they are generally considered compiler-independent. The preprocessing process reads the source code, examines statements and macro definitions that contain preprocessing directives, and transforms the source code in response. The preprocessing process also removes comments and extra whitespace from the program.

1. Definition of pseudo-instructions (preprocessing directives)

A preprocessing directive is a line of code that begins with a # sign. The # sign must be the first character on the line, not counting whitespace. The directive keyword follows the #, and any amount of whitespace is allowed between the # sign and the keyword. The entire line constitutes one preprocessing directive, which applies some transformation to the source code before the compiler compiles it.

2. Preprocessing directives fall into the following four categories:

1. Macro definition instruction #define

A macro defines an identifier that stands for something specific. The preprocessor replaces each occurrence of the macro identifier in the source code with the macro's definition. The most common use of a macro is to define a global symbol that represents a value. The second use is to define a macro with parameters (a function-like macro): it can be invoked like a function, but it is expanded in place at the call site, with the formal parameters in the definition replaced by the actual arguments of the call.

Usage one:

#define PI 3.1415926

Notice:

(1) By convention, macros are defined in all capital letters, which makes it easy to tell macro identifiers apart from ordinary variable identifiers. The benefits of using macros are:

First, they are easy to use.

Second, a well-named macro is meaningful and readable.

Third, the value is easy to change, because it is defined in one place.

(2) The value represented by a macro can be a constant expression, and macros can be nested (the nested macros must be defined before the point where the outer macro is used). For example:

#define ONE 1

#define TWO 2

#define SUM (ONE+TWO)

Note the parentheses here. They are not required, but they should be added as a precaution: preprocessing is plain text substitution and knows nothing about operator precedence.

(3) A macro can also represent a string constant, for example:

#define VERSION "Version 1.0"

(4) #define instruction with parameters (macro function)

A macro with arguments looks somewhat like a function call. See an example:

#define SUM(x,y) ((x)+(y))

Any numeric expression, or even a function call, can be used in place of the parameters x and y. Again, pay attention to the parentheses: the whole macro expansion is enclosed in a pair of parentheses, and each parameter is enclosed in parentheses as well, which keeps both the macro and its arguments intact inside a larger expression. A usage example:

sum = SUM(2, 3); expands to sum = ((2)+(3));

1.2 The # operator

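The # operator in a macro definition converts the parameter that follows it into a string. Used this way, # is sometimes called the stringizing operator. For example: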

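The # operator in the macro definition tells the preprocessor to convert any argument passed to the macro in the source code into a string. So the output should be 12345.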

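1.3 The ## operator (rarely used)

The ## operator joins arguments together. The preprocessor merges the arguments that appear on either side of ## into a single token. Consider the following example: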

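2. Conditional compilation directives

By defining different macros, the programmer can decide which code the compiler processes. Conditional compilation directives determine which code is compiled and which is not. The condition can depend on the value of an expression or on whether a particular macro is defined. These directives are: #if, #ifdef, #ifndef, #else, #elif and #endif.

The #if directive evaluates the constant expression that follows the keyword. If the expression is true, the code that follows is compiled up to the next #else, #elif or #endif; otherwise it is skipped.

#endif terminates an #if directive.

#else follows an #if directive: when the condition of the preceding #if is not true, the code after #else is compiled.

#elif combines the effects of #else and #if.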

#ifdef and #ifndef are most often used to prevent repeated inclusion. A .h header file usually begins with a guard like this:

#ifndef FUNCA_H

#define FUNCA_H

// contents of the header file

#endif

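Without this guard, if a.h includes funcA.h and b.h includes both a.h and funcA.h, then funcA.h is included twice and errors such as "type redefinition" appear. With the guard in place, the second inclusion expands to nothing.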

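3. Special symbols

The preprocessor recognizes certain special symbols and replaces each occurrence of them in the source program with an appropriate value.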

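__FILE__ a string containing the name of the current source file

__LINE__ an integer giving the current line number

__DATE__ a string containing the date of compilation

__STDC__ a nonzero value if the compiler conforms to the ANSI C standard

__TIME__ a string containing the time of compilation

Note: these names use double underscores, not single underscores.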

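The #error directive makes the compiler display an error message and then stop compiling.

The #line directive changes the contents of __LINE__ and __FILE__, which are identifiers predefined by the compiler.

The #pragma directive has no formal definition; each compiler is free to define its own uses. A typical use is to disable or enable certain annoying warnings.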

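4. Header file inclusion directive

This is the most common directive. Header files exist mainly so that certain definitions can be shared by several C source files: a source file that needs the definitions only has to add an #include line instead of repeating them. The preprocessor adds everything in the header to the output it produces, for the compiler to process.

The #include directive expands the included file at the point of the directive. Inclusion can be nested, that is, an included file can itself include other files. A standard C compiler supports at least eight levels of nested inclusion. The preprocessor does not check whether a file has already been included in the translation unit and does not prevent it from being included again; the remedy is the conditional directives shown above.

Expanding an include file is a very simple process: the code in the included file is simply copied into the .cpp file that includes it.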

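(1) A .h file that is not included by any .cpp file or other header is not compiled, and does not become part of the final application.

After building the C++ project you will find that the error in the code above is not reported. This shows that a .h file is not a compilation unit by itself. It becomes part of one only after an include statement has brought it, directly or indirectly, into a .cpp file.

(2) It is possible for a .cpp file to include the same .h file more than once, directly or indirectly.

Multiple inclusion like the above causes a compile error.

(3) Include files are expanded into the .cpp file in the order in which they are listed.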

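Compilation and linking

Building a C++ program really consists of two stages, compilation and linking, which are closely related. According to the C++ standard, a compilation unit (translation unit) is a .cpp file together with all the .h files it includes: the code in the .h files is expanded into the .cpp file, and the compiler then compiles that .cpp file into a .obj file. The .obj file is in COFF format, the object-file format on which the Windows PE (Portable Executable) format is based. It already contains binary code, but it cannot necessarily be executed. Once the compiler has compiled every .cpp file in the project, the linker links the results into an .exe or a library file.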

When the project above is compiled, VS generates the following files

The generated object file is a relocatable file.

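This raises a question: test.h is visible to main.cpp (main.cpp includes test.h), but test.cpp is not visible to main.cpp, so how does main.cpp find the implementation of the function func?

In fact, when it compiles main.cpp on its own, the compiler does not care whether func has been implemented yet, or where. It simply treats func as a symbol with external linkage and assumes that its implementation lives in another .obj file. At the call to func the compiler emits a jump to an address, but since it does not know where func actually resides, it fills in a placeholder address after the jump and carries on compiling the code that follows. Once all the .cpp files have been compiled, the linking stage begins, and the linker replaces the placeholder with the real address of func.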
