The process from code to executable file

The general process can be divided into the following 4 steps:

Preprocessing -> Compilation -> Assembly -> Linking
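As a rough illustration, the four steps can be run one at a time with GCC. The sketch below only builds the command lines and does not execute them. It assumes a GCC toolchain, and the file stem "hello" is a hypothetical example, not something from the original article.

```python
def pipeline_commands(stem):
    """Return one GCC command (as an argv list) per build stage for <stem>.c."""
    return [
        ["gcc", "-E", f"{stem}.c", "-o", f"{stem}.i"],  # preprocess only
        ["gcc", "-S", f"{stem}.i", "-o", f"{stem}.s"],  # compile to assembly
        ["gcc", "-c", f"{stem}.s", "-o", f"{stem}.o"],  # assemble to an object file
        ["gcc", f"{stem}.o", "-o", stem],               # link into an executable
    ]


# "hello" is a hypothetical file stem used only for illustration.
for argv in pipeline_commands("hello"):
    print(" ".join(argv))
```

Each command consumes the file produced by the previous one, which mirrors the order of the four steps.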

1. Preprocessing

Handle the preprocessor directives in the source file, that is, the lines starting with "#".

1) Expand all macro definitions

2) Handle all conditional compilation directives, such as #if, #ifdef, #else and #endif

3) Handle #include directives by inserting the included file in place; this is recursive, because the included file may include other files

4) Delete all comments

5) Add line number and file name markers, so that the compiler can report the correct line number when it generates errors and warnings

6) Retain #pragma directives, because the compiler needs them

The result is a .i file (C) or a .ii file (C++).
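A few of the preprocessing steps can be imitated with a toy model. This is only a sketch under strong assumptions: it handles object-like macros only, ignores #include and conditional compilation, and does not protect string literals. The name toy_preprocess is made up for this example; a real preprocessor is far more involved.

```python
import re


def toy_preprocess(source):
    """Strip comments, expand object-like #define macros, keep #pragma lines."""
    # Delete /* ... */ and // ... comments (toy: comments inside strings are not protected).
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", "", source)

    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#\s*define\s+(\w+)\s*(.*)", line)
        if match:
            # Remember the macro; the #define line itself is not emitted.
            macros[match.group(1)] = match.group(2).strip()
            continue
        if line.lstrip().startswith("#pragma"):
            # #pragma is passed through untouched for the compiler.
            output.append(line)
            continue
        # Replace whole-word occurrences of every macro seen so far.
        for name, body in macros.items():
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, line)
        output.append(line)
    return "\n".join(output)


example = "#define SIZE 10\nint a[SIZE]; /* buffer */\n#pragma once\n"
print(toy_preprocess(example))
```

In the output, the #define line and the comment are gone, SIZE has been replaced by 10, and the #pragma line is kept.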

2. Compilation

1) Lexical analysis (scanning): the source code is fed to the scanner, which splits the character sequence into tokens: keywords, identifiers, literals, and special symbols. For example, "(" is one token and the variable i is another. Lexical analysis tool: lex

2) Syntax analysis: build a syntax tree whose nodes are expressions, and at the same time check whether each expression is well formed. Errors such as mismatched parentheses or missing operators are detected in this step. Syntax analysis tool: yacc

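3) Semantic analysis: check whether the expressions are meaningful. What the compiler can check is static semantics, such as type matching and type conversion; an error such as division by 0 is generally a dynamic semantic error that only shows up at run time. After this step, every node in the syntax tree is annotated with a type.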

4) Source-level optimization: for example, the value of 2 * 5 can be determined at compile time, so the expression is replaced directly with 10. In this step the syntax tree is converted into intermediate code, which does not yet contain data sizes, variable addresses, or register names.

5) Target code generation: generate target code from the intermediate code. The intermediate code is machine-independent, whereas the target code is machine-dependent and fixes details such as word length and register names.

6) Target code optimization: for example, replacing multiplication with shift operations and deleting redundant instructions.
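The front-end steps above can be sketched in a few lines. The example is a toy, not how a C compiler is built: the tokenizer recognizes only a handful of token kinds, and Python's own ast module stands in for a C parser (the expression 2 * 5 happens to be valid in both languages). The names tokenize, ConstantFolder and fold are made up for this sketch.

```python
import ast
import operator
import re

# One alternative per token kind; leading whitespace is skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<number>\d+)"
    r"|(?P<identifier>[A-Za-z_]\w*)"
    r"|(?P<symbol>[-+*/()=;]))"
)


def tokenize(text):
    """Lexical analysis: split text into (kind, value) tokens."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Source-level optimization: replace constant sub-expressions by their value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the children first
        func = FOLDABLE.get(type(node.op))
        if (func is not None
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            value = func(node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def fold(expression):
    """Parse an expression into a syntax tree, fold constants, print it back."""
    tree = ast.parse(expression, mode="eval")
    tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
    return ast.unparse(tree)  # requires Python 3.9 or later


print(tokenize("i = (2 * 5);"))
print(fold("x + 2 * 5"))
```

The tokenizer treats "(" and i as separate tokens, as described in the lexical analysis step, and fold rewrites the 2 * 5 subtree into the single constant 10.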

The final result of this stage is an assembly code file.
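The two target code optimizations named in the last step can be illustrated with a small peephole pass. The instruction format here, a tuple of (opcode, destination, operand), and the opcode names are assumptions made up for this sketch; they are not real assembler syntax.

```python
def peephole(instructions):
    """Apply two toy optimizations to a list of (opcode, dest, operand) tuples."""
    optimized = []
    for opcode, dest, operand in instructions:
        # Delete a redundant move of a register onto itself.
        if opcode == "mov" and dest == operand:
            continue
        # Replace multiplication by a power of two with a left shift.
        if (opcode == "mul" and isinstance(operand, int)
                and operand > 0 and (operand & (operand - 1)) == 0):
            optimized.append(("shl", dest, operand.bit_length() - 1))
            continue
        optimized.append((opcode, dest, operand))
    return optimized


program = [("mov", "eax", "eax"), ("mul", "eax", 8), ("add", "eax", 3)]
print(peephole(program))
```

Multiplying by 8 becomes a shift by 3 because 8 is 2 to the power 3; multipliers that are not powers of two are left unchanged.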

3. Assembly

Convert the assembly code into instructions that the machine can execute. The assembler simply translates the instructions one by one according to a lookup table that maps assembly instructions to machine instructions; it performs no instruction optimization.

Assembler tool: as
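The table-lookup idea can be shown with a minimal sketch. It covers only four single-byte x86 instructions that take no operands; real assemblers also encode operands, resolve labels, and emit object-file sections. The function name assemble and treating "#" as the comment character are assumptions of this example.

```python
# Mnemonic -> machine code byte for a few operand-less x86 instructions.
OPCODES = {"nop": 0x90, "ret": 0xC3, "int3": 0xCC, "hlt": 0xF4}


def assemble(listing):
    """Translate one mnemonic per line into machine code, with no optimization."""
    code = bytearray()
    for line in listing.splitlines():
        mnemonic = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not mnemonic:
            continue
        code.append(OPCODES[mnemonic])  # KeyError for an unknown mnemonic
    return bytes(code)


print(assemble("nop\nnop  # padding\nret\n").hex())
```

Every input line maps to exactly one output byte, which is the "translated one by one" behavior described above.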

4. Linking

Linking mainly deals with the places where modules refer to one another, so that the modules are connected correctly. For example, module A may call a function defined in module B and use a global variable defined in module C.

1) Address and space allocation

2) Symbol resolution (also called address binding)

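3) Relocation. When a module uses a global variable or function that is defined in another module or library, the address is unknown at compile time; the linker fills in the final address at each such reference once the addresses have been allocated.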
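The three linking steps can be sketched with a toy model that reuses the A, B, C example above. The module layout (a dict with name, size, defines and refs), the base address 0x1000, and the symbol names are all assumptions made for illustration; real object files and linkers are much richer.

```python
def link(modules, base=0x1000):
    """Toy linker returning (symbol table, patches) for a list of modules."""
    # 1) Address and space allocation: lay the modules out back to back.
    module_base = {}
    address = base
    for module in modules:
        module_base[module["name"]] = address
        address += module["size"]

    # 2) Symbol resolution: map every defined symbol to its final address.
    symbols = {}
    for module in modules:
        for symbol, offset in module["defines"].items():
            if symbol in symbols:
                raise ValueError(f"duplicate symbol: {symbol}")
            symbols[symbol] = module_base[module["name"]] + offset

    # 3) Relocation: record the address to write at every reference site.
    patches = {}
    for module in modules:
        for offset, symbol in module["refs"].items():
            if symbol not in symbols:
                raise ValueError(f"undefined reference to {symbol}")
            patches[module_base[module["name"]] + offset] = symbols[symbol]
    return symbols, patches


# Hypothetical modules: A uses a function from B and a global variable from C.
modules = [
    {"name": "A", "size": 0x20, "defines": {"main": 0},
     "refs": {4: "b_func", 12: "c_var"}},
    {"name": "B", "size": 0x10, "defines": {"b_func": 0}, "refs": {}},
    {"name": "C", "size": 0x08, "defines": {"c_var": 0}, "refs": {}},
]
symbols, patches = link(modules)
print({name: hex(addr) for name, addr in symbols.items()})
print({hex(site): hex(target) for site, target in patches.items()})
```

A reference to a symbol that no module defines raises an error, which corresponds to the familiar "undefined reference" failure at link time.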


Origin blog.csdn.net/qq_43461641/article/details/105074671