Compiler overview - structure and main components

scanning: also called lexical analysis; it divides the input into tokens, which is like identifying each word of a sentence
parsing: also called syntactic analysis; it analyzes the structure of the entire sentence
semantic analysis: checks whether the parsed sentence is meaningful; a sentence can be grammatically correct but meaningless, such as "Apple ate a car"

A meaningful program is built from key constructs such as expressions, conditionals, and loops, which the compiler then translates into assembly language

For example, the name of a variable a is symbolic information. In the final generated assembly, variables no longer carry this symbolic information; they are just memory addresses.

Symbolic information is needed while the abstract syntax tree is built and analyzed, but it is not retained in the generated code; the only reason to keep it in the output is debugging

scanner:
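As a rough illustration of scanning, here is a minimal sketch of a regex-based tokenizer. The token categories (NUMBER, IDENT, OP) and the function name tokenize are assumptions made for this example, not part of any particular compiler.

```python
import re

# Hypothetical token categories for a tiny C-like language.
# Keywords such as "int" are not distinguished from identifiers here;
# a real scanner would usually do that.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=;()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source text into (kind, text) tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("int x = a + b;"))
```

Each word of the input comes out as one token, which is all the scanner is responsible for; whether the tokens form a valid sentence is left to the parser.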

parsing:
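To make parsing concrete, here is a minimal sketch of a recursive-descent parser for sums of products. The grammar (expr, term, factor), the nested-tuple tree shape, and the name parse_expression are assumptions chosen for this example. Tokenizing is reduced to one re.findall call, which silently skips characters it does not recognize.

```python
import re


def parse_expression(text):
    """Parse sums of products such as a*b + c*d into nested tuples."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        # factor := NAME | NUMBER | "(" expr ")"
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token in ("+", "*", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token

    def term():
        # term := factor ("*" factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        # expr := term ("+" term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_expression("a*b + c*d"))
```

Because term is called from expr, multiplication binds tighter than addition, so the structure of the whole sentence is recovered from the flat token list.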

semantic:

student a;

car b;

int x = a + b; 

For example, if the student and car classes above do not define any addition rule, then int x = a + b; is fine grammatically, but it is logically unreasonable and meaningless, so semantic analysis rejects it
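The check described above can be sketched as a lookup in a table of addition rules. The table ADDITION_RULES, the symbol dictionary, and the function check_addition are all hypothetical names for this illustration; a real compiler keeps this information in its symbol table and type system.

```python
# Hypothetical table: which operand types "+" accepts and what it yields.
ADDITION_RULES = {
    ("int", "int"): "int",
    ("float", "float"): "float",
}


def check_addition(symbols, left, right):
    """Return the result type of left + right, or raise TypeError."""
    left_type, right_type = symbols[left], symbols[right]
    result = ADDITION_RULES.get((left_type, right_type))
    if result is None:
        raise TypeError(f"no '+' defined for {left_type} and {right_type}")
    return result


# Declared types of each variable, as a scanner and parser would have collected them.
symbols = {"a": "student", "b": "car", "i": "int", "j": "int"}

print(check_addition(symbols, "i", "j"))
try:
    check_addition(symbols, "a", "b")
except TypeError as error:
    print("semantic error:", error)
```

The statement with student and car parses without trouble; it only fails here, because no rule says what adding those two types means.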

Optimization:
common subexpression elimination (CSE): when the same expression, such as a+b, appears more than once, compute it once into a temporary t and reuse t

Not every compiler optimization is guaranteed to make code faster; it can only be said that an optimization is likely to. For example, replacing two occurrences of a+b with a temporary t can actually add to the work the CPU does: t only holds the sum of two variables, yet it occupies one more register, so it may be better to simply compute a+b twice. If t stood for an expression over many variables, the optimization would clearly pay off. So the optimizations in a compiler are heuristics, not optimal in any absolute sense.
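Here is a minimal sketch of CSE on straight-line three-address code. The instruction format (dest, op, lhs, rhs), the "copy" pseudo-operation, and the function name are assumptions for this example. Unlike the description with a fresh temporary t, this version reuses the variable that already holds the first result, which is a common simplification.

```python
def eliminate_common_subexpressions(instructions):
    """Replace a repeated (op, lhs, rhs) with a copy of the first result.

    Each instruction is (dest, op, lhs, rhs). Assumes straight-line code
    with no branches; commutativity (a+b vs b+a) is not recognized.
    """
    available = {}  # (op, lhs, rhs) -> variable currently holding that value
    optimized = []
    for dest, op, lhs, rhs in instructions:
        key = (op, lhs, rhs)
        if key in available:
            optimized.append((dest, "copy", available[key], None))
        else:
            optimized.append((dest, op, lhs, rhs))
        # Writing dest invalidates expressions that read dest or were held in dest.
        available = {
            k: v for k, v in available.items()
            if dest not in (k[1], k[2]) and v != dest
        }
        # dest now holds the value of key, unless dest is one of its own operands.
        if dest not in (lhs, rhs):
            available.setdefault(key, dest)
    return optimized


program = [
    ("t1", "+", "a", "b"),
    ("x", "*", "t1", "c"),
    ("t2", "+", "a", "b"),  # same expression as t1
    ("y", "*", "t2", "d"),
]
for instruction in eliminate_common_subexpressions(program):
    print(instruction)
```

The second a+b becomes a copy from t1. Whether that is a win still depends on the cost of keeping t1 alive in a register, which is exactly the trade-off described above.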

Optimization may seem unrelated to the internal design of the CPU, but it is not. Beyond the absolute number of registers, the design logic inside the CPU also affects how optimizations should be designed.

Look at an example x = a*b + c*d; its AST is as follows:
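One way to write this tree down in text is prefix notation. The sketch below borrows Python's built-in ast module only because the statement also happens to be valid Python; the helper to_sexpr is a hypothetical name and handles just the node types this example needs.

```python
import ast


def to_sexpr(node):
    """Render a small subset of Python AST nodes in prefix notation."""
    if isinstance(node, ast.Assign):
        return f"(= {to_sexpr(node.targets[0])} {to_sexpr(node.value)})"
    if isinstance(node, ast.BinOp):
        symbol = {ast.Add: "+", ast.Mult: "*"}[type(node.op)]
        return f"({symbol} {to_sexpr(node.left)} {to_sexpr(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError(f"unsupported node: {type(node).__name__}")


statement = ast.parse("x = a*b + c*d").body[0]
print(to_sexpr(statement))  # (= x (+ (* a b) (* c d)))
```

The root is the assignment, its right child is the addition, and the two multiplications hang below the addition as its operands.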

The corresponding abstract assembly instructions are:
load a to r1
load b to r2
mult r1 r2 to r3
load c to r4
load d to r5
mult r4 r5 to r6
add r3 r6 to r7
store r7 to x
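These instructions can be produced by a post-order walk over the tree. The following is a minimal sketch under assumptions: the tree is written as nested tuples, the opcode names are copied from the listing above, and generate is a hypothetical helper name. It hands out a fresh virtual register for every intermediate value.

```python
OPCODES = {"*": "mult", "+": "add"}  # abstract opcodes, as used in the listing


def generate(target, tree):
    """Emit abstract instructions for target = tree with fresh virtual registers."""
    code = []
    counter = 0

    def new_register():
        nonlocal counter
        counter += 1
        return f"r{counter}"

    def emit(node):
        if isinstance(node, str):  # leaf: a variable that lives in memory
            reg = new_register()
            code.append(f"load {node} to {reg}")
            return reg
        op, left, right = node
        left_reg = emit(left)    # operands first (post-order),
        right_reg = emit(right)
        reg = new_register()     # then the operation itself
        code.append(f"{OPCODES[op]} {left_reg} {right_reg} to {reg}")
        return reg

    result = emit(tree)
    code.append(f"store {result} to {target}")
    return code


tree = ("+", ("*", "a", "b"), ("*", "c", "d"))  # x = a*b + c*d
print("\n".join(generate("x", tree)))
```

Run on the tree for x = a*b + c*d, this prints the same eight instructions as the listing, using r1 through r7.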

Although these abstract assembly instructions basically follow the operation logic inside the CPU, a real CPU obviously cannot have infinitely many registers. Registers like r1 are called virtual registers, and register allocation maps them to real physical registers. Instruction selection maps the abstract instructions onto the instructions of the real CPU and memory, while instruction scheduling orders the instructions sensibly to maximize the performance of the final program.
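As a rough sketch of register allocation for straight-line code, the function below frees a physical register at the last use of each virtual register and hands the lowest free one to each new result. The instruction encoding (dest, sources), the names p0, p1, ... and the function allocate are assumptions for this example; real allocators (graph coloring, linear scan) also handle branches and spilling to memory.

```python
import heapq


def allocate(instructions, num_physical):
    """Map virtual registers to physical ones for straight-line code.

    Each instruction is (dest, sources): dest is the virtual register
    written (or None), sources are the virtual registers read.
    """
    last_use = {}
    for index, (_, sources) in enumerate(instructions):
        for reg in sources:
            last_use[reg] = index

    free = list(range(num_physical))  # a sorted list is already a valid min-heap
    physical = {}
    for index, (dest, sources) in enumerate(instructions):
        # A source that is never read again gives its register back first,
        # so the destination of this same instruction may reuse it.
        for reg in dict.fromkeys(sources):
            if last_use[reg] == index:
                heapq.heappush(free, physical[reg])
        if dest is not None:
            if not free:
                raise RuntimeError("out of registers; a real allocator would spill")
            physical[dest] = heapq.heappop(free)
    return {reg: f"p{number}" for reg, number in physical.items()}


# The eight abstract instructions for x = a*b + c*d.
program = [
    ("r1", []), ("r2", []), ("r3", ["r1", "r2"]),
    ("r4", []), ("r5", []), ("r6", ["r4", "r5"]),
    ("r7", ["r3", "r6"]), (None, ["r7"]),
]
print(allocate(program, 3))
```

With this scheme the seven virtual registers fit into three physical ones, because r1 and r2 are dead once r3 has been computed, and likewise for r4 and r5.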


Origin blog.csdn.net/weixin_43754049/article/details/126163311