Introducing the Working Process of the Compiler Through Practical Examples

This article is shared from the Huawei Cloud Community post "Introducing the Working Process of the Compiler Through Practical Examples", author: Jerry Wang

This article describes the figure below in detail.

[Figure: the working process of a compiler]

A compiler is a tool that translates high-level language code into machine language code. Its work can be divided into several phases. The common ones are described below, each with a concrete example:

  1. Lexical Analysis:
    In the lexical analysis phase, the compiler decomposes the source code into a sequence of tokens (lexical units). A token is the smallest unit with grammatical meaning, such as an identifier, keyword, operator or constant. Compilers typically use regular expressions and finite automata to scan the source code and produce the token sequence.

    For example, for the following snippet of C code:

    int x = 5 + 3;
    

    The lexical analyzer will break it down into the following sequence of tokens:

    <int> <x> <=> <5> <+> <3> <;>
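The scanning step can be sketched as a small hand-written lexer. This is only an illustration under simplifying assumptions: it recognizes a single keyword (int), decimal integers, a handful of one-character operators and the semicolon. The names Token, TokenKind and tokenize are hypothetical and are not taken from any real compiler.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token categories for this sketch (hypothetical names). */
typedef enum { TOK_KEYWORD, TOK_IDENT, TOK_NUMBER, TOK_OPERATOR, TOK_PUNCT } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* lexeme, NUL-terminated; longer lexemes are truncated */
} Token;

/* Scan src and fill out with at most max tokens.
   Returns the number of tokens produced, or -1 on an unknown character. */
int tokenize(const char *src, Token *out, int max)
{
    int n = 0;
    const char *p = src;

    while (*p != '\0' && n < max) {
        if (isspace((unsigned char)*p)) { p++; continue; }

        const char *start = p;
        TokenKind kind;

        if (isalpha((unsigned char)*p) || *p == '_') {
            /* identifier or keyword: letter or '_' followed by letters, digits, '_' */
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            kind = TOK_IDENT;
        } else if (isdigit((unsigned char)*p)) {
            /* integer constant: one or more digits */
            while (isdigit((unsigned char)*p)) p++;
            kind = TOK_NUMBER;
        } else if (strchr("=+-*/", *p) != NULL) {
            p++;
            kind = TOK_OPERATOR;
        } else if (*p == ';') {
            p++;
            kind = TOK_PUNCT;
        } else {
            return -1;   /* character not covered by this sketch */
        }

        size_t len = (size_t)(p - start);
        if (len >= sizeof out[n].text) len = sizeof out[n].text - 1;
        memcpy(out[n].text, start, len);
        out[n].text[len] = '\0';

        /* Assumption: "int" is the only keyword this sketch knows. */
        if (kind == TOK_IDENT && strcmp(out[n].text, "int") == 0) kind = TOK_KEYWORD;
        out[n].kind = kind;
        n++;
    }
    return n;
}
```

Calling tokenize("int x = 5 + 3;", tokens, 16) on an array of 16 Token values yields the seven tokens listed above. A production lexer would usually be generated from regular expressions or driven by an explicit finite automaton; the if/else chain here plays that role by hand.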
    
  2. Syntax Analysis:
    In the syntax analysis phase, the compiler builds an abstract syntax tree (AST) from the token sequence. The AST represents the structure of the program: its expressions, statements and the grammatical relationships between them. The compiler parses the tokens according to a context-free grammar, using a parsing algorithm such as LL(1) or LR(1).

    Taking the same C statement as an example, the parser constructs the following abstract syntax tree from the token sequence. The declared variable int x is the left operand of the initialization, and the expression 5 + 3 is the right operand:

           =
          / \
         /   \
     int x    +
             / \
            5   3
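How a parser turns text into a tree can be sketched with a small hand-written recursive-descent parser. This is a simple top-down technique, not a table-driven LL(1) or LR(1) parser. The sketch assumes a tiny grammar (integer literals joined by + and -) and reads characters directly instead of a token stream. Node, parse_expr, eval and free_tree are illustrative names.

```c
#include <ctype.h>
#include <stdlib.h>

/* AST node: an integer literal (op == 0) or a binary operation. */
typedef struct Node {
    char op;                 /* '+', '-', or 0 for a literal */
    int value;               /* used only when op == 0 */
    struct Node *lhs, *rhs;  /* used only when op != 0 */
} Node;

static Node *new_node(char op, int value, Node *lhs, Node *rhs)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->op = op; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

/* Release a tree built by parse_expr. */
void free_tree(Node *n)
{
    if (n == NULL) return;
    free_tree(n->lhs);
    free_tree(n->rhs);
    free(n);
}

static void skip_spaces(const char **p)
{
    while (isspace((unsigned char)**p)) (*p)++;
}

/* number := digit+ ; returns NULL if no digit is found.
   Overflow of very long literals is not handled in this sketch. */
static Node *parse_number(const char **p)
{
    skip_spaces(p);
    if (!isdigit((unsigned char)**p)) return NULL;
    int v = 0;
    while (isdigit((unsigned char)**p)) { v = v * 10 + (**p - '0'); (*p)++; }
    return new_node(0, v, NULL, NULL);
}

/* expr := number (('+' | '-') number)* ; left-associative.
   Advances *p past the parsed text; returns NULL on a syntax error. */
Node *parse_expr(const char **p)
{
    Node *left = parse_number(p);
    if (left == NULL) return NULL;
    for (;;) {
        skip_spaces(p);
        char op = **p;
        if (op != '+' && op != '-') return left;
        (*p)++;
        Node *right = parse_number(p);
        if (right == NULL) { free_tree(left); return NULL; }
        left = new_node(op, 0, left, right);
    }
}

/* Walk the tree and compute its value. */
int eval(const Node *n)
{
    if (n->op == 0) return n->value;
    int a = eval(n->lhs), b = eval(n->rhs);
    return n->op == '+' ? a + b : a - b;
}
```

For the input "5 + 3", parse_expr returns a '+' node whose children are the literals 5 and 3, which is the right-hand subtree of the diagram above. Walking that tree with eval gives 8.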
    
  3. Semantic Analysis:
    In the semantic analysis phase, the compiler checks that the code is meaningful and that types are used consistently. It consults the symbol table to verify that variables, functions and types are declared and used correctly. It also performs type deduction and inserts implicit type conversions where the language allows them. Some compilers additionally apply simple optimizations such as constant folding at this point.

    For example, in C, for the following code snippet:

    int x = 5;
    char y = x;
    

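    The semantic analyzer will notice that an int value is assigned to a char variable. In C this assignment is legal: the analyzer inserts an implicit narrowing conversion, and the compiler may warn that the value can be truncated. In a language with stricter typing rules, the same assignment would be reported as a type mismatch error.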
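A check of this kind can be sketched as a function that compares the type of the assignment target with the type of the value. The rule set below is a deliberate simplification and an assumption of this sketch: arithmetic types are ordered by size, narrowing between them is flagged, and mixing arithmetic with non-arithmetic types is rejected. All struct types are treated as one type. Type, AssignResult and check_assign are hypothetical names.

```c
/* Types known to this sketch (hypothetical, far smaller than real C). */
typedef enum { TY_CHAR, TY_INT, TY_DOUBLE, TY_STRUCT } Type;

/* Outcome of checking "target = source". */
typedef enum { ASSIGN_OK, ASSIGN_NARROWING, ASSIGN_ERROR } AssignResult;

/* Position of an arithmetic type in the widening order,
   or -1 for a non-arithmetic type. */
static int rank_of(Type t)
{
    switch (t) {
    case TY_CHAR:   return 0;
    case TY_INT:    return 1;
    case TY_DOUBLE: return 2;
    default:        return -1;
    }
}

/* Decide whether a value of type source may be assigned to a
   variable of type target. */
AssignResult check_assign(Type target, Type source)
{
    int rt = rank_of(target);
    int rs = rank_of(source);

    /* Non-arithmetic types must match exactly. */
    if (rt < 0 || rs < 0)
        return (target == source) ? ASSIGN_OK : ASSIGN_ERROR;

    /* Arithmetic types convert implicitly; flag possible loss of data. */
    return (rs > rt) ? ASSIGN_NARROWING : ASSIGN_OK;
}
```

With these rules, check_assign(TY_CHAR, TY_INT) reports ASSIGN_NARROWING for the snippet above, while check_assign(TY_INT, TY_STRUCT) reports ASSIGN_ERROR. Whether a narrowing result becomes a warning or a hard error is a policy decision of the language and the compiler.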

  4. Intermediate Code Generation:
    In the intermediate code generation phase, the compiler converts the abstract syntax tree into an intermediate representation (IR), also called intermediate code. The intermediate code is machine-independent and is usually expressed as three-address code, which is often stored as quadruples (an operator, two operands and a result). It serves as a bridge between the front end and the back end, and it makes optimization and target code generation easier.

    Taking the earlier statement int x = 5 + 3; as an example, the intermediate code generator can produce the following three-address code:

    t1 = 5
    t2 = 3
    t3 = t1 + t2
    x = t3
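The listing above can be produced by a post-order walk of the expression tree that hands out a fresh temporary for every node. The sketch below assumes a minimal tree type and writes the generated code into a fixed-size text buffer. Expr, CodeBuf, gen_expr and gen_assign are illustrative names, and no optimization is attempted.

```c
#include <stdarg.h>
#include <stdio.h>

/* Expression tree: a constant (op == 0) or a binary operation. */
typedef struct Expr {
    char op;                       /* '+', '-', ... or 0 for a constant */
    int value;                     /* used only when op == 0 */
    const struct Expr *lhs, *rhs;  /* used only when op != 0 */
} Expr;

/* Output buffer plus the counter for temporaries t1, t2, ... */
typedef struct {
    char text[256];   /* generated code, one instruction per line */
    size_t len;       /* number of characters written so far */
    int next_temp;    /* number of the last temporary handed out */
} CodeBuf;

/* Append one formatted instruction; output that does not fit is truncated. */
static void emit(CodeBuf *cb, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(cb->text + cb->len, sizeof cb->text - cb->len, fmt, ap);
    va_end(ap);
    if (n > 0) {
        size_t room = sizeof cb->text - cb->len - 1;
        cb->len += ((size_t)n < room) ? (size_t)n : room;
    }
}

/* Emit code for e and return the number of the temporary holding its value. */
int gen_expr(CodeBuf *cb, const Expr *e)
{
    if (e->op == 0) {
        int t = ++cb->next_temp;
        emit(cb, "t%d = %d\n", t, e->value);
        return t;
    }
    int a = gen_expr(cb, e->lhs);   /* operands first (post-order) */
    int b = gen_expr(cb, e->rhs);
    int t = ++cb->next_temp;
    emit(cb, "t%d = t%d %c t%d\n", t, a, e->op, b);
    return t;
}

/* Emit code for "name = e". */
void gen_assign(CodeBuf *cb, const char *name, const Expr *e)
{
    int t = gen_expr(cb, e);
    emit(cb, "%s = t%d\n", name, t);
}
```

Starting from a zero-initialized CodeBuf and a tree for 5 + 3, gen_assign(&cb, "x", &sum) leaves the four instructions shown above in cb.text. Each emitted line has at most one operator and three operands, which is the property that gives three-address code its name.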

One activity runs through all of the compiler's work: managing the symbol table. The symbol table is a data structure that the compiler maintains during lexical, syntax and semantic analysis. It stores identifiers (such as variable and function names) together with their attributes (such as type and scope). Its purpose is to ensure that identifiers are correctly declared and used, and to provide the information that semantic analysis and later phases need.

In the phases above, the compiler uses the symbol table to record declarations, resolve references, check types and determine scopes. This is what allows it to analyze the code's semantics correctly and to generate correct intermediate code and object code in later phases.
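A symbol table with nested scopes can be sketched as a stack of entries, each tagged with the scope depth at which it was declared. This is one simple layout among several (real compilers often use hash tables), and the fixed capacity, the string-typed type field and the st_ function names are assumptions made for this illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 64

typedef struct {
    char name[32];   /* identifier; longer names are truncated */
    char type[16];   /* type name kept as text, e.g. "int" */
    int depth;       /* scope nesting level at declaration */
} Symbol;

typedef struct {
    Symbol items[MAX_SYMBOLS];  /* stack: innermost declarations on top */
    int count;
    int depth;                  /* current scope nesting level, 0 = global */
} SymbolTable;

void st_init(SymbolTable *st) { st->count = 0; st->depth = 0; }

void st_enter_scope(SymbolTable *st) { st->depth++; }

/* Leave the current scope and drop every symbol declared in it. */
void st_leave_scope(SymbolTable *st)
{
    if (st->depth == 0) return;   /* never leave the global scope */
    while (st->count > 0 && st->items[st->count - 1].depth == st->depth)
        st->count--;
    st->depth--;
}

/* Find the innermost visible declaration of name, or NULL. */
const Symbol *st_lookup(const SymbolTable *st, const char *name)
{
    for (int i = st->count - 1; i >= 0; i--)
        if (strcmp(st->items[i].name, name) == 0)
            return &st->items[i];
    return NULL;
}

/* Declare name in the current scope.
   Returns 1 on success, 0 if the table is full or the name is
   already declared in the same scope. */
int st_declare(SymbolTable *st, const char *name, const char *type)
{
    if (st->count == MAX_SYMBOLS) return 0;
    for (int i = st->count - 1; i >= 0 && st->items[i].depth == st->depth; i--)
        if (strcmp(st->items[i].name, name) == 0)
            return 0;
    Symbol *s = &st->items[st->count++];
    snprintf(s->name, sizeof s->name, "%s", name);
    snprintf(s->type, sizeof s->type, "%s", type);
    s->depth = st->depth;
    return 1;
}
```

Declaring x as int globally and again as char inside a nested scope makes st_lookup return the char entry until st_leave_scope is called, after which the int entry is visible again. A second declaration of x in the same scope is rejected, which is the kind of error semantic analysis reports.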

 

 

Origin blog.csdn.net/qq_48892708/article/details/131113503