Compilation Principles and Techniques: Review of Knowledge Points (for personal use)

Concept: the process of converting a source program into an equivalent target program is called compilation. The compiler is one of the basic components of a modern computer system.

Compilation Principles exam-oriented crash course (end-of-term cram): https://www.bilibili.com/video/BV1ft4y1X7p6?p=1 (the narrator's voice is nice too, haha). If you don't want to watch the lectures, you can watch just the pure problem-walkthrough videos.

Chapter 1 Compilation Overview

1.1 Translation programs: 

  • Translators

    • Assembler: assembly language -> machine language
    • Compiler: high-level language -> assembly language | machine language
  • Interpreter

    • An interpreter fetches, analyzes, and executes the source statements one by one: as soon as a statement has been analyzed, the interpreter runs it and produces results. This makes interpreters convenient for debugging programs.

1.2 Stages and tasks of compilation:

  1. Analysis stage (lexical analysis, syntax analysis, semantic analysis)
  2. Synthesis stage (intermediate code generation, code optimization, code generation)

1.3 Other concepts related to compilation

  • Front end: the parts of the compiler that depend on the source language but not on the target machine; usually lexical analysis, syntax analysis, semantic analysis, intermediate code generation, symbol table construction, and machine-independent code optimization, together with the corresponding error handling and symbol table operations.
  • Back end: the parts of the compiler that depend on the target machine. Generally these parts are independent of the source language and rely only on the intermediate representation; they include target code generation, machine-dependent code optimization, and the corresponding error handling and symbol table operations.
  • The concept of a "pass": one pass scans the source program (or an intermediate representation of it) from beginning to end and performs the related processing, producing a new intermediate representation or the target program. Each pass completes one or more logical phases, e.g.: lexical analysis in the first pass, syntax analysis in the second, semantic analysis in the third. 

Preprocessor/assembler: omitted.

Chapter 2 Formal Languages and Automata Foundations

2.1 Alphabet and symbol strings

  • Alphabet: a non-empty finite set of symbols. Typical symbols are letters, digits, punctuation marks, and operators. eg: {0, 1} is the alphabet of binary numbers. The alphabets used by computers are commonly the ASCII and EBCDIC character sets.
  • Symbol string: The symbol string defined on a certain alphabet is a finite symbol sequence composed of the symbols in the alphabet. eg: 010011, 0101, etc. are the symbol strings on the alphabet {0, 1}. aa, ab, abababab, etc. are symbol strings defined on the alphabet {a, b}.
  • Symbol string length: the number of symbols in the string.

2.2 Language

  • Language definition: a set of symbol strings over some alphabet. Languages also support operations, eg: union, intersection, closure, etc.
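These set operations can be sketched in a few lines of Python; all names here are my own, purely illustrative:

```python
# A "language" is just a set of strings over an alphabet, so the
# operations from 2.2 map directly onto Python set operations.

def concat(L1, L2):
    """The product L1·L2: every string of L1 followed by every string of L2."""
    return {x + y for x in L1 for y in L2}

def power(L, n):
    """L^n = L·L·...·L (n times); L^0 contains only the empty string."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

def closure(L, up_to=3):
    """The Kleene closure L* is infinite; enumerate it only up to L^up_to."""
    result = set()
    for n in range(up_to + 1):
        result |= power(L, n)
    return result

A = {"0", "1"}                  # a language over the alphabet {0, 1}
B = {"00", "11"}
print(sorted(A | B))            # union
print(sorted(concat(A, B)))     # concatenation
print(sorted(closure(A, 2)))    # '', '0', '1', '00', '01', '10', '11'
```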

2.3 Grammar

  • Grammar definition: a grammar is the set of formal rules describing the syntactic structure of a language. Any grammar can be represented as a four-tuple G = (V_{T}, V_{N}, S, \varphi), where
    • V_{T} is a non-empty finite set whose elements are called terminal symbols;
    • V_{N} is a non-empty finite set whose elements are called non-terminal symbols;
    • S is a special non-terminal symbol called the start symbol of the grammar; S must appear at least once on the left side of some production;
    • \varphi is a non-empty finite set whose elements are called productions.
  • Grammar classification:
    • Type 0 grammar: phrase-structure grammar (unrestricted).
    • Type 1 grammar: Context-sensitive grammar.
    • Type 2 grammar: Context-free grammar.
    • Type 3 grammar: regular grammar (linear grammar)
  • Grammar writing conventions:
    • Usually used as terminal symbols: lowercase letters early in the alphabet (eg: a, b, c), operator symbols (eg: + - * /), punctuation (eg: parentheses, commas, colons, equal signs), digits, etc.
    • Usually used as non-terminal symbols: uppercase letters (A, B, C, D, S) or lowercase italic names.
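The four-tuple can be written down directly in Python. Here is the classic expression grammar as a sketch; the dictionary representation and the `check_grammar` helper are my own choices, not a standard library:

```python
# G = (V_T, V_N, S, P) for the classic expression grammar.
grammar = {
    "V_T": {"id", "+", "*", "(", ")"},
    "V_N": {"E", "T", "F"},
    "S": "E",
    "P": {                       # productions: A -> list of alternatives
        "E": [["E", "+", "T"], ["T"]],
        "T": [["T", "*", "F"], ["F"]],
        "F": [["(", "E", ")"], ["id"]],
    },
}

def check_grammar(G):
    """Sanity-check the four-tuple: S must be a non-terminal, every LHS
    must be a non-terminal, and every symbol used in a production must
    belong to V_T or V_N."""
    assert G["S"] in G["V_N"]
    symbols = G["V_T"] | G["V_N"]
    for lhs, alts in G["P"].items():
        assert lhs in G["V_N"]
        for alt in alts:
            assert all(s in symbols for s in alt)
    return True

print(check_grammar(grammar))    # True

# One leftmost derivation of "id + id" in this grammar:
# E => E + T => T + T => F + T => id + T => id + F => id + id
```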

2.4.1 Elimination of grammatical ambiguity

Two ways to eliminate grammatical ambiguity:
① rewrite the ambiguous grammar into an unambiguous grammar;
② specify the precedence and associativity of the symbols in the ambiguous grammar, so that only one parse tree is generated.

Left associativity:
For a production A → αAβ in which A appears on both sides, if A appears to the left of the terminal symbol (that is, the terminal symbol is in β), then the production is left-associative.
For example:

  • E → E + T is left-associative (E plays the role of A; + is the terminal symbol and E appears to its left).
  • F → (E) | -F is right-associative (F plays the role of A; - is the terminal symbol and F appears to its right).
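As a quick illustration of why associativity matters for method ②: under the ambiguous grammar E → E - E | num, the input 8 - 3 - 2 has two parse trees with different values, and declaring "-" left-associative picks the first one. A tiny sketch:

```python
from functools import reduce

nums = [8, 3, 2]

# Left-associative parse tree: ((8 - 3) - 2), a left fold.
left = reduce(lambda a, b: a - b, nums)

# Right-associative parse tree: (8 - (3 - 2)), a right fold by hand.
right = nums[0] - (nums[1] - nums[2])

print(left, right)   # 3 7  -- two trees, two different answers
```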

2.4.2 Elimination of left recursion (to be expanded later)

  • Purpose: top-down parsing methods cannot handle left-recursive grammars, so left recursion must be eliminated.
  • Example: A → Aα | β is rewritten as A → βA′, A′ → αA′ | ε.

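The standard rewrite for immediate left recursion can be sketched as a small function; the representation of productions is my own, purely for illustration:

```python
# Eliminating immediate left recursion: A -> Aα | β becomes
#   A  -> β A'
#   A' -> α A' | ε
# Productions are lists of symbol lists; "ε" marks the empty string.

def eliminate_immediate_left_recursion(A, alternatives):
    """Split A's alternatives into left-recursive ones (A α) and the
    rest (β), then rebuild the two rules above."""
    recursive = [alt[1:] for alt in alternatives if alt[0] == A]   # the α parts
    others = [alt for alt in alternatives if alt[0] != A]          # the β parts
    if not recursive:
        return {A: alternatives}                                   # nothing to do
    A1 = A + "'"                                                   # fresh non-terminal A'
    return {
        A: [beta + [A1] for beta in others],
        A1: [alpha + [A1] for alpha in recursive] + [["ε"]],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | ε
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```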

2.5 Finite automata (to be expanded later)

Definition: a finite automaton is a mathematical model of a system with discrete inputs and outputs.

Focus: state transition diagram.
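A state transition diagram is just a lookup table once written down. Below is a minimal DFA, assumed purely for illustration, that accepts binary strings ending in "01"; state names q0, q1, q2 are my own:

```python
# Transition table: (current state, input symbol) -> next state.
transitions = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q0",
}
start, accepting = "q0", {"q2"}

def accepts(s):
    state = start
    for ch in s:
        state = transitions[(state, ch)]   # exactly one move per symbol: deterministic
    return state in accepting

print(accepts("1101"))   # True  (ends in 01)
print(accepts("110"))    # False
```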

Chapter 3 Lexical Analysis

Lexical analysis takes the input source program (a character string) and outputs all the legal words (tokens) appearing in it. For example, after lexical analysis, int a = 3 + 5; yields the seven words int, a, =, 3, +, 5, and ;. The standard way to implement a lexical analyzer is:
    1. Write a regular expression for each word class;
    2. Construct an NFA (nondeterministic finite automaton) from the regular expressions;
    3. Convert the NFA into a DFA (deterministic finite automaton);
    4. Implement the lexical analyzer as a program based on the DFA.
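In practice, step 1 can be sketched with Python's `re` module, which hides the NFA/DFA machinery of steps 2-4 inside `finditer`. The token class names below are my own choices:

```python
import re

# One regular expression per word class (step 1 of the recipe above).
TOKEN_SPEC = [
    ("KEYWORD", r"\bint\b"),
    ("NUMBER",  r"\d+"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[=+\-*/]"),
    ("SEMI",    r";"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    tokens = []
    for m in MASTER.finditer(src):
        if m.lastgroup != "SKIP":            # drop whitespace
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("int a = 3 + 5;"))
# [('KEYWORD', 'int'), ('IDENT', 'a'), ('OP', '='), ('NUMBER', '3'),
#  ('OP', '+'), ('NUMBER', '5'), ('SEMI', ';')]  -- the seven words above
```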

Chapter 4 Syntax Analysis

4.1 Commonly used parsing methods

  • Task of syntax analysis: identify the grammatical components from the token sequence of the source program according to the grammar, performing syntax checking at the same time, in preparation for semantic analysis and code generation.
  • Top-down analysis: build a parse tree for the input token sequence from the top (root) down to the bottom (leaves). eg: predictive parsers use top-down analysis.
  • Bottom-up analysis: build a parse tree for the input token sequence from the bottom up. eg: LR parsers use bottom-up analysis.

4.2 Handling of errors

  • Lexical errors: illegal symbols, etc.
  • Syntax errors: mismatched parentheses, missing operands, etc.
  • Semantic errors: type-incompatible operations, mismatches between actual and formal parameters, etc.
  • Logical errors: infinite recursive calls, etc.

4.3 Top-down analysis methods (to be expanded later)

  1. Recursive descent analysis (the theoretical basis): try to find a leftmost derivation that derives the input symbol string step by step.
  2. Recursive predictive analysis: suitable for LL(1) grammars. Built on recursive descent analysis:
    • First eliminate left recursion from the grammar, then compute the FIRST and FOLLOW sets and decide whether the grammar is LL(1). (This is only a precondition: it determines whether the grammar can be LL(1).)
    • Use state transition diagrams to recognize the input symbol string (a simplification of the transition diagrams of the predictive parser).
  3. Non-recursive predictive analysis: construct a predictive parsing table and select productions according to it.
    • Construction of the predictive parsing table: (omitted)
    • Analysis of a symbol string using the predictive parsing table: (omitted)
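A minimal sketch of recursive predictive analysis (item 2 above), assuming the grammar E → T E′, E′ → + T E′ | ε, T → id, i.e. the expression grammar after left-recursion elimination with T simplified to a single token. FIRST(+ T E′) = {+}, so E′ expands on '+' and takes ε otherwise:

```python
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"   # "$" = end of input

    def match(t):
        nonlocal pos
        if peek() != t:
            raise SyntaxError(f"expected {t}, got {peek()}")
        pos += 1

    def E():            # E -> T E'
        T()
        E_prime()

    def E_prime():      # E' -> + T E' | ε
        if peek() == "+":            # '+' is in FIRST(+ T E')
            match("+")
            T()
            E_prime()
        # otherwise take E' -> ε (legal when peek() is in FOLLOW(E'))

    def T():            # T -> id
        match("id")

    E()
    return peek() == "$"             # success iff all input was consumed

print(parse(["id", "+", "id", "+", "id"]))   # True
```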

4.4 Bottom-up analysis methods (no need to eliminate left recursion). The key is to find the "reducible string" of the current sentential form, then decide according to the productions which non-terminal to reduce it to. (In canonical reduction, the key point is finding the handle.)

  • Process: try to construct a parse tree for the input string from the bottom up, starting at the leaves and working toward the root, scanning from left to right. Starting from the input string, find the "reducible string" of the current sentential form, reduce it to the corresponding non-terminal using a production to obtain a new sentential form, and repeat this process until everything has been reduced to the start symbol of the grammar. (Similar to stack-based algorithms for expression evaluation and infix-to-postfix conversion.)
  • LR analysis (for context-free grammars):
    • Meaning of the letters:
      • L: the input string is scanned from left to right.
      • R: sentences are recognized by rightmost analysis, i.e. by constructing the reverse of a rightmost derivation.
      • k: the number of lookahead symbols of the input string.
    • Components: input, output, stack, parsing control program, parsing table.
    • Structure of the parsing table:
      • First construct a finite automaton that recognizes all viable prefixes of the grammar, then build the grammar's parsing table from this automaton.
      • Note on dotted items: the part to the left of the dot is already on the stack; the part to the right has not yet been pushed. (An item written this way represents the current state.)
    • Analysis process: (omitted)
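Real LR parsers drive the stack from a parsing table and states. As a toy illustration of "find the reducible string and reduce", here is a hand-rolled shift-reduce recognizer that hard-codes the handles of the tiny grammar E → E + n | n; everything here is my own sketch, not the LR algorithm itself:

```python
def shift_reduce(tokens):
    stack, actions = [], []
    for tok in tokens + ["$"]:           # "$" = end-of-input marker
        # Reduce as long as a handle sits on top of the stack.
        while True:
            if stack[-3:] == ["E", "+", "n"]:
                stack[-3:] = ["E"]
                actions.append("reduce E -> E + n")
            elif stack == ["n"]:         # only the very first n is this handle
                stack[:] = ["E"]
                actions.append("reduce E -> n")
            else:
                break
        if tok != "$":
            stack.append(tok)            # shift the next input symbol
            actions.append(f"shift {tok}")
    return stack == ["E"], actions       # accepted iff only the start symbol remains

ok, actions = shift_reduce(["n", "+", "n"])
print(ok)          # True
print(actions)     # the shift/reduce trace, ending in 'reduce E -> E + n'
```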

Chapter 5 Syntax-Directed Translation

5.1 Task + Step

  • Goal: convert the source program into an equivalent target program; that is, the target program must have the same semantics as the source program.
  • Steps: first determine, according to the translation goal, the semantics carried by each production; attach these semantics to the corresponding grammar symbols in the form of attributes (binding semantics to the language structure). Then, following the semantics of each production, give the computation rules for the symbol attributes (i.e. the semantic rules), which together form a syntax-directed definition. When a production is used during parsing, its attributes can then be evaluated according to the corresponding semantic rules to complete the translation. (The translation goal determines the meaning of each production, the attributes the grammar symbols should carry, and the semantic rules attached to the productions.)
  • Task: parse the token string and construct a parse tree, then build an attribute dependency graph as needed, traverse the tree, and evaluate the semantic rules at each node.
  • How is semantic information represented?
    • Attach semantic attributes to the grammar symbols of the CFG to represent the semantic information of the corresponding grammatical components.
    • For example, the semantic attributes of a variable can include its value, its type, and so on.
  • How are semantic attributes computed?
    • The attribute values of a grammar symbol are computed by the semantic rules associated with the production (grammar rule) in which the symbol occurs.
    • For a given input string x, construct the parse tree of x and use the semantic rules associated with the productions to compute the attribute values of every node in the tree.
  • Syntax-Directed Definitions (SDD)
    • An SDD is a generalization of a context-free grammar (CFG):
      • each grammar symbol is associated with a set of semantic attributes;
      • each production is associated with a set of semantic rules, used to compute the attribute values of the grammar symbols in the production.
    • If X is a grammar symbol and a is an attribute of X, then X.a denotes the value of attribute a at a parse-tree node labeled X. (X may be a terminal or a non-terminal.)

 

Syntax-directed definition: each production A → α has an associated set of semantic rules of the form b = f(c, d, …), where f is a function, and:
1) if b is a synthesized attribute of A, then c, d, … are attributes of the grammar symbols on the right side of the production;
2) if b is an inherited attribute of a grammar symbol on the right side of the production, then c, d, … are attributes of A or of any grammar symbols on the right side.

Synthesized attributes: in the parse tree, the synthesized attributes of a node are computed from the attribute values of its children. They pass information from the bottom up (e.g. the value of an expression).

Inherited attributes: an inherited attribute of a grammar symbol on the right side of a production is computed from the inherited attributes of the left-side symbol and/or the attribute values of the other right-side symbols. They pass information from the top down (e.g. the type of a variable in a declaration).

Note: terminal symbols have only synthesized attributes, which are supplied by the lexical analyzer. Non-terminal symbols may have both synthesized and inherited attributes. All inherited attributes of the grammar's start symbol are given initial values before attribute evaluation begins.
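The bottom-up flow of synthesized attributes can be sketched as an S-attributed "desk calculator". Assuming the grammar E → E + T | T, T → num with the rules E.val = E1.val + T.val, E.val = T.val, and T.val = num.lexval, each parsing function below returns the synthesized attribute .val, so attribute values flow upward exactly as described above:

```python
def evaluate(tokens):
    pos = 0

    def T():
        nonlocal pos
        num = tokens[pos]
        pos += 1
        return int(num)                  # T.val = num.lexval (from the lexer)

    def E():
        nonlocal pos
        val = T()                        # E.val = T.val
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            val = val + T()              # E.val = E1.val + T.val
        return val

    return E()

print(evaluate(["2", "+", "3", "+", "4"]))   # 9
```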

5.2 Syntax-directed definitions: an extended form of context-free grammar

5.2.1 Dependency graph (omitted)

5.2.2 Construction of the annotated parse tree (key point): refer to the reduction-based construction used in LR parsing.

Chapter 6 Semantic Analysis

6.1 Task

  • Link each use of a variable to its definition, check the meaning of each statement, and verify that every grammatical component has correct semantics.
  • Task: perform context-sensitive checks and type checking on source programs that are structurally correct. Semantic analysis examines the source program for semantic errors and collects type information for the code generation phase.

6.2 Symbol table

The symbol table is built during the first pass, alongside lexical analysis.

Contents of a symbol table entry: variable name, data type, target address, dimension or number of parameters, declaration line, reference lines, link field.

6.3 Symbol table operations

The most common operations are insertion and retrieval; they differ slightly depending on whether the compiled source language has explicit declarations. A symbol table lookup is performed for every variable reference, and the retrieved information (type, target address, dimensions, etc.) is used for type checking and code generation. Lookups performed at this stage can also detect the use of undefined variables and issue the corresponding error or warning messages.
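A minimal symbol table sketch with the two core operations, insert and lookup, plus nested scopes (a list used as a stack of dictionaries). The stored fields follow the list in 6.2, while the class design itself is my own illustration:

```python
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                       # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, **info):
        """Declare a name in the current scope; a duplicate is an error."""
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"redeclaration of {name}")
        scope[name] = info

    def lookup(self, name):
        """Search from the innermost scope outward; None means undefined."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

table = SymbolTable()
table.insert("a", type="int", decl_line=1)
table.enter_scope()
table.insert("a", type="float", decl_line=3)     # shadows the outer 'a'
print(table.lookup("a")["type"])                 # float
table.exit_scope()
print(table.lookup("a")["type"])                 # int
print(table.lookup("b"))                         # None -> undefined variable
```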

Chapter 8 Intermediate Code Generation

8.1 Intermediate code forms (postfix notation, syntax tree, DAG, three-address code)

Task: the intermediate code generator translates the representation of the source program produced by the analysis phase into an intermediate code representation. Intermediate code makes compilers easier to build and port, and enables machine-independent code optimization.

This chapter uses the quadruple representation (required by the exam).

example:
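The lost example images can be replaced by a sketch: translating `a = b + c * d` into quadruples (op, arg1, arg2, result), with temporaries generated by the compiler. All helper names here are my own:

```python
quads = []
temp_count = 0

def new_temp():
    """Generate a fresh compiler temporary t1, t2, ..."""
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def emit(op, arg1, arg2, result):
    quads.append((op, arg1, arg2, result))
    return result

def gen(node):
    """node is either a name (str) or a tuple (op, left, right)."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return emit(op, gen(left), gen(right), new_temp())

# a = b + c * d  -- the tree already reflects * binding tighter than +.
result = gen(("+", "b", ("*", "c", "d")))
emit("=", result, "-", "a")
for q in quads:
    print(q)
# ('*', 'c', 'd', 't1')
# ('+', 'b', 't1', 't2')
# ('=', 't2', '-', 'a')
```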

I hope this year I can eat more without getting fat, and stay positive!!

 

Origin: blog.csdn.net/qq_36837949/article/details/113928166