Summary of Compiler Principles
- Mind map
- Introduction
- Lexical analysis
  - Word classification; input and output of lexical analysis
  - Word formation rules
  - The concepts of grammar and language and their mutual derivation
  - Leftmost and rightmost derivations; constructing the syntax tree
  - Ambiguity
  - Grammar classification
  - Regular expressions, regular sets
  - Automata
  - The structure of DFA and NFA
- Syntax analysis
- Embedding semantic analysis in syntax analysis
- Intermediate language representation and syntax-directed translation examples
- Symbol table
- Optimization
- Target code generation
- Target code execution
Mind map
Introduction
What is a compiler?
A program that translates a program written in a high-level language into an equivalent program in a lower-level language.
What are the stages of the compilation process?
Lexical analysis, syntax analysis, semantic analysis and intermediate code generation, intermediate code optimization, and target code generation.
Draw the structure diagram of the compiler.
Source program → lexical analyzer → syntax analyzer → semantic analyzer and intermediate code generator → optimizer → target code generator → target program
The symbol table manager and the error handler are connected to every phase.
What is the difference between a compiler and an interpreter?
An interpreter either executes the source program directly while analyzing it, or first translates it into some intermediate representation and then executes that representation; control stays in the interpreter.
A compiler translates the source program into a target-language program, which is then run on the computer; control is in the target program.
Why use the concepts of pass and phase?
Pass: one complete scan of the source program (or of an intermediate representation) from beginning to end, doing the relevant processing and producing a new intermediate result or the target program.
One pass may contain several phases, and one phase may be split across several passes. Organizing the compiler this way gives it a clearer structure and makes it more readable.
Lexical analysis
Word classification; input and output of lexical analysis
Classification: basic words (keywords), identifiers, constants, operators, delimiters.
Input: the source program. Output: a sequence of word symbols (tokens).
Word formation rules
Representation: (word category, attribute value of the word symbol).
The word category is usually encoded as an integer:
1. If a category contains only one word symbol:
the category code alone identifies the word symbol.
Basic words (keywords), operators, and delimiters are generally one symbol per category.
2. If a category contains several word symbols:
each word symbol is given as its category code plus attribute information.
Identifiers form a single category; the attribute is a pointer to the place where the identifier's information is stored.
Constants are grouped by type; the attribute is the constant's value in standard binary form.
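A minimal sketch of this (category, attribute) representation, assuming a hypothetical toy language: the category codes, the keyword list, and the name `tokenize` are illustrative choices, not taken from any particular textbook. Keywords, operators, and delimiters get one code per symbol and no attribute; identifiers share one code and carry a "pointer" (here, an index into a list acting as the symbol table); integer constants carry their binary value.

```python
import re

# Hypothetical category codes for a toy language.
# Identifiers and integer constants each form one category with an attribute;
# every keyword, operator and delimiter has a code of its own.
ID, INT = 1, 2
SINGLE = {"if": 3, "then": 4, "else": 5, "while": 6, "do": 7,
          ":=": 8, "+": 9, "*": 10, "<": 11, ";": 12, "(": 13, ")": 14}

TOKEN_RE = re.compile(r"\s*(?:([A-Za-z][A-Za-z0-9]*)|(\d+)|(:=|[+*<;()]))")

def tokenize(source):
    """Return (tokens, symbol_table); each token is (category_code, attribute)."""
    tokens, symbol_table = [], []
    source = source.rstrip()          # trailing blanks would otherwise fail to match
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"illegal character at position {pos}")
        word, number, symbol = m.groups()
        if word is not None and word in SINGLE:    # keyword: the code alone identifies it
            tokens.append((SINGLE[word], None))
        elif word is not None:                     # identifier: index into the table
            if word not in symbol_table:
                symbol_table.append(word)
            tokens.append((ID, symbol_table.index(word)))
        elif number is not None:                   # constant: its binary value
            tokens.append((INT, int(number)))
        else:                                      # operator or delimiter
            tokens.append((SINGLE[symbol], None))
        pos = m.end()
    return tokens, symbol_table

print(tokenize("while x < 10 do x := x + 1"))
```

Keeping identifiers as table indices means later phases compare small integers instead of strings, which is the point of the "pointer" attribute described above.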
The concepts of grammar and language and their mutual derivation
Language: any subset L of Σ*, the set of all strings over an alphabet Σ, is called a language over Σ; each string in L is called a sentence.
Grammar: the formal rules used to describe the syntactic structure of a language, written G = (Vn, Vt, P, S): nonterminals, terminals, productions, and the start symbol.
The language of a grammar, L(G): the set of all terminal strings that can be derived from the start symbol S.
Leftmost and rightmost derivations; constructing the syntax tree
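Leftmost derivation: at every step, replace the leftmost nonterminal.
Rightmost derivation: at every step, replace the rightmost nonterminal.
Syntax tree: every node is labelled with a symbol from Vt ∪ Vn; the root is labelled S, each interior node is a nonterminal whose children spell the right-hand side of one production, and for a sentence every leaf is a terminal (Vt).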
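The two derivation orders can be sketched in a few lines. The grammar E → E + T | T, T → id and the function `derive` are illustrative assumptions; each entry of `choices` selects which alternative to apply to the nonterminal being replaced.

```python
# Toy grammar (an illustrative assumption): E -> E + T | T,  T -> id
GRAMMAR = {"E": [["E", "+", "T"], ["T"]], "T": [["id"]]}

def derive(choices, leftmost=True, start="E"):
    """Apply one production per entry in `choices`; return every sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # positions of all nonterminals in the current sentential form
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

print(" => ".join(derive([0, 1, 0, 0], leftmost=True)))
# E => E + T => T + T => id + T => id + id
print(" => ".join(derive([0, 0, 1, 0], leftmost=False)))
# E => E + T => E + id => T + id => id + id
```

Both derivations produce id + id and correspond to the same syntax tree; they differ only in the order in which the nonterminals are expanded.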
Ambiguity
Ambiguous grammar: some sentence of the grammar has two (or more) different syntax trees, or equivalently two different leftmost derivations.
Inherently ambiguous language: every grammar that generates the language is ambiguous.
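To make the definition concrete, the sketch below counts the distinct syntax trees a sentence has under the classic ambiguous grammar E → E + E | E * E | id (an illustrative choice). Each split at an operator corresponds to choosing the production used at the root of a subtree, so any count above 1 shows that the grammar is ambiguous. The function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Number of syntax trees for `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # trees deriving tokens[i:j] from E
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):        # E -> E op E with op at position k
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id * id".split()))   # 2
```

The two trees for id + id * id group either id + id or id * id first, which is exactly why the grammar gives no unique meaning to the expression.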
Grammar classification:
As in the mind map: types 0, 1, 2, and 3 (phrase-structure, context-sensitive, context-free, and regular grammars).
Regular expressions, regular sets:
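Over an alphabet Σ: ε and ∅ are regular expressions, denoting the regular sets {ε} and ∅; each symbol a ∈ Σ is a regular expression denoting the regular set {a}.
If r and s are regular expressions, so are r|s (union), rs (concatenation), and r* (closure); hence regular sets are closed under union, concatenation, and closure.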
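Python's `re` module can illustrate the three operations. The patterns below are illustrative: `[A-Za-z][A-Za-z0-9]*` is the usual identifier definition letter(letter|digit)*, and `(a|b)*abb` combines union, concatenation, and closure. `fullmatch` succeeds only if the whole string belongs to the set. Note that Python patterns can express more than regular expressions in general (for example, backreferences); these two patterns stay within the regular operations.

```python
import re

# letter (letter | digit)*: the usual identifier definition
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")
# (a|b)*abb: union, concatenation and closure combined
ENDS_WITH_ABB = re.compile(r"(a|b)*abb")

def in_regular_set(pattern, text):
    """True if the whole of `text` belongs to the set denoted by `pattern`."""
    return pattern.fullmatch(text) is not None

for s in ["x1", "count", "9lives"]:
    print(s, in_regular_set(IDENTIFIER, s))
# x1 True / count True / 9lives False
```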
Automata
Introduction: a model that reads input symbols one at a time and, for each symbol, moves to a next state determined by the current state and that symbol.
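A DFA can be simulated with a transition table and a single loop. The table below is a minimal DFA for (a|b)*abb (an illustrative example), where each state records the longest suffix of the input read so far that is also a prefix of "abb"; the names `DELTA` and `accepts` are hypothetical.

```python
# Minimal DFA for (a|b)*abb.
# A: no progress; B: input ends in "a"; C: ends in "ab"; D: ends in "abb" (accepting).
DELTA = {
    ("A", "a"): "B", ("A", "b"): "A",
    ("B", "a"): "B", ("B", "b"): "C",
    ("C", "a"): "B", ("C", "b"): "D",
    ("D", "a"): "B", ("D", "b"): "A",
}
START, ACCEPTING = "A", {"D"}

def accepts(text):
    state = START
    for symbol in text:                 # read the input one symbol at a time
        if (state, symbol) not in DELTA:
            return False                # no transition defined: reject
        state = DELTA[(state, symbol)]
    return state in ACCEPTING

print(accepts("babb"), accepts("abba"))  # True False
```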
The structure of DFA and NFA
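1. Formal definitions: a DFA is a 5-tuple M = (S, Σ, f, s0, F) whose transition function f: S × Σ → S is single-valued; an NFA allows a set of next states for each state and symbol, and may also have ε-transitions.
2. NFA determinization (subset construction)
3. DFA minimization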
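Steps 2 and 3 can be sketched as follows. The inputs are illustrative: an NFA for (a|b)*abb with states 0 to 3 (accepting state 3), and a five-state DFA for the same language in which states A and C are equivalent. ε-moves are encoded under the empty-string key `""`. Determinization uses the usual ε-closure/move subset construction; minimization uses partition refinement, splitting groups until all states in a group agree, for every input symbol, on which group the transition leads to.

```python
def epsilon_closure(nfa, states):
    """All NFA states reachable from `states` using only ε-moves."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, ""), set()):      # "" marks an ε-move
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(nfa, start, alphabet):
    """Determinize an NFA: each DFA state is a set of NFA states."""
    d_start = epsilon_closure(nfa, {start})
    d_states, d_trans, unmarked = [d_start], {}, [d_start]
    while unmarked:
        T = unmarked.pop()
        for c in alphabet:
            moved = set()
            for s in T:
                moved |= nfa.get((s, c), set())
            U = epsilon_closure(nfa, moved)
            if not U:
                continue                       # no move on c: leave the transition out
            if U not in d_states:
                d_states.append(U)
                unmarked.append(U)
            d_trans[(T, c)] = U
    return d_start, d_states, d_trans

def minimize(states, alphabet, delta, accepting):
    """Partition refinement on a complete DFA; returns groups of equivalent states."""
    partition = [g for g in (set(accepting), set(states) - set(accepting)) if g]

    def group_of(s):
        return next(i for i, g in enumerate(partition) if s in g)

    while True:
        refined = []
        for group in partition:
            buckets = {}
            for s in group:
                # states stay together only if each symbol leads to the same group
                signature = tuple(group_of(delta[(s, c)]) for c in alphabet)
                buckets.setdefault(signature, set()).add(s)
            refined.extend(buckets.values())
        if len(refined) == len(partition):     # nothing was split: stable
            return refined
        partition = refined

# NFA for (a|b)*abb
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
start, dstates, dtrans = subset_construction(NFA, 0, "ab")
print(len(dstates))                              # 4
print([sorted(S) for S in dstates if 3 in S])    # [[0, 3]]

# A five-state DFA for (a|b)*abb in which A and C are equivalent
DFA5 = {("A", "a"): "B", ("A", "b"): "C", ("B", "a"): "B", ("B", "b"): "D",
        ("C", "a"): "B", ("C", "b"): "C", ("D", "a"): "B", ("D", "b"): "E",
        ("E", "a"): "B", ("E", "b"): "C"}
groups = minimize("ABCDE", "ab", DFA5, {"E"})
print(sorted(sorted(g) for g in groups))         # [['A', 'C'], ['B'], ['D'], ['E']]
```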
Syntax analysis
Input and output
Input: the sequence of word symbols (tokens)
Output: the parse tree
Top-down parsing: problems and solutions
Problems: left recursion (which can make the parser loop forever), backtracking, false matches, failed analyses, difficulty locating errors, low efficiency, and high cost.
Solutions: rewrite left recursion as right recursion; extract left factors to eliminate backtracking; compute FIRST and FOLLOW sets; restrict to LL(1) grammars and drive parsing with a predictive parsing table.
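As an illustration, the classic expression grammar E → E + T | T, T → T * F | F, F → ( E ) | id becomes LL(1) once its left recursion is removed. The sketch below hard-codes the resulting predictive parsing table (entries worked out by hand from the FIRST and FOLLOW sets) and runs the standard stack-driven predictive parser. The names `TABLE` and `ll1_parse` are illustrative.

```python
# Predictive parsing table for the grammar after removing left recursion:
#   E  -> T E'        E' -> + T E' | ε
#   T  -> F T'        T' -> * F T' | ε
#   F  -> ( E ) | id
# Each entry maps (nonterminal, lookahead) to the right-hand side to expand;
# an empty list stands for ε.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Return the productions applied (a leftmost derivation), or None on error."""
    stack = ["$", "E"]                  # start symbol above the end marker
    stream = list(tokens) + ["$"]
    pos, used = 0, []
    while stack:
        top = stack.pop()
        lookahead = stream[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:             # empty table cell: syntax error
                return None
            used.append((top, rhs))
            stack.extend(reversed(rhs)) # leftmost symbol of rhs ends up on top
        elif top == lookahead:          # terminal (or $) matches the input
            pos += 1
        else:
            return None
    return used

for lhs, rhs in ll1_parse(["id", "+", "id", "*", "id"]):
    print(lhs, "->", " ".join(rhs) or "ε")
```

Because every table cell holds at most one production, the parser never backtracks: the lookahead symbol alone decides each expansion.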
Bottom-up parsing: the core issue
Identifying the substring that can be reduced and reducing it.
Canonical reduction and rightmost derivation
Canonical reduction: the reverse of a rightmost (canonical) derivation.
The sentential forms that appear in a canonical derivation are called canonical sentential forms.
The essence of canonical reduction: as soon as a handle appears, reduce it.
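Phrases, direct phrases, sentential forms, handles
Phrase: the string of leaves (frontier) of any subtree of the syntax tree of a sentential form.
Direct (simple) phrase: the frontier of a subtree whose root's children are all leaves, i.e. a phrase produced by a single production step.
Sentential form: any string α over Vt ∪ Vn such that S ⇒* α; a sentence is a sentential form containing only terminals.
Handle: the leftmost direct phrase of a sentential form.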
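A worked example may help connect these definitions. With the grammar S → aAcBe, A → b | Ab, B → d (a common textbook example, used here for illustration), the rightmost derivation of abbcde and the canonical reduction that reverses it are shown below; at each reduction step the underlined substring is the handle, i.e. the leftmost direct phrase of the current sentential form.

```latex
\begin{aligned}
&\text{Rightmost derivation:}\\
&S \Rightarrow aAcBe \Rightarrow aAcde \Rightarrow aAbcde \Rightarrow abbcde\\[4pt]
&\text{Canonical reduction (handle underlined):}\\
&a\underline{b}bcde \;\to\; a\underline{Ab}cde \;\to\; aAc\underline{d}e \;\to\; \underline{aAcBe} \;\to\; S
\end{aligned}
```

Note that the first b, not the second, is the handle of abbcde: reducing the second b to A would produce aAAcde, from which S can no longer be reached.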
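Prefix, viable prefix, valid item
Prefix: any leading substring of a string (ε included).
Viable prefix (live prefix): a prefix of a canonical sentential form that does not extend past the right end of the handle of that form.
Valid item: an item A → β1·β2 is valid for the viable prefix αβ1 if there is a rightmost derivation S' ⇒* αAw ⇒ αβ1β2w.
Canonical collection of item sets: the family of item sets, linked by transitions on grammar symbols, forms a DFA that recognizes exactly the viable prefixes.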
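The canonical collection can be computed with the usual closure and goto functions. The sketch below assumes LR(0) items written as (left side, right side, dot position) and uses the small augmented grammar S' → E, E → E + T | T, T → id as an illustrative input; the hypothetical helper `is_viable_prefix` runs the resulting DFA over a string of grammar symbols.

```python
# Augmented grammar (illustrative): S' -> E,  E -> E + T | T,  T -> id
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted({sym for _, rhs in GRAMMAR for sym in rhs})

def closure(items):
    """Add B -> ·γ for every item whose dot stands before a nonterminal B."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(result):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for b_lhs, b_rhs in GRAMMAR:
                    if b_lhs == rhs[dot] and (b_lhs, b_rhs, 0) not in result:
                        result.add((b_lhs, b_rhs, 0))
                        changed = True
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved) if moved else None

def canonical_collection():
    """Return the LR(0) item sets and the DFA transitions between them."""
    states = [closure({GRAMMAR[0] + (0,)})]    # I0 = closure({S' -> ·E})
    transitions, work = {}, [0]
    while work:
        i = work.pop()
        for sym in SYMBOLS:
            target = goto(states[i], sym)
            if target is None:
                continue
            if target not in states:
                states.append(target)
                work.append(len(states) - 1)
            transitions[(i, sym)] = states.index(target)
    return states, transitions

def is_viable_prefix(symbols, transitions):
    """A string is a viable prefix iff the DFA can read all of it from I0."""
    state = 0
    for sym in symbols:
        if (state, sym) not in transitions:
            return False
        state = transitions[(state, sym)]
    return True

states, transitions = canonical_collection()
print(len(states))                                        # 6
print(is_viable_prefix(["E", "+", "id"], transitions))    # True
print(is_viable_prefix(["T", "+"], transitions))          # False
```

T + is rejected because in a form such as T + id the handle is T itself, so no viable prefix may continue past it.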
Embedding semantic analysis in syntax analysis
Syntax-directed translation
Static semantic checking and intermediate code generation are carried out together with syntax analysis.
Syntax-directed definition
A context-free grammar in which each grammar symbol (in Vt or Vn) is given a set of attributes and each production is given a set of semantic rules for computing those attributes.
Attribute grammar
A syntax-directed definition whose semantic rules have no side effects.
No side effects: a rule only computes attribute values.
S-attributed and L-attributed grammars
Synthesized attribute: a synthesized attribute of nonterminal A at parse-tree node N is defined only in terms of attribute values of N's children and of N itself.
Inherited attribute: an inherited attribute of nonterminal A at parse-tree node N is defined only in terms of attribute values of N's parent, N itself, and N's siblings.
S-attributed grammar: a syntax-directed definition that uses only synthesized attributes.
L-attributed grammar: a syntax-directed definition in which, for each production A → X1X2…Xn, every inherited attribute of Xj depends only on the inherited attributes of A and on attributes of X1, …, Xj-1 (the symbols to its left); synthesized attributes are unrestricted, so every S-attributed grammar is also L-attributed.
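An S-attributed definition can be evaluated in one post-order traversal, because each node's synthesized attribute depends only on its children. The sketch assumes the desk-calculator grammar E → E + T | T, T → T * F | F, F → digit with rules such as E.val = E1.val + T.val and T.val = T1.val * F.val; the `Node` class and the hand-built tree for 3 * 5 + 4 are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str                                   # grammar symbol labelling the node
    children: list = field(default_factory=list)
    lexval: int = 0                               # set by the lexer for 'digit' leaves
    val: int = 0                                  # synthesized attribute

def evaluate(node):
    """Post-order traversal: children first, so every rule below reads
    only attributes of the node's children (S-attributed)."""
    for child in node.children:
        evaluate(child)
    kids = node.children
    if node.symbol == "digit":
        node.val = node.lexval                    # leaf: value supplied by the lexer
    elif len(kids) == 1:
        node.val = kids[0].val                    # E -> T, T -> F, F -> digit
    elif len(kids) == 3 and kids[1].symbol == "+":
        node.val = kids[0].val + kids[2].val      # E -> E1 + T
    elif len(kids) == 3 and kids[1].symbol == "*":
        node.val = kids[0].val * kids[2].val      # T -> T1 * F
    return node.val

def factor(d):
    """Subtree F -> digit for the digit d."""
    return Node("F", [Node("digit", lexval=d)])

# Parse tree of 3 * 5 + 4
tree = Node("E", [
    Node("E", [Node("T", [Node("T", [factor(3)]), Node("*"), factor(5)])]),
    Node("+"),
    Node("T", [factor(4)]),
])
print(evaluate(tree))   # 19
```

In a bottom-up parser the same rules can run at each reduction, since a reduction happens exactly when all children of the new node are complete.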
Translation scheme
Attributes (values) are associated with grammar symbols (Vt, Vn), and semantic actions enclosed in "{}" are inserted into the right-hand sides of productions; the position of each action shows when it is executed during parsing.
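A classic translation scheme turns infix expressions into postfix. Because each print action sits after the operand it follows, running the actions during a left-to-right recursive-descent parse emits each operator after both of its operands. The sketch assumes single-digit operands, only + and -, and no spaces; `infix_to_postfix` is a hypothetical name.

```python
def infix_to_postfix(text):
    """Translate single-digit infix expressions with + and - into postfix.

    Translation scheme (actions in braces run where they appear):
        expr -> term rest
        rest -> + term { emit('+') } rest
              | - term { emit('-') } rest
              | ε
        term -> digit { emit(digit) }
    """
    out = []
    pos = 0

    def lookahead():
        return text[pos] if pos < len(text) else None

    def match(ch):
        nonlocal pos
        if lookahead() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def term():
        ch = lookahead()
        if ch is None or not ch.isdigit():
            raise SyntaxError(f"digit expected at position {pos}")
        match(ch)
        out.append(ch)                 # { emit(digit) }

    def rest():
        op = lookahead()
        if op in ("+", "-"):
            match(op)
            term()
            out.append(op)             # { emit(op) } runs after the right operand
            rest()
        # otherwise rest -> ε: no action

    term()                             # expr -> term rest
    rest()
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r} at position {pos}")
    return "".join(out)

print(infix_to_postfix("9-5+2"))   # 95-2+
```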
Abstract syntax tree
Removing the information that translation does not need from the syntax tree yields a more efficient intermediate representation of the source program. The transformed tree is called an abstract syntax tree.
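Python's standard `ast` module gives a quick, real-world illustration (it is Python's own front end, not the grammars used elsewhere in these notes). In the tree for (a + b) * c the parentheses and the chains of single-child grammar nodes are gone, leaving only the operators and operands a translator needs. The helper `show` is a hypothetical name that handles only binary operators and names.

```python
import ast

def show(node):
    """Render an expression AST as nested (operator, left, right) tuples."""
    if isinstance(node, ast.BinOp):
        return (type(node.op).__name__, show(node.left), show(node.right))
    if isinstance(node, ast.Name):
        return node.id
    raise TypeError(f"unsupported node {type(node).__name__}")

print(show(ast.parse("(a + b) * c", mode="eval").body))
# ('Mult', ('Add', 'a', 'b'), 'c')
```

Operator precedence and grouping are now carried by the tree's shape alone, which is why the parentheses can be dropped.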
Intermediate language representation and syntax-directed translation examples
Why use an intermediate language
It makes machine-independent code optimization easier,
it makes the compiler easier to port, and
it makes the structure of the compiler logically simple and clear.
Intermediate language representations
Reverse Polish notation (postfix), triples, indirect triples, and quadruples
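The sketch below produces three of these forms for the assignment a := b * -c + b * -c, assuming the expression arrives as a nested-tuple syntax tree in which `"neg"` is unary minus; the function names are illustrative. An indirect-triple representation would add a separate list of triple numbers giving the execution order, which an optimizer can reorder without renumbering the triples themselves.

```python
# a := b * -c + b * -c as a nested-tuple syntax tree ("neg" = unary minus)
EXPR = ("+", ("*", "b", ("neg", "c")), ("*", "b", ("neg", "c")))

def to_postfix(node):
    """Reverse Polish notation: operands first, then the operator."""
    if isinstance(node, str):
        return [node]
    op, *args = node
    return [tok for arg in args for tok in to_postfix(arg)] + [op]

def to_quadruples(target, expr):
    """Quadruples (op, arg1, arg2, result) with fresh temporaries t1, t2, ..."""
    quads = []
    def gen(node):
        if isinstance(node, str):
            return node
        op, *args = node
        operands = [gen(arg) for arg in args]
        temp = f"t{len(quads) + 1}"
        quads.append((op, operands[0], operands[1] if len(operands) > 1 else None, temp))
        return temp
    quads.append((":=", gen(expr), None, target))
    return quads

def to_triples(target, expr):
    """Triples (op, arg1, arg2); an intermediate result is named by its index."""
    triples = []
    def gen(node):
        if isinstance(node, str):
            return node
        op, *args = node
        operands = [gen(arg) for arg in args]
        triples.append((op, operands[0], operands[1] if len(operands) > 1 else None))
        return len(triples) - 1
    triples.append((":=", target, gen(expr)))
    return triples

print(" ".join(to_postfix(EXPR)))        # b c neg * b c neg * +
for i, quad in enumerate(to_quadruples("a", EXPR)):
    print(i, quad)
```

Quadruples name every result explicitly (t1 to t5), while triples refer to results by position, which saves temporaries but makes moving code harder; that difficulty is what indirect triples address.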
Examples of statement translation
Translation of declaration statements
Translation of assignment statements
Translation of control statements
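As a sketch of control-statement translation, the function below emits three-address code for `while cond do body` using two fresh labels: one before the test, one after the loop. It assumes the condition is a relational expression string and the body has already been translated into a list of three-address instructions; the `ifFalse ... goto` instruction form and the helper names `make_label_generator` and `translate_while` are illustrative assumptions.

```python
import itertools

def make_label_generator():
    """Return a function producing fresh labels L1, L2, ..."""
    counter = itertools.count(1)
    return lambda: f"L{next(counter)}"

def translate_while(cond, body, new_label):
    """Three-address code for `while cond do body`."""
    begin, after = new_label(), new_label()
    code = [f"{begin}: ifFalse {cond} goto {after}"]   # test at the top of the loop
    code += [f"    {instr}" for instr in body]
    code.append(f"    goto {begin}")                   # jump back to re-test
    code.append(f"{after}:")
    return code

labels = make_label_generator()
print("\n".join(translate_while("a < b", ["t1 = a + 1", "a = t1"], labels)))
```

Passing the label generator in, rather than using a global counter, lets nested statements share one label sequence while keeping each translation repeatable.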
Symbol table
Symbol table structure
Name field: the identifier's name
Information field: the identifier's attributes (type, kind, size, storage address)
Functions of the symbol table
1. Record the attribute information of identifiers.
2. Look up symbol attributes to check that each use is semantically valid in its context.
3. Serve as the basis for storage (address) allocation during target code generation.
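A minimal sketch of such a table, assuming block-structured scoping: each scope is a dictionary from the name field to an information record (type, kind, size, address), declarations receive consecutive offsets within their scope, and lookup searches from the innermost scope outward. The class and field names are illustrative, not a prescribed layout.

```python
from dataclasses import dataclass

@dataclass
class SymbolInfo:
    type: str      # e.g. "int", "real"
    kind: str      # e.g. "variable", "procedure", "constant"
    size: int      # storage size in bytes
    address: int   # offset within the scope, used later for code generation

class SymbolTable:
    def __init__(self):
        self.scopes = [{}]       # outermost (global) scope
        self.offsets = [0]       # next free offset in each scope

    def enter_scope(self):
        self.scopes.append({})
        self.offsets.append(0)

    def exit_scope(self):
        self.scopes.pop()
        self.offsets.pop()

    def declare(self, name, type_, kind, size):
        """Record an identifier's attributes (function 1); a redeclaration
        in the same scope is a semantic error."""
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"'{name}' is already declared in this scope")
        info = SymbolInfo(type_, kind, size, self.offsets[-1])
        self.offsets[-1] += size  # assign the next free offset (function 3)
        scope[name] = info
        return info

    def lookup(self, name):
        """Find the innermost declaration of `name` (function 2)."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"'{name}' is not declared")

table = SymbolTable()
table.declare("x", "int", "variable", 4)
table.declare("y", "real", "variable", 8)
print(table.lookup("y"))   # address 4: placed after x
```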
Optimization
Purpose:
Produce more efficient code.
Principles:
Equivalence (the program's behavior is preserved), effectiveness, and economy.
Commonly used techniques:
See the mind map.
Basic block concept
A sequence of consecutive statements executed in order, with exactly one entry and one exit: control enters only at the first statement and leaves only at the last.
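Basic blocks are usually found by first marking leaders: the first statement, every statement that is the target of a jump, and every statement that immediately follows a jump. Each block then runs from a leader up to, but not including, the next leader. The sketch assumes a hypothetical instruction format of (label, text) pairs in which jumps end in `goto L`; `basic_blocks` is an illustrative name.

```python
import re

JUMP = re.compile(r"\bgoto (\w+)$")

def basic_blocks(code):
    """Split a list of (label, instruction) pairs into basic blocks."""
    if not code:
        return []
    leaders = {0}                                   # rule 1: the first statement
    index_of = {label: i for i, (label, _) in enumerate(code) if label}
    for i, (_, instr) in enumerate(code):
        m = JUMP.search(instr)
        if m:
            leaders.add(index_of[m.group(1)])       # rule 2: a jump target
            if i + 1 < len(code):
                leaders.add(i + 1)                  # rule 3: the statement after a jump
    starts = sorted(leaders)
    return [code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])]

# s = 1 + 2 + ... + 10 in three-address code
program = [
    (None, "i = 1"),
    (None, "s = 0"),
    ("L1", "if i > 10 goto L2"),
    (None, "s = s + i"),
    (None, "i = i + 1"),
    (None, "goto L1"),
    ("L2", "x = s"),
]
for block in basic_blocks(program):
    print([instr for _, instr in block])
```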
Target code generation
Forms of target code
Absolute machine language, assembly language, or relocatable machine-language modules
Characteristics of each form
Absolute machine language: all addresses are already fixed, so the code can be executed directly.
Assembly language: must be translated into machine language by an assembler.
Relocatable machine-language modules: must be linked with other modules and run-time routines before they become executable machine code.
Target code execution
Activation record
A contiguous block of storage holding the information needed for one execution (activation) of a procedure.
Storage allocation strategies
Static storage allocation
Dynamic storage allocation
Stack storage allocation
1. Static chain (access links) in activation records
2. Display table indexed by nesting level
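The two access methods can be sketched with a hypothetical frame layout. Each activation record below stores the procedure's nesting depth, its local variables, and a static link to the record of the lexically enclosing procedure. A non-local variable declared at depth d is reached from a procedure at depth n by following n - d static links; a display replaces that walk with a table whose entry d points to the most recent activation at depth d. The class, function, and procedure names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ActivationRecord:
    name: str
    depth: int                                   # lexical nesting level
    static_link: Optional["ActivationRecord"]    # record of the enclosing procedure
    variables: dict = field(default_factory=dict)

def lookup_via_static_chain(current, var_depth, name):
    """Follow (current.depth - var_depth) static links, then read the variable."""
    record = current
    for _ in range(current.depth - var_depth):
        record = record.static_link
    return record.variables[name]

def lookup_via_display(display, var_depth, name):
    """display[d] is the most recent activation record at nesting depth d."""
    return display[var_depth].variables[name]

# main (depth 0) declares x; p (depth 1), nested in main, declares y;
# q (depth 2), nested in p, uses both x and y.
main = ActivationRecord("main", 0, None, {"x": 10})
p = ActivationRecord("p", 1, main, {"y": 20})
q = ActivationRecord("q", 2, p)
display = [main, p, q]

print(lookup_via_static_chain(q, 0, "x"))   # 10
print(lookup_via_display(display, 1, "y"))  # 20
```

The static chain costs a walk proportional to the nesting difference on each access, while the display gives constant-time access at the price of updating the table on every call and return.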
Heap storage allocation
Allows data objects to be allocated and freed in any order.