These review notes are based on this book. If anything is written incorrectly, corrections are welcome!
Chapter One
Chapter Two
Operations on symbol strings
- Equality: two symbol strings are equal when they consist of exactly the same symbols in the same order.
- Length: the number of symbols in the string.
- Concatenation: write the second string directly after the first.
- Inverse: a superscript -1 at the upper right of a symbol string denotes its inverse, i.e. the same symbols in reverse order.
- Prefixes, Suffixes and Substrings
A prefix is what remains after removing symbols from the tail of a string; a suffix is what remains after removing symbols from the head. Every prefix and every suffix is a substring, but a substring is not necessarily a prefix or a suffix.
For example: ab is a prefix and a substring of abc; c is a suffix and a substring of abc.
- Power: the string concatenated with itself n times
ω^0 = ε (the empty string)
ω^n = ωω…ω (n copies of ω concatenated)
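The string operations above map directly onto Python strings. This is only a minimal sketch: the helper names are made up for illustration, and it assumes the empty string and the whole string both count as prefixes and suffixes (the book's convention may differ).

```python
def reverse(s):
    """Inverse of a symbol string: the same symbols in reverse order."""
    return s[::-1]

def power(s, n):
    """n-th power: s concatenated with itself n times (n = 0 gives the empty string)."""
    return s * n

def prefixes(s):
    """All prefixes, obtained by dropping symbols from the tail."""
    return [s[:i] for i in range(len(s) + 1)]

def suffixes(s):
    """All suffixes, obtained by dropping symbols from the head."""
    return [s[i:] for i in range(len(s) + 1)]

def substrings(s):
    """All contiguous substrings, including the empty string."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}
```

With these, "b" is a substring of "abc" but neither a prefix nor a suffix, which matches the remark that substrings are not necessarily prefixes.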
Operations on sets of symbol strings
- Power (exponentiation)
- Closure and positive closure
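For reference, the usual definitions are: the product AB is every string xy with x in A and y in B; A^0 = {ε} and A^n = A^(n-1)A; the closure A* is the union of all A^n for n ≥ 0, and the positive closure A+ is the union for n ≥ 1. The closures are infinite in general, so the sketch below (function names are my own) only enumerates them up to a chosen maximum power.

```python
def product(a, b):
    """Product of two string sets: every x followed by every y."""
    return {x + y for x in a for y in b}

def set_power(a, n):
    """A^n, with A^0 = {""} (the set holding only the empty string)."""
    result = {""}
    for _ in range(n):
        result = product(result, a)
    return result

def closure_up_to(a, max_power, positive=False):
    """Union of A^n for n from 0 (or 1 if positive) up to max_power.

    This is a finite approximation of A* or A+, not the full closure.
    """
    start = 1 if positive else 0
    result = set()
    for n in range(start, max_power + 1):
        result |= set_power(a, n)
    return result
```

The only difference between the two closures is whether A^0 is included, so ε belongs to A+ only when ε is already in A.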
Grammar-related concepts
1. The components of a grammar
- The terminal set V_T: symbols that cannot be decomposed further, generally written as lowercase letters
- The nonterminal set V_N: symbols that can be derived further, generally written as uppercase letters
- The start symbol S
- The set of production rules P
2. Sentence, sentential form, handle, subtree, simple subtree, phrase, simple phrase
- Sentence: contains only terminal symbols (essentially all lowercase letters).
- Sentential form: every symbol string that appears during a derivation is a sentential form. A sentential form that contains only terminal symbols is a sentence.
- Subtree: a node of the syntax tree, together with everything that branches down from it, forms a subtree of the syntax tree.
- Simple subtree: a subtree of height 2 is called a simple subtree. The two subtrees drawn in red below are simple subtrees.
- Simple phrase: the string formed by the leaf nodes of a subtree of height 2 (that is, of a simple subtree). DE and F in the tree above are simple phrases.
- Handle: the symbol string formed by the leaf nodes of the leftmost simple subtree. In the tree above, the handle is DE.
3. Canonical derivation and canonical reduction
- Canonical derivation: the rightmost derivation; at every step the rightmost nonterminal is replaced.
- Canonical reduction: the leftmost reduction; at every step the leftmost handle is reduced.
4. Ambiguity
- Ambiguity: if the same sentence has two different syntax trees, the sentence is said to be ambiguous.
- Ways to resolve ambiguity:
- Modify the parsing algorithm: specify the precedence between operators.
- Modify the grammar directly so that the order of reductions is fixed (a rule whose left-hand symbol also appears on its right-hand side may itself cause ambiguity).
Compress or simplify the grammar
- A grammar is said to be compressed or simplified if it has no harmful rules and no redundant rules.
1. Harmful rules
2. Redundant rules
3. Equivalent transformations of a grammar
1. Make the start symbol not appear on the right side of any production
2. Make every nonterminal of the grammar derive a string of terminals
3. Make every nonterminal of the grammar appear in some sentential form
4. Eliminate special rules A→B
5. Eliminate empty rules A→ε
6. Eliminate left recursion (extended BNF notation)
- For the first point, make the start symbol not appear on the right side of any production
- For the second point, make every nonterminal of the grammar derive a string of terminals.
Construct a set V_N': first put in the nonterminals that directly derive a terminal string, then keep adding nonterminals whose right-hand sides use only terminals and symbols already in the set (this is the idea of reduction). I have not used this much, and I am not very clear about it.
- For the third point, make every nonterminal of the grammar appear in some sentential form
This uses the idea of derivation: starting from the start symbol, keep only the nonterminals that can appear in a sentential form. It does not feel very useful.
- For the fourth point, eliminate the special rules A→B
This removes the middleman: first construct the equivalent grammar without the A→B rules. If some nonterminals then no longer appear in any sentential form, delete the rules for those nonterminals. This does not feel very useful either.
- For the fifth point, eliminate the empty rules A→ε
Find the nonterminals and productions that can derive the empty string ε, then remove A→ε. I have not used this much.
- For the sixth point, eliminate left recursion. This is very important!!!
Let's look at the definition in the book first.
When working through problems, just apply the formulas given in the book.
For how to use them, see the following example.
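As a rough sketch of the sixth point, here is the common textbook rewrite for immediate left recursion: A → Aα | β becomes A → βA', A' → αA' | ε. This introduces a new nonterminal rather than the extended BNF form the book may use, so treat it as one possible formulation. It assumes single-character symbols, a non-empty α, at least one non-recursive alternative, and a caller-supplied name for the new nonterminal (all my assumptions). The empty string "" stands for ε.

```python
def eliminate_immediate_left_recursion(head, bodies, new_head):
    """Rewrite head -> head alpha | beta without left recursion.

    bodies is a list of right-hand sides as strings; "" stands for epsilon.
    new_head is a hypothetical fresh nonterminal name chosen by the caller.
    """
    # alpha parts: what follows the leading `head` in left-recursive alternatives
    alphas = [body[len(head):] for body in bodies if body.startswith(head)]
    # beta parts: alternatives that do not start with `head`
    betas = [body for body in bodies if not body.startswith(head)]
    if not alphas:
        return {head: list(bodies)}  # nothing to do
    return {
        head: [beta + new_head for beta in betas],
        new_head: [alpha + new_head for alpha in alphas] + [""],
    }
```

For E → E+T | T with the new nonterminal R, this gives E → TR and R → +TR | ε. Indirect left recursion (A → B…, B → A…) needs substitution first and is not handled here.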
Classification of grammars and automata
- Type 0 grammar: any string can derive any string.
- Type 1 grammar: the left side of a production is generally no longer than the right side.
- Type 2 grammar: the left side of every production is a single nonterminal, as shown below.
- Type 3 grammar:
Left-linear grammar: the nonterminal in the right side of a production is placed on the left (A→Ba or A→a).
Right-linear grammar: the nonterminal in the right side of a production is placed on the right (A→aB or A→a).
Chapter Three
Regular expression to NFA conversion
The difference between NFA and DFA
- An NFA is a nondeterministic finite automaton; a DFA is a deterministic finite automaton.
- An NFA can have several initial states, while a DFA has only one initial state.
- In an NFA a state can have several successor states for the same input symbol, while in a DFA it has only one.
NFA to DFA conversion
Then the state transition diagram is drawn.
So how do we tell whether this is a DFA or an NFA? (On the exam it is usually an NFA, because you will then be asked to convert it to a DFA.)
Done formally, you draw the state transition table. In practice you can tell directly from the state diagram: if the path out of some state on some input symbol is not unique, it is an NFA.
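The "is the path unique?" check can be written down against a transition table. This is a minimal sketch under my own representation: the table is a dict from (state, symbol) to a set of successor states, and "" marks an ε-move.

```python
def is_deterministic(transitions, start_states):
    """Return True if the automaton described by the table is a DFA.

    transitions: dict mapping (state, symbol) -> set of successor states.
    start_states: set of initial states.
    """
    if len(start_states) != 1:
        return False                      # a DFA has exactly one initial state
    for (state, symbol), targets in transitions.items():
        if symbol == "":                  # an epsilon-move makes it an NFA
            return False
        if len(targets) > 1:              # several successors on one symbol
            return False
    return True
```

A single entry with two successors, such as (0, "a") leading to {0, 1}, is enough to make the automaton an NFA.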
Now for the highlight: the conversion from NFA to DFA.
The above is the conversion from NFA to DFA.
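The conversion is usually done with the subset construction: each DFA state is a set of NFA states. Below is a minimal sketch; the example NFA (strings over a, b ending in ab) and all names are my own, not taken from the book's exercise. The table maps (state, symbol) to a set of states, with "" for ε-moves.

```python
from collections import deque

def epsilon_closure(states, delta):
    """All states reachable from `states` using only epsilon-moves."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, ""), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(delta, starts, accepting, alphabet):
    """Subset construction; returns (start, dfa_table, dfa_accepting)."""
    start = epsilon_closure(starts, delta)
    dfa, seen, queue = {}, {start}, deque([start])
    while queue:
        current = queue.popleft()
        for a in alphabet:
            moved = {t for s in current for t in delta.get((s, a), ())}
            if not moved:
                continue                  # no transition: leave the entry blank
            target = epsilon_closure(moved, delta)
            dfa[(current, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {state for state in seen if state & set(accepting)}
    return start, dfa, dfa_accepting

def dfa_accepts(start, dfa, dfa_accepting, text):
    """Run the resulting DFA on a string."""
    state = start
    for ch in text:
        state = dfa.get((state, ch))
        if state is None:
            return False
    return state in dfa_accepting

# Hypothetical NFA: state 0 loops on a and b, then a, b leads to accepting state 2.
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
```

For this NFA the construction yields three DFA states, {0}, {0,1} and {0,2}, and only {0,2} is accepting because it contains the NFA's accepting state.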
Chapter Four
No big exam questions in this chapter.
- The function of lexical analysis: read in the source program as a character string, scan it from left to right, and identify the smallest grammatical units with independent meaning, namely words.
- The task of lexical analysis: scan the source program, identify the words, convert them into attribute words, and output them.
- Two processing structures for lexical analysis:
1. The lexical analysis program as the main program
2. The lexical analysis program as a subroutine
- Common kinds of word symbols: reserved words, identifiers, unsigned numbers, delimiters
- State transition diagram for lexical analysis:
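A hand-written scanner is essentially that state transition diagram turned into code: look at the current character, pick a branch, and stay in it until the word ends. The sketch below is only an illustration; the reserved-word list and the category names are assumptions of mine, and every character that is not a letter, digit or whitespace is treated as a one-character delimiter.

```python
RESERVED = {"if", "else", "while"}  # hypothetical reserved words

def tokenize(source):
    """Scan left to right and return (kind, text) pairs."""
    tokens, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                                   # skip blanks
        elif ch.isalpha() or ch == "_":
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1                               # stay in the "identifier" state
            word = source[i:j]
            kind = "reserved" if word in RESERVED else "identifier"
            tokens.append((kind, word))
            i = j
        elif ch.isdigit():
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1                               # stay in the "number" state
            tokens.append(("number", source[i:j]))
            i = j
        else:
            tokens.append(("delimiter", ch))         # single-character delimiter
            i += 1
    return tokens
```

Called as a subroutine, a parser would ask for one token at a time; returning the whole list, as here, corresponds to running the lexical analysis as its own pass.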
Chapter Five
Three sets (First set, Follow set, Select set):
- First set (of a symbol string): the set of terminal symbols that can appear first in the strings derived from the given symbol string
- Follow set (of a nonterminal): also called the look-ahead set. By definition, the Follow set of U consists of the terminal symbols, or #, that immediately follow U in any sentential form containing U. (If nothing follows U, treat what follows as the special symbol #.)
- Select set (of a production):
for example:
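As one possible illustration, the three sets can be computed by repeating the rules until nothing changes. The sketch uses the usual definition SELECT(A→α) = FIRST(α) if α cannot derive ε, otherwise (FIRST(α) minus ε) together with FOLLOW(A). The grammar S → AB, A → aA | ε, B → b is a small one I made up, with single-character symbols and "" standing for an empty right-hand side.

```python
EPS = "ε"
GRAMMAR = {"S": ["AB"], "A": ["aA", ""], "B": ["b"]}  # hypothetical grammar
START = "S"

def first_of(string, first):
    """FIRST of a symbol string, given the FIRST sets of the nonterminals."""
    result = set()
    for sym in string:
        if sym not in GRAMMAR:            # terminal: it is the first symbol
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)                       # every symbol can vanish
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("#")                # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:       # nothing (visible) after sym
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

def compute_select(first, follow):
    select = {}
    for nt, bodies in GRAMMAR.items():
        for body in bodies:
            f = first_of(body, first)
            select[(nt, body)] = (f - {EPS}) | (follow[nt] if EPS in f else set())
    return select
```

For this grammar the two A-productions get disjoint Select sets, {a} for A → aA and {b} for A → ε, which is exactly the LL(1) condition.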
LL(1) grammar
- An LL(1) grammar has no left recursion (for example, A->Ab is left recursive).
It also has no backtracking. Backtracking arises when the right-hand sides of two productions for the same nonterminal begin with the same symbol, as in A->ay|ab. The fix is to extract the left common factor and change it to A->aM, M->y|b.
- For example
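Extracting the left common factor can be sketched in a few lines. This is a simplified version that only factors out the longest prefix shared by all alternatives of one nonterminal; the function name and the caller-supplied new nonterminal are assumptions of mine. `os.path.commonprefix` compares strings character by character, which is what is needed here.

```python
import os

def left_factor(head, bodies, new_head):
    """Factor the longest common prefix out of all alternatives of `head`.

    bodies: right-hand sides as strings ("" stands for epsilon).
    new_head: a hypothetical fresh nonterminal name.
    """
    prefix = os.path.commonprefix(bodies)
    if not prefix or len(bodies) < 2:
        return {head: list(bodies)}       # nothing to factor
    return {
        head: [prefix + new_head],
        new_head: [body[len(prefix):] for body in bodies],
    }
```

Applied to A → ay | ab with the new nonterminal M, it returns A → aM and M → y | b. When only some of the alternatives share a prefix, they would have to be grouped first, which this sketch does not do.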
LL(1) analysis table construction
- Constructing the LL(1) analysis table:
What you need to know: C means continue, i.e. read the next symbol;
R means reread the current symbol, i.e. do not read the next symbol;
RE(β) means replace the symbol on top of the stack with the reverse of the string β.
There are also some rules to know:
- If the right-hand side of the production begins with a nonterminal, write the right-hand side in reverse order, followed by /R
- If the right-hand side begins with a terminal, remove that first terminal and write the rest in reverse order, followed by /C
- If the right-hand side is empty, write ε/R
- [#,#] = succ
- For a terminal that appears in a right-hand side somewhere other than at its head, [V_T, V_T] = ε/C
- Every other entry is an error, shown as a blank in the analysis table
Before drawing the LL(1) analysis table, first write out the corresponding select sets and use them for reference.
When drawing the LL(1) analysis table, list all the symbols that appear (except ε) down the vertical axis, and all the terminal symbols, including #, along the horizontal axis.
for example
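To see how the C / R / RE(β) entries drive the analysis, here is a small table-driven sketch. The grammar S → aSb | ε is my own choice, not the book's example, and the table was filled in by hand from the rules above: S → aSb starts with the terminal a, so [S,a] is the reverse of "Sb" with /C; S → ε gives ε/R under b and # (the Select set of S → ε); b appears in a non-head position, so [b,b] = ε/C.

```python
EPS = ""  # the empty string, written ε in the table

# (stack top, input symbol) -> (string to push in place of the top, "C" or "R")
TABLE = {
    ("S", "a"): ("bS", "C"),   # S -> aSb: drop a, reverse "Sb", read next symbol
    ("S", "b"): (EPS, "R"),    # S -> ε on b
    ("S", "#"): (EPS, "R"),    # S -> ε on #
    ("b", "b"): (EPS, "C"),    # match terminal b and read next symbol
}

def ll1_accepts(text):
    """Return True if `text` is a sentence of S -> aSb | ε."""
    tokens = list(text) + ["#"]
    stack = ["#", "S"]
    pos = 0
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "#" and look == "#":
            return True                    # [#,#] = succ
        entry = TABLE.get((top, look))
        if entry is None:
            return False                   # blank entry: error
        replacement, move = entry
        stack.pop()
        stack.extend(replacement)          # last character ends up on top
        if move == "C":
            pos += 1                       # C: continue with the next symbol
```

Writing the right-hand side in reverse is what makes its first symbol end up on top of the stack after the push.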
Chapter Six
Simple precedence method
- Simple precedence
# serves as the statement delimiter, and its precedence is the lowest.
- Determining whether a grammar is a simple precedence grammar:
A grammar is a simple precedence grammar if it satisfies the following two conditions:
- In the symbol set V of the grammar, at most one precedence relation holds between any two symbols.
- No two productions in the grammar have the same right-hand side.
To judge whether a grammar is a simple precedence grammar, first work out the precedence relations between its symbols.
To determine the precedence relations, we need the following rules:
- "<" relation: if the right side of a rule contains a terminal immediately followed by a nonterminal, then the terminal < every first symbol that can be derived from that nonterminal
- ">" relation: if the right side of a rule contains a nonterminal immediately followed by a terminal, then every last symbol that can be derived from that nonterminal > the terminal
- "=" relation: if P→…AB…, then A = B
for example
The grammar satisfies both conditions, so it is a simple precedence grammar.
- Parsing with a simple precedence grammar is a canonical reduction.
Operator precedence method
- Operator precedence parsing is suited to parsing expressions.
- What is an operator grammar: if no production contains two adjacent nonterminal symbols, the grammar is an operator grammar.
- How to construct the operator precedence relation matrix:
1. For each nonterminal A, construct two sets, FIRSTVT(A) and LASTVT(A)
FIRSTVT(A) = { a | A ⇒ a… or A ⇒ Ba… }
LASTVT(A) = { a | A ⇒ …a or A ⇒ …aB }
2. Determine the precedence relations
(1) "=" relation: if there is a rule A->…ab… or A->…aBb…, then a = b
(2) "<" relation: if the right side of a rule contains a terminal immediately followed by a nonterminal (call it A), then the terminal < every symbol in FIRSTVT(A)
(3) ">" relation: if the right side of a rule contains a nonterminal (call it A) immediately followed by a terminal, then every symbol in LASTVT(A) > the terminal
for example
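As a sketch of step 1, FIRSTVT and LASTVT can be computed by iterating until the sets stop growing. The expression grammar below is the standard one (E → E+T | T, T → T*F | F, F → (E) | i), chosen by me as an illustration; it assumes single-character symbols and no empty right-hand sides. LASTVT is obtained by running the same computation on the reversed right-hand sides.

```python
GRAMMAR = {"E": ["E+T", "T"], "T": ["T*F", "F"], "F": ["(E)", "i"]}

def firstvt(grammar):
    """FIRSTVT(A) for every nonterminal A of an operator grammar."""
    sets = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = set()
                if body[0] not in grammar:
                    new.add(body[0])                 # A -> a...
                else:
                    new |= sets[body[0]]             # A -> B... inherits FIRSTVT(B)
                    if len(body) > 1 and body[1] not in grammar:
                        new.add(body[1])             # A -> Ba...
                if not new <= sets[nt]:
                    sets[nt] |= new
                    changed = True
    return sets

def lastvt(grammar):
    """LASTVT(A): FIRSTVT computed on the mirrored grammar."""
    mirrored = {nt: [body[::-1] for body in bodies] for nt, bodies in grammar.items()}
    return firstvt(mirrored)
```

With these sets, a rule such as F → (E) gives ( < every symbol of FIRSTVT(E), every symbol of LASTVT(E) > ), and ( = ).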
LR analysis table
The LR analysis table consists of two parts: the action table (ACTION) and the state transition table (GOTO).
The action table has four possible actions:
- Reduce r: for example, r3 means reduce using the third rule
- Shift s: shift the current input symbol, continue scanning, and the next input symbol becomes the current one
- Accept acc: when only # remains in the input, the analysis is complete
- Error
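Here is a minimal sketch of how the ACTION and GOTO tables drive an LR analysis. The grammar, (1) S → (S) and (2) S → x, and its LR(0) table were worked out by hand for this illustration and are not from the book; states 3 and 5 are reduce states, so their rows contain the same r entry under every terminal.

```python
RULES = {1: ("S", 3), 2: ("S", 1)}   # rule number -> (left side, length of right side)

ACTION = {
    (0, "("): ("s", 2), (0, "x"): ("s", 3),
    (1, "#"): "acc",
    (2, "("): ("s", 2), (2, "x"): ("s", 3),
    (4, ")"): ("s", 5),
}
for terminal in "()x#":
    ACTION[(3, terminal)] = ("r", 2)  # state 3 holds S -> x.
    ACTION[(5, terminal)] = ("r", 1)  # state 5 holds S -> (S).

GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(text):
    """Return the list of rules used in reductions, or None on error."""
    tokens = list(text) + ["#"]
    states, pos, used = [0], 0, []
    while True:
        action = ACTION.get((states[-1], tokens[pos]))
        if action is None:
            return None                   # blank entry: error
        if action == "acc":
            return used
        kind, number = action
        if kind == "s":                   # shift: push state, read next symbol
            states.append(number)
            pos += 1
        else:                             # reduce: pop |right side| states, then GOTO
            head, length = RULES[number]
            del states[-length:]
            states.append(GOTO[(states[-1], head)])
            used.append(number)
```

For the input ((x)) the reductions come out as rule 2, then rule 1 twice, which is the canonical reduction order.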
LR(0) analysis table
The exam generally tests the construction of the LR(0) table.
Here we use exercise 6.4 from the book for reference, as shown in the figure below.
The appearance of S7 can be understood as follows:
- Within the same item set, if different items have the same successor symbol, they share the same successor state
- Across different item sets, if the same item appears, its successor state is also the same
The SLR(1) method is a simple LR method built on LR(0). The SLR(1) analysis table contains no conflicting actions. If the LR(0) analysis table is found to contain conflicting actions (Table 6.19 on page 113 of the book does), change it to SLR(1) to remove the conflicts.
SLR(1) analysis table
The SLR(1) and LR(0) analysis tables are constructed in similar ways. The main difference comes after the canonical collection of item sets has been built: when filling in the SLR(1) table, write r under a terminal symbol only when that terminal belongs to Follow(left side of the rule), and write nothing when it does not.
Chapter Seven
Semantic analysis
- Basic tasks of semantic analysis:
- Determining types: the type of each data object
- Type checking: checking the types of operations and their operands
- Confirming meaning: confirming the meaning of control structures
- Other semantic checks: for example, not allowing a jump from outside a loop into the loop body
- Methods for implementing syntax-directed translation
- incremental grammar
- attribute grammar
Intermediate code
- Abstract syntax tree
- Reverse Polish notation
- Quadruples
- Triples
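To make two of these forms concrete, the sketch below converts an infix expression to Reverse Polish notation with the operator-stack method and then turns it into quadruples of the form (operator, operand 1, operand 2, result). The temporary names t1, t2, … and the single-character tokens are assumptions of mine, and the input is assumed to be well formed.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_rpn(tokens):
    """Infix tokens -> Reverse Polish notation (left-associative operators)."""
    output, ops = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()                      # discard the "("
        else:
            output.append(tok)             # operand
    while ops:
        output.append(ops.pop())
    return output

def to_quadruples(rpn):
    """Reverse Polish notation -> list of (op, arg1, arg2, result)."""
    stack, quads = [], []
    for tok in rpn:
        if tok in PRECEDENCE:
            right = stack.pop()
            left = stack.pop()
            temp = f"t{len(quads) + 1}"    # hypothetical temporary name
            quads.append((tok, left, right, temp))
            stack.append(temp)
        else:
            stack.append(tok)
    return quads
```

For a+b*c the Reverse Polish form is a b c * +, and the quadruples are (*, b, c, t1) followed by (+, a, t1, t2). A triple would drop the result field and refer to earlier results by their position instead.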
Chapter Eight
Code optimization classification:
Code optimization techniques:
- Merge constant operations (constant folding)
- Remove useless assignments
- Reduce the strength of operations
- Remove redundant operations
- Move loop-invariant expressions out of the loop
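As a small illustration of the first technique, merging constant operations, the sketch below folds quadruples whose operands are both known constants and substitutes the result into later quadruples. The quadruple layout (op, arg1, arg2, result) and the assumption that each result name is assigned only once are mine; only +, - and * are handled.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(quads):
    """Evaluate quadruples with constant operands at compile time.

    Returns (remaining quadruples, dict of names whose value became known).
    Assumes every result name is assigned exactly once.
    """
    known, remaining = {}, []
    for op, a, b, dest in quads:
        a = known.get(a, a)               # substitute already-folded values
        b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)   # computed now, no code emitted
        else:
            remaining.append((op, a, b, dest))
    return remaining, known
```

For example, the pair (*, 2, 3, t1), (+, x, t1, t2) collapses to the single quadruple (+, x, 6, t2).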