Compiler Principles Review (2023.4.25 exam version)

This review is based on the course textbook. If anything here is written incorrectly, corrections are welcome!

Chapter One

image-20230412203438342

Chapter Two

Operations on symbol strings

  • Equality: two symbol strings are equal if they are exactly the same, symbol for symbol

  • Length: the number of symbols in the string

  • Concatenation: write the second string directly after the first

  • Reverse of a symbol string: writing -1 at the upper right of a string denotes its reverse, that is, the same symbols in the opposite order.

image-20230412110829924

  • Prefixes, Suffixes and Substrings

A prefix is what is left after removing symbols from the tail of a string; a suffix is what is left after removing symbols from the head. Every prefix and every suffix is a substring, but a substring is not necessarily a prefix or a suffix.

For example: ab is a prefix and a substring of abc, and c is a suffix and a substring of abc.

  • Power operation: the nth power of a string is n copies of it concatenated

ω^0 = ε (the empty string)

ω^n = ωω…ω (n copies of ω concatenated)

Operations on sets of symbol strings

  • Power (exponentiation)

image-20230412112240560

  • Closures and Positive Closures

image-20230412112533135
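The string and set operations above are easy to try out in code. The following is a minimal Python sketch, not the book's notation: the function names are made up for illustration, the empty Python string stands for ε, and it assumes the convention that the empty string and the whole string both count as prefixes and suffixes. Because a real closure is infinite, `closure_up_to` stops at a chosen maximum power.

```python
def reverse(w: str) -> str:
    """Reverse of a string: abc becomes cba."""
    return w[::-1]

def prefixes(w: str) -> set[str]:
    """All prefixes of w (assumption: includes "" and w itself)."""
    return {w[:i] for i in range(len(w) + 1)}

def suffixes(w: str) -> set[str]:
    """All suffixes of w (assumption: includes "" and w itself)."""
    return {w[i:] for i in range(len(w) + 1)}

def concat_sets(a: set[str], b: set[str]) -> set[str]:
    """Product of two string sets: every x in a followed by every y in b."""
    return {x + y for x in a for y in b}

def set_power(a: set[str], n: int) -> set[str]:
    """A^n, with A^0 = {ε}."""
    result = {""}
    for _ in range(n):
        result = concat_sets(result, a)
    return result

def closure_up_to(a: set[str], max_power: int, positive: bool = False) -> set[str]:
    """Union of A^0..A^max_power (closure) or A^1..A^max_power (positive closure)."""
    out: set[str] = set()
    for n in range(1 if positive else 0, max_power + 1):
        out |= set_power(a, n)
    return out

print(reverse("abc"))                                    # cba
print(sorted(prefixes("abc")))                           # ['', 'a', 'ab', 'abc']
print(sorted(set_power({"a", "b"}, 2)))                  # ['aa', 'ab', 'ba', 'bb']
print(sorted(closure_up_to({"a"}, 2)))                   # ['', 'a', 'aa']
print(sorted(closure_up_to({"a"}, 2, positive=True)))    # ['a', 'aa']
```

The only difference between the closure and the positive closure in this sketch is whether the 0th power, {ε}, is included.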

Grammar related concepts

1. The composition of grammar

  • The terminal symbol set V_T: symbols that cannot be expanded any further, usually written as lowercase letters
  • The non-terminal symbol set V_N: symbols that can be expanded further by productions, usually written as uppercase letters
  • The start symbol S
  • The set of production rules P

2. Sentence, sentential form, handle, subtree, simple subtree, phrase, simple phrase

  • Sentence: a sentential form that contains only terminal symbols. (In practice, all lowercase letters.)
  • Sentential form: every symbol string that appears during a derivation from the start symbol is a sentential form.
  • Subtree: a node of the syntax tree together with everything that descends from it forms a subtree of the syntax tree.
  • Simple subtree: a subtree of height 2 is called a simple subtree. The two subtrees drawn in red below are simple subtrees.

image-20230412114138402

  • Phrase: the string formed by the leaf nodes of any subtree.

  • Simple phrase: the string formed by the leaf nodes of a simple subtree (a subtree of height 2). DE and F in the tree above are simple phrases.

  • Handle: the string formed by the leaf nodes of the leftmost simple subtree, that is, the leftmost simple phrase. In the tree above, the handle is DE.

3. Canonical derivation and canonical reduction

  • Canonical derivation: rightmost derivation; the rightmost non-terminal is replaced at every step.
  • Canonical reduction: leftmost reduction; the leftmost handle is reduced at every step. It is the reverse of canonical derivation.

4. Ambiguity

  • Ambiguity: if the same sentence has two different syntax trees, the sentence is ambiguous, and so is the grammar that generates it.

  • Resolving ambiguity:

    1. Keep the grammar and modify the parsing algorithm, for example by specifying the precedence among operators.
    2. Modify the grammar directly so that the order of reductions is fixed (a left-side symbol that also appears on the right side of the rule may cause ambiguity)

Compress or simplify the grammar

  • A grammar is said to be compressed or simplified if it has no harmful rules and no redundant rules.

1. Harmful rules

image-20230412193656148

2. Redundant rules

image-20230412193804431

3. Equivalent grammar transformations

1. Make the start symbol not appear on the right side of any production

2. Make every non-terminal symbol of the grammar derive some terminal string

3. Make every non-terminal symbol of the grammar appear in some sentential form

4. Eliminate special rules of the form A→B

5. Eliminate empty rules A→ε

6. Eliminate left recursion (using extended BNF notation)

  • For the first point, make the start symbol not appear on the right side of the production

image-20230412194301929

  • For the second point, make every non-terminal derive some terminal string.

Construct a set V_N': first put in the non-terminals that directly derive a terminal string, then keep adding non-terminals whose right side consists only of terminals and symbols already in the set (this works backwards, in the spirit of reduction). I have not used this much and am not entirely clear on it.

  • For the third point, make every non-terminal of the grammar appear in some sentential form

Work forwards from the start symbol, in the spirit of derivation, and keep only the non-terminals that can be reached and therefore appear in some sentential form. This does not seem to come up much.

  • For the fourth point, eliminate the special rule A→B

This cuts out the middleman: first construct the equivalent grammar in which A takes over B's right-hand sides directly. Then, if some non-terminal no longer appears in any sentential form, delete its rules. This does not seem to come up much either.

  • For the fifth point, eliminate the empty rule A→ε

Remove A→ε, find the non-terminals that can derive the empty string, and add the variants of each production in which those non-terminals are left out. I have not used this much.

  • For the sixth point, eliminate left recursion. This is very important!

Let's look at the definition in the book first.

image-20230412195837362

When working problems, just apply the formulas given in the book.

The following example shows how to use them.

image-20230412201741049
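As a rough illustration of eliminating direct left recursion, here is a Python sketch of the common textbook transformation A → Aα | β becomes A → βA', A' → αA' | ε. This uses a new non-terminal rather than the extended BNF form A → β{α} mentioned above, so treat it as an equivalent alternative and not as the book's formula. The function name and the list-of-symbols representation are assumptions made for this example; an empty list stands for ε.

```python
def eliminate_direct_left_recursion(nt, alternatives):
    """Remove direct left recursion from the rules of one non-terminal.

    alternatives: list of right-hand sides, each a list of symbols.
    Returns a dict mapping non-terminals to their new alternatives.
    """
    # Right sides of the form nt α, with the leading nt stripped off.
    # A rule nt -> nt (a harmful rule) is silently dropped.
    recursive = [alt[1:] for alt in alternatives if alt[:1] == [nt] and len(alt) > 1]
    # Right sides that do not start with nt (the β alternatives).
    others = [alt for alt in alternatives if alt[:1] != [nt]]
    if not recursive:
        return {nt: others}
    new_nt = nt + "'"   # hypothetical name for the helper non-terminal
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],  # [] is ε
    }

# E -> E + T | T
print(eliminate_direct_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```

The result reads E → T E' and E' → + T E' | ε, which generates the same strings T, T+T, T+T+T, and so on. Indirect left recursion is not handled by this sketch.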

Classification and Automata of Grammars

  • Type 0 grammar: no restrictions on the productions; one string is rewritten to another string

  • Type 1 grammar: the left side of every production is no longer than its right side

  • Type 2 grammar: the left side of every production is a single non-terminal, as shown below

    image-20230412203022350

  • Type 3 grammar. Left-linear grammar: the non-terminal on the right side of a production, if there is one, is placed at the left end

    Right-linear grammar: the non-terminal on the right side of a production, if there is one, is placed at the right end

    image-20230412203101656

image-20230412202913181
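The four types can be told apart mechanically. The sketch below is a simplified classifier under several assumptions of my own: productions are given as (left, right) pairs of strings, each character is one symbol, uppercase letters are non-terminals, and the special treatment of ε-rules is ignored. The name `grammar_type` is invented for this example.

```python
def grammar_type(productions):
    """Return 3, 2, 1 or 0 for a list of (lhs, rhs) string pairs."""
    def is_nt(symbol):
        return symbol.isupper()

    single_nt_left = all(len(lhs) == 1 and is_nt(lhs) for lhs, _ in productions)
    if single_nt_left:
        # Right-linear: a non-terminal may only be the last symbol.
        right_linear = all(not any(is_nt(c) for c in rhs[:-1]) for _, rhs in productions)
        # Left-linear: a non-terminal may only be the first symbol.
        left_linear = all(not any(is_nt(c) for c in rhs[1:]) for _, rhs in productions)
        # The whole grammar must be linear on the same side to be type 3.
        return 3 if (right_linear or left_linear) else 2
    if all(len(lhs) <= len(rhs) for lhs, rhs in productions):
        return 1
    return 0

print(grammar_type([("S", "aS"), ("S", "b")]))        # 3
print(grammar_type([("S", "aSb"), ("S", "ab")]))      # 2
print(grammar_type([("S", "aSBc"), ("cB", "Bc")]))    # 1
print(grammar_type([("S", "aAb"), ("aA", "a")]))      # 0
```

Note that a grammar mixing left-linear and right-linear rules, such as S → aS | Sb, is reported as type 2 here, since a type 3 grammar has to be linear on one side throughout.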

Chapter Three

Regular Expression to NFA Conversion

image-20230413110309314

The difference between NFA and DFA

  • An NFA is a nondeterministic finite automaton; a DFA is a deterministic finite automaton

  • An NFA can have several initial states, while a DFA has exactly one initial state

  • In an NFA a state may have several successor states for the same input symbol, while in a DFA it has at most one

NFA to DFA conversion

image-20230413173902935

Then the state transition diagram is drawn.

So how do we judge whether this is a DFA or an NFA? (On the exam it is generally an NFA, because a later part asks you to convert it to a DFA.)

The formal way is to draw the state transition table. In practice it can be read directly off the state diagram: if some state has more than one path for the same input symbol, the automaton is an NFA.

image-20230413175307459
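The check described above (one initial state, and at most one path per state and input symbol) can be written down in a few lines. This is only a sketch: the dictionary-based transition table, the use of the empty string to label ε-moves, and the function name are all assumptions made for the example.

```python
def is_deterministic(starts, delta):
    """True if the automaton is a DFA.

    starts: set of initial states.
    delta:  dict mapping (state, symbol) to the set of successor states;
            the symbol "" is used here to label an ε-move.
    """
    if len(starts) != 1:
        return False
    return all(symbol != "" and len(targets) <= 1
               for (_, symbol), targets in delta.items())

# State 0 has two paths on 'a', so this one is an NFA.
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa = {("A", "a"): {"B"}, ("A", "b"): {"A"}, ("B", "a"): {"B"}, ("B", "b"): {"A"}}
print(is_deterministic({0}, nfa))     # False
print(is_deterministic({"A"}, dfa))   # True
```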

Now for the main part: converting the NFA to a DFA

image-20230413184103509

image-20230413184124306

image-20230413184139782

That completes the conversion from NFA to DFA
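For comparison with the hand-worked conversion, here is a sketch of the usual subset construction in Python. The NFA used below is a made-up example (it accepts strings over a and b that end in ab), not the one from the figures, and the data layout (a dict from (state, symbol) to a set of states, with "" labelling ε-moves) is an assumption of this sketch.

```python
from collections import deque

EPS = ""  # label for ε-moves in this sketch

def epsilon_closure(states, delta):
    """All states reachable from `states` using only ε-moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(states, symbol, delta):
    """States reachable from `states` by reading one `symbol`."""
    return {t for s in states for t in delta.get((s, symbol), ())}

def nfa_to_dfa(starts, accepting, delta, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start = epsilon_closure(starts, delta)
    dfa_delta, seen, queue = {}, {start}, deque([start])
    while queue:
        current = queue.popleft()
        for a in alphabet:
            target = epsilon_closure(move(current, a, delta), delta)
            if not target:
                continue          # no transition: leave the entry empty
            dfa_delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {s for s in seen if s & set(accepting)}
    return start, dfa_accepting, dfa_delta, seen

nfa_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
start, final, table, states = nfa_to_dfa({0}, {2}, nfa_delta, "ab")
print(sorted(sorted(s) for s in states))   # [[0], [0, 1], [0, 2]]
print(sorted(sorted(s) for s in final))    # [[0, 2]]
```

A DFA state is accepting whenever it contains at least one accepting NFA state, which is why {0, 2} is the only final state in this example.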

Chapter Four

No long-answer questions come from this chapter

  • The function of lexical analysis: read in the source program as a character string, scan it character by character from left to right, and identify the smallest grammatical units that carry independent meaning, namely words.

  • The task of lexical analysis: scan the source program, identify the words, convert them to attribute words, and output them.

  • Two processing structures for lexical analysis:

    1. Lexical analysis program as the main program

    image-20230415095149661

    2. Lexical analysis program as a subroutine

    image-20230415095224612

  • Common types of word symbols: reserved words, identifiers, unsigned numbers, delimiters

  • State transition diagram for lexical analysis:
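A state transition diagram for lexical analysis translates fairly directly into a hand-written scanner: each branch below corresponds to leaving the start state on a letter, a digit, or a delimiter character. This is only a sketch; the reserved-word and delimiter lists are hypothetical and would need to match the actual language.

```python
RESERVED = {"if", "then", "else", "while", "do", "begin", "end"}   # hypothetical list
DELIMITERS = {"+", "-", "*", "/", "(", ")", ";", "=", "<", ">", ":="}  # hypothetical list

def scan(source):
    """Split source text into (kind, text) pairs, scanning left to right."""
    tokens, i, n = [], 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha():
            # letter (letter | digit)*  gives an identifier or a reserved word
            j = i
            while j < n and source[j].isalnum():
                j += 1
            word = source[i:j]
            tokens.append(("reserved" if word in RESERVED else "identifier", word))
            i = j
        elif ch.isdigit():
            # digit digit*  gives an unsigned number
            j = i
            while j < n and source[j].isdigit():
                j += 1
            tokens.append(("number", source[i:j]))
            i = j
        elif source[i:i + 2] in DELIMITERS:
            # try the two-character delimiter first (for example :=)
            tokens.append(("delimiter", source[i:i + 2]))
            i += 2
        elif ch in DELIMITERS:
            tokens.append(("delimiter", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens

for token in scan("while x1 < 10 do x1 := x1 + 1;"):
    print(token)
```

Here the scanner is written as a function that returns all tokens at once; in the "subroutine" structure mentioned above, the parser would instead call it once per word.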

Chapter Five

Three sets (FIRST set, FOLLOW set, SELECT set):

  • FIRST set (of a symbol string): the set of terminal symbols that can appear first in the strings derived from that symbol string
  • FOLLOW set (of a non-terminal): also called the look-ahead set. By definition, the FOLLOW set of U consists of the terminal symbols, or #, that immediately follow U in any sentential form containing U. (If nothing follows U, the special symbol # is treated as the symbol after U.)
  • SELECT set (of a production): image-20230415100334751

for example:

image-20230415112352717

image-20230415112402630
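The three sets can also be computed by iterating until nothing changes. The sketch below uses the standard definitions, including SELECT(A→α) = FIRST(α) when α cannot derive ε and (FIRST(α) − {ε}) ∪ FOLLOW(A) otherwise. The grammar is a made-up example, not the one in the figures, and the representation (a dict of lists of symbols, with an empty list for an ε-rule) is an assumption of this sketch.

```python
EPS, END = "ε", "#"

def first_of(symbols, first, nts):
    """FIRST of a symbol string, given the FIRST sets of the non-terminals."""
    out = set()
    for x in symbols:
        if x not in nts:            # a terminal ends the search
            out.add(x)
            return out
        out |= first[x] - {EPS}
        if EPS not in first[x]:
            return out
    out.add(EPS)                    # the whole string can derive ε
    return out

def compute_sets(grammar, start):
    nts = set(grammar)
    first = {a: set() for a in nts}
    follow = {a: set() for a in nts}
    follow[start].add(END)
    changed = True
    while changed:                  # FIRST sets, to a fixed point
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                new = first_of(alt, first, nts)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    changed = True
    while changed:                  # FOLLOW sets, to a fixed point
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                for i, x in enumerate(alt):
                    if x not in nts:
                        continue
                    tail = first_of(alt[i + 1:], first, nts)
                    new = tail - {EPS}
                    if EPS in tail:
                        new |= follow[a]
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    select = {}
    for a, alts in grammar.items():
        for alt in alts:
            f = first_of(alt, first, nts)
            select[(a, tuple(alt))] = (f - {EPS}) | (follow[a] if EPS in f else set())
    return first, follow, select

# E -> T E'    E' -> + T E' | ε    T -> i | ( E )
grammar = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], []], "T": [["i"], ["(", "E", ")"]]}
first, follow, select = compute_sets(grammar, "E")
print(sorted(first["E"]))             # ['(', 'i']
print(sorted(follow["T"]))            # ['#', ')', '+']
print(sorted(select[("E'", ())]))     # ['#', ')']
```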

LL(1) grammar

  • An LL(1) grammar has no left recursion (for example, A->Ab is left-recursive)

    and needs no backtracking (backtracking arises when two productions with the same left side start with the same symbol on the right side, such as

    A->ay|ab; the fix is to extract the left common factor and change it to A->aM, M->y|b)

  • for example

    image-20230415183550049
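One way to check the "no backtracking" requirement is through the SELECT sets: for a grammar to be LL(1), the SELECT sets of productions with the same left side must not overlap. The sketch below assumes the SELECT sets have already been worked out by hand and are passed in as a dictionary; the function name and data layout are invented for this example.

```python
from itertools import combinations

def is_ll1(select):
    """select maps (lhs, rhs) to the SELECT set of the production lhs -> rhs."""
    by_lhs = {}
    for (lhs, _rhs), terminals in select.items():
        by_lhs.setdefault(lhs, []).append(terminals)
    # Every pair of SELECT sets for the same left side must be disjoint.
    return all(a.isdisjoint(b)
               for sets in by_lhs.values()
               for a, b in combinations(sets, 2))

# A -> ay | ab : both alternatives start with a, so the parser would have to backtrack.
before = {("A", "ay"): {"a"}, ("A", "ab"): {"a"}}
# After extracting the left common factor: A -> aM, M -> y | b
after = {("A", "aM"): {"a"}, ("M", "y"): {"y"}, ("M", "b"): {"b"}}
print(is_ll1(before))   # False
print(is_ll1(after))    # True
```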

LL(1) Analysis table construction

  • The construction of LL(1) analysis table:

What you need to know is: C means continue and read the next symbol;

R means reread the current symbol, that is, do not read the next symbol;

RE(β) means replace the symbol on top of the stack with the reverse of the string β

There are also some rules to know:

  1. If the right side of the production begins with a non-terminal, write the right side in reverse order, followed by /R
  2. If the right side of the production begins with a terminal, remove that first terminal and write the rest in reverse order, followed by /C
  3. If the right side of the production is empty, write ε/R
  4. [#,#] = succ
  5. If some terminal does not appear at the head of the right part of any rule, then [V_T, V_T] = ε/C
  6. All other entries are errors, represented by a blank in the analysis table

Before drawing the LL(1) analysis table, first write out the SELECT set of every production, then fill in the table against them

When drawing the LL(1) analysis table, list every grammar symbol that appears (except ε) down the vertical axis, and every terminal symbol, including #, across the horizontal axis

for example

image-20230415190510294
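To see how C, R and RE(β) drive the analysis, here is a small sketch that builds a table from SELECT sets following my reading of the rules above, then runs the stack machine. The grammar and its SELECT sets are a made-up example, not the one in the figure. As a simplification, this sketch adds the pop-and-continue entry [a, a] for every terminal, which is a superset of what rule 5 asks for; the extra entries are simply never used.

```python
def build_table(select, terminals):
    """select maps (lhs, rhs_tuple) to its SELECT set. Entry = (symbols to push, action)."""
    table = {}
    for (lhs, rhs), lookaheads in select.items():
        for a in lookaheads:
            if rhs and rhs[0] == a:
                # Rule 2: right side starts with the terminal being read;
                # drop it, push the rest reversed, and continue (C).
                table[(lhs, a)] = (rhs[:0:-1], "C")
            else:
                # Rules 1 and 3: push the reversed right side (nothing for ε), reread (R).
                table[(lhs, a)] = (rhs[::-1], "R")
    for t in terminals:
        table[(t, t)] = ((), "C")    # terminal on the stack matches the input
    return table

def parse(tokens, table, start):
    stack = ["#", start]             # the top of the stack is the end of the list
    tokens = list(tokens) + ["#"]
    pos = 0
    while True:
        top, a = stack[-1], tokens[pos]
        if top == "#" and a == "#":
            return True              # [#,#] = succ
        entry = table.get((top, a))
        if entry is None:
            return False             # blank entry: error
        push, action = entry
        stack.pop()
        stack.extend(push)           # RE(β): the first symbol of β ends up on top
        if action == "C":
            pos += 1

# E -> T E'    E' -> + T E' | ε    T -> i | ( E )
select = {
    ("E", ("T", "E'")): {"i", "("},
    ("E'", ("+", "T", "E'")): {"+"},
    ("E'", ()): {"#", ")"},
    ("T", ("i",)): {"i"},
    ("T", ("(", "E", ")")): {"("},
}
table = build_table(select, ["+", "i", "(", ")"])
print(parse("i+i", table, "E"))    # True
print(parse("(i)", table, "E"))    # True
print(parse("i+", table, "E"))     # False
```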

Chapter Six

Simple precedence method

  • Simple precedence

    #, the sentence delimiter, has the lowest precedence.

    image-20230415191118809

  • Determine whether it is a simple precedence grammar:

    A grammar is a simple precedence grammar if it satisfies the following two conditions:

    1. In the grammar symbol set V, at most one precedence relation holds between any two symbols.
    2. In the grammar, no two productions have the same right-hand side.

To judge whether a grammar is a simple precedence grammar, first work out the precedence relations between its symbols.

To work out the precedence relations, we need the following rules:

  1. "<" relation: if the right part of a rule contains a terminal immediately followed by a non-terminal, then the terminal < every symbol that can appear first in a string derived from that non-terminal
  2. ">" relation: if the right part of a rule contains a non-terminal immediately followed by a terminal, then every symbol that can appear last in a string derived from that non-terminal > the terminal
  3. "=" relation: if P→…AB…, then A = B

for example

image-20230415202534222

The grammar satisfies both conditions, so it is a simple precedence grammar.

  • Parsing with a simple precedence grammar is a canonical reduction.

Operator precedence method

  • Operator precedence parsing is suited to parsing expressions.

  • What is an operator grammar: if no production contains two adjacent non-terminal symbols on its right side, the grammar is an operator grammar

  • How to construct the operator precedence relation matrix:

    1. For each non-terminal A, construct two sets, FIRSTVT(A) and LASTVT(A)

FIRSTVT(A) = { a | A ⇒+ a… or A ⇒+ Ba… }, where a is a terminal and B is a non-terminal

LASTVT(A) = { a | A ⇒+ …a or A ⇒+ …aB }, where a is a terminal and B is a non-terminal

    2. Determine the precedence relations

(1) "=" relation: if there is a rule A->…ab… or A->…aBb…, then a = b

(2) "<" relation: if the right side of a rule contains a terminal a immediately followed by a non-terminal A, then a < every terminal in FIRSTVT(A)

(3) ">" relation: if the right side of a rule contains a non-terminal A immediately followed by a terminal b, then every terminal in LASTVT(A) > b

for example

image-20230415215011707
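The two steps above can be sketched in Python as follows. The grammar is the familiar expression grammar, used here as a made-up example and not necessarily the one in the figure. The sketch assumes an operator grammar with no ε-rules, and the function names are invented.

```python
def vt_sets(grammar):
    """FIRSTVT and LASTVT for every non-terminal of an operator grammar."""
    nts = set(grammar)
    firstvt = {a: set() for a in nts}
    lastvt = {a: set() for a in nts}

    def seed(alt, target):
        # A -> a... or A -> Ba...: take the first terminal among the first two symbols.
        for x in alt[:2]:
            if x not in nts:
                target.add(x)
                return

    for a, alts in grammar.items():
        for alt in alts:
            seed(alt, firstvt[a])
            seed(alt[::-1], lastvt[a])   # same idea, read from the right end
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                # A -> B... passes FIRSTVT(B) to A; A -> ...B passes LASTVT(B) to A.
                for sets, edge in ((firstvt, alt[0]), (lastvt, alt[-1])):
                    if edge in nts and not sets[edge] <= sets[a]:
                        sets[a] |= sets[edge]
                        changed = True
    return firstvt, lastvt

def precedence(grammar, firstvt, lastvt):
    """Map (a, b) to '<', '=' or '>' for pairs of terminals."""
    nts, rel = set(grammar), {}
    for alts in grammar.values():
        for alt in alts:
            for i in range(len(alt) - 1):
                x, y = alt[i], alt[i + 1]
                if x not in nts and y not in nts:
                    rel[(x, y)] = "="                      # ...ab...
                if x not in nts and y in nts:
                    for b in firstvt[y]:
                        rel[(x, b)] = "<"                  # a < FIRSTVT(A)
                    if i + 2 < len(alt) and alt[i + 2] not in nts:
                        rel[(x, alt[i + 2])] = "="         # ...aBb...
                if x in nts and y not in nts:
                    for a in lastvt[x]:
                        rel[(a, y)] = ">"                  # LASTVT(A) > b
    return rel

# E -> E + T | T    T -> T * F | F    F -> ( E ) | i
grammar = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["i"]],
}
firstvt, lastvt = vt_sets(grammar)
rel = precedence(grammar, firstvt, lastvt)
print(sorted(firstvt["E"]))                                   # ['(', '*', '+', 'i']
print(rel[("+", "*")], rel[("*", "+")], rel[("(", ")")])      # < > =
```

The result matches the intuition that * binds tighter than +: an incoming * after + is shifted (+ < *), while an incoming + after * triggers a reduction (* > +).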

LR analysis table

The LR analysis table consists of two parts: analysis action table (ACTION) and state transition table (GOTO)

In the action table, there are four possible actions:

  • Reduce (r): for example, r3 means reduce using rule 3
  • Shift (s): shift the current input symbol onto the stack and continue scanning; the next input symbol becomes the current input symbol
  • Accept (acc): when only # remains of the input string, the analysis is complete
  • Error
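The four actions are easiest to follow in a small table-driven driver. The sketch below hard-codes an LR(0) table that I worked out by hand for the toy grammar (1) S → aS, (2) S → b; this grammar is a made-up example, not the one from the book. Only the state stack is kept, since that is all the driver needs to decide its next move.

```python
# Rules: number -> (left side, length of right side)
RULES = {1: ("S", 2), 2: ("S", 1)}          # 1: S -> a S    2: S -> b

# ACTION[(state, terminal)] = ("s", next state) | ("r", rule number) | ("acc", None)
ACTION = {
    (0, "a"): ("s", 2), (0, "b"): ("s", 3),
    (1, "#"): ("acc", None),
    (2, "a"): ("s", 2), (2, "b"): ("s", 3),
    # LR(0): a state holding a completed item reduces under every terminal
    (3, "a"): ("r", 2), (3, "b"): ("r", 2), (3, "#"): ("r", 2),
    (4, "a"): ("r", 1), (4, "b"): ("r", 1), (4, "#"): ("r", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(tokens, action=ACTION, goto=GOTO, rules=RULES):
    stack = [0]                              # stack of states
    tokens = list(tokens) + ["#"]
    pos = 0
    while True:
        act = action.get((stack[-1], tokens[pos]))
        if act is None:
            return False                     # blank entry: error
        kind, arg = act
        if kind == "s":                      # shift: push the state, advance the input
            stack.append(arg)
            pos += 1
        elif kind == "r":                    # reduce: pop |right side| states, then GOTO
            lhs, length = rules[arg]
            if length:
                del stack[-length:]
            stack.append(goto[(stack[-1], lhs)])
        else:
            return True                      # acc

print(lr_parse("aab"))   # True
print(lr_parse("a"))     # False
print(lr_parse("ba"))    # False
```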

LR(0) analysis table

The exam generally tests the construction of the LR(0) table

Here we use end-of-chapter exercise 6.4 as a reference, as shown in the figure below

image-20230419101100997image-20230419101111507

Why S7 appears can be understood from the following:

  1. Within the same item set, items whose next symbol (the symbol after the dot) is the same lead to the same successor state
  2. In different item sets, if the same item appears, its successor state is also the same

image-20230419101118180

The SLR(1) method is a simple extension of the LR(0) analysis method. An SLR(1) analysis table contains no conflicting actions. If the LR(0) analysis table turns out to contain conflicting actions (as Table 6.19 on page 113 of the book does), switch to SLR(1) to remove the conflicts.

image-20230419103533067

SLR(1) analysis table

The SLR(1) and LR(0) analysis tables are constructed in much the same way. The main difference comes after the canonical collection of item sets has been built: when filling in the SLR(1) table, the reduce action r for a completed item is written under a terminal symbol only if that terminal belongs to FOLLOW of the rule's left side; if it does not belong, nothing is written.
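The difference can be shown with a small helper that decides under which columns a reduce entry is written. This is only a sketch: the function name, the parameters and the FOLLOW set in the example are assumptions for illustration.

```python
def reduce_columns(rule_no, lhs, follow, terminals, method="SLR(1)"):
    """Columns of the ACTION table that receive r<rule_no> for a completed item."""
    columns = set(terminals) | {"#"}
    if method == "LR(0)":
        # LR(0): reduce under every terminal and under #
        return {a: f"r{rule_no}" for a in columns}
    # SLR(1): reduce only under the terminals in FOLLOW(left side)
    return {a: f"r{rule_no}" for a in columns if a in follow[lhs]}

terminals = ["+", "*", "(", ")", "i"]
follow = {"E": {"+", ")", "#"}}             # hypothetical FOLLOW set

# Completed item E -> T. using rule 2
print(sorted(reduce_columns(2, "E", follow, terminals, method="LR(0)")))
# ['#', '(', ')', '*', '+', 'i']
print(sorted(reduce_columns(2, "E", follow, terminals)))
# ['#', ')', '+']
```

In the SLR(1) case the column for * stays free, so a shift on * in the same state no longer collides with the reduction.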

Chapter Seven

Semantic Analysis

  • Basic tasks of semantic analysis:

    1. Determine types: determine the type of each data object
    2. Type checking: check the types of operations and their operands
    3. Identify meaning: identify the meaning of control structures
    4. Other semantic checks: for example, do not allow a jump from outside a loop body into the loop body
  • Methods for implementing syntax-directed translation

    1. Incremental grammar
    2. Attribute grammar

Intermediate code

  • Abstract syntax tree
  • Reverse Polish notation
  • Quadruples
  • Triples
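To make two of these forms concrete, the sketch below turns an infix expression into reverse Polish notation and then into quadruples of the shape (operator, operand 1, operand 2, result). It handles only the four binary arithmetic operators and parentheses; the temporary names T1, T2, … and the function names are assumptions of this example.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens):
    """Infix tokens to reverse Polish notation (all operators left-associative)."""
    out, ops = [], []
    for t in tokens:
        if t in PRECEDENCE:
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[t]:
                out.append(ops.pop())
            ops.append(t)
        elif t == "(":
            ops.append(t)
        elif t == ")":
            while ops[-1] != "(":
                out.append(ops.pop())
            ops.pop()                 # discard the "("
        else:
            out.append(t)             # operand
    while ops:
        out.append(ops.pop())
    return out

def to_quadruples(postfix):
    """Reverse Polish notation to a list of (op, arg1, arg2, result)."""
    stack, quads = [], []
    for t in postfix:
        if t in PRECEDENCE:
            right, left = stack.pop(), stack.pop()
            temp = f"T{len(quads) + 1}"
            quads.append((t, left, right, temp))
            stack.append(temp)
        else:
            stack.append(t)
    return quads

postfix = to_postfix("a + b * ( c - d )".split())
print(" ".join(postfix))              # a b c d - * +
for quad in to_quadruples(postfix):
    print(quad)
# ('-', 'c', 'd', 'T1')
# ('*', 'b', 'T1', 'T2')
# ('+', 'a', 'T2', 'T3')
```

A triple would drop the fourth field and refer to an earlier result by its position in the list instead of by a temporary name.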

Chapter Eight

Code optimization classification:

image-20230419113040234

Code optimization techniques:

  • Merge constant operations (constant folding)
  • Remove useless assignments
  • Reduce the strength of operations (strength reduction)
  • Remove redundant operations
  • Move invariant expressions out of loops
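As an example of the first technique, merging constant operations, the sketch below folds quadruples whose operands are both known constants and substitutes the result into later quadruples. It assumes integer constants, the three operators shown, and that each temporary is assigned only once; all names are invented for this example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(quads):
    """Evaluate quadruples with constant operands at compile time.

    quads: list of (op, arg1, arg2, result); constants are Python ints,
    variables and temporaries are strings.
    """
    known, out = {}, []
    for op, a, b, result in quads:
        a, b = known.get(a, a), known.get(b, b)    # substitute folded temporaries
        if isinstance(a, int) and isinstance(b, int):
            known[result] = OPS[op](a, b)          # computed now, no code emitted
        else:
            out.append((op, a, b, result))
    return out, known

# T1 = 2 * 3;  T2 = x + T1;  T3 = T1 - 1
quads = [("*", 2, 3, "T1"), ("+", "x", "T1", "T2"), ("-", "T1", 1, "T3")]
optimized, constants = fold_constants(quads)
print(optimized)    # [('+', 'x', 6, 'T2')]
print(constants)    # {'T1': 6, 'T3': 5}
```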

Origin blog.csdn.net/weixin_53270267/article/details/130272771