"Compiler Principles" Review Chapter 1~Chapter 5

Foreword

The exercises come from the chapter tests and homework of Jilin University's "Compiler Principles" course on Chaoxing. This article is intended for personal study. If you find mistakes, please let me know.


Course schedule (key points)

2.4-2.7: Describing words with regular expressions; deterministic finite automata (DFA)
2.8-2.10: NFA, determinization of NFA, simplification (minimization) of DFA
2.11 and all of Chapter 3
4.1: Grammar definitions; these concepts are the theoretical basis of syntax analysis.
4.2~4.4: Equivalent grammar transformations, the syntax analysis functions, and how to compute the three sets. For transformations, you only need to master eliminating common prefixes and direct left recursion; a general understanding of the others is enough. Computing the three sets is both a key point and a difficulty, and must be mastered.
4.5: The recursive descent method, a focus of this chapter. Understand the idea of recursive descent parsing and master the structure of a recursive descent parser.
4.6~4.7: The LL(1) parsing method, the key content of this chapter. Understand the idea of LL(1) parsing, learn to construct the LL(1) parsing table, and master the structure of an LL(1) parser; given a grammar and an input string, you should be able to show the parsing process.
Chapter 5: Not within the scope of the final exam. (Read 5.1, the basic idea of bottom-up parsing, carefully; skim the other sections.)
6.1-6.3: Semantic analysis
6.4-6.11: Attribute representation of function identifiers and field-name identifiers, the internal representation of the various types, and the internal representation of values.
6.12-6.17: Scope of identifiers, local symbol tables and the global symbol table.
7.1~7.5: Only a brief understanding of 7.5 (the LR(1) syntax-directed method) is needed. Quadruples must be mastered, and you should understand the basic idea of syntax-directed translation and the LL(1) syntax-directed method.
7.6-7.9: Subscripted variables
7.10-7.15
8.1-8.4
8.5-8.8
9.1-9.5
10.1-10.7

Chapter 1: Introduction to Compilation

1. Summary (mind map)

Contents of this chapter:

  • 1.1 Programming language development and its high-level language implementation
  • 1.2 Composition of Compiler

Knowledge points: the compiler, an overview of the compilation process, the structure of a compiler, compiler generation, and learning to construct a compiler.
Key points: the basic process of compiling a program, the basic tasks of each phase, and the overall framework of a compiler.
Difficulty: the structure of a compiler.


2. Practice questions

1. Which of the following statements about implementing high-level languages is wrong? (D)
A. The fundamental difference between compilation and interpretation is whether object code is generated;
B. The compiling method processes the source program after translating it;
C. The interpreting method analyzes and interprets the statements one by one in the dynamic execution order of the source program, executing each immediately;
D. The interpreting method does not translate the source program;

Answer analysis: The interpreting method also translates the source program, but it executes as it translates and generates no object program.
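A minimal Python sketch can make the difference concrete. Everything in it is hypothetical: the toy "source language" is a list of stack-machine statements, and Python source text stands in for object code. The interpreter executes each statement as soon as it analyses it, while the compiler only translates, and the translated code runs as a separate step.

```python
# Hypothetical toy language: a program is a list of (op, operand) statements.
SOURCE = [("push", 2), ("push", 3), ("add", None), ("print", None)]


def interpret(program):
    """Interpretation: analyse each statement and execute it immediately."""
    stack, out = [], []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "print":
            out.append(stack[-1])
        else:
            raise ValueError(f"unknown statement {op!r}")
    return out


def compile_program(program):
    """Compilation: translate the whole program into object code (here,
    Python source text) without executing any of it."""
    lines = ["stack, out = [], []"]
    for op, arg in program:
        if op == "push":
            lines.append(f"stack.append({arg!r})")
        elif op == "add":
            lines.append("stack.append(stack.pop() + stack.pop())")
        elif op == "print":
            lines.append("out.append(stack[-1])")
        else:
            raise ValueError(f"unknown statement {op!r}")
    return "\n".join(lines)


object_code = compile_program(SOURCE)   # translation phase: nothing runs yet
namespace = {}
exec(object_code, namespace)            # execution phase, separate from translation
print(interpret(SOURCE), namespace["out"])  # [5] [5]
```

Both functions translate the statements in some sense; only the compiler leaves behind a separate object program.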


2. "A source program written in a high-level language must be compiled into object code before it can run." Is this statement correct?

Incorrect

High-level languages can be implemented by ① compilation, ② interpretation or ③ conversion, so a source program written in a high-level language can also be run by interpretation or conversion.


3. Given the alphabet A = { a,b,c,d,…,z,A,B,C,D,…,Z }, which statement about the set A^n is correct? (B)
A. The set is the set of English letters
B. All symbol strings in the set have length exactly n
C. The set is the set of English strings
D. All symbol strings in the set have length at most n

A^n is the n-fold product of the letter set A with itself, i.e., the set of all English letter strings of length n.
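The definition can be checked mechanically. This short sketch builds A^n with Python's `itertools.product` as the set of all n-fold concatenations; the helper name `power` is our own.

```python
import itertools
import string


def power(alphabet, n):
    """A^n: every string formed by concatenating n symbols of the alphabet."""
    return {"".join(p) for p in itertools.product(alphabet, repeat=n)}


A = string.ascii_letters              # the 52 letters a..z, A..Z
A2 = power(A, 2)
print(len(A2))                        # 2704, i.e. 52 ** 2
print(all(len(s) == 2 for s in A2))   # True: every string has length exactly 2
print(power("ab", 0))                 # {''}: A^0 contains only the empty string
```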


4. If r = (a|b|c)(x|y|z), the number of elements in L(r) is: 9
Each string in the regular set denoted by r has two parts: the first is one of a, b or c, and the second is one of x, y or z, giving 3 × 3 = 9 combinations in total.
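Enumerating L(r) directly mirrors the 3 × 3 argument; this is only a sanity-check sketch using Python's `itertools` and `re` modules.

```python
import itertools
import re

r = re.compile(r"(a|b|c)(x|y|z)")
# Every choice for the first part combined with every choice for the second.
L_r = ["".join(pair) for pair in itertools.product("abc", "xyz")]

print(len(L_r))                              # 9
print(all(r.fullmatch(s) for s in L_r))      # True
print(L_r[:3])                               # ['ax', 'ay', 'az']
```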


5. A regular expression equivalent to (a*|b)*(c|d) is:

(a|b)*c | (a|b)*d
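Equivalence can be spot-checked by brute force: compare both expressions on every string up to some length. This is only evidence, not a proof (the algebra (a*|b)* = (a|b)* and distributing the concatenation over | is what settles it), and the function name `same_language` is our own.

```python
import itertools
import re


def same_language(r1, r2, alphabet, max_len):
    """Return the first string (up to max_len) on which the two regexes
    disagree, or None if they agree on all of them."""
    p1, p2 = re.compile(r1), re.compile(r2)
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(p1.fullmatch(s)) != bool(p2.fullmatch(s)):
                return s
    return None


print(same_language(r"(a*|b)*(c|d)", r"(a|b)*c|(a|b)*d", "abcd", 6))  # None
```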

6. Let r = (a|b)(x|y)*. Which of the following is NOT an element of the regular set denoted by r? (A)
A. abx
B. bxxx
C. a
D. bxyyxxy

Answer analysis: Every element of the regular set denoted by r has two parts: the first is a or b, and the second is any string of x's and y's. Because the second part is a star closure, it may also be empty.
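A quick check of the four options with Python's `re.fullmatch`, which succeeds only when the whole string matches r:

```python
import re

r = re.compile(r"(a|b)(x|y)*")
for option, s in zip("ABCD", ["abx", "bxxx", "a", "bxyyxxy"]):
    print(option, s, bool(r.fullmatch(s)))
# A abx False
# B bxxx True
# C a True
# D bxyyxxy True
```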


7. After lexical analysis of the arithmetic expression 123+45.6, which of the following are legal words? (ACD)
A. Decimal number 123
B. Symbol string 123
C. Operator +
D. Decimal number 45.6

Answer analysis: Lexical analysis splits the expression into 3 words: ① the decimal integer constant 123, ② the operator +, and ③ the real constant 45.6.
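A regular-expression-based scanner shows how the three words come out. This is a sketch under assumed token classes (INT, REAL, OP); REAL is tried before INT so that 45.6 is read as a single word.

```python
import re

# Assumed token classes for this example; order matters, because Python's
# alternation takes the first alternative that matches.
TOKEN_SPEC = [
    ("REAL", r"\d+\.\d+"),
    ("INT", r"\d+"),
    ("OP", r"[+\-*/]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (class, lexeme) pairs, rejecting illegal characters."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("123+45.6"))
# [('INT', '123'), ('OP', '+'), ('REAL', '45.6')]
```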


8. Given the alphabet Σ={0,1}, which of the following are symbol strings over Σ? (ABC)
A. 1
B. 000
C. ε
D. Empty set

Basic concept: a symbol string is any finite sequence of symbols from the alphabet. A string may contain no symbols at all; this is the empty string ε. The empty set is not a sequence of symbols, so it does not meet the definition.


9. Let the alphabet S={0,1}, and write a regular expression to express:
1). All strings defined on S;
(0|1)*

2). Binary number without leading 0;
0|(1(0|1)*)

3). A binary number without leading zeros that is divisible by 2.
0| (1(0|1)*0)

Answer analysis:
1) Concatenation binds more tightly than |, and * repeats the preceding part any number of times; here each repeated bit is 0 or 1.
2) A nonzero number without leading zeros starts with 1, followed by any number of bits, each 0 or 1; the number 0 itself is the separate alternative 0.
3) Building on 2), a binary number is divisible by 2 exactly when it ends in 0.
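Answers 2) and 3) can be tested against ordinary arithmetic. This sketch runs both patterns through Python's `re.fullmatch` for every number below a chosen limit (256 is arbitrary).

```python
import re

# Patterns from answers 2) and 3); fullmatch requires the whole string to match.
NO_LEADING_ZERO = re.compile(r"0|(1(0|1)*)")
EVEN_NO_LEADING_ZERO = re.compile(r"0|(1(0|1)*0)")


def check(limit):
    """Compare both patterns with arithmetic on 0 .. limit-1."""
    for n in range(limit):
        s = format(n, "b")                      # binary, no leading zeros
        assert NO_LEADING_ZERO.fullmatch(s), s
        assert bool(EVEN_NO_LEADING_ZERO.fullmatch(s)) == (n % 2 == 0), s
    return True


print(check(256))                                # True
print(bool(NO_LEADING_ZERO.fullmatch("0110")))   # False: leading zero
```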


10. [Single-choice question] The tasks that must be completed by the compiler are (①②③⑥).
① Lexical analysis
② Syntax analysis
③ Semantic analysis
④ Intermediate code generation
⑤ Intermediate code optimization
⑥ Object code generation
A, ①②③④⑤⑥
B, ①②③④
C, ①②③⑥
D, ①②③④⑤⑤

After lexical, syntax and semantic analysis, the source program can be translated directly into the object program. Intermediate code exists to make optimization and porting easier, so intermediate code generation and intermediate code optimization are not mandatory tasks of compilation.


11. The work of each stage of the compiler involves (table management) and (error handling).



Chapter 2: Formal Languages and Finite Automata

1. Summary (mind map)


Teaching content:
2.1 Overview of lexical analysis
2.2 Basic concepts of characters and strings
2.3 Definition of regular expressions
2.4 Describing words with regular expressions
2.5 Deterministic finite automata (DFA)
2.8 Nondeterministic finite automata (NFA)
2.9 Determinization of NFA
2.10 Simplification of DFA
2.11 Mutual conversion between regular expressions and finite automata





Knowledge points: regular expressions and finite automata.
Focus: regular expressions and finite automata.
Difficulty: the relationship between regular expressions and finite automata.


2. Practice questions

1. If the set denoted by a regular expression is infinite, the expression must contain the operation: * (closure)

2. Which of the following is NOT a component of a DFA? (B)
A, finite alphabet
B, initial state set
C, terminal state set
D, finite state set

By definition, a DFA has exactly one initial state, not a set of initial states.


3. Finite automata M and N are equivalent when (D):
A, M and N have the same alphabet
B, M and N have the same number of states
C, M and N have the same number of states or directed edges
D, M and N recognize the same set of strings


4. Compared with DFA, the non-determinism of NFA is reflected in: (AC)
A, allowing multiple start states
B, allowing multiple end states
C, allowing state transitions without any input
D, A state can have several different successor states


5. Which of the following statements about DFAs is correct? (C)
A. A DFA can recognize a symbol string through multiple paths
B. If the language recognized by a DFA is an infinite set, the DFA must have infinitely many states
C. If the language recognized by a DFA is an infinite set, the state diagram of the DFA must contain a loop
D. A DFA cannot accept the empty string ε

A: Each state of a DFA has a definite successor for every input symbol, so every symbol string has exactly one recognition path.
B: By definition, a DFA has finitely many states.
C: Without a loop, every path through the state diagram visits each state at most once, so the DFA can only accept strings shorter than its number of states, which makes the language finite.
D: If the start state of a DFA is also a final state, the DFA accepts the empty string.


6. The finite automaton obtained by merging the finite automata that recognize the individual words: (A)
A. It may be NFA, or it may be DFA
B. It must be DFA
C. It must be NFA
D. It must be the smallest DFA


7. Which statements about the determinization of NFAs are correct? (ABD)
A. The NFA determinization algorithm eliminates ε-edges
B. For every NFA there is an equivalent DFA
C. An NFA has only one equivalent DFA
D. After determinization, the state transition function becomes single-valued

B: Correct by the equivalence theorem.
A, D: Determinization turns the NFA into a DFA; a DFA has no ε-edges and its state transition function is single-valued, so both are correct.
C: Wrong. An NFA can have many equivalent DFAs, but only one minimal DFA (up to renaming of states).
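The subset construction behind A, B and D can be sketched in Python. The NFA encoding (a dict from (state, symbol) to a set of states, with "" as the ε label) and the example NFA for strings ending in ab are assumptions made for this illustration; missing transitions are simply left out, so the resulting DFA may be partial.

```python
from collections import deque

EPS = ""  # label used for ε-edges in this sketch


def eps_closure(nfa, states):
    """All NFA states reachable from `states` through ε-edges only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(nfa, states, symbol):
    """NFA states reachable from `states` by one edge labelled `symbol`."""
    return {t for s in states for t in nfa.get((s, symbol), ())}


def determinize(nfa, starts, finals, alphabet):
    """Subset construction: every DFA state is a set of NFA states."""
    start = eps_closure(nfa, starts)      # several start states are allowed
    delta, states, queue = {}, {start}, deque([start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            T = eps_closure(nfa, move(nfa, S, a))
            if not T:
                continue                  # no move: leave the DFA partial here
            delta[(S, a)] = T             # single-valued transition
            if T not in states:
                states.add(T)
                queue.append(T)
    dfa_finals = {S for S in states if S & finals}
    return start, delta, dfa_finals, states


def run(start, delta, finals, word):
    state = start
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in finals


# Example NFA for strings over {a, b} ending in "ab", with an ε-edge 0 -> 1.
NFA = {(0, EPS): {1}, (1, "a"): {1, 2}, (1, "b"): {1}, (2, "b"): {3}}
start, delta, finals, states = determinize(NFA, {0}, {3}, "ab")
print(len(states))                         # 4
print([run(start, delta, finals, w) for w in ["ab", "aab", "aba", ""]])
# [True, True, False, False]
```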


8. Which statements about implementing an automaton with the direct steering method are correct? (AD)
A. The direct steering method is based on the state transition diagram
B. The direct steering method is simple to program but uses a lot of storage
C. The direct steering method is based on the state transition matrix
D. The direct steering method uses less storage, but the corresponding program is longer

There are usually two ways to implement an automaton: the state transition matrix method and the direct steering method (based on the state transition diagram).
The state transition matrix method gives a short program but uses more storage;
the direct steering method, based on the state transition diagram, uses less storage but gives a longer program.
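To contrast the two styles, here is a sketch of the same DFA implemented both ways. The DFA itself (identifiers: a letter followed by letters or digits) is an illustrative assumption, not part of the exercise.

```python
# Illustrative DFA (an assumption): a letter followed by letters or digits.
# State 0 = start, state 1 = inside an identifier (final).


def char_class(ch):
    """Map a character to a column of the transition matrix."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# State transition matrix method: the DFA is data; one generic loop drives it.
# A missing entry means "no transition". A real table would be a full
# states x classes array, which is where the extra storage goes.
MATRIX = {
    0: {"letter": 1},
    1: {"letter": 1, "digit": 1},
}
FINALS = {1}


def accepts_matrix(word):
    state = 0
    for ch in word:
        state = MATRIX[state].get(char_class(ch))
        if state is None:
            return False
    return state in FINALS


# Direct steering method: each state of the diagram becomes control flow,
# so no table is stored, but every DFA needs its own hand-written code.
def accepts_direct(word):
    i = 0
    # state 0: the first character must be a letter
    if i < len(word) and word[i].isalpha():
        i += 1
    else:
        return False
    # state 1: loop on letters and digits
    while i < len(word) and (word[i].isalpha() or word[i].isdigit()):
        i += 1
    return i == len(word)   # accept only if the whole word was consumed


for w in ["x1", "abc", "1x", "", "a_b"]:
    print(repr(w), accepts_matrix(w), accepts_direct(w))
```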


9. Design a deterministic finite automaton that recognizes binary positive integers divisible by 5 (numbers with leading zeros excluded).
Reference answer: [Figure: reference DFA]

My answer: [Figures: my DFA]
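Since the reference figures are not reproduced here, the following sketch gives one possible DFA as a transition table, assuming 0 itself is rejected (the question asks for positive integers). The remainder states rely on the fact that appending bit b to a number with value v gives 2v + b, so only v mod 5 needs to be remembered.

```python
# States: "S" = start, "D" = dead (a leading 0 was read),
# 0..4 = value of the bits read so far, taken mod 5.
TRANS = {("S", "1"): 1, ("S", "0"): "D", ("D", "0"): "D", ("D", "1"): "D"}
for r in range(5):
    for bit in "01":
        # Appending bit b to a number v gives 2*v + b.
        TRANS[(r, bit)] = (2 * r + int(bit)) % 5
FINAL = {0}   # remainder 0; only reachable after a leading 1, so 0 itself is rejected


def accepts_div5(word):
    state = "S"
    for ch in word:
        state = TRANS.get((state, ch))
        if state is None:          # character outside {0, 1}
            return False
    return state in FINAL


print([w for w in ["101", "1010", "1111", "110", "0", "0101"] if accepts_div5(w)])
# ['101', '1010', '1111']
```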



Chapter 3: Implementation of Lexical Analyzer

1. Summary (mind map)

Teaching content:
3.1 Preparation before lexical analyzer implementation
3.2 Specific implementation of lexical analyzer
3.3 Notes for implementing lexical analyzer
Knowledge points: lexical analyzer task, lexical analyzer design.
Focus: The task and design of the lexical analyzer, state transition diagram.

Tokenization (also called word segmentation) splits text into a sequence of strings according to specific requirements; the elements of the sequence are usually called tokens or words.

Generally, each element of the sequence should carry some meaning. For example, the sentence "text mining is time-consuming" is split into "text mining / is / time-consuming", where "text mining" is kept as a single token. Removing duplicates from all the tokens in a corpus gives a vocabulary, each entry of which is called a type. In English text processing, tokenization also normalizes some spellings, for example converting "I'm Li" into "I am Li".


2. Practice questions

1. The input of the lexical analyzer is (B)
A, word symbol string
B, source program
C, grammatical unit
D, object program

The function of the lexical analyzer is to convert the source program from a character sequence to an equivalent token sequence, and to check the lexical errors in the source program, so the input of the lexical analyzer is the source program.


2. When the lexical analyzer is a subroutine of the syntax analyzer, it returns (C)
A, the value of the word attribute
B, the position representation of the word in the symbol table
C, the type code and semantic value of the word
D, the word

When the lexical analyzer is a subroutine of the syntax analyzer, each call returns one token. A token is the internal representation of a word and contains the word's type information and semantic value.
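The "one token per call" interface can be sketched as follows. The type codes, the `Token` record and the `Lexer.next_token` method are hypothetical names chosen for illustration; the parser pulls a token only when it needs one, which is what it means for the lexer to be a subroutine of the parser.

```python
from dataclasses import dataclass

# Hypothetical type codes; a real compiler defines its own numbering.
NUM, PLUS, EOF = 1, 2, 3
DIGITS = "0123456789"


@dataclass
class Token:
    type_code: int   # which class of word this is
    value: object    # semantic value (the number for NUM), None otherwise


class Lexer:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def next_token(self):
        """Scan past one word and return its token; called once per token."""
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos == len(self.text):
            return Token(EOF, None)
        ch = self.text[self.pos]
        if ch in DIGITS:
            start = self.pos
            while self.pos < len(self.text) and self.text[self.pos] in DIGITS:
                self.pos += 1
            return Token(NUM, int(self.text[start:self.pos]))
        if ch == "+":
            self.pos += 1
            return Token(PLUS, None)
        raise SyntaxError(f"illegal character {ch!r} at position {self.pos}")


def parse_sum(text):
    """Toy parser for NUM (+ NUM)*: it asks the lexer for tokens on demand."""
    lexer = Lexer(text)

    def expect_num():
        tok = lexer.next_token()
        if tok.type_code != NUM:
            raise SyntaxError("number expected")
        return tok.value

    total = expect_num()
    tok = lexer.next_token()
    while tok.type_code == PLUS:
        total += expect_num()
        tok = lexer.next_token()
    if tok.type_code != EOF:
        raise SyntaxError("unexpected token")
    return total


print(parse_sum("12 + 30 + 0"))  # 42
```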


3. The purpose of dividing a compiler into several "passes" is (B)
A. To use limited machine memory and improve machine execution efficiency
B. To make the structure of the program clearer
C. To improve program execution efficiency
D. To use limited machine memory at the cost of lower machine execution efficiency

A "pass" is one scan over the source program, or over an intermediate representation of it, from beginning to end, producing a new intermediate result or the object program.
A compiler divided into several passes scans the source program (or its equivalent intermediate code) several times and performs a different function on each pass. The purpose is to make the structure of the compiler clearer.


4. Which of the following errors can be detected during lexical analysis? (C)
A. Operand types do not match
B. An identifier is declared more than once
C. An illegal symbol appears in the program
D. Division overflow

Of these errors, only an illegal symbol is a lexical error.


5. The syntax analyzer in a compiler takes (C) as its input units and produces information for later phases.
A, expression
B, production
C, word
D, statement

The lexical analyzer converts the source program from a character sequence into an equivalent token sequence and checks it for lexical errors, so the syntax analyzer takes words (each made up of characters) as its input units.


6. Which of the following cannot be recognized in the lexical analysis stage? (C)
A, identifier
B, operator
C, quadruple
D, constant

The lexical analyzer converts the source program from a character sequence into an equivalent token sequence and checks it for lexical errors. A token carries the word's type information (reserved word, identifier, constant or special symbol) and its semantic value. A quadruple is not a word; it is a form of intermediate code produced in a later phase of compilation, so it cannot be recognized during lexical analysis.


7. Is there a language that can be recognized by a deterministic finite automaton but cannot be represented by a regular expression? (B)
A. Exists
B. Does not exist
C. Unable to determine whether it exists

Regular expressions and automata are equivalent in their ability to accept languages.


8. The language described by grammar G is the set of (D).
A. All strings formed from the symbols in the alphabet V of G.
B. All strings in the closure V* of the alphabet of G.
C. All symbol strings derived from the start symbol of G.
D. All terminal strings derived from the start symbol of G.

By the definition of a grammar, the language it defines is the set of all terminal strings derivable from its start symbol.
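The definition suggests a brute-force way to list part of a language: derive from the start symbol and keep only the terminal strings. This sketch uses G[S] from Exercise 10 of Chapter 4 and assumes single-character symbols and no ε-productions, so a sentential form never shrinks and overly long forms can be discarded.

```python
from collections import deque

# G[S] from Exercise 10 of Chapter 4: S → a | (T), T → T,S | S
GRAMMAR = {"S": ["a", "(T)"], "T": ["T,S", "S"]}


def sentences(grammar, start, max_len):
    """Terminal strings of length <= max_len derivable from `start`.

    Assumes single-character symbols (the nonterminals are exactly the keys
    of `grammar`) and no ε-productions, so discarding long forms is safe.
    """
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        i = next((k for k, ch in enumerate(form) if ch in grammar), None)
        if i is None:                       # only terminals left: a sentence
            found.add(form)
            continue
        for rhs in grammar[form[i]]:        # expand the leftmost nonterminal
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


print(sorted(sentences(GRAMMAR, "S", 5), key=lambda s: (len(s), s)))
# ['a', '(a)', '((a))', '(a,a)']
```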


9. Which of the following grammars generates the language shown in the figure? (D)
[Figure: the target language]
A.
Z→aZb | aAb | b
A→aAb | b
B.
A→aAb
A→b
C.
Z→AbB
A→aA | a
B→bB | b
D.
Z→aAb
A→aAb |

Answer analysis: the numbers of a's and b's on the two sides are symmetric, and there is at least one of each.


10. Which of the following statements about the relationship between regular expressions and finite automata are correct? (BC)
A. A regular expression is equivalent to only one deterministic finite automaton
B. Regular expressions, NFAs and DFAs are equivalent in their ability to accept languages
C. For any regular expression r, there is an NFA M such that L(M) = L(r)
D. A regular expression can be converted into an equivalent automaton, but not every automaton can be expressed as an equivalent regular expression

Regular expressions and automata are equivalent in their ability to accept languages, and one regular expression may be equivalent to many automata.


11. Which statements correctly describe the relationship between the restrictions on productions and the descriptive power of the four types of grammars? (AD)
A. From Type 0 to Type 3 grammars, the restrictions on productions become progressively stronger
B. From Type 0 to Type 3 grammars, the restrictions on productions become progressively weaker
C. From Type 0 to Type 3 grammars, the power to describe languages becomes progressively stronger
D. From Type 0 to Type 3 grammars, the power to describe languages becomes progressively weaker

[Figure: the four types of grammars]



Chapter 4: Top-Down Parsing Approaches

1. Summary (mind map)


A well-written blog post explains how to compute the First set, the Follow set and the LL(1) table.

Teaching Content:
4.1 Grammar Definition
4.2 Grammar Equivalent Transformation
4.3 Syntax Analysis Function
4.4 Finding Method of Three Sets
4.5 Recursive Descent Analysis Method
4.6 LL(1) Analysis Method

4.1: Grammar definitions; these concepts are the theoretical basis of syntax analysis.
4.2~4.4: Equivalent grammar transformations, the syntax analysis functions, and how to compute the three sets. For transformations, you only need to master eliminating common prefixes and direct left recursion; a general understanding of the others is enough. Computing the three sets is both a key point and a difficulty, and must be mastered.
4.5: The recursive descent method, a focus of this chapter. Understand the idea of recursive descent parsing and master the structure of a recursive descent parser.
4.6: The LL(1) parsing method, the key content of this chapter. Understand the idea of LL(1) parsing, learn to construct the LL(1) parsing table, and master the structure of an LL(1) parser; given a grammar and an input string, you should be able to show the parsing process.

A good blog post on phrases, simple (direct) phrases and handles.


2. Exercises

1. If the grammar G is unambiguous, then any sentence α of it: (A)
A. The syntax trees corresponding to the leftmost derivation and rightmost derivation must be the same
B. The syntax trees corresponding to the leftmost derivation and rightmost derivation May be different
C. Leftmost and rightmost derivations must be the same
D. There may be two different leftmost derivations

Answer analysis: If G is unambiguous, then for any sentence α the leftmost and rightmost derivations correspond to the same syntax tree; they differ only in the order in which the tree is built (the leftmost derivation grows the left branches first, the rightmost derivation the right branches first). For D, two different leftmost derivations of one sentence would mean the grammar is ambiguous. So the answer is A.
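One way to see that both derivations describe the same tree is to read both off a single syntax tree. In this sketch (the tree encoding is our own), the leftmost derivation always expands the leftmost unexpanded node and the rightmost derivation the rightmost one; the tree is for the sentence (a,a) of G[S] from Exercise 10.

```python
# A node is (nonterminal, [children]); a leaf is a terminal string.
# Syntax tree of "(a,a)" under G[S]: S → a | (T), T → T,S | S.
TREE = ("S", ["(", ("T", [("T", [("S", ["a"])]), ",", ("S", ["a"])]), ")"])


def derivation(tree, leftmost=True):
    """Sentential forms of the leftmost (or rightmost) derivation that builds `tree`."""
    frontier = [tree]          # terminals and not-yet-expanded nodes, in order

    def render():
        return "".join(x if isinstance(x, str) else x[0] for x in frontier)

    forms = [render()]
    while True:
        pending = [k for k, x in enumerate(frontier) if not isinstance(x, str)]
        if not pending:
            return forms
        k = pending[0] if leftmost else pending[-1]
        frontier[k:k + 1] = frontier[k][1]   # apply that node's production
        forms.append(render())


print(" => ".join(derivation(TREE)))
# S => (T) => (T,S) => (S,S) => (a,S) => (a,a)
print(" => ".join(derivation(TREE, leftmost=False)))
# S => (T) => (T,S) => (T,a) => (S,a) => (a,a)
```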

2. The following grammar is known:
E→TE' | ε
E'→+TE' | ε
T→FT'
T'→*FT' | ε
F→(E) | id
Then Follow(F) = (D).

A. {*, +}
B. {*, ε}
C. {+, #, )}
D. {*, +, #, )}
E. {#, )}
F. {*, +, #, ), id}
[Figure: worked solution]
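FIRST and FOLLOW sets can be computed by repeatedly applying their defining rules until no set changes. The sketch below is one such fixed-point formulation under our own encoding: a grammar is a dict from nonterminal to a list of alternatives, each a list of symbols, with [] for ε and # as the end marker.

```python
EPS, END = "ε", "#"


def first_of(seq, first, nonterminals):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in seq:
        if sym not in nonterminals:       # a terminal starts the string
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:         # sym cannot vanish: stop
            return result
    result.add(EPS)                       # the whole sequence can derive ε
    return result


def first_follow(grammar, start):
    """Compute FIRST and FOLLOW by iterating until no set changes."""
    nts = set(grammar)
    first = {A: set() for A in grammar}
    follow = {A: set() for A in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                new_first = first_of(rhs, first, nts)
                if not new_first <= first[A]:
                    first[A] |= new_first
                    changed = True
                for i, B in enumerate(rhs):
                    if B not in nts:
                        continue
                    rest = first_of(rhs[i + 1:], first, nts)
                    add = rest - {EPS}
                    if EPS in rest:           # B can end the production
                        add |= follow[A]
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return first, follow


# Grammar of Exercise 2; [] stands for an ε-alternative.
EXPR = {
    "E": [["T", "E'"], []],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
first, follow = first_follow(EXPR, "E")
print(sorted(follow["F"]))   # ['#', ')', '*', '+']
```

The same function reproduces the answers to Exercises 4 to 9 below when given those grammars in this encoding.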

3. During the compilation process, the function of syntax analysis is (C).
①Analyze how words are formed; ②Analyze how word strings form sentences; ③Analyze how sentences form programs; ④Analyze the structure of programs.

A. ②③
B. ④
C. ②③④
D. ①②③④

4. Suppose the grammar G[S] is:
S->AB|bC
A-> ε|b
B-> ε|aD
C->AD|b
D->aS|c
Which of the following symbols are contained in First(S)? (ABD)
A. a
B. b
C. c
D. ε
E. #

5. Suppose the grammar G[S] is:
S->AB|bC
A-> ε|b
B-> ε|aD
C->AD|b
D->aS|c
Which of the following symbols are contained in Follow(A)? (ACE)
A. a
B. b
C. c
D. ε
E. #
[Figure: worked solution]
6. Suppose the grammar G[S] is:
S->eT | RT
T->DR | ε
R->dR | ε
D->a | bd
Which of the following symbols are contained in First(S)? (ABCDE)
A. a
B. b
C. d
D. e
E. ε
F. #
7. The following grammar is known:
S→eT | RT
T→DR | ε
R→dR | ε
D→a | bd
Then First(S) = ( ① ), First(T) = ( ② ), First(R) = ( ③ ), First(D) = ( ④ ).
Answer:
① {a,b,d,e,ε}, ②{a,b,ε}, ③{d,ε}, ④{a,b}

8. Eliminate the left recursion of the following grammar G[E] by equivalent grammar transformation, and use the transformed grammar to draw the syntax tree of the expression i*i+i*(i+i)+i.
E -> T | E + T
T -> F | T * F
F -> i | ( E )
[Figures: answer]
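The transformation for direct left recursion, A → Aα | β becoming A → βA' and A' → αA' | ε, is mechanical. A minimal sketch follows (the function name and the primed naming of the new nonterminal are our own); it handles direct left recursion only.

```python
def remove_direct_left_recursion(A, alternatives):
    """Rewrite A → Aα1 | … | Aαm | β1 | … | βn as
         A  → β1 A' | … | βn A'
         A' → α1 A' | … | αm A' | ε
    Each alternative is a list of symbols; [] stands for ε."""
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == A]
    betas = [alt for alt in alternatives if not alt or alt[0] != A]
    if not alphas:
        return {A: alternatives}              # not directly left-recursive
    A2 = A + "'"                              # name of the new nonterminal
    return {
        A: [beta + [A2] for beta in betas],
        A2: [alpha + [A2] for alpha in alphas] + [[]],
    }


G = {
    "E": [["T"], ["E", "+", "T"]],
    "T": [["F"], ["T", "*", "F"]],
    "F": [["i"], ["(", "E", ")"]],
}
new_grammar = {}
for A, alts in G.items():
    new_grammar.update(remove_direct_left_recursion(A, alts))
for A, alts in new_grammar.items():
    print(A, "→", " | ".join(" ".join(alt) or "ε" for alt in alts))
# E → T E'
# E' → + T E' | ε
# T → F T'
# T' → * F T' | ε
# F → i | ( E )
```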
9. The following grammar is known:
S→eT | RT
T→DR | ε
R→dR | ε
D→a | bd
Find First(S), First(T), First(R), First(D).
Correct answer:
First(S) = { a, b, d, e, ε }
First(T) = { a, b, ε }
First(R) = { d, ε }
First(D) = { a, b }
Answer analysis: According to the definition of the First set:
First(S) = First(eT) ∪ First(RT) = { e } ∪ { d, a, b, ε } = { a, b, d, e, ε }
First(T) = First(DR) ∪ { ε } = { a, b } ∪ { ε } = { a, b, ε }
First(R) = First(dR) ∪ { ε } = { d } ∪ { ε } = { d, ε }
First(D) = First(a) ∪ First(bd) = { a, b }

10. Suppose the grammar G[S] is as follows:
S→a | (T)
T→T,S | S
Please give all phrases, simple phrases and handles of the sentence (a,(a,a)).

Correct answer:
Phrases: (a,(a,a)), a,(a,a), (a,a), a,a, and the first, second and third a
Simple phrases: the first, second and third a
Handle: the first a
Answer analysis: these follow from the definitions of phrase, simple phrase and handle.
You can also draw the syntax tree of the sentence first: the leaves of each subtree form a phrase, the leaves of each simple subtree (a subtree of only one level) form a simple phrase, and the leaves of the leftmost simple subtree form the handle.
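The syntax-tree method can be automated. In this sketch (the tree encoding and function name are our own), every subtree contributes a phrase, every one-level subtree a simple phrase, and the leftmost simple phrase is the handle; each phrase is tagged with its start position so the three a's can be told apart.

```python
# A node is (nonterminal, [children]); a leaf is a terminal string.
def S_a():
    return ("S", ["a"])


# Syntax tree of (a,(a,a)) under G[S]: S → a | (T), T → T,S | S.
TREE = ("S", ["(",
              ("T", [("T", [S_a()]), ",",
                     ("S", ["(", ("T", [("T", [S_a()]), ",", S_a()]), ")"])]),
              ")"])


def phrases_and_handle(tree):
    phrases, simple = set(), set()

    def walk(node, start):
        """Return (next position, text) for the subtree starting at `start`."""
        if isinstance(node, str):                      # leaf: one terminal
            return start + 1, node
        pos, text = start, ""
        for child in node[1]:
            pos, piece = walk(child, pos)
            text += piece
        phrases.add((start, text))                     # every subtree: a phrase
        if all(isinstance(c, str) for c in node[1]):
            simple.add((start, text))                  # one-level subtree: simple phrase
        return pos, text

    walk(tree, 0)
    handle = min(simple)                               # leftmost simple phrase
    return sorted(phrases), sorted(simple), handle


phrases, simple, handle = phrases_and_handle(TREE)
print(phrases)
# [(0, '(a,(a,a))'), (1, 'a'), (1, 'a,(a,a)'), (3, '(a,a)'), (4, 'a'), (4, 'a,a'), (6, 'a')]
print(simple)   # [(1, 'a'), (4, 'a'), (6, 'a')]
print(handle)   # (1, 'a')
```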

11. [Figure: question]
Answer:
[Figure: answer]



Chapter 5: Bottom-Up Parsing Approaches

1. Summary (mind map)


2. Practice questions

1. Among the grammatical analysis methods of high-level language compilers, the recursive descent method belongs to (B) analysis method.
A. From left to right
B. From top to bottom
C. From bottom to top
D. From right to left

2. [True or False] To use top-down parsing, left recursion must first be eliminated from the grammar. (√)

3. [True or False] Any grammar can be rewritten as an LL(1) grammar. (×)
A grammar must satisfy 4 conditions to be LL(1):
[Figure: the four LL(1) conditions]
In addition, the grammar must have no common prefixes!
Otherwise, for some input symbol two or more alternatives would match, and the parser could not decide which one to take.

4. Which of the following grammars is LL(1)? (C)
A. S->aSb | ab
B. S->ab | Sab
C. S->aSb | b
D. S->aS |

Analysis:
A: There is a common prefix
B: There is a direct left recursion
D: There is a common prefix
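The two quick checks used in this analysis can be sketched as below. This is a heuristic of our own, assuming single-character symbols: it flags direct left recursion and alternatives that begin with the same symbol. Passing it does not prove a grammar is LL(1); that requires the FIRST/FOLLOW conditions. Option D is left out because it is truncated above.

```python
def red_flags(grammar):
    """Report direct left recursion and common prefixes (alternatives of the
    same nonterminal that start with the same symbol)."""
    problems = []
    for A, alts in grammar.items():
        if any(alt.startswith(A) for alt in alts):
            problems.append(f"{A}: direct left recursion")
        heads = [alt[0] for alt in alts if alt]
        if len(heads) != len(set(heads)):
            problems.append(f"{A}: common prefix")
    return problems


OPTIONS = {
    "A": {"S": ["aSb", "ab"]},
    "B": {"S": ["ab", "Sab"]},
    "C": {"S": ["aSb", "b"]},
}
for name, g in OPTIONS.items():
    print(name, red_flags(g) or "no red flags")
# A ['S: common prefix']
# B ['S: direct left recursion']
# C no red flags
```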

5. The following grammar is known:
S->eT | R T
T->DR | ε
R->dR | ε
D->a | bd
Then the LL(1) analysis table of this grammar is ( D ).
[Figure: the candidate LL(1) tables]

Analysis: [Figure: worked solution]
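Because the answer table is only available as a figure, here is a table derived by hand from the First sets of Chapter 4, Exercise 9, together with Follow(S) = Follow(T) = {#}, Follow(R) = {a, b, #} and Follow(D) = {d, #}; please check it against the figure. The driver is a sketch of the standard table-driven LL(1) algorithm (stack, lookahead, table lookup), and it returns the productions it applied, i.e. the analysis process for the input string.

```python
# LL(1) table for S→eT|RT, T→DR|ε, R→dR|ε, D→a|bd, derived by hand.
# Each entry maps (nonterminal, lookahead) to the right-hand side; "" means ε.
TABLE = {
    ("S", "e"): "eT", ("S", "a"): "RT", ("S", "b"): "RT",
    ("S", "d"): "RT", ("S", "#"): "RT",
    ("T", "a"): "DR", ("T", "b"): "DR", ("T", "#"): "",
    ("R", "d"): "dR", ("R", "a"): "", ("R", "b"): "", ("R", "#"): "",
    ("D", "a"): "a", ("D", "b"): "bd",
}
NONTERMINALS = {"S", "T", "R", "D"}


def ll1_parse(word, start="S"):
    """Table-driven LL(1) parse; returns (accepted, productions used in order)."""
    stack = ["#", start]
    tokens = list(word) + ["#"]
    pos, used = 0, []
    while True:
        top, a = stack.pop(), tokens[pos]
        if top == "#" and a == "#":
            return True, used
        if top in NONTERMINALS:
            rhs = TABLE.get((top, a))
            if rhs is None:
                return False, used         # empty table entry: syntax error
            used.append(f"{top}→{rhs or 'ε'}")
            stack.extend(reversed(rhs))    # leftmost symbol ends up on top
        elif top == a:
            pos += 1                       # match a terminal
        else:
            return False, used


print(ll1_parse("ead"))
# (True, ['S→eT', 'T→DR', 'D→a', 'R→dR', 'R→ε'])
print(ll1_parse("ee")[0])   # False
```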
6. Which of the following grammars belongs to LL(1) grammar: (AD)
A.
G[S]:
S → ABc
A → a | ε
B → b | ε

B.
G[S]:
S → Ab
A → a | B| ε
B → b | ε

C.
G[S]:
S → ABBA
A → a | ε
B → b | ε

D.
G[S]:
S → aSe | B
B → bBe | C
C → cCe | d

[Figure: worked solution]


7. [Figure: question]
Analysis: [Figure: worked solution]



Source: blog.csdn.net/KQwangxi/article/details/123939921