Compilation Principles: Top-Down Parsing, Eliminating Ambiguity and Left Recursion


Preface

A language is a set of sentences that satisfy certain composition rules; a sentence is a sequence of words that satisfies certain composition rules; a word is a string that satisfies certain composition rules. These composition rules are the productions of the grammar.
Syntax analysis is the core part of the compiler. Its task is to check whether the word sequence produced by the lexical analyzer is a sentence of the source language, that is, whether it conforms to the grammatical rules of the source language. Whether top-down or bottom-up, the parser scans the input word sequence from left to right, reads one word at a time, and builds a parse tree for the input.

1. The top-down method

The basic idea of top-down parsing:
Starting from the start symbol of the grammar, find a leftmost derivation of the given input string. That is, starting from the root S, construct a parse tree for the given input string.

Example: given the grammar

G: S → xAy, A → ** | *

and the input string x**y, the analysis proceeds as:

S ⇒ xAy ⇒ x**y

The corresponding parse tree has root S with the children x, A, y, and A has the two children * and *.
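The derivation above can be replayed mechanically. The sketch below is a hypothetical helper (the name `leftmost_step` is not from the original): it always rewrites the leftmost nonterminal of a sentential form. Note that the production choices are supplied by hand here; a real top-down parser must pick them from the input, which is exactly where the problems in section 2 come from.

```python
# Hypothetical helper: one leftmost-derivation step for the example grammar
# G: S → xAy, A → ** | *  (every symbol is a single character here).
NONTERMINALS = {"S", "A"}

def leftmost_step(form, rhs):
    """Replace the leftmost nonterminal of the sentential form `form` by `rhs`."""
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("the sentential form is already a sentence")

form = ["S"]
steps = [form]
for rhs in (["x", "A", "y"], ["*", "*"]):   # apply S → xAy, then A → **
    form = leftmost_step(form, rhs)
    steps.append(form)

print(" ⇒ ".join("".join(f) for f in steps))   # S ⇒ xAy ⇒ x**y
```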


2. Problems in top-down parsing

1. Ambiguity

1.1 Definition of ambiguity: for a grammar G, if some sentence in L(G) has two or more parse trees, G is said to be ambiguous. Equivalently, G is ambiguous if some sentence in L(G) has two or more leftmost (or rightmost) derivations.

1.2 Why ambiguity is a problem: suppose w ∈ L(G) and w has two different leftmost derivations. When performing top-down parsing of w, the parser cannot determine which of the two leftmost derivations to follow. This is the ambiguity problem.

1.3 Example of the ambiguity problem:
Consider the grammar G:
G: E → id | c | E + E | E - E | E * E | E / E | E ** E | (E)
According to grammar G, analyze the following sentence:

id1 + c * id2

It has two different parse trees:
In one tree, the root uses E → E * E and its left subtree uses E → E + E, i.e. the sentence is grouped as (id1 + c) * id2.
In the other tree, the root uses E → E + E and its right subtree uses E → E * E, i.e. the sentence is grouped as id1 + (c * id2).
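The ambiguity of id1 + c * id2 can be checked by brute force. This is a minimal sketch, assuming the input contains only operands and binary operators (no parentheses), so that the ambiguous candidates E → E op E can be tried with every operator as the root. The function name `parse_trees` is hypothetical.

```python
def parse_trees(tokens):
    """Return every parse tree of a flat token list under E → E op E | operand.

    Assumes operands sit at even positions and binary operators at odd
    positions; a tree is either an operand string or (op, left, right).
    """
    if len(tokens) == 1:
        return [tokens[0]]                       # E → id | c
    trees = []
    for i in range(1, len(tokens), 2):           # choose the operator at the root
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees

for tree in parse_trees(["id1", "+", "c", "*", "id2"]):
    print(tree)
# ('+', 'id1', ('*', 'c', 'id2'))
# ('*', ('+', 'id1', 'c'), 'id2')
```

Two trees come back for this sentence, so the grammar is ambiguous; with more operators the number of trees grows quickly.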

2. Solutions to ambiguity problems

Solution 1: Rewrite the grammar by introducing new grammar variables.
The grammar G: E → id | c | E + E | E - E | E * E | E / E | E ** E | (E) is
transformed into (here ↑ denotes the exponentiation operator **):

 G:    E → E + T | E - T | T
       T → T * F | T / F | F
       F → F ↑ P | P
       P → c | id | (E)

Solution 2: Use the precedence relations between the operators, so that higher-precedence operators are always applied first.
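One common way to realise Solution 2 is precedence climbing, which keeps the ambiguous grammar but consults a precedence table to decide how far each operand extends. This is a sketch under stated assumptions, not necessarily the method the author had in mind: the `PRECEDENCE` values are assumed, and all operators are treated as left-associative, matching the layered grammar E/T/F/P in Solution 1.

```python
# Assumed precedence table (higher binds tighter); every operator is treated
# as left-associative, matching the layered grammar E/T/F/P.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "**": 3}

def parse(tokens):
    """Precedence climbing over E → id | c | E op E | (E); returns a nested tuple tree."""
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expr(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return node
        return tok                                 # an operand such as id or c

    def expr(min_prec):
        nonlocal pos
        left = primary()
        # Keep absorbing operators whose precedence is at least min_prec.
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            right = expr(PRECEDENCE[op] + 1)       # +1 makes the operator left-associative
            left = (op, left, right)
        return left

    tree = expr(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse(["id1", "+", "c", "*", "id2"]))   # ('+', 'id1', ('*', 'c', 'id2'))
```

Of the two trees for id1 + c * id2, only the one with * applied first is produced.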

3. The left recursion problem

3.1 Definition of left recursion: if there is a derivation A ⇒+ αAβ (in one or more steps), the grammar G is said to be recursive; when α = ε, it is said to be left recursive.
Supplement: if the derivation A ⇒+ αAβ needs at least two steps, G is said to be indirectly recursive, and when α = ε, indirectly left recursive. If G contains a production of the form A → αAβ, G is said to be directly recursive, and when α = ε, directly left recursive.

3.2 Why left recursion is a problem: left recursion arises when the first symbol on the right-hand side of a production is the same nonterminal as the one on its left-hand side. A top-down parser that expands such a nonterminal arrives at the same nonterminal again without consuming any input.

3.3 Example of direct left recursion:
Example: infinite derivation caused by left recursion, for the grammar

  G:    E → E + T
        E → T
        T → F
        T → T * F
        F → (E)
        F → id

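Consider constructing a leftmost derivation for the input string id+id*id.
When the derivation reaches E or T and the parser chooses E → E + T (or T → T * F), the leftmost symbol is again E (or T), so the derivation E ⇒ E + T ⇒ E + T + T ⇒ … can continue forever without consuming any input. This is the infinite derivation caused by left recursion.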
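The infinite derivation shows up directly if E → E + T is translated naively into a recursive procedure. In this sketch `parse_T` is a hypothetical stand-in that accepts only a single id; the point is that `parse_E` calls itself at the same input position before reading any token, so Python eventually stops it with a `RecursionError`.

```python
def parse_T(tokens, pos):
    # Simplified stand-in for T (hypothetical): accepts a single "id".
    if pos < len(tokens) and tokens[pos] == "id":
        return pos + 1
    raise SyntaxError(f"expected id at position {pos}")

def parse_E(tokens, pos):
    # Candidate E → E + T, tried first. To match it, the procedure must
    # first match an E starting at the same position, so it calls itself
    # again before consuming any token; the lines below are never reached.
    end = parse_E(tokens, pos)
    if end < len(tokens) and tokens[end] == "+":
        return parse_T(tokens, end + 1)
    return end

def recurses_forever(tokens):
    try:
        parse_E(tokens, 0)
    except RecursionError:      # Python gives up once its recursion limit is hit
        return True
    return False

print(recurses_forever(["id", "+", "id", "*", "id"]))   # True
```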

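3.4 Example of indirect left recursion: in the grammar below no single production is directly left recursive, yet S ⇒ Ac ⇒ Bbc ⇒ Sabc, so S derives a sentential form that begins with S: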

     S → Ac | c
     A → Bb | b
     B → Sa | a
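Indirect left recursion can be detected by following, for each variable, the variables that can appear first on its right-hand sides, and checking whether the chain returns to the starting variable. This is a minimal sketch, assuming the grammar has no ε-productions (so only the first symbol of each candidate matters); the function name and the dictionary representation (variable → list of candidate tuples) are assumptions for illustration.

```python
def left_recursive_vars(grammar):
    """Return the variables A with A ⇒+ Aβ, assuming no ε-productions."""
    # first_vars[A] = variables that can appear first on a right-hand side of A
    first_vars = {a: {alt[0] for alt in alts if alt and alt[0] in grammar}
                  for a, alts in grammar.items()}
    result = set()
    for start in grammar:
        stack, seen = list(first_vars[start]), set()
        while stack:
            b = stack.pop()
            if b == start:              # a chain of leftmost variables returns to start
                result.add(start)
                break
            if b not in seen:
                seen.add(b)
                stack.extend(first_vars[b])
    return result

grammar = {
    "S": [("A", "c"), ("c",)],
    "A": [("B", "b"), ("b",)],
    "B": [("S", "a"), ("a",)],
}
print(sorted(left_recursive_vars(grammar)))   # ['A', 'B', 'S']
```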

4. The solution to the left recursion problem

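4.1 Eliminating direct left recursion:

Step 1: Eliminate direct left recursion by converting it into right recursion.
Step 2: Introduce a new variable A' and replace the left-recursive productions A → Aα | β with A → βA', A' → αA' | ε. More generally, A → Aα1 | … | Aαm | β1 | … | βn (where no βi begins with A) becomes A → β1A' | … | βnA' and A' → α1A' | … | αmA' | ε.
Applying this to the example in 3.3 gives:

  E  → TE'
  E' → +TE' | ε
  T  → FT'
  T' → *FT' | ε
  F  → (E) | id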
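After the transformation, each variable can be turned into a recursive procedure that always consumes input before recursing, so the parse of id+id*id now terminates. A sketch under assumptions: the class `Parser` and the function `accepts` are hypothetical names, and ε-alternatives are chosen simply by checking whether the next token starts the other candidate.

```python
class Parser:
    """Recursive-descent recognizer for E → TE', E' → +TE' | ε,
    T → FT', T' → *FT' | ε, F → (E) | id."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {self.pos}, got {self.peek()!r}")
        self.pos += 1

    def E(self):            # E → T E'
        self.T()
        self.E_prime()

    def E_prime(self):      # E' → + T E' | ε
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()

    def T(self):            # T → F T'
        self.F()
        self.T_prime()

    def T_prime(self):      # T' → * F T' | ε
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):            # F → ( E ) | id
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.pos == len(tokens)    # the whole input must be consumed
    except SyntaxError:
        return False

print(accepts(["id", "+", "id", "*", "id"]))   # True
```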
4.2 Eliminating indirect left recursion:

The basic idea of eliminating indirect left recursion is: (1) number the grammar variables; (2) use substitution to turn indirect left recursion into direct left recursion; (3) eliminate the direct left recursion with the method above.
Steps:

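1. Order (number) all grammar variables of G; after ordering, denote them A1, A2, …, An;
2. for i ← 1 to n {
3.     for j ← 1 to i-1 {
4.         replace each production of the form Ai → Ajβ, where Aj → α1 | α2 | … | αk are
           all the current Aj productions, by the productions Ai → α1β | α2β | … | αkβ
5.     }
6.     eliminate all direct left recursion among the Ai productions
7. }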

Analysis of the pseudocode: when i = 1, the loop body in lines 3-5 is not executed; only line 6 runs, which eliminates all direct left recursion of A1.
Afterwards, every A1 production of the form A1 → Abα has b > 1.

Now consider i = 2. The A2 productions whose right-hand side begins with a variable may have the form A2 → A1α | A2β | A3γ | ….
We know that the A1 productions beginning with a variable currently have the form A1 → A2α' | A3β' | ….
After the substitution in line 4, every A2 production of the form A2 → Abα has b ≥ 2.
After line 6 eliminates direct left recursion, every A2 production of the form A2 → Abα has b > 2.

For i = 3, the A3 productions beginning with a variable may have the form A3 → A1α | A2β | A3γ | ….
The A1 productions beginning with a variable currently have the form A1 → A2α' | A3β' | …,
and the A2 productions beginning with a variable have the form A2 → A3α' | A4β' | ….
After the substitution in line 4, every A3 production of the form A3 → Abα has b ≥ 3.
After line 6 eliminates direct left recursion, every A3 production of the form A3 → Abα has b > 3.

By analogy, for i = n, after the substitution in line 4 every An production of the form An → Abα has b ≥ n.
After line 6 eliminates direct left recursion, every An production of the form An → Abα has b > n. In general, once step i is finished, every Ai production that begins with a variable Ab has b > i, so a chain of leftmost variables can never return to a variable with a smaller or equal index. In other words, left recursion has been eliminated.

The detailed explanation of the elimination steps of the above indirect left recursion example is:

     S → Ac | c
     A → Bb | b
     B → Sa | a

Order the grammar variables as B, A, S (so A1 = B, A2 = A, A3 = S).
i = 1: B → Sa | a has no direct left recursion;
i = 2: A → Bb | b; substituting B gives A → Sab | ab | b;
i = 3: S → Ac | c; substituting A gives S → Sabc | abc | bc | c;
finally, eliminating the direct left recursion of S gives
S  → abcS' | bcS' | cS'
S' → abcS' | ε
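The pseudocode in 4.2 can be sketched in Python as follows. Assumptions: a grammar is a dictionary from a variable to a list of candidate tuples, `()` stands for ε, the new variable for A is named A' (assumed not to be in use already), and, as is usual for this algorithm, the input grammar has no ε-productions and no cycles A ⇒+ A. The function names are hypothetical. Running it on the example with the order B, A, S reproduces the result above.

```python
from typing import Dict, List, Tuple

Alt = Tuple[str, ...]              # one candidate (right-hand side); () stands for ε
Grammar = Dict[str, List[Alt]]

def eliminate_direct(g: Grammar, a: str) -> None:
    """A → Aα1|…|Aαm|β1|…|βn  becomes  A → β1A'|…|βnA',  A' → α1A'|…|αmA'|ε."""
    alphas = [alt[1:] for alt in g[a] if alt and alt[0] == a]
    betas = [alt for alt in g[a] if not alt or alt[0] != a]
    if not alphas:
        return                                   # A has no direct left recursion
    a_new = a + "'"                              # assumed to be an unused name
    g[a] = [beta + (a_new,) for beta in betas]
    g[a_new] = [alpha + (a_new,) for alpha in alphas] + [()]

def eliminate_left_recursion(g: Grammar, order: List[str]) -> Grammar:
    g = {k: list(v) for k, v in g.items()}       # work on a copy
    for i, ai in enumerate(order):               # line 2 of the pseudocode
        for aj in order[:i]:                     # line 3
            new_alts: List[Alt] = []
            for alt in g[ai]:
                if alt and alt[0] == aj:         # line 4: Ai → Ajβ
                    new_alts.extend(delta + alt[1:] for delta in g[aj])
                else:
                    new_alts.append(alt)
            g[ai] = new_alts
        eliminate_direct(g, ai)                  # line 6
    return g

grammar = {"S": [("A", "c"), ("c",)],
           "A": [("B", "b"), ("b",)],
           "B": [("S", "a"), ("a",)]}
result = eliminate_left_recursion(grammar, ["B", "A", "S"])
for a in ["S", "S'", "A", "B"]:
    print(a, "→", " | ".join("".join(alt) or "ε" for alt in result[a]))
# S → abcS' | bcS' | cS'
# S' → abcS' | ε
# A → Sab | ab | b
# B → Sa | a
```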

5. The backtracking problem

5.1 Definition of backtracking: each right-hand side of the productions of a grammar variable A is called a candidate of A. If several candidates of A share a common prefix, the top-down parser cannot choose the correct production from the current input symbol alone and has to try them one by one. When a trial fails, the parser must go back to the previous derivation step and try another candidate of A. This is backtracking.
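Backtracking can be made visible with a small generic top-down parser that tries the candidates of each variable in order and counts how often it has to fall back to the next candidate. This sketch reuses the grammar S → xAy, A → ** | * from section 1, whose two A-candidates share the prefix *. The names `parse`, `parse_seq` and `recognise` are hypothetical, and like any naive top-down parser it loops on left-recursive grammars.

```python
GRAMMAR = {
    "S": [["x", "A", "y"]],
    "A": [["*", "*"], ["*"]],        # two candidates sharing the prefix "*"
}

def parse(symbol, tokens, pos, stats):
    """Yield every input position at which `symbol` can end, starting at pos."""
    if symbol not in GRAMMAR:                       # terminal symbol
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for k, rhs in enumerate(GRAMMAR[symbol]):
        if k > 0:                                   # previous candidate exhausted: back up
            stats["backtracks"] += 1
        yield from parse_seq(rhs, tokens, pos, stats)

def parse_seq(symbols, tokens, pos, stats):
    """Yield every end position of the symbol sequence starting at pos."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], tokens, pos, stats):
        yield from parse_seq(symbols[1:], tokens, mid, stats)

def recognise(tokens):
    stats = {"backtracks": 0}
    ok = any(end == len(tokens) for end in parse("S", tokens, 0, stats))
    return ok, stats["backtracks"]

print(recognise(["x", "*", "y"]))        # (True, 1): A → ** fails, back up to A → *
print(recognise(["x", "*", "*", "y"]))   # (True, 0)
```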

5.2 Example of the backtracking problem:
Consider the following grammar G:

  G:    E → T
        E → E + T
        E → E - T
        T → F
        T → T * F
        T → T / F
        F → (E)
        F → id

Consider constructing a leftmost derivation for the input string id+id*id:
Because the analysis is top-down, the parser may first try E → T; when that fails, it has to backtrack and try the other candidates until it finds the production E → E + T.

6. Solving the backtracking problem

6.1 To reduce backtracking during derivation, the grammar is transformed by extracting left factors: productions A → αβ1 | αβ2 | … | αβn | γ, whose candidates share the common prefix α, are replaced by A → αA' | γ and A' → β1 | β2 | … | βn. Extracting left factors alone cannot completely avoid backtracking, however.
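One round of left factoring for a single variable can be sketched as below: candidates are grouped by their first symbol, and each group with two or more members is factored on its longest common prefix. The representation (tuples of symbols, `()` for ε) and the names `common_prefix` and `left_factor` are assumptions; a full transformation would repeat this until no two candidates share a prefix.

```python
def common_prefix(alts):
    """Longest common prefix of the candidate tuples in `alts`."""
    prefix = []
    for symbols in zip(*alts):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return tuple(prefix)

def left_factor(name, alts):
    """One round of left factoring for variable `name`; () stands for ε."""
    groups = {}
    for alt in alts:
        key = alt[0] if alt else None               # group candidates by first symbol
        groups.setdefault(key, []).append(alt)
    result = {name: []}
    counter = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[name].extend(group)              # nothing to factor
            continue
        prefix = common_prefix(group)
        counter += 1
        new_name = name + "'" * counter             # A', A'', ... (assumed unused)
        result[name].append(prefix + (new_name,))
        result[new_name] = [alt[len(prefix):] for alt in group]
    return result

print(left_factor("A", [("*", "*"), ("*",)]))
# {'A': [('*', "A'")], "A'": [('*',), ()]}   i.e. A → *A', A' → * | ε
```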


Summary

There are three grammatical requirements for top-down analysis:

  1. The grammar is unambiguous;
  2. It has no left recursion;
  3. For any grammar variable A, the first terminals derivable from its different candidates must be different.

  Link: Principles of Compilation - Definition and Classification of Grammar.
  Link: Compilation Principles - Conversion between Regular Grammars and Regular Expressions.
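Requirement 3 in the summary can be checked mechanically by computing, for every candidate, the set of terminals it can begin with (its FIRST set) and testing that the sets of one variable's candidates do not overlap. A sketch under assumptions: grammars use the same dictionary-of-tuples representation, `()` stands for ε, and the function names are hypothetical. It checks only requirement 3 as stated; a complete LL(1) test also has to consider FOLLOW sets when a candidate can derive ε.

```python
EPS = "ε"

def first_of_seq(symbols, grammar, first):
    """Terminals that can begin `symbols` (plus ε if the whole sequence can vanish)."""
    result = set()
    for s in symbols:
        if s not in grammar:                   # terminal symbol
            result.add(s)
            return result
        result |= first[s] - {EPS}
        if EPS not in first[s]:
            return result
    result.add(EPS)                            # every symbol can vanish (or symbols is empty)
    return result

def first_sets(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:                             # iterate until nothing new is added
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                f = first_of_seq(alt, grammar, first)
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
    return first

def candidates_disjoint(grammar):
    first = first_sets(grammar)
    for alts in grammar.values():
        seen = set()
        for alt in alts:
            f = first_of_seq(alt, grammar, first)
            if f & seen:
                return False
            seen |= f
    return True

expr = {"E": [("T", "E'")], "E'": [("+", "T", "E'"), ()],
        "T": [("F", "T'")], "T'": [("*", "F", "T'"), ()],
        "F": [("(", "E", ")"), ("id",)]}
print(candidates_disjoint(expr))                                              # True
print(candidates_disjoint({"S": [("x", "A", "y")], "A": [("*", "*"), ("*",)]}))  # False
```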


Origin: blog.csdn.net/weixin_43824348/article/details/111590542