Compiler - Context Free Grammar

A context-free grammar (CFG) is the formalism used for parsing (syntactic analysis).

The languages that finite automata can recognize are called regular languages.
The context-free languages include all regular languages.

For example, the following language is context-free but not regular:

L = \left \{ a^nb^n | n\geq 0\right \}

S\rightarrow aSb | \epsilon
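
As a minimal sketch of how the grammar S → aSb | ε describes this language, the recognizer below follows the two alternatives directly. The function name `matches_anbn` is made up for this illustration and is not part of the original post.

```python
def matches_anbn(text: str) -> bool:
    """Return True if text belongs to { a^n b^n | n >= 0 }."""

    def parse_s(start: int, end: int) -> bool:
        # S -> epsilon: an empty span is accepted.
        if start == end:
            return True
        # S -> a S b: strip one 'a' from the front and one 'b' from the back,
        # then the remaining middle part must again be derivable from S.
        if end - start >= 2 and text[start] == "a" and text[end - 1] == "b":
            return parse_s(start + 1, end - 1)
        return False

    return parse_s(0, len(text))


print(matches_anbn("aaabbb"))  # accepted: three a's followed by three b's
print(matches_anbn("aab"))     # rejected: the counts differ
```

The recursion needs to remember how many a's are still unmatched, which is exactly what a finite automaton with a fixed number of states cannot do.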

A grammar is written as a four-tuple G = (V, T, P, S):

V : set of variables(non-terminals)
T : set of terminals
P : set of productions
S : start variable

Note that V, T, P, and S only name the four components of a grammar.
For example, in the following three-line grammar, S, S1 and S2 still need to be replaced later, so they are variables (non-terminals) and belong to V. The a and b, such as the b at the end of the third line, are never replaced, so they are terminals and belong to T. There are three lines in total, so P consists of three production lines, each of which may list several alternatives separated by |. S is the start variable, where every derivation begins.
⚠️: The letters used in a grammar are not fixed; any letter can be used.
S \rightarrow S1S2
S1 \rightarrow aS1b | \epsilon
S2 \rightarrow bS2 | b
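
To make the four components concrete, here is one possible way to write this grammar down as data. It is only a sketch: the `Grammar` type, its field names, and the `is_well_formed` helper are assumptions made for this example, and the empty tuple is used to stand for ε.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    variables: frozenset    # V: symbols that still get replaced
    terminals: frozenset    # T: symbols that are never replaced
    productions: dict       # P: variable -> list of alternatives
    start: str              # S: where every derivation begins


G = Grammar(
    variables=frozenset({"S", "S1", "S2"}),
    terminals=frozenset({"a", "b"}),
    productions={
        "S": [("S1", "S2")],
        "S1": [("a", "S1", "b"), ()],   # () stands for epsilon
        "S2": [("b", "S2"), ("b",)],
    },
    start="S",
)


def is_well_formed(g: Grammar) -> bool:
    """Check that the four components are consistent with each other."""
    symbols = g.variables | g.terminals
    return (
        g.start in g.variables
        and g.variables.isdisjoint(g.terminals)
        and all(lhs in g.variables for lhs in g.productions)
        and all(
            sym in symbols
            for alternatives in g.productions.values()
            for alt in alternatives
            for sym in alt
        )
    )


print(is_well_formed(G))
```

Written this way, the three lines of the grammar are the three keys of `productions`; counting each alternative separately gives five individual rules.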




Derivation Tree for aaabbb
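
The tree itself is not reproduced here. Assuming it refers to the first grammar S → aSb | ε (the three-line grammar cannot produce aaabbb, since S2 always contributes at least one extra b), the derivation it depicts can be written step by step, with the last step applying S → ε:

```latex
S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aaaSbbb \Rightarrow aaabbb
```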

 Example 2

L2 = \left \{a^nb^m | n\geq 0,m> n \right \}

Take the string aabbbb.
The first way of thinking is to treat aabb as one unit and append some extra b's (here bb) after it:
S \rightarrow S1S2
S1 \rightarrow aS1b | \epsilon
S2 \rightarrow bS2 | b

aabbbb

The second way of thinking is to treat the bb in the middle as the center of the whole string. If the numbers of a's and b's were equal, the string would consist only of matched a...b pairs. But when there are more b's than a's, the central part must consist only of b's.

S \rightarrow aSb | B
B \rightarrow bB | b
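
One way to convince yourself that the two grammars for L2 describe the same strings is to enumerate everything each one can derive up to a small length and compare. The sketch below does that with a breadth-first search over leftmost derivations. The names `generate`, `first` and `second` are made up for this example, and the search is only meant for small grammars like these two, where the number of variables in a sentential form stays bounded.

```python
from collections import deque


def generate(productions, start, max_len):
    """Collect every terminal string of length <= max_len derivable from start.

    Keys of `productions` are the variables; every other symbol is a terminal.
    """
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost variable in the current sentential form.
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:
            results.add("".join(form))  # only terminals left: a finished string
            continue
        for alt in productions[form[idx]]:
            new_form = form[:idx] + alt + form[idx + 1:]
            # Terminals never disappear, so a form with too many is a dead end.
            n_terminals = sum(1 for s in new_form if s not in productions)
            if n_terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results


# First idea: a^n b^n followed by at least one extra b. () stands for epsilon.
first = {
    "S": [("S1", "S2")],
    "S1": [("a", "S1", "b"), ()],
    "S2": [("b", "S2"), ("b",)],
}

# Second idea: a block of b's in the center, wrapped in a...b pairs.
second = {
    "S": [("a", "S", "b"), ("B",)],
    "B": [("b", "B"), ("b",)],
}

print(generate(first, "S", 6) == generate(second, "S", 6))
print("aabbbb" in generate(second, "S", 6))
```

Agreement up to length 6 is a sanity check, not a proof, but it catches most mistakes made while designing a grammar by hand.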

The difference between context-free and context-sensitive grammars:
in a context-free grammar the left-hand side of every production is a single variable, for example S \rightarrow aSbbbb. In a context-sensitive grammar the left-hand side may contain a variable together with surrounding terminals, for example aSb \rightarrow aSbbbb.
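
The distinction can be checked mechanically by looking only at the left-hand sides. In this small sketch a production is a pair (left-hand side, right-hand side) of symbol tuples; that representation and the name `is_context_free` are assumptions made for the example.

```python
def is_context_free(productions, variables):
    """True if every left-hand side consists of exactly one variable."""
    return all(len(lhs) == 1 and lhs[0] in variables for lhs, _rhs in productions)


variables = {"S"}

# S -> aSbbbb: a single variable on the left.
cf_rules = [(("S",), ("a", "S", "b", "b", "b", "b"))]

# aSb -> aSbbbb: the variable S may only be rewritten between an a and a b.
cs_rules = [(("a", "S", "b"), ("a", "S", "b", "b", "b", "b"))]

print(is_context_free(cf_rules, variables))
print(is_context_free(cs_rules, variables))
```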

∑ = {ident,num,if,else,+,-,……}

Generally speaking, in the lexical analysis stage a string is fed to the scanner, and the scanner decides whether each piece is a variable name, a number, a special character, an assignment symbol, and so on. In high-level languages, variable names form one category (identifier), and each such category is one symbol of the alphabet ∑ above. In other words, the scanner takes in a string and returns a sequence of categories. The valid sequences of categories form a context-free language, and a specific grammar definition is then used to judge whether a given sequence conforms to it. If it does not, a compilation error occurs.

x12 = abc + def ;
Looking at the categories, this is ident = ident + ident ; that is, ident = Expr ;
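
A rough sketch of what such a scanner could look like, built on Python's `re` module. The token patterns, the `KEYWORDS` set and the function name `scan` are assumptions chosen for this example; a real language would define its own.

```python
import re

# Each category with the pattern that recognizes it (illustrative choices).
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("ident", r"[A-Za-z_]\w*"),
    ("op", r"[+\-*/=;()]"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

KEYWORDS = {"if", "else"}


def scan(source):
    """Turn a source string into the list of categories the parser will see."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # whitespace carries no category
        if kind == "op" or text in KEYWORDS:
            tokens.append(text)  # operators and keywords are their own category
        else:
            tokens.append(kind)  # every identifier becomes "ident", every number "num"
    return tokens


print(scan("x12 = abc + def ;"))
# ['ident', '=', 'ident', '+', 'ident', ';']
```

Note that the parser never sees the names x12, abc or def, only their category.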

Assign → ident = Expr ;
Expr → ident + ident 

These two rules accept the line of code above, but they are not general: they only cover code of exactly this shape.

The rules above only handle addition, and Expr only supports two idents, so we modify them to be more general:
Assign → ident = Expr ;
Expr → Expr Op ident | ident

Op → + | - | * | /

Let's use an example to demonstrate the parse tree (syntax analysis tree).
⚠️: Later the parse tree will be converted into an AST (abstract syntax tree), and the AST will then be converted into an IR (intermediate representation).
x = a + b * c;

However, there is still a problem. With this grammar, the addition is performed first and the multiplication second, so the expression is grouped as (a + b) * c. This contradicts the usual rules of arithmetic, so the grammar needs to be improved.
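
The grouping can be reproduced with a few lines of code. This is a sketch of a parser for the rule Expr → Expr Op ident | ident only, with the left recursion unrolled into a loop. For readability it works on identifier names instead of the category ident, and the function name `parse_flat_expr` is made up for the example.

```python
OPS = {"+", "-", "*", "/"}


def parse_flat_expr(tokens):
    """Parse Expr -> Expr Op ident | ident into a nested (op, left, right) tuple.

    Every new `Op ident` wraps everything parsed so far, which is exactly
    what the left-recursive rule prescribes, regardless of which operator it is.
    """
    if not tokens or tokens[0] in OPS:
        raise SyntaxError("expected an identifier")
    tree = tokens[0]
    pos = 1
    while pos < len(tokens):
        if tokens[pos] not in OPS:
            raise SyntaxError(f"expected an operator, got {tokens[pos]!r}")
        if pos + 1 >= len(tokens) or tokens[pos + 1] in OPS:
            raise SyntaxError("expected an identifier after the operator")
        tree = (tokens[pos], tree, tokens[pos + 1])
        pos += 2
    return tree


print(parse_flat_expr(["a", "+", "b", "*", "c"]))
# ('*', ('+', 'a', 'b'), 'c')
```

The + ends up nested inside the *, so a + b would be computed before the multiplication.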

x = a + b * c;
In mathematics, a and b * c are called terms, and b and c within b * c are called factors.

Assign → ident = Expr ;
Expr → Expr + Term | Expr - Term | Term

Term → Term * Factor | Term / Factor | Factor

Factor → ident | num

 

The parse tree is evaluated from the bottom up. The multiplication * sits below the plus sign +, so it is performed first.

But what about x = (a + b) * c - d;?
Here, in mathematics, (a + b) is a factor, c is also a factor, and (a + b) * c is a term.
This leads to the final version:

Classic Expression Grammar:

Assign → ident = Expr ;
Expr → Expr + Term | Expr - Term | Term

Term → Term * Factor | Term / Factor | Factor

Factor → ident | num | ( Expr )
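
As an illustration, here is a small recursive-descent parser with one method per rule of this grammar. It is a sketch under a few assumptions: Factor is taken to cover an identifier, a number, or a parenthesized Expr; the left-recursive rules are implemented as loops; and the names `tokenize`, `Parser` and `parse` are invented for the example. The simple tokenizer silently skips characters it does not know.

```python
import re


def tokenize(source):
    """Split the source into identifiers, numbers and single-character symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()=;]", source)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def assign(self):
        # Assign -> ident = Expr ;
        target = self.take()
        if not re.fullmatch(r"[A-Za-z_]\w*", target):
            raise SyntaxError(f"expected an identifier, got {target!r}")
        self.take("=")
        value = self.expr()
        self.take(";")
        return ("=", target, value)

    def expr(self):
        # Expr -> Expr + Term | Expr - Term | Term
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        # Term -> Term * Factor | Term / Factor | Factor
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # Factor -> ident | num | ( Expr )
        token = self.take()
        if token == "(":
            node = self.expr()
            self.take(")")
            return node
        if re.fullmatch(r"\w+", token):
            return token
        raise SyntaxError(f"unexpected token {token!r}")


def parse(source):
    parser = Parser(tokenize(source))
    tree = parser.assign()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree


print(parse("x = a + b * c;"))
# ('=', 'x', ('+', 'a', ('*', 'b', 'c')))
print(parse("x = (a + b) * c - d;"))
# ('=', 'x', ('-', ('*', ('+', 'a', 'b'), 'c'), 'd'))
```

Because `expr` calls `term` and `term` calls `factor`, operators that bind tighter end up deeper in the tree, and parentheses restart the whole process at `expr`.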

 


Origin blog.csdn.net/weixin_43754049/article/details/126245606