2.2 Formal description of grammar
- A grammar consists of a number of productions, also called rewriting rules or simply rules.
- A production of a context-free grammar is written:
U ::= u or U → u
where:
U is a grammatical symbol called the left part or head
u is a finite string of grammatical symbols called the right part or body
The formal description of the grammar
Backus-Naur Form (BNF)
Non-terminal symbols are enclosed in angle brackets
Terminal symbols are underlined
The symbol ::= means "is defined as" (the left part derives the right part)
The symbol | means "or": it introduces an alternative right part
Context-free grammar
- It is a quadruple, consisting of:
  - a terminal set: the basic symbols of the language defined by the grammar, which appear only on the right side (body) of a production
  - a non-terminal set: the grammatical symbols that appear on the left side of a production
  - a production set
  - a start symbol, which can be specified in two ways: (1) stated explicitly, such as G[A]; (2) by convention, the grammatical symbol on the left of the first production
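The quadruple can be written down directly as a small data structure. The following Python sketch is only an illustration: the class name `Grammar`, its field names, and the `check` method are assumptions made for this example, not part of the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar as a quadruple (hypothetical representation)."""
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple  # pairs (head, body); body is a tuple of symbols
    start: str

    def check(self):
        """Return True if the four components are mutually consistent."""
        if self.terminals & self.nonterminals:
            return False  # a symbol cannot be both terminal and non-terminal
        if self.start not in self.nonterminals:
            return False
        symbols = self.terminals | self.nonterminals
        for head, body in self.productions:
            if head not in self.nonterminals:
                return False  # only non-terminals appear as a left part
            if any(sym not in symbols for sym in body):
                return False
        return True


# G[A]: A -> aAb | ab
G = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"A"}),
    productions=(("A", ("a", "A", "b")), ("A", ("a", "b"))),
    start="A",
)
```

Here the start symbol is given explicitly (the G[A] convention); it also happens to be the left part of the first production, so both conventions agree.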
example
Derivation
- Starting from the start symbol, repeatedly replace a non-terminal symbol with the right part of one of its productions.
- Derivation symbols:
- Language: the set of all terminal symbol strings that can be obtained by derivation from the start symbol.
- A language generated by a context-free grammar is called a context-free language.
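Derivation and the language it defines can be sketched as a search over sentential forms. The Python function below is a minimal sketch under stated assumptions: the name `language` is hypothetical, every symbol is a single character, the keys of `productions` are the non-terminals, and the grammar has no ε-productions (so a sentential form never shrinks and long forms can be discarded safely).

```python
from collections import deque


def language(productions, start, max_len):
    """Return the sentences of length <= max_len derivable from start.

    productions maps each non-terminal (one character) to its right parts.
    Assumes no ε-productions, so any form longer than max_len is discarded.
    """
    seen = {start}
    queue = deque([start])
    sentences = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, if any.
        pos = next((i for i, sym in enumerate(form) if sym in productions), None)
        if pos is None:
            sentences.add(form)  # only terminals left: this is a sentence
            continue
        for body in productions[form[pos]]:
            # One derivation step: replace the non-terminal by a right part.
            new_form = form[:pos] + body + form[pos + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sentences


print(sorted(language({"A": ["aAb", "ab"]}, "A", 6), key=len))
# ['ab', 'aabb', 'aaabbb']
```

Only the leftmost non-terminal is expanded at each step. For a context-free grammar this does not lose any sentence, since every derivable sentence has a leftmost derivation.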
String
- A finite sequence of symbols
- String representation:
- a^2 means aa
- a^2 b^2 means aabb
- Closure: a* means {ε, a, aa, aaa, aaaa, ...}
- Positive closure: a+ means {a, aa, aaa, aaaa, ...}
- Let A = {a, b, c}; then A* denotes {ε, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, ...}
- The length of a string: the number of symbols it contains
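The closure notation above can be checked by enumerating strings up to a length bound. This is a small Python sketch; the function name `closure_up_to` and its `positive` flag are assumptions for illustration, and the empty Python string stands for ε.

```python
from itertools import product


def closure_up_to(alphabet, max_len, positive=False):
    """List the strings of A* (or A+ if positive) of length <= max_len,
    shortest first. The empty string "" plays the role of ε."""
    shortest = 1 if positive else 0
    return [
        "".join(symbols)
        for n in range(shortest, max_len + 1)
        for symbols in product(alphabet, repeat=n)
    ]


print(closure_up_to("abc", 2))
# ['', 'a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
print("a" * 2 + "b" * 2)  # the power notation a^2 b^2
# aabb
```

The full closure is infinite, so a length bound is needed; the listing matches the order used in the definition of A* for A = {a, b, c}.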
Sentential forms and sentences
Grammar and Language
- The relationship between grammar and language
- A given grammar uniquely determines its language
- For a given language, a grammar can be constructed, but that grammar is not unique
- Equivalent grammars
- Let G1 and G2 be two grammars. If L(G1) = L(G2), then G1 and G2 are said to be equivalent grammars.
Example 1
(1) The language generated by the grammar A → aAb | ab is:
L(G[A]) = {a^n b^n | n ≥ 1}
(2) The language generated by the grammar G: S → xSx | xS | y is:
L(G[S]) = {x^m y x^n | m ≥ n ≥ 0}
(3) The language generated by the grammar G: S → xxS | y is:
L(G[S]) = {x^{2n} y | n ≥ 0}
which can also be written as L(G[S]) = (xx)*y
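For part (3), the derivation can be replayed step by step and the resulting sentence compared with the closed form (xx)*y. The helper name `derive` is an assumption for this sketch; the pattern is checked with Python's standard `re.fullmatch`.

```python
import re


def derive(n):
    """Apply S -> xxS n times, then S -> y; return every sentential form."""
    form = "S"
    steps = [form]
    for _ in range(n):
        form = form.replace("S", "xxS")  # one step with S -> xxS
        steps.append(form)
    form = form.replace("S", "y")        # final step with S -> y
    steps.append(form)
    return steps


print(" => ".join(derive(2)))
# S => xxS => xxxxS => xxxxy

# Every derived sentence matches (xx)*y; an odd number of x's does not.
print(all(re.fullmatch(r"(xx)*y", derive(n)[-1]) for n in range(5)))
# True
print(re.fullmatch(r"(xx)*y", "xxxy") is None)
# True
```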
Example 2
(1) A grammar for the language L = {a b^n a | n ≥ 1} can be:
A → aBa B → b | bB or
A → aB B → ba | bB or ...
(2) A grammar for the language L = {a^{2n} b | n ≥ 1} can be:
A → Bb B → aa | aaB or
A → aab | aaA or ...
(3) A grammar for the language L = {a^m b^n c^k | m = n or n = k} can be:
S → AB | DE A → aAb | ε B → cB | ε
D → aD | ε E → bEc | ε
Parse tree
- A graphical description of the derivation process
- The composition of a parse tree:
- The root node is the start symbol
- Leaf nodes are terminal symbols (tokens) or ε
- Internal nodes (non-leaf nodes) are non-terminals
- If the rule A → x1 x2 … xn is applied, then A is an internal node and x1, x2, …, xn are its child nodes
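The composition rules above map onto a simple tree data structure. In this Python sketch the names `Node` and `frontier` are hypothetical; reading the leaves from left to right recovers the derived string, with ε contributing nothing.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse tree node: a grammar symbol plus its ordered children."""
    symbol: str
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def frontier(node):
    """Concatenate the leaf symbols left to right; ε contributes nothing."""
    if node.is_leaf():
        return "" if node.symbol == "ε" else node.symbol
    return "".join(frontier(child) for child in node.children)


# Parse tree of aabb under A -> aAb | ab:
# the root applies A -> aAb, the inner node applies A -> ab.
inner = Node("A", [Node("a"), Node("b")])
root = Node("A", [Node("a"), inner, Node("b")])

print(frontier(root))
# aabb
```

Each internal node together with its children corresponds to one applied rule, so the tree records which productions were used without fixing the order in which they were applied.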
Comprehensive example
Consider the grammar: S → SS+ | SS* | a
1) Derive the string aa+a*
(hint: apply S → SS* first)
2) Construct a parse tree for this string
3) What is the language generated by this grammar?
Answer: postfix expressions over the operand a with the operators + and *
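Because the grammar S → SS+ | SS* | a describes postfix expressions, a parse tree can be built with a stack. The following Python sketch is one possible approach, not a method given in the notes; the function name `parse_postfix` and the nested-tuple tree representation are assumptions.

```python
def parse_postfix(text):
    """Build a parse tree for text under S -> SS+ | SS* | a.

    Returns a nested tuple whose first element is the non-terminal S,
    or None if text is not in the language.
    """
    stack = []
    for ch in text:
        if ch == "a":
            stack.append(("S", "a"))              # S -> a
        elif ch in "+*" and len(stack) >= 2:
            right = stack.pop()
            left = stack.pop()
            stack.append(("S", left, right, ch))  # S -> SS+ or S -> SS*
        else:
            return None
    # Exactly one complete tree must remain.
    return stack[0] if len(stack) == 1 else None


print(parse_postfix("aa+a*"))
# ('S', ('S', ('S', 'a'), ('S', 'a'), '+'), ('S', 'a'), '*')
print(parse_postfix("a+a"))
# None
```

The root of the tree for aa+a* uses S → SS*, which agrees with the hint to apply that production first.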
Ambiguity
If, for some grammar, more than one parse tree generates the same terminal symbol string, the grammar is ambiguous.
This course only discusses grammatical ambiguity, not semantic ambiguity.
Semantic ambiguity example: Jack said Tom left his assignment at home.
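Returning to grammatical ambiguity: it can be made concrete by counting parse trees. The grammar E → E+E | E*E | a used below does not appear in these notes; it is assumed here as a standard illustration of an ambiguous grammar, and the function name `count_trees` is likewise hypothetical.

```python
from functools import lru_cache


def count_trees(text):
    """Count the parse trees of text under E -> E+E | E*E | a
    (an illustrative ambiguous grammar, assumed for this sketch)."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving text[i:j] from E.
        if j - i == 1:
            return 1 if text[i] == "a" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if text[k] in "+*":
                # text[k] is the operator at the root of the tree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(text))


print(count_trees("a+a*a"))
# 2
print(count_trees("a"))
# 1
```

The string a+a*a has two parse trees, one with + at the root and one with * at the root, so this grammar is ambiguous.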
Unambiguous grammars can be constructed based on precedence and associativity
- The four arithmetic operations have two precedence levels, so two non-terminal symbols, expr and term, can be introduced to correspond to the two levels of abstraction
- The four arithmetic operations are left associative, so in each rule the more abstract non-terminal should be on the left (for example, expr → expr + term).
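The two-level idea can be sketched as a small parser. The grammar assumed here is expr → expr + term | expr - term | term and term → term * digit | term / digit | digit. Restricting operands to single digits and replacing the left recursion with a loop are simplifications made for this example, and all function names are hypothetical.

```python
def parse_term(tokens):
    """term -> term * digit | term / digit | digit (higher precedence)."""
    if not tokens or not tokens[0].isdigit():
        raise ValueError("digit expected")
    tree, rest = tokens[0], tokens[1:]
    while rest and rest[0] in "*/":
        if len(rest) < 2 or not rest[1].isdigit():
            raise ValueError("digit expected after operator")
        tree = (rest[0], tree, rest[1])  # fold to the left
        rest = rest[2:]
    return tree, rest


def parse_expr(tokens):
    """expr -> expr + term | expr - term | term (lower precedence)."""
    tree, rest = parse_term(tokens)
    while rest and rest[0] in "+-":
        op = rest[0]
        right, rest = parse_term(rest[1:])
        tree = (op, tree, right)         # fold to the left: left associative
    return tree, rest


def parse(text):
    """Parse a string of single-character tokens into a nested tuple."""
    tree, rest = parse_expr(text)
    if rest:
        raise ValueError("unexpected input: " + rest)
    return tree


print(parse("8-3-2"))
# ('-', ('-', '8', '3'), '2')
print(parse("1+2*3"))
# ('+', '1', ('*', '2', '3'))
```

In 8-3-2 the left subtraction is grouped first (left associativity), and in 1+2*3 the multiplication sits deeper in the tree than the addition (higher precedence), so each string has exactly one tree.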