Principles of Compilation: A Simple Syntax-Directed Translator

2.2 Formal description of grammar

  • A grammar consists of a number of productions, also called rewriting rules or simply rules.
  • A production of a context-free grammar is written:

U ::= u or U → u

where:
U is a grammar symbol (a nonterminal), called the left part or head of the production
u is a finite string of grammar symbols, called the right part or body of the production

Formal description of a grammar
Backus-Naur Form (BNF):
  • Nonterminal symbols are enclosed in angle brackets.
  • Terminal symbols are underlined.
  • The symbol ::= means "derives" (is defined as).
  • The symbol | means "can also derive": it separates alternative bodies for the same head.

Context-free grammar

  • It is a quadruple consisting of:
    • a set of terminals: the basic symbols of the language defined by the grammar, which appear only on the right side (body) of productions;
    • a set of nonterminals: the symbols that appear on the left side of productions;
    • a set of productions;
    • a start symbol, which can be identified in two ways: (1) stated explicitly, as in G[A]; (2) taken to be the symbol on the left of the first production.
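
The quadruple can be sketched as a small data structure. This is only an illustration: the names `Grammar` and `make_grammar` are hypothetical, and the sample grammar A → aAb | ab is borrowed from Example 1 further down. The sketch assumes that any symbol that never appears as a head is a terminal, and that the start symbol defaults to the head of the first production.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple  # ordered (head, body) pairs; body is a tuple of symbols
    start: str


def make_grammar(productions, start=None):
    """Build the quadruple from a list of (head, body) productions.

    If no start symbol is given, use the head of the first production.
    """
    productions = tuple((head, tuple(body)) for head, body in productions)
    nonterminals = frozenset(head for head, _ in productions)
    symbols = {sym for _, body in productions for sym in body}
    # Assumption: every symbol that is never a head is a terminal.
    terminals = frozenset(symbols - nonterminals)
    if start is None:
        start = productions[0][0]
    return Grammar(terminals, nonterminals, productions, start)


# A -> a A b | a b
g = make_grammar([("A", ["a", "A", "b"]), ("A", ["a", "b"])])
print(g.start, sorted(g.terminals), sorted(g.nonterminals))
```

Passing `start="A"` explicitly corresponds to writing G[A]; omitting it corresponds to the "first production" convention.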




Derivation

  • Starting from the start symbol, repeatedly replace a nonterminal with the body of one of its productions.
  • Derivation symbol: ⇒
  • Language: the set of all terminal strings that can be derived from the start symbol.
  • A language generated by a context-free grammar is called a context-free language.
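
Derivation as repeated replacement can be sketched in a few lines. The function `sentences` below is a hypothetical helper, not part of the course material: it always expands the leftmost nonterminal and collects every terminal string up to a length bound. It assumes single-character symbols and a grammar without ε-productions, so a sentential form never gets shorter and the bound guarantees termination.

```python
from collections import deque


def sentences(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    grammar maps each nonterminal to a list of bodies (strings of
    one-character symbols). Assumes there are no ε-productions.
    """
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost nonterminal in the sentential form.
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            found.add(form)  # only terminals left: this is a sentence
            continue
        for body in grammar[form[pos]]:
            new_form = form[:pos] + body + form[pos + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return found


g = {"A": ["aAb", "ab"]}  # A -> aAb | ab
print(sorted(sentences(g, "A", 6), key=len))  # ['ab', 'aabb', 'aaabbb']
```

Because the language is usually infinite, the bound `max_len` only yields a finite slice of it.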


String

  • A string is a finite sequence of symbols.
  • String notation:
    • a^2 means aa
    • a^2 b^2 means aabb
    • Closure: a* means {ε, a, aa, aaa, aaaa, ...}
    • Positive closure: a+ means {a, aa, aaa, aaaa, ...}
    • If A = {a, b, c}, then A* = {ε, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, ...}
  • The length of a string is the number of symbols it contains.
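
The notation above maps directly onto ordinary string operations. The following sketch lists a finite prefix of a closure; `closure_up_to` is a hypothetical helper name, and the empty Python string `""` stands for ε.

```python
from itertools import product


def closure_up_to(alphabet, max_len, positive=False):
    """List the strings of A* (or A+ if positive=True) of length <= max_len."""
    start = 1 if positive else 0
    result = []
    for n in range(start, max_len + 1):
        for symbols in product(alphabet, repeat=n):
            result.append("".join(symbols))
    return result


print("a" * 2 + "b" * 2)                     # a^2 b^2 -> aabb
print(closure_up_to("a", 3))                 # ['', 'a', 'aa', 'aaa']
print(closure_up_to("a", 3, positive=True))  # ['a', 'aa', 'aaa']
print(closure_up_to("abc", 2)[:6])           # ['', 'a', 'b', 'c', 'aa', 'ab']
print(len("aabb"))                           # length of the string: 4
```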

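Sentential forms and sentences

  • A sentential form is any string of grammar symbols (terminals and nonterminals) that can be derived from the start symbol.
  • A sentence is a sentential form that contains only terminals.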

Grammar and Language

  • The relationship between grammar and language
    • A grammar uniquely determines its language.
    • A given language can be described by a grammar, but that grammar is not unique.
  • Equivalent grammars
    • Two grammars G1 and G2 are equivalent if L(G1) = L(G2).

Example 1
(1) The language generated by the grammar A → aAb | ab is:
L(G[A]) = {a^n b^n | n ≥ 1}

(2) The language generated by the grammar S → xSx | xS | y is:
L(G[S]) = {x^m y x^n | m ≥ n ≥ 0}

(3) The language generated by the grammar S → xxS | y is:
L(G[S]) = {x^(2n) y | n ≥ 0}
which can also be written as L(G[S]) = (xx)*y
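
The three set descriptions in Example 1 can be turned into membership tests, which makes it easy to try individual strings. This is a sketch with hypothetical function names (`in_l1`, `in_l2`, `in_l3`); it tests the set descriptions themselves, not the grammars.

```python
import re


def in_l1(s):
    """Membership in {a^n b^n | n >= 1}."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def in_l2(s):
    """Membership in {x^m y x^n | m >= n >= 0}."""
    match = re.fullmatch(r"(x*)y(x*)", s)
    return match is not None and len(match.group(1)) >= len(match.group(2))


def in_l3(s):
    """Membership in {x^(2n) y | n >= 0}, i.e. (xx)*y."""
    return re.fullmatch(r"(xx)*y", s) is not None


print(in_l1("aabb"), in_l1("aab"))  # True False
print(in_l2("xxyx"), in_l2("xyxx"))  # True False
print(in_l3("xxy"), in_l3("xy"))    # True False
```

Only the third language can be written as a plain regular expression; the first two need a comparison of counts, which is why they are checked with extra code.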

Example 2
(1) A grammar for the language L = {a b^n a | n ≥ 1} can be:
A → aBa    B → b | bB
or
A → aB    B → ba | bB
or ...
(2) A grammar for the language L = {a^(2n) b | n ≥ 1} can be:
A → Bb    B → aa | aaB
or
A → aab | aaA
or ...
(3) A grammar for the language L = {a^m b^n c^k | m = n or n = k} can be:
S → AB | DE    A → aAb | ε    B → cB | ε
D → aD | ε    E → bEc | ε
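
That two grammars describe the same language can be spot-checked by comparing the strings they generate up to some length. The sketch below does this for the two grammars given in (1); `language_up_to` is a hypothetical helper that assumes single-character symbols and no ε-productions, so it does not apply to (3) as written. Agreement up to a bound is evidence, not a proof of equivalence.

```python
def language_up_to(grammar, start, max_len):
    """Terminal strings of length <= max_len derivable from start.

    grammar maps nonterminals to lists of bodies. Assumes no ε-productions,
    so sentential forms longer than max_len can be discarded.
    """
    found, seen, todo = set(), {start}, [start]
    while todo:
        form = todo.pop()
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            found.add(form)  # no nonterminal left: a sentence
            continue
        for body in grammar[form[pos]]:
            new_form = form[:pos] + body + form[pos + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                todo.append(new_form)
    return found


g1 = {"A": ["aBa"], "B": ["b", "bB"]}  # A -> aBa, B -> b | bB
g2 = {"A": ["aB"], "B": ["ba", "bB"]}  # A -> aB,  B -> ba | bB

print(sorted(language_up_to(g1, "A", 6), key=len))
print(language_up_to(g1, "A", 6) == language_up_to(g2, "A", 6))  # True
```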

Parse tree

  • A parse tree describes a derivation graphically.
  • Structure of a parse tree:
    • The root node is the start symbol.
    • Leaf nodes are terminals (tokens) or ε.
    • Internal (non-leaf) nodes are nonterminals.
    • If the rule A → x1 x2 … xn is applied, then A is an internal node and x1, x2, …, xn are its children, in order from left to right.
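
A parse tree can be sketched as nested tuples. The representation here is an assumption made for illustration: an internal node is `(nonterminal, [children])` and a leaf is a plain string. Reading the leaves from left to right gives the derived string, and every internal node together with its children must match one production. The grammar is A → aAb | ab from Example 1.

```python
def tree_yield(node):
    """Concatenate the leaves of a parse tree from left to right."""
    if isinstance(node, str):  # leaf: a terminal ("" would stand for ε)
        return node
    _head, children = node
    return "".join(tree_yield(child) for child in children)


def uses_only(node, productions):
    """Check that every internal node applies one of the given productions."""
    if isinstance(node, str):
        return True
    head, children = node
    # The body is the sequence of child labels, read left to right.
    body = tuple(c if isinstance(c, str) else c[0] for c in children)
    return (head, body) in productions and all(
        uses_only(c, productions) for c in children
    )


productions = {("A", ("a", "A", "b")), ("A", ("a", "b"))}
tree = ("A", ["a", ("A", ["a", "b"]), "b"])  # A => aAb => aabb

print(tree_yield(tree))              # aabb
print(uses_only(tree, productions))  # True
```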

Comprehensive example

Consider the grammar: S → SS+ | SS* | a
1) Derive the string aa+a* (apply S → SS* first):
S ⇒ SS* ⇒ SS+S* ⇒ aS+S* ⇒ aa+S* ⇒ aa+a*
2) Construct a parse tree for this string.
3) What language does this grammar generate?
The set of postfix expressions over the operand a with the operators + and *.
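
For question 2, one way to build the parse tree is the usual stack method for postfix expressions. This is a sketch, not the course's own solution: `parse_postfix` is a hypothetical name, and the tree uses the assumed representation `(nonterminal, [children])` with terminals as plain strings.

```python
def parse_postfix(text):
    """Build a parse tree for S -> S S + | S S * | a, or return None."""
    stack = []
    for ch in text:
        if ch == "a":
            stack.append(("S", ["a"]))  # S -> a
        elif ch in ("+", "*"):
            if len(stack) < 2:
                return None  # operator without two operands
            right = stack.pop()
            left = stack.pop()
            stack.append(("S", [left, right, ch]))  # S -> S S + or S -> S S *
        else:
            return None  # symbol outside the grammar's terminals
    # Exactly one tree must remain, otherwise operands were left over.
    return stack[0] if len(stack) == 1 else None


tree = parse_postfix("aa+a*")
print(tree[1][2])             # '*': the root applies S -> S S *
print(parse_postfix("a+"))    # None
```

The root of the resulting tree applies S → SS*, which matches the hint to apply that production first.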

Ambiguity

A grammar is ambiguous if some string of terminals has more than one parse tree.
This course discusses only syntactic ambiguity, not semantic ambiguity.
Example of semantic ambiguity: "Jack said Tom left his assignment at home." Here "his" may refer to either Jack or Tom.
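
Ambiguity can be made concrete by counting parse trees. The grammar E → E+E | E*E | a used below is an assumption chosen for illustration (it does not appear in the notes), and `count_trees` is a hypothetical helper. Any string with a count above 1 shows that the grammar is ambiguous.

```python
from functools import lru_cache


def count_trees(text):
    """Count parse trees of text under E -> E + E | E * E | a."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees in which E derives text[i:j].
        if j - i == 1:
            return 1 if text[i] == "a" else 0
        total = 0
        # Try every operator position k as the one applied at the root.
        for k in range(i + 1, j - 1):
            if text[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(text))


print(count_trees("a+a"))    # 1
print(count_trees("a+a*a"))  # 2: (a+a)*a and a+(a*a)
```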

Constructing unambiguous grammars based on precedence and associativity

  • The four arithmetic operations have two precedence levels, so two nonterminals, expr and term, can be introduced, one for each level.
  • The four arithmetic operations are left-associative, so in each recursive rule the nonterminal being defined (the higher-level one) appears on the left of the operator.
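
The effect of the two levels and of left recursion can be seen in a small evaluator. The grammar below is an assumption, since the notes only name expr and term: expr → expr + term | expr - term | term, term → term * factor | term / factor | factor, and a hypothetical factor → digit | ( expr ). Each left-recursive rule is implemented as a loop that folds operands from left to right.

```python
def evaluate(text):
    """Evaluate single-digit arithmetic with the assumed expr/term/factor grammar."""
    pos = 0

    def factor():
        # factor -> digit | ( expr )
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1
            value = expr()
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("expected ')'")
            pos += 1
            return value
        if pos < len(text) and text[pos].isdigit():
            value = int(text[pos])
            pos += 1
            return value
        raise ValueError(f"unexpected input at position {pos}")

    def term():
        # term -> term * factor | term / factor | factor
        nonlocal pos
        value = factor()
        while pos < len(text) and text[pos] in ("*", "/"):
            op = text[pos]
            pos += 1
            right = factor()
            value = value * right if op == "*" else value / right
        return value

    def expr():
        # expr -> expr + term | expr - term | term
        nonlocal pos
        value = term()
        while pos < len(text) and text[pos] in ("+", "-"):
            op = text[pos]
            pos += 1
            right = term()
            value = value + right if op == "+" else value - right
        return value

    result = expr()
    if pos != len(text):
        raise ValueError(f"unexpected input at position {pos}")
    return result


print(evaluate("9-5-2"))  # 2: grouped as (9-5)-2, left-associative
print(evaluate("2+3*4"))  # 14: * binds tighter than +
```

A right-associative reading of 9-5-2 would give 6, so the result 2 shows the left-to-right grouping.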

Origin blog.csdn.net/m0_71290816/article/details/129281172