Principles of Compilation: Definition and Classification of Grammars


Preface

A language is a tool that a group of people uses to communicate information. Communication rests on producing and understanding sentences according to commonly agreed rules of generation and interpretation. Computer languages have strict syntax and semantics, which makes them easy to formalize. Formalizing a programming language yields the following definitions:

Programming language: the collection of all statements from which programs are built.
Program: a sequence of statements that satisfies the grammar rules.
Sentence: a sequence of words that satisfies the grammar rules.
Token (word): a string that satisfies the lexical rules.

A grammar is the formal description of a language. It treats words and sentences separately:

Lexical rules (words)
Composition rules of words
Description methods: BNF, regular expressions

Syntax rules (sentences)
Composition rules of sentences
Description methods: BNF, syntax diagrams

I. Definition of grammar

Taking the assignment statement as an example, first define the following four components:
Non-terminal symbol set V =
{<assignment statement>, <left part quantity>, <right part expression>, <simple variable>, <subscript variable>, <operator>}
Terminal symbol set T =
{a, b, c, m[1], m[2], m[3], +, -}
Grammar rule set P =
{<assignment statement> → <left part quantity> = <right part expression>, ……}
Start symbol S = <assignment statement>

Generalizing from this example, a grammar G is formally defined as a quadruple (4-tuple):

G = (V, T, P, S)
V: the set of non-terminal symbols (variables).
Each non-terminal symbol is called a syntactic variable (component) and represents one kind of substructure of the language.

T: the set of terminal symbols.
These are the characters that appear in the sentences of the language; V ∩ T = ∅.

S: the start symbol, S ∈ V.
It represents the language defined by the grammar and must appear on the left side of at least one production.

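P: the set of productions.
Each production has the form α → β, where α and β are strings over V ∪ T and α is not empty; it states that an occurrence of α may be rewritten as β.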
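
The quadruple maps naturally onto a small data structure. The following is only a minimal sketch: the class name Grammar, the method check and the encoding of productions as pairs of symbol tuples are assumptions made for illustration, not part of the original text. It checks the conditions stated above (V ∩ T = ∅, S ∈ V, S appears on some left side).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A grammar G = (V, T, P, S); symbols are plain strings (illustrative encoding)."""
    V: frozenset   # non-terminal symbols
    T: frozenset   # terminal symbols
    P: tuple       # productions as (alpha, beta) pairs of symbol tuples
    S: str         # start symbol

    def check(self) -> None:
        """Raise ValueError if a basic condition on V, T, P or S fails."""
        if self.V & self.T:
            raise ValueError("V and T must be disjoint")
        if self.S not in self.V:
            raise ValueError("start symbol must be in V")
        symbols = self.V | self.T
        for alpha, beta in self.P:
            if not alpha:
                raise ValueError("left side must not be empty")
            if not ((set(alpha) | set(beta)) <= symbols):
                raise ValueError("unknown symbol in production")
        if not any(self.S in alpha for alpha, _ in self.P):
            raise ValueError("S must appear on the left side of a production")


# The grammar S -> 0 | 0S, written as a quadruple.
g = Grammar(
    V=frozenset({"S"}),
    T=frozenset({"0"}),
    P=((("S",), ("0",)), (("S",), ("0", "S"))),
    S="S",
)
g.check()  # passes silently when all conditions hold
```

Keeping symbols as strings and right-hand sides as tuples allows multi-character symbols such as m[1] or <assignment statement> without any ambiguity about where one symbol ends.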

II. Classification of grammars

According to the complexity of the language structure (which affects how complex the grammar is, which analysis methods can be used, and how much descriptive power the grammar has), grammars are divided into the following four types:
Type 0 grammar (phrase structure grammar)
Type 1 grammar (context-sensitive grammar)
Type 2 grammar (context-free grammar)
Type 3 grammar (regular grammar)

0. Phrase Structure Grammar (PSG)

If G satisfies the definition of a grammar, with no further restriction on its productions, then G is a type 0 grammar, i.e. a phrase structure grammar (PSG).

1. Context-sensitive grammar (CSG)

If |β| ≥ |α| holds for every production α → β ∈ P, then G is called a type 1 grammar, i.e. a context-sensitive grammar (CSG).

2. Context Free Grammar (CFG)

If both |β| ≥ |α| and α ∈ V hold for every production α → β ∈ P, then G is called a type 2 grammar, i.e. a context-free grammar (CFG). A CFG can describe most syntactic components of programming languages.

3. Regular Grammar (RG)

Suppose A, B ∈ V and a ∈ T+.
Right linear grammar: every production has the form A → aB or A → a
Left linear grammar: every production has the form A → Ba or A → a
Both are type 3 grammars, i.e. regular grammars (RG).
Left linear and right linear grammars are equivalent in the languages they can describe, but the direction in which sentences are recognized differs.
Regular grammars and regular expressions can be converted into each other.
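
To make the equivalence of the two linear forms concrete, the sketch below enumerates all sentences up to a length bound for one right linear and one left linear grammar that describe the same language (any number of 0s followed by a single 1). Both example grammars, the function name sentences and the convention that non-terminals are single uppercase letters are assumptions chosen for illustration.

```python
from collections import deque


def sentences(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    productions maps a non-terminal (one uppercase letter) to a list of
    right-hand sides. Assumes there are no epsilon productions, so sentential
    forms never shrink and anything longer than max_len can be discarded.
    """
    seen = {start}
    queue = deque([start])
    result = set()
    while queue:
        form = queue.popleft()
        # Locate the first non-terminal; a linear grammar has at most one.
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None:
            result.add(form)
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result


right_linear = {"S": ["0S", "1"]}                    # A -> aB or A -> a
left_linear = {"S": ["A1", "1"], "A": ["A0", "0"]}   # A -> Ba or A -> a

print(sorted(sentences(right_linear, "S", 4), key=len))  # ['1', '01', '001', '0001']
print(sentences(right_linear, "S", 4) == sentences(left_linear, "S", 4))  # True
```

In the right linear derivation S ⇒ 0S ⇒ 00S ⇒ 001 the sentence grows from left to right, while in the left linear derivation S ⇒ A1 ⇒ A01 ⇒ 001 the last character is produced first. That is the difference in direction mentioned above.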

III. Determine the category of the following grammars

G1: S → 0 | 1 | 00 | 11 (regular grammar)
G2: S → A | B | AA | BB, A → 0, B → 1 (context-free grammar)
G3: S → 0 | 1 | 0A | 1B, A → 0, B → 1 (regular grammar)
G4: S → A | B | BC, A → 0, B → 1, C → 21, C → 11, C → 2 (context-free grammar)
G5: S → 0 | 0S (regular grammar)
G6: S → ε | 0S (phrase structure grammar: S → ε violates |β| ≥ |α| as defined in the type 1 condition)
G7: S → ε | 00S111 (phrase structure grammar, for the same reason)
G8: S → aS | bS | cS | a | b | c (regular grammar)
G9: S → 0A | 1B | 2C | 0SA | 1SB | 2SC
0A → A0, 1A → A1, 2A → A2
0B → B0, 1B → B1, 2B → B2
0C → C0, 1C → C1, 2C → C2
(context-sensitive grammar)
G10: S → aT | bT | cT
T → ε | a | b | c | 0 | 1 | 2 | 3 | aT | bT | cT | 0T | 1T | 2T | 3T (phrase structure grammar: T → ε violates |β| ≥ |α|)
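
The classifications above can be checked mechanically by testing the restrictions from the strictest type downwards. The following is a sketch under stated assumptions, not a general tool: symbols are single characters, non-terminals are uppercase letters, the empty string stands for ε, and the names classify and is_nonterminal are made up for illustration. It applies the definitions exactly as given in section two, so any ε production pushes a grammar down to type 0.

```python
def is_nonterminal(symbol):
    # Assumption of this sketch: non-terminals are single uppercase letters.
    return symbol.isupper()


def classify(productions):
    """Return the highest type number (3, 2, 1 or 0) whose restrictions
    every production (alpha, beta) satisfies. "" stands for epsilon."""
    def terminals_only(s):
        return len(s) > 0 and not any(is_nonterminal(c) for c in s)

    def right_linear(beta):   # A -> aB or A -> a, with a in T+
        return terminals_only(beta) or (
            len(beta) >= 2 and is_nonterminal(beta[-1]) and terminals_only(beta[:-1]))

    def left_linear(beta):    # A -> Ba or A -> a, with a in T+
        return terminals_only(beta) or (
            len(beta) >= 2 and is_nonterminal(beta[0]) and terminals_only(beta[1:]))

    non_contracting = all(len(b) >= len(a) for a, b in productions)
    single_variable = all(len(a) == 1 and is_nonterminal(a) for a, _ in productions)

    if single_variable and (all(right_linear(b) for _, b in productions)
                            or all(left_linear(b) for _, b in productions)):
        return 3
    if single_variable and non_contracting:
        return 2
    if non_contracting:
        return 1
    return 0


g2 = [("S", "A"), ("S", "B"), ("S", "AA"), ("S", "BB"), ("A", "0"), ("B", "1")]
g5 = [("S", "0"), ("S", "0S")]
g6 = [("S", ""), ("S", "0S")]
# G9: S -> 0A | 1B | 2C | 0SA | 1SB | 2SC plus the nine swap rules dX -> Xd.
g9 = ([("S", d + x) for d, x in zip("012", "ABC")]
      + [("S", d + "S" + x) for d, x in zip("012", "ABC")]
      + [(d + x, x + d) for x in "ABC" for d in "012"])

print(classify(g2), classify(g5), classify(g6), classify(g9))  # 2 3 0 1
```

Note that the function classifies the grammar, not the language: G6 comes out as type 0 because of S → ε, even though the language it generates could also be described by a regular grammar.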

Summary

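Let G = (V, T, P, S) be a grammar and α → β any production in P.

  • No further restriction: G is a type 0 grammar and L(G) is a type 0 language;
  • |α| ≤ |β| for every production: G is a type 1 grammar and L(G) is a type 1 language (some treatments additionally allow S → ε as an exception);
  • in addition, α ∈ V: G is a type 2 grammar and L(G) is a type 2 language;
  • every production is A → aB or A → a: G is a right linear grammar and L(G) is a type 3 language;
    every production is A → Ba or A → a: G is a left linear grammar and L(G) is a type 3 language.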

Each type is obtained from the previous one by placing further restrictions on the productions.

The four types therefore form a level-by-level containment hierarchy: every type 3 grammar is also a type 2 grammar, every type 2 grammar is also a type 1 grammar, and every type 1 grammar is also a type 0 grammar.

Origin blog.csdn.net/weixin_43824348/article/details/111486412