# Compiler Principles: Grammars and Languages (Part 2)

This is the second part of my compiler theory notes. References: the classroom courseware and lectures of Shao Bing (Beihang University School of Software); Zhang Li, "Compiler Theory and Compiler Construction"; "Compiler Theory: Study Guide and Solutions to Typical Problems" (Defense Industry Publishing House); AlvinZH's study notes; and my own understanding.

The current version covers the full content. A condensed version and review notes will follow.

If you find an error or omission, please point it out in the comments or contact me on QQ: 847590417

Contents

Chapter focus

2.1 Basics of formal languages

2.2 Informal discussion of grammars

2.3 Formal definitions of grammars and languages

2.4 Syntax trees and ambiguous grammars

2.5 Sentence analysis

2.6 Practical restrictions on grammars

2.7 Other grammar notations

2.8 Classification of grammars and languages

Knowledge points from the exercises

 

Chapter focus

Key topics: symbol strings, operations on sets of symbol strings, grammars, languages, recursion, phrases, handles, syntax trees, ambiguous grammars, practical restrictions on grammars, BNF notation, syntax diagrams, and the classification of grammars.

 

2.1 Basics of formal languages

I. Alphabets and symbol strings

Alphabet: a nonempty finite set of symbols.

Symbol: an element of an alphabet.

Symbol string: a finite sequence of symbols formed by concatenation.

Empty symbol string: the symbol string that contains no symbols, written ε.

Formal definition of symbol strings:

Given an alphabet P:

1. The empty symbol string ε is a symbol string over P.

2. If x is a symbol string over P and a is an element of P, then ax and xa are symbol strings over P. (The symbol a may be attached on the left or on the right, but only one symbol is added per step. A single symbol is itself a symbol string, namely that symbol concatenated with ε.)

3. y is a symbol string over P if and only if (iff) y can be obtained by rules 1 and 2.

 

II. Operations on symbol strings and sets of symbol strings

1. Equality of symbol strings: for two symbol strings x and y, x = y iff they consist of the same symbols in the same order.

2. Length of a symbol string: the length of x, written |x|, is the number of symbols in x. (The empty symbol string has length 0.)

3. Concatenation of symbol strings: if x and y are symbol strings over P, their concatenation xy is also a symbol string over P. In general xy is not equal to yx. Concatenating the empty symbol string on the left or on the right leaves a string unchanged: εx = xε = x.

4. Powers of a symbol string: a power of x is x concatenated with itself. x^0 = ε, and x^n is x repeated n times, so x^n = x x^(n-1).

5. Product of sets of symbol strings: if A and B are sets of symbol strings, the product AB is the set of all strings formed by concatenating a string from A with a string from B:

AB = { xy | x ∈ A, y ∈ B }

6. Powers of a set of symbol strings: for a set of symbol strings A, A^0 = {ε} is the set containing only the empty symbol string, and A^n is A concatenated with itself n times. It can be computed from the previous power: A^n = A^(n-1) A.

7. Closure of a set of symbol strings: let A be a set of symbol strings. Then:

Positive closure of A:

A+ = A^1 ∪ A^2 ∪ ... ∪ A^n ∪ ...

Closure of A:

A* = A^0 ∪ A+ = {ε} ∪ A^1 ∪ A^2 ∪ ...

The closure is the positive closure plus the empty symbol string.
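The operations above map almost directly onto Python strings and sets. The following is a minimal sketch, not part of the original notes: the function names are my own, ε is modelled as the empty Python string, and because A+ and A* are infinite whenever A contains a nonempty string, `closure_up_to` only builds a finite approximation up to a chosen power.

```python
from itertools import product

EPSILON = ""  # the empty symbol string ε


def power(x: str, n: int) -> str:
    """x^n: x concatenated with itself n times; x^0 is ε."""
    return x * n


def set_product(a: set[str], b: set[str]) -> set[str]:
    """AB = { xy | x in A, y in B }."""
    return {x + y for x, y in product(a, b)}


def set_power(a: set[str], n: int) -> set[str]:
    """A^0 = {ε}; A^n = A^(n-1) A."""
    result = {EPSILON}
    for _ in range(n):
        result = set_product(result, a)
    return result


def closure_up_to(a: set[str], max_power: int, positive: bool = False) -> set[str]:
    """Finite approximation of A* (or of A+): the union of A^k up to max_power."""
    first = 1 if positive else 0
    result: set[str] = set()
    for k in range(first, max_power + 1):
        result |= set_power(a, k)
    return result


print(sorted(set_product({"a", "b"}, {"c", "d"})))  # ['ac', 'ad', 'bc', 'bd']
print(sorted(closure_up_to({"a"}, 2)))               # ['', 'a', 'aa']
```

Passing `positive=True` starts the union at A^1, so for a set without ε the result differs from the closure only by the missing empty string.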

 

How a program is built from symbols:

Let A be the basic character set of the language.

Let B be the set of words of the language.

Then B ⊆ A*, because every word is built entirely from symbols in A.

A sentence of the language (that is, a statement) is a symbol string over B.

Let C be the set of sentences of the language. Then C ⊆ B*, and a program, being a sequence of sentences, belongs to C*.

 

2.2 Informal discussion of grammars

Grammar: the definition and description of the structure of a language. It is the set of formal rules that describe how the language is built, also called syntax.

Grammar rules: a set of rules that describe the grammatical structure of sentences. By convention the symbol "::=" (or "→") stands for "consists of" or "is defined as". The basic structure of a sentence can be described with rules such as the following:

[Figure: grammar rules for a simple sentence, starting from <Sentence>]

Deriving sentences with the rules: once we have the rules, we can use them to derive (generate) sentences. Start from the symbol to be recognized (the start symbol) and repeatedly replace the left part of a rule with its right part, working from left to right and applying one rule per derivation step.

Example:

[Figure: step-by-step derivation of an example sentence]

The derivation continues until every nonterminal symbol has been replaced by terminal symbols.

This kind of derivation is called a leftmost derivation. There is also the rightmost derivation:

[Figure: rightmost derivation example]

Intuitively, the rightmost derivation feels less natural than the leftmost one.

The complete derivation of a sentence from <Sentence> can be written as:

[Figure: complete derivation sequence starting from <Sentence>]

As the description above shows, a grammar only defines and describes the structure of sentences and says nothing about semantics. Formally, it can therefore generate a nonsensical sentence such as "peanuts eat peanuts".

Syntax tree: a tree structure that describes the syntax of a sentence:

[Figure: syntax tree of the example sentence]

2.3 Formal definitions of grammars and languages

2.3.1 Definition of a grammar

Definition: a grammar is a four-tuple G = (Vn, Vt, P, Z), where:

Vn: the set of nonterminal symbols (the nonterminal vocabulary).

Vt: the set of terminal symbols (the terminal vocabulary). V = Vn ∪ Vt is called the vocabulary of the grammar.

P: the set of productions, also called rules.

Z: the start symbol (also called the recognition symbol), with Z ∈ Vn.

Rule: an ordered pair (U, x), usually written U ::= x or U → x (::= is equivalent to →), where U ∈ Vn is a single symbol (|U| = 1) and x ∈ V* has length 0 or more.

For example, a grammar for unsigned integers:

G[<unsigned integer>] = (Vn, Vt, P, Z)

Vn = {<unsigned integer>, <digit string>, <digit>} (the symbols that appear on the left side of rules and can be expanded further)

Vt = {0, 1, 2, 3, ..., 9} (the remaining symbols, which cannot be expanded further)

P = {<unsigned integer> → <digit string>

<digit string> → <digit string> <digit>

<digit string> → <digit>

<digit> → 0

...

<digit> → 9} (the rules by which symbols are expanded)

Z = <unsigned integer> (the symbol from which expansion starts)

Nonterminals are usually enclosed in angle brackets to distinguish them from terminals, but the angle brackets are not mandatory.

 

The symbols on the left sides of the productions (the elements of P) make up the set Vn, and Z ∈ Vn.

Productions that share the same left part can be merged, with the alternatives separated by the "or" symbol |.

Written this way, the grammar is in BNF (Backus-Naur Form).

In practice, giving a grammar only requires giving its set of productions and naming the start symbol (by convention, the left part of the first rule).
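Since a grammar is fully determined by its productions and start symbol, it can be stored as plain data. The sketch below is one possible encoding of the unsigned-integer grammar, not a prescribed one: the dictionary layout and the names `GRAMMAR`, `leftmost_step`, and `derive` are my own, and the nonterminals for a run of digits and for a single digit are spelled `<digit string>` and `<digit>` here. It replays a leftmost derivation of the sentence 25.

```python
# Each nonterminal maps to its list of alternatives; an alternative is a list of symbols.
GRAMMAR = {
    "<unsigned integer>": [["<digit string>"]],
    "<digit string>": [["<digit string>", "<digit>"], ["<digit>"]],
    "<digit>": [[str(d)] for d in range(10)],
}
START = "<unsigned integer>"  # by convention, the left part of the first rule


def leftmost_step(form, rhs):
    """Rewrite the leftmost nonterminal of `form` using the alternative `rhs`."""
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            if rhs not in GRAMMAR[symbol]:
                raise ValueError(f"{symbol} ::= {' '.join(rhs)} is not a rule")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to rewrite")


def derive(choices):
    """Apply the chosen alternatives one by one, starting from the start symbol."""
    forms = [[START]]
    for rhs in choices:
        forms.append(leftmost_step(forms[-1], rhs))
    return forms


steps = derive([
    ["<digit string>"],
    ["<digit string>", "<digit>"],
    ["<digit>"],
    ["2"],
    ["5"],
])
for form in steps:
    print(" ".join(form))
```

Each printed line is one sentential form of the derivation; the last line contains only the terminals 2 and 5.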

Example grammar:

[Figure: an example grammar written in BNF]

The symbols ::=, |, < and > are called metasymbols (these are the basic metasymbols; 2.7 introduces the extended ones). A language made up of metasymbols is called a metalanguage, that is, a language used to describe another language.

 

2.3.2 Formal definition of derivation

Definition (direct derivation): in a grammar G, let v = xUy and w = xuy (a one-step derivation),

where x, y ∈ V* (x and y may contain terminals and nonterminals, or be empty), U ∈ Vn (a nonterminal), and u ∈ V*.

If (U ::= u) ∈ P, then v directly derives w in G, written v ⇒ w. Conversely, w directly reduces to v.

If x = y = ε, the rule U ::= u gives U ⇒ u directly: U directly derives u in G (the reference to G may be omitted).

Example:

[Figure: examples of direct derivation]

Definition (derivation in one or more steps): in a grammar G, let u0, u1, ..., un ∈ V+.

If v = u0 ⇒ u1 ⇒ u2 ⇒ ... ⇒ un = w, then v derives w in G, written v ⇒+ w. (The plus sign means that at least one direct derivation step is required.)

The sequence is called a derivation of length n: v derives w in n steps.

Example:

[Figure: example of a multi-step derivation]

Definition (derivation in zero or more steps): in a grammar G, let v, w ∈ V*.

If v ⇒+ w or v = w, then v ⇒* w. (Compared with ⇒+, the star also allows zero steps.)

 

Definition (canonical derivation): for a direct derivation xUy ⇒ xuy, if y ∈ Vt*, the step is a canonical derivation. (Everything to the right of the rewritten nonterminal either consists only of terminal symbols or is empty.)

Every sentence has a canonical derivation, but not every sentential form has one. A sentential form obtained by canonical derivation is called a canonical sentential form.

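Notation for the various kinds of derivation:

⇒ : direct derivation (exactly one step)

⇒+ : derivation in one or more steps

⇒* : derivation in zero or more steps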

Rightmost derivation: when a sentential form contains two or more nonterminals, always rewrite the rightmost one first (this is the canonical derivation). The leftmost derivation always rewrites the leftmost one first.

 

2.3.3 Formal definition of a language

Definition: for a grammar G[Z]:

(1) Sentential form: x is a sentential form ⇔ Z ⇒* x and x ∈ V* (it may contain both nonterminals and terminals).

(2) Sentence: x is a sentence ⇔ Z ⇒+ x and x ∈ Vt* (a sentence is a symbol string made up only of terminal symbols).

(3) Language: L(G[Z]) = {x | x ∈ Vt*, Z ⇒+ x}. The language consists of all sentences of the grammar.
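The definition of L(G[Z]) suggests a direct, if naive, way to list sentences: explore sentential forms breadth-first and keep those that contain only terminals. The sketch below assumes that no rule has an empty right part, so a sentential form never shrinks and forms longer than the bound can be discarded. The grammar `G` (Z ::= aZb | ab) is a hypothetical example of my own, not one from the notes.

```python
from collections import deque


def sentences(grammar, start, max_len):
    """Sentences of length <= max_len, assuming no rule has an empty right part."""
    found = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # position of the leftmost nonterminal, if any
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            found.add("".join(form))  # only terminals remain: this is a sentence
            continue
        for rhs in grammar[form[index]]:
            new_form = form[:index] + tuple(rhs) + form[index + 1:]
            # forms never shrink without empty right parts, so long ones can be dropped
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return found


G = {"Z": [["a", "Z", "b"], ["a", "b"]]}  # hypothetical grammar for a^n b^n, n >= 1
print(sorted(sentences(G, "Z", 6), key=len))  # ['ab', 'aabb', 'aaabbb']
```

Only leftmost derivations are explored, which is enough here because every sentence of a context-free grammar has a leftmost derivation.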

 

Given a grammar, its language can be obtained by derivation.

Given a language, there is no formal method for constructing a grammar for it. The relationship between grammars and languages is many-to-one.

Example:

[Figure: example of a grammar and the language it defines]

Definition: if two different grammars define the same language, they are equivalent grammars.

What compilation actually cares about is this: given a symbol string and a grammar, decide whether the string belongs to the language the grammar defines.

 

2.3.4 Recursive grammar

Endless possibilities

1. Recursive rule: a rule whose right part contains the same symbol as its left part.

For U ::= xUy: if x is the empty symbol string, i.e. U ::= Uy, the rule is left-recursive; if y is the empty symbol string, i.e. U ::= xU, it is right-recursive; if neither x nor y is empty, U ::= xUy is called self-embedding.

If a grammar contains at least one recursive rule, the grammar is directly recursive.

Indirect recursion arises from rules such as U ::= Vx, V ::= Uy | x: here U derives a string containing U itself, but only after more than one step.

2. Recursive grammar: in a grammar G, suppose there is a U ∈ Vn.

If U ⇒+ ...U..., then G is a recursive grammar (self-embedding recursion). If U ⇒+ U..., then G is a left-recursive grammar. If U ⇒+ ...U, then G is a right-recursive grammar.

Drawback of left recursion: a left-recursive grammar cannot be parsed with a top-down method, because the parser would loop forever.

Advantage of recursive grammars: a finite set of rules can define an infinite language. (The unsigned-integer grammar is left-recursive and defines all unsigned integers with 13 rules.)
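Both kinds of recursion can be checked mechanically. In the sketch below, `classify_rule` looks at a single rule, while `left_recursive_nonterminals` follows "first symbol" links through several rules to catch indirect left recursion. The function names and the dictionary encoding are my own, and the second function assumes that no rule has an empty right part.

```python
def classify_rule(lhs, rhs):
    """Classify one rule U ::= rhs by where U occurs in its right part."""
    if lhs not in rhs:
        return "not recursive"
    if rhs[0] == lhs:
        return "left recursive"   # U ::= Uy (checked first, so U ::= UyU lands here)
    if rhs[-1] == lhs:
        return "right recursive"  # U ::= xU
    return "self-embedding"       # U ::= xUy with x and y nonempty


def left_recursive_nonterminals(grammar):
    """Nonterminals U with U =>+ U..., assuming no rule has an empty right part."""
    # direct[U]: nonterminals that can be the first symbol after one step from U
    direct = {
        u: {rhs[0] for rhs in alts if rhs and rhs[0] in grammar}
        for u, alts in grammar.items()
    }
    result = set()
    for u in grammar:
        reached, stack = set(), list(direct[u])
        while stack:
            v = stack.pop()
            if v not in reached:
                reached.add(v)
                stack.extend(direct[v])
        if u in reached:
            result.add(u)
    return result


# The indirect-recursion example: U ::= Vx, V ::= Uy | x
INDIRECT = {"U": [["V", "x"]], "V": [["U", "y"], ["x"]]}
print(classify_rule("U", ["V", "x"]))                 # not recursive
print(sorted(left_recursive_nonterminals(INDIRECT)))  # ['U', 'V']
```

No single rule of the example is recursive on its own, yet both nonterminals are left-recursive once derivations of more than one step are considered.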

 

2.3.5 Phrases, simple phrases, and handles of a sentential form

Definition: phrases and simple phrases

Given a grammar G[Z], let w = xuy be a sentential form of the grammar (x and y may be empty; u is not empty).

If Z ⇒* xUy (where U is a nonterminal) and U ⇒+ u, then u is a phrase of the sentential form w relative to U (it is the symbol string derived from the nonterminal at that position in the sentential form).

(u ∈ V+: it may contain nonterminals and terminals but cannot be empty. A single symbol can also be a phrase.)

If U ⇒ u directly, then u is a simple phrase of w relative to U (also called a direct phrase).

Definition: the leftmost simple phrase of a sentential form is called its handle. The handle is very important in bottom-up syntax analysis.

To explain it another way:

Phrase: draw the sentential form as a syntax tree. For each non-leaf node, the symbol string formed by the leaves below it is a phrase (relative to the nonterminal at that node, from which it is derived).

Simple phrase: the leaves of a subtree whose root has only leaves as children, so that the symbol string comes from a single derivation step.

Handle: the leftmost simple phrase.

Phrases and simple phrases are defined relative to a sentential form. A sentential form may have several phrases and simple phrases, but (for an unambiguous grammar) only one handle.
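The tree-based reading of these definitions can be turned into a short traversal. In this sketch a non-leaf node is a pair `(symbol, children)` and a leaf is a plain string; that encoding, the function names, and the example tree (built with the common expression rules E ::= E + T | T, T ::= T * F | F, F ::= i, which the notes do not state) are all assumptions for illustration.

```python
def leaves(node):
    """Leaves below a node, read from left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [leaf for child in children for leaf in leaves(child)]


def phrases(node):
    """Yield (root symbol, phrase, is_simple) for each subtree, in preorder."""
    if isinstance(node, str):
        return
    symbol, children = node
    is_simple = all(isinstance(child, str) for child in children)
    yield symbol, " ".join(leaves(node)), is_simple
    for child in children:
        yield from phrases(child)


def handle(tree):
    """Leftmost simple phrase: the first simple subtree met in preorder."""
    return next(text for _, text, simple in phrases(tree) if simple)


# Syntax tree of the sentential form  T + T * i
TREE = ("E", [("E", ["T"]), "+", ("T", ["T", "*", ("F", ["i"])])])

for symbol, text, simple in phrases(TREE):
    kind = "simple phrase" if simple else "phrase"
    print(f"{text!r} is a {kind} relative to {symbol}")
print("handle:", handle(TREE))  # handle: T
```

Simple subtrees never overlap, so the first one reached in a left-to-right preorder walk is also the leftmost one in the sentential form.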

 

2.4 Syntax trees and ambiguous grammars

Tree: except for the root, every node has exactly one immediate predecessor, and a node may have n immediate successors.

Syntax tree: a graphical representation of the structure of a sentence. It is a directed graph made up of nodes and directed edges.

Each node is a symbol. The root is the start symbol, interior nodes are nonterminals, and leaves may be terminals or nonterminals. An edge between two nodes expresses a derivation relationship; by default, edges point from a parent toward its children.

Subtree: a node of the syntax tree together with everything it generates down to the leaves. That node is the root of the subtree.

Subtrees and phrases: the leaves of a subtree, read from left to right, form a symbol string. This string is a phrase of the sentential form relative to the root of the subtree (because it is derived from that root).

 

2.4.1 Derivations and syntax trees

Given a grammar G[Z] and a sentential form w, a derivation sequence Z ⇒* w can be established, and a syntax tree along with it: Z is the root, and each derivation step grows the tree.

Note: a sentential form of a grammar can be derived in different orders. The tree is then built up in a different order, but the final syntax tree has the same shape. Not all grammars have this property.

Three derivation orders are commonly used when building a syntax tree: general derivation (no fixed order); leftmost derivation, which always expands the leftmost nonterminal first; and rightmost derivation, which always expands the rightmost nonterminal first.

1. Building the syntax tree from a derivation:

Start from the start symbol and follow the derivation sequence step by step. Correspondingly, start from the root node and build the syntax tree from top to bottom.

2. Building the derivation from the syntax tree:

Prune the tree bottom-up, cutting off the leaves of one subtree at a time until only the root is left. Each cut is one reduction. Starting from the sentence and reducing step by step, working from left to right, we obtain the derivation sequence in reverse. Each step reduces the handle of the current sentential form.

Definition: a reduction applied to the handle of a sentential form is called a canonical reduction (leftmost reduction).

Definition: a sentential form obtained by canonical derivation or canonical reduction is called a canonical sentential form.

 

2.4.2 Ambiguous grammars

Definition: if some sentence of a grammar has two different syntax trees, the grammar is ambiguous; otherwise it is unambiguous.

Example:

[Figure: an example grammar and two derivations of the sentence i + i * i]

The two syntax trees are different:

[Figure: the two syntax trees for i + i * i]

Definition: if some sentence of a grammar has two different canonical derivations, the grammar is ambiguous.

Besides the top-down view above, ambiguity can also be seen bottom-up. In the example above, the sentential form E + E * i is obtained from i + i * i by two canonical reductions, yet it has two different handles, i and E + E (one for each of the two syntax trees). Ambiguity therefore means that the handle of a sentential form is not unique.

Definition: if the handle of some canonical sentential form of a grammar is not unique (there are two different canonical reductions), the grammar is ambiguous.
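The first definition can be tested directly on a small case by counting syntax trees. The sketch below assumes the grammar E ::= E + E | E * E | i, which is consistent with the sentential form E + E * i discussed above but is my own assumption about the example. The counting logic is written for this one grammar only: every tree for a longer string splits at some operator, so the number of trees is the sum over operators of (trees for the left part) × (trees for the right part).

```python
from functools import lru_cache


def count_trees(sentence: str) -> int:
    """Number of syntax trees for `sentence` under E ::= E + E | E * E | i."""
    tokens = tuple(sentence.replace(" ", ""))

    @lru_cache(maxsize=None)
    def count(lo: int, hi: int) -> int:
        """Number of trees that derive tokens[lo:hi] from E."""
        if hi - lo == 1:
            return 1 if tokens[lo] == "i" else 0  # E ::= i
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in "+*":                 # E ::= E + E  or  E ::= E * E
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_trees("i + i * i"))  # 2
print(count_trees("i"))          # 1
```

A count above 1 for any sentence shows that the grammar is ambiguous. A count of 1 for the sentences tried proves nothing in general, which matches the fact that ambiguity cannot be decided by a general algorithm.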

Ambiguity creates uncertainty at compile time, and the ambiguity of a grammar is undecidable: there is no algorithm that determines within a bounded number of steps whether an arbitrary grammar is ambiguous. The practical remedy is to impose restrictions, known as sufficient conditions for unambiguity: any grammar that satisfies them is guaranteed to be unambiguous.

Following this principle, ambiguity can be resolved in two ways:

1. Add conditions to the compilation algorithm without changing the grammar: for example, specify operator precedence so that higher-precedence operators are handled first, and specify a direction (associativity) for operators of equal precedence, so that every derivation proceeds in one uniform way.

2. Modify the grammar directly: rewrite the grammar rules so that they constrain the order of direct reductions.

 

2.5 Sentence analysis

Sentence analysis means: given a symbol string S ∈ Vt*, decide whether S is a sentence of the language defined by the grammar.

 

2.6 Practical restrictions on grammars

In a practical grammar, certain improper rules must not appear.

Harmful rules: for example U ::= U, which causes ambiguity.

Superfluous rules:

(1) Rules that are never used in the derivation of any sentence (the nonterminal on the left part of the rule does not appear in any sentential form).

(2) Rules that, once used in a derivation, can never lead to a terminal string (the rule contains a nonterminal from which no terminal string can be derived). For example, if the only rule for U is U ::= xUy, the rule is superfluous, because U can never derive a terminal string.

Reduced grammar: a grammar that contains no harmful or superfluous rules.

Checking for superfluous rules: every nonterminal U that appears on the left part of a rule must satisfy both of the following conditions:

1. Every nonterminal other than the start symbol must appear in some sentential form, i.e. Z ⇒* xUy.

2. Every nonterminal must be able to derive a terminal string, i.e. U ⇒+ t with t ∈ Vt*.
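The two conditions correspond to two simple fixed-point computations. This is a minimal sketch under my own encoding (a dictionary from nonterminal to alternatives, where any symbol that is not a key counts as a terminal); the example grammar `G` is hypothetical and was chosen so that U fails condition 2 and W fails condition 1.

```python
def productive(grammar):
    """Nonterminals that can derive a terminal string (condition 2)."""
    done = set()
    changed = True
    while changed:
        changed = False
        for u, alts in grammar.items():
            if u in done:
                continue
            # u qualifies if some alternative uses only terminals and qualified nonterminals
            if any(all(s not in grammar or s in done for s in rhs) for rhs in alts):
                done.add(u)
                changed = True
    return done


def reachable(grammar, start):
    """Nonterminals that occur in some string derived from the start symbol (condition 1)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for rhs in grammar[u]:
            for s in rhs:
                if s in grammar and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen


# Hypothetical grammar: U only has U ::= xUy, and W is never used.
G = {
    "Z": [["A", "b"], ["U"]],
    "A": [["a"]],
    "U": [["x", "U", "y"]],
    "W": [["a"]],
}
superfluous = set(G) - (productive(G) & reachable(G, "Z"))
print(sorted(superfluous))  # ['U', 'W']
```

A fuller reduction would first delete the rules that mention unproductive nonterminals and only then compute reachability; the sketch keeps the two checks separate to mirror the two conditions.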

 

2.7 Other grammar notations

1. Extended BNF notation (BNF: Backus Normal Form)

BNF metasymbols: <, >, ::=, |

Extended BNF metasymbols: <, >, ::=, |, {, }, [, ], (, )

{ }: braces enclose a symbol string t, with m and n written to the right of the closing brace. The symbol string t is repeated at least m and at most n times. m and n may be omitted; when both are omitted, t is repeated any number of times, including 0. The | symbol can also be used inside the braces to express alternatives.

[ ]: the enclosed symbol string is optional. This equals braces around the symbol string with m = 0 and n = 1.

( ): factors out a common symbol string, e.g. xy | xm | xn can be written as x(y | m | n).
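The extended metasymbols translate naturally into control flow when a rule is recognized by hand: [ ] becomes an `if`, and { } becomes a `while` loop. The sketch below uses a hypothetical rule of my own, `<integer> ::= [+|-]<digit>{<digit>}`, which does not appear in the notes.

```python
DIGITS = "0123456789"


def is_integer(text: str) -> bool:
    """Recognize the hypothetical rule <integer> ::= [+|-]<digit>{<digit>}."""
    pos = 0
    # [ + | - ] : optional part, consumed zero times or once
    if pos < len(text) and text[pos] in "+-":
        pos += 1
    # <digit> : required exactly once
    if pos >= len(text) or text[pos] not in DIGITS:
        return False
    pos += 1
    # { <digit> } : repeated zero or more times
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    # the whole input must have been consumed
    return pos == len(text)


print(is_integer("-42"))  # True
print(is_integer("+"))    # False
```

Written in plain BNF, the same language needs a recursive rule for the digit sequence; the braces replace that recursion with iteration.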

 

2. Syntax diagrams (graphical notation)

[Figure: example of a syntax diagram]

2.8 Classification of grammars and languages

Formal language: a language considered without its semantics, described by grammars and automata.

Language definition: L(G[Z]) = {x | x ∈ Vt*, Z ⇒+ x}.

Grammar definition: every grammar can be defined as a four-tuple (Vn, Vt, P, Z).

Classification of grammars and languages: type 0, type 1, type 2, and type 3 (the Chomsky hierarchy). The types differ in the restrictions they place on productions.

Type 0:

Rule P: u ::= v, where u ∈ V+ and must contain a nonterminal, and v ∈ V*.

L0: called a phrase structure grammar. Both the left and right parts are symbol strings; the left part must contain at least one symbol of Vn, and the right part may be empty. Its languages are accepted by Turing machines.

Type 1:

Rule P: xUy ::= xuy, where U ∈ Vn, x, y ∈ V*, and u ∈ V+.

L1: called a context-sensitive grammar. U can be rewritten as u only in the context of x and y. Its languages are accepted by linear bounded automata, a restricted kind of Turing machine.

Type 2:

Rule P: U ::= u, where U ∈ Vn and u ∈ V*.

L2: called a context-free grammar. U is rewritten as u regardless of context. It is equivalent to BNF, and its languages are accepted by pushdown automata.

Type 3:

Rule P: U ::= T or U ::= WT, where U, W ∈ Vn and T ∈ Vt (left-linear form; the right-linear form is U ::= T or U ::= TW).

L3: called a regular grammar; it defines the regular languages (regular sets). Left-linear and right-linear rules must not be mixed in one grammar. Its languages are accepted by finite automata.

Note: L3 ⊂ L2 ⊂ L1 ⊂ L0. A less restricted type of grammar can generate everything a more restricted type can. For example, type 2 grammars can generate the L2 and L3 languages, but not the languages that are only in L1.

 

Relationships among the four grammar types and how to determine a grammar's type:

The types differ mainly in what they allow on the left and right sides of productions. From type 0 to type 3 the rules become more and more restricted, so when classifying a grammar, start with the most restrictive type, type 3, and work down.

Type 3 (restrictions beyond type 2: the right side has at most two symbols, and only one kind of linearity is allowed):

The left side is a single nonterminal.

The right side has at most two symbols. With two symbols, one is a terminal and the other a nonterminal; with one symbol, it must be a terminal. (Nonterminal on the left and terminal on the right is left-linear; terminal on the left and nonterminal on the right is right-linear.)

Left-linear and right-linear rules cannot both be present.

Type 2 (restriction beyond type 1: the left side must be a single nonterminal):

The left side is a single nonterminal.

The right side is any number of terminals and nonterminals.

Type 1 (restrictions beyond type 0: the context must be preserved and the replacement cannot be empty):

The left side contains at least one nonterminal.

The right side is any number of terminals and nonterminals.

The context to the left and right of the rewritten nonterminal is the same on both sides of the rule, and the nonterminal cannot be rewritten to the empty string.

Type 0:

The left side is a symbol string containing at least one nonterminal.

The right side is unrestricted.

In other words, any grammar that can be written down as productions is a type 0 grammar.
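The checklist above can be applied rule by rule, starting from type 3. The sketch below follows the criteria as stated in these notes rather than any particular textbook variant. Its representation is my own assumption: each rule is a pair `(lhs, rhs)` of tuples of symbols, and the set of nonterminals is passed in separately.

```python
def is_context_sensitive_rule(lhs, rhs, nonterminals):
    """True if the rule has the form xUy ::= xuy with u nonempty."""
    for i, symbol in enumerate(lhs):
        if symbol not in nonterminals:
            continue
        x, y = lhs[:i], lhs[i + 1:]
        middle_len = len(rhs) - len(x) - len(y)
        if middle_len >= 1 and rhs[:len(x)] == x and rhs[len(rhs) - len(y):] == y:
            return True
    return False


def grammar_type(rules, nonterminals):
    """Return 3, 2, 1 or 0: the most restrictive type that every rule satisfies."""

    def single_nonterminal(lhs):
        return len(lhs) == 1 and lhs[0] in nonterminals

    def linear_shape(rhs):
        """'T' for a lone terminal, 'left' for WT, 'right' for TW, else None."""
        kinds = [s in nonterminals for s in rhs]
        if kinds == [False]:
            return "T"
        if kinds == [True, False]:
            return "left"
        if kinds == [False, True]:
            return "right"
        return None

    if all(single_nonterminal(lhs) for lhs, _ in rules):
        shapes = {linear_shape(rhs) for _, rhs in rules}
        # type 3 needs every rule to be linear, and never both directions at once
        if None not in shapes and not {"left", "right"} <= shapes:
            return 3
        return 2
    if all(is_context_sensitive_rule(lhs, rhs, nonterminals) for lhs, rhs in rules):
        return 1
    if all(any(s in nonterminals for s in lhs) for lhs, _ in rules):
        return 0
    raise ValueError("every left part needs at least one nonterminal")


regular = [(("S",), ("S", "a")), (("S",), ("a",))]
context_free = [(("S",), ("a", "S", "b")), (("S",), ("a", "b"))]
print(grammar_type(regular, {"S"}))       # 3
print(grammar_type(context_free, {"S"}))  # 2
```

Because the restrictions hold for all productions, a single rule that breaks a type's restrictions is enough to push the grammar down to the next type, which is why each check uses `all`.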

 

Knowledge points from the exercises

The grammar of a programming language is a set of rules for forming legal programs. Some of these rules are lexical rules; the rest are syntax rules (also called production rules).

Names and identifiers: names are made up of identifiers, and the rules for identifiers differ from language to language. The two are hard to tell apart in form: an identifier is just a string with no meaning of its own, whereas a name has a definite meaning and attributes and can be regarded as standing for an abstract storage unit.

A language can be derived from many grammars.

A grammar can derive only one language. Language to grammar: one to many.

For each type of grammar, the restrictions apply to all productions, so finding a single production that breaks the restrictions is enough to show that the grammar does not belong to that type.

When asked what language a grammar defines: if it can be stated in words people understand, do so (for example, "unsigned integers" or "strings of 0 to 5 digits"). If not, describe it as a set whose elements are exactly the sentences derivable by the rules of the grammar.

The angle brackets <> around nonterminals serve only to distinguish them and are not required. Other conventions work as well, e.g. uppercase letters for nonterminals and lowercase letters for terminals.

Some question types come up often: finding the language of a given grammar, writing a grammar for a described language, and determining the type of a grammar. A later update will cover some techniques for these.

 


Origin www.cnblogs.com/doUlikewyx/p/11524708.html