Compiler Principles (3), Part 2: Lexical Analysis

lexical analysis

This is the third part of my compiler theory notes. Because the content is long, it is split into two parts; the master reading index links to the table of contents. References: classroom lectures and courseware by Shao Bing (School of Software, Beihang University), Zhang Li's "Compiler Principles and Compiler Construction", the National Defense Industry Press book "Compiler Principles: Study Guide and Analysis of Typical Problems", AlvinZH's study notes, and my own understanding.

The current version includes the entire content; a condensed version and a review summary will follow.

If you find an error or omission, please point it out in the comments or contact me: QQ: 847590417

Master reading index

Contents of this chapter

Part 1:

3.1 Function and implementation of the lexical analysis program

3.2 Word categories and output form of the lexical analysis program

3.3 Regular grammars and state diagrams

3.4 Regular expressions and finite automata (FA)

Part 2:

3.5 Conversions among finite automata, regular grammars, and regular expressions

0. Regular grammar G to state diagram

1. DFA M to regular grammar G

2. Regular grammar G to DFA M

3. Regular expression to DFA M

4. DFA M to regular expression

5. Regular grammar G to regular expression

6. Regular expression to regular grammar G

3.6 Design and implementation of the lexical analysis program

3.6.1 Principles of lexical analysis

3.6.2 Constructing the lexical analysis program

3.6.3 Implementing the lexical analysis program

3.7 LEX: automatic generation of lexical analyzers

3.7.1 LEX basics

3.7.2 LEX implementation

Problem-solving notes

 

Chapter overview

Focus: the function of lexical analysis, the classification of words, regular grammars, state diagrams, regular expressions, finite automata, conversions among grammars, expressions and automata, the design and implementation of a lexical analysis program, and the automatic lexical analyzer generator LEX.

 

Covered previously

The function of lexical analysis, the classification of words, regular grammars, state diagrams, regular expressions, and finite automata are introduced in the first part of Chapter 3.

 

3.5 Conversions among finite automata, regular grammars, and regular expressions

Conversion flowchart:

The conversions below are listed in the order of the arrows in the figure. (Every DFA is also an NFA, so a conversion that produces an NFA is sometimes loosely called a conversion to a DFA.)

 

0. Regular grammar G to state diagram

This section draws the state diagram of a left-linear grammar. (The state diagram here applies only to left-linear grammars, which is a significant difference from the DFA constructions later on.) Drawing state diagrams is not a key topic, and right-linear grammars are not considered for now.

1. Each nonterminal of the grammar is a node

2. Add a start state S

3. For a rule Q ::= t (t is a terminal), draw an arc from S to Q labeled t

4. For a rule Q ::= Rt, draw an arc from R to Q labeled t

(The direction follows reduction: whichever symbol is reduced to another points to it.)

Following the automaton convention, mark the start state with a start marker, and draw the final state, which is the grammar's start (recognition) symbol, with a double circle.
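The four rules above can be sketched in Python. This is only an illustration under stated assumptions: every symbol is a single character, the start state is named S and is not a nonterminal of the grammar, and the example grammar (Z ::= U0 | V1, U ::= Z1 | 1, V ::= Z0 | 0, with Z as the recognition symbol) is made up for the demonstration.

```python
def build_state_diagram(rules, start="S"):
    """Build the arcs of a state diagram from a left-linear grammar.

    rules: list of (lhs, rhs), where rhs is either "t" or "Rt".
    Returns {(from_state, terminal): set of to_states}.
    """
    arcs = {}
    for lhs, rhs in rules:
        if len(rhs) == 1:            # Q ::= t   -> arc from S to Q labeled t
            src, t = start, rhs
        else:                        # Q ::= Rt  -> arc from R to Q labeled t
            src, t = rhs[0], rhs[1]
        arcs.setdefault((src, t), set()).add(lhs)
    return arcs


def accepts(arcs, text, start="S", final="Z"):
    """Follow the arcs; the string is a sentence if the final state is reached."""
    states = {start}
    for ch in text:
        states = {q for s in states for q in arcs.get((s, ch), ())}
    return final in states


# Hypothetical example grammar; Z is the recognition (start) symbol.
RULES = [("Z", "U0"), ("Z", "V1"), ("U", "Z1"), ("U", "1"), ("V", "Z0"), ("V", "0")]
ARCS = build_state_diagram(RULES)
print(accepts(ARCS, "0110"))  # a sequence of the pairs 01 and 10
```

Running the diagram forwards from S mirrors reducing the input from left to right, which is why the grammar's recognition symbol ends up as the final state.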

1. DFA M to regular grammar G

Rules:

1. For each transition δ(A, t) = B, write the production A → tB (this direct construction yields only a right-linear grammar; a left-linear grammar can be derived by working in the reverse direction)

2. For each accepting (final) state Z, add the production Z → ε

3. The initial state of the finite automaton corresponds to the start symbol of the grammar, and the alphabet of the finite automaton is the set of terminal symbols of the grammar
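As a minimal sketch of these three rules, the function below turns a transition table into right-linear productions. The dictionary representation and the two-state example DFA are assumptions made for illustration only.

```python
def dfa_to_grammar(delta, finals):
    """Convert a DFA into right-linear productions.

    delta:  {(state, symbol): next_state}
    finals: set of accepting states
    The DFA's initial state plays the role of the grammar's start symbol.
    """
    # Rule 1: δ(A, t) = B becomes A → tB
    productions = [f"{a} → {t}{b}" for (a, t), b in sorted(delta.items())]
    # Rule 2: every accepting state Z gets Z → ε
    productions += [f"{z} → ε" for z in sorted(finals)]
    return productions


# Hypothetical DFA for a b*: start state A, accepting state B.
DELTA = {("A", "a"): "B", ("B", "b"): "B"}
for p in dfa_to_grammar(DELTA, {"B"}):
    print(p)
```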

Example:

 

 

2. Regular grammar G to DFA M

The rules are similar to those for the state diagram:

1. The alphabet of M (all symbols that appear on arcs) is the same as the set of terminal symbols of G

2. Create one state of M for each nonterminal of G; the start symbol S of G becomes the start state S

3. Add a new state Z as the final state of the NFA

4. For each production of G of the form A → tB, where t is a terminal or the empty string and A and B are nonterminals, construct the transition δ(A, t) = B

5. For each production of G of the form A → t, construct the transition δ(A, t) = Z
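The construction can be sketched as follows. The conventions are assumptions chosen for brevity: nonterminals are single uppercase letters, terminals are single lowercase letters, the empty string is written as "", and the new final state is named Z (so the grammar must not already use Z). Note that the result is in general an NFA.

```python
EPS = ""  # label used for arcs on the empty string


def grammar_to_nfa(productions, final="Z"):
    """productions: list of (A, rhs) with rhs of the form "tB", "B", "t" or ""."""
    delta = {}
    for lhs, rhs in productions:
        if len(rhs) == 2:              # A → tB gives δ(A, t) = B
            label, target = rhs[0], rhs[1]
        elif rhs.isupper():            # A → B (t is empty) gives δ(A, ε) = B
            label, target = EPS, rhs
        else:                          # A → t (or A → ε) gives δ(A, t) = Z
            label, target = rhs, final
        delta.setdefault((lhs, label), set()).add(target)
    return delta


def closure(delta, states):
    """All states reachable from `states` through ε-arcs only."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in delta.get((stack.pop(), EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts(delta, text, start="S", final="Z"):
    current = closure(delta, {start})
    for ch in text:
        moved = {q for s in current for q in delta.get((s, ch), ())}
        current = closure(delta, moved)
    return final in current


# Hypothetical grammar for a b+ :  S → aA, A → bA, A → b
NFA = grammar_to_nfa([("S", "aA"), ("A", "bA"), ("A", "b")])
print(accepts(NFA, "abbb"))
```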

Example:

 

 

3. Regular expression to DFA M

The two are equivalent.

Theorem: let V be a set of words over Σ, that is, a subset of Σ*. Then V is a regular set if and only if there exists a DFA M such that V = L(M).

Rules:

For a single regular expression, decompose it from left to right and build the automaton piece by piece:

a. For the regular expression φ (the empty set), states x and y have no arc between them

b. For the regular expression ε, draw an arc labeled with the empty symbol from x to y

c. For a regular expression a, where a is a letter of the alphabet, draw an arc labeled a from x to y

(x and y are states: the temporary initial and final states used during construction. y is reached once the characters of the regular expression have been read, decomposing from left to right.)

For a compound regular expression built from sub-expressions such as s and t, whose NFAs are Ns and Nt:

a. R=s|t

 

b. R=st

 

c. R=s*

 

d. For R = (s), the NFA is the same as the NFA for R = s
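Since the figures for these rules are not reproduced here, the sketch below uses the standard Thompson-style construction, which is an assumption about what the figures show; the exact placement of ε-arcs in the original diagrams may differ. Each fragment is a triple (x, y, arcs) with x the temporary initial state and y the temporary final state. All function names are hypothetical.

```python
import itertools

EPS = None                       # label for ε-arcs
_ids = itertools.count()         # fresh state numbers


def symbol(a):                   # base case: a single letter a
    x, y = next(_ids), next(_ids)
    return x, y, [(x, a, y)]


def union(s, t):                 # R = s|t
    x, y = next(_ids), next(_ids)
    sx, sy, sa = s
    tx, ty, ta = t
    return x, y, sa + ta + [(x, EPS, sx), (x, EPS, tx), (sy, EPS, y), (ty, EPS, y)]


def concat(s, t):                # R = st
    sx, sy, sa = s
    tx, ty, ta = t
    return sx, ty, sa + ta + [(sy, EPS, tx)]


def star(s):                     # R = s*
    x, y = next(_ids), next(_ids)
    sx, sy, sa = s
    return x, y, sa + [(x, EPS, sx), (sy, EPS, y), (x, EPS, y), (sy, EPS, sx)]


def accepts(nfa, text):
    x, y, arcs = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            p = stack.pop()
            for a, label, b in arcs:
                if a == p and label is EPS and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    current = closure({x})
    for ch in text:
        current = closure({b for a, label, b in arcs if a in current and label == ch})
    return y in current


# (a|b)*abb, built from the inside out
N = concat(concat(concat(star(union(symbol("a"), symbol("b"))), symbol("a")), symbol("b")), symbol("b"))
print(accepts(N, "aababb"))
```

Parentheses need no function of their own: as rule d says, (s) uses the NFA of s unchanged.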

 

Example:

 

1. Building up from the component NFAs

 

 

 

2. Building from the outside in

 

 

4. DFA M to regular expression

Rules:

(1) Add two nodes x and y to M. Connect x with ε-arcs to every initial-state node of M, and connect every final-state node of M with ε-arcs to y. This forms M', equivalent to M, with exactly one initial state and one final state.

(2) Eliminate all other nodes of M' (everything except x and y), using the following three rules:

1. Arcs in series are merged (concatenation)

2. Parallel arcs become alternation (|)

3. A self-loop becomes a star on the looped expression

 

 

In other words, this is the regular-expression-to-NFA construction run in reverse.
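A minimal sketch of this elimination procedure, assuming the automaton is given as a dictionary of transitions and that the names "x" and "y" are not already used as states. The resulting expression is correct but deliberately not simplified, so it contains redundant parentheses; Python's re module is used only to check it.

```python
import re


def dfa_to_regex(states, delta, start, finals):
    """delta: {(state, symbol): next_state}. Returns a regex string (or None)."""
    arcs = {}                                    # (from, to) -> regex on that arc

    def add(p, q, r):
        # Rule 2: parallel arcs between the same nodes become alternation
        arcs[(p, q)] = f"{arcs[(p, q)]}|{r}" if (p, q) in arcs else r

    for (p, a), q in delta.items():
        add(p, q, a)
    add("x", start, "")                          # step (1): ε-arc from the new node x
    for f in finals:
        add(f, "y", "")                          # step (1): ε-arcs into the new node y

    for q in states:                             # step (2): eliminate each original node
        # Rule 3: a self-loop becomes a starred expression
        loop = f"({arcs.pop((q, q))})*" if (q, q) in arcs else ""
        ins = {p: r for (p, t), r in arcs.items() if t == q}
        outs = {t: r for (p, t), r in arcs.items() if p == q}
        for key in [k for k in arcs if q in k]:
            del arcs[key]
        for p, rin in ins.items():
            for t, rout in outs.items():
                # Rule 1: arcs in series are concatenated
                add(p, t, f"({rin}){loop}({rout})")
    return arcs.get(("x", "y"))


# Hypothetical DFA over {a, b} accepting strings that end in b.
DELTA = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
REGEX = dfa_to_regex([0, 1], DELTA, start=0, finals={1})
print(REGEX)
print(bool(re.fullmatch(REGEX, "abab")))
```

The order in which nodes are eliminated changes how the expression looks, not the language it describes.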

 

Example:

 

 

 

 

5. Regular grammar G to regular expression

Using three rules, a regular grammar can be reduced to a single production for the start symbol whose right-hand side contains no nonterminals, only the corresponding regular expression. The productions are rewritten in extended BNF notation as the conversion proceeds, and repetition of 0 to n times should be written with *.

(1) Substitution rule: A → xB, B → y becomes A → xy

(2) Recursion elimination rule: A → xA | y becomes A → x*y

(3) Alternation rule: A → x, A → y becomes A → x | y

Note: for a left-linear grammar, A → Ax | y becomes A → yx*

For example:

 

Example:

 

 

6. Regular expression to regular grammar G

The rules are as follows:

(1) For any regular expression r, choose a nonterminal S as the start symbol and generate the production S → r

(2) If x and y are regular expressions:

1. A → xy becomes A → xB, B → y, where B is a new nonterminal

2. A → x*y becomes A → xA, A → y (Note: A → x*, with nothing after the star, must be transformed into A → xA, A → ε)

3. A → x | y becomes A → x, A → y

For example:

 

Example:

 

With a left-linear form, the conversion runs into an endless loop:

 

3.6 Design and implementation of the lexical analysis program

3.6.1 Principles of lexical analysis

Notes:

1. No symbol is output for comments

2. Words are separated by whitespace (space, tab, carriage return)

 

Once the grammar is known:

Draw a transition diagram for every nonterminal of the grammar (each diagram starts from the initial state and ends when the symbol is complete).

 

"Other characters" here means any character at all. For example, when a + is read right after another +, the later + also counts as an "other" character relative to the earlier one.

 

These transition diagrams are then merged, with a shared initial state that receives the incoming symbol string. When merging, note two things: symbols that begin with the same character need special treatment (single-character and double-character delimiters are handled together), and an error state is needed for strings that belong to none of the diagrams.

 

3.6.2 Constructing the lexical analysis program

What to do in each state

Start state: the program reads characters sequentially, skips whitespace characters, and then dispatches on each non-whitespace character.

Identifier state: after the identifier has been assembled, determine whether it is a reserved word or a user-defined identifier

Integer state: after the digit characters have been assembled into a number, convert it to its binary value

Single-character delimiter state: determine the corresponding category code

Colon state: examine the next character to decide whether this is a single-character or a double-character delimiter

Slash state: also requires examining the following character, to decide between a slash symbol and a comment that should be skipped

Error state: print an error message and skip

 

Note: to decide whether the whole word has been read, the lexical analyzer sometimes has to read one character ahead, for example in the identifier and unsigned-integer states. The extra character belongs to the next word and must not be skipped, so before returning to the caller the character pointer should be moved back by one character. (Otherwise the character that was read ahead would be lost when the next word is read. The character pointer therefore moves both forwards and backwards: having read one character too many, it gives that character back.)
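The states and the read-ahead-then-retract rule can be sketched as a small hand-written scanner. Everything concrete here is an assumption for illustration: the reserved-word list, the category names, and the delimiter set are hypothetical, and the slash/comment state is omitted to keep the sketch short.

```python
RESERVED = {"begin", "end", "if", "then"}        # hypothetical reserved words


class Lexer:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def getchar(self):
        """Return the next character ("" at end of input) and advance the pointer."""
        ch = self.text[self.pos] if self.pos < len(self.text) else ""
        self.pos += 1
        return ch

    def retract(self):
        """Move the character pointer back by one (give back the lookahead)."""
        self.pos -= 1

    def next_token(self):
        ch = self.getchar()
        while ch.isspace():                      # start state: skip whitespace
            ch = self.getchar()
        if ch == "":
            return ("EOF", None)
        if ch.isalpha():                         # identifier state
            word = ""
            while ch.isalnum():
                word += ch
                ch = self.getchar()
            self.retract()                       # one character too many was read
            return ("RESERVED" if word in RESERVED else "IDENT", word)
        if ch.isdigit():                         # integer state
            digits = ""
            while ch.isdigit():
                digits += ch
                ch = self.getchar()
            self.retract()
            return ("INT", int(digits))          # convert digit string to a value
        if ch == ":":                            # colon state: ":" or ":="
            if self.getchar() == "=":
                return ("ASSIGN", ":=")
            self.retract()
            return ("COLON", ":")
        if ch in "+-*(),;=":                     # single-character delimiter state
            return ("DELIM", ch)
        return ("ERROR", ch)                     # error state


def tokenize(text):
    lexer, out = Lexer(text), []
    while True:
        tok = lexer.next_token()
        if tok[0] == "EOF":
            return out
        out.append(tok)


print(tokenize("x1:=42;"))
```

Without the two retract() calls, the ":" after x1 and the ";" after 42 would be consumed as lookahead and never reported.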

 

3.6.3 Implementing the lexical analysis program

A lexical analysis program needs: 1. the words and their internal representation; 2. the public (global) variables and procedures that the program references; 3. the lexical analysis algorithm

1. Output form: each word is output in its predetermined internal representation (typically a pair: a category code and the corresponding word value)

2. Global variables and procedures (the variables and procedures that the lexical analysis program references; they are generally defined in advance and called when needed)

 

 

3. The lexical analysis algorithm

In fact, the specific structure of the algorithm is up to the developer. For example, whether to retract the character stream and how to determine the category are decided by the specific implementation.

 

From the complete state diagram above, the algorithm can be constructed as

Pseudocode:

 

When the lexical analysis program is a subroutine, it is generally called by the parser. Each time it has assembled a word, it returns to the calling statement in the parser, leaving the word's category code in a designated variable. (The parser provides a variable, class, for storing the category code of the word.)

 

3.7 LEX: automatic generation of lexical analyzers

3.7.1 LEX basics

Function: given a LEX source program as input, LEX generates a lexical analysis program L

Then, given a character string SP as input, L outputs SP as a string of words

 

A LEX source program consists of three parts:

1. Rule definitions, which give names to regular expressions for use in the recognition rules

2. Recognition rules, which use regular expressions and the definitions to specify (as a code snippet to be executed) what to do after a word is recognized

3. User subroutines, which supply other operations required by the user

The parts must be separated by %%

 

Rule definitions: each is a LEX statement that binds a name D to a regular expression R

D is a simple name for the regular expression; R is the regular expression itself

E.g:

 

 

 

 

Recognition rules: a sequence of LEX statements, each pairing a pattern P with an action A

P is a regular expression over Σ ∪ {D1, D2, D3, ...}, called a word pattern

A is a sequence of statements specifying what the lexical analyzer should do after recognizing a word of pattern P; the basic action is to return the category code and the value of the word.

 

A complete LEX source program:

 

Tip: in the regular expressions, {}+ means at least one repetition

 

3.7.2 LEX implementation

The job of LEX is to construct a lexical analysis program from a LEX source program; the resulting lexical analyzer is essentially a finite automaton.

The lexical analysis program generated by LEX consists of two parts: a DFA state transition matrix and a control program. That is, LEX generates the state transition matrix and the control program from the LEX source program.

How LEX works:

 

An NFA may have ε-arcs and several successors on the same symbol; every DFA is also an NFA.

After conversion to a DFA, the kind of word recognized by each new final state depends on which final states of the original NFA its subset contains. If it contains only one, the new state is the final state for that word; if it contains several, the ambiguity has to be resolved.

 

1. Scan each recognition rule Pi and construct a corresponding nondeterministic finite automaton Mi

2. Merge the finite automata Mi of all the rules into a new NFA M

3. Determinize the NFA into a DFA

4. Generate the state transition matrix of the DFA and the control program
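To show what "a state transition matrix plus a control program" can look like, here is a hand-written sketch for two word kinds (identifiers and integers). The matrix, the character classes, and the category names are all hypothetical; a real LEX derives its tables from the source program rather than having them written by hand.

```python
def char_class(ch):
    """Map a character to a column of the matrix."""
    if ch.isalpha():
        return 0
    if ch.isdigit():
        return 1
    return 2


# Rows are DFA states, columns are character classes; -1 means "no transition".
#          letter digit other
MATRIX = [
    [1, 2, -1],      # state 0: start
    [1, 1, -1],      # state 1: inside an identifier
    [-1, 2, -1],     # state 2: inside an integer
]
ACCEPT = {1: "IDENT", 2: "INT"}   # final states and the word kind they recognize


def scan(text, pos=0):
    """Control program: run the DFA from pos and return (category, word, new_pos),
    or None if no word starts at pos."""
    state, i, last = 0, pos, None
    while i < len(text):
        nxt = MATRIX[state][char_class(text[i])]
        if nxt < 0:
            break
        state, i = nxt, i + 1
        if state in ACCEPT:
            last = (ACCEPT[state], text[pos:i], i)   # remember the latest accept
    return last


print(scan("abc12+3"))
```

The control program itself never changes; supporting a different language only means supplying a different matrix and accept table.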

 

LEX resolves ambiguity with two principles

For example: is begin a keyword or an identifier?

1. Longest match principle

When recognizing a word, if several prefixes of the input match some rule, the longest matching string is chosen and recognized as a word of the rule Pk it matches, rather than a shorter one:

 

2. Priority match principle

If a character string matches two rules, the rule that appears first in the rule list wins; that is, rules listed earlier have higher priority.
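The two principles can be illustrated with a small sketch that tries every rule and keeps the best match. The rule list is hypothetical, and Python's re module stands in for the generated DFA; this is not how LEX is implemented, only a way to observe the same decisions.

```python
import re

# Order of the list is the priority order: earlier rules win ties.
RULES = [
    ("BEGIN", r"begin"),
    ("IDENT", r"[a-z][a-z0-9]*"),
    ("INT", r"[0-9]+"),
]


def match_one(text, pos=0):
    """Return (rule_name, word) for the word starting at pos, or None."""
    best = None
    for name, pattern in RULES:
        m = re.compile(pattern).match(text, pos)
        # Longest match: only a strictly longer match replaces the current best,
        # so on a tie the earlier (higher-priority) rule is kept.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


print(match_one("begin"))      # both BEGIN and IDENT match 5 characters
print(match_one("beginning"))  # IDENT matches more characters than BEGIN
```

For these simple patterns the greedy match of each rule is also its longest match; that does not hold for every regular expression under Python's backtracking semantics.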

 

LEX example:

 

1. Construct an NFA for each rule

 

2. Merge them into a single NFA

 

3. Determinize

 

4. Finally, write out the state transition matrix and the control program

 

Analysis process:

 

LEX is a general-purpose tool: it can generate lexical analyzers for various languages, and all that is needed is a LEX source file written for each language.

LEX can not only generate lexical analysis programs automatically, but can also be used to produce a variety of pattern recognizers and text-editing programs.

 

Problem-solving notes

When writing out a derivation, state what each step is based on; if you do, there is no need to add a separate "it follows that".

When solving a problem, identify all terminal and nonterminal symbols; check the problem statement.

Identifying phrases: in a syntax tree, for any node U, if the subtree rooted at U has nonzero height, then the string u formed by concatenating all leaf nodes of that subtree is a phrase of the sentence relative to U. In particular, as long as f → p and p has no further children in the tree, p itself is a phrase, and a simple phrase.

When asked to draw an automaton for a described set of strings, it is often enough to write down a grammar directly; a state diagram can help.

To obtain the successor set of a state set for a given character: traverse each state in the set, collect the states reached from it by one arc labeled with that symbol, and then the states reached from those by any number of ε-arcs (this does not include the departure state itself, unless it can be reached in this way). After the traversal, take the union of the results to obtain the new state set.
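The step described in the last note is the core of determinization, and can be sketched as below. The NFA representation (a dictionary from (state, symbol) to a set of states, with "" as the ε label) and the example automaton are assumptions for illustration.

```python
EPS = ""   # label for ε-arcs


def closure(delta, states):
    """States reachable from `states` via any number of ε-arcs (including themselves)."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in delta.get((stack.pop(), EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def move(delta, states, symbol):
    """States reached from any state in `states` by one arc labeled `symbol`."""
    return {q for s in states for q in delta.get((s, symbol), ())}


def determinize(delta, start, alphabet):
    """Subset construction. Returns (start_set, table), where
    table[state_set][symbol] is the successor state set."""
    start_set = frozenset(closure(delta, {start}))
    table, todo = {}, [start_set]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for a in alphabet:
            # one arc on a, then any number of ε-arcs
            target = frozenset(closure(delta, move(delta, current, a)))
            if target:
                table[current][a] = target
                todo.append(target)
    return start_set, table


# Hypothetical NFA: 0 -ε-> 1, 1 -a-> {1, 2}, 2 -b-> 3
NFA = {(0, EPS): {1}, (1, "a"): {1, 2}, (2, "b"): {3}}
START, TABLE = determinize(NFA, 0, "ab")
for state_set, row in TABLE.items():
    print(sorted(state_set), {a: sorted(t) for a, t in row.items()})
```

Note that the closure is applied to the states reached after the move, not to the departure set, which matches the remark in parentheses above.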

Left-linear: look at where the nonterminal sits on the right-hand side; if it is at the left end, the grammar is left-linear.

Generating state diagrams from linear grammars: to be added later.

 

When a description is given and a regular expression must be written, note several points: allow for arbitrary combinations, construct the expression in segments, and pay attention to the true meaning of the symbols (* means repetition from 0 to infinitely many times, and it repeats the whole bracketed group rather than a single symbol; for example, (11)* is an even number of 1s).

Constructing an FA from an expression: start from the outside and from the left, handling | and * in turn. The two ε-arcs are not always required; when are they needed?

During determinization the states are sets, so read them carefully.

Recognizing an NFA: a state has two successors on the same symbol, or there is an ε-arc.

The FA five-tuple: the set of all states, the set of all symbols, the transition matrix, the start state (for an NFA, a non-empty set of start states), and the set of final states. Note that if the result of a transition is written as a set { }, the state has multiple successors.


Origin www.cnblogs.com/doUlikewyx/p/11627458.html