Compiler principles_P1002

1. Lexical analysis

1.1 Lexical tokens and attributes

Tokens, patterns, and lexemes

Token       Sample lexemes      Informal description of the pattern
if          if                  the characters i, f
for         for                 the characters f, o, r
relation    <, <=, =, ...       < or <= or = or ...
id          sum, count, D5      a string of letters and digits that starts with a letter
number      3.1, 10, 2.8E12     any numeric constant
literal     "seg.error"         any characters between the quotes " and ", excluding the quote itself
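The table can be turned into a small scanner. The following is a minimal sketch, assuming one regular expression per token; the names `TOKEN_SPEC` and `tokenize` and the exact patterns are illustrative choices, not part of the original notes. Keywords are listed before `id` so that `if` is not classified as an identifier.

```python
import re

# (token name, pattern) pairs, tried in order; keywords come before id.
TOKEN_SPEC = [(name, re.compile(pattern)) for name, pattern in [
    ("if",       r"if\b"),
    ("for",      r"for\b"),
    ("number",   r"\d+(?:\.\d+)?(?:E[+-]?\d+)?"),
    ("id",       r"[A-Za-z][A-Za-z0-9]*"),
    ("relation", r"<=|>=|<>|<|>|="),
    ("literal",  r'"[^"]*"'),
    ("ws",       r"\s+"),
]]

def tokenize(text):
    """Return a list of (token, lexeme) pairs; white space is discarded."""
    pos, result = 0, []
    while pos < len(text):
        for name, regex in TOKEN_SPEC:
            match = regex.match(text, pos)
            if match:
                if name != "ws":
                    result.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No pattern matches at this position: a lexical error.
            raise ValueError(f"no token matches at position {pos}: {text[pos]!r}")
    return result

print(tokenize('if count <= 2.8E12'))
```

Each pair in the result holds the token on the left and the lexeme that matched its pattern on the right.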

 

Some historical problems in lexical definitions

---- Difficulties caused by ignoring blanks (as in Fortran)

  DO 8 I = 3.75 is equivalent to DO8I = 3.75 (an assignment to the variable DO8I)

  DO 8 I = 3,75 (the start of a DO loop)

---- Keywords are not reserved

  IF THEN THEN THEN=ELSE; ELSE ...

---- The difference between keywords, reserved words and standard identifiers

  Reserved words are lexical units with a predetermined meaning in a language

  A standard identifier is also an identifier with a predetermined meaning, but a program can redeclare its meaning

 

Attributes of tokens

Tokens and attribute values for position = initial + rate * 60:

<id, pointer to position entry in symbol table>

<assign_op>

<id, pointer to initial entry in symbol table>

<add_op>

<id, pointer to rate entry in symbol table>

<mul_op>

 <number, integer value 60>
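The token stream above can be reproduced with a short sketch. It assumes the symbol table is a plain list and that the "pointer" to an entry is simply its index; `scan` and `OPERATORS` are hypothetical names introduced for illustration.

```python
import re

# Operators carry no attribute value, so their attribute is None.
OPERATORS = {"=": "assign_op", "+": "add_op", "*": "mul_op"}

def scan(source):
    """Return (tokens, symbol_table) for a simple assignment statement."""
    symbol_table = []   # identifier names; the index plays the role of a pointer
    tokens = []
    for lexeme in re.findall(r"[A-Za-z][A-Za-z0-9]*|\d+|[=+*]", source):
        if lexeme in OPERATORS:
            tokens.append((OPERATORS[lexeme], None))
        elif lexeme.isdigit():
            tokens.append(("number", int(lexeme)))
        else:
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme)))
    return tokens, symbol_table

tokens, symbol_table = scan("position = initial + rate * 60")
for token in tokens:
    print(token)
print(symbol_table)
```

A real compiler would store more than the name in each symbol-table entry (type, scope, and so on); the list here only shows how an `id` token refers to its entry.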

 

Lexical errors

---- The lexical analyzer has only a very local view of the source program

---- Example: the following error is hard for it to detect (fi is a valid identifier, so the lexer cannot tell a misspelled if from a function call)

  fi (a == f(x)) ...

---- If a real number must be written in the form "digits.digits", the following error can be detected

  123.x

---- Panic-mode error recovery

  Delete successive characters from the remaining input until a well-formed token can be read

---- Error repair

  Try inserting, deleting, replacing, or transposing characters
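Panic-mode recovery can be sketched in a few lines. This is a minimal illustration, assuming a tiny token set (numbers, identifiers, a few operators); `TOKEN` and `scan_with_panic_mode` are hypothetical names. When no token matches, one character is deleted and scanning resumes.

```python
import re

# A deliberately small token set for the illustration.
TOKEN = re.compile(
    r"(?P<number>\d+\.\d+|\d+)"
    r"|(?P<id>[A-Za-z][A-Za-z0-9]*)"
    r"|(?P<op>[=+*()])"
    r"|(?P<ws>\s+)"
)

def scan_with_panic_mode(text):
    """Return (tokens, errors); errors lists each deleted (position, character)."""
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            # Panic mode: drop the offending character and try again.
            errors.append((pos, text[pos]))
            pos += 1
            continue
        if match.lastgroup != "ws":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens, errors

print(scan_with_panic_mode("123.x"))
```

For the input `123.x`, the scanner reads the number `123`, finds that `.` cannot start any token, records it as an error, and continues with the identifier `x`.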

 

1.2 Specification and recognition of tokens

Strings and Languages

---- Alphabet: a finite set of symbols, e.g. Σ = {0, 1}

---- String: a finite sequence of symbols, e.g. 0100, ε (the empty string)

---- Language: a set of strings over an alphabet

  {ε, 0, 00, 000, ...}, {ε}, ∅

---- Sentence: a string that belongs to the language

String operations

---- Concatenation (product): xy; sε = εs = s

---- Exponentiation: s⁰ = ε, sⁱ = sⁱ⁻¹s (i > 0)

Operations on languages

---- Union: L ∪ M = {s | s ∈ L or s ∈ M}

---- Concatenation: LM = {st | s ∈ L and t ∈ M}

---- Exponentiation: L⁰ = {ε}, Lⁱ = Lⁱ⁻¹L (i > 0)

---- Closure: L* = L⁰ ∪ L¹ ∪ L² ∪ ...

---- Positive closure: L⁺ = L¹ ∪ L² ∪ ...
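For finite languages these operations map directly onto Python sets. The sketch below is only an illustration: the function names are hypothetical, the empty string `""` stands for ε, and because L* is infinite in general, `closure_up_to` computes just the finite part L⁰ ∪ ... ∪ Lⁿ.

```python
def union(L, M):
    return L | M

def concat(L, M):
    # LM = {st | s in L and t in M}
    return {s + t for s in L for t in M}

def power(L, i):
    # L^0 = {ε}; L^i = L^(i-1) L
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def closure_up_to(L, n):
    # Finite approximation of L*: L^0 ∪ L^1 ∪ ... ∪ L^n
    result = set()
    for i in range(n + 1):
        result |= power(L, i)
    return result

L, M = {"a", "b"}, {"0", "1"}
print(sorted(union(L, M)))              # ['0', '1', 'a', 'b']
print(sorted(concat(L, M)))             # ['a0', 'a1', 'b0', 'b1']
print(sorted(power(L, 2)))              # ['aa', 'ab', 'ba', 'bb']
print(sorted(closure_up_to({"0"}, 3)))  # ['', '0', '00', '000']
```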

Regular expressions

  Regular expressions describe a simple class of languages, called regular sets.

Regular expression    Language defined    Notes
ε                     {ε}
a                     {a}                 a ∈ Σ
(r)|(s)               L(r) ∪ L(s)         r and s are regular expressions
(r)(s)                L(r)L(s)            r and s are regular expressions
(r)*                  (L(r))*             r is a regular expression
(r)                   L(r)                r is a regular expression

Once operator precedence is defined (* binds tightest, then concatenation, then |), ((a)(b)*)|(c) can be written as ab*|c.

Other examples:

---- a|b    {a, b}

---- (a|b)(a|b)    {aa, ab, ba, bb}

---- aa|ab|ba|bb    {aa, ab, ba, bb}

---- a*    the set of all strings of zero or more a's (including ε)

---- (a|b)*    the set of all strings of a's and b's (including ε)

* means zero or more repetitions
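These examples can be checked mechanically. The sketch below uses Python's `re.fullmatch`, whose syntax agrees with the notation above for `|`, `*`, concatenation, and parentheses; the helper `language` is a hypothetical name, and it only enumerates strings up to a chosen length because languages such as a* are infinite.

```python
import re
from itertools import product

def language(regex, alphabet="ab", max_len=2):
    """All strings over `alphabet` of length <= max_len that the regex matches."""
    words = (
        "".join(symbols)
        for n in range(max_len + 1)
        for symbols in product(alphabet, repeat=n)
    )
    return {w for w in words if re.fullmatch(regex, w)}

print(sorted(language("a|b")))                     # ['a', 'b']
print(sorted(language("(a|b)(a|b)")))              # ['aa', 'ab', 'ba', 'bb']
print(sorted(language("a*")))                      # ['', 'a', 'aa']
print(sorted(language("ab*|c", alphabet="abc")))   # ['a', 'ab', 'c']
```

The last line shows the precedence rule in action: `ab*|c` is read as (a(b*))|c, not as ((ab)*)|c or a(b*|c).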

 

Regular definitions

---- Give names to regular expressions so that later definitions stay concise

? means optional (zero or one occurrence); + means one or more occurrences

ws: white space, a sequence of one or more blank characters
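A regular definition can be imitated by building each named expression from earlier ones. The names and patterns below (`digit`, `digits`, `number`, `letter`, `ident`, `ws`) are assumptions in the style of common textbook definitions, not taken from the original notes.

```python
import re

# Each name is defined using only the names that precede it.
digit = r"[0-9]"
digits = rf"{digit}+"                          # + : one or more
optional_fraction = rf"(\.{digits})?"          # ? : zero or one
optional_exponent = rf"(E[+-]?{digits})?"
number = rf"{digits}{optional_fraction}{optional_exponent}"

letter = r"[A-Za-z]"
ident = rf"{letter}({letter}|{digit})*"        # * : zero or more

ws = r"[ \t\n]+"                               # one or more blank characters

for text in ["10", "3.1", "2.8E12", "123.x"]:
    print(text, bool(re.fullmatch(number, text)))
```

Because the fraction is optional as a whole, `10` is a number, while `123.x` is rejected: a dot must be followed by at least one digit.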

Transition diagrams

 

1.3 Finite automata

Nondeterministic finite automata (NFA)
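An NFA may have several possible next states for the same state and input symbol, so a simulation tracks the set of states it could be in. The sketch below assumes a hypothetical four-state NFA for (a|b)*abb without ε-moves, stored as a dictionary from (state, symbol) to a set of states; none of these names come from the original notes.

```python
# Transition table of a hypothetical NFA for (a|b)*abb.
# State 0 on input a may stay in 0 or move to 1: that is the nondeterminism.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, ACCEPTING = 0, {3}

def nfa_accepts(text):
    """Simulate the NFA by tracking every state it could currently be in."""
    current = {START}
    for ch in text:
        current = {t for s in current for t in NFA.get((s, ch), ())}
    return bool(current & ACCEPTING)

for text in ["abb", "aabb", "babb", "ab", "abba"]:
    print(text, nfa_accepts(text))
```

The input is accepted if at least one of the possible states after the last symbol is an accepting state.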

 

1.4 Deterministic finite automata (DFA)

Converting an NFA to a DFA

An algorithm is shown as follows:

example:
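As an illustration, a standard algorithm for this conversion is the subset construction, in which each DFA state is a set of NFA states. The following is a minimal sketch under stated assumptions: the NFA is a dictionary from (state, symbol) to a set of states, `None` labels ε-moves, and the sample NFA for (a|b)*abb and all function names are hypothetical.

```python
from collections import deque

EPS = None  # label used for ε-moves in this sketch

def eps_closure(nfa, states):
    """All NFA states reachable from `states` using only ε-moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(nfa, states, symbol):
    """NFA states reachable from `states` on one `symbol` transition."""
    return {t for s in states for t in nfa.get((s, symbol), ())}

def subset_construction(nfa, start, accepting, alphabet):
    d_start = eps_closure(nfa, {start})
    dfa, seen, queue = {}, {d_start}, deque([d_start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            T = eps_closure(nfa, move(nfa, S, a))
            dfa[(S, a)] = T           # exactly one target per (state, symbol)
            if T not in seen:
                seen.add(T)
                queue.append(T)
    # A DFA state accepts if it contains at least one accepting NFA state.
    d_accepting = {S for S in seen if S & accepting}
    return dfa, d_start, d_accepting

def dfa_accepts(dfa, d_start, d_accepting, text):
    state = d_start
    for ch in text:
        state = dfa[(state, ch)]
    return state in d_accepting

# Hypothetical NFA for (a|b)*abb with one ε-move from the start state.
NFA = {
    (0, EPS): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
    (3, "b"): {4},
}
dfa, d0, dF = subset_construction(NFA, 0, {4}, "ab")
print(len({S for S, _ in dfa}), "DFA states")
print(dfa_accepts(dfa, d0, dF, "aabb"))
```

Only the subsets reachable from the start state are generated, which usually keeps the DFA far smaller than the full power set of NFA states.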

 

 
