1 Lexical Analysis
1.1 Lexical Tokens and Their Attributes
Lexical tokens, patterns, and lexemes
Token      Example lexemes    Informal description of the pattern
if         if                 the characters i, f
for        for                the characters f, o, r
relation   <, <=, =, ...      < or <= or = or ...
id         sum, count, D5     an alphanumeric string beginning with a letter
number     3.1, 10, 2.8E12    any numeric constant
literal    "seg.error"        any characters between the quotes " and ", excluding " itself
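A minimal sketch of how the token classes in the table might be recognized, assuming Python's `re` module, a hypothetical `tokenize` function, and simplified patterns (for instance, `number` covers only unsigned forms such as 10, 3.1 and 2.8E12). Keywords are scanned as identifiers and then looked up in a reserved-word table, which is one common way to handle them.

```python
import re

# Hypothetical token specification; order matters because earlier alternatives win.
TOKEN_SPEC = [
    ("number",   r"\d+(?:\.\d+)?(?:E[+-]?\d+)?"),
    ("id",       r"[A-Za-z][A-Za-z0-9]*"),
    ("relation", r"<=|<>|>=|<|>|="),
    ("literal",  r'"[^"]*"'),
    ("ws",       r"\s+"),
]
KEYWORDS = {"if", "for"}  # reserved words, recognized after an id is scanned

MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs; raise ValueError on a bad character."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"lexical error at position {pos}: {text[pos]!r}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue  # white space separates tokens but produces none
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme  # keyword token, e.g. "if"
        tokens.append((kind, lexeme))
    return tokens
```

For example, `tokenize('if count <= 10')` yields the pairs `("if", "if")`, `("id", "count")`, `("relation", "<=")`, `("number", "10")`.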
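Historical problems in lexical definitions
---- Difficulties caused by ignoring blanks (FORTRAN)
DO 8 I = 3.75 is equivalent to the assignment DO8I = 3.75
DO 8 I = 3,75 is the head of a DO loop; the two cannot be told apart until the "." or "," is reached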
---- Keywords are not reserved (PL/I), so a keyword may also be used as an identifier
IF THEN THEN THEN=ELSE; ELSE ...
---- The difference between keywords, reserved words, and standard identifiers
Reserved words are lexical units whose meaning is fixed by the language; a program cannot use them as identifiers
A standard identifier is also an identifier with a predefined meaning, but a program may redeclare it
Attributes of Lexical Tokens
Tokens and attribute values for position = initial + rate * 60:
<id, pointer to position entry in symbol table>
<assign_op>
<id, pointer to initial entry in symbol table>
<add_op>
<id, pointer to rate entry in symbol table>
<mul_op>
<number, integer value 60>
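A minimal sketch of producing this token stream, assuming a hypothetical `scan_with_attributes` function whose patterns cover only this one statement. The symbol table is a Python list, so a list index stands in for the "pointer to entry in symbol table"; operators carry no attribute.

```python
import re

# Hypothetical patterns covering only the tokens in this one statement.
PATTERN = re.compile(r"\s*(?:(?P<id>[A-Za-z][A-Za-z0-9]*)|(?P<number>\d+)|(?P<op>[=+*]))")
OP_TOKENS = {"=": "assign_op", "+": "add_op", "*": "mul_op"}

def scan_with_attributes(text):
    """Return (tokens, symbol_table); an id's attribute is its symbol-table index."""
    symbol_table = []  # entry names; a list index stands in for a pointer
    index_of = {}      # name -> index, so each name is entered only once
    tokens = []
    text = text.rstrip()  # no trailing blanks, so every match consumes a token
    pos = 0
    while pos < len(text):
        m = PATTERN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character near position {pos}")
        pos = m.end()
        kind = m.lastgroup
        lexeme = m.group(kind)
        if kind == "id":
            if lexeme not in index_of:
                index_of[lexeme] = len(symbol_table)
                symbol_table.append(lexeme)
            tokens.append(("id", index_of[lexeme]))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))  # attribute: the integer value
        else:
            tokens.append((OP_TOKENS[lexeme],))    # operators carry no attribute
    return tokens, symbol_table
```

Entering each name only once means repeated occurrences of the same identifier share one symbol-table entry, as in `x = x + 1`.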
Lexical errors
---- The lexical analyzer has only a very local view of the source program
---- Example: it cannot detect the error below, because fi is a valid identifier (it could, for instance, name a function)
fi (a == f(x)) ...
---- If a real number must be written as digits.digits, the lexical analyzer can detect the following error
123.x
---- Error recovery in panic mode
Delete characters from the remaining input one at a time until a well-formed token can be read
---- Error repair
Try inserting a missing character, deleting an extraneous one, replacing an incorrect one, or transposing two adjacent characters
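Panic-mode recovery can be sketched as follows, assuming a hypothetical `scan_panic_mode` function and a simplified token pattern (identifiers, integers, and a few operators). When no token can start at the current position, characters are deleted until one can; as a design choice of this sketch, deletion also stops at a blank.

```python
import re

# Hypothetical, simplified token pattern: identifiers, integers and a few operators.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|\d+|[=+*();<>]")

def scan_panic_mode(text):
    """Scan text; on an error, delete characters until a token can be read again.

    Returns (lexemes, errors), where errors lists (position, deleted_text).
    """
    lexemes, errors = [], []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m:
            lexemes.append(m.group())
            pos = m.end()
            continue
        # Panic mode: skip characters until a token (or a blank) can start.
        start = pos
        while pos < len(text) and not text[pos].isspace() and not TOKEN.match(text, pos):
            pos += 1
        errors.append((start, text[start:pos]))
    return lexemes, errors
```

For `a = b $# + 1` the sketch reports the deleted text `$#` at position 6 and continues with `+` and `1`.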
1.2 Description and recognition of lexical tokens
Strings and Languages
---- alphabet: a finite set of symbols, e.g. Σ = {0,1}
---- string: a finite sequence of symbols, e.g. 0100, or ε (the empty string)
---- language: a set of strings over the alphabet, e.g.
{ε, 0, 00, 000, ...}, {ε}, ∅ (the empty language)
---- Sentence: A string belonging to a language
String operations
---- Concatenation (product): $xy$; $\varepsilon s = s\varepsilon = s$
---- Exponentiation: $s^0 = \varepsilon$, $s^i = s^{i-1}s$ for $i > 0$
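Operations on languages
---- Union: $L \cup M = \{s \mid s \in L \text{ or } s \in M\}$
---- Concatenation: $LM = \{st \mid s \in L \text{ and } t \in M\}$
---- Exponentiation: $L^0 = \{\varepsilon\}$, $L^i = L^{i-1}L$ for $i > 0$
---- Closure: $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots$
---- Positive closure: $L^+ = L^1 \cup L^2 \cup \cdots$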
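These operations can be tried out on finite languages with Python sets, assuming the empty Python string `""` plays the role of ε. Because $L^*$ and $L^+$ are usually infinite, the hypothetical helpers below only compute the strings up to a chosen length.

```python
def union(L, M):
    """L ∪ M."""
    return L | M

def concat(L, M):
    """LM = {st | s in L and t in M}; "" stands for epsilon."""
    return {s + t for s in L for t in M}

def power(L, i):
    """L^0 = {epsilon}, L^i = L^(i-1) L."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def closure_up_to(L, max_len):
    """Strings of L* whose length is at most max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more member of L, keeping short ones.
        frontier = {s for s in concat(frontier, L) if len(s) <= max_len} - result
        result |= frontier
    return result

def positive_closure_up_to(L, max_len):
    """Strings of L+ = L L* whose length is at most max_len."""
    return {s for s in concat(L, closure_up_to(L, max_len)) if len(s) <= max_len}
```

For instance, `closure_up_to({"0"}, 3)` gives the first members of the language {ε, 0, 00, 000, ...} listed earlier.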
Regular expressions:
Regular expressions describe simple languages; the language denoted by a regular expression is called a regular set.
Regular expression   Language denoted   Notes
ε                    {ε}
a                    {a}                a ∈ Σ
(r)|(s)              L(r) ∪ L(s)        r and s are regular expressions
(r)(s)               L(r)L(s)           r and s are regular expressions
(r)*                 (L(r))*            r is a regular expression
(r)                  L(r)               r is a regular expression
If precedence is defined so that * binds tightest, then concatenation, then |, the expression ((a)(b)*)|(c) can be written as ab*|c.
Other examples:
---- a|b {a,b}
---- (a|b)(a|b)   {aa, ab, ba, bb}
---- aa|ab|ba|bb   {aa, ab, ba, bb}
---- a*   the set of all strings of zero or more a's, including ε
---- (a|b)*   the set of all strings of a's and b's, including ε
* means zero or more repetitions
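One way to check such equivalences is to enumerate every string up to a given length and keep those the expression matches in full. This sketch assumes Python's `re` module, whose `|`, concatenation, and `*` follow the same precedence as above; the helper name `language_up_to` is hypothetical.

```python
import re
from itertools import product

def language_up_to(regex, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len matched in full by `regex`."""
    pattern = re.compile(regex)
    words = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if pattern.fullmatch(w):  # fullmatch: the whole string must match
                words.add(w)
    return words
```

Comparing `language_up_to("(a|b)(a|b)", "ab", 3)` with `language_up_to("aa|ab|ba|bb", "ab", 3)` shows the two expressions denote the same set; this is only evidence up to the chosen length, not a proof of equivalence in general.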
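Regular definitions
---- Give names to regular expressions so that later expressions can refer to them, keeping descriptions short
---- Shorthands: r? means zero or one occurrence of r (r is optional); r+ means one or more occurrences of r
---- Example: ws (white space) denotes a sequence of one or more blank characters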
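Regular definitions can be imitated by building Python regex strings from earlier names. The names below (`letter`, `digit`, `digits`, `optional_fraction`, `optional_exponent`, `ident`, `number`, `delim`, `ws`) are hypothetical, modeled on common textbook definitions; Python's `re` supports the same `?`, `+`, `*`, and `|` shorthands.

```python
import re

# Hypothetical regular definitions; each name may use names defined above it.
letter = "[A-Za-z]"
digit = "[0-9]"
digits = digit + "+"                           # r+ : one or more digits
optional_fraction = r"(\." + digits + ")?"     # r? : zero or one fraction part
optional_exponent = "(E[+-]?" + digits + ")?"  # r? : zero or one exponent part
ident = letter + "(" + letter + "|" + digit + ")*"
number = digits + optional_fraction + optional_exponent
delim = "[ \t\n]"                              # blank, tab or newline
ws = delim + "+"                               # one or more blank characters
```

With these definitions, `re.fullmatch(number, "2.8E12")` succeeds while `re.fullmatch(number, "123.x")` fails, matching the lexical-error example earlier.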
Transition diagrams
1.3 Finite Automata
Nondeterministic Finite Automata (NFA)
1.4 Deterministic Finite Automata (DFA)
Converting an NFA to a DFA
The algorithm is as follows:
Example:
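The usual way to perform this conversion is the subset construction; the following is a minimal sketch of it and not necessarily the exact formulation used in these notes. It assumes a hypothetical encoding in which the NFA is a dict mapping `(state, symbol)` to a set of states, with `None` (named `EPS`) as the ε label; each DFA state is a frozenset of NFA states.

```python
from collections import deque

EPS = None  # label used for epsilon transitions in this encoding

def epsilon_closure(nfa, states):
    """All NFA states reachable from `states` using only epsilon moves."""
    stack = list(states)
    closure = set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(nfa, states, symbol):
    """NFA states reachable from `states` on one `symbol` transition."""
    result = set()
    for s in states:
        result |= nfa.get((s, symbol), set())
    return result

def subset_construction(nfa, start, alphabet):
    """Return (dfa_start, transitions); DFA states are frozensets of NFA states."""
    dfa_start = epsilon_closure(nfa, {start})
    transitions = {}
    unmarked = deque([dfa_start])
    seen = {dfa_start}
    while unmarked:
        T = unmarked.popleft()
        for a in alphabet:
            U = epsilon_closure(nfa, move(nfa, T, a))
            if not U:
                continue  # no transition; the dead state is left implicit
            transitions[(T, a)] = U
            if U not in seen:
                seen.add(U)
                unmarked.append(U)
    return dfa_start, transitions

# Example: a Thompson-style NFA for (a|b)*abb, states 0..10, accepting state 10.
example_nfa = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}
```

Running `subset_construction(example_nfa, 0, "ab")` yields five DFA states; a DFA state is accepting when it contains an accepting NFA state (here, state 10).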