Review Notes on Compilation Principles of University of Electronic Science and Technology of China (5): Lexical Analysis

Table of contents

foreword

Highlights

Lexical Analysis Overview

The function of lexical analysis

The output form of the lexical analyzer

The structure of the lexical analyzer

state transition diagram

Construction of State Transition Diagrams

Lexical Analyzer Design

basic structure

content

Symbol table

Purpose

composition

role in lexical analysis

The general form of a symbol table

Commonly used symbol table structure

Summary and Supplement 

Why Separate Lexical and Syntax Analysis

chapter summary


foreword

These review notes are based on Mr. Zhang's lecture slides. They are meant for my own final-exam review and as a reference for my classmates.

We now enter the most important part of the course. The following is an outline of the remaining material.


Highlights


Lexical Analysis Overview

The function of lexical analysis

Scan the character string of the source program, recognize word symbols (tokens) according to the lexical rules and output them, and report any errors found during recognition (a line number can be attached to each error message).

The relationship between lexical analyzers and syntax analyzers

① The lexical analyzer can run as a separate pass

② The lexical analyzer can be called as a subroutine of the syntax analyzer

The output form of the lexical analyzer

Word output form: a two-tuple (pair), conventionally (word kind, word value).

Classification of words (word kind): basic words (reserved words) each get their own code; identifiers (alphanumeric strings beginning with a letter) form a single kind; constants are classified by type (integer, real, Boolean, character, ...).
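As a minimal sketch of this classification, the snippet below maps a word to a pair (kind code, value). The numeric kind codes, the three reserved words, and the function name `classify` are illustrative assumptions, not taken from the lecture slides.

```python
# Hypothetical kind codes: one per reserved word, one shared by all
# identifiers, and one per constant type.
RESERVED = {"if": 1, "then": 2, "else": 3}
IDENT, INT_CONST, REAL_CONST, BOOL_CONST = 10, 20, 21, 22


def classify(word):
    """Return the pair (kind code, value) for a single word."""
    if word in RESERVED:
        return (RESERVED[word], None)          # the code alone identifies it
    if word in ("true", "false"):
        return (BOOL_CONST, word == "true")
    if word and word[0].isalpha() and word.isalnum():
        return (IDENT, word)                   # letter followed by letters/digits
    if word.isdecimal():
        return (INT_CONST, int(word))
    try:
        return (REAL_CONST, float(word))
    except ValueError:
        raise ValueError(f"not a recognized word: {word!r}")
```

For example, `classify("count1")` yields the identifier kind with the name as its value, while `classify("if")` yields the reserved word's own code with no value.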


The structure of the lexical analyzer

  • Input buffer: stores the source program
  • Preprocessing routine: strips comments and removes useless blanks, tabs, line feeds, carriage returns, etc.
  • Scan buffer (what lexical analysis actually works on): a fixed-length string is moved from the input buffer into another buffer (the scan buffer), where the lexical analyzer can perform symbol recognition directly
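The preprocessing and buffering steps can be sketched roughly as follows. This is only an illustration: the `//` line-comment syntax, the buffer size of 16, and the names `preprocess` and `scan_buffers` are assumptions, and string literals containing `//` are not handled.

```python
import re


def preprocess(source):
    """Strip comments and collapse blanks, tabs and line breaks."""
    no_comments = re.sub(r"//[^\n]*", "", source)    # assumed comment syntax
    return re.sub(r"\s+", " ", no_comments).strip()  # one blank per whitespace run


def scan_buffers(text, size=16):
    """Yield fixed-length chunks, as if refilling the scan buffer."""
    for start in range(0, len(text), size):
        yield text[start:start + size]
```

Calling `preprocess` on a line such as `x = 1;  // set x` leaves only `x = 1;`, and `scan_buffers` then hands the cleaned text to the analyzer one fixed-length piece at a time.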

Lexical analysis technique - lookahead: to determine the category of a word symbol, it is sometimes necessary to scan one or more characters beyond it.
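A small sketch of why scanning ahead is needed: after reading `<`, the analyzer cannot tell whether the word is `<`, `<=` or `<>` until it has looked at the next character. The operator set and the function name `scan_relop` are assumptions chosen for illustration.

```python
def scan_relop(text, pos):
    """Recognize '<', '<=' or '<>' starting at text[pos].

    Returns (lexeme, next_pos). One extra character is inspected to
    decide which operator was meant.
    """
    if pos >= len(text) or text[pos] != "<":
        raise ValueError("expected '<'")
    nxt = text[pos + 1] if pos + 1 < len(text) else ""
    if nxt in ("=", ">"):
        return ("<" + nxt, pos + 2)   # two-character operator
    return ("<", pos + 1)             # the extra character is not consumed
```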


state transition diagram

Definition: a finite directed graph. Nodes (drawn as circles) represent states, and directed edges connect the nodes; the characters marked on an edge are the characters that may be accepted or recognized in the state the edge leaves. There is a unique initial state and several final states.

A final state marked with * indicates that the last character read does not belong to the recognized word, so that character must be returned to the input.

Recognize word symbols with a state transition diagram:

1) Start from the initial state;
2) Read a character from the input string;
3) Compare the character read with the labels on the arcs leaving the current state, and move to the state pointed to by the matching arc;
4) Repeat 2) and 3); recognition fails when no arc matches, and a word symbol is recognized when a final state is reached.
  • How can basic words/reserved words be distinguished, given that they also match the pattern for identifiers?
  • Either pre-load the reserved words into the symbol table and mark them as not being identifiers, or create separate state transition diagrams for the reserved words
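The recognition procedure can be sketched as a table-driven loop. The diagram below is a hypothetical one covering only identifiers and unsigned integers; the state numbers, the reserved-word set and the name `next_token` are assumptions. Reserved words are handled with the first option: they are recognized as identifiers and then looked up in a pre-loaded table.

```python
RESERVED = {"if", "then", "else"}   # hypothetical pre-loaded reserved words


def char_class(ch):
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# (state, character class) -> next state. State 0 is the initial state.
# States 2 and 4 are final states marked with *: the last character read
# is not part of the word and must be returned.
TRANSITIONS = {
    (0, "letter"): 1,
    (1, "letter"): 1, (1, "digit"): 1, (1, "other"): 2,
    (0, "digit"): 3,
    (3, "digit"): 3, (3, "other"): 4,
}
FINAL = {2: "IDENT", 4: "INT"}


def next_token(text, pos):
    """Return (kind, lexeme, next_pos) for the word starting at text[pos]."""
    state, start = 0, pos
    while state not in FINAL:
        ch = text[pos] if pos < len(text) else "\0"   # end of input acts as "other"
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            raise ValueError(f"no arc matches {ch!r} at position {pos}")
        pos += 1
    pos -= 1                                          # starred state: return one character
    lexeme = text[start:pos]
    kind = FINAL[state]
    if kind == "IDENT" and lexeme in RESERVED:
        kind = "RESERVED"
    return kind, lexeme, pos
```

On the input `if x1`, a call at position 0 recognizes `if` as a reserved word and stops at position 2; a second call at position 3 recognizes the identifier `x1`.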

Construction of State Transition Diagrams


Lexical Analyzer Design

basic structure

content

  • word
  • Word list
  • state transition diagram 
  • matching algorithm

Symbol table

Purpose

In a program, the user defines many names (identifiers) to represent different data objects, and the compiler saves these names in the symbol table.

composition

In addition to recording the name itself, the symbol table also records various attribute information associated with the name.

role in lexical analysis

  • Create the symbol table, then look up and fill in entries
  • Fill the symbol table with the attributes of each distinct identifier, numeric constant, and character constant
  • Write the symbol-table entry address of each variable/constant into its word (token)
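A rough sketch of these three duties, assuming a list-backed table in which the "entry address" is simply the list index. The class name `SymbolTable` and the helpers `lookup_or_insert` and `make_token` are hypothetical.

```python
class SymbolTable:
    def __init__(self):
        self.entries = []   # each entry: {"name": ..., "info": {...}}
        self.index = {}     # (kind, name) -> position in entries

    def lookup_or_insert(self, kind, name):
        """Look the name up; fill in a new entry only if it is not there yet."""
        key = (kind, name)
        if key not in self.index:
            self.index[key] = len(self.entries)
            self.entries.append({"name": name, "info": {"kind": kind}})
        return self.index[key]


def make_token(table, kind, lexeme):
    """Build a token whose value is the symbol-table entry address."""
    return (kind, table.lookup_or_insert(kind, lexeme))
```

Because the lookup happens before any insertion, a name that appears many times in the source still occupies a single entry, and every token for it carries the same address.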

The general form of a symbol table

Each name corresponds to an entry, and an entry includes a name field and an information field

The information field has several subfields and flags, and the content is related to the name

Commonly used symbol table structure

linear table

Use N arrays to store N subfields of the symbol table

Hash table
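The two structures can be contrasted with a short sketch. The subfields `kind` and `type_`, the bucket count, and both class names are assumptions; the hash table here resolves collisions by chaining, which is only one of several possible schemes.

```python
class LinearSymbolTable:
    """N parallel arrays, one per subfield; lookup is a linear scan."""

    def __init__(self):
        self.names, self.kinds, self.types = [], [], []

    def insert(self, name, kind, type_):
        self.names.append(name)
        self.kinds.append(kind)
        self.types.append(type_)
        return len(self.names) - 1            # entry address

    def lookup(self, name):
        for i, existing in enumerate(self.names):
            if existing == name:
                return i
        return -1                             # not found


class HashSymbolTable:
    """Fixed number of buckets; each bucket is a chain of entries."""

    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, name):
        return self.buckets[hash(name) % len(self.buckets)]

    def insert(self, name, kind, type_):
        self._bucket(name).append((name, kind, type_))

    def lookup(self, name):
        for entry in self._bucket(name):      # only one chain is searched
            if entry[0] == name:
                return entry
        return None
```

The linear table is simple but its lookup cost grows with the number of names, whereas the hash table only searches the one chain the name hashes to.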

 

Summary and Supplement 

 

Why Separate Lexical and Syntax Analysis

  • Simplify the design of the compiler
  • Improve compiler efficiency
  • Enhance compiler portability

chapter summary

 

 


Origin blog.csdn.net/m0_59180666/article/details/130907725