Compiler Principles - 2.2 Lexical Analysis (chapter: A Simple Syntax-Directed Translator)

 Lexical analysis:

  • Main function: the lexical analyzer reads characters from the input and groups them into token objects.
  • The main steps:
    • Reading ahead: a variable peek holds the next input character. While collecting the digits of a number or the characters of an identifier, the lexical analyzer must read one character past the lexeme to know where it ends. If the current character can be recognized without reading ahead (for example, a single-character operator), peek is set to a blank, which is skipped on the next call. Invariant: when the lexical analyzer returns a token, peek holds either the character just after the lexeme of that token or a blank.
    • Removing white space and comments:
      •  The pseudocode in Figure 2-29 skips whitespace by reading input characters for as long as it sees a blank, a tab, or a newline. The variable peek holds the next input character. Line numbers and context in an error message help locate the error, so the code uses the variable line to count the newline characters in the input.
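The whitespace-skipping loop described above can be sketched in Python. This is only an illustration under assumptions: the class name `Scanner`, the helper `read`, and the use of an empty string to signal end of input are not from the notes; only `peek` and `line` follow the description.

```python
import io


class Scanner:
    """Hypothetical holder for the lookahead character and the line counter."""

    def __init__(self, text):
        self.src = io.StringIO(text)
        self.peek = " "  # a blank means "no useful character buffered yet"
        self.line = 1    # line numbers start at 1

    def read(self):
        # Assumption: an empty string marks the end of the input.
        self.peek = self.src.read(1)

    def skip_whitespace(self):
        # Check peek first, then advance: blanks and tabs are ignored,
        # newlines are counted, anything else ends the loop.
        while True:
            if self.peek in (" ", "\t"):
                pass
            elif self.peek == "\n":
                self.line += 1
            else:
                break
            self.read()
```

After `skip_whitespace()` returns, `peek` holds the first character that is not a blank, tab, or newline, and `line` reflects every newline passed on the way.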

    • Identifying and calculating constants:
      •  The pseudocode in Figure 2-30 reads the digits of an integer one at a time and accumulates their value in the variable v, so that v holds the value of the integer once the last digit has been read.
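A minimal sketch of the digit accumulation, assuming the input is available as a string and the current position is an index into it. The function name `read_integer` and its signature are hypothetical; only the variable `v` comes from the description.

```python
def read_integer(text, pos):
    """Accumulate the decimal digits starting at text[pos].

    Returns (v, next_pos), where next_pos is the index of the first
    character that is not a digit. Hypothetical helper for illustration.
    """
    v = 0
    while pos < len(text) and "0" <= text[pos] <= "9":
        # Shift the digits read so far one decimal place left, add the new one.
        v = 10 * v + (ord(text[pos]) - ord("0"))
        pos += 1
    return v, pos
```

For the input `31+28`, calling the function at index 0 yields the value 31 and stops at the `+`, which is exactly the one-character lookahead that tells the analyzer the number has ended.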

    • Identify keywords and identifiers:
      •  The pseudocode in Figure 2-31 uses the get operation to look up reserved words. It reads from the input a string s that begins with a letter and consists of letters and digits. We assume s is made as long as possible: the lexical analyzer keeps reading characters for as long as it encounters letters or digits. When it encounters anything else, such as whitespace, the lexeme read so far is copied into a buffer b. If the string table words already has an entry for s, the token retrieved by words.get is returned. Here s may be a keyword that was placed in the words table when the table was initialized, or an identifier that was entered into the table earlier. If no entry for s exists, a token made up of the terminal id and the attribute value s is created, entered into the string table, and returned.
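The lookup can be sketched with a Python dictionary standing in for the string table. The function name `read_word`, the tuple representation of tokens, and the tag strings are assumptions made for this illustration; `str.isalnum` also accepts non-ASCII letters and digits, which may be broader than intended.

```python
def read_word(text, pos, words):
    """Read the longest run of letters/digits starting at text[pos].

    Assumes text[pos] is a letter. `words` is the string table, mapping a
    lexeme to its token. Tokens are (tag, lexeme) tuples here, which is a
    simplification for this sketch. Returns (token, next_pos).
    """
    end = pos
    while end < len(text) and text[end].isalnum():
        end += 1                 # keep reading while letters or digits follow
    s = text[pos:end]            # the lexeme collected so far
    token = words.get(s)
    if token is None:
        token = ("id", s)        # not seen before: a new identifier
        words[s] = token         # remember it for later occurrences
    return token, end


# The table is seeded with the reserved words before scanning starts.
words = {"true": ("true", "true"), "false": ("false", "false")}
```

Because keywords are entered into the table up front, the same lookup handles both cases: a hit returns either a keyword or a previously seen identifier, and a miss creates a new identifier entry.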

  • Lexical analyzer:
    • Main process:
      • The function scan returns token objects.
    • Individual objects:
      • Class Token:
        •  The Token class has a field tag that is used for parsing decisions.
      • Subclasses Num and Word:
        • Subclass Num adds a field value that holds an integer value.

        •  Subclass Word adds a field lexeme that holds the lexeme of a keyword or identifier.
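The class hierarchy described above might look like this in Python. The `Tag` constants and their numeric values are assumptions for illustration (chosen above 255 so they cannot collide with single-character codes); only the class names and the fields `tag`, `value`, and `lexeme` come from the notes.

```python
class Tag:
    # Hypothetical tag values; any distinct constants would do.
    NUM, ID, TRUE, FALSE = 256, 257, 258, 259


class Token:
    """Base class: the parser decides what to do by looking at tag."""

    def __init__(self, tag):
        self.tag = tag


class Num(Token):
    """An integer constant; value holds the number itself."""

    def __init__(self, value):
        super().__init__(Tag.NUM)
        self.value = value


class Word(Token):
    """A keyword or identifier; lexeme holds its spelling."""

    def __init__(self, tag, lexeme):
        super().__init__(tag)
        self.lexeme = lexeme
```

Keeping `tag` in the base class lets the parser treat every token uniformly, while the subclasses carry the extra attribute that only numbers or words need.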

    • Lexer:
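As a rough illustration of how the steps fit together, here is a compact Python sketch of a lexer with a `scan` method. It is not the book's code: the dataclass representation, the tag constants, the use of the character itself as the tag of a single-character token, and returning `None` at end of input are all assumptions.

```python
import io
from dataclasses import dataclass

NUM, ID, TRUE, FALSE = 256, 257, 258, 259  # hypothetical tag values


@dataclass(frozen=True)
class Token:
    tag: object          # an int tag, or the character itself for operators


@dataclass(frozen=True)
class Num(Token):
    value: int


@dataclass(frozen=True)
class Word(Token):
    lexeme: str


class Lexer:
    def __init__(self, text):
        self.src = io.StringIO(text)
        self.peek = " "  # lookahead character; blank means "nothing buffered"
        self.line = 1
        self.words = {}  # string table: lexeme -> Word
        for w in (Word(TRUE, "true"), Word(FALSE, "false")):
            self.words[w.lexeme] = w  # reserve the keywords

    def read(self):
        self.peek = self.src.read(1)  # "" at end of input (assumption)

    def scan(self):
        # Step 1: skip whitespace, counting newlines.
        while True:
            if self.peek in (" ", "\t"):
                pass
            elif self.peek == "\n":
                self.line += 1
            else:
                break
            self.read()
        if self.peek == "":
            return None  # assumption: None signals end of input
        # Step 2: integer constants.
        if "0" <= self.peek <= "9":
            v = 0
            while "0" <= self.peek <= "9":
                v = 10 * v + int(self.peek)
                self.read()
            return Num(NUM, v)
        # Step 3: keywords and identifiers.
        if self.peek.isalpha():
            buf = []
            while self.peek.isalnum():
                buf.append(self.peek)
                self.read()
            s = "".join(buf)
            w = self.words.get(s)
            if w is None:
                w = Word(ID, s)
                self.words[s] = w
            return w
        # Step 4: any other character is a token by itself; no lookahead
        # was needed, so peek is reset to a blank.
        t = Token(self.peek)
        self.peek = " "
        return t
```

Each call to `scan` leaves `peek` holding either the character just past the returned lexeme or a blank, which is the invariant stated for the lookahead variable.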

 

Reference: Compilers: Principles, Techniques, and Tools (2nd edition)

Origin www.cnblogs.com/fangzhiyou/p/12416860.html