Design and implementation of lexical, syntax, and semantic analysis programs, including error reporting and error recovery

Lexical description

(1) Keywords
main, int, char, if, else, for, while, void

(2) Operators
= + - * / < <= > >= == !=

(3) Delimiters
; ( ) { }

(4) Identifiers
ID = letter(letter|digit)*

(5) Integer constants
NUM = digit digit*

(6) Whitespace
' ' '\n' '\r' '\t'
Whitespace separates IDs, NUMs, operators, delimiters, and keywords.
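
The lexical rules above can be written down almost directly as regular expressions. The following is a minimal sketch, not the article's implementation: the names `TOKEN_RE`, `classify`, and the kind labels (`KEYWORD`, `OP`, `DELIM`, and so on) are assumptions chosen for illustration.

```python
import re

KEYWORDS = {"main", "int", "char", "if", "else", "for", "while", "void"}

# Two-character operators are listed first so that "<=" is not split into "<" and "=".
TOKEN_RE = re.compile(r"""
    (?P<ID>[A-Za-z][A-Za-z0-9]*)
  | (?P<NUM>[0-9]+)
  | (?P<OP><=|>=|==|!=|[=+\-*/<>])
  | (?P<DELIM>[;(){}])
  | (?P<WS>[ \n\r\t]+)
""", re.VERBOSE)

def classify(text):
    """Return (kind, lexeme) pairs; raise ValueError on an illegal character."""
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"illegal character {text[pos]!r} at offset {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"            # keywords share the identifier pattern
        if kind != "WS":                # whitespace only separates tokens
            out.append((kind, lexeme))
        pos = m.end()
    return out

print(classify("while (a1 <= 10)"))
```

Keywords are matched with the identifier rule and reclassified afterwards, which keeps the pattern small and avoids treating a prefix such as `if` in `iffy` as a keyword.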

Context-free grammar description

<program> ::= main()<statement block>
<statement block> ::= '{'<statement string>'}'
<statement string> ::= <statement>{;<statement>};
<statement> ::= <assignment statement> | <conditional statement> | <loop statement>
<assignment statement> ::= ID=<expression>
<conditional statement> ::= if(<condition>) <statement block>
<loop statement> ::= while(<condition>) <statement block>
<condition> ::= <expression><relational operator><expression>
<factor> ::= ID | NUM | (<expression>)
<term> ::= <factor>{*<factor> | /<factor>}
<expression> ::= <term>{+<term> | -<term>}
<relational operator> ::= < | <= | > | >= | == | !=

Word category coding scheme

[Figure: word category code table]

Main algorithmic ideas of the lexical analysis program

  1. Assign a category code to each class of word.
  2. Write the lexical rules as regular expressions.
  3. From the regular expressions, construct a state transition diagram that recognizes the words of the language.
  4. Let each state in the state transition diagram correspond to a small piece of program, and assemble each word symbol according to the category of the first character scanned.
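
The steps above can be sketched as a hand-written scanner in which each branch plays the role of one state of the transition diagram. This is an illustration under stated assumptions: the category codes in `CODE`, `ID_CODE`, and `NUM_CODE` are hypothetical (the original code table is not available here), and skipping an illegal character is only one possible error recovery strategy.

```python
import string

KEYWORDS = ("main", "int", "char", "if", "else", "for", "while", "void")
SINGLE = "=+-*/<>;(){}"
DOUBLE = ("<=", ">=", "==", "!=")

# Hypothetical category codes; substitute the real coding scheme here.
ID_CODE, NUM_CODE = 10, 20
CODE = {word: n for n, word in enumerate(KEYWORDS, start=1)}
CODE.update({sym: n for n, sym in enumerate(list(SINGLE) + list(DOUBLE), start=21)})

def scan(src):
    """Return (tokens, errors); each token is a (code, lexeme) pair."""
    tokens, errors = [], []
    i, line = 0, 1
    while i < len(src):
        ch = src[i]
        if ch in " \n\r\t":                          # whitespace state
            if ch == "\n":
                line += 1
            i += 1
        elif ch in string.ascii_letters:             # identifier or keyword state
            j = i + 1
            while j < len(src) and src[j] in string.ascii_letters + string.digits:
                j += 1
            word = src[i:j]
            tokens.append((CODE[word] if word in KEYWORDS else ID_CODE, word))
            i = j
        elif ch in string.digits:                    # integer constant state
            j = i + 1
            while j < len(src) and src[j] in string.digits:
                j += 1
            tokens.append((NUM_CODE, src[i:j]))
            i = j
        elif src[i:i + 2] in DOUBLE:                 # two-character operator
            tokens.append((CODE[src[i:i + 2]], src[i:i + 2]))
            i += 2
        elif ch in SINGLE:                           # one-character operator or delimiter
            tokens.append((CODE[ch], ch))
            i += 1
        else:                                        # error state
            errors.append(f"line {line}: illegal character {ch!r}")
            i += 1                                   # recovery: skip it and keep scanning
    return tokens, errors

tokens, errors = scan("main(){\n a1 = 3 @ <= 4;\n}")
print(tokens)
print(errors)
```

Because the scanner records the error and continues, a single run can report every illegal character in the source instead of stopping at the first one.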

Detailed description of the algorithm for syntax analysis

Recursive descent parsing

  1. Write a context-free grammar from the rules of the programming language.
  2. Construct an analysis function for each non-terminal symbol. The lookahead symbol guides which function to call, and each function recognizes the grammatical component that its non-terminal represents. When a non-terminal has several rules, the LL(1) conditions guarantee that exactly one candidate rule can be selected.
    ① On a terminal symbol a: if (current input symbol == a) read the next input symbol; otherwise error()
    ② On a non-terminal symbol A: call A();
    ③ On a rule A → ε: if (current input symbol ∈ FOLLOW(A)) { } else error()
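
As an illustration of these rules, here is a minimal recursive descent parser for the grammar given earlier. It is a sketch under assumptions: the input is taken to be already reduced by the lexer to whitespace-separated terminal symbols (`ID`, `NUM`, or the symbol itself), `$` is a made-up end marker, and the panic-mode recovery (skip to the next `;` or `}`) is one common choice rather than the article's confirmed method. All class and function names are hypothetical.

```python
RELOPS = ("<", "<=", ">", ">=", "==", "!=")

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, src):
        self.toks, self.pos, self.errors = src.split() + ["$"], 0, []

    @property
    def look(self):                           # current input symbol
        return self.toks[self.pos]

    def fail(self, expected):
        raise ParseError(f"token {self.pos}: expected {expected}, found {self.look!r}")

    def match(self, terminal):                # case 1: terminal symbol
        if self.look != terminal:
            self.fail(repr(terminal))
        self.pos += 1

    def program(self):                        # main()<statement block>
        for terminal in ("main", "(", ")"):
            self.match(terminal)
        self.block()                          # case 2: non-terminal symbol
        self.match("$")

    def block(self):
        self.match("{")
        self.statements()
        self.match("}")

    def statements(self):                     # <statement>{;<statement>};
        while True:
            try:
                self.statement()
                self.match(";")
            except ParseError as err:         # panic-mode recovery
                self.errors.append(str(err))
                while self.look not in (";", "}", "$"):
                    self.pos += 1
                if self.look == ";":
                    self.pos += 1
            if self.look not in ("ID", "if", "while"):
                break                         # lookahead cannot start a statement

    def statement(self):
        if self.look == "ID":                 # assignment statement
            self.match("ID")
            self.match("=")
            self.expression()
        elif self.look in ("if", "while"):    # conditional or loop statement
            self.pos += 1
            self.match("(")
            self.condition()
            self.match(")")
            self.block()
        else:
            self.fail("a statement")

    def condition(self):
        self.expression()
        if self.look not in RELOPS:
            self.fail("a relational operator")
        self.pos += 1
        self.expression()

    def expression(self):
        self.term()
        while self.look in ("+", "-"):
            self.pos += 1
            self.term()

    def term(self):
        self.factor()
        while self.look in ("*", "/"):
            self.pos += 1
            self.factor()

    def factor(self):
        if self.look in ("ID", "NUM"):
            self.pos += 1
        elif self.look == "(":
            self.pos += 1
            self.expression()
            self.match(")")
        else:
            self.fail("ID, NUM or '('")

def parse(src):
    """Return the list of syntax error messages (empty if src is well formed)."""
    parser = Parser(src)
    try:
        parser.program()
    except ParseError as err:
        parser.errors.append(str(err))
    return parser.errors

print(parse("main ( ) { ID = ; ID = NUM ; }"))
```

The grammar has no explicit ε rule, but each `while self.look in (...)` loop makes the same decision: it continues only when the lookahead can start another repetition and otherwise returns to the caller.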

Detailed description of the algorithm for semantic analysis

Recursive descent syntax-directed translation

Embed semantic subroutines in each recursive procedure, and pass semantic information through the local variables and parameters of the recursive subroutines.

  1. Construct a function for each non-terminal A; the function's return value is the synthesized attribute of A. For every attribute of every symbol X that appears in a production of A, declare a local variable in the function.
  2. Inside the function for A, the current input symbol determines which candidate production to use.
  3. The code for each production handles its terminals b, non-terminals B, and semantic actions {} in order from left to right:
    (1) For a terminal b with synthesized attribute x, store the value of x in the corresponding variable, then generate a call that matches b and reads the next input symbol.
    (2) For each non-terminal B, generate an assignment statement c := B() with a function call on the right, where c is the variable that holds the synthesized attribute of B.
    (3) For a semantic action {}, copy the action code into the analyzer and replace each reference to an attribute with the variable that represents it.

Assignment statement translation

Add assignment-statement semantic routines to the expression analysis function and the term analysis function so that a quadruple is generated for each subexpression and term. In the expression analysis function, as long as the next symbol read is '+' or '-', a temporary variable is created to hold the result and another quadruple is generated. In the term analysis function, as long as the next symbol read is '*' or '/', quadruples continue to be generated in the same way as in the expression analysis function.
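
A minimal sketch of this scheme follows. Each analysis function returns the name that holds its value (its synthesized attribute), and a quadruple is written as `(op, arg1, arg2, result)`. The class `Translator`, the temporary names `T1`, `T2`, ..., and the `_` placeholder for an unused field are assumptions made for illustration.

```python
import re

class Translator:
    def __init__(self, src):
        # Crude tokenizer: identifiers, integers, and single-character symbols.
        self.toks = re.findall(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|\S", src) + ["$"]
        self.pos, self.ntemp, self.quads = 0, 0, []

    @property
    def look(self):
        return self.toks[self.pos]

    def advance(self):
        tok = self.look
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.look != tok:
            raise ValueError(f"expected {tok!r}, found {self.look!r}")
        self.pos += 1

    def new_temp(self):
        self.ntemp += 1
        return f"T{self.ntemp}"

    def assignment(self):                     # ID = <expression>
        if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", self.look):
            raise ValueError(f"expected an identifier, found {self.look!r}")
        target = self.advance()
        self.expect("=")
        place = self.expression()
        self.quads.append(("=", place, "_", target))

    def expression(self):                     # <term>{+<term> | -<term>}
        left = self.term()
        while self.look in ("+", "-"):
            op, right = self.advance(), self.term()
            temp = self.new_temp()            # temporary holds the partial result
            self.quads.append((op, left, right, temp))
            left = temp
        return left

    def term(self):                           # <factor>{*<factor> | /<factor>}
        left = self.factor()
        while self.look in ("*", "/"):
            op, right = self.advance(), self.factor()
            temp = self.new_temp()
            self.quads.append((op, left, right, temp))
            left = temp
        return left

    def factor(self):                         # ID | NUM | (<expression>)
        if self.look == "(":
            self.advance()
            place = self.expression()
            self.expect(")")
            return place
        if re.fullmatch(r"[A-Za-z][A-Za-z0-9]*|[0-9]+", self.look):
            return self.advance()
        raise ValueError(f"expected ID, NUM or '(', found {self.look!r}")

t = Translator("x = a + b * 2 - c")
t.assignment()
for quad in t.quads:
    print(quad)
```

For `x = a + b * 2 - c` the multiplication is emitted first because `term` finishes before `expression` sees the `+`, so operator precedence falls out of the grammar's structure rather than from an explicit precedence table.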

if statement translation

Add if-statement translation to the conditional statement analysis function. Two quadruples are generated for the if statement: a jump taken when the condition is true and a jump taken when the condition is false. The variables ntc, nfc, and nNXQ record the true chain, the false chain, and the position of the next quadruple to be generated. First, the bp() function backpatches nNXQ into the quadruple that ntc points to. Then, when syntax analysis of the if statement finishes, the end position of the if statement is backpatched into the quadruple that nfc points to.
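
The backpatching order can be sketched as follows. The names `ntc`, `nfc`, and `bp` come from the text; everything else (`quads`, `nxq`, `emit`, `gen_if`, the `j<` style jump opcodes, and using `None` for an unfilled target) is an assumption for illustration. The statement block is passed in as a callback so that the sketch does not need a parser.

```python
quads = []                      # each quadruple: [op, arg1, arg2, result]

def nxq():
    """Index of the next quadruple to be generated (the role of nNXQ)."""
    return len(quads)

def emit(op, arg1, arg2, result):
    quads.append([op, arg1, arg2, result])
    return len(quads) - 1       # index of the quadruple just generated

def bp(chain, target):
    """Backpatch target into every quadruple on the chain that starts at chain."""
    while chain is not None:
        nxt = quads[chain][3]   # unfilled result fields link the chain; None ends it
        quads[chain][3] = target
        chain = nxt

def gen_if(left, relop, right, gen_block):
    ntc = emit("j" + relop, left, right, None)   # taken when the condition is true
    nfc = emit("j", "_", "_", None)              # taken when the condition is false
    bp(ntc, nxq())              # true exit: first quadruple of the statement block
    gen_block()
    bp(nfc, nxq())              # false exit: first quadruple after the if statement

# if (a < b) { x = 1; }
gen_if("a", "<", "b", lambda: emit("=", "1", "_", "x"))
for index, quad in enumerate(quads):
    print(index, quad)
```

The false jump ends up targeting index 3, which does not exist yet: it is the position of whatever quadruple is generated after the if statement.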

while statement translation

Translation of the while statement is similar to that of the if statement, but the processing after syntax analysis of the statement differs. Because the loop keeps executing as long as the condition remains true, an extra unconditional jump quadruple must be generated at the end of the statement block. It jumps back to the first quadruple of the while condition so that the condition is evaluated again.
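
A minimal sketch of the while translation, assuming quadruples of the form `[op, arg1, arg2, result]` with jump targets stored in the result field. The helper names (`emit`, `gen_while`, `body`) and the temporary `T1` are hypothetical, and the loop body is supplied as a callback instead of being parsed.

```python
quads = []                                       # each quadruple: [op, arg1, arg2, result]

def emit(op, arg1, arg2, result):
    quads.append([op, arg1, arg2, result])
    return len(quads) - 1                        # index of the quadruple just generated

def gen_while(left, relop, right, gen_block):
    start = len(quads)                           # first quadruple of the condition
    ntc = emit("j" + relop, left, right, None)   # taken when the condition is true
    nfc = emit("j", "_", "_", None)              # taken when the condition is false
    quads[ntc][3] = len(quads)                   # true exit: enter the statement block
    gen_block()
    emit("j", "_", "_", start)                   # extra jump: test the condition again
    quads[nfc][3] = len(quads)                   # false exit: first quadruple after the loop

def body():                                      # a = a + 1
    emit("+", "a", "1", "T1")
    emit("=", "T1", "_", "a")

# while (a < 10) { a = a + 1; }
gen_while("a", "<", "10", body)
for index, quad in enumerate(quads):
    print(index, quad)
```

The only structural difference from an if statement is the jump at index 4 back to index 0; without it the block would run at most once.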

Origin blog.csdn.net/weixin_45177370/article/details/135232908