[Beginner] Building a Compiler, Part 3: A Simple Formula Calculator (Addition and Multiplication) (Java Implementation)

In the first article of this series, on what compiler front-end technology is, we learned that the result of syntax analysis is an AST. Here we deepen our understanding of how the AST is generated by implementing a simple formula calculator. This article focuses on the recursive descent algorithm and context-free grammars. Only addition and multiplication are explained (subtraction and division work on the same principle and are not discussed again here).


Principles explained

Variable declaration statement

Let's start with the variable declaration statement to understand what "descent" means.

As mentioned earlier, for the declaration "int age = 45" we build the AST shown below.

In the grammar rule for a variable declaration, the left-hand side is a non-terminal and the right-hand side is its production rule. During parsing, the left-hand side is replaced by the right-hand side. If the result still contains non-terminals, the replacement continues until only terminals, that is, Tokens, remain. Only terminals can become leaf nodes of the AST.

To declare an int variable, we need a Token of type Int, then a variable identifier, then an optional assignment expression. The pseudo-code for matching an int variable declaration is shown below.

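// Pseudo-code
MatchIntDeclare(){
  MatchToken(Int);        // match the Int keyword
  MatchIdentifier();       // match the identifier
  MatchToken(equal);       // match the equals sign (the initializer part is optional)
  MatchExpression();       // match the expression
}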

The Java implementation is shown below. It works on the Token stream through two operations: peek (look ahead without consuming) and read (consume the Token).


SimpleASTNode node = null;
Token token = tokens.peek();    // look ahead
if (token != null && token.getType() == TokenType.Int) {   // match Int
    tokens.read();              // consume "int"
    token = tokens.peek();      // look ahead; may be null at the end of the stream
    if (token != null && token.getType() == TokenType.Identifier) { // match the identifier
        token = tokens.read();  // consume the identifier
        // Create the current node and store the variable name as the node's text.
        // Creating a separate child node for the variable would also work.
        node = new SimpleASTNode(ASTNodeType.IntDeclaration, token.getText());
        token = tokens.peek();  // look ahead
        if (token != null && token.getType() == TokenType.Assignment) {
            tokens.read();      // consume the equals sign
            SimpleASTNode child = additive(tokens);  // match an expression
            if (child == null) {
                throw new Exception("invalid variable initialization, expecting an expression");
            }
            else{
                node.addChild(child);
            }
        }
    } else {
        throw new Exception("variable name expected");
    }
}

The whole matching process for the declaration is as follows:

When parsing a variable declaration statement, we first check whether the Token is int. If it is, we create an AST node and record the variable name that follows int. We then check whether an initialization part follows, that is, an equals sign plus an expression. If there is an equals sign, we go on to match an expression.
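The peek and read operations that this matching process relies on can be sketched as follows. This is only a minimal illustration, not the author's implementation: DeclKind, SimpleToken, ListTokenReader and matchIntDeclare are hypothetical names introduced here, the tokens are built by hand instead of coming from a lexer, and a single integer literal stands in for the initializer expression.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical token kinds, for this sketch only.
enum DeclKind { INT, IDENTIFIER, ASSIGNMENT, INT_LITERAL }

// Hypothetical token: a kind plus its source text.
final class SimpleToken {
    final DeclKind kind;
    final String text;
    SimpleToken(DeclKind kind, String text) { this.kind = kind; this.text = text; }
}

// Hypothetical token stream: peek() looks ahead, read() consumes.
final class ListTokenReader {
    private final List<SimpleToken> tokens;
    private int pos = 0;
    ListTokenReader(List<SimpleToken> tokens) { this.tokens = tokens; }

    // Return the next token without consuming it, or null at the end.
    SimpleToken peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    // Return the next token and advance, or null at the end.
    SimpleToken read() { return pos < tokens.size() ? tokens.get(pos++) : null; }
}

final class IntDeclareDemo {
    // Match "int <identifier> [= <int literal>]" and describe the result.
    // Returns null when the stream does not start with the int keyword.
    static String matchIntDeclare(ListTokenReader tokens) {
        SimpleToken token = tokens.peek();
        if (token == null || token.kind != DeclKind.INT) {
            return null;                      // not a declaration; nothing consumed
        }
        tokens.read();                        // consume "int"
        token = tokens.peek();
        if (token == null || token.kind != DeclKind.IDENTIFIER) {
            throw new IllegalStateException("variable name expected");
        }
        String name = tokens.read().text;     // consume the identifier
        token = tokens.peek();
        if (token == null || token.kind != DeclKind.ASSIGNMENT) {
            return "IntDeclaration " + name;  // no initializer present
        }
        tokens.read();                        // consume "="
        token = tokens.read();                // simplification: a literal stands in for the expression
        if (token == null || token.kind != DeclKind.INT_LITERAL) {
            throw new IllegalStateException("expecting an expression");
        }
        return "IntDeclaration " + name + " = " + token.text;
    }

    public static void main(String[] args) {
        ListTokenReader tokens = new ListTokenReader(Arrays.asList(
                new SimpleToken(DeclKind.INT, "int"),
                new SimpleToken(DeclKind.IDENTIFIER, "age"),
                new SimpleToken(DeclKind.ASSIGNMENT, "="),
                new SimpleToken(DeclKind.INT_LITERAL, "45")));
        System.out.println(matchIntDeclare(tokens)); // prints "IntDeclaration age = 45"
    }
}
```

Checking with peek before calling read is what lets the matcher return without consuming anything when the input is not a declaration.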

The so-called "descent" means that a higher-level algorithm calls lower-level algorithms. In terms of AST generation, the higher-level algorithm produces the upper nodes and the lower-level algorithms produce the nodes beneath them.

To make this more intuitive, we can again use the visualization site from before to see what AST nodes a declaration statement actually produces.

For "int age = 45", the following nodes are generated.

Arithmetic expression

In fact, the grammar of the variable declaration statement above does not go beyond a regular grammar; a regular grammar can handle it completely. The arithmetic expressions that come next, however, cannot be described directly with a regular grammar and require a context-free grammar.

For arithmetic expressions (considering only addition and multiplication), defining the rules is more troublesome because there are too many possible combinations:

  • 4 + 6
  • 4 + 6 * 5
  • 6 * 5 + 4
  • 6 * 5
  • .......

In addition, because arithmetic operators have different precedence, a fixed regular grammar cannot handle them directly.

Solving the precedence problem

We first need to work out how to handle operator precedence in arithmetic expressions, that is, the rule "multiplication and division first, then addition and subtraction". Before solving this, we must be clear about how a result is computed from the AST.

Evaluating an AST: start from the root node, traverse depth-first, and return the values computed at the lower levels step by step.

In the figure below, the lowest (deepest) node is evaluated first: 3 × 5 = 15. The value 15 is then returned to the addition node above it, which computes 15 + 2 = 17.

So, to get precedence right, all we need to do is make the multiplication (division) node a child of the addition (subtraction) node! During the depth-first traversal, the multiplication is then computed first, and the traversal returns to the addition node to complete the addition.
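The effect of the tree shape on the result can be seen in a small sketch. It assumes a hypothetical CalcNode class (not the article's SimpleASTNode) and builds the trees for 2 + 3 * 5 by hand, once with the multiplication as a child of the addition and once the other way round.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal AST node for this sketch: an operator or an integer literal.
final class CalcNode {
    final String text;                       // "+", "*", or the digits of a literal
    final List<CalcNode> children = new ArrayList<>();

    CalcNode(String text, CalcNode... kids) {
        this.text = text;
        for (CalcNode kid : kids) {
            children.add(kid);
        }
    }

    // Depth-first evaluation: children are evaluated before their parent.
    int evaluate() {
        if (children.isEmpty()) {
            return Integer.parseInt(text);   // leaf: integer literal
        }
        int left = children.get(0).evaluate();
        int right = children.get(1).evaluate();
        return text.equals("+") ? left + right : left * right;
    }
}

final class PrecedenceDemo {
    public static void main(String[] args) {
        // 2 + 3 * 5 with the multiplication as a child of the addition
        CalcNode correct = new CalcNode("+",
                new CalcNode("2"),
                new CalcNode("*", new CalcNode("3"), new CalcNode("5")));
        // The wrong shape: the addition as a child of the multiplication
        CalcNode wrong = new CalcNode("*",
                new CalcNode("+", new CalcNode("2"), new CalcNode("3")),
                new CalcNode("5"));
        System.out.println(correct.evaluate()); // prints 17
        System.out.println(wrong.evaluate());   // prints 25
    }
}
```

The evaluator is identical in both cases; only the position of the multiplication node in the tree decides which operation runs first.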

Let's take a closer look at the following "nested" grammar. Because multiplication has higher precedence, the grammar for an additive expression nests the grammar for a multiplicative expression, and an additive expression can itself be seen as an additive expression plus a multiplicative expression. With this nesting, just two rules are enough to match every complex additive expression. You can think of it as recursion; readers who have studied algorithms may loosely compare the intermediate process to dynamic programming.

For example:

  • For the expression 2 × 3: additiveExpression only needs to invoke multiplicativeExpression.
  • For the more complex expression 2 + 3 × 5: the rule applied is additiveExpression Plus multiplicativeExpression.

Once this simple split is in place, any addition-and-multiplication expression, however long, can be broken down into these two forms. This is the charm of recursion.

additiveExpression
    :   multiplicativeExpression
    |   additiveExpression Plus multiplicativeExpression
    ;

multiplicativeExpression
    :   IntLiteral
    |   multiplicativeExpression Star IntLiteral
    ;

A visual representation is shown below.

With this in mind, when we parse an arithmetic expression we start by matching the addition rule. The addition rule in turn matches the multiplication rule nested inside it. Nesting the grammar rules in this way is what implements operator precedence. Note that the addition rule also refers to itself recursively.

A grammar like this cannot be written as a regular grammar. It is more general and more expressive than a regular grammar and is called a context-free grammar. Regular grammars are a subset of context-free grammars. The difference is that a context-free grammar allows rules to call each other recursively, while a regular grammar does not. "Context-free" means that the derivation rules of the grammar are the same in every situation. For example, you may use an arithmetic expression to initialize a variable in a declaration statement, and you may also use arithmetic expressions elsewhere. Wherever it appears, the grammar of an arithmetic expression is the same: addition and multiplication are allowed, and the precedence of the operators does not change.

TIPS: A context-free grammar is like a set of Russian dolls: every layer is handled in the same way. A long, complex expression can be split up until it consists only of simple multiplications and additions. The figure below shows a long arithmetic expression that chains several additions and multiplications. Broken down layer by layer, its lowest-level nodes are nothing more than simple binary additions and binary multiplications. (This is the idea I am trying to express.)

Solving the infinite loop caused by left recursion

So far we know that a context-free grammar, with its nested and recursive rules, can represent any arithmetic expression built from addition and multiplication. The idea is sound, but does that mean it can be implemented without trouble? Of course not!

Let's look at the recursion for a simple addition: 2 + 3.

additiveExpression
    :   IntLiteral
    |   additiveExpression Plus IntLiteral
    ;

Following the grammar above, let's analyze the matching process:

  • First check whether it is a literal; "2 + 3" is an addition expression, so it is not;
  • Then check whether it is an addition expression; it is, so we recurse;
    • Check again whether it is a literal; it is not;
    • Check again whether it is an addition expression; it is, so we recurse once more
      • .....
      • .....

As you can see, recursing this way never solves the problem: the addition rule simply keeps calling itself. This situation is called left recursion. The analysis above shows that the recursive descent algorithm cannot handle left recursion, which is its biggest weakness.
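The endless recursion can be reproduced with a naive translation of the left-recursive rule into code. The sketch below is an illustration under stated assumptions, not code from the article: LeftRecursionDemo, MAX_DEPTH and the list-of-strings token format are hypothetical, and the depth guard exists only so that the program stops with a message instead of overflowing the stack.

```java
import java.util.Arrays;
import java.util.List;

final class LeftRecursionDemo {
    // Hypothetical guard so the sketch stops instead of overflowing the stack.
    static final int MAX_DEPTH = 1000;

    // Naive translation of:
    //   additiveExpression : IntLiteral | additiveExpression Plus IntLiteral
    // pos is the index of the next unread token.
    static boolean additive(List<String> tokens, int pos, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException(
                    "still at token " + pos + " after " + depth + " nested calls");
        }
        // Alternative 1: the remaining input is exactly one integer literal.
        if (pos == tokens.size() - 1 && tokens.get(pos).matches("\\d+")) {
            return true;
        }
        // Alternative 2: additiveExpression Plus IntLiteral.
        // Its first symbol is the rule itself, so we recurse before
        // consuming any token: pos never advances.
        boolean left = additive(tokens, pos, depth + 1);
        // Matching Plus and IntLiteral would follow here, but for "2 + 3"
        // the call above never returns normally.
        return left;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("2", "+", "3");
        try {
            additive(tokens, 0, 0);
        } catch (IllegalStateException e) {
            // prints "Gave up: still at token 0 after 1001 nested calls"
            System.out.println("Gave up: " + e.getMessage());
        }
    }
}
```

Every nested call starts at the same token position, which is why no amount of recursion makes progress.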

How do we solve it? What if we move "additiveExpression" to after the plus sign? Let's give it a try.

In other words, the earlier grammar placed the recursive reference in front of the plus sign, which led to endless recursion. If we instead treat the plus sign as a delimiter that splits the expression in two, matching a multiplicative expression in front of the plus sign and recursing only after it, the problem is solved. Let's walk through the example 2 + 3 again:

  • Before the plus sign: call multiplicativeExpression
    • Found IntLiteral: 2
  • After the plus sign: call additiveExpression, which calls multiplicativeExpression
    • Found IntLiteral: 3

This neatly solves the infinite loop problem.

additiveExpression
    :   multiplicativeExpression
    |   multiplicativeExpression Plus additiveExpression
    ;
multiplicativeExpression
    :   IntLiteral
    |   multiplicativeExpression Star IntLiteral
    ;

We first try to match a multiplicative expression. If that fails, the current node cannot be an addition node, because both productions of the additive expression must begin with a multiplicative expression. In that case we simply return null, which tells the caller that the match did not succeed. If a multiplicative expression is matched, we then try to match the part to the right of the plus sign, that is, we recursively match an additive expression. If that succeeds, we construct an addition ASTNode and return it. (Source: the course "The Beauty of Compiler Principles")
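The matching procedure described above can be tried out in a self-contained sketch. Everything in it is an assumption made for illustration: RightRecursiveCalc and its tiny tokenizer are hypothetical, trees are returned as prefix-notation strings rather than AST nodes, and the multiplication rule is also written in the right-recursive style, which differs from the multiplicativeExpression rule as listed above.

```java
import java.util.ArrayList;
import java.util.List;

final class RightRecursiveCalc {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    // Hypothetical tokenizer: non-negative integers, '+', '*', and spaces only.
    RightRecursiveCalc(String source) {
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isDigit(c)) {
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) {
                    i++;
                }
                tokens.add(source.substring(start, i));
            } else if (c == '+' || c == '*') {
                tokens.add(String.valueOf(c));
                i++;
            } else if (c == ' ') {
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
    }

    // Look at the next token without consuming it; null at the end.
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    // additiveExpression
    //     : multiplicativeExpression
    //     | multiplicativeExpression Plus additiveExpression
    String additive() {
        String left = multiplicative();
        if ("+".equals(peek())) {
            pos++;                          // consume '+'
            String right = additive();      // recursion happens only after tokens were consumed
            return "(+ " + left + " " + right + ")";
        }
        return left;
    }

    // Assumed right-recursive form:
    // multiplicativeExpression : IntLiteral | IntLiteral Star multiplicativeExpression
    String multiplicative() {
        String left = intLiteral();
        if ("*".equals(peek())) {
            pos++;                          // consume '*'
            String right = multiplicative();
            return "(* " + left + " " + right + ")";
        }
        return left;
    }

    String intLiteral() {
        String token = peek();
        if (token == null || !Character.isDigit(token.charAt(0))) {
            throw new IllegalStateException("integer literal expected at token " + pos);
        }
        pos++;
        return token;
    }

    // Parse the whole input and return the tree in prefix notation.
    static String parse(String source) {
        RightRecursiveCalc parser = new RightRecursiveCalc(source);
        String tree = parser.additive();
        if (parser.peek() != null) {
            throw new IllegalStateException("unexpected token: " + parser.peek());
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("2 + 3 * 5")); // prints "(+ 2 (* 3 5))"
        System.out.println(parse("6 * 5 + 4")); // prints "(+ (* 6 5) 4)"
    }
}
```

In both outputs the multiplication ends up nested inside the addition, which is the tree shape that precedence requires.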

Expression evaluation

We won't go into the details here: traverse the whole AST depth-first, and the value computed at the root node is the value of the expression.

Key code (the complete code is linked at the end of the article)

Addition and multiplication

    /**
     * Parsing: additive expression.
     * @return the AST node for the expression, or null if nothing matched
     * @throws Exception if the right-hand side of the operator is missing
     */
    private SimpleASTNode additive(TokenReader tokens) throws Exception {
        // First match a multiplicative expression in front of the plus sign
        SimpleASTNode child1 = multiplicative(tokens);
        SimpleASTNode node = child1;

        Token token = tokens.peek();
        if (child1 != null && token != null) {
            if (token.getType() == TokenType.Plus || token.getType() == TokenType.Minus) {
                token = tokens.read();  // consume the operator
                SimpleASTNode child2 = additive(tokens);  // recursively match the right-hand side
                if (child2 != null) {
                    node = new SimpleASTNode(ASTNodeType.Additive, token.getText());
                    node.addChild(child1);
                    node.addChild(child2);
                } else {
                    throw new Exception("[Additive expression error]: the right-hand side of the plus sign is missing");
                }
            }
        }
        return node;
    }

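    /**
     * Parsing: multiplicative expression.
     * @return the AST node for the expression, or null if nothing matched
     * @throws Exception if the right-hand side of the operator is missing
     */
    private SimpleASTNode multiplicative(TokenReader tokens) throws Exception {
        SimpleASTNode child1 = primary(tokens);
        SimpleASTNode node = child1;

        Token token = tokens.peek();
        if (child1 != null && token != null) {
            if (token.getType() == TokenType.Star || token.getType() == TokenType.Slash) {
                token = tokens.read();  // consume the operator
                // The right operand is a single primary, so only one operator is matched per call
                SimpleASTNode child2 = primary(tokens);
                if (child2 != null) {
                    node = new SimpleASTNode(ASTNodeType.Multiplicative, token.getText());
                    node.addChild(child1);
                    node.addChild(child2);
                } else {
                    throw new Exception("[Multiplicative expression error]: the right-hand side of the multiplication sign is missing");
                }
            }
        }
        return node;
    }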

Building the AST (recursively)

    /**
     * Parsing: root node.
     * @return the root node of the AST
     * @throws Exception if the expression cannot be parsed
     */
    private SimpleASTNode prog(TokenReader tokens) throws Exception {
        // Build the root node
        SimpleASTNode node = new SimpleASTNode(ASTNodeType.Programm, "Calculator");

        // Build the child nodes (done recursively)
        SimpleASTNode child = additive(tokens);

        if (child != null) {
            node.addChild(child);
        }
        return node;
    }

Computing node values (recursively)

    /**
     * Evaluates an AST node and prints the evaluation process.
     * @param node    the node to evaluate
     * @param indent  indentation used when printing, controlled with tabs
     * @return the value of the node
     */
    private int evaluate(ASTNode node, String indent) {
        int result = 0;
        System.out.println(indent + "Calculating: " + node.getType());
        switch (node.getType()) {
            case Programm:
                for (ASTNode child : node.getChildren()) {
                    result = evaluate(child, indent + "\t");
                }
                break;
            case Additive:
                ASTNode child1 = node.getChildren().get(0);
                int value1 = evaluate(child1, indent + "\t");
                ASTNode child2 = node.getChildren().get(1);
                int value2 = evaluate(child2, indent + "\t");
                if (node.getText().equals("+")) {
                    result = value1 + value2;
                } else {
                    result = value1 - value2;
                }
                break;
            case Multiplicative:
                child1 = node.getChildren().get(0);
                value1 = evaluate(child1, indent + "\t");
                child2 = node.getChildren().get(1);
                value2 = evaluate(child2, indent + "\t");
                if (node.getText().equals("*")) {
                    result = value1 * value2;
                } else {
                    result = value1 / value2;
                }
                break;
            case IntLiteral:
                result = Integer.parseInt(node.getText());
                break;
            default:
                // other node types are not evaluated; result stays 0
        }
        System.out.println(indent + "Result: " + result);
        return result;
    }

Output

Summary

  • The recursive descent algorithm has two characteristics: "descent" and "recursion". Its structure closely mirrors the grammar rules, so the algorithm can be written almost directly from the grammar.
  • Left recursion sends the recursion into an infinite loop, so the rule must be rewritten so that the recursive reference comes after the addition operator.
  • A context-free grammar is more general than a regular grammar: it allows recursive nesting, which a regular grammar does not.

 Complete code: https://github.com/SongJain/TheBeautyOfCompiling/tree/master/SimpleCalculator


These are my own original study notes, based on the course "The Beauty of Compiler Principles".


Origin blog.csdn.net/weixin_41960890/article/details/105102506