Compilers for Beginners Series [5]: Implementing a Simple Scripting Language

Building on the calculator from the previous installment, we can keep adding code until it becomes a simple scripting language.


New features

  • Variable declaration and initialization statements: "int age;", "int age = 45;", "int age = 4 + 5;"
  • Assignment statements: "age = 45;"
  • Variables in expressions: "age + 10 * 2;"
  • A complete command-line terminal.

Refining the grammar rules

Declaration

A declaration begins with int, followed by an identifier and an optional initializer (an equals sign and an expression), and ends with a semicolon:

intDeclaration : 'int' Identifier ( '=' additiveExpression)? ';';

Expression statement

For now only the additive expression is supported. Other kinds of expression, such as conditional expressions, can be added later. An expression statement also ends with a semicolon:

expressionStatement : additiveExpression ';';

Assignment

An identifier followed by an equals sign and an expression, ending with a semicolon:

assignmentStatement : Identifier '=' additiveExpression ';';

Identifiers and parentheses

To use variables in expressions, we also need to rewrite primaryExpression. Besides an integer literal, it can now also be an identifier or a parenthesized expression:

primaryExpression : Identifier | IntLiteral | '(' additiveExpression ')';
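
The rewritten rule can be sketched as a tiny evaluator over pre-split tokens. This is a minimal sketch, not the article's AST-based implementation: the class ExprEvaluator and its string-array token format are hypothetical, the multiplicative rule is assumed to carry over from the earlier calculator, and only + and * are handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical evaluator that computes values directly instead of building an AST.
class ExprEvaluator {
    private final Map<String, Integer> variables;
    private final String[] tokens;
    private int pos = 0;

    ExprEvaluator(Map<String, Integer> variables, String... tokens) {
        this.variables = variables;
        this.tokens = tokens;
    }

    private String peek() { return pos < tokens.length ? tokens[pos] : null; }

    // additiveExpression : multiplicativeExpression ( '+' multiplicativeExpression )*
    int additive() {
        int value = multiplicative();
        while ("+".equals(peek())) { pos++; value += multiplicative(); }
        return value;
    }

    // multiplicativeExpression : primaryExpression ( '*' primaryExpression )*  (assumed rule)
    int multiplicative() {
        int value = primary();
        while ("*".equals(peek())) { pos++; value *= primary(); }
        return value;
    }

    // primaryExpression : Identifier | IntLiteral | '(' additiveExpression ')'
    int primary() {
        String token = peek();
        if (token == null) throw new IllegalStateException("unexpected end of input");
        pos++;
        if (token.equals("(")) {
            int value = additive();
            if (!")".equals(peek())) throw new IllegalStateException("expecting right parenthesis");
            pos++;
            return value;
        }
        if (token.matches("[0-9]+")) return Integer.parseInt(token);
        if (token.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            Integer value = variables.get(token);   // an identifier is resolved through the variable map
            if (value == null) throw new IllegalStateException("unknown variable: " + token);
            return value;
        }
        throw new IllegalStateException("unexpected token: " + token);
    }
}

class ExprEvaluatorDemo {
    public static void main(String[] args) {
        Map<String, Integer> vars = new HashMap<>();
        vars.put("age", 45);
        System.out.println(new ExprEvaluator(vars, "age", "+", "10", "*", "2").additive());           // prints 65
        System.out.println(new ExprEvaluator(vars, "(", "age", "+", "10", ")", "*", "2").additive()); // prints 110
    }
}
```

The identifier branch is the only new part compared with a plain calculator: it turns a name into a value by looking it up.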

Adding variable support to the scripting language

To support variables, the scripting language needs some storage space, so that code like the following can work:

int age = 45;
age + 10 * 2;

To assign values to variables, the script interpreter needs a store that records each variable and its value:

private HashMap<String, Integer> variables = new HashMap<String, Integer>();

We simply use a HashMap as the variable storage area. Both variable declaration statements and assignment statements can modify the data in this storage area.
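
A minimal sketch of how such a store might behave. The class VariableStore and its method names are hypothetical, and the rules that redeclaring a variable or using an undeclared or uninitialized one is an error are assumptions, not something stated in the article.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper around the HashMap used as the variable storage area.
class VariableStore {
    // A null value means "declared but not yet initialized".
    private final Map<String, Integer> variables = new HashMap<>();

    // Handles "int age;" (value == null) and "int age = 45;".
    void declare(String name, Integer value) {
        if (variables.containsKey(name)) {
            throw new IllegalStateException("variable " + name + " is already declared");
        }
        variables.put(name, value);
    }

    // Handles "age = 45;". Assumption: assigning to an undeclared variable is an error.
    void assign(String name, int value) {
        if (!variables.containsKey(name)) {
            throw new IllegalStateException("unknown variable: " + name);
        }
        variables.put(name, value);
    }

    // Handles reading a variable inside an expression such as "age + 10 * 2".
    int lookup(String name) {
        if (!variables.containsKey(name)) {
            throw new IllegalStateException("unknown variable: " + name);
        }
        Integer value = variables.get(name);
        if (value == null) {
            throw new IllegalStateException("variable " + name + " has not been initialized");
        }
        return value;
    }
}

class VariableStoreDemo {
    public static void main(String[] args) {
        VariableStore store = new VariableStore();
        store.declare("age", 45);                          // int age = 45;
        System.out.println(store.lookup("age") + 10 * 2);  // age + 10 * 2;  prints 65
    }
}
```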


Implementing assignment

"age = age + 2 * 10;"

To match an assignment statement, first check whether the first token is an identifier. If it is not, return null: the match has failed.

If the first token is indeed an identifier, consume it, then check whether the next token is an equals sign.

If it is not an equals sign, this is not an assignment statement; it may be an expression statement instead. In that case we push back the token we just consumed, as if nothing had happened, and return null. The pushback is done by calling unread().

If an equals sign does follow, check that it is followed by an expression and that the expression is followed by a semicolon. If either is missing, report an error. That completes the parsing of an assignment statement.

private SimpleASTNode assignmentStatement(TokenReader tokens) throws Exception {
    SimpleASTNode node = null;
    Token token = tokens.peek();    // look ahead: is the next token an identifier?
    if (token != null && token.getType() == TokenType.Identifier) {
        token = tokens.read();      // consume the identifier
        node = new SimpleASTNode(ASTNodeType.AssignmentStmt, token.getText());
        token = tokens.peek();      // look ahead: is the next token an equals sign?
        if (token != null && token.getType() == TokenType.Assignment) {
            tokens.read();          // consume the equals sign
            SimpleASTNode child = additive(tokens);
            if (child == null) {    // error: no valid expression on the right of the equals sign
                throw new Exception("invalid assignment statement, expecting an expression");
            }
            else{
                node.addChild(child);   // attach the expression as a child node
                token = tokens.peek();  // look ahead: is the next token a semicolon?
                if (token != null && token.getType() == TokenType.SemiColon) {
                    tokens.read();      // consume the semicolon

                } else {            // error: missing semicolon
                    throw new Exception("invalid statement, expecting semicolon");
                }
            }
        }
        else {
            tokens.unread();    // backtrack: give back the identifier consumed earlier
            node = null;
        }
    }
    return node;
}

The variable declaration statement can be rewritten with similar logic.
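
A sketch of what that declaration parser might look like, following the intDeclaration rule. Everything here is a simplified assumption rather than the article's code: TokType, Tok, TokStream and Node are minimal stand-ins for the real token and AST classes, the initializer accepts only a single integer literal in place of additive(), and unchecked exceptions are used to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the article's token and AST classes (hypothetical names).
enum TokType { INT, IDENTIFIER, ASSIGNMENT, INT_LITERAL, SEMICOLON }

class Tok {
    final TokType type;
    final String text;
    Tok(TokType type, String text) { this.type = type; this.text = text; }
}

class TokStream {
    private final Tok[] tokens;
    private int pos = 0;
    TokStream(Tok... tokens) { this.tokens = tokens; }
    Tok peek() { return pos < tokens.length ? tokens[pos] : null; }
    Tok read() { return pos < tokens.length ? tokens[pos++] : null; }
}

class Node {
    final String type;
    final String text;
    final List<Node> children = new ArrayList<>();
    Node(String type, String text) { this.type = type; this.text = text; }
}

class DeclarationParser {
    // intDeclaration : 'int' Identifier ( '=' additiveExpression )? ';' ;
    static Node intDeclaration(TokStream tokens) {
        Tok token = tokens.peek();
        if (token == null || token.type != TokType.INT) {
            return null;            // not a declaration; nothing was consumed, so no backtracking is needed
        }
        tokens.read();              // consume 'int'
        token = tokens.peek();
        if (token == null || token.type != TokType.IDENTIFIER) {
            throw new IllegalStateException("variable name expected");
        }
        tokens.read();              // consume the identifier
        Node node = new Node("IntDeclaration", token.text);
        token = tokens.peek();
        if (token != null && token.type == TokType.ASSIGNMENT) {
            tokens.read();          // consume '='
            Node child = expression(tokens);
            if (child == null) {
                throw new IllegalStateException("invalid variable initialization, expecting an expression");
            }
            node.children.add(child);
        }
        token = tokens.peek();
        if (token == null || token.type != TokType.SEMICOLON) {
            throw new IllegalStateException("invalid statement, expecting semicolon");
        }
        tokens.read();              // consume ';'
        return node;
    }

    // Placeholder for additive(): accepts only a single integer literal.
    static Node expression(TokStream tokens) {
        Tok token = tokens.peek();
        if (token != null && token.type == TokType.INT_LITERAL) {
            tokens.read();
            return new Node("IntLiteral", token.text);
        }
        return null;
    }
}
```

Unlike the assignment rule, this one never needs unread(): once the int keyword has been seen, no other statement can match, so every later mismatch can be reported as an error right away.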


Backtracking when a match fails

While trying to match a statement, we may discover partway through that the match has failed. We then need to try the next grammar rule, and for that the token stream must be back in its initial state. Because we do not know how many tokens the failed rule consumed, the best approach is to restore the stream directly and then try another rule.

Trial and backtracking are typical features of the recursive descent algorithm. The algorithm is simple, yet through trial and backtracking it can still find the rule that matches, which is where its power lies. The drawback is that backtracking lowers efficiency somewhat. We can optimize on this basis to get recursive descent with predictive parsing, as well as non-recursive predictive parsing.
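
Restoring the stream directly can be sketched by saving the read position before a rule is tried and resetting it when the rule fails. This is an illustrative sketch only: the Tokens class with getPosition() and setPosition() is hypothetical, tokens are plain strings, and the rules return a rule name instead of an AST node and never report errors themselves.

```java
// Hypothetical token stream that can save and restore its read position.
class Tokens {
    private final String[] tokens;
    private int pos = 0;
    Tokens(String... tokens) { this.tokens = tokens; }
    String read() { return pos < tokens.length ? tokens[pos++] : null; }
    int getPosition() { return pos; }
    void setPosition(int position) { pos = position; }
}

class BacktrackingParser {
    static boolean isIdentifier(String t) { return t != null && t.matches("[A-Za-z_]\\w*"); }
    static boolean isOperand(String t) { return t != null && t.matches("\\w+"); }

    // assignment : Identifier '=' operand ';'   (simplified right-hand side)
    static String assignment(Tokens in) {
        if (isIdentifier(in.read()) && "=".equals(in.read())
                && isOperand(in.read()) && ";".equals(in.read())) {
            return "assignment";
        }
        return null;    // may have consumed any number of tokens by now
    }

    // expressionStatement : operand ( '+' operand )* ';'
    static String expressionStatement(Tokens in) {
        if (!isOperand(in.read())) return null;
        String t = in.read();
        while ("+".equals(t)) {
            if (!isOperand(in.read())) return null;
            t = in.read();
        }
        return ";".equals(t) ? "expression" : null;
    }

    // Try each rule in turn, restoring the saved position after every failure.
    static String statement(Tokens in) {
        int start = in.getPosition();
        String result = assignment(in);
        if (result == null) {
            in.setPosition(start);      // undo whatever the failed rule consumed
            result = expressionStatement(in);
        }
        if (result == null) {
            in.setPosition(start);
            throw new IllegalStateException("unknown statement");
        }
        return result;
    }
}

class BacktrackingDemo {
    public static void main(String[] args) {
        System.out.println(BacktrackingParser.statement(new Tokens("age", "=", "45", ";")));  // prints assignment
        System.out.println(BacktrackingParser.statement(new Tokens("age", "+", "10", ";")));  // prints expression
    }
}
```

Saving the position up front means a rule does not have to count how many tokens it consumed before failing.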

Backtracking and errors

We need to know when to report an error and when to backtrack.

Reporting a syntax error means we know that no other rule could possibly match, so there is no point wasting time on backtracking. Syntax errors should therefore be reported as early as possible. Reporting them early is itself an optimization of the algorithm we write.

When writing a compiler, we want not only to parse correct syntax but also to give error messages that are as friendly as possible, so that users can locate mistakes quickly.
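
One way to make messages friendlier is to say where the problem is and what was expected. The sketch below is only an illustration under assumptions: ParseError and ErrorReport are hypothetical names, and it presumes each token records its line and column, which the article's Token class is not shown to do.

```java
// Hypothetical error type that carries a source position.
class ParseError extends RuntimeException {
    final int line;
    final int column;   // 1-based column of the offending token

    ParseError(int line, int column, String expected, String found) {
        super("line " + line + ", column " + column + ": expecting " + expected
                + " but found " + (found == null ? "end of input" : "'" + found + "'"));
        this.line = line;
        this.column = column;
    }
}

class ErrorReport {
    // Formats the message, the offending source line, and a caret under the error column.
    static String format(String sourceLine, ParseError error) {
        StringBuilder sb = new StringBuilder();
        sb.append(error.getMessage()).append('\n');
        sb.append(sourceLine).append('\n');
        for (int i = 1; i < error.column; i++) {
            sb.append(' ');
        }
        return sb.append('^').toString();
    }
}

class ErrorReportDemo {
    public static void main(String[] args) {
        // "int age = 45" is missing its semicolon; the error sits just past the last character.
        ParseError error = new ParseError(1, 13, "';'", null);
        System.out.println(ErrorReport.format("int age = 45", error));
    }
}
```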


REPL interactive interface

Scripting languages usually provide a command-line window where you enter statements one at a time, and each is interpreted immediately and its output shown. Node.js and Python, for example, both provide such an interface. This read, evaluate, print cycle is called a REPL (Read-Eval-Print Loop).

We also implement a simple REPL. It reads code from the terminal line by line and interprets it once a line ends with a semicolon:

SimpleParser parser = new SimpleParser();
SimpleScript script = new SimpleScript();
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));   // read input from the terminal

String scriptText = "";
System.out.print("\n>");   // prompt

while (true) {             // loop until the user exits
    try {
        String line = reader.readLine();    // read one line
        if (line == null) {                 // end of input: stop instead of failing on null forever
            break;
        }
        line = line.trim();
        if (line.equals("exit();")) {   // hard-coded exit condition
            System.out.println("good bye!");
            break;
        }
        scriptText += line + "\n";
        if (line.endsWith(";")) { // without a trailing semicolon, keep reading another line
            ASTNode tree = parser.parse(scriptText); // parse the accumulated text
            if (verbose) {
                parser.dumpAST(tree, "");
            }

            script.evaluate(tree, ""); // evaluate the AST and print the result

            System.out.print("\n>");   // show the prompt again

            scriptText = "";
        }

    } catch (Exception e) { // on a syntax error, report it and keep going
        System.out.println(e.getLocalizedMessage());
        System.out.print("\n>");   // prompt
        scriptText = "";
    }
}

If the statement is correct, the system prints the result immediately. If it is wrong, the REPL prints the error message and goes on to handle the next statement.

My compiler!

[Screenshot: variable declaration, initialization, and use of identifiers in expressions]

[Screenshot: syntax errors]


Complete code: https://github.com/SongJain/TheBeautyOfCompiling/tree/master/MySimpleParser/src

Origin blog.csdn.net/weixin_41960890/article/details/105191488