Write a compiler in Java (1): lexical and syntax analysis

ANTLR installation
$ cd /usr/local/lib
$ wget https://www.antlr.org/download/antlr-4.7.1-complete.jar
$ export CLASSPATH=".:/usr/local/lib/antlr-4.7.1-complete.jar:$CLASSPATH"
$ alias antlr4='java -jar /usr/local/lib/antlr-4.7.1-complete.jar'
$ alias grun='java org.antlr.v4.gui.TestRig '
Lexical analysis and ANTLR lexer definitions
Lexical analysis is the process of converting a sequence of characters into a sequence of words (tokens). A word is the smallest unit of source code and consists of one or more consecutive characters. For example, lexical analysis of the code int year=2018 converts the source into a sequence of 5 words: int, \s (space), year, = and 2018.
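
To make the word-splitting step concrete, here is a minimal sketch in plain Java that reproduces the 5-word example with a single regular expression. The class name WordSplitter and the pattern are assumptions made for illustration only; this is not how ANTLR works internally, and a real lexer also attaches a type to every word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: splits source text into words with one regex.
class WordSplitter {
    // Alternatives are tried left to right at each position:
    // a run of whitespace, an identifier or keyword, an integer, or any other single character.
    private static final Pattern WORD =
            Pattern.compile("\\s+|[A-Za-z_][A-Za-z0-9_]*|[0-9]+|.");

    static List<String> split(String source) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(source);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // Print each word on its own line, quoted so the space is visible.
        for (String word : split("int year=2018")) {
            System.out.println("\"" + word + "\"");
        }
    }
}
```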

ANTLR lexer definitions are relatively simple; most lexical rules can be expressed with simple regular expressions. Let's look at a simple example, DemoLexer.g4:

lexer grammar DemoLexer;
PLUS: '+';
MINUS: '-';
MULTIPLE: '*';
DIV: '/';
LPAREN: '(';
RPAREN: ')';
NUMBER: [0-9]+;
Line 1 declares that this is a lexer definition file. Lines 2-7 define the four arithmetic operators and the left and right parentheses. Line 8 defines a non-negative integer.
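
For readers who want to see what these rules amount to, the following is a rough hand-written equivalent in plain Java. It is only a sketch: DemoScanner and its string-based token format are made up for this illustration, and ANTLR generates a very different (table-driven) DemoLexer.java. Throwing on an unknown character is also an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner that mirrors the DemoLexer.g4 rules.
class DemoScanner {
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c >= '0' && c <= '9') {
                // NUMBER: [0-9]+ (consume the longest run of digits)
                int start = i;
                while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;
                }
                tokens.add("NUMBER(" + input.substring(start, i) + ")");
                continue;
            }
            switch (c) {
                case '+': tokens.add("PLUS"); break;
                case '-': tokens.add("MINUS"); break;
                case '*': tokens.add("MULTIPLE"); break;
                case '/': tokens.add("DIV"); break;
                case '(': tokens.add("LPAREN"); break;
                case ')': tokens.add("RPAREN"); break;
                default:
                    // The grammar has no rule for any other character, including whitespace.
                    throw new IllegalArgumentException(
                            "unexpected character '" + c + "' at index " + i);
            }
            i++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("(12+3)*4"));
    }
}
```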

Syntax analysis and ANTLR grammar definitions
Syntax analysis (parsing) is the process of analyzing input made up of a word sequence and determining its grammatical structure according to a specific formal grammar. For example, when the sequence of 5 words int, \s (space), year, = and 2018 is fed to the parser in order, the parser can recognize it as a variable declaration with initialization.

An ANTLR grammar definition is similar to a lexer definition: it lists all the allowed word combinations. Let's take a simple grammar for the four arithmetic operations as an example.
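
As a toy illustration of "recognizing a structure from a word sequence", the sketch below checks whether five words form a declaration with initialization. The class DeclarationRecognizer is hypothetical and handles exactly one statement shape with the int type; a real parser is driven by a full grammar instead of a fixed check like this.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recognizer for one shape only: int <space> <name> = <integer>
class DeclarationRecognizer {
    static boolean isInitializedDeclaration(List<String> words) {
        return words.size() == 5
                && words.get(0).equals("int")                        // type keyword
                && words.get(1).matches("\\s+")                      // whitespace word
                && words.get(2).matches("[A-Za-z_][A-Za-z0-9_]*")    // variable name
                && words.get(3).equals("=")                          // assignment sign
                && words.get(4).matches("[0-9]+");                   // integer literal
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("int", " ", "year", "=", "2018");
        System.out.println(isInitializedDeclaration(words)); // prints true
    }
}
```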

DemoParser.g4

parser grammar DemoParser;
options {tokenVocab=DemoLexer;}
expr:
    '(' expr ')'
    | NUMBER (MULTIPLE | DIV) NUMBER
    | NUMBER (PLUS | MINUS) NUMBER
    ;
The first line declares that this is a parser grammar file; the options block on the second line sets tokenVocab to indicate which lexer's tokens to use. Lines 3-7 define the grammar of an arithmetic expression. Briefly, expr is the name we gave to the rule. It has three alternatives: an expression wrapped in parentheses, a multiplication or division expression, and an addition or subtraction expression. Note that the order of the multiplication/division and addition/subtraction alternatives cannot be swapped arbitrarily: when more than one alternative can match, ANTLR prefers the one listed first, so swapping them would change operator precedence and produce an incorrect parse tree. Higher-priority expressions need to be defined first.
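
To see why precedence matters for the result, here is a small hand-written recursive-descent evaluator in plain Java. It is a sketch under stated assumptions: PrecedenceDemo is a made-up class, it accepts chained expressions (which is more general than the expr rule above), and it encodes precedence by nesting one method per priority level, a different technique from ANTLR's ordering of alternatives.

```java
// Hypothetical recursive-descent evaluator, not generated by ANTLR.
class PrecedenceDemo {
    private final String src;
    private int pos = 0;

    private PrecedenceDemo(String src) {
        this.src = src;
    }

    static int eval(String src) {
        PrecedenceDemo p = new PrecedenceDemo(src);
        int value = p.additive();
        if (p.pos != src.length()) {
            throw new IllegalArgumentException("unexpected input at index " + p.pos);
        }
        return value;
    }

    // Lowest priority: multiplicative (('+' | '-') multiplicative)*
    private int additive() {
        int value = multiplicative();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            int right = multiplicative();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // Higher priority: primary (('*' | '/') primary)*
    private int multiplicative() {
        int value = primary();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            int right = primary();
            value = (op == '*') ? value * right : value / right;
        }
        return value;
    }

    // Highest priority: NUMBER | '(' additive ')'
    private int primary() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;
            int value = additive();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("')' expected at index " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("number expected at index " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("1+2*3"));   // prints 7, not 9
        System.out.println(eval("(1+2)*3")); // prints 9
    }
}
```

Because multiplicative() is called from inside additive(), multiplication and division always group their operands before addition and subtraction see them.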

Generate the lexer and parser
The above is only the ANTLR lexer and parser definition. It cannot be used directly in Java because it is not legal Java source code; it has to be converted into Java code first, and ANTLR does this conversion for us. Run the following commands in the terminal:

$ antlr4 DemoLexer.g4 -package demo.antlr
$ antlr4 DemoParser.g4 -package demo.antlr -visitor
After the commands are executed, the following files are generated automatically:

DemoLexer.java
DemoLexer.tokens
DemoParser.java
DemoParser.tokens
DemoParserBaseListener.java
DemoParserBaseVisitor.java
DemoParserListener.java
DemoParserVisitor.java
These Java files are the lexer and parser that we can call directly from our program.

Using the lexer and parser
After the generated lexer and parser are copied to the source directory, they can be used directly. Let's look at the most common way to call them:

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

String source = "1+2"; // The expression to parse; the expr rule above matches a single binary operation
CharStream charStream = CharStreams.fromString(source); // ANTLRInputStream is deprecated since ANTLR 4.7
DemoLexer lexer = new DemoLexer(charStream); // Lexer
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
DemoParser parser = new DemoParser(tokenStream); // Parser
DemoParser.ExprContext exprContext = parser.expr(); // Parse with expr as the start rule and get the parse tree
Through the steps above, the expression string is converted into a parse tree. With a parse tree, evaluating the expression becomes very simple. First, we define a parse tree visitor, MainDemoParserVisitor.java:

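import demo.antlr.DemoParser.ExprContext;
import demo.antlr.DemoParserBaseVisitor;

public class MainDemoParserVisitor extends DemoParserBaseVisitor<Integer> {
    @Override
    public Integer visitExpr(ExprContext ctx) {
        // NUMBER(i) returns a TerminalNode; getText() gives the matched digits
        if (ctx.MULTIPLE() != null) { // multiplication
            return Integer.parseInt(ctx.NUMBER(0).getText()) * Integer.parseInt(ctx.NUMBER(1).getText());
        } else if (ctx.DIV() != null) { // division
            return Integer.parseInt(ctx.NUMBER(0).getText()) / Integer.parseInt(ctx.NUMBER(1).getText());
        } else if (ctx.PLUS() != null) { // addition
            return Integer.parseInt(ctx.NUMBER(0).getText()) + Integer.parseInt(ctx.NUMBER(1).getText());
        } else if (ctx.MINUS() != null) { // subtraction
            return Integer.parseInt(ctx.NUMBER(0).getText()) - Integer.parseInt(ctx.NUMBER(1).getText());
        } else { // evaluate the expression inside the parentheses
            return visitExpr(ctx.expr());
        }
    }
}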
Use the visitor to traverse and evaluate the parse tree:

MainDemoParserVisitor visitor = new MainDemoParserVisitor(); // Create a visitor
Integer result = (Integer) visitor.visit(exprContext); // Traverse the parse tree to evaluate the expression
System.out.println(result);
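
The visitor idea itself does not depend on ANTLR. The following self-contained sketch shows the same pattern on a tiny hand-built tree, so it can be run without the generated classes. All names here (VisitorSketch, Node, Num, BinOp, Evaluator) are hypothetical and only loosely correspond to what ANTLR generates.

```java
// Hypothetical, ANTLR-free illustration of evaluating a tree with a visitor.
class VisitorSketch {
    interface Visitor<T> {
        T visitNum(Num n);
        T visitBinOp(BinOp b);
    }

    interface Node {
        <T> T accept(Visitor<T> visitor);
    }

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        public <T> T accept(Visitor<T> visitor) { return visitor.visitNum(this); }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left;
        final Node right;
        BinOp(char op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        public <T> T accept(Visitor<T> visitor) { return visitor.visitBinOp(this); }
    }

    // Evaluates the tree bottom-up: children first, then the operator.
    static final class Evaluator implements Visitor<Integer> {
        public Integer visitNum(Num n) {
            return n.value;
        }
        public Integer visitBinOp(BinOp b) {
            int l = b.left.accept(this);
            int r = b.right.accept(this);
            switch (b.op) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                case '/': return l / r;
                default: throw new IllegalArgumentException("unknown operator " + b.op);
            }
        }
    }

    public static void main(String[] args) {
        // Tree for 2 * (3 + 4)
        Node tree = new BinOp('*', new Num(2), new BinOp('+', new Num(3), new Num(4)));
        System.out.println(tree.accept(new Evaluator())); // prints 14
    }
}
```

Keeping the evaluation logic in a separate visitor class means the tree classes stay unchanged when another operation, such as pretty-printing, is added later.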

Origin blog.csdn.net/weixin_45032957/article/details/108399280