Antlr4: ÅÄÖ characters cause an infinite loop

Jonathan Andersson :

I have an Antlr4 grammar that ends up in an infinite loop when trying to parse an expression.

I'm running ANTLR 4.7 on Java 1.8.

The expression looks like this:

monkey=Å

But it works if the right-hand value is a string:

monkey="Å"

Or if it looks like this:

monkey=A

The last message Antlr prints before it gets stuck is:

line 1:5 mismatched input '' expecting {NUMBER, STRING, BOOLEAN, 'EMPTY', 'NULL'}

Sadly, I'm not an ANTLR expert. I've tried to read up on it but cannot figure this one out.

Here is my grammar file:

grammar MyObjectFilter;


/*
 * Lexer rules
*/

fragment DIGIT : [0-9] ;

NUMBER     : DIGIT+ ([.,] DIGIT+)?;
// Non-greedy string expression that also removes the quotes from the string
STRING     : '"' ( '\\"' | . )*? '"'  {setText(getText().substring(1, getText().length()-1));} ; 
BOOLEAN    : 'true' | 'false';
EMPTY      : 'EMPTY';
NULL       : 'NULL';

// Identifier: a letter followed by letters, digits, '.', '_' or '-'
IDENTIFIER : [a-zA-Z][a-zA-Z0-9._-]* ;
VALUE      : [0-9]*;

AND        : '&&' ;
OR         : '||' ;
NOT        : '!' ;
NEQ        : '!=' ;
GT         : '>' ;
GE         : '>=' ;
LT         : '<' ;
LE         : '<=' ;
EQ         : '=' ;
LPAREN     : '(' ;
RPAREN     : ')' ;

WS         : [ \r\t\u000C\n]+ -> skip;

/*
 * Parser rules
*/
parse
: expression EOF
;

expression
: LPAREN expression RPAREN                       #parenExpression
| NOT expression                                 #notExpression
| left=identifier op=comparator right=value      #comparatorExpression
| left=expression op=binary right=expression     #binaryExpression
;

identifier
: IDENTIFIER 
;

value
: STRING | NUMBER | BOOLEAN | EMPTY | NULL
;

comparator
: GT | GE | LT | LE | EQ | NEQ
;

binary
: AND | OR
;

I initialize it like this:

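import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

InputStream stream = new ByteArrayInputStream(definition.getBytes(StandardCharsets.UTF_8));
MyObjectFilterLexer lexer = new MyObjectFilterLexer(CharStreams.fromStream(stream, StandardCharsets.UTF_8));
MyObjectFilterParser parser = new MyObjectFilterParser(new CommonTokenStream(lexer));

// This is where it gets stuck.
MyObjectFilterParser.ExpressionContext expr = parser.expression();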

My best guess is that it cannot determine where the expression ends (the EOF).

Bart Kiers :

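There's a lexer rule that can match a zero-width (empty) token. When the lexer reaches a character that no other rule accepts, such as Å, this rule still succeeds by matching nothing, so the lexer emits an empty token (the '' in your error message) without advancing, and it can keep doing that an infinite number of times: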

VALUE      : [0-9]*;
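A quick way to spot a rule like this is to ask, for each lexer rule, whether its pattern accepts the empty string. The Java sketch below does that with java.util.regex. The regex translations of a few of the grammar's lexer rules are written by hand and are an assumption (ANTLR does not compile its rules to java.util.regex), and the class name EmptyMatchCheck is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class EmptyMatchCheck {
    // Hand-written java.util.regex equivalents of some of the grammar's lexer rules (assumed translation).
    static Map<String, String> grammarRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("NUMBER", "[0-9]+([.,][0-9]+)?");
        rules.put("BOOLEAN", "true|false");
        rules.put("IDENTIFIER", "[a-zA-Z][a-zA-Z0-9._-]*");
        rules.put("VALUE", "[0-9]*");
        rules.put("WS", "[ \\r\\t\\f\\n]+");
        return rules;
    }

    // Returns the names of the rules whose whole pattern matches the empty string.
    static List<String> emptyMatchingRules(Map<String, String> rules) {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (Pattern.compile(rule.getValue()).matcher("").matches()) {
                offenders.add(rule.getKey());
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        System.out.println(emptyMatchingRules(grammarRules())); // prints [VALUE]
    }
}
```

Only VALUE is reported: it is the one rule in this list that can succeed without consuming any input.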

Change it into the following, so that every VALUE token consumes at least one character:

VALUE      : [0-9]+;
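To illustrate why the empty match hangs and why `+` fixes it, here is a toy longest-match lexer in plain Java. It is a simplified model, not ANTLR's implementation: the class ToyLexer, its three hand-translated rules and the maxTokens cap are all hypothetical, and the cap exists only so the broken case terminates. Like an ANTLR lexer, it prefers the longest match, lets the earlier rule win ties, and when no rule accepts a character it reports that character and skips it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ToyLexer {
    // Rule order matters: on equal-length matches the earlier rule wins.
    static Map<String, Pattern> rules(String valueRegex) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("IDENTIFIER", Pattern.compile("[a-zA-Z][a-zA-Z0-9._-]*"));
        rules.put("EQ", Pattern.compile("="));
        rules.put("VALUE", Pattern.compile(valueRegex));
        return rules;
    }

    // Longest-match tokenizer; maxTokens is a safety cap so the demo always terminates.
    static List<String> tokenize(String input, Map<String, Pattern> rules, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length() && tokens.size() < maxTokens) {
            String bestRule = null;
            int bestEnd = -1;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                // lookingAt() anchors the match at the start of the region, i.e. at pos.
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                // No rule matched: report the character and skip it.
                tokens.add("ERROR(" + input.charAt(pos) + ")");
                pos++;
            } else {
                tokens.add(bestRule + "(" + input.substring(pos, bestEnd) + ")");
                pos = bestEnd; // for a zero-width match bestEnd == pos, so the lexer never advances
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "monkey=\u00C5"; // same input as the question (U+00C5 is A with ring)
        // Original rule: empty VALUE tokens at the same position until the cap is hit.
        // prints [IDENTIFIER(monkey), EQ(=), VALUE(), VALUE(), VALUE(), VALUE()]
        System.out.println(tokenize(input, rules("[0-9]*"), 6));
        // Fixed rule: the unmatched character is reported once and lexing finishes.
        System.out.println(tokenize(input, rules("[0-9]+"), 6));
    }
}
```

Without the cap, the first call would never return. With `[0-9]+`, the last character matches no rule, so it is reported once (a real ANTLR lexer reports a token recognition error and skips it) and lexing reaches the end of the input.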

