Why does this pattern fail to compile:
Pattern.compile("(?x)[ ]\\b");
Error
ERROR java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 8
(?x)[ ]\b
        ^
at java_util_regex_Pattern$compile.call (Unknown Source)
while the following equivalent ones compile fine?
Pattern.compile("(?x)\\ \\b");
Pattern.compile("[ ]\\b");
Pattern.compile(" \\b");
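For reference, here is a minimal harness that reproduces the four cases above. The class name FreeSpacingRepro and the helper tryCompile are made up for illustration. Instead of a stack trace it prints the PatternSyntaxException description, whose exact wording may differ between JDK versions.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical reproduction harness: compiles each regex and reports the outcome.
public class FreeSpacingRepro {

    // Returns null if the regex compiles, otherwise the exception's description.
    static String tryCompile(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        String[] regexes = {
            "(?x)[ ]\\b", // the failing pattern from the question
            "(?x)\\ \\b", // escaped space in free-spacing mode
            "[ ]\\b",     // character class without free-spacing mode
            " \\b"        // plain space without free-spacing mode
        };
        for (String regex : regexes) {
            String error = tryCompile(regex);
            System.out.println(regex + " -> " + (error == null ? "compiles" : "fails: " + error));
        }
    }
}
```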
Is this a bug in the Java regex compiler, or am I missing something? In verbose regexes I like to use [ ] instead of backslash-backslash-space because it reduces visual noise. But apparently the two are not the same!
PS: this issue is not about backslashes. It's about escaping spaces in a verbose regex by using a character class containing a single space, [ ], instead of a backslash.
Somehow the combination of the verbose flag (?x) and a character class containing a single space [ ] throws the compiler off and makes it fail to recognize the word-boundary escape \b.
Tested with Java up to 1.8.0_151
This is a bug in the peekPastWhitespace() method of Java's Pattern class. To trace the issue down, I took a look at the Pattern implementation in OpenJDK 8-b132. Let's work through it from the top:
1. compile() calls expr() on line 1696.
2. expr() calls sequence() on line 1996.
3. sequence() calls clazz() on line 2063, since the case of [ was met.
4. clazz() calls peek() on line 2509.
5. peek() calls peekPastWhitespace() on line 1830, since if(has(COMMENTS)) evaluates to true (due to having added the x flag (?x) at the beginning of the pattern).
6. peekPastWhitespace() (posted below) skips all whitespace in the pattern, including the space inside [ ].
private int peekPastWhitespace(int ch) {
    // Skip any run of whitespace and #-comments, even inside a character class
    while (ASCII.isSpace(ch) || ch == '#') {
        while (ASCII.isSpace(ch))
            ch = temp[++cursor];
        if (ch == '#') {
            ch = peekPastLine();
        }
    }
    return ch;
}
The same bug exists in the parsePastWhitespace() method.
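The skipping can also be observed through the public API alone. This is a minimal sketch: the class WhitespaceInClassDemo and the helper fullMatch are hypothetical names. It compares the class [ a] with and without free-spacing mode. The inline (?x) flag is the embedded form of Pattern.COMMENTS, so both routes drop the space and leave the class [a].

```java
import java.util.regex.Pattern;

// Hypothetical demo: in COMMENTS mode, whitespace inside [...] is dropped while parsing.
public class WhitespaceInClassDemo {

    // True if the whole input matches the regex compiled with the given flags.
    static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Without COMMENTS, the space is a member of the class.
        System.out.println(fullMatch("[ a]", 0, " "));                // true
        // With COMMENTS (flag or inline (?x)), the class is effectively [a].
        System.out.println(fullMatch("[ a]", Pattern.COMMENTS, " ")); // false
        System.out.println(fullMatch("(?x)[ a]", 0, " "));            // false
        System.out.println(fullMatch("(?x)[ a]", 0, "a"));            // true
    }
}
```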
Your regex is therefore interpreted as []\b, which causes your error because \b is not supported inside a character class in Java. Moreover, once you fix the \b issue, your character class still has no closing ], because a ] immediately after the opening [ is taken as a literal character.
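Both halves of that explanation can be checked directly. This is a small sketch with hypothetical names (LiteralBracketDemo, compileError). Dropping the \b leaves (?x)[ ], which fails with "Unclosed character class". Adding a second ] closes the class, and the resulting class contains nothing but a literal ].

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo: after the space is skipped, "[ ]" behaves like "[]", whose ']' is literal.
public class LiteralBracketDemo {

    // Returns null if the regex compiles, otherwise the exception's description.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // The ']' right after '[' is consumed as a literal, so the class never closes.
        System.out.println(compileError("(?x)[ ]")); // Unclosed character class
        // A second ']' closes the class, which now matches only ']'.
        Pattern p = Pattern.compile("(?x)[ ]]");
        System.out.println(p.matcher("]").matches()); // true
        System.out.println(p.matcher(" ").matches()); // false
    }
}
```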
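What you can do to fix this problem:
1. \\ followed by a space: as the OP mentioned, simply use a double backslash and a space, e.g. "(?x)\\ \\b".
2. [\\ ]: escape the space within the character class so that it is interpreted literally, e.g. "(?x)[\\ ]\\b".
3. [ ](?x)\\b: place the inline modifier after the character class, so free-spacing mode is not yet active while the class is parsed.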
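As a quick sanity check, the sketch below compiles each of the three workarounds and tests it on two inputs. The class name FreeSpacingFixes and the helper spaceBeforeBoundary are hypothetical. The intent is that every workaround behaves like the non-verbose [ ]\b: "a word" contains a space followed by a word boundary, while "a  " (with two trailing spaces) does not.

```java
import java.util.regex.Pattern;

// Hypothetical check that the three workarounds all compile and behave like "[ ]\\b".
public class FreeSpacingFixes {

    // The three workarounds, written as Java string literals.
    static final String[] FIXES = {
        "(?x)\\ \\b",   // escaped space outside a class
        "(?x)[\\ ]\\b", // escaped space inside the class
        "[ ](?x)\\b"    // free-spacing turned on only after the class
    };

    // True if the input contains a space immediately followed by a word boundary.
    static boolean spaceBeforeBoundary(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        for (String regex : FIXES) {
            System.out.println(regex + ": " + spaceBeforeBoundary(regex, "a word")
                    + " / " + spaceBeforeBoundary(regex, "a  "));
        }
    }
}
```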