Error compiling a verbose Java regex with character class and word boundary

Tobia :

Why does this pattern fail to compile:

Pattern.compile("(?x)[ ]\\b");

Error

ERROR java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 8
(?x)[ ]\b
        ^
at java_util_regex_Pattern$compile.call (Unknown Source)

Yet the following equivalent patterns compile fine:

Pattern.compile("(?x)\\ \\b");
Pattern.compile("[ ]\\b");
Pattern.compile(" \\b");
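To reproduce the question, here is a minimal harness that passes each of the four patterns above to Pattern.compile and reports which ones throw PatternSyntaxException. The class name VerboseRegexCheck and its compiles helper are hypothetical, made up for this sketch. Based on the error shown above, only the first pattern is expected to fail. The sketch assumes a JDK that behaves like the 1.8.0_151 build the question was tested on.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical harness: compiles each pattern from the question and reports the result.
class VerboseRegexCheck {

    // Returns true if the regex compiles, false if Pattern.compile throws.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] patterns = {
            "(?x)[ ]\\b",  // verbose mode + single-space class: expected to fail
            "(?x)\\ \\b",  // verbose mode + escaped space
            "[ ]\\b",      // single-space class, no verbose flag
            " \\b"         // plain space, no verbose flag
        };
        for (String p : patterns) {
            System.out.println(p + " -> " + (compiles(p) ? "compiles" : "PatternSyntaxException"));
        }
    }
}
```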

Is this a bug in the Java regex compiler, or am I missing something? I like to use [ ] in verbose regexes instead of backslash-backslash-space because it reduces visual noise, but apparently the two are not equivalent!

PS: This issue is not about backslashes. It is about escaping spaces in a verbose regex with a character class containing a single space, [ ], instead of a backslash.

Somehow, combining verbose mode (?x) with a character class containing a single space, [ ], throws the compiler off so that it no longer recognizes the word-boundary escape \b.


Tested with Java versions up to 1.8.0_151.

ctwheels :

This is a bug in Java's peekPastWhitespace() method in the Pattern class. To trace it down, I looked at the Pattern implementation in OpenJDK 8-b132. Let's work through it from the top:

  1. compile() calls expr() on line 1696
  2. expr() calls sequence() on line 1996
  3. sequence() calls clazz() on line 2063 because the next character is [
  4. clazz() calls peek() on line 2509
  5. peek() calls peekPastWhitespace() on line 1830 because if(has(COMMENTS)) evaluates to true (the pattern starts with the inline flag (?x), which turns on COMMENTS mode)
  6. peekPastWhitespace() (posted below) skips over whitespace and #-comments, so the space inside [ ] is consumed as if it were not there.
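The effect of step 6 can be observed from the outside. In the sketch below (CommentsModeClassDemo and fullMatch are hypothetical names), a space inside a character class disappears once (?x), or the equivalent Pattern.COMMENTS flag, is active, so (?x)[ a] behaves like [a].

```java
import java.util.regex.Pattern;

// Hypothetical demo: in COMMENTS mode the space inside a character class is skipped.
class CommentsModeClassDemo {

    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Without (?x): the class contains a space and 'a'.
        System.out.println(fullMatch("[ a]", " "));      // true
        // With (?x): the space is skipped, so the class is just [a].
        System.out.println(fullMatch("(?x)[ a]", " "));  // false
        System.out.println(fullMatch("(?x)[ a]", "a"));  // true
        // The Pattern.COMMENTS flag has the same effect as the inline (?x).
        System.out.println(Pattern.compile("[ a]", Pattern.COMMENTS).matcher(" ").matches()); // false
    }
}
```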

peekPastWhitespace()

private int peekPastWhitespace(int ch) {
    while (ASCII.isSpace(ch) || ch == '#') {
        while (ASCII.isSpace(ch))
            ch = temp[++cursor];
        if (ch == '#') {
            ch = peekPastLine();
        }
    }
    return ch;
}
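To see what this loop does to the question's pattern, here is a simplified, hypothetical model of COMMENTS-mode skipping. skipVerboseWhitespace is not part of the JDK: the real parser skips whitespace lazily while parsing, and backslash escapes are read raw by a separate code path. The model approximates that by keeping a backslash and the character after it together, and it does not handle constructs such as \Q...\E. isAsciiSpace mirrors the characters the JDK's ASCII.isSpace accepts (tab through carriage return, plus space).

```java
// Hypothetical, simplified model of COMMENTS-mode whitespace skipping (not JDK code).
class WhitespaceSkipModel {

    // Tab, LF, VT, FF, CR (9..13) and space.
    static boolean isAsciiSpace(char c) {
        return c == ' ' || (c >= '\t' && c <= '\r');
    }

    static String skipVerboseWhitespace(String regex) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < regex.length()) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                // An escape keeps its next character, so "\ " survives as a literal space.
                out.append(c).append(regex.charAt(i + 1));
                i += 2;
            } else if (c == '#') {
                // Comment: skip to the end of the line (like peekPastLine()).
                while (i < regex.length() && regex.charAt(i) != '\n') i++;
            } else if (isAsciiSpace(c)) {
                i++; // unescaped whitespace is dropped, even inside [ ]
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(skipVerboseWhitespace("[ ]\\b"));          // []\b
        System.out.println(skipVerboseWhitespace("[\\ ]\\b"));        // [\ ]\b
        System.out.println(skipVerboseWhitespace("a # comment\nb")); // ab
    }
}
```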

The same bug exists in the parsePastWhitespace() method.

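Your regex is therefore interpreted as []\b. With the space skipped, the ] directly after [ is taken as a literal member of the class instead of closing it, so \b ends up inside the character class, and Java does not support \b in a character class. That is the cause of your error. Moreover, even once you fix the \b issue, your character class still has no closing ].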
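Both consequences can be checked directly. The sketch below (ClassParsingDemo and errorOf are hypothetical names) shows that a ] immediately after [ is a literal, that []\b fails on the \b even without (?x), and that (?x)[ ] on its own fails because the class is never closed. PatternSyntaxException.getDescription() returns the message without the pattern and index.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo of how the collapsed class "[]" is parsed.
class ClassParsingDemo {

    // Returns the error description, or null if the regex compiles.
    static String errorOf(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // A ']' right after '[' is a literal, so this class contains ']' and 'a'.
        System.out.println(Pattern.matches("[]a]", "]")); // true
        // Hence "[]\b" fails on \b inside the still-open class...
        System.out.println(errorOf("[]\\b"));    // Illegal/unsupported escape sequence
        // ...and without \b the class is never closed.
        System.out.println(errorOf("(?x)[ ]"));  // Unclosed character class
    }
}
```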

What you can do to fix this problem:

  1. "\\ " As the OP mentioned, simply use a double backslash followed by a space
  2. "[\\ ]" Escape the space within the character class so that it is interpreted literally
  3. "[ ](?x)\\b" Place the inline modifier after the character class
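The three fixes can be checked side by side. In this sketch (VerboseSpaceFixes and findsSpaceBeforeWord are hypothetical names), each pattern is searched for in the sample input "a b", where the space is followed by a word character, so each should compile and find a match.

```java
import java.util.regex.Pattern;

// Hypothetical check that each suggested fix compiles and matches "space before a word".
class VerboseSpaceFixes {

    // True if the regex finds a match somewhere in "a b".
    static boolean findsSpaceBeforeWord(String regex) {
        return Pattern.compile(regex).matcher("a b").find();
    }

    public static void main(String[] args) {
        String[] fixes = {
            "(?x)\\ \\b",    // 1. backslash-escaped space
            "(?x)[\\ ]\\b",  // 2. escaped space inside the class
            "[ ](?x)\\b"     // 3. (?x) switched on after the class
        };
        for (String regex : fixes) {
            System.out.println(regex + " -> " + findsSpaceBeforeWord(regex)); // ... -> true
        }
    }
}
```

Option 3 works only because the class is parsed before COMMENTS mode is switched on; any [ ] placed after the (?x) would run into the same problem again.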
