\s doesn't actually capture all whitespace characters

Jack Cole :

In my Java 8 app, I am scanning input text for whitespace, but \s in my regular expression doesn't match every whitespace character. The one I've found so far in my testing that it misses is the non-breaking space (U+00A0). This is the regular expression that ran into the issue:

Pattern p = Pattern.compile("\\s");

To solve this, I added \h to my Regular Expression:

Pattern p = Pattern.compile("[\\s\\h]");

Now, are there any other whitespace characters I need to be aware of that won't be matched by [\s\h]?

SDJ :

According to the Pattern class documentation, \s by default matches only [ \t\n\x0B\f\r], that is, the ASCII space plus \t, \n, \x0B, \f and \r.

However, Unicode defines many more whitespace characters. Examples include:

  • \u2002: En space
  • \u2003: Em space
  • \u2009: Thin space
  • \u202F: Narrow no-break space
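
To see which of these a given pattern actually matches, a small check like the one below may help. It is only a sketch: the class name WhitespaceCheck and the helper matches are made up for illustration, and the sample characters are an arbitrary selection, not a complete list. It compares the default \s, the [\s\h] workaround from the question, and \s compiled with Pattern.UNICODE_CHARACTER_CLASS, which makes \s follow the Unicode White_Space property.

```java
import java.util.regex.Pattern;

public class WhitespaceCheck {
    // Default \s: only [ \t\n\x0B\f\r]
    static final Pattern DEFAULT_S = Pattern.compile("\\s");
    // The workaround from the question: \s plus horizontal whitespace
    static final Pattern S_OR_H = Pattern.compile("[\\s\\h]");
    // With UNICODE_CHARACTER_CLASS, \s follows the Unicode White_Space property
    static final Pattern UNICODE_S =
            Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper: true if the single character c matches pattern p
    static boolean matches(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Illustrative sample: space, NBSP, en/em/thin space, narrow NBSP,
        // next line (NEL) and line separator
        char[] samples = {' ', '\u00A0', '\u2002', '\u2003', '\u2009',
                          '\u202F', '\u0085', '\u2028'};
        for (char c : samples) {
            System.out.printf("U+%04X  default=%b  s-or-h=%b  unicode=%b%n",
                    (int) c,
                    matches(DEFAULT_S, c),
                    matches(S_OR_H, c),
                    matches(UNICODE_S, c));
        }
    }
}
```

The four characters listed above are horizontal whitespace, so [\s\h] already matches them. What it should still miss are vertical characters such as U+0085 (next line) and U+2028 (line separator). Adding \v, as in [\s\h\v], or compiling \s with the Unicode flag are two ways to cover those as well.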
