Regex not removing underscore from string

Hari Bisht :

I was trying to write code that allows only certain special characters in a string, using java.util.regex.Matcher and java.util.regex.Pattern, but it does not remove the underscore. I'm new here and need help with this. Code extract below:

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  // String to be scanned to find the pattern.
  String line = "This order was _:$ placed for QT3000! OK?";
  String pattern = "[^\\w\\s\\-?:().,'+\\/]";
  String s = null;

  // Create a Pattern object
  Pattern r = Pattern.compile(pattern);

  // Now create matcher object.
  Matcher m = r.matcher(line);
  s= m.replaceAll("");
  System.out.println("Output: " + s);

Expected: This order was : placed for QT3000 OK?
Actual:   This order was _: placed for QT3000 OK?

Wiktor Stribiżew :

The \w shorthand matches letters, digits and the underscore, so [^\w] matches any char except letters, digits and an underscore. Because \w is inside your negated class, underscores are always kept.
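
A quick way to confirm this is to test the single character _ against \w and against \p{Alnum}. This is only a minimal sketch; the class name WordCharCheck is a hypothetical name used for illustration.

```java
import java.util.regex.Pattern;

public class WordCharCheck {
    public static void main(String[] args) {
        // By default \w is [a-zA-Z_0-9], so it accepts the underscore
        System.out.println(Pattern.matches("\\w", "_"));        // true
        // By default \p{Alnum} is [a-zA-Z0-9], so it rejects the underscore
        System.out.println(Pattern.matches("\\p{Alnum}", "_")); // false
    }
}
```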

Replace \w with \p{Alnum}:

String pattern = "[^\\p{Alnum}\\s?:().,'+/-]";

Note that I put the hyphen at the end of the character class so it does not need escaping, and removed the escaping \ before the / since / is not a special regex metacharacter.
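
Putting the corrected pattern back into the original program gives a complete, runnable example. It is a sketch: the class name RemoveDisallowedChars and the helper sanitize are hypothetical names that are not part of the question's code, and compiling the Pattern once into a static field is simply a common idiom for reusing it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemoveDisallowedChars {
    // Matches any char that is NOT an ASCII letter/digit, whitespace, or one of ? : ( ) . , ' + / -
    private static final Pattern DISALLOWED =
            Pattern.compile("[^\\p{Alnum}\\s?:().,'+/-]");

    // Removes every disallowed char from the input
    static String sanitize(String input) {
        Matcher m = DISALLOWED.matcher(input);
        return m.replaceAll("");
    }

    public static void main(String[] args) {
        String line = "This order was _:$ placed for QT3000! OK?";
        // Prints: Output: This order was : placed for QT3000 OK?
        System.out.println("Output: " + sanitize(line));
    }
}
```

The _, $ and ! characters are removed, which matches the expected output from the question.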

See the Java regex demo.

The regex [^\p{Alnum}\s?:().,'+/-] (the string literal above with the Java backslash escaping removed) matches any char except:

  • \p{Alnum} - an ASCII letter or digit, [a-zA-Z0-9]
  • \s - a whitespace char
  • ? - a question mark
  • : - a colon
  • ( - a ( symbol
  • ) - a ) symbol
  • . - a dot
  • , - a comma
  • ' - a single quotation mark
  • + - a plus
  • / - a forward slash
  • - - a hyphen.
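
If you would rather keep \w in the class, another option (a variant not taken from the answer above) is to match the underscore separately with an alternation. By default \w is exactly \p{Alnum} plus _, so [^\w...]|_ removes the same characters as [^\p{Alnum}...]. The class name KeepWordShorthand and the helper sanitize are hypothetical, used only for illustration.

```java
import java.util.regex.Pattern;

public class KeepWordShorthand {
    // Keep \w in the negated class, but add "|_" so the underscore is also matched and removed
    private static final Pattern DISALLOWED =
            Pattern.compile("[^\\w\\s?:().,'+/-]|_");

    static String sanitize(String input) {
        return DISALLOWED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Same result as the \p{Alnum} version for this input
        System.out.println(sanitize("This order was _:$ placed for QT3000! OK?"));
    }
}
```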
