Regex to remove all whitespace except around keywords and between quotes

vinh hoang :

I want to:

  1. Remove all whitespace unless it is directly before or after one of a set of predefined keywords (for example: and, or, if). With 0-1 spaces before and 0-1 after a keyword, the spaces in " and ", " and" or "and " should stay unchanged.

  2. Ignore everything between quotes.

I've tried many patterns. The closest one I've come up with still removes the space after keywords, which is what I'm trying to avoid.

regex:

\s(?!and|or|if)(?=(?:[^"]*"[^"]*")*[^"]*$)

Test String:

            if    (ans(this) >= ans({1,2})  and (cond({3,4})  or ans(this) <= ans({5,6})), 7, 8)  and {111} > {222}  or ans(this) = "hello    my friend and  or  " and(cond({1,2}) $1 123     

Ideal result:

 if (ans(this)>=ans({1,2}) and (cond({3,4}) or ans(this)<=ans({5,6})),7,8) and {111}>{222} or ans(this)="hello    my friend and  or  " and(cond({1,2})$1123

I can then use str = str.replaceAll in Java to remove that whitespace. I don't mind taking multiple steps to get to the result, but I'm not familiar with regex, so I'm stuck.

Any help would be appreciated!

Note: I edited the expected result, sorry about that. Whitespace around a keyword should shrink to a single space if there is any. If there is none, either leave it as is or add one space: I just don't want "or ans" to become "orans", but "and(cond" becoming "and (cond" is fine. Everything between quotes should be ignored.

Wiktor Stribiżew :

You may use

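import java.util.regex.Matcher;
import java.util.regex.Pattern;

String example = "            if    (ans(this) >= ans({1,2})  and (cond({3,4})  or ans(this) <= ans({5,6})), 7, 8)  and {111} > {222}  or ans(this) = \"hello    my friend and  or  \" and(cond({1,2}) $1 123    ";
// Group 1: keyword, Group 2: quoted string, Group 3: any other run of whitespace
String rx = "\\s*\\b(and|or|if)\\b\\s*|(\"[^\"]*\")|(\\s+)";
Matcher m = Pattern.compile(rx).matcher(example);
// Matcher.replaceAll(Function) needs Java 9+; quoteReplacement keeps $ and \ inside quotes literal
example = m.replaceAll(r -> r.group(3) != null ? "" : r.group(2) != null ? Matcher.quoteReplacement(r.group(2)) : " " + r.group(1) + " ").trim();
System.out.println( example );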

See the Java demo.

The pattern matches

  • \s*\b(and|or|if)\b\s* - 0+ whitespaces, word boundary, Group 1: and, or, if, word boundary and then 0+ whitespaces
  • | - or
  • ("[^"]*") - Group 2: ", any 0+ chars other than " and then a "
  • | - or
  • (\s+) - Group 3: 1+ whitespaces.
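To make the three alternatives concrete, here is a minimal sketch that only labels each match with the group that captured it, without replacing anything. The class name GroupDemo, the method classify and the short input string are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Same pattern as in the answer: keyword | quoted string | other whitespace.
    private static final Pattern RX =
        Pattern.compile("\\s*\\b(and|or|if)\\b\\s*|(\"[^\"]*\")|(\\s+)");

    // Returns one label per match, naming the alternative that matched.
    static List<String> classify(String input) {
        List<String> labels = new ArrayList<>();
        Matcher m = RX.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                labels.add("keyword:" + m.group(1));   // Group 1 captured
            } else if (m.group(2) != null) {
                labels.add("quoted:" + m.group(2));    // Group 2 captured
            } else {
                labels.add("whitespace");              // only Group 3 is left
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // The "and" inside the quotes is consumed as part of Group 2,
        // so it is never seen as a keyword.
        System.out.println(classify("a  and b = \"x and  y\" c"));
    }
}
```

Because the alternatives are tried left to right at each position, whitespace next to a keyword is consumed by the first alternative and only the remaining whitespace falls through to Group 3.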

If Group 3 matches, the whitespace is removed; if Group 2 matches, the quoted string is put back into the result unchanged; and if Group 1 matches, the keyword is wrapped with single spaces and put back. The whole result is then .trim()ed.
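The lambda form of Matcher.replaceAll only exists from Java 9 onwards. On Java 8, the same three-way replacement can be sketched with an explicit find() loop and appendReplacement / appendTail. The class name WhitespaceNormalizer, the method normalize and the sample input are assumptions for this example, not part of the answer. The sketch passes every replacement through Matcher.quoteReplacement, so a $ or \ inside a quoted string is copied literally instead of being read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceNormalizer {
    // Same pattern as in the answer: keyword | quoted string | other whitespace.
    private static final Pattern RX =
        Pattern.compile("\\s*\\b(and|or|if)\\b\\s*|(\"[^\"]*\")|(\\s+)");

    static String normalize(String input) {
        Matcher m = RX.matcher(input);
        StringBuffer out = new StringBuffer(); // appendReplacement needs StringBuffer on Java 8
        while (m.find()) {
            String replacement;
            if (m.group(3) != null) {
                replacement = "";                        // plain whitespace: drop it
            } else if (m.group(2) != null) {
                replacement = m.group(2);                // quoted string: keep as is
            } else {
                replacement = " " + m.group(1) + " ";    // keyword: one space each side
            }
            // Quote the replacement so '$' and '\' are not treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // prints: x="a  $b" and (y) or z
        System.out.println(normalize("x = \"a  $b\"   and(y)  or z"));
    }
}
```

Without the quoteReplacement call, the $b inside the quoted string would make appendReplacement throw an IllegalArgumentException.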
