Java Regex: Capture Multiple Matches on the same Line

Omari Celestine :

I am trying to create a regular expression that matches one or more variable value assignments on the same line. I am using the following expression:

([a-z][a-zA-Z0-9-]*)=(('(\'|[^\'])*')|("(\"|[^"])*"))

For example, if I have the following string as input:

a="xyz" b="hello world"

And using the following code:

Matcher matcher = rules.get(regex).matcher(input);
int start = 0;

while (matcher.find(start)) {
    System.err.println(matcher.group(0));

    start = matcher.end();
}

It should give me two separate results:

1. a="xyz"
2. b="hello world"

But it only returns one, the entire input string.

a="xyz" b="hello world"

It seems to be taking xyz" b="hello world as the inner part. How can I resolve this?

Wiktor Stribiżew :

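In your pattern, \" (and likewise \') is only an escaped quote, so it matches a plain " and not a backslash followed by a quote. That lets (\"|[^"])* consume any char, closing quotes included, and the greedy * runs on to the last " on the line. To match escape sequences properly, you may use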

(?s)([a-z][a-zA-Z0-9-]*)=(?:'([^\\']*(?:\\.[^\\']*)*)'|"([^"\\]*(?:\\.[^"\\]*)*)")

See the regex demo

In Java,

String regex = "(?s)([a-z][a-zA-Z0-9-]*)=(?:'([^\\\\']*(?:\\\\.[^\\\\']*)*)'|\"([^\"\\\\]*(?:\\\\.[^\"\\\\]*)*)\")";
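
As a rough sketch of how this pattern might be used to collect each assignment separately (the class and method names AssignmentScanner and findAssignments are made up for illustration and are not part of the question's code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class AssignmentScanner {
    // The pattern from the answer, compiled once.
    static final Pattern ASSIGNMENT = Pattern.compile(
        "(?s)([a-z][a-zA-Z0-9-]*)=(?:'([^\\\\']*(?:\\\\.[^\\\\']*)*)'|\"([^\"\\\\]*(?:\\\\.[^\"\\\\]*)*)\")");

    // Returns the full text of every assignment found in the input.
    static List<String> findAssignments(String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = ASSIGNMENT.matcher(input);
        // find() with no argument resumes right after the previous match.
        while (matcher.find()) {
            results.add(matcher.group(0));
        }
        return results;
    }

    public static void main(String[] args) {
        for (String s : findAssignments("a=\"xyz\" b=\"hello world\"")) {
            System.out.println(s);
        }
        // prints:
        // a="xyz"
        // b="hello world"
    }
}
```

The loop calls matcher.find() without an argument, which continues from the end of the previous match. The find(int) overload used in the question resets the matcher before each search; it should give the same results here, but the start bookkeeping is not needed.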

Details

  • (?s) - the inline Pattern.DOTALL embedded flag option, which makes . match line break chars, too
  • ([a-z][a-zA-Z0-9-]*) - Group 1: a lowercase ASCII letter followed by 0+ ASCII letters, digits or hyphens
  • = - an equals sign
  • (?:'([^\\']*(?:\\.[^\\']*)*)'|"([^"\\]*(?:\\.[^"\\]*)*)") - a non-capturing group matching one of the two alternatives:
    • '([^\\']*(?:\\.[^\\']*)*)' - ', then Group 2: any amount of chars other than \ and ' followed by 0+ repetitions of an escape sequence (a \ and any char) followed by 0+ chars other than \ and ', then a closing '
    • | - or
    • "([^"\\]*(?:\\.[^"\\]*)*)" - ", then Group 3: any amount of chars other than \ and " followed by 0+ repetitions of an escape sequence followed by 0+ chars other than \ and ", then a closing ".
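
The breakdown above implies that the variable name lands in group 1 and the quoted content in group 2 (single quotes) or group 3 (double quotes). A minimal sketch of reading those groups, assuming a hypothetical AssignmentParser class and parse method, and leaving the values raw (escape backslashes are kept, no unescaping is attempted):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class AssignmentParser {
    static final Pattern ASSIGNMENT = Pattern.compile(
        "(?s)([a-z][a-zA-Z0-9-]*)=(?:'([^\\\\']*(?:\\\\.[^\\\\']*)*)'|\"([^\"\\\\]*(?:\\\\.[^\"\\\\]*)*)\")");

    // Maps each variable name to its raw quoted content, in input order.
    static Map<String, String> parse(String input) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = ASSIGNMENT.matcher(input);
        while (m.find()) {
            // Only one alternative takes part in a match; the other group is null.
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            values.put(m.group(1), value);
        }
        return values;
    }

    public static void main(String[] args) {
        // The input text is: msg="say \"hi\"" name='O\'Brien'
        Map<String, String> parsed = parse("msg=\"say \\\"hi\\\"\" name='O\\'Brien'");
        for (Map.Entry<String, String> e : parsed.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        // prints:
        // msg -> say \"hi\"
        // name -> O\'Brien
    }
}
```

Escaped quotes inside a value no longer end the match early, because \\. consumes the backslash together with the char that follows it.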
