Regular Expression for separating strings enclosed in parentheses

Eqr444 :

I have a String that contains 2 or 3 company names, each enclosed in parentheses. Each company name can also contain words in parentheses. I need to separate them using regular expressions but couldn't find out how.

My inputStr:

(Motor (Sport) (racing) Ltd.) (Motorsport racing (Ltd.)) (Motorsport racing Ltd.)
or 
(Motor (Sport) (racing) Ltd.) (Motorsport racing (Ltd.))

The expected result is:

str1 = Motor (Sport) (racing) Ltd.
str2 = Motorsport racing (Ltd.)
str3 = Motorsport racing Ltd.

My code:

String str1, str2, str3;
Pattern p = Pattern.compile("\\((.*?)\\)");
Matcher m = p.matcher(inputStr);
int index = 0;
while(m.find()) {

    String text = m.group(1);
    text = text != null && StringUtils.countMatches(text, "(") != StringUtils.countMatches(text, ")") ? text + ")" : text;

    if (index == 0) {
        str1= text;
    } else if (index == 1) {
        str2 = text;
    } else if (index == 2) {
        str3 = text;
    }

    index++;
}

This works great for str2 and str3 but not for str1.

Current result:

str1 = Motor (Sport)
str2 = Motorsport racing (Ltd.)
str3 = Motorsport racing Ltd.

Tamas Rev :

Since we can assume that the parentheses nest at most two levels deep, we can do it without too much magic. I would go with this code:

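// Needs: java.util.ArrayList, java.util.List,
//        java.util.regex.Matcher, java.util.regex.Pattern
List<String> matches = new ArrayList<>();
// One outer pair of parentheses that may contain inner pairs, one level deep
Pattern p = Pattern.compile("\\([^()]*(?:\\([^()]*\\)[^()]*)*\\)");
Matcher m = p.matcher(inputStr);
while (m.find()) {
    String fullMatch = m.group();
    // Strip the outer parentheses
    matches.add(fullMatch.substring(1, fullMatch.length() - 1));
}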

Explanation:

  • First we match an opening parenthesis: \\( (the backslash is doubled because the pattern is written as a Java string literal)
  • Then we match some non-parenthesis characters: [^()]*
  • Then, zero or more times, (?:...)*, we match a group in parentheses followed by more non-parenthesis characters: \\([^()]*\\)[^()]* - it's important that we don't allow any more parentheses within the inner parentheses
  • Then comes the closing parenthesis: \\)
  • m.group() returns the full match.
  • fullMatch.substring(1, fullMatch.length() - 1) removes the parentheses from the start and the end. You could do it with a capturing group too; I just didn't want to make the regex uglier.
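
For completeness, here is a minimal, self-contained sketch of the capturing-group variant mentioned in the last bullet, run against the input from the question. The class name CompanySplitter and the method name split are made up for illustration, and the sketch keeps the same assumption as above: the parentheses nest at most two levels deep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
public class CompanySplitter {

    // Same pattern as above, with an extra capturing group around
    // everything between the outer parentheses.
    private static final Pattern COMPANY =
            Pattern.compile("\\(([^()]*(?:\\([^()]*\\)[^()]*)*)\\)");

    // Returns the content of each top-level parenthesized group.
    // Assumes at most two levels of nesting.
    static List<String> split(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = COMPANY.matcher(input);
        while (m.find()) {
            names.add(m.group(1)); // group 1 excludes the outer parentheses
        }
        return names;
    }

    public static void main(String[] args) {
        String inputStr = "(Motor (Sport) (racing) Ltd.) "
                + "(Motorsport racing (Ltd.)) (Motorsport racing Ltd.)";
        for (String name : split(inputStr)) {
            System.out.println(name);
        }
        // Prints:
        // Motor (Sport) (racing) Ltd.
        // Motorsport racing (Ltd.)
        // Motorsport racing Ltd.
    }
}
```

Collecting the names in a List rather than in str1, str2 and str3 means the same loop handles both the two-name and the three-name input.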

