Regex: how to substitute a string with n occurrences of a substring

Andrea :

As a premise, I have an HTML text, with some <ol> elements. These have a start attribute, but the framework I'm using is not capable to interpret them during a PDF conversion. So, the trick I am trying to apply is to add a number of invisible <li> elements at the beginning.

As an example, suppose this input text:

<ol start="3">
   <li>Element 1</li>
   <li>Element 2</li>
   <li>Element 3</li>
</ol>

I want to produce this result:

<ol>
   <li style="visibility:hidden"></li>
   <li style="visibility:hidden"></li>
   <li>Element 1</li>
   <li>Element 2</li>
   <li>Element 3</li>
</ol>

So, adding n-1 invisible elements into the ordered list. But I'm not able to do that from Java in a generalized way.

Supposing the exact case in the example, I could do this (using replace, so - to be honest - without regex):

htmlString = htmlString.replace("<ol start=\"3\">",
            "<ol><li style=\"visibility:hidden\"></li><li style=\"visibility:hidden\"></li>");

But, obviously, it just applies to the case with "start=3". I know that I can use groups to extract the "3", but how can I use it as a "variable" to specify the string <li style=\"visibility:hidden\"></li> n-1 number of times? Thanks for any insight.

tobias_k :

Since Java 9, there's a Matcher.replaceAll method taking a callback function as a parameter:

String text = "<ol start=\"3\">\n\t<li>Element 1</li>\n\t<li>Element 2</li>\n\t<li>Element 3</li>\n</ol>";

String result = Pattern
        .compile("<ol start=\"(\\d)\">")
        .matcher(text)
        .replaceAll(m -> "<ol>" + repeat("\n\t<li style=\"visibility:hidden\" />", 
                                         Integer.parseInt(m.group(1))-1));      

To repeat the string you can take the trick from here, or use a loop.

public static String repeat(String s, int n) {
    return new String(new char[n]).replace("\0", s);
}

Afterwards, result is:

<ol>
    <li style="visibility:hidden" />
    <li style="visibility:hidden" />
    <li>Element 1</li>
    <li>Element 2</li>
    <li>Element 3</li>
</ol>   

If you are stuck with an older version of Java, you can still match and replace in two steps.

Matcher m = Pattern.compile("<ol start=\"(\\d)\">").matcher(text);
while (m.find()) {
    int n = Integer.parseInt(m.group(1));
    text = text.replace("<ol start=\"" + n + "\">", 
            "<ol>" + repeat("\n\t<li style=\"visibility:hidden\" />", n-1));
}

Update by Andrea ジーティーオー:

I modified the (great) solution above for including also <ol> that have multiple attributes, so that their tag do not end with start (example, <ol> with letters, as <ol start="4" style="list-style-type: upper-alpha;">). This uses replaceAll to deal with regex as a whole.

//Take something that starts with "<ol start=", ends with ">", and has a number in between
Matcher m = Pattern.compile("<ol start=\"(\\d)\"(.*?)>").matcher(htmlString);
while (m.find()) {
    int n = Integer.parseInt(m.group(1));
    htmlString = htmlString.replaceAll("(<ol start=\"" + n + "\")(.*?)(>)",
            "<ol $2>" + StringUtils.repeat("\n\t<li style=\"visibility:hidden\" />", n - 1));
}

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=116176&siteId=1