Java regex to replace in between text using a pattern

Rose :

I am new to Java regex. I have a long string that contains text like this (below is only the part of the string that I would like to replace):

href="javascript:openWin('Images/DCRMBex_01B_ex01.jpg',480,640)"
href="javascript:openWin('Images/DCRMBex_01A_ex01.jpg',480,640)"
href="javascript:openWin('Images/DCRMBex_06A_ex06.jpg',480,640)"

I would like to replace

Images

with

http://google.com/Images

For example, my output should look like this:

href="javascript:openWin('http://google.com/Images/DCRMBex_01B_ex01.jpg',480,640)"

Below is my Java program:

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main2 {

    public static void main(String[] args) throws FileNotFoundException {

        Scanner in = new Scanner(new FileReader("C:\\Projects\\input.txt"));

        StringBuilder sb = new StringBuilder();
        while (in.hasNext()) {
            sb.append(in.next());
        }
        String patternString = "href=\"javascript:openWin(.+?)\"";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(sb);
        while (matcher.find()) {
            //System.out.println(matcher.group(1));
            //System.out.println(matcher.group(1).replaceAll("Images", "http://google.com/Images"));
            matcher.group(1).replaceAll("Images", "http://google.com/Images");

        }
        System.out.println(sb);
    }
}

Below is part of my input file (input.txt). The full file is too long to paste here:

 <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_01_ex01.pdf"><b>Example 1: Bible (Rusch)</b></a> � <a href="javascript:openWin('Images/DCRMBex_01A_ex01.jpg',480,640)">Figure 1A. First page of text</a> � <a href="javascript:openWin('Images/DCRMBex_01B_ex01.jpg',480,640)">Figure 1B. Source of supplied title</a></td>
                            <td valign="top">  </td>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_06_ex06.pdf"><b>Example 6: Angelo Carletti</b></a> � <a href="javascript:openWin('Images/DCRMBex_06A_ex06.jpg',480,640)">Figure 6A. Title page</a> � <a href="javascript:openWin('Images/DCRMBex_06B_ex06.jpg',480,640)">Figure 6B. Colophon showing use of i/j and u/v</a></td>
                          </tr>
                          <tr>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_02_ex02.pdf"><b>Example 2: Greek anthology</b></a> � <a href="javascript:openWin('Images/DCRMBex_02A_ex02.jpg',480,640)">Figure 2A. First page of text</a> � <a href="javascript:openWin('Images/DCRMBex_02B_ex02.jpg',480,640)">Figure 2B. Colophon</a></td>
                            <td valign="top">  </td>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_07_ex07.pdf"><b>Example 7: Erasmus</b></a> � <a href="javascript:openWin('Images/DCRMBex_07A_ex07.jpg',480,640)">Figure 7A. Title page</a> � <a href="javascript:openWin('Images/DCRMBex_07B_ex07.jpg',480,640)">Figure 7B. Colophon</a> � <a href="javascript:openWin('Images/DCRMBex_07C_ex07.jpg',640,480)">Figure 7C. Running title</a></td>
                          </tr>
                          <tr>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_03_ex03.pdf"><b>Example 3: Heytesbury</b></a> � <a href="javascript:openWin('Images/DCRMBex_03A_ex03.jpg',480,640)">Figure 3A. Title page</a> � <a href="javascript:openWin('Images/DCRMBex_03B_ex03.jpg',480,640)">Figure 3B. Colophon showing use of i/j and u/v</a></td>
                            <td valign="top">  </td>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_08_ex08.pdf"><b>Example 8: Pliny</b></a> � <a href="javascript:openWin('Images/DCRMBex_08A_ex08.jpg',480,640)">Figure 8A. Title page</a> � <a href="javascript:openWin('Images/DCRMBex_08B_ex08.jpg',480,640)">Figure 8B. Colophon</a></td>

Output:

1) System.out.println(matcher.group(1))

('Images/DCRMBex_05_ex05.jpg',480,640)

2) System.out.println(matcher.group(1).replaceAll("Images","http://google.com/Images"));

('http://google.com/Images/DCRMBex_05_ex05.jpg',480,640)

But when I print my StringBuilder, it doesn't show any replacement. What am I doing wrong here? Any help is appreciated. Thanks.

Samuel Philipp :

I would recommend using Files.lines() and Java Streams to modify the input. With your actual input you don't even need a regex:

try (Stream<String> lines = Files.lines(Paths.get("input.txt"))) {
    String result = lines
            .map(line -> line.replace("Images", "http://google.com/Images"))
            .collect(Collectors.joining("\n"));
    System.out.println(result);
}
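
Note that replace("Images", ...) touches every occurrence of "Images" on a line, not only the ones inside openWin(...). If that matters for your file, a slightly longer literal is enough to narrow it down. Here is a minimal sketch, assuming the paths always start with openWin('Images/ as in your sample; the class name LiteralPrefixDemo and the helper prefixImages are made up for illustration, and the lines are held in memory instead of being read from input.txt:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LiteralPrefixDemo {
    // Hypothetical helper: prefixes only image paths that directly follow openWin('
    static String prefixImages(String line) {
        return line.replace("openWin('Images/", "openWin('http://google.com/Images/");
    }

    public static void main(String[] args) {
        // In-memory stand-in for the lines of input.txt (assumption for this sketch).
        List<String> lines = Arrays.asList(
                "<a href=\"javascript:openWin('Images/DCRMBex_01A_ex01.jpg',480,640)\">Figure 1A</a>",
                "<p>Images are listed above</p>");
        String result = lines.stream()
                .map(LiteralPrefixDemo::prefixImages)
                .collect(Collectors.joining("\n"));
        System.out.println(result);
    }
}
```

The second line is left alone because its "Images" is not preceded by openWin('.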

If you really want to use a regex, I would recommend compiling the pattern once, outside the stream pipeline. String.replaceAll() compiles its pattern internally on every call, so calling it per line repeats that work for each line, whereas a single Pattern.compile() can be reused for all of them:

Pattern pattern = Pattern.compile("(href=\"javascript:openWin.*)(Images.*\")");
try (Stream<String> lines = Files.lines(Paths.get("input.txt"))) {
    String result = lines
            .map(pattern::matcher)
            .map(matcher -> matcher.replaceAll("$1http://google.com/$2"))
            .collect(Collectors.joining("\n"));
    System.out.println(result);
}

This regex defines two capturing groups (the parts between parentheses). You can refer to these groups in the replacement string with $index, so $1 inserts the text matched by the first group and $2 the text matched by the second.
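
One caveat: the .* in the pattern above is greedy, so when a line contains several openWin links (as the lines in your full input file do), a single match spans from the first link to the last "Images" and only that last one gets the prefix. Here is a sketch of a tighter variant that captures only the fixed text around the path, so each link is matched on its own. The class name GroupReferenceDemo and the helper prefixImages are hypothetical, and the pattern assumes every path starts with 'Images/ directly after openWin(:

```java
import java.util.regex.Pattern;

class GroupReferenceDemo {
    // Group 1: the fixed text up to and including the opening quote.
    // Group 2: the relative folder that should receive the prefix.
    static final Pattern LINK =
            Pattern.compile("(href=\"javascript:openWin\\(')(Images/)");

    // Hypothetical helper: re-inserts both groups with the host in between.
    static String prefixImages(String line) {
        return LINK.matcher(line).replaceAll("$1http://google.com/$2");
    }

    public static void main(String[] args) {
        // Two links on one line, as in the full input file.
        String line =
                "<a href=\"javascript:openWin('Images/DCRMBex_06A_ex06.jpg',480,640)\">Figure 6A</a>"
                + " <a href=\"javascript:openWin('Images/DCRMBex_06B_ex06.jpg',480,640)\">Figure 6B</a>";
        System.out.println(prefixImages(line));
    }
}
```

The replacement string is unchanged; only the groups are narrower, so replaceAll() finds one match per link.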

The result in both cases will be:

href="javascript:openWin('http://google.com/Images/DCRMBex_01B_ex01.jpg',480,640)"
href="javascript:openWin('http://google.com/Images/DCRMBex_01A_ex01.jpg',480,640)"
href="javascript:openWin('http://google.com/Images/DCRMBex_06A_ex06.jpg',480,640)"
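
As for why your original program prints the unchanged text: Java strings are immutable, so matcher.group(1).replaceAll(...) returns a new String and leaves both the matched text and the StringBuilder untouched. In your loop that return value is simply discarded. If you want to keep the Matcher loop, the result has to be written somewhere, for example with appendReplacement() and appendTail(). Here is a minimal sketch that reuses your pattern; the class name AppendReplacementDemo and the helper prefixImages are made up for illustration, and it uses a StringBuffer because the StringBuilder overloads of these methods only exist since Java 9:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    // Same pattern as in the question: one match per openWin link.
    static final Pattern LINK = Pattern.compile("href=\"javascript:openWin(.+?)\"");

    // Hypothetical helper: rewrites only the text inside matched links.
    static String prefixImages(CharSequence input) {
        Matcher matcher = LINK.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // replace() returns a new String, so the result is kept in a variable.
            String rewritten = matcher.group().replace("Images", "http://google.com/Images");
            // quoteReplacement() stops any '$' or '\' from being read as group syntax.
            matcher.appendReplacement(out, Matcher.quoteReplacement(rewritten));
        }
        // Copy whatever follows the last match.
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "Images: <a href=\"javascript:openWin('Images/DCRMBex_01A_ex01.jpg',480,640)\">Figure 1A</a>";
        System.out.println(prefixImages(html));
    }
}
```

Text outside a matched link, such as the leading "Images:" in the example, is copied through unchanged.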
