Java Regular Expressions: Odds and Ends

README

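This article walks through several applications of regular expressions, from simple to more advanced.

Matching an email address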

String mail = "[email protected]";
String reg = "[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+";
if (mail.matches(reg))
    System.out.println("match succeeded");
else
    System.out.println("match failed");
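To see which inputs the pattern accepts, the following minimal sketch runs it against a few sample addresses. The class name, the method name and the addresses are made up for illustration; only the regular expression comes from the snippet above. Note that the character class before the @ contains no dot or hyphen, so an address such as first.last@example.com is rejected.

```java
import java.util.regex.Pattern;

class EmailMatchSketch {
    // Same expression as above, compiled once so it can be reused.
    static final Pattern MAIL =
            Pattern.compile("[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+");

    // Like String.matches, Matcher.matches requires the whole input to match.
    static boolean looksLikeMail(String candidate) {
        return MAIL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs.
        String[] samples = {
            "alice_01@example.com",   // true
            "bob@mail.example.org",   // true: the (\.[a-zA-Z]+)+ group may repeat
            "first.last@example.com", // false: '.' is not allowed before the @
            "carol@example"           // false: at least one ".xyz" part is required
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + looksLikeMail(sample));
        }
    }
}
```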

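Using groups to rewrite code

In an IDE, regular expressions can drive find-and-replace over source code. For example, to turn C++-style bool variable definitions into Java-style ones:

Match bool ([a-zA-Z]+)=0; and replace with boolean $1 = false;
Match bool ([a-zA-Z]+)=1; and replace with boolean $1 = true;
Result: bool a=0; becomes boolean a = false;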
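The same group reference works outside the IDE. As a rough sketch, String.replaceAll accepts $1 in its replacement string to refer to the first capturing group; the class and method names here are hypothetical.

```java
class BoolRewriteSketch {
    // Applies both replacements from the list above, one after the other.
    static String rewrite(String source) {
        return source
                // $1 is the text captured by ([a-zA-Z]+), i.e. the variable name
                .replaceAll("bool ([a-zA-Z]+)=0;", "boolean $1 = false;")
                .replaceAll("bool ([a-zA-Z]+)=1;", "boolean $1 = true;");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("bool a=0;"));    // boolean a = false;
        System.out.println(rewrite("bool done=1;")); // boolean done = true;
    }
}
```

The second replacement does not touch the output of the first, because "boolean" is not followed by the space that the pattern requires right after "bool".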

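Add the public static modifiers to every variable declaration that does not already have them:

Match (^[\s]+)(boolean|double|String|int|List) and replace with $1public static $2
Result: an indented declaration such as boolean a = false; becomes public static boolean a = false; with its indentation preserved. The pattern requires leading whitespace, so an unindented line is left alone.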
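A sketch of the same replacement in plain Java, assuming the source file has been read into a single string. Pattern.MULTILINE is needed so that ^ matches at the start of every line rather than only at the start of the input; the class name, method name and sample source are made up for illustration.

```java
import java.util.regex.Pattern;

class AddModifiersSketch {
    // Same expression as above; MULTILINE lets ^ anchor at each line start.
    private static final Pattern FIELD = Pattern.compile(
            "(^[\\s]+)(boolean|double|String|int|List)", Pattern.MULTILINE);

    static String addModifiers(String source) {
        // $1 keeps the original indentation, $2 is the matched type name.
        return FIELD.matcher(source).replaceAll("$1public static $2");
    }

    public static void main(String[] args) {
        String source = "class Config {\n"
                + "    boolean a = false;\n"
                + "    public static int b = 1;\n"
                + "}\n";
        // Only the first field changes; the second starts with "public",
        // which is not one of the listed type names.
        System.out.print(addModifiers(source));
    }
}
```

Because the type alternation has no trailing word boundary, an indented identifier that merely begins with one of these words would also match; appending \b after the second group is one way to tighten the pattern.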

Greedy and Reluctant

Consider two examples:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTut3 {

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create matcher object.
        Matcher m = r.matcher(line);

        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }

}

Output:

Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0

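The first group (.*) uses a greedy quantifier, which matches as many characters as possible while still letting the overall expression match. Because the second group (\\d+) needs only a single digit to succeed, the first group takes everything up to the last digit and leaves just 0 for the second group. A reluctant quantifier, (.*?), matches as few characters as possible instead: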

String line = "This order was placed for QT3000! OK?";
Pattern pattern = Pattern.compile("(.*?)(\\d+)(.*)");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
    System.out.println("group 1: " + matcher.group(1));
    System.out.println("group 2: " + matcher.group(2));
    System.out.println("group 3: " + matcher.group(3));
}

Output:

group 1: This order was placed for QT
group 2: 3000
group 3: ! OK?
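The difference matters most when a delimiter occurs more than once in the input. The following sketch, with a made-up input and hypothetical class and method names, collects group 1 of every find() hit for a greedy and a reluctant version of the same pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {
    // Returns group 1 of every successive find() hit.
    static List<String> firstGroups(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "a=\"x\" b=\"y\""; // the text: a="x" b="y"

        // Greedy: .* runs to the last quote, so there is a single long match.
        System.out.println(firstGroups("\"(.*)\"", input));  // [x" b="y]

        // Reluctant: .*? stops at the nearest closing quote, giving two matches.
        System.out.println(firstGroups("\"(.*?)\"", input)); // [x, y]
    }
}
```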

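Differences between find(), lookingAt(), and matches()

matches() is the strictest: it returns true only when the entire input string matches the pattern.
lookingAt() comes next: it also starts at the beginning of the input string, but returns true as soon as a prefix of the input matches.
find() is the least strict and can be called repeatedly to find the next match: it searches from the beginning of the input or, after a previous successful match, from the first character that match did not consume. The matched text can be read with group().

For example: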

String[] names = {"sgn 11", "sgn", "3.sgn 4562.sgn"};
Pattern pattern = Pattern.compile("sgn\\b", Pattern.CASE_INSENSITIVE);
for (String name : names) {
    Matcher matcher = pattern.matcher(name);
    /* one of the three checks below goes here */
}

/* matches() */
if(matcher.matches()){
    System.out.println("<" + name + "> matches: <"+matcher.group() +
            "> index of ["+matcher.start()+", "+ matcher.end() + ")");
} else{
    System.out.println("<" + name + "> does not match");
}
/* output:
<sgn 11> does not match
<sgn> matches: <sgn> index of [0, 3)
<3.sgn 4562.sgn> does not match
*/

/* lookingAt() */
if(matcher.lookingAt()){
    System.out.println("<" + name + "> lookingAt: <"+matcher.group() +
            "> index of ["+matcher.start()+", "+ matcher.end() + ")");
} else{
    System.out.println("<" + name + "> does not lookingAt");
}
/* output:
<sgn 11> lookingAt: <sgn> index of [0, 3)
<sgn> lookingAt: <sgn> index of [0, 3)
<3.sgn 4562.sgn> does not lookingAt
*/

/* find() */
while(matcher.find()){
    System.out.println(name + " find: <" + matcher.group() + 
        "> index of ["+matcher.start()+", "+ matcher.end() + ")");
}
/* output:
sgn 11 find: <sgn> index of [0, 3)
sgn find: <sgn> index of [0, 3)
3.sgn 4562.sgn find: <sgn> index of [2, 5)
3.sgn 4562.sgn find: <sgn> index of [11, 14)
*/
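The three fragments above can also be combined into one runnable program. This is a minimal sketch with hypothetical class and method names. One detail is worth noting: when all three methods share a Matcher, reset() is called before find(), since find() otherwise resumes after the end of an earlier successful match and would report nothing for "sgn".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModesSketch {
    private static final Pattern SGN =
            Pattern.compile("sgn\\b", Pattern.CASE_INSENSITIVE);

    // Summarizes what matches(), lookingAt() and find() report for one input.
    static String describe(String name) {
        Matcher matcher = SGN.matcher(name);
        boolean whole = matcher.matches();    // entire input must match
        boolean prefix = matcher.lookingAt(); // a prefix of the input must match

        // Start find() from the beginning again instead of after the
        // previous successful match.
        matcher.reset();
        List<String> spans = new ArrayList<>();
        while (matcher.find()) {
            spans.add("[" + matcher.start() + ", " + matcher.end() + ")");
        }
        return "<" + name + "> matches=" + whole
                + " lookingAt=" + prefix + " find=" + spans;
    }

    public static void main(String[] args) {
        String[] names = {"sgn 11", "sgn", "3.sgn 4562.sgn"};
        for (String name : names) {
            System.out.println(describe(name));
        }
        // <sgn 11> matches=false lookingAt=true find=[[0, 3)]
        // <sgn> matches=true lookingAt=true find=[[0, 3)]
        // <3.sgn 4562.sgn> matches=false lookingAt=false find=[[2, 5), [11, 14)]
    }
}
```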
