Java Regular Expressions: Odds and Ends

README

This article covers several practical uses of regular expressions, starting simple and gradually getting more advanced.

Matching an email address

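public class MailMatch {
    public static void main(String[] args) {
        String mail = "user_01@example.com"; // sample address for illustration
        String reg = "[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+";
        // String.matches() succeeds only if the WHOLE string matches reg
        if (mail.matches(reg))
            System.out.println("Match succeeded");
        else
            System.out.println("Match failed");
    }
}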
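str.matches(reg) gives the same result as Pattern.matches(reg, str), which compiles the regex every time it is called. If you need to check many addresses, you can compile the pattern once and reuse it. The sketch below uses the same regex as above. The class name, method name and sample addresses are hypothetical.

```java
import java.util.regex.Pattern;

public class MailValidator {
    // Hypothetical helper: the same regex as above, compiled once and reused
    private static final Pattern MAIL =
        Pattern.compile("[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+");

    // Matcher.matches() requires the whole string to match, like String.matches()
    static boolean isValid(String mail) {
        return MAIL.matcher(mail).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"user_01@example.com", "a@b.co.uk", "no-at-sign.com", "x@y"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

This regex is deliberately simple. It rejects some valid addresses, such as first.last@example.com (dot in the local part) and a@my-site.com (hyphen in the domain).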


Using groups to rewrite code

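Most IDEs support find-and-replace with regular expressions, which you can use to rewrite code. The example below turns C++-style bool variable definitions into Java-style ones. In the replacement, $1 stands for the text captured by the first group (the parenthesized part), which here is the variable name:

  • Match pattern bool ([a-zA-Z]+)=0;, replacement pattern boolean $1 = false;
  • Match pattern bool ([a-zA-Z]+)=1;, replacement pattern boolean $1 = true;
  • Result: bool a=0; becomes boolean a = false;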
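You can apply the same group-based rewrite from Java code with String.replaceAll, where $1 in the replacement string also refers to group 1. This is a minimal sketch. The class and method names are hypothetical.

```java
public class BoolRewrite {
    // Hypothetical helper: converts C++-style "bool x=0;" / "bool x=1;" into
    // Java declarations. $1 is the variable name captured by ([a-zA-Z]+).
    static String toJava(String cpp) {
        return cpp.replaceAll("bool ([a-zA-Z]+)=0;", "boolean $1 = false;")
                  .replaceAll("bool ([a-zA-Z]+)=1;", "boolean $1 = true;");
    }

    public static void main(String[] args) {
        System.out.println(toJava("bool a=0;"));     // boolean a = false;
        System.out.println(toJava("bool done=1;"));  // boolean done = true;
    }
}
```

Because $ and \ have special meaning in a Java replacement string, a replacement that needs them literally must escape them, for example with Matcher.quoteReplacement.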

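To add the public static modifiers to every variable declaration that lacks them:

  • Match pattern (^[\s]+)(boolean|double|String|int|List), replacement pattern $1public static $2
  • Result: an indented boolean a = false; becomes public static boolean a = false;, and $1 preserves the original indentation. Declarations that already start with public static are left untouched, because public is not one of the listed type names.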
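In an IDE's find-and-replace, ^ usually matches at the start of every line. In Java's Pattern, however, ^ matches only at the start of the whole input unless MULTILINE mode is enabled. Applying this rewrite to an entire source string therefore needs the (?m) flag. This is a minimal sketch. The class name, method name and sample source are hypothetical.

```java
import java.util.regex.Pattern;

public class AddModifiers {
    // (?m) enables MULTILINE so that ^ matches at the start of each line
    private static final Pattern DECL =
        Pattern.compile("(?m)(^[\\s]+)(boolean|double|String|int|List)");

    // Hypothetical helper: $1 keeps the indentation, $2 keeps the type name
    static String addPublicStatic(String source) {
        return DECL.matcher(source).replaceAll("$1public static $2");
    }

    public static void main(String[] args) {
        String src = "class Config {\n"
                   + "    boolean a = false;\n"
                   + "    public static int b = 1;\n"
                   + "    String name = \"x\";\n"
                   + "}";
        System.out.println(addPublicStatic(src));
    }
}
```

The pattern checks only indentation plus a type name. As a result, indented local variable declarations inside method bodies would be rewritten too, so check the matches before replacing them all.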

Greedy vs. Reluctant

Consider two examples:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTut3 {

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?"; 
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create matcher object.
        Matcher m = r.matcher(line);

        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }

}

Output:

Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0

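The first group, (.*), uses a greedy quantifier. It grabs as many characters as possible while still letting the whole pattern match. The second group (\\d+) needs only a single digit to succeed, so the first group gives back just enough to leave one digit. Group 1 therefore captures "This order was placed for QT300", and group 2 gets only "0". A reluctant quantifier (.*?) does the opposite and matches as few characters as possible: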

String line = "This order was placed for QT3000! OK?";
Pattern pattern = Pattern.compile("(.*?)(\\d+)(.*)");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
    System.out.println("group 1: " + matcher.group(1));
    System.out.println("group 2: " + matcher.group(2));
    System.out.println("group 3: " + matcher.group(3));
}

Output:

group 1: This order was placed for QT
group 2: 3000
group 3: ! OK?
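The same difference shows up clearly when extracting HTML-like tags. The greedy .+ first runs to the end of the input and then backtracks only as far as the last '>'. The reluctant .+? grows one character at a time and stops at the first '>' that completes a match. This is a minimal sketch. The class name, the findAll helper and the input string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: collects every non-overlapping match of regex in input
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> and <i>italic</i>";
        // Greedy: one match spanning from the first '<' to the last '>'
        System.out.println(findAll("<.+>", html));   // [<b>bold</b> and <i>italic</i>]
        // Reluctant: each match ends at the nearest '>'
        System.out.println(findAll("<.+?>", html));  // [<b>, </b>, <i>, </i>]
    }
}
```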


The difference between find(), lookingAt() and matches()

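  • matches() is the strictest. It returns true only if the entire input string matches the pattern.
  • lookingAt() is less strict. Matching starts at the beginning of the input, and it returns true as long as some prefix of the input matches the pattern. The rest of the input is ignored.
  • find() is the least strict. It searches for the next matching subsequence anywhere in the input. The first call starts at the beginning of the input. Each later call resumes at the first character not consumed by the previous successful match, so repeated calls walk through all matches. After a successful call, group() returns the matched text.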
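The following minimal sketch checks this ordering on a few inputs. It creates a fresh Matcher for each call because a Matcher remembers where its previous match ended, which would affect find(). The class name and the compare helper are hypothetical.

```java
import java.util.regex.Pattern;

public class StrictnessDemo {
    // Hypothetical helper: runs each method on its own fresh Matcher
    static String compare(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return "matches=" + p.matcher(input).matches()
             + ", lookingAt=" + p.matcher(input).lookingAt()
             + ", find=" + p.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(compare("\\d+", "123"));     // matches=true, lookingAt=true, find=true
        System.out.println(compare("\\d+", "123abc"));  // matches=false, lookingAt=true, find=true
        System.out.println(compare("\\d+", "abc123"));  // matches=false, lookingAt=false, find=true
    }
}
```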

The following example also prints the position of each match:

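import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchMethods {
    public static void main(String[] args) {
        String[] names = {"sgn 11", "sgn", "3.sgn 4562.sgn"};
        Pattern pattern = Pattern.compile("sgn\\b", Pattern.CASE_INSENSITIVE);
        for (String name : names) {
            Matcher matcher = pattern.matcher(name);
            /* insert one of the three snippets below here */
        }
    }
}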
/* matches() */
if(matcher.matches()){
    System.out.println("<" + name + "> matches: <"+matcher.group() +
            "> index of ["+matcher.start()+", "+ matcher.end() + ")");
} else{
    System.out.println("<" + name + "> does not match");
}
/* output:
<sgn 11> does not match
<sgn> matches: <sgn> index of [0, 3)
<3.sgn 4562.sgn> does not match
*/
/* lookingAt() */
if(matcher.lookingAt()){
    System.out.println("<" + name + "> lookingAt: <"+matcher.group() +
            "> index of ["+matcher.start()+", "+ matcher.end() + ")");
} else{
    System.out.println("<" + name + "> does not lookingAt");
}
/* output:
<sgn 11> lookingAt: <sgn> index of [0, 3)
<sgn> lookingAt: <sgn> index of [0, 3)
<3.sgn 4562.sgn> does not lookingAt
*/
/* find() */
while(matcher.find()){
    System.out.println(name + " find: <" + matcher.group() + 
        "> index of ["+matcher.start()+", "+ matcher.end() + ")");
}
/* output:
sgn 11 find: <sgn> index of [0, 3)
sgn find: <sgn> index of [0, 3)
3.sgn 4562.sgn find: <sgn> index of [2, 5)
3.sgn 4562.sgn find: <sgn> index of [11, 14)
*/




Reposted from blog.csdn.net/weixin_40255793/article/details/79584121