README
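This article walks through several applications of regular expressions, from simple to more advanced.
Matching an email address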
String mail = "[email protected]";
String reg = "[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+";
if (mail.matches(reg))
    System.out.println("Match succeeded");
else
    System.out.println("Match failed");
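As a minimal sketch, the same check can be wrapped in a helper so the pattern is compiled only once. The class name `EmailCheckDemo`, the method `isEmail`, and all sample addresses are made up for illustration. The sample inputs also show what this particular pattern rejects: it allows no dot before the `@` and requires at least one dot-separated suffix after the domain.

```java
import java.util.regex.Pattern;

class EmailCheckDemo {
    // Same pattern as in the snippet above, compiled once for reuse.
    private static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+");

    // Returns true only if the whole string matches the pattern.
    static boolean isEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // All addresses below are hypothetical test inputs.
        System.out.println(isEmail("user_01@example.com"));    // true
        System.out.println(isEmail("user@mail.example.org"));  // true
        System.out.println(isEmail("first.last@example.com")); // false: '.' before '@' is not allowed by this pattern
        System.out.println(isEmail("user@example"));           // false: no ".xxx" suffix
    }
}
```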
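Using groups to replace code
In an IDE, regular expressions can be used to replace code. For example, to convert C++-style bool variable definitions into Java-style ones:
- Match pattern:
bool ([a-zA-Z]+)=0;
Replacement pattern:
boolean $1 = false;
- Match pattern:
bool ([a-zA-Z]+)=1;
Replacement pattern:
boolean $1 = true;
- Result:
bool a=0;
becomes
boolean a = false;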
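The same group-based replacement can also be tried outside an IDE. The sketch below uses `String.replaceAll`, which supports `$1` references in the replacement string; the class name `BoolToBooleanDemo` and the method `convert` are hypothetical names chosen for this example.

```java
class BoolToBooleanDemo {
    // Rewrites C++-style "bool x=0;" / "bool x=1;" definitions as Java-style ones.
    // $1 in the replacement refers to the variable name captured by the first group.
    static String convert(String source) {
        return source
                .replaceAll("bool ([a-zA-Z]+)=0;", "boolean $1 = false;")
                .replaceAll("bool ([a-zA-Z]+)=1;", "boolean $1 = true;");
    }

    public static void main(String[] args) {
        System.out.println(convert("bool a=0;"));    // boolean a = false;
        System.out.println(convert("bool done=1;")); // boolean done = true;
    }
}
```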
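To add the public static modifiers to every variable that does not yet have them:
- Match pattern:
(^[\s]+)(boolean|double|String|int|List)
Replacement pattern:
$1public static $2
- Result (on an indented line):
boolean a = false;
becomes
public static boolean a = false;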
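To reproduce this replacement in plain Java, the pattern needs the `Pattern.MULTILINE` flag so that `^` matches at the start of every line, which is how an IDE's search-and-replace typically treats it. This is only a sketch under that assumption; `AddModifiersDemo` and `addModifiers` are illustrative names.

```java
import java.util.regex.Pattern;

class AddModifiersDemo {
    // MULTILINE lets ^ match at the beginning of each line, not only at the start of the input.
    private static final Pattern FIELD = Pattern.compile(
            "(^[\\s]+)(boolean|double|String|int|List)", Pattern.MULTILINE);

    // $1 keeps the original indentation, $2 keeps the type name.
    static String addModifiers(String source) {
        return FIELD.matcher(source).replaceAll("$1public static $2");
    }

    public static void main(String[] args) {
        String source = "    boolean a = false;\n    public static int b = 1;\n";
        // The first line gains the modifiers; the second already starts with "public" and is left alone.
        System.out.print(addModifiers(source));
    }
}
```

Because the pattern requires at least one whitespace character before the type name, a declaration that starts in column 0 is not touched.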
Greedy and Reluctant
Consider two examples:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTut3 {
    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";
        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create matcher object.
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }
}
Output:
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
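The first group of the expression, (.*), uses a greedy quantifier: it matches as many characters as possible while still allowing the overall match to succeed. As long as the second group, (\\d+), can still match at least one digit, the first group takes everything else. With a reluctant quantifier, (.*?), the group instead matches as few characters as possible, as shown here: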
String line = "This order was placed for QT3000! OK?";
Pattern pattern = Pattern.compile("(.*?)(\\d+)(.*)");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
    System.out.println("group 1: " + matcher.group(1));
    System.out.println("group 2: " + matcher.group(2));
    System.out.println("group 3: " + matcher.group(3));
}
Output:
group 1: This order was placed for QT
group 2: 3000
group 3: ! OK?
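The difference is easier to see side by side on a shorter input. The following is a small sketch; the input "abc123", the class name `QuantifierDemo`, and the helper `firstTwoGroups` are all made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns groups 1 and 2 of the first match, or null if nothing matches.
    static String[] firstTwoGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String input = "abc123"; // hypothetical input
        String[] greedy = firstTwoGroups("(.*)(\\d+)", input);
        String[] reluctant = firstTwoGroups("(.*?)(\\d+)", input);
        // Greedy: group 1 grabs everything and gives back only the one digit \d+ needs.
        System.out.println(greedy[0] + " | " + greedy[1]);       // abc12 | 3
        // Reluctant: group 1 stops as soon as \d+ can start matching.
        System.out.println(reluctant[0] + " | " + reluctant[1]); // abc | 123
    }
}
```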
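Differences between find(), lookingAt(), and matches()
matches() is the strictest: it returns true only when the entire input string matches the pattern. lookingAt() comes next: it starts matching at the beginning of the input string and returns true as long as a prefix of the input matches. find() is the least strict and can also continue on to the next match: it searches from the beginning of the input (or, after a previous successful search, from the first character that match did not consume). In all three cases the matched text can be retrieved with group().
For example: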
String[] names = {"sgn 11", "sgn", "3.sgn 4562.sgn"};
Pattern pattern = Pattern.compile("sgn\\b", Pattern.CASE_INSENSITIVE);
for (String name : names) {
    Matcher matcher = pattern.matcher(name);
    /* the loop body for each search method is shown below */
}
/* matches() */
if (matcher.matches()) {
    System.out.println("<" + name + "> matches: <" + matcher.group() +
            "> index of [" + matcher.start() + ", " + matcher.end() + ")");
} else {
    System.out.println("<" + name + "> does not match");
}
/* output:
<sgn 11> does not match
<sgn> matches: <sgn> index of [0, 3)
<3.sgn 4562.sgn> does not match
*/
/* lookingAt() */
if (matcher.lookingAt()) {
    System.out.println("<" + name + "> lookingAt: <" + matcher.group() +
            "> index of [" + matcher.start() + ", " + matcher.end() + ")");
} else {
    System.out.println("<" + name + "> does not lookingAt");
}
/* output:
<sgn 11> lookingAt: <sgn> index of [0, 3)
<sgn> lookingAt: <sgn> index of [0, 3)
<3.sgn 4562.sgn> does not lookingAt
*/
/* find() */
while (matcher.find()) {
    System.out.println(name + " find: <" + matcher.group() +
            "> index of [" + matcher.start() + ", " + matcher.end() + ")");
}
/* output:
sgn 11 find: <sgn> index of [0, 3)
sgn find: <sgn> index of [0, 3)
3.sgn 4562.sgn find: <sgn> index of [2, 5)
3.sgn 4562.sgn find: <sgn> index of [11, 14)
*/
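The three fragments above can be condensed into one self-contained sketch that reports just the boolean result of each method for the same inputs. The class name `MatchModesDemo` and the method `describe` are illustrative. A fresh Matcher is created for each call because a Matcher keeps state between calls, so a find() issued after a successful matches() on the same object would resume after that match instead of starting over.

```java
import java.util.regex.Pattern;

class MatchModesDemo {
    private static final Pattern SGN =
            Pattern.compile("sgn\\b", Pattern.CASE_INSENSITIVE);

    // Runs each method on its own Matcher so the results do not influence each other.
    static String describe(String input) {
        boolean matches = SGN.matcher(input).matches();     // whole input must match
        boolean lookingAt = SGN.matcher(input).lookingAt(); // a prefix must match
        boolean find = SGN.matcher(input).find();           // any substring may match
        return "matches=" + matches + " lookingAt=" + lookingAt + " find=" + find;
    }

    public static void main(String[] args) {
        String[] names = {"sgn 11", "sgn", "3.sgn 4562.sgn"};
        for (String name : names) {
            System.out.println("<" + name + "> " + describe(name));
        }
        // <sgn 11> matches=false lookingAt=true find=true
        // <sgn> matches=true lookingAt=true find=true
        // <3.sgn 4562.sgn> matches=false lookingAt=false find=true
    }
}
```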