I recently did some web-crawler matching work and took the opportunity to brush up on regular expressions. The demos below are written in Java.
First, single-character matching
1. Matching any character
A dot (.) matches any single character:
System.out.println("abc".matches("a.c"));
System.out.println("a$c".matches("a.c"));
System.out.println("acc".matches("a.c"));
All three lines print true.
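A small sketch of what "any character" means in practice (the class and method names are hypothetical). The dot stands for exactly one character, and by default it does not match a line terminator unless the pattern is compiled with Pattern.DOTALL.

```java
import java.util.regex.Pattern;

class DotDemo {
    // '.' matches exactly one character, but not a line terminator by default.
    static boolean plain(String s) {
        return s.matches("a.c");
    }

    // With Pattern.DOTALL, '.' also matches line terminators such as '\n'.
    static boolean dotAll(String s) {
        return Pattern.compile("a.c", Pattern.DOTALL).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(plain("ac"));    // false: '.' needs exactly one character
        System.out.println(plain("abbc"));  // false: '.' matches only one character
        System.out.println(plain("a\nc"));  // false: '.' does not match '\n' by default
        System.out.println(dotAll("a\nc")); // true: DOTALL lifts that restriction
    }
}
```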
2. Matching digits
Use \d to match a single digit. In a Java string literal it must be written as \\d: the first backslash escapes the second, so the regex engine receives \d. The first line below prints true; the second prints false.
System.out.println("007".matches("00\\d"));
System.out.println("00a".matches("00\\d"));
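Because matches() checks the whole string, \d accepts exactly one digit in its position, so inputs with too few or too many characters fail as well. A sketch with a hypothetical helper isCode:

```java
class DigitDemo {
    // The Java literal "00\\d" is the regex 00\d: "00" followed by exactly one digit.
    static boolean isCode(String s) {
        return s.matches("00\\d");
    }

    public static void main(String[] args) {
        for (String in : new String[] {"007", "00a", "0071", "00"}) {
            System.out.println(in + " -> " + isCode(in));
        }
        // 007 -> true, 00a -> false, 0071 -> false, 00 -> false (one per line)
    }
}
```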
3. Matching word characters
Here "word characters" means letters, digits, or the underscore, matched with \w. All three lines below print true.
System.out.println("javac".matches("java\\w"));
System.out.println("java_".matches("java\\w"));
System.out.println("java1".matches("java\\w"));
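By default Java's \w is ASCII-only ([a-zA-Z_0-9]), so punctuation and non-ASCII letters such as the Chinese character U+4E2D are rejected. The sketch below (hypothetical class and method names) contrasts this with the Pattern.UNICODE_CHARACTER_CLASS flag, which widens \w to Unicode letters and digits. The source uses the escape \u4e2d so it stays ASCII.

```java
import java.util.regex.Pattern;

class WordCharDemo {
    // Default \w: [a-zA-Z_0-9] only.
    static boolean asciiWord(String s) {
        return s.matches("java\\w");
    }

    // UNICODE_CHARACTER_CLASS makes \w accept Unicode letters and digits as well.
    static boolean unicodeWord(String s) {
        return Pattern.compile("java\\w", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(asciiWord("java-"));         // false: '-' is not a word character
        System.out.println(asciiWord("java\u4e2d"));    // false: default \w is ASCII-only
        System.out.println(unicodeWord("java\u4e2d"));  // true with UNICODE_CHARACTER_CLASS
    }
}
```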
4. Matching whitespace
\s matches one whitespace character. Note that this covers not only the space but also the tab (written \t in Java), as well as line terminators such as \n and \r.
System.out.println("java ".matches("java\\s"));
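A sketch with a hypothetical helper showing that \s accepts a tab or a newline, but still stands for exactly one character:

```java
class SpaceDemo {
    // In Java, \s is [ \t\n\x0B\f\r]; here it must match exactly one such character.
    static boolean endsWithOneSpace(String s) {
        return s.matches("java\\s");
    }

    public static void main(String[] args) {
        System.out.println(endsWithOneSpace("java\t")); // true: tab
        System.out.println(endsWithOneSpace("java\n")); // true: newline
        System.out.println(endsWithOneSpace("java  ")); // false: two spaces
        System.out.println(endsWithOneSpace("java"));   // false: no whitespace at all
    }
}
```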
5. Matching non-digits
\d matches a digit, while \D matches any non-digit character. For example, 00\D matches "00A" but not "001".
Matching rules
Regex | Rule | Can match
A | The specified character | A
\u548c | The specified Unicode character | 和
. | Any character | a, b, &, 0
\d | Digits 0 to 9 | 0~9
\w | Upper- and lowercase letters, digits, and underscore | a~z, A~Z, 0~9, _
\s | Whitespace such as space or tab | Space, Tab
\D | Non-digit | a, A, &, _, ……
\W | Non-\w | &, @, 中, ……
\S | Non-\s | a, A, &, _, ……
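The rows of the table above can be checked one by one. This sketch uses a hypothetical helper check that simply wraps String.matches(); it exercises the Unicode escape and the negated classes \D, \W and \S.

```java
class NegatedClassDemo {
    // Hypothetical helper: whole-string match, same as String.matches().
    static boolean check(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(check("\u548c", "\\u548c")); // true: regex escape for U+548C
        System.out.println(check("00A", "00\\D"));      // true: 'A' is not a digit
        System.out.println(check("001", "00\\D"));      // false: '1' is a digit
        System.out.println(check("a&", "a\\W"));        // true: '&' is not a word character
        System.out.println(check("a_", "a\\W"));        // false: '_' is a word character
        System.out.println(check("ab", "a\\S"));        // true: 'b' is not whitespace
        System.out.println(check("a\t", "a\\S"));       // false: tab is whitespace
    }
}
```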
Second, repeated matching
The * modifier matches the preceding element any number of times, including zero times.
System.out.println("java".matches("java\\d*"));
System.out.println("java1".matches("java\\d*"));
System.out.println("java11".matches("java\\d*"));
System.out.println("java123".matches("java\\d*"));
All four lines print true.
The + modifier matches the preceding element at least once.
The ? modifier matches the preceding element zero times or once.
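To compare the three modifiers side by side, this sketch (hypothetical class and method names) runs the same inputs against java\d*, java\d+ and java\d?:

```java
class QuantifierDemo {
    // Returns the results for *, + and ? in that order, e.g. "true,false,true".
    static String check(String s) {
        return s.matches("java\\d*") + "," + s.matches("java\\d+") + "," + s.matches("java\\d?");
    }

    public static void main(String[] args) {
        for (String in : new String[] {"java", "java1", "java12"}) {
            System.out.println(in + ": " + check(in));
        }
        // java: true,false,true
        // java1: true,true,true
        // java12: true,true,false
    }
}
```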
Use {n} to match exactly n occurrences:
System.out.println("java123".matches("java\\d{3}")); // true
System.out.println("java1".matches("java\\d{3}"));   // false
Matching rules
Regex | Rule | Can match
A* | Any number of occurrences (including none) | empty, A, AA, AAA, ……
A+ | At least one occurrence | A, AA, AAA, ……
A? | Zero or one occurrence | empty, A
A{3} | Exactly the specified number of occurrences | AAA
A{2,3} | A number of occurrences within the specified range | AA, AAA
A{2,} | At least n occurrences | AA, AAA, AAAA, ……
A{0,3} | At most n occurrences | empty, A, AA, AAA
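The range rows can be verified by trying every string from "" to "AAAA" against each pattern. This is a sketch with a hypothetical accepted helper; it uses String.repeat, so it assumes Java 11 or later.

```java
class RangeDemo {
    // Lists which of "", "A", ..., "AAAA" the regex accepts ("" is shown as "empty").
    static String accepted(String regex) {
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n <= 4; n++) {
            String s = "A".repeat(n);
            if (s.matches(regex)) {
                if (sb.length() > 0) sb.append(',');
                sb.append(s.isEmpty() ? "empty" : s);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(accepted("A{2,3}")); // AA,AAA
        System.out.println(accepted("A{2,}"));  // AA,AAA,AAAA (only lengths up to 4 are tried)
        System.out.println(accepted("A{0,3}")); // empty,A,AA,AAA
    }
}
```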
Third, complex matching
(1) Matching the beginning and end
^ matches the beginning of the string and $ matches the end. For example, ^A\d{3}$ matches "A001" and "A380".
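^ and $ matter most when searching with Matcher.find(), which looks for the pattern anywhere in the input. String.matches() already requires the entire string to match, so there the anchors are redundant. A sketch with a hypothetical found helper:

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("A\\d{3}", "xA001y"));   // true: "A001" occurs inside
        System.out.println(found("^A\\d{3}$", "xA001y")); // false: anchors require the whole string
        System.out.println(found("^A\\d{3}$", "A380"));   // true
        System.out.println("A380".matches("A\\d{3}"));    // true: matches() checks the whole string anyway
    }
}
```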
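(2) Matching a specified range
[...] matches any single character from the set or range inside the brackets, for example [123456789]\d{6,7}, [1-9], [0-9a-fA-F], and [^1-9]{3}. A ^ right after the opening bracket negates the set, so [^1-9] matches any character that is not 1 to 9.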
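A sketch that tries the bracket expressions above on a few inputs (the helper check and the sample inputs are hypothetical):

```java
class CharClassDemo {
    // Hypothetical helper: whole-string match, same as String.matches().
    static boolean check(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(check("12345678", "[1-9]\\d{6,7}")); // true
        System.out.println(check("01234567", "[1-9]\\d{6,7}")); // false: first character must be 1-9
        System.out.println(check("7F", "[0-9a-fA-F]{2}"));      // true: two hex digits
        System.out.println(check("7G", "[0-9a-fA-F]{2}"));      // false: 'G' is not a hex digit
        System.out.println(check("A0b", "[^1-9]{3}"));          // true: none of the three is 1-9
        System.out.println(check("a1b", "[^1-9]{3}"));          // false: '1' is excluded
    }
}
```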
(3) OR matching
| joins two regex rules with OR; for example, AB|CD matches either AB or CD.
(4) Using parentheses
Parentheses group part of a rule, so the | applies only to the alternatives inside them:
String re = "learn\\s(java|php|go)";
System.out.println("learn java".matches(re)); // true
System.out.println("learn Java".matches(re)); // false: matching is case-sensitive
System.out.println("learn php".matches(re));  // true
System.out.println("learn Go".matches(re));   // false
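Two follow-ups to the example above, sketched with hypothetical names: compiling with Pattern.CASE_INSENSITIVE lets "learn Java" and "learn Go" match too, and leaving out the parentheses changes the meaning, because | then splits the whole regex into "learn\sjava", "php" and "go".

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // CASE_INSENSITIVE ignores letter case for the whole pattern.
    private static final Pattern RE =
            Pattern.compile("learn\\s(java|php|go)", Pattern.CASE_INSENSITIVE);

    static boolean learns(String s) {
        return RE.matcher(s).matches();
    }

    // Without parentheses: "learn java" OR "php" OR "go".
    static boolean noParens(String s) {
        return s.matches("learn\\sjava|php|go");
    }

    public static void main(String[] args) {
        System.out.println(learns("learn Java")); // true
        System.out.println(learns("learn Go"));   // true
        System.out.println(learns("learn ruby")); // false
        System.out.println(noParens("go"));       // true: "go" alone is one of the alternatives
        System.out.println(noParens("learn go")); // false
    }
}
```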
Regex | Rule | Can match
^ | Beginning | Beginning of the string
$ | End | End of the string
[ABC] | Any one character inside [...] | A, B, C
[A-F0-9xy] | Characters in the specified ranges | A, ……, F, 0, ……, 9, x, y
[^A-F] | Any character outside the specified range | Anything except A~F
AB|CD|EF | AB or CD or EF | AB, CD, EF
Fourth, group matching
A very important use of grouping is extracting substrings.
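import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 captures the area code, group 2 the local number.
Pattern p = Pattern.compile("(\\d{3,4})\\-(\\d{7,8})");
Matcher m = p.matcher("010-12345678");
if (m.matches()) {
String g1 = m.group(1);
String g2 = m.group(2);
System.out.println(g1);
System.out.println(g2);
} else {
System.out.println("Match failed!");
}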
Note that in Matcher.group(index), index 1 refers to the first substring and 2 to the second. What do we get if we pass 0? The entire string matched by the regex, i.e. 010-12345678. The earlier code used String.matches(), while the grouping code uses the Pattern and Matcher classes from the java.util.regex package. The two approaches are essentially the same, because String.matches() itself calls Pattern and Matcher internally.
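To see every group index at once, this sketch (the groups helper is hypothetical) uses Matcher.groupCount(), which returns the number of capturing groups and does not count group 0:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Returns group(0) .. group(groupCount()) for a full match, or an empty array.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("(\\d{3,4})\\-(\\d{7,8})", "010-12345678");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
        // 0: 010-12345678
        // 1: 010
        // 2: 12345678
    }
}
```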
However, calling String.matches() many times with the same regex is inefficient, because each call creates a new Pattern object. It is better to create the Pattern object once and reuse it, so the regex is compiled once and matched many times:
Pattern pattern = Pattern.compile("(\\d{3,4})\\-(\\d{7,8})");
pattern.matcher("010-12345678").matches(); // true
pattern.matcher("021-123456").matches(); // false: only 6 digits after the hyphen
pattern.matcher("022#1234567").matches(); // false
// Get a Matcher object:
Matcher matcher = pattern.matcher("010-12345678");
if (matcher.matches()) {
String whole = matcher.group(0); // "010-12345678", 0 denotes the whole matched string
String area = matcher.group(1); // "010", 1 denotes the 1st matched substring
String tel = matcher.group(2); // "12345678", 2 denotes the 2nd matched substring
System.out.println(area);
System.out.println(tel);
}
Non-greedy matching:
For example, given the string 1230000, suppose we want to extract the leading 123 and the trailing zeros 0000 as two separate strings.
If we match with "(\\d+)(0*)", the \d+ consumes all of 1230000, leaving the second group as the empty string "". This is greedy matching. Adding ? after a quantifier, as in +?, makes the match non-greedy:
Pattern pattern = Pattern.compile("(\\d+?)(0*)");
Matcher matcher = pattern.matcher("1230000");
if (matcher.matches()) {
System.out.println("group1=" + matcher.group(1)); // "123"
System.out.println("group2=" + matcher.group(2)); // "0000"
}
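The two behaviours side by side, as a sketch with a hypothetical split helper: with the greedy \d+ the first group swallows every digit, while the lazy \d+? takes as few digits as possible and leaves the zeros to 0*.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns "group1|group2" for a full match, or null if the input does not match.
    static String split(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(split("(\\d+)(0*)", "1230000"));  // 1230000|  (greedy: group 2 is empty)
        System.out.println(split("(\\d+?)(0*)", "1230000")); // 123|0000  (non-greedy)
    }
}
```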
Fifth, search and replace
Regular expressions can also be used to split, search, and replace strings.
(1) Splitting strings
"a b c".split("\\s"); // { "a", "b", "c" }
"a b  c".split("\\s"); // { "a", "b", "", "c" } (two spaces between b and c)
"a, b ;; c".split("[\\,\\;\\s]+"); // { "a", "b", "c" }
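One detail worth knowing about String.split(): trailing empty strings are dropped by default, while passing a negative limit keeps them. A sketch with hypothetical helper names:

```java
import java.util.Arrays;

class SplitDemo {
    // Default split: trailing empty strings are removed.
    static String[] defaultSplit(String s) {
        return s.split(",");
    }

    // A negative limit keeps trailing empty strings.
    static String[] keepTrailing(String s) {
        return s.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(defaultSplit("a,b,,"))); // [a, b]
        System.out.println(Arrays.toString(keepTrailing("a,b,,"))); // [a, b, , ]
    }
}
```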
(2) Searching strings
String s = "the quick brown fox jumps over the lazy dog.";
Pattern p = Pattern.compile("\\wo\\w");
Matcher m = p.matcher(s);
while (m.find()) {
String sub = s.substring(m.start(), m.end());
System.out.println(sub);
}
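The same search can collect its results into a list. Inside the loop, m.group() returns the same text as s.substring(m.start(), m.end()). This is a sketch; findAll is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Collects every non-overlapping match of regex in s, in order.
    static List<String> findAll(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\wo\\w", "the quick brown fox jumps over the lazy dog."));
        // [row, fox, dog]
    }
}
```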
(3) Replacing strings
String s = "The quick\t\t brown fox jumps over the lazy dog.";
String r = s.replaceAll("\\s+", " ");
System.out.println(r); // "The quick brown fox jumps over the lazy dog."
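Two related replacement tools, sketched with hypothetical helper names: replaceFirst() replaces only the first match, and Matcher.quoteReplacement() escapes $ and \ so they are inserted literally. Unquoted, a replacement such as "$5" would be read as a reference to group 5 and throw an exception, since the pattern has no such group.

```java
import java.util.regex.Matcher;

class ReplaceDemo {
    // Collapses only the first run of whitespace into a single space.
    static String collapseFirst(String s) {
        return s.replaceFirst("\\s+", " ");
    }

    // Replaces every number with the literal text "$5".
    static String literalDollar(String s) {
        return s.replaceAll("\\d+", Matcher.quoteReplacement("$5"));
    }

    public static void main(String[] args) {
        System.out.println(collapseFirst("a  b  c")); // "a b  c"
        System.out.println(literalDollar("cost 10")); // "cost $5"
    }
}
```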
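(4) Back references
In the replacement string, $1 refers to the substring captured by group 1. The example below wraps every four-letter word that is surrounded by whitespace in <b></b> tags: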
String s = "the quick brown fox jumps over the lazy dog.";
String r = s.replaceAll("\\s([a-z]{4})\\s", " <b>$1</b> ");
System.out.println(r); // "the quick brown fox jumps <b>over</b> the <b>lazy</b> dog."
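A back reference can also appear inside the pattern itself: \1 must match the same text that group 1 captured. The sketch below (dedupe is a hypothetical name) uses that to remove immediately repeated words, with $1 putting a single copy back.

```java
class BackRefDemo {
    // \b(\w+)\s+\1\b finds a word followed by whitespace and the same word again.
    static String dedupe(String s) {
        return s.replaceAll("\\b(\\w+)\\s+\\1\\b", "$1");
    }

    public static void main(String[] args) {
        System.out.println(dedupe("this is is a test test.")); // this is a test.
    }
}
```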