Simple string handling can be done directly with the methods of String, but for more complex matching those methods are not enough. That is where regular expressions come in!
One: Regular expression matching rules
Characters | |
B | The specific character B |
\xhh | The character with hexadecimal value 0xhh |
\uhhhh | The Unicode character with hexadecimal value 0xhhhh |
\t | Tab |
\n | Newline |
\r | Carriage return |
\f | Form feed |
\e | Escape |
Character classes | |
. | Any character |
[abc] | Any one of the characters a, b, or c (same as a|b|c) |
[^abc] | Any character except a, b, and c (negation) |
[a-zA-Z] | Any character from a to z or from A to Z (range) |
[abc[hij]] | Any one of the characters a, b, c, h, i, or j (same as a|b|c|h|i|j) (union) |
[a-z&&[hij]] | Any one of h, i, or j (intersection) |
\s | A whitespace character (space, tab, newline, form feed, or carriage return) |
\S | A non-whitespace character ([^\s]) |
\d | A digit [0-9] |
\D | A non-digit [^0-9] |
\w | A word character [a-zA-Z_0-9] |
\W | A non-word character [^\w] |
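
As a rough illustration of the character classes above, the following sketch checks a few inputs with Pattern.matches, which requires the whole input to match. The class name CharacterClassDemo and the helper fullMatch are made up for this example. Note that in a Java string literal each backslash has to be doubled.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class CharacterClassDemo {
    // Returns true only when the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[abc]", "b"));          // true
        System.out.println(fullMatch("[^abc]", "b"));         // false (negation)
        System.out.println(fullMatch("[abc[hij]]", "i"));     // true (union)
        System.out.println(fullMatch("[a-z&&[hij]]", "h"));   // true (intersection)
        System.out.println(fullMatch("[a-z&&[hij]]", "k"));   // false
        System.out.println(fullMatch("\\d\\d", "42"));        // true
        System.out.println(fullMatch("\\w+", "snake_case1")); // true: underscore is a word character
        System.out.println(fullMatch("\\w+", "two words"));   // false: space is not a word character
    }
}
```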
Logical operators | |
XY | X followed by Y |
X|Y | X or Y |
(X) | A capturing group. The ith capturing group can be referenced later in the expression with \i |
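
To make the capturing group and its back-reference more concrete, here is a small sketch that looks for a doubled word such as "is is". The class GroupDemo and the method firstRepeatedWord are hypothetical names chosen for this example; the pattern uses (\w+) as group 1 and \1 to require the same text again.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GroupDemo {
    // Returns the first word that is immediately repeated, or null if none is found.
    static String firstRepeatedWord(String text) {
        // (\w+) captures a word as group 1; \1 must then match that same word again.
        Matcher m = Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("this is is a test")); // is
        System.out.println(firstRepeatedWord("no repeats here"));   // null
        System.out.println("dog".matches("cat|dog"));               // true (X|Y)
    }
}
```

The leading \b matters here: without it, the "is" at the end of "this" followed by the next "is" would already count as a repeat.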
Boundary matchers | |
^ | Beginning of a line |
$ | End of a line |
\b | Word boundary |
\B | Non-word boundary |
\G | End of the previous match |
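
The boundary matchers can be tried out with a sketch like the one below. The class BoundaryDemo and the helper findAll are names invented for this example. One detail worth knowing: in Java, ^ and $ match at the start and end of the whole input by default, and only match at every line when Pattern.MULTILINE is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class BoundaryDemo {
    // Collects every substring of text matched by regex, in order of appearance.
    static List<String> findAll(String regex, int flags, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String lines = "one two\nthree four";
        // \b keeps "cat" from matching inside "concat".
        System.out.println(findAll("\\bcat\\b", 0, "cat concat cat's")); // [cat, cat]
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(findAll("^\\w+", 0, lines));                  // [one]
        System.out.println(findAll("^\\w+", Pattern.MULTILINE, lines));  // [one, three]
        System.out.println(findAll("\\w+$", 0, lines));                  // [four]
        // \G anchors each match to the end of the previous one.
        System.out.println(findAll("\\G\\d", 0, "123a45"));              // [1, 2, 3]
    }
}
```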
Quantifiers
Quantifiers describe the way a pattern absorbs input text:
Greedy: Quantifiers are greedy unless otherwise specified. A greedy expression matches as much of the input as it possibly can. A common source of mistakes is assuming that the pattern will match only the first possible group of characters, when a greedy quantifier actually keeps consuming input for as long as the match can still succeed.
Reluctant: Specified with a question mark, this quantifier matches the minimum number of characters required to satisfy the pattern. It is also called lazy, minimal-matching, or non-greedy.
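Possessive: This type of quantifier is available in Java, but not every regular expression engine supports it. When a regular expression is applied to a string, it generates a large number of states so that it can backtrack if a match fails. Possessive quantifiers do not keep these intermediate states, so they prevent backtracking. They are often used to keep a regular expression from running away, and they can make it execute more efficiently.
Greedy | Reluctant | Possessive | Matches |
X? | X?? | X?+ | X, one or zero times |
X* | X*? | X*+ | X, zero or more times |
X+ | X+? | X++ | X, one or more times |
X{n} | X{n}? | X{n}+ | X, exactly n times |
X{n,} | X{n,}? | X{n,}+ | X, at least n times |
X{n,m} | X{n,m}? | X{n,m}+ | X, at least n but not more than m times |
Note that when X is more than a single character, it must be enclosed in parentheses. For example, if X is abc, then X? must be written as (abc)?. Written as abc?, the quantifier applies only to c.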
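
The difference between the three quantifier types, and the need for parentheses, can be seen in a short sketch. The class QuantifierDemo, the helper firstMatch, and the sample string are assumptions made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuantifierDemo {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> and <i>italic</i>";
        // Greedy: .+ grabs as much as possible, so the match runs to the last '>'.
        System.out.println(firstMatch("<.+>", html));  // <b>bold</b> and <i>italic</i>
        // Reluctant: .+? stops at the first '>' that lets the pattern succeed.
        System.out.println(firstMatch("<.+?>", html)); // <b>
        // Possessive: .++ consumes the rest of the input and never gives the final '>' back.
        System.out.println(firstMatch("<.++>", html)); // null

        // Parentheses decide what the quantifier applies to.
        System.out.println("abcabc".matches("(abc)+")); // true: the group abc repeats
        System.out.println("abcabc".matches("abc+"));   // false: only c repeats
        System.out.println("abccc".matches("abc+"));    // true
    }
}
```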