Regular Expressions in Java SE 6: Matching Rules

Notes:
	1. Inside a character class [], parentheses () do not capture or group;
	  they are treated as literal characters.
	  E.g.:
	  	The domain must be one of QQ Mail, Gmail, Outlook, or 163 Mail.
	  	(qq|gmail|outlook|163){1}\.com is correct; [(qq)|(gmail)|(outlook)|(163)]\.com is wrong,
	  	because a character class matches only a single character from its set.
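
The note above can be checked with a minimal sketch. The class and method names (GroupVsClassDemo, grouped, bracketed) are hypothetical and exist only for illustration; the provider list is assumed to be qq, gmail, outlook, and 163.

```java
import java.util.regex.Pattern;

class GroupVsClassDemo {
    // Alternation inside a group: matches one whole provider name.
    static final Pattern GROUPED =
            Pattern.compile("(qq|gmail|outlook|163)\\.com");
    // Character class: matches exactly ONE character from the set,
    // and '(' ')' '|' are ordinary members of that set.
    static final Pattern BRACKETED =
            Pattern.compile("[(qq)|(gmail)|(outlook)|(163)]\\.com");

    static boolean grouped(String host) {
        return GROUPED.matcher(host).matches();
    }

    static boolean bracketed(String host) {
        return BRACKETED.matcher(host).matches();
    }

    public static void main(String[] args) {
        System.out.println(grouped("gmail.com"));   // true
        System.out.println(bracketed("gmail.com")); // false: the class consumes one character only
        System.out.println(bracketed("(.com"));     // true: '(' is a literal member of the class
    }
}
```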

Construct Matches
 
Characters
x The character x
\\ The backslash character
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\uhhhh The character with hexadecimal value 0xhhhh
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
\cx The control character corresponding to x
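
A small sketch of how a few of these escapes look in Java source, where every regex backslash has to be doubled inside a string literal. EscapeDemo and its matches helper are hypothetical names for illustration.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Thin wrapper around Pattern.matches, which requires the whole input to match.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\x41", "A"));     // hex escape, 0x41 is 'A'
        System.out.println(matches("\\u0041", "A"));   // Unicode escape resolved by the regex engine
        System.out.println(matches("\\0101", "A"));    // octal 101 is decimal 65
        System.out.println(matches("\\cA", "\u0001")); // control-A
        System.out.println(matches("\\\\", "\\"));     // one literal backslash
    }
}
```

Each call above is expected to print true.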
 
Character classes
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
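
The union, intersection, and subtraction forms can be tried out with a sketch like the following, written with explicit ranges. CharClassOpsDemo and its in helper are hypothetical names.

```java
import java.util.regex.Pattern;

class CharClassOpsDemo {
    // True when the single character c belongs to the given character class.
    static boolean in(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(in("[a-d[m-p]]", 'n'));    // union: true
        System.out.println(in("[a-d[m-p]]", 'e'));    // union: false
        System.out.println(in("[a-z&&[def]]", 'e'));  // intersection: true
        System.out.println(in("[a-z&&[^bc]]", 'b'));  // subtraction: false
        System.out.println(in("[a-z&&[^m-p]]", 'l')); // subtraction: true
    }
}
```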
 
Predefined character classes
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
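
Two details of the predefined classes are easy to miss: whether . matches a line terminator depends on the DOTALL flag, and with default flags \d and \w cover only the ASCII sets listed. The sketch below illustrates both; PredefinedDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

class PredefinedDemo {
    // Does '.' match a newline, with or without the DOTALL flag?
    static boolean dotMatchesNewline(boolean dotAll) {
        Pattern p = dotAll ? Pattern.compile(".", Pattern.DOTALL) : Pattern.compile(".");
        return p.matcher("\n").matches();
    }

    static boolean isDigit(String s) {
        return Pattern.matches("\\d", s);
    }

    static boolean isWord(String s) {
        return Pattern.matches("\\w+", s);
    }

    public static void main(String[] args) {
        System.out.println(dotMatchesNewline(false)); // false
        System.out.println(dotMatchesNewline(true));  // true
        System.out.println(isDigit("7"));             // true
        System.out.println(isDigit("\u0663"));        // false: ARABIC-INDIC DIGIT THREE is outside [0-9]
        System.out.println(isWord("snake_case_9"));   // true: underscore and digits are word characters
    }
}
```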
 
POSIX character classes (US-ASCII only)
\p{Lower} A lowercase alphabetic character: [a-z]
\p{Upper} An uppercase alphabetic character: [A-Z]
\p{ASCII} All ASCII: [\x00-\x7F]
\p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit} A decimal digit: [0-9]
\p{Alnum} An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct} Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}\x20]
\p{Blank} A space or a tab: [ \t]
\p{Cntrl} A control character: [\x00-\x1F\x7F]
\p{XDigit} A hexadecimal digit: [0-9a-fA-F]
\p{Space} A whitespace character: [ \t\n\x0B\f\r]
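
As a rough illustration of the POSIX classes under default flags, the sketch below strips ASCII punctuation, validates a hex string, and shows that \p{Alpha} rejects a non-ASCII letter. PosixDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

class PosixDemo {
    // Removes every ASCII punctuation character.
    static String stripPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    // True when s consists of one or more hexadecimal digits.
    static boolean isHex(String s) {
        return Pattern.matches("\\p{XDigit}+", s);
    }

    // True when s is a single US-ASCII letter.
    static boolean isAsciiLetter(String s) {
        return Pattern.matches("\\p{Alpha}", s);
    }

    public static void main(String[] args) {
        System.out.println(stripPunct("Hello, world!")); // Hello world
        System.out.println(isHex("CAFEbabe09"));         // true
        System.out.println(isAsciiLetter("\u00E9"));     // false: e-acute is not US-ASCII
    }
}
```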
 
java.lang.Character classes (simple Java character type)
\p{javaLowerCase} Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace} Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored} Equivalent to java.lang.Character.isMirrored()
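
Unlike the US-ASCII POSIX classes, these classes defer to the corresponding java.lang.Character methods, so they also accept non-ASCII characters. A minimal sketch, with JavaCharClassDemo and its matches helper as hypothetical names:

```java
import java.util.regex.Pattern;

class JavaCharClassDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // e-acute is lowercase according to Character.isLowerCase
        System.out.println(matches("\\p{javaLowerCase}", "\u00E9"));  // true
        // EM SPACE is whitespace for Character.isWhitespace, but not for the default \s
        System.out.println(matches("\\p{javaWhitespace}", "\u2003")); // true
        System.out.println(matches("\\s", "\u2003"));                 // false
        // '(' has a mirrored counterpart
        System.out.println(matches("\\p{javaMirrored}", "("));        // true
    }
}
```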
 
Classes for Unicode blocks and categories
\p{InGreek} A character in the Greek block (simple block)
\p{Lu} An uppercase letter (simple category)
\p{Sc} A currency symbol
\P{InGreek} Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction)
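
A short sketch of block and category matching, using \u escapes in the test strings so the source file stays ASCII. UnicodeDemo and its matches helper are hypothetical names.

```java
import java.util.regex.Pattern;

class UnicodeDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\p{InGreek}", "\u03B1"));    // true: GREEK SMALL LETTER ALPHA
        System.out.println(matches("\\P{InGreek}", "a"));         // true: Latin 'a' is outside the block
        System.out.println(matches("\\p{Lu}", "\u03A3"));         // true: GREEK CAPITAL LETTER SIGMA
        System.out.println(matches("\\p{Sc}", "\u20AC"));         // true: EURO SIGN
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "a")); // true: a letter, not uppercase
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "A")); // false
    }
}
```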
 
Boundary matchers
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
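
The boundary matchers are zero-width, so their effect is easiest to see by counting or locating matches. The sketch below assumes default flags unless MULTILINE is passed explicitly; BoundaryDemo, count, and found are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts successive find() hits of p in input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Without MULTILINE, ^ anchors only at the start of the input.
        System.out.println(count(Pattern.compile("^\\w+"), "one\ntwo"));                    // 1
        System.out.println(count(Pattern.compile("^\\w+", Pattern.MULTILINE), "one\ntwo")); // 2
        System.out.println(found("\\bcat\\b", "concatenate")); // false: no word boundary around "cat"
        System.out.println(found("end\\Z", "the end\n"));      // true: \Z ignores the final terminator
        System.out.println(found("end\\z", "the end\n"));      // false: \z is the very end of input
        // \G chains each match to the end of the previous one, so matching stops at 'a'.
        System.out.println(count(Pattern.compile("\\G\\d"), "12a3")); // 2
    }
}
```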
 
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n times, but not more than m times
 
Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n times, but not more than m times
 
Possessive quantifiers
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n times, but not more than m times
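
The three quantifier families accept the same repetition counts and differ only in how they give characters back. A minimal sketch comparing them on one input follows; QuantifierDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matched by regex, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: takes as much as possible, then backtracks to the last '>'.
        System.out.println(firstMatch("<.+>", input));  // <a><b>
        // Reluctant: takes as little as possible, stopping at the first '>'.
        System.out.println(firstMatch("<.+?>", input)); // <a>
        // Possessive: takes everything and never gives back, so the final '>' cannot match.
        System.out.println(firstMatch("<.++>", input)); // null
    }
}
```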
 
Logical operators
XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group
 
Back references
\n Whatever the nth capturing group matched
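
A back reference matches the same text that the group captured, not the group's pattern. The sketch below uses \1 to find a doubled letter and to collapse a repeated word; BackrefDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // True when some word character is immediately followed by itself.
    static boolean hasDoubledLetter(String s) {
        return Pattern.compile("(\\w)\\1").matcher(s).find();
    }

    // Collapses an immediately repeated word ("the the") into one occurrence.
    static String dedupeWords(String s) {
        return s.replaceAll("\\b(\\w+)\\s+\\1\\b", "$1");
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledLetter("letter"));             // true
        System.out.println(hasDoubledLetter("word"));               // false
        System.out.println(dedupeWords("Paris in the the spring")); // Paris in the spring
    }
}
```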
 
Quotation
\ Nothing, but quotes the following character
\Q Nothing, but quotes all characters until \E
\E Nothing, but ends quoting started by \Q
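
Quoting matters whenever the text to match contains metacharacters. The sketch below compares an unquoted pattern with a single escape, a \Q...\E span, and Pattern.quote, which builds a quoted literal for an arbitrary string. QuoteDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String literal = "1+1=2";
        // Unquoted, '+' is a quantifier: the pattern means "one or more 1s, then 1=2".
        System.out.println(matches("1+1=2", literal));                // false
        System.out.println(matches("1+1=2", "11=2"));                 // true
        // Three ways to treat '+' literally.
        System.out.println(matches("1\\+1=2", literal));              // true
        System.out.println(matches("\\Q1+1=2\\E", literal));          // true
        System.out.println(matches(Pattern.quote(literal), literal)); // true
    }
}
```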
 
Special constructs (non-capturing)
(?:X) X, as a non-capturing group
(?idmsux-idmsux) Nothing, but turns match flags i d m s u x on - off
(?idmsux-idmsux:X) X, as a non-capturing group with the given flags i d m s u x on - off
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group
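
None of these constructs create a capturing group, and the lookaround forms consume no input. The sketch below exercises several of them; SpecialConstructDemo, firstMatch, and found are hypothetical names, and the sample strings are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecialConstructDemo {
    // Returns the first substring matched by regex, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Lookahead: the " USD" suffix is required but not part of the match.
        System.out.println(firstMatch("\\d+(?= USD)", "costs 100 USD"));   // 100
        // Lookbehind: the '$' prefix is required but not part of the match.
        System.out.println(firstMatch("(?<=\\$)\\d+", "price $42"));       // 42
        // Negative lookbehind: skips the number that directly follows '$'.
        System.out.println(firstMatch("(?<!\\$)\\b\\d+", "$42 and 7"));    // 7
        // Negative lookahead: "foo" not followed by "bar".
        System.out.println(firstMatch("foo(?!bar)\\w*", "foobar foobaz")); // foobaz
        // Inline flag: case-insensitive matching.
        System.out.println(Pattern.matches("(?i)hello", "HELLO"));         // true
        // Non-capturing group: only (c) is counted.
        System.out.println(Pattern.compile("(?:ab)+(c)").matcher("").groupCount()); // 1
        // Independent group: a+ keeps all the a's, so the trailing "ab" cannot match.
        System.out.println(found("(?>a+)ab", "aaab")); // false
        System.out.println(found("a+ab", "aaab"));     // true
    }
}
```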

  

