I hope someone can add anything I have missed.
character |
Function and Introduction |
one, |
non-printing characters |
\cx |
Matches the control character indicated by x. For example, \cM matches Control-M (a carriage return). x must be a letter in A-Z or a-z; otherwise, c is treated as a literal "c" character. Note that Python's re module does not support \cx |
\f |
Matches a form feed character. Equivalent to \x0c and \cL |
\n |
Matches a newline character. Equivalent to \x0a and \cJ |
\r |
Matches a carriage return. Equivalent to \x0d and \cM |
\s |
Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v] (note the leading space). Unicode regular expressions also match full-width spaces |
\S |
Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v] |
\t |
Matches a tab character. Equivalent to \x09 and \cI |
\v |
Matches a vertical tab character. Equivalent to \x0b and \cK |
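The escapes in this section can be checked with a few lines of Python. This is only a small sketch using the standard re module; the sample string is made up for illustration, and \cx is left out because Python's re does not accept it.

```python
import re

# Each escape is checked against the hex escape listed next to it in the table.
assert re.fullmatch(r"\f", "\x0c")
assert re.fullmatch(r"\n", "\x0a")
assert re.fullmatch(r"\r", "\x0d")
assert re.fullmatch(r"\t", "\x09")
assert re.fullmatch(r"\v", "\x0b")

sample = "a b\tc\nd"  # hypothetical sample text

# \s picks out the space, the tab and the newline.
whitespace = re.findall(r"\s", sample)

# \S picks out everything else.
non_whitespace = re.findall(r"\S", sample)

# With str patterns, \s also matches the full-width space U+3000.
full_width = re.fullmatch(r"\s", "\u3000")
```

Note that \s matches an ordinary space as well as the control characters.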
two, |
Special characters |
() |
Marks the start and end of a subexpression (a group). |
. |
Matches any single character except the newline \n. It must consume a character, so it never matches an empty position. |
[] |
Matches any one of the characters listed inside the square brackets. For example, [bce] matches one of b, c, or e. |
\ |
Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, \n matches a newline character, \\ matches a literal \, and \\n matches a \ followed by n |
{} |
Marks the start and end of a quantifier expression. |
| |
Alternation: matches either the expression on its left or the expression on its right. |
\d |
Matches a digit. Equivalent to [0-9] for ASCII text (in Python 3, str patterns also match other Unicode decimal digits unless re.ASCII is set) |
[0-9] |
Matches any single digit from 0 to 9. Equivalent to \d for ASCII text |
\D |
Matches a non-numeric character. Equivalent to [^0-9] |
[a-z] |
Matches any single lowercase letter from a to z |
[A-Z] |
Matches any single uppercase letter from A to Z |
[a-zA-Z0-9] |
Matches any single letter or digit |
\w |
Matches any word character, including the underscore. Equivalent to [A-Za-z0-9_] for ASCII text |
\W |
Matches any non-word character. Equivalent to [^A-Za-z0-9_] |
[\u4e00-\u9fa5] |
Matches a single Chinese character (code points U+4E00 to U+9FA5) |
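A rough illustration of the special characters and character classes in this section, assuming Python's standard re module. The sample strings and variable names are invented for the example.

```python
import re

text = "Order A-17: 3 apples, 12 pears 你好"  # hypothetical sample text

# \d+ collects each run of digits.
digits = re.findall(r"\d+", text)

# () captures subexpressions and | chooses between alternatives.
order = re.search(r"(\d+) (apples|pears)", text)

# [bce] matches exactly one of b, c, or e.
picked = re.findall(r"[bce]", "abcde")

# . matches any character except a newline, so "a\nc" is skipped.
dotted = re.findall(r"a.c", "abc a\nc a-c")

# \w includes the underscore, so snake_case stays in one piece.
words = re.findall(r"\w+", "snake_case-name")

# [\u4e00-\u9fa5] matches one Chinese character at a time.
chinese = re.findall(r"[\u4e00-\u9fa5]", text)

# \\ matches a literal backslash, so \\n matches a backslash followed by n.
escaped = re.search(r"\\n", r"a\nb")
```

Raw strings (the r"..." prefix) keep Python from interpreting the backslashes before the regex engine sees them.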
three, |
quantifiers |
* |
Matches the preceding subexpression zero or more times |
+ |
Matches the preceding subexpression one or more times |
? |
Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. Placing ? after another quantifier (*, +, ?, {n,} or {n,m}) turns it from a "greedy" match into a "non-greedy" (minimal) match. |
{n} |
n is a non-negative integer. Matches the preceding subexpression exactly n times |
{n,} |
n is a non-negative integer. Matches the preceding subexpression at least n times |
{n,m} |
Both m and n are non-negative integers, where n≤m. Matches the preceding subexpression at least n and at most m times. Note that no spaces are allowed between the comma and either number |
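The difference between greedy and non-greedy matching is easier to see in a short example. The following is a sketch with Python's re module; the sample strings are made up.

```python
import re

html = "<b>one</b><b>two</b>"  # hypothetical sample text

# Greedy: .* runs as far as it can, so both tags end up in one match.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: .*? stops at the first closing tag it can reach.
lazy = re.findall(r"<b>.*?</b>", html)

# ? makes the preceding character optional.
optional = re.findall(r"colou?r", "color colour")

# {n}, {n,} and {n,m} bound the number of repetitions.
exactly_three = re.fullmatch(r"\d{3}", "123")     # matches
at_least_two = re.fullmatch(r"\d{2,}", "1")       # None: only one digit
two_to_four = re.fullmatch(r"\d{2,4}", "12345")   # None: five digits

# With a space after the comma, the braces are no longer read as a quantifier.
spaced = re.fullmatch(r"a{2, 4}", "aaa")          # None
```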
four,
anchors |
^ |
Matches the start of the string. For example, ^h matches an h at the beginning of the string. Inside a square bracket expression it negates the character set instead: [^0-9] matches any character that is not a digit |
$ |
Matches the end of the string |
\b |
Matches a word boundary, that is, the position between a word character and a non-word character (such as a space) or the start or end of the string |
\B |
Matches a position that is not a word boundary |
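These position markers match places in the text rather than characters. A minimal sketch with Python's re module, using invented sample strings:

```python
import re

# ^ anchors the match to the start of the string.
starts_with_h = re.search(r"^h", "hello")   # matches
not_at_start = re.search(r"^h", "oh")       # None: the h is not first

# $ anchors the match to the end of the string.
ends_with_o = re.search(r"o$", "hello")     # matches

# Inside square brackets, ^ negates the set.
non_digits = re.findall(r"[^0-9]", "a1b2")

# \b requires a word boundary on each side, so "concat" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# \B requires the opposite: only the "cat" inside "concat" qualifies.
inner = [m.start() for m in re.finditer(r"\Bcat", "cat concat")]
```

In the last line, the match starts at index 7, which is the "cat" inside "concat".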
five,
modifiers
Modifier: i (re.I in Python's re module) makes the pattern case-insensitive.
Modifier: m (re.M) enables multi-line mode, so ^ and $ also match at the start and end of each line, not only at the start and end of the whole string.
Modifier: s (re.S) lets the wildcard . match any character, including the newline \n.
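A quick sketch of how the three modifiers change matching, assuming Python's re module. The two-line sample text is made up for illustration.

```python
import re

text = "first line\nSecond line"  # hypothetical sample text

# Without re.M, ^ only matches at the very start of the string.
single = re.findall(r"^\w+", text)
multi = re.findall(r"^\w+", text, re.M)

# re.I ignores case.
strict = re.search(r"second", text)
relaxed = re.search(r"second", text, re.I)

# Without re.S, . cannot cross the newline between the two lines.
blocked = re.search(r"first.*Second", text)
spanning = re.search(r"first.*Second", text, re.S)

# Flags are combined with the | operator.
combined = re.findall(r"^second", text, re.I | re.M)
```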