Python regular expression character summary (update 2)

I hope someone can add to this list.

Character

Function and description

1. Non-printing characters

\cx

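Matches the control character indicated by x. For example, \cM matches Control-M, i.e. a carriage return. x must be a letter (A-Z or a-z); otherwise c is treated as a literal "c" character. Note that this escape comes from other regex flavors: Python's re module (3.7 and later) does not support \cx and raises re.error ("bad escape \c"), so use \r or \x0d instead.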
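A minimal check, assuming Python 3.7 or later, that re rejects the \cX escape while the hex escape \x0d matches a carriage return. The helper name supports_control_escape is hypothetical, made up for this illustration.

```python
import re

def supports_control_escape() -> bool:
    """Return True if re accepts the \\cX escape, False if it rejects it."""
    try:
        re.compile(r"\cM")  # Control-M in other regex flavors
    except re.error:
        return False
    return True

print(supports_control_escape())           # False
# Python spells Control-M (carriage return) as \r or \x0d instead.
print(bool(re.fullmatch(r"\x0d", "\r")))   # True
```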

\f

Matches a form feed character. Equivalent to \x0c and \cL

\n

Matches a newline character. Equivalent to \x0a and \cJ

\r

Matches a carriage return. Equivalent to \x0d and \cM

\s

Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v]. Note that for str patterns in Python 3, \s also matches other Unicode whitespace, such as the full-width space U+3000, unless the re.ASCII flag is set.

\S

Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]
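A short sketch, assuming Python 3 str patterns, of how \s treats the full-width space with and without re.ASCII; the sample strings are made up for the illustration.

```python
import re

text = "a b\tc\u3000d"  # \u3000 is the full-width (ideographic) space

# With a str pattern, \s matches Unicode whitespace, including U+3000.
print(re.split(r"\s", text))                  # ['a', 'b', 'c', 'd']
# With re.ASCII, \s is limited to [ \t\n\r\f\v], so U+3000 is not a separator.
print(re.split(r"\s", text, flags=re.ASCII))  # ['a', 'b', 'c\u3000d']
# \S+ picks out runs of non-whitespace characters.
print(re.findall(r"\S+", "  x1  y2\n"))       # ['x1', 'y2']
```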

\t

Matches a tab character. Equivalent to \x09 and \cI

\v

Matches a vertical tab character. Equivalent to \x0b and \cK
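The equivalences listed above can be checked directly in Python: each named escape should find exactly the same single character as its hex escape in a sample string. This is a quick sketch; the dictionary and sample string are made up for it.

```python
import re

# Named escape -> equivalent hex escape
pairs = {r"\f": r"\x0c", r"\n": r"\x0a", r"\r": r"\x0d", r"\t": r"\x09", r"\v": r"\x0b"}
samples = "\f\n\r\t\v"  # one of each non-printing character

for escape, hex_escape in pairs.items():
    hits_escape = re.findall(escape, samples)
    hits_hex = re.findall(hex_escape, samples)
    # True when both patterns find the same single character
    print(escape, hex_escape, hits_escape == hits_hex and len(hits_escape) == 1)
```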

2. Special characters

()

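Marks the start and end of a subexpression (a group). The text matched by the group can be retrieved later. To match literal parentheses, use \( and \).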
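A minimal sketch of capturing groups with the re module, assuming a made-up date-like string; group(0) is the whole match and group(1), group(2) are the subexpressions in order.

```python
import re

m = re.search(r"(\d{4})-(\d{2})", "Released 2023-02, updated later")
print(m.group(0))  # 2023-02  (the whole match)
print(m.group(1))  # 2023     (first subexpression)
print(m.groups())  # ('2023', '02')

# Escaped parentheses match literal "(" and ")"; the inner group captures the argument.
print(re.findall(r"\((\w+)\)", "f(x) + g(y)"))  # ['x', 'y']
```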

.

Matches any single character except the newline \n. Exactly one character must be present for it to match.
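A quick illustration with a made-up string: "a.c" needs exactly one character between a and c, and that character may not be a newline (without the re.S flag).

```python
import re

# "abc" matches, "a\nc" does not (. skips \n), "a c" matches, "ac" has nothing between a and c.
print(re.findall(r"a.c", "abc a\nc a c ac"))  # ['abc', 'a c']
```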

[]

Matches any one character listed inside the square brackets. For example, [bce] matches one of b, c, or e.
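A small sketch with made-up strings: a bracket expression matches exactly one character from the set, so it can also stand for one varying position inside a longer pattern.

```python
import re

print(re.findall(r"[bce]", "abcdef"))                 # ['b', 'c', 'e']
# One position that may be either a or e
print(re.findall(r"h[ae]llo", "hello hallo hullo"))   # ['hello', 'hallo']
```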

\

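Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, n matches the letter n, \n matches a newline, \\ matches a single backslash, and \\n matches a backslash followed by n. In Python code, write patterns as raw strings (r"...") so that Python does not process the backslashes before the regex engine sees them.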
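A sketch of the escaping rules above in Python, using raw strings for the patterns; note that "\\n" in ordinary Python string syntax is the two characters backslash and n.

```python
import re

print(bool(re.fullmatch(r"\n", "\n")))    # True: \n matches a newline
print(bool(re.fullmatch(r"\\n", "\\n")))  # True: \\n matches a backslash followed by "n"
print(bool(re.fullmatch(r"\\", "\\")))    # True: \\ matches one backslash
# An escaped metacharacter becomes literal: \. matches only a dot.
print(bool(re.fullmatch(r"\.", ".")), bool(re.fullmatch(r"\.", "x")))  # True False
```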

{}

Marks the start and end of a quantifier expression, such as {2,4}.

|

Alternation: matches either the expression on its left or the expression on its right.
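A minimal sketch of alternation in Python with made-up inputs. It also shows that | has the lowest precedence, so parentheses are needed to limit which part of the pattern the alternatives cover.

```python
import re

# Either alternative may match at each position.
print(re.findall(r"cat|dog", "cat, dog, bird"))  # ['cat', 'dog']

# | has the lowest precedence: this pattern means "gra" OR "ey", not "gr(a|e)y".
print(re.findall(r"gra|ey", "gray grey"))  # ['gra', 'ey']

# Parentheses limit the alternation to the part inside them.
print([bool(re.fullmatch(r"gr(a|e)y", w)) for w in ["gray", "grey", "groy"]])  # [True, True, False]
```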

\d

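Matches a digit character. Equivalent to [0-9] for ASCII text; for str patterns in Python 3 it also matches other Unicode decimal digits (such as full-width digits) unless re.ASCII is set.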

[0-9]

Matches any single digit from 0 to 9. Equivalent to \d for ASCII text.

\D

Matches a non-digit character. Equivalent to [^0-9] for ASCII text.
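A sketch, assuming Python 3 str patterns, contrasting \d, \d with re.ASCII, and [0-9] on a made-up string that contains a full-width digit (U+FF13), plus \D used to strip non-digits.

```python
import re

text = "Room 42, floor \uff13"  # \uff13 is the full-width digit three

print(re.findall(r"\d", text))                  # Unicode digits: '4', '2' and the full-width 3
print(re.findall(r"\d", text, flags=re.ASCII))  # ['4', '2']
print(re.findall(r"[0-9]", text))               # ['4', '2']
# Remove every non-digit character
print(re.sub(r"\D", "", "tel: 010-1234"))       # 0101234
```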

[a-z]

Matches any lowercase ASCII letter (a to z)

[A-Z]

Matches any uppercase ASCII letter (A to Z)

[a-zA-Z0-9]

Matches any single ASCII letter or digit
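A quick sketch of character ranges on made-up strings; adding + turns a single-character range into a run of such characters.

```python
import re

print(re.findall(r"[a-z]+", "Hello World 2024"))          # ['ello', 'orld']
print(re.findall(r"[A-Z]", "Hello World 2024"))           # ['H', 'W']
print(re.findall(r"[a-zA-Z0-9]+", "Hello, World_2024!"))  # ['Hello', 'World', '2024']
```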

\w

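Matches any word character, including the underscore. Equivalent to [A-Za-z0-9_] for ASCII text; for str patterns in Python 3 it also matches Unicode letters and digits (including Chinese characters) unless re.ASCII is set.

\W

Matches any non-word character. Equivalent to [^A-Za-z0-9_] for ASCII text.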
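A sketch, assuming Python 3 str patterns, of \w with and without re.ASCII on a made-up string containing the Chinese word 名字 ("name"), written here with \u escapes.

```python
import re

text = "user_1 \u540d\u5b57 v2.0"  # \u540d\u5b57 is the Chinese word 名字

print(re.findall(r"\w+", text))                  # ['user_1', '名字', 'v2', '0']
print(re.findall(r"\w+", text, flags=re.ASCII))  # ['user_1', 'v2', '0']
print(re.findall(r"\W", "a-b c"))                # ['-', ' ']
```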

[\u4e00-\u9fa5]

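Matches a single Chinese character in the basic CJK Unified Ideographs range U+4E00 to U+9FA5; add + to match a run of Chinese characters.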
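A minimal sketch of the Chinese-character range in Python (str patterns accept \uXXXX escapes); the sample strings are made up.

```python
import re

text = "Python正则表达式 regex 2023"
# Extract the run of Chinese characters
print(re.findall(r"[\u4e00-\u9fa5]+", text))               # ['正则表达式']
# Check that a string consists only of Chinese characters
print(bool(re.fullmatch(r"[\u4e00-\u9fa5]+", "中文")))      # True
print(bool(re.fullmatch(r"[\u4e00-\u9fa5]+", "中文abc")))   # False
```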

3. Quantifiers

*

Matches the preceding subexpression zero or more times

+

Matches the preceding subexpression one or more times
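A short comparison of * and + on a made-up string: both apply to the preceding b, but only * accepts zero occurrences.

```python
import re

print(re.findall(r"ab*", "a ab abbb"))  # ['a', 'ab', 'abbb']  zero or more b
print(re.findall(r"ab+", "a ab abbb"))  # ['ab', 'abbb']       one or more b
```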

?

Matches the preceding subexpression zero or one time, or specifies a non-greedy qualifier.

Placing ? after a quantifier (*, +, ?, {n}, {n,} or {n,m}) turns it from a "greedy" quantifier, which matches as much text as possible, into a "non-greedy" (minimal) one, which matches as little as possible.
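A sketch of greedy versus non-greedy matching on a made-up HTML-like string, plus ? used as "zero or one". This is only an illustration; real HTML is better handled by a parser.

```python
import re

html = "<b>bold</b><i>italic</i>"
print(re.findall(r"<.+>", html))   # greedy: ['<b>bold</b><i>italic</i>']
print(re.findall(r"<.+?>", html))  # non-greedy: ['<b>', '</b>', '<i>', '</i>']

# ? on its own: the u is optional
print(re.findall(r"colou?r", "color colour"))  # ['color', 'colour']
```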

{n}

n is a non-negative integer. Matches the preceding subexpression exactly n times

{n,}

n is a non-negative integer. Matches the preceding subexpression at least n times

{n,m}

Both n and m are non-negative integers, with n ≤ m. Matches the preceding subexpression at least n and at most m times. Note that there must be no spaces between the comma and n or m.
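A small sketch of the three brace forms, using re.fullmatch so that each made-up word must consist entirely of the repeated a.

```python
import re

words = ["a", "aa", "aaa", "aaaa"]
print([w for w in words if re.fullmatch(r"a{2}", w)])    # exactly 2: ['aa']
print([w for w in words if re.fullmatch(r"a{3,}", w)])   # at least 3: ['aaa', 'aaaa']
print([w for w in words if re.fullmatch(r"a{1,2}", w)])  # 1 to 2: ['a', 'aa']
```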

4. Anchors (locators)

^

Matches the start of the string; for example, ^h matches an h only at the beginning. Inside a bracket expression, as the first character, it negates the set: [^0-9] matches any character other than a digit.

$

Matches the end of the string. In Python, $ also matches just before a newline that ends the string.
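A sketch of ^ and $ in Python with made-up strings, including ^ as negation inside brackets and $ matching before a final newline.

```python
import re

print(bool(re.search(r"^h", "hello")), bool(re.search(r"^h", "oh hi")))  # True False
print(re.findall(r"[^0-9]", "a1b2"))          # ['a', 'b']
print(bool(re.search(r"end$", "the end")))    # True
print(bool(re.search(r"end$", "the end\n")))  # True: $ also matches before a final newline
print(bool(re.search(r"end$", "the end.")))   # False
```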

\b

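Matches a word boundary, i.e. the position between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string.

\B

Matches a position that is not a word boundary.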

5. Modifiers (flags)

Modifier i (re.I, also spelled re.IGNORECASE, in the re module): makes the pattern case-insensitive.
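A minimal sketch of re.I on a made-up string; flags are passed through the flags argument of functions such as re.findall.

```python
import re

text = "Python PYTHON python"
print(re.findall(r"python", text))                # ['python']
print(re.findall(r"python", text, flags=re.I))    # ['Python', 'PYTHON', 'python']
```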

Modifier m (re.M, also spelled re.MULTILINE): in multi-line text, ^ and $ also match at the start and end of each line, not only of the whole string.
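A sketch of how re.M changes ^ and $ on a made-up three-line string.

```python
import re

text = "first line\nsecond line\nthird"
print(re.findall(r"^\w+", text))              # ['first']  only the string start
print(re.findall(r"^\w+", text, flags=re.M))  # ['first', 'second', 'third']
print(re.findall(r"\w+$", text, flags=re.M))  # ['line', 'line', 'third']
```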

Modifier s (re.S, also spelled re.DOTALL): makes the wildcard . match any character, including the newline \n.
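A sketch of re.S on a made-up string that spans two lines. Flags can also be combined with |, for example re.I | re.S.

```python
import re

text = "<p>line one\nline two</p>"
print(re.findall(r"<p>(.*)</p>", text))                  # []  . stops at \n
print(re.findall(r"<p>(.*)</p>", text, flags=re.S))      # ['line one\nline two']
print(re.findall(r"<P>(.*)</P>", text, flags=re.I | re.S))  # ['line one\nline two']
```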

Reprinted from: blog.csdn.net/any1where/article/details/129135498