Python regular expressions (1) - special characters

 

Regular expressions - elements with special meaning

In a regular expression, letters and digits match themselves, but many of them take on a different meaning when preceded by a backslash.

The following lists the special elements of regular expression pattern syntax.

 

1. Common character classes

1) \w matches a letter, a digit, or an underscore

2) \W matches any character that is not a letter, a digit, or an underscore

3) \s matches any whitespace character; for ASCII patterns it is equivalent to [ \t\n\r\f\v].

4) \S matches any non-whitespace character

5) \d matches any digit; for ASCII patterns it is equivalent to [0-9] (for str patterns in Python 3 it also matches other Unicode decimal digits)

6) \D matches any non-digit character

7) \1 ... \9 matches the text captured by the n-th group.

8) [a-zA-Z0-9] matches any ASCII letter or digit
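
The character classes above can be tried out with re.findall. This is a minimal sketch; the sample strings and variable names are made up for illustration and are not from the original article.

```python
import re

# Hypothetical sample text, chosen only to exercise each class.
text = "user_01 paid $25.50 on 2020-03-10\n"

# \w+ collects runs of letters, digits, and underscores.
words = re.findall(r"\w+", text)

# \d+ collects runs of digits.
digits = re.findall(r"\d+", text)

# \S+ collects runs of non-whitespace characters.
tokens = re.findall(r"\S+", text)

# \W matches single characters that are not letters, digits, or underscores.
non_word = re.findall(r"\W", "a-b c")

# For str patterns, \d also matches non-ASCII decimal digits such as
# U+0663 (ARABIC-INDIC DIGIT THREE); re.ASCII restricts it to [0-9].
mixed = "\u0663" + "3"
unicode_digits = re.findall(r"\d", mixed)
ascii_digits = re.findall(r"\d", mixed, flags=re.ASCII)

print(words)         # ['user_01', 'paid', '25', '50', 'on', '2020', '03', '10']
print(digits)        # ['01', '25', '50', '2020', '03', '10']
print(tokens)        # ['user_01', 'paid', '$25.50', 'on', '2020-03-10']
print(non_word)      # ['-', ' ']
print(len(unicode_digits), ascii_digits)  # 2 ['3']
```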

 

2. Quantifiers

A quantifier follows a character or a group (...). Matching is greedy by default; to make a quantifier non-greedy, add ? after it, for example: \w+?

1) . matches any character except a newline; when the re.DOTALL flag is specified, it matches any character, including a newline

2) * matches the preceding character zero or more times

3) + matches the preceding character one or more times

4) ? matches the preceding character 0 or 1 times

5) {m} matches the preceding character exactly m times

6) {m,n} matches the preceding character from m to n times

7) {m,} matches the preceding character at least m times

8) {,n} matches the preceding character from 0 to n times, that is, at most n times
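
The difference between greedy and non-greedy matching, and the counted forms, are easier to see on a small input. The following is an illustrative sketch; the sample strings are assumptions, not part of the original article.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last '>' in the string, so there is one long match.
greedy = re.findall(r"<.+>", html)

# Non-greedy: .+? stops at the first '>' it can, so each tag matches separately.
lazy = re.findall(r"<.+?>", html)

# ? makes the preceding character optional.
optional = re.findall(r"colou?r", "color colour")

# {m,n} bounds the repetition; a run of four a's yields 'aaa' and a leftover 'a'.
bounded = re.findall(r"a{2,3}", "a aa aaa aaaa")

# {m} requires an exact count.
exact = re.fullmatch(r"\d{3}-\d{4}", "555-1234") is not None

# . does not match a newline unless re.DOTALL is given.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"a.b", "a\nb", flags=re.DOTALL) is not None

print(greedy)    # ['<b>bold</b> and <i>italic</i>']
print(lazy)      # ['<b>', '</b>', '<i>', '</i>']
print(optional)  # ['color', 'colour']
print(bounded)   # ['aa', 'aaa', 'aaa']
print(exact, dot_default, dot_all)  # True False True
```

Note that {m,n} is written without a space after the comma; with a space, Python treats the braces as literal text rather than as a quantifier.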

 

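3. Boundary matchers

1) ^ matches the start of the string; in multi-line mode (re.MULTILINE) it also matches at the beginning of each line

2) [^...] inside [...], a leading ^ negates the set: [^a-zA-Z] matches a non-letter, [^0-9] matches a non-digit

3) $ matches the end of the string, or just before a newline at the end of the string; in multi-line mode it also matches at the end of each line

4) \A matches only at the beginning of the string; unlike ^, it is not affected by multi-line mode

5) \b matches a word boundary, that is, the position between a word character (\w) and a non-word character (\W) or the start or end of the string

6) \B matches a position that is not a word boundary. It cannot be written as [^\b], because inside [...] \b stands for the backspace character.

7) \Z matches only at the end of the string. In Python it does not match before a trailing newline, although it does in some other regex engines.

8) \z matches only at the end of the string. Python accepts it as a synonym for \Z starting with version 3.14; earlier versions reject it, so use \Z there.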
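
The anchors above match positions rather than characters, which a short experiment makes concrete. This is a sketch with made-up sample text; \z is left out because its availability depends on the Python version.

```python
import re

text = "first line\nsecond line\n"

# ^ matches only at the start of the string unless re.MULTILINE is set.
start_default = re.findall(r"^\w+", text)
start_multi = re.findall(r"^\w+", text, flags=re.MULTILINE)

# \A ignores re.MULTILINE and still matches only at the very start.
start_abs = re.findall(r"\A\w+", text, flags=re.MULTILINE)

# $ matches at the end, and also just before a trailing newline.
end_default = re.findall(r"\w+$", text)
end_multi = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \Z matches only at the very end, so the trailing newline blocks it here.
end_abs = re.search(r"line\Z", text)

# \b requires a word boundary; \B requires that there is none.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")
inside_words = re.findall(r"\Bcat", "cat concat")

print(start_default, start_multi, start_abs)  # ['first'] ['first', 'second'] ['first']
print(end_default, end_multi)                 # ['line'] ['line', 'line']
print(end_abs)                                # None
print(whole_words, inside_words)              # ['cat', 'cat'] ['cat']
```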

 

4. Alternation and groups

1) | (or)

    Matches either the expression on its left or the one on its right. The alternatives are tried from left to right; once the left one matches, the one on the right is not tried. | is usually used inside (); without parentheses, its scope is the entire regular expression.

2) Group (...)

    An expression enclosed in () forms a group. Groups are numbered starting from 1 by counting opening parentheses '(' from left to right, and this also applies to nested parentheses. A group is treated as a single unit, so a quantifier can follow it, and the text it matched can be referenced later.

3)  \<number>

    Matches the same text that the group numbered <number> matched, for example: \1 ... \9

4)  (?P<name>...)

    Named group: assigns an alias to the group in addition to its default number.

    Note: P is capitalized

5)  (?P=name)

    Backreference to the group named name. It is used inside the regular expression to match the same text again; the group's number may be used instead.

    Note: P is capitalized
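
A few small experiments show how alternation scope, group numbering, and backreferences behave. This is an illustrative sketch; the sample strings and the group name word are made up for the example.

```python
import re

# Without parentheses, | splits the whole pattern: "cat" OR "dog food".
unscoped = re.findall(r"cat|dog food", "cat food, dog food")

# With parentheses, only the part inside the group alternates.
scoped = [m.group(0) for m in re.finditer(r"(cat|dog) food", "cat food, dog food")]

# Alternatives are tried left to right; the first one that matches wins.
left_first = re.search(r"a|ab", "ab").group(0)

# Groups are numbered by the position of their opening parenthesis.
m = re.match(r"((\d{4})-(\d{2}))-(\d{2})", "2020-03-10")
numbered = m.groups()

# \1 matches the same text that group 1 matched (here: a doubled word).
doubled = re.search(r"\b(\w+) \1\b", "it was was fine").group(1)

# A named group can be referenced by name or by number.
n = re.search(r"(?P<word>\w+) (?P=word)", "hello hello world")
by_name, by_number = n.group("word"), n.group(1)

print(unscoped)    # ['cat', 'dog food']
print(scoped)      # ['cat food', 'dog food']
print(left_first)  # a
print(numbered)    # ('2020-03', '2020', '03', '10')
print(doubled, by_name, by_number)  # was hello hello
```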

 

5. Special constructs

1) (?imx) turns on one or more of the optional flags i, m, and x. Written this way it must appear at the start of the pattern and applies to the whole pattern; the scoped form (?imx:...) affects only the part inside the parentheses.

2) (?-imx:...) turns off the optional flags i, m, or x. It affects only the part inside the parentheses.

3) (?:...) groups the pattern inside the parentheses without capturing it as a group

4) (?!pattern) negative lookahead assertion

          Succeeds only if the text that follows the current position does not match pattern. It consumes no characters and can appear anywhere in the regular expression, not only at the beginning.

5) (?<!pattern) negative lookbehind assertion

           Succeeds only if the text immediately before the current position does not match pattern. In Python, pattern must match text of a fixed length.

6) (?=pattern) positive lookahead assertion

          Succeeds only if the text that follows the current position matches pattern; the matched text is not consumed.

7) (?<=pattern) positive lookbehind assertion

         Succeeds only if the text immediately before the current position matches pattern. In Python, pattern must match text of a fixed length.

8) (?#...) the content after # is treated as a comment and ignored
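
The constructs in this section can be compared side by side on short inputs. This is a sketch with made-up sample strings; the scoped flag forms (?i:...) and (?-i:...) assume Python 3.6 or later.

```python
import re

# (?i) at the start of the pattern makes the whole pattern case-insensitive.
whole = re.findall(r"(?i)python", "Python PYTHON python")

# (?i:...) limits the flag to the group, so "thon" stays case-sensitive.
scoped = re.findall(r"(?i:py)thon", "PYthon PYTHON")

# With a capturing group, findall returns the group; (?:...) avoids that.
capturing = re.findall(r"(ab)+", "ababab ab")
non_capturing = re.findall(r"(?:ab)+", "ababab ab")

# Positive lookahead: a name only when ".py" follows; ".py" is not consumed.
py_names = re.findall(r"\w+(?=\.py)", "main.py util.py notes.txt")

# Negative lookahead: "foo" only when "bar" does not follow.
not_bar = re.findall(r"foo(?!bar)\w*", "foobar foobaz")

# Positive lookbehind: digits only when "$" comes immediately before.
prices = re.findall(r"(?<=\$)\d+", "cost $25, qty 3")

# Negative lookbehind: whole numbers that are not preceded by "$".
counts = re.findall(r"(?<!\$)\b\d+", "cost $25, qty 3")

# (?#...) is ignored by the matcher.
commented = re.findall(r"\d+(?#one or more digits)", "a1b22")

print(whole)                     # ['Python', 'PYTHON', 'python']
print(scoped)                    # ['PYthon']
print(capturing, non_capturing)  # ['ab', 'ab'] ['ababab', 'ab']
print(py_names, not_bar)         # ['main', 'util'] ['foobaz']
print(prices, counts, commented) # ['25'] ['3'] ['1', '22']
```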


Origin www.cnblogs.com/chengfengchi/p/12453847.html