Python3 Quick Start (7): Python3 regular expressions

1, Introduction to the re module

The re module provides Perl-style regular expression patterns and gives the Python language full regular expression support.

2, Regular expression patterns

A pattern string uses a special syntax to represent a regular expression:
letters and digits in a pattern match themselves; most letters and digits take on a different meaning when preceded by a backslash; punctuation characters usually carry a special meaning and match themselves only when escaped; a backslash itself must be escaped with another backslash.
^ matches the start of the string.
$ matches the end of the string (or just before a newline at the very end of the string).
. matches any character except a newline; when the re.DOTALL flag is specified, it matches any character, including a newline.
[...] matches any single character listed inside the brackets; for example, [amk] matches 'a', 'm', or 'k'.
[^...] matches any single character not listed inside the brackets; for example, [^abc] matches any character other than 'a', 'b', or 'c'.
re* matches zero or more occurrences of the preceding expression.
re+ matches one or more occurrences of the preceding expression.
re? matches zero or one occurrence of the preceding expression. Placed after another quantifier (as in re*? or re+?), ? makes that quantifier non-greedy.
re{n} matches exactly n occurrences of the preceding expression. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".
re{n,} matches n or more occurrences of the preceding expression. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
re{n,m} matches from n to m occurrences of the preceding expression, greedily.
a|b matches either a or b.
(re) matches the expression inside the parentheses and captures it as a group.
(?imx) turns on the optional flags i, m, or x. In Python these inline flags apply to the whole pattern and should appear at its start.
(?-imx) turns off the i, m, or x flags in Perl-style engines. Python's re accepts flag removal only in the scoped form (?-imx:re).
(?imx:re) turns on the i, m, or x flags for the expression inside the parentheses only.
(?-imx:re) turns off the i, m, or x flags for the expression inside the parentheses only.
(?#...) is a comment; its contents are ignored.
(?=re) is a positive lookahead assertion. It succeeds if the contained expression matches at the current position and fails otherwise. The assertion consumes no characters: once it has been tried, the engine does not advance, and the rest of the pattern is matched from the same position.
(?!re) is a negative lookahead assertion, the opposite of the positive one: it succeeds only if the contained expression does not match at the current position.
(?>re) is an independent (atomic) group: once it has matched, the engine does not backtrack into it. Python's re supports it from version 3.11.
\w matches a word character: a letter, a digit, or an underscore.
\W matches any character that is not a word character.
\s matches any whitespace character; for ASCII text this is equivalent to [ \t\n\r\f\v].
\S matches any non-whitespace character.
\d matches any digit; for ASCII text this is equivalent to [0-9].
\D matches any non-digit character.
\A matches the start of the string.
\Z matches the end of the string. In Perl-style engines it also matches just before a trailing newline; in Python it matches only at the very end of the string.
\z matches the very end of the string in Perl-style engines; in Python, \Z is the usual way to write this.
\G matches the position where the previous match ended. This is a Perl-style feature; Python's re module does not support it.
\b matches a word boundary, that is, the position between a word character and a non-word character. For example, 'er\b' matches the 'er' in "never", but not the 'er' in "verb".
\B matches a non-word boundary. 'er\B' matches the 'er' in "verb", but not the 'er' in "never".
\n, \t, etc. match a newline, a tab, and other control characters.
\1...\9 matches the contents of the n-th captured group.
\10 matches the contents of the 10th captured group if that group exists; otherwise Perl-style engines read it as an octal character code.
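
A few of the entries above are easier to follow when run. The following is a minimal sketch; the sample strings and variable names are chosen only for illustration.

```python
import re

# Greedy vs. non-greedy quantifiers on the same input.
html = "<b>bold</b>"
greedy = re.search(r"<.+>", html).group(0)    # takes as much as possible
lazy = re.search(r"<.+?>", html).group(0)     # takes as little as possible

# {n} and {n,}: "o{2}" needs two consecutive o's.
no_match = re.search(r"o{2}", "Bob")          # None, only one "o"
two_o = re.search(r"o{2}", "food").group(0)
many_o = re.search(r"o{2,}", "foooood").group(0)

# \b word boundary: 'er\b' matches in "never" but not in "verb".
in_never = re.search(r"er\b", "never") is not None
in_verb = re.search(r"er\b", "verb") is not None

# Positive lookahead: "Python" only when followed by "3"; the "3" is not consumed.
lookahead = re.search(r"Python(?=3)", "Python3").group(0)

# Backreference \1: find a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

print(greedy, lazy, no_match, two_o, many_o)
print(in_never, in_verb, lookahead, doubled)
```

All patterns are written as raw strings (r"...") so that Python does not interpret the backslashes before the re module sees them.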

3, Regular expression examples

python matches "python"
[Pp]ython matches "Python" or "python"
[aeiou] matches any one of the letters inside the brackets
[0-9] matches any digit
[a-z] matches any lowercase letter
[A-Z] matches any uppercase letter
[a-zA-Z0-9] matches any letter or digit
[^aeiou] matches any character other than the letters a, e, i, o, u
[^0-9] matches any character other than a digit
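
The character classes above can be tried out with re.findall, which is described in the next section. This is a small sketch with made-up sample strings.

```python
import re

# [Pp]ython accepts either capitalization of the first letter.
names = re.findall(r"[Pp]ython", "Python and python, 2 snakes")

# [aeiou] matches one vowel at a time.
vowels = re.findall(r"[aeiou]", "regex")

# [0-9] matches single digits; [^0-9] matches everything else.
digits = re.findall(r"[0-9]", "a1b22")
non_digits = re.findall(r"[^0-9]", "a1b22")

print(names, vowels, digits, non_digits)
```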

4, Common functions of the re module

re.compile(pattern[, flags])
The compile function builds a regular expression object from a pattern string and optional flags. The object provides a set of methods for matching and replacing.
pattern: the regular expression, as a string.
flags: optional; selects the matching mode. The available values are:
re.I ignores case.
re.L makes the special sequences \w, \W, \b, \B, \s, and \S depend on the current locale.
re.M enables multiline mode, so ^ and $ also match at the start and end of each line.
re.S makes '.' match any character, including a newline (without it, '.' does not match a newline).
re.U makes the special sequences \w, \W, \b, \B, \d, \D, \s, and \S depend on the Unicode character properties database. In Python 3 this is already the default for str patterns.
re.X allows a more readable pattern: whitespace is ignored, and '#' starts a comment.
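
As a rough illustration of how the flags are passed to re.compile, the sketch below combines two flags with the bitwise OR operator and shows the effect of re.S and re.X. The sample text is invented for the example.

```python
import re

# re.I | re.M: ignore case, and let ^ match at the start of every line.
pattern = re.compile(r"^hello", re.I | re.M)
lines = "Hello world\nhello again\nsay hello"
starts = pattern.findall(lines)

# re.S: without it '.' does not match the newline between "a" and "b".
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"a.b", "a\nb", re.S)

# re.X: whitespace and '#' comments inside the pattern are ignored.
number = re.compile(r"""
    \d+      # integer part
    \.       # decimal point
    \d+      # fractional part
""", re.X)
decimal = number.search("pi is 3.14").group(0)

print(starts, dot_default, dot_all is not None, decimal)
```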
re.match(pattern, string, flags=0)
Tries to match the pattern at the start of the string. Returns a match object on success, otherwise None.
pattern: the regular expression to match.
string: the string to match against.
flags: flags that control how matching is done, such as case sensitivity or multiline matching.
Use the group(num) or groups() method of the match object to retrieve what was matched. group() also accepts several group numbers at once, in which case it returns a tuple of the corresponding values.
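
The difference between group(num), group() with several numbers, and groups() can be sketched as follows; the three-group pattern is an assumption made for this example.

```python
import re

m = re.match(r"(\d{4})-(\d{3})-(\d{3})", "2004-959-559")

whole = m.group(0)        # the entire match
first = m.group(1)        # the first parenthesized group
pair = m.group(1, 3)      # several group numbers give a tuple
all_groups = m.groups()   # every group, as a tuple

print(whole, first, pair, all_groups)
```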
re.search(pattern, string, flags=0)
re.search scans the whole string and returns the first successful match.
It returns a match object on success, otherwise None.
Use the group(num) or groups() method of the match object to retrieve what was matched.
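
Since re.match only looks at the start of the string while re.search scans all of it, the two can disagree on the same input. A short sketch, using an invented sample string:

```python
import re

text = "tel: 2004-959-559"

# The string starts with "tel", so matching digits at the start fails.
at_start = re.match(r"\d+", text)

# search finds the first run of digits anywhere in the string.
anywhere = re.search(r"\d+", text)
found = anywhere.group(0)
position = anywhere.span()   # (start, end) indexes of the match

print(at_start, found, position)
```

Because both functions return None when nothing matches, it is usually worth checking the result before calling group().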
re.sub(pattern, repl, string, count=0, flags=0)
re.sub replaces the matches in a string.
pattern: the regular expression, as a pattern string.
repl: the replacement string; it may also be a function.
string: the original string to search and replace in.
count: the maximum number of replacements; the default, 0, replaces all occurrences.
flags: the matching flags to use when the pattern is compiled, given as an integer value.
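
The parameters of re.sub can be sketched as below: count limits the number of replacements, and repl can refer back to groups or be a function. The sample string and the lambda are illustrative only.

```python
import re

phone = "2004-959-559"

# count=1 replaces only the first match.
first_only = re.sub(r"-", "/", phone, count=1)

# A replacement string can refer to captured groups with \1, \2, ...
reordered = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3-\2-\1", phone)

# A function receives each match object and returns the replacement text.
lengths = re.sub(r"\d+", lambda m: str(len(m.group(0))), phone)

print(first_only, reordered, lengths)
```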
pattern.findall(string[, pos[, endpos]])
Called on a compiled pattern object, findall finds all substrings of the string that match the regular expression and returns them as a list; if nothing matches, it returns an empty list. The module-level form is re.findall(pattern, string, flags=0).
string: the string to match against.
pos: optional; the position in the string where the search starts. The default is 0.
endpos: optional; the position in the string where the search ends. The default is the length of the string.
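
The pos and endpos parameters are only available on a compiled pattern. The sketch below, with an invented sample string, shows how they narrow the search, and how findall returns tuples when the pattern contains more than one group.

```python
import re

pattern = re.compile(r"\d+")
text = "a1 b22 c333 d4444"

everything = pattern.findall(text)
from_pos = pattern.findall(text, 3)        # search text[3:] onward
window = pattern.findall(text, 3, 10)      # search only indexes 3..9

# With several groups in the pattern, each result is a tuple of groups.
pairs = re.findall(r"(\w)=(\d)", "a=1 b=2")

print(everything, from_pos, window, pairs)
```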
re.finditer(pattern, string, flags=0)
Finds all substrings of the string that match the regular expression and returns them as an iterator of match objects.
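
Unlike findall, finditer yields match objects, so the position of each match is available as well as its text. A minimal sketch:

```python
import re

# Collect the text and the (start, end) span of every run of digits.
results = [(m.group(0), m.span()) for m in re.finditer(r"\d+", "2004-959-559")]

for text, span in results:
    print(text, span)
```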
re.split(pattern, string[, maxsplit=0, flags=0])
Splits the string at each substring that matches the pattern and returns the pieces as a list.
pattern: the regular expression to match.
string: the string to split.
maxsplit: the maximum number of splits; maxsplit=1 splits once. The default, 0, means no limit.
flags: flags that control how matching is done, such as case sensitivity or multiline matching.
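
The effect of maxsplit, and of a capturing group in the pattern, can be sketched as follows; the separator pattern and sample strings are assumptions for the example.

```python
import re

text = "a, b;c  d"

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", text)

# maxsplit=1 splits once and leaves the rest of the string untouched.
once = re.split(r"[,;\s]+", text, maxsplit=1)

# If the pattern contains a capturing group, the separators are kept in the list.
with_separators = re.split(r"(-)", "2004-959")

print(parts, once, with_separators)
```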

# -*- coding:utf-8 -*-
import re

# Multiply the matched number by 2
def double(matched):
    value = int(matched.group('value'))
    return str(value * 2)

if __name__ == '__main__':
    phone = "2004-959-559"
    # match: returns a match object only if the pattern matches at the start
    match = re.match(r"\d+", phone)
    print(match.group(0))
    # search: returns a match object for the first match anywhere in the string
    match = re.search(r"\d+", phone)
    print(match.group(0))
    # findall: returns a list of all matching substrings
    numbers = re.findall(r"\d+", phone)
    print(numbers)

    # Remove every non-digit character
    num = re.sub(r'\D', "", phone)
    print("Phone:", num)

    # Use a function as the replacement: each number is doubled
    s = 'A23G4HFD567'
    print(re.sub(r'(?P<value>\d+)', double, s))

# output:
# 2004
# 2004
# ['2004', '959', '559']
# Phone: 2004959559
# A46G8HFD1134