re module

Common Regular Expression Symbols

'.' matches any character except \n by default. If the DOTALL flag is specified, it matches any character, including a newline
'^' matches the beginning of the string. If the MULTILINE flag is specified, it also matches at the beginning of each line, so re.search(r"^a","\nabc\neee",flags=re.MULTILINE) finds a match
'$' matches the end of the string (or just before a trailing newline). With MULTILINE it also matches at the end of each line, so re.search("foo$","bfoo\nsdfsf",flags=re.MULTILINE).group() returns 'foo'
'*' matches the preceding character 0 or more times, re.findall("ab*","cabb3abcbbac") results in ['abb', 'ab', 'a']
'+' matches the preceding character 1 or more times, re.findall("ab+","ab+cd+abb+bba") results in ['ab', 'abb']
'?' matches the preceding character 0 or 1 times
'{m}' matches the preceding character exactly m times
'{n,m}' matches the preceding character n to m times, re.findall("ab{1,3}","abb abc abbcbbb") results in ['abb', 'ab', 'abb']
'|' matches either the pattern on its left or the pattern on its right, re.search("abc|ABC","ABCBabCCD").group() results in 'ABC'
'(...)' group matching, re.search("(abc){2}a(123|456)c", "abcabca456c").group() results in 'abcabca456c'
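
The entries above that have no example of their own ('.', '^', '$', '?', '{m}') can be checked with a short sketch like the following. The sample strings and variable names are made up for illustration and are not from the original article.

```python
import re

# '.' skips newlines unless re.DOTALL is given
dot_default = re.findall(r"a.c", "abc a\nc")
dot_all = re.findall(r"a.c", "abc a\nc", flags=re.DOTALL)

# '^' and '$' anchor to the whole string unless re.MULTILINE is given
line_starts = re.findall(r"^\w", "abc\ndef\nghi", flags=re.MULTILINE)
line_end = re.search(r"foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group()

# '?' makes the preceding character optional; '{m}' repeats it exactly m times
optional = re.findall(r"colou?r", "color colour colouur")
exact = re.findall(r"ab{2}", "ab abb abbb")

print(dot_default)  # ['abc']
print(dot_all)      # ['abc', 'a\nc']
print(line_starts)  # ['a', 'd', 'g']
print(line_end)     # foo
print(optional)     # ['color', 'colour']
print(exact)        # ['abb', 'abb']
```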


'\A' matches only at the beginning of the string, re.search(r"\Aabc","alexabc") finds no match
'\Z' matches only at the very end of the string; unlike '$', it does not match before a trailing newline
'\d' matches a digit, [0-9] in ASCII text
'\D' matches a non-digit
'\w' matches a word character, [A-Za-z0-9_] in ASCII text
'\W' matches a non-word character, [^A-Za-z0-9_] in ASCII text
'\s' matches whitespace characters such as space, \t, \n, \r; re.search(r"\s+","ab\tc1\n3").group() results in '\t'
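
As a rough illustration of the escape sequences above, the sketch below runs them against one made-up sample string (the string and variable names are assumptions for the example). It also shows where '\Z' and '$' differ when the text ends with a newline.

```python
import re

# Hypothetical sample text ending with a newline
text = "order_42 shipped\ton 2024-01-05\n"

digits = re.findall(r"\d+", text)   # runs of digits
words = re.findall(r"\w+", text)    # runs of word characters (underscore included)
first_ws = re.search(r"\s+", text).group()  # first run of whitespace

# '\A' anchors at the very start of the string
starts_with_order = re.search(r"\Aorder", text) is not None
starts_with_shipped = re.search(r"\Ashipped", text) is not None

# '$' may match just before a trailing newline, '\Z' may not
dollar_matches = re.search(r"05$", text) is not None
z_matches = re.search(r"05\Z", text) is not None

print(digits)               # ['42', '2024', '01', '05']
print(words)                # ['order_42', 'shipped', 'on', '2024', '01', '05']
print(repr(first_ws))       # ' '
print(starts_with_order)    # True
print(starts_with_shipped)  # False
print(dollar_matches)       # True
print(z_matches)            # False
```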

'(?P<name>...)' named group matching, re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","371481199306143242").groupdict() results in {'province': '3714', 'city': '81', 'birthday': '1993'}
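
A minimal sketch of working with named groups, reusing the ID-number pattern from the line above. The second pattern and the "n/a" default are made up here to show what the optional argument of groupdict() is for: it is the value reported for named groups that did not take part in the match.

```python
import re

id_pattern = re.compile(
    r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})"
)
m = id_pattern.search("371481199306143242")

city = m.group("city")   # access a single group by name
fields = m.groupdict()   # all named groups as a dict

# Hypothetical pattern with an optional named group
opt = re.search(r"(?P<key>\w+)(=(?P<value>\w+))?", "debug")
with_default = opt.groupdict("n/a")  # default fills the unmatched group

print(city)          # 81
print(fields)        # {'province': '3714', 'city': '81', 'birthday': '1993'}
print(with_default)  # {'key': 'debug', 'value': 'n/a'}
```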

  

The most common matching syntax

re.match matches only at the beginning of the string
re.search scans the string and returns the first match found anywhere in it
re.findall returns all non-overlapping matches as a list
re.split splits the string, using each match as the separator, and returns a list
re.sub replaces the matched text with a replacement string
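
The five functions can be compared side by side on one input. This is only a sketch with a made-up sample string; note that the splitting function in the standard library is spelled re.split.

```python
import re

text = "alex22jack33rain"  # hypothetical sample input

at_start = re.match(r"\d+", text)            # None: the string starts with 'alex'
first = re.search(r"\d+", text).group()      # first match anywhere in the string
all_numbers = re.findall(r"\d+", text)       # every non-overlapping match
parts = re.split(r"\d+", text)               # matches act as separators
replaced = re.sub(r"\d+", "|", text)         # replace every match
replaced_once = re.sub(r"\d+", "|", text, count=1)  # replace only the first match

print(at_start)       # None
print(first)          # 22
print(all_numbers)    # ['22', '33']
print(parts)          # ['alex', 'jack', 'rain']
print(replaced)       # alex|jack|rain
print(replaced_once)  # alex|jack33rain
```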

The trouble with backslashes

As in most programming languages, regular expressions use "\" as the escape character, and this can cause trouble. To match a literal "\" in the text, the regular expression written as an ordinary string literal needs 4 backslashes, "\\\\": the programming language first turns each pair into a single backslash, leaving two backslashes, and the regular expression engine then treats those two as one escaped, literal backslash. Raw strings in Python solve this problem neatly. The regular expression in this example can be written as r"\\". Likewise, "\\d", which matches a digit, can be written as r"\d". With raw strings you no longer have to worry about missing a backslash, and the expressions you write are more intuitive.
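
The equivalence described above can be verified directly. The sketch below uses a made-up Windows-style path as sample text and compares the ordinary and raw spellings of the same patterns.

```python
import re

path = "C:\\temp"  # the actual text is C:\temp, containing one backslash

# Both spellings produce the same two-character pattern: \\
same_pattern = ("\\\\" == r"\\")
same_digit_pattern = ("\\d" == r"\d")

plain = re.search("\\\\", path)  # four backslashes in an ordinary literal
raw = re.search(r"\\", path)     # two backslashes in a raw literal

print(len(path))            # 7
print(same_pattern)         # True
print(same_digit_pattern)   # True
print(plain.start(), raw.start())  # 2 2
```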

 

A few matching flags worth knowing

re.I(re.IGNORECASE): ignore case
re.M(re.MULTILINE): multiline mode, changes the behavior of '^' and '$' so that they also match at the start and end of each line (see above)
re.S(re.DOTALL): dot-matches-all mode, changes the behavior of '.' so that it also matches a newline
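
A small sketch of the three flags in action, using a made-up three-line string. Flags can be combined with the | operator.

```python
import re

text = "Foo\nfoo bar\nFOO"  # hypothetical sample text

case_sensitive = re.findall(r"foo", text)
ignore_case = re.findall(r"foo", text, flags=re.I)

# Without re.M, '^' only matches at the very start of the string
start_only = re.findall(r"^foo", text, flags=re.I)
every_line = re.findall(r"^foo", text, flags=re.I | re.M)

# Without re.S, '.' cannot cross the newline between 'Foo' and 'foo'
no_dotall = re.findall(r"o.f", text)
dotall = re.findall(r"o.f", text, flags=re.S)

print(case_sensitive)  # ['foo']
print(ignore_case)     # ['Foo', 'foo', 'FOO']
print(start_only)      # ['Foo']
print(every_line)      # ['Foo', 'foo', 'FOO']
print(no_dotall)       # []
print(dotall)          # ['o\nf']
```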
