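import re  # re module reference
Searching
findall: matches every occurrence and returns a list in which each match is one element
search: scans left to right and matches only the first occurrence. It does not return the result directly but a match object; call the object's group() method to get the matched text. If nothing matches, search returns None, and calling group() on it raises an error
match: matches only from the start of the string, roughly equivalent to search with a ^ added to the front of the regular expression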
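The difference between the three functions can be sketched as follows. This is a minimal illustration that reuses the sample string from these notes; the variable names are chosen here for the example only.

```python
import re

text = 'alex83taibai40egon25'

# findall: every match, returned as a list of strings
all_numbers = re.findall(r'\d+', text)
print(all_numbers)   # ['83', '40', '25']

# search: only the first match, wrapped in a match object (or None)
first = re.search(r'\d+', text)
first_number = first.group() if first else None  # guard against None
print(first_number)  # 83

# match: succeeds only if the pattern matches at the very start
at_start = re.match(r'\d+', text)
print(at_start)      # None, because the string starts with letters

# search with a leading ^ behaves the same way here
anchored = re.search(r'^\d+', text)
print(anchored)      # None
```

Checking the result of search for None before calling group() avoids the error mentioned above.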
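Extended string processing: splitting and replacing
split: splits the string at every match of the regular expression
sub: replaces matches; the format is re.sub(pattern, replacement, string, count), where count limits the number of replacements
subn: like sub, but returns a tuple whose second element is the number of replacements made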
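A short sketch of the three functions, again on the sample string used in these notes. The replacement text 'H' is an arbitrary choice for illustration.

```python
import re

text = 'alex83taibai40egon25'

# split: cut the string wherever the pattern matches
parts = re.split(r'\d+', text)
print(parts)         # ['alex', 'taibai', 'egon', '']

# sub: replace every match
replaced_all = re.sub(r'\d+', 'H', text)
print(replaced_all)  # alexHtaibaiHegonH

# sub with count: replace only the first match
replaced_one = re.sub(r'\d+', 'H', text, count=1)
print(replaced_one)  # alexHtaibai40egon25

# subn: (new string, number of replacements)
result = re.subn(r'\d+', 'H', text)
print(result)        # ('alexHtaibaiHegonH', 3)
```

The trailing empty string in the split result appears because the sample string ends with a match.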
Advanced re module usage: saving time / space
compile: saves time when the same regular expression is used repeatedly, because the pattern is compiled only once
ret = re.compile(r'\d+')  # compilation is already done here
print(ret)
res = ret.findall('alex83taibai40egon25')
print(res)
finditer: saves space / memory, because it returns an iterator of match objects instead of building the whole result list at once
ret = re.finditer(r'\d+', 'alex83taibai40egon25')
for i in ret:
    print(i.group())
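The two techniques can also be combined, since a compiled pattern has its own finditer method. The sketch below assumes a small hypothetical list of input lines and sums the numbers found in them.

```python
import re

number = re.compile(r'\d+')  # compiled once, reused for every line

# hypothetical sample data, not from the original notes
lines = ['alex83', 'taibai40', 'egon25']

total = 0
for line in lines:
    # finditer yields match objects one at a time
    for m in number.finditer(line):
        total += int(m.group())

print(total)  # 148
```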
search().group(): a number passed in the parentheses selects the content of the corresponding group
import re
s = '<a>wahaha</a>'  # markup language, as in an HTML page
ret = re.search(r'<(\w+)>(\w+)</(\w+)>', s)
print(ret.group())   # the whole match
print(ret.group(1))  # a numeric argument returns the content of that group
findall() has a special rule: when the regular expression contains groups in parentheses (), it gives them priority and returns the group contents instead of the whole match
Cancel group priority: (?:regular expression)
ret = re.findall(r'\d+(?:\.\d+)?', '1.234*4')
print(ret)
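To see why the priority needs to be cancelled, the same pattern can be run with and without a capturing group. This is a small sketch on the string used above.

```python
import re

text = '1.234*4'

# capturing group: findall returns only what the group captured
with_group = re.findall(r'\d+(\.\d+)?', text)
print(with_group)     # ['.234', '']

# non-capturing group (?:...): findall returns the whole match
without_group = re.findall(r'\d+(?:\.\d+)?', text)
print(without_group)  # ['1.234', '4']
```

The empty string in the first result comes from the second match, '4', where the optional group matched nothing.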
About groups:
1. Within a regular expression, a group is sometimes needed so that a quantifier applies to a whole sequence of characters rather than a single one, for example (\.[\w]+)?
2. In Python, a group helps you pick out exactly the content you really need, for example <(\w+)>(\d+)<
split
ret = re.split(r'\d+', 'alex83taibai40egon25')
print(ret)
ret = re.split(r'(\d+)', 'alex83taibai40egon25aa')  # with a group, the separators are kept in the result
print(ret)
Python-specific conventions for regular expressions
1. Naming a group: (?P<group name>regular expression)
2. To require that later content be identical to what an earlier group matched, refer back to that group by its name: (?P=group name)
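The two rules above can be sketched on a pair of tags. The sample strings and variable names below are assumptions made for illustration.

```python
import re

# (?P<tab>...) names the group; (?P=tab) must match the same text again
tag_pattern = r'<(?P<tab>\w+)>(\w+)</(?P=tab)>'

# opening and closing tags agree, so the pattern matches
ok = re.search(tag_pattern, '<a>wahaha</a>')
print(ok.group('tab'))  # a
print(ok.group(2))      # wahaha

# the closing tag differs from the opening tag, so there is no match
bad = re.search(tag_pattern, '<a>wahaha</b>')
print(bad)              # None
```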
pattern = r'<(?P<tab>\w+)>(\w+)</(?P=tab)>'
ret = re.search(pattern, s)
print(ret