Table of Contents
Part 1: Regular expressions
Part 2: The re module
Part 1: Regular expressions
Relationship between the re module and regular expressions:
Regular expressions are a language-independent technology that any programming language can use; to use them in Python, you rely on the re module.
Purpose: a regular expression is used to pick out specific content from a string.
Typical scenarios: web crawlers and data analysis.
Character group: [characters]
The different characters that may appear at the same position form a character group, which is written in a regular expression with [].
[0123456789] - matches one digit; the match succeeds if the character being tested equals any one character in the group
[0-9] - shorthand for [0123456789]; matches any one digit
[a-z] - matches any one lowercase letter from a to z
[A-Z] - matches any one uppercase letter from A to Z
[0-9a-fA-F] - the ranges inside a character group are combined with "or", so this group matches one character from 0-9, a-f, or A-F
Summary: a character group matches exactly one character at one position.
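As a small illustration of the character groups above, here is a sketch that uses Python's re module (covered in Part 2). The sample strings and variable names are made up for this example.

```python
import re

# Each character group matches exactly one character at one position,
# so findall returns the matching characters one by one.
digits = re.findall(r'[0-9]', 'a1b22c')          # any single digit
lower = re.findall(r'[a-z]', 'Eva3')             # any single lowercase letter
hex_chars = re.findall(r'[0-9a-fA-F]', 'x7Fg')   # 0-9 or a-f or A-F

print(digits)     # ['1', '2', '2']
print(lower)      # ['v', 'a']
print(hex_chars)  # ['7', 'F']
```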
Metacharacters: each one also matches the content of only one position
1. . (dot) - matches any character except a newline
2. \w \s \d
\w - word: matches any letter, digit, or underscore
\s - space: matches any whitespace character
\d - digit: matches any digit
3. \W \S \D
\W - matches any character that is not a letter, digit, or underscore
\S - matches any non-whitespace character
\D - matches any non-digit character
4. \n \t
\n - matches a newline
\t - matches a tab (the Tab key)
5. \b ^ $
\b - matches a word boundary, for example the end of a word
^ - matches the beginning of the string
$ - matches the end of the string
6. [...] [^...]
[...] - matches any one character in the character group
[^...] - matches any one character that is not in the character group
7. a|b ()
a|b - matches a or b
() - grouping: the expression inside the parentheses is treated as one group
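The metacharacters listed above can be tried out with a short sketch like the following. It uses re.findall from Python's re module (covered in Part 2); the sample strings and variable names are assumptions chosen only for illustration.

```python
import re

text = 'eva 18\tegon_2'

any_char = re.findall(r'.', 'a\nb')          # dot skips the newline
word_chars = re.findall(r'\w', 'a_1 !')      # letters, digits, underscore
digits = re.findall(r'\d', text)             # digits only
spaces = re.findall(r'\s', text)             # whitespace (space and tab here)
non_lower = re.findall(r'[^a-z]', 'ab1c-')   # anything not in a-z
at_start = re.findall(r'^e', text)           # 'e' only at the start of the string
at_end = re.findall(r'2$', text)             # '2' only at the end of the string
either = re.findall(r'eva|egon', text)       # 'eva' or 'egon'
word_end = re.findall(r'n\b', 'none egon')   # 'n' only where a word ends

print(any_char)    # ['a', 'b']
print(word_chars)  # ['a', '_', '1']
print(digits)      # ['1', '8', '2']
print(spaces)      # [' ', '\t']
print(non_lower)   # ['1', '-']
print(at_start)    # ['e']
print(at_end)      # ['2']
print(either)      # ['eva', 'egon']
print(word_end)    # ['n']
```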
A single character or a character group can match only one character at one position, which is clearly not enough. This is where quantifiers come in.
Quantifier: placed after a metacharacter or character group to control how many times it is matched.
1. * + ?
* - repeat 0 or more times
+ - repeat 1 or more times
? - repeat 0 or 1 time
2. {n} {n,} {n,m}
{n} - repeat exactly n times
{n,} - repeat n or more times
{n,m} - repeat from n to m times
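To make the quantifiers concrete, the following sketch applies each one to the same made-up string. It uses re.findall from Python's re module (covered in Part 2); the sample string and variable names are assumptions for illustration only.

```python
import re

s = 'a ab abb abbb'

star = re.findall(r'ab*', s)         # b repeated 0 or more times
plus = re.findall(r'ab+', s)         # b repeated 1 or more times
optional = re.findall(r'ab?', s)     # b repeated 0 or 1 time
exactly = re.findall(r'ab{2}', s)    # b repeated exactly 2 times
at_least = re.findall(r'ab{2,}', s)  # b repeated 2 or more times
between = re.findall(r'ab{1,2}', s)  # b repeated 1 to 2 times

print(star)      # ['a', 'ab', 'abb', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(optional)  # ['a', 'ab', 'ab', 'ab']
print(exactly)   # ['abb', 'abb']
print(at_least)  # ['abb', 'abbb']
print(between)   # ['ab', 'abb', 'abb']
```

Note that each quantifier takes as many repetitions as it is allowed to, which is why 'abbb' yields 'abb' for {1,2} rather than 'ab'.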
Part 2: The re module
Commonly used methods in the re module: findall(), search(), match()
    import re  # provides findall, search, and match
findall() usage:
    # findall('regular expression', 'string to search')
    res = re.findall('[a-z]+', 'eva egon jason')
    print(res)  # ['eva', 'egon', 'jason']
    # findall finds every part of the string that matches the regular expression
    # and returns a list whose elements are the matched results.
search() usage:
    res = re.search('a', 'eva egon jason')
    print(res)          # search does not return the matched text directly; it returns a match object
    print(res.group())  # group() must be called to see the matched text: a
    # Notes:
    # 1. search scans the string only until the first match is found and then stops.
    # 2. If nothing matches, search returns None, and calling group() on it raises an error.
    # search('regular expression', 'string to search')
    res1 = re.search('a', 'eva egon jason')
    if res1:
        print(res1.group())
match() usage:
    res = re.match('a', 'eva egon jason')
    print(res)  # None: the string starts with 'e', so 'a' does not match at the beginning
    if res:     # guard, because calling group() on None raises an error
        print(res.group())
    # Notes:
    # 1. match only matches at the beginning of the string.
    # 2. If the beginning of the string does not fit the pattern, match returns None,
    #    and calling group() on it raises an error.
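For a side-by-side view of the three methods, here is a short sketch that runs findall, search, and match against the same string. The variable names are made up for this example.

```python
import re

text = 'eva egon jason'

all_a = re.findall('a', text)   # every match, as a list of strings
first_a = re.search('a', text)  # match object for the first match only
start_a = re.match('a', text)   # None: the string does not begin with 'a'
start_e = re.match('e', text)   # match object: the string begins with 'e'

print(all_a)            # ['a', 'a']
print(first_a.group())  # a
print(first_a.start())  # 2
print(start_a)          # None
print(start_e.group())  # e
```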
Other commonly used methods in the re module:
split():
    # Split on 'a' first, giving '' and 'bcd'; then split '' and 'bcd' on 'b'.
    ret = re.split('[ab]', 'abcd')
    print(ret)  # ['', '', 'cd']  (the result is still a list)
sub():
    # sub('regular expression', 'new content', 'string to process', count=n)
    # Finds everything that matches the regular expression and replaces it with the
    # new content; count limits how many replacements are made.
    ret = re.sub(r'\d', 'H', 'eva3egon4yuan4', count=1)  # replace digits with 'H', only once
    print(ret)  # evaHegon4yuan4
subn():
    # Replace digits with 'H'; subn returns a tuple (resulting string, number of replacements).
    ret = re.subn(r'\d', 'H', 'eva3egon4yuan4')
    ret1 = re.subn(r'\d', 'H', 'eva3egon4yuan4', count=1)
    print(ret)   # ('evaHegonHyuanH', 3)
    print(ret1)  # ('evaHegon4yuan4', 1)
    # The second element of the tuple is the number of replacements made.
compile():
    obj = re.compile(r'\d{3}')       # compile the pattern into a regular expression object; the rule is 3 digits
    ret = obj.search('abc123eeee')   # call search on the object; the argument is the string to search
    res1 = obj.findall('347982734729349827384')
    print(ret.group())  # Result: 123
    print(res1)         # Result: ['347', '982', '734', '729', '349', '827', '384']
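One common reason to compile a pattern is to reuse it on many strings. The sketch below assumes a made-up list of input lines and collects the first three-digit run from each; the names three_digits, lines, and firsts are hypothetical.

```python
import re

three_digits = re.compile(r'\d{3}')  # compiled once, reused below

lines = ['abc123', 'no digits', 'x4567']  # made-up sample input
firsts = []
for line in lines:
    m = three_digits.search(line)
    # search returns None when nothing matches, so check before calling group()
    firsts.append(m.group() if m else None)

print(firsts)  # ['123', None, '456']
```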
finditer():
    ret = re.finditer(r'\d', 'ds3sy4784a')  # finditer returns an iterator that holds the match results
    print(ret)                       # for example: <callable_iterator object at 0x10195f940>
    print(next(ret).group())         # first result: 3  (next(ret) is equivalent to ret.__next__())
    print(next(ret).group())         # second result: 4
    print([i.group() for i in ret])  # remaining results: ['7', '8', '4']
    # Calling next(ret) after the iterator is exhausted raises StopIteration.
Naming a group in a regular expression: add ?P<name> at the start of the group
    res = re.search(r'^[1-9](\d{14})(\d{2}[0-9x])?$', '110105199812067023')
    # A group can also be given a name:
    res = re.search(r'^[1-9](?P<password>\d{14})(?P<username>\d{2}[0-9x])?$', '110105199812067023')
    print(res.group())            # 110105199812067023
    print(res.group('password'))  # 10105199812067
    print(res.group(1))           # 10105199812067
    print(res.group('username'))  # 023
    print(res.group(2))           # 023
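As a further sketch of named groups, the example below uses a made-up date string and hypothetical group names (year, month). It also shows groupdict(), a match-object method that returns all named groups at once; this method is not used in the text above and is included only as an illustration.

```python
import re

m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', 'date: 2019-07')

by_name = m.group('year')   # look up a group by its name
by_index = m.group(2)       # named groups can still be read by number
as_dict = m.groupdict()     # all named groups as a dictionary

print(by_name)   # 2019
print(by_index)  # 07
print(as_dict)   # {'year': '2019', 'month': '07'}
```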
Cancelling the group priority mechanism in findall()
    ret1 = re.findall(r'www\.(baidu|oldboy)\.com', 'www.oldboy.com')
    ret2 = re.findall(r'www\.(?:baidu|oldboy)\.com', 'www.oldboy.com')  # ?: cancels group priority
    print(ret1, ret2)  # ['oldboy'] ['www.oldboy.com']
    # findall gives priority to the content matched by the group and returns only that.
    # To get the whole match instead, cancel the priority with ?: at the start of the group.
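The effect of groups on findall can be seen more fully with a string that contains two addresses. This is a minimal sketch with made-up variable names; the last call adds a second group to show that findall then returns tuples.

```python
import re

urls = 'www.baidu.com www.oldboy.com'

one_group = re.findall(r'www\.(baidu|oldboy)\.com', urls)     # only the group content
no_capture = re.findall(r'www\.(?:baidu|oldboy)\.com', urls)  # the whole match
two_groups = re.findall(r'(www)\.(baidu|oldboy)\.com', urls)  # one tuple per match

print(one_group)   # ['baidu', 'oldboy']
print(no_capture)  # ['www.baidu.com', 'www.oldboy.com']
print(two_groups)  # [('www', 'baidu'), ('www', 'oldboy')]
```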
split() behaves differently when the regular expression contains a group ()
    ret = re.split(r"\d+", "eva3egon4yuan")
    print(ret)   # Result: ['eva', 'egon', 'yuan']

    ret1 = re.split(r"(\d+)", "eva3egon4yuan")
    print(ret1)  # Result: ['eva', '3', 'egon', '4', 'yuan']
    # With a group, the separators matched by the group are kept in the result list.