A. The re module
1. What the module does
The re module is Python's interface to the regular expression language; it is mainly used for matching strings.
2. Regular expressions and the meaning of metacharacters
. Matches any character (except the newline \n)
^ Matches at the start of the string
$ Matches at the end of the string
* Repeat: matches the preceding character 0 or more times, [0, +∞)
+ Repeat: matches the preceding character 1 or more times, [1, +∞)
? Repeat: matches the preceding character 0 or 1 times, [0, 1]
{} The number inside gives how many times the preceding character must repeat, such as 'b{3}'
[] Character set: matches any one of the enclosed characters (an "or" relationship), such as '[az]'. Inside [], most metacharacters lose their special meaning.
'[^123]': a ^ at the start of [] negates the set.
'[1-5]': a - inside [] denotes a range.
\ Followed by certain ordinary characters, it gives them a special meaning, as listed below (for example [\D]). Followed by a special character, it cancels that character's special meaning, such as [\^].
\d matches any digit, equivalent to [0-9] for ASCII text
\D matches any non-digit character, equivalent to [^0-9]
\w matches letters, digits and the underscore, equivalent to [0-9a-zA-Z_] for ASCII text
\W matches any character that is not a letter, digit or underscore, equivalent to [^0-9a-zA-Z_]
\s matches any whitespace character (space, form feed, newline, carriage return, tab), equivalent to [ \f\n\r\t\v]
\S matches any non-whitespace character, equivalent to [^ \f\n\r\t\v]
\A matches the start of the string. Difference from ^: \A matches only at the very start; under re.M it does not match at the start of each line.
\Z matches the end of the string. Difference from $: \Z matches only at the very end; under re.M it does not match at the end of each line.
\b matches a word boundary, for example between a word and a space
\B matches a position that is not a word boundary
() Grouping: the enclosed characters are matched as a single unit, such as '(bs)'
(?P<name>...) Named group: gives the group a name so that its match can be retrieved by that name
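A minimal sketch of looking up a group by name. The sample text and the group names year, month and day are made up for illustration and are not from the notes above.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
text = "order 2024-05-17 shipped"

# (?P<name>...) attaches a name to each group.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text)

year = m.group("year")   # retrieve one group by its name
parts = m.groupdict()    # all named groups as a dict
print(year)
print(parts)
```

Named groups remain available by number too, so m.group(1) here returns the same text as m.group("year").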
URL-matching example:
import re
print(re.findall(r'www\.(\w+)\.com', "www.baidu.com"))  # ['baidu'], only the content of the group is returned
print(re.findall(r'www\.(?:\w+)\.com', "www.baidu.com"))  # ['www.baidu.com'], (?:...) does not capture, so the whole match is returned
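The other metacharacters listed in this section can be checked in the same way. The following is a small sketch with made-up sample strings; the variable names are only for illustration.

```python
import re

# Quantifiers: *, +, ? and {n} applied to the preceding character.
star = re.findall(r"ab*", "a ab abbb")    # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")    # ['ab', 'abbb']
opt = re.findall(r"ab?", "a ab abbb")     # ['a', 'ab', 'ab']
exact = re.findall(r"b{3}", "abbb bb")    # ['bbb']

# Character sets: negation with ^ and ranges with -.
negated = re.findall(r"[^123]", "1a2b3")  # ['a', 'b']
ranged = re.findall(r"[1-5]", "0123456")  # ['1', '2', '3', '4', '5']

# ^ and $ match at every line under re.M; \A and \Z do not.
lines = "one\ntwo"
caret = re.findall(r"^\w+", lines, re.M)        # ['one', 'two']
start_only = re.findall(r"\A\w+", lines, re.M)  # ['one']
dollar = re.findall(r"\w+$", lines, re.M)       # ['one', 'two']
end_only = re.findall(r"\w+\Z", lines, re.M)    # ['two']

# \b requires a word boundary, \B requires that there is none.
whole_word = re.findall(r"\bcat\b", "cat concat cats")  # ['cat']
inside = re.findall(r"\Bcat", "cat concat cats")        # ['cat'], the one inside 'concat'

print(star, plus, opt, exact)
print(negated, ranged)
print(caret, start_only, dollar, end_only)
print(whole_word, inside)
```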
3. Module methods
findall(): returns all matches as a list
search(): returns a match object for the first match found; call its group() method to get the matched text
print(re.search(r'www\.(\w+)\.com', "www.baidu.com").group())  # www.baidu.com
match(): matches only at the beginning of the string; if the beginning does not fit the pattern, it returns None. Otherwise it also returns a match object, whose group() method returns the matched text.
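A short sketch of the difference between match() and search(), using a made-up sample string:

```python
import re

s = "say hello"  # hypothetical sample string

at_start = re.match(r"hello", s)            # 'hello' is not at the beginning
anywhere = re.search(r"hello", s).group()   # search() scans the whole string
prefix = re.match(r"say", s).group()        # 'say' is at the beginning

print(at_start)   # None
print(anywhere)   # hello
print(prefix)     # say
```

Because both functions return None when nothing matches, it is safer to check the result before calling group() on it.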
split(): splits a string at each match of the pattern
print(re.split('k', 'sdfkwerkryy'))  # ['sdf', 'wer', 'ryy']
sub(pattern, replacement, string, count): replaces matches of the pattern in the string; count is the maximum number of replacements (if omitted, all matches are replaced)
print(re.sub('chen', 'peng', 'chenxiaozanchen', count=1))  # pengxiaozanchen
compile(): compiles the pattern once so that it can be reused, which improves efficiency slightly; the methods are then called on the compiled object
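A minimal sketch of reusing a compiled pattern. The pattern and the sample strings are assumptions chosen for illustration.

```python
import re

# Compile once, then call the methods on the pattern object.
word = re.compile(r"\w+")  # hypothetical pattern

found = word.findall("a bc")         # same as re.findall(r"\w+", "a bc")
first = word.search(" bc").group()   # same as re.search(r"\w+", " bc").group()
masked = word.sub("X", "a bc")       # same as re.sub(r"\w+", "X", "a bc")

print(found)
print(first)
print(masked)
```

The compiled object offers the same methods as the module-level functions, minus the pattern argument.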
finditer(): returns the results as an iterator of match objects rather than as a list
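A small sketch of iterating over the matches, with a made-up input string. Each item is a match object, so the position of a match is available as well as its text.

```python
import re

# finditer() yields match objects one at a time.
matches = re.finditer(r"\d+", "a1 b22 c333")  # hypothetical input

results = [(m.group(), m.start(), m.end()) for m in matches]
print(results)  # [('1', 1, 2), ('22', 4, 6), ('333', 8, 11)]
```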