Python regular expressions: the re module

I. The re module

1. Purpose of the module

The re module provides Python's interface to the regular expression language; it is mainly used to match strings.

2. Regular expressions and the meaning of metacharacters

. matches any single character except the newline \n

^ matches at the beginning of the string

$ matches at the end of the string
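A minimal sketch of ., ^ and $ with re.findall(); the sample strings are made up for illustration.

```python
import re

# '.' matches any single character except a newline, so 'a\nc' is skipped
dots = re.findall(r'a.c', 'abc a-c a\nc')
# '^' only matches at the start of the string (the second 'he' is ignored)
starts = re.findall(r'^he', 'hello he')
# '$' only matches at the end of the string (the first 'lo' is ignored)
ends = re.findall(r'lo$', 'lo hello')

print(dots)    # ['abc', 'a-c']
print(starts)  # ['he']
print(ends)    # ['lo']
```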

* matches the preceding character 0 or more times, i.e. [0, +∞)

+ matches the preceding character 1 or more times, i.e. [1, +∞)

? matches the preceding character 0 or 1 times, i.e. [0, 1]

{} gives the exact number of repetitions of the preceding character, e.g. 'b{3}' matches 'bbb'
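The quantifiers above can be compared side by side on one made-up string. The sketch also uses the range form {m,n} (between m and n repetitions), which the text does not mention but which re supports.

```python
import re

text = 'a ab abb abbb'
star = re.findall(r'ab*', text)     # 'b' 0 or more times
plus = re.findall(r'ab+', text)     # 'b' 1 or more times
quest = re.findall(r'ab?', text)    # 'b' 0 or 1 times
exact = re.findall(r'ab{3}', text)  # 'b' exactly 3 times
rng = re.findall(r'ab{1,2}', text)  # 'b' 1 to 2 times (greedy, so at most 2)

print(star)   # ['a', 'ab', 'abb', 'abbb']
print(plus)   # ['ab', 'abb', 'abbb']
print(quest)  # ['a', 'ab', 'ab', 'ab']
print(exact)  # ['abbb']
print(rng)    # ['ab', 'abb', 'abb']
```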

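[] defines a character set: it matches any one of the characters inside (an "or" relationship), e.g. '[az]' matches 'a' or 'z'. Inside [], most metacharacters lose their special meaning, e.g. '[.*]' matches a literal '.' or '*'.

  A ^ at the start of [] negates the set, e.g. '[^123]' matches any character except 1, 2 and 3.

  A - inside [] denotes a range, e.g. '[1-5]' matches any digit from 1 to 5.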
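A short sketch of character sets, using made-up inputs, covering the "or" relationship, ranges, negation and literal metacharacters inside [].

```python
import re

either = re.findall(r'[az]', 'abcxyz')   # 'a' or 'z'
ranged = re.findall(r'[1-5]', '0123456')  # any digit from 1 to 5
negated = re.findall(r'[^123]', '12345')  # anything except 1, 2, 3
literal = re.findall(r'[.*]', 'a.b*c')    # '.' and '*' are literal inside []

print(either)   # ['a', 'z']
print(ranged)   # ['1', '2', '3', '4', '5']
print(negated)  # ['4', '5']
print(literal)  # ['.', '*']
```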

\ followed by an ordinary character gives it a special meaning, as listed below; followed by a metacharacter, it removes that character's special meaning, e.g. \^ or [\^] matches a literal ^.

  \d matches any digit, equivalent to [0-9]
  \D matches any non-digit character, equivalent to [^0-9]
  \w matches letters, digits and the underscore, equivalent to [0-9a-zA-Z_]
  \W matches any character that is not a letter, digit or underscore, equivalent to [^0-9a-zA-Z_]
  \s matches any whitespace character (space, newline, carriage return, tab, form feed, vertical tab), equivalent to [ \t\n\r\f\v]
  \S matches any non-whitespace character, equivalent to [^ \t\n\r\f\v]
  \A matches only at the beginning of the string; unlike ^, it does not match at the start of each line under re.M
  \Z matches only at the end of the string; unlike $, it does not match at the end of each line under re.M
  \b matches at a word boundary, e.g. between a word character and a space
  \B matches where there is no word boundary
  (In Python 3, \d, \w and \s in str patterns also match non-ASCII Unicode characters unless re.ASCII is set; the bracket equivalents above hold for ASCII text.)
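A minimal sketch of the escape sequences above on made-up strings, including how \A and \Z differ from ^ and $ when re.M (multiline mode) is on.

```python
import re

s = 'id: 42, name_1\tok'
digits = re.findall(r'\d', s)   # ['4', '2', '1']
words = re.findall(r'\w+', s)   # ['id', '42', 'name_1', 'ok']
spaces = re.findall(r'\s', s)   # [' ', ' ', '\t']

multi = 'one\ntwo'
# Under re.M, ^ and $ match at every line; \A and \Z only at the string edges
line_starts = re.findall(r'^\w+', multi, re.M)   # ['one', 'two']
str_start = re.findall(r'\A\w+', multi, re.M)    # ['one']
line_ends = re.findall(r'\w+$', multi, re.M)     # ['one', 'two']
str_end = re.findall(r'\w+\Z', multi, re.M)      # ['two']

# \b needs a word boundary; \B needs its absence ('cat' inside 'concat')
whole = re.findall(r'\bcat\b', 'cat concat cat.')  # ['cat', 'cat']
inner = re.findall(r'\Bcat', 'cat concat')         # ['cat']

print(digits, words, spaces)
print(line_starts, str_start, line_ends, str_end)
print(whole, inner)
```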

 

() groups characters so that they are matched as a whole, e.g. '(bs)+' matches 'bs', 'bsbs', and so on.

  Named groups: (?P<name>...) gives a group a name, so its match can be retrieved by that name.
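A sketch contrasting a group with a plain quantifier, plus a named-group lookup. The input 'alex36wusir34' and the group names name and age are made up for illustration.

```python
import re

# The group makes '+' repeat 'bs' as a whole...
grouped = re.search(r'(bs)+', 'xbsbsbs').group()   # 'bsbsbs'
# ...whereas without it, '+' applies only to 's'
ungrouped = re.search(r'bs+', 'xbsbsbs').group()   # 'bs'

# Named groups: look the captured text up by name instead of by index
m = re.search(r'(?P<name>[a-z]+)(?P<age>\d+)', 'alex36wusir34')
print(grouped, ungrouped)  # bsbsbs bs
print(m.group('name'))     # alex
print(m.group('age'))      # 36
print(m.groupdict())       # {'name': 'alex', 'age': '36'}
```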

  

Example: extracting part of a URL:

    import re
    print(re.findall(r'www\.(\w+)\.com', "www.baidu.com"))    # ['baidu']: with a capturing group, findall returns only the group's content
    print(re.findall(r'www\.(?:\w+)\.com', "www.baidu.com"))  # ['www.baidu.com']: (?:...) is non-capturing, so the whole match is returned

3. Module functions

findall(): returns all non-overlapping matches as a list
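A small sketch of what findall() returns in two edge cases the line above does not spell out: a pattern with several groups yields one tuple per match, and no match yields an empty list. The inputs are made up.

```python
import re

# With several groups, each list item is a tuple of the groups' text
pairs = re.findall(r'(\w+)=(\d+)', 'a=1, b=22, c=x')
# No match gives an empty list rather than None
empty = re.findall(r'\d+', 'no digits here')

print(pairs)  # [('a', '1'), ('b', '22')]
print(empty)  # []
```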

search(): scans the string and returns a match object for the first match (or None if there is none); call its group() method to get the matched text

    print(re.search(r'www\.(\w+)\.com', "www.baidu.com").group())  # www.baidu.com
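Because search() returns None when nothing matches, calling group() directly can raise an AttributeError. A minimal sketch of checking the result first, on a made-up string, also showing group(1) and span():

```python
import re

m = re.search(r'www\.(\w+)\.com', 'visit www.baidu.com today')
if m:  # search() returns None when nothing matches
    print(m.group())   # www.baidu.com  (whole match)
    print(m.group(1))  # baidu          (first group)
    print(m.span())    # (6, 19)        (start and end index)

missing = re.search(r'\d+', 'abc')
print(missing)  # None
```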

match(): matches only at the beginning of the string; if the string does not start with a match, it returns None. Like search(), it returns a match object whose group() method gives the result.
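A quick comparison of match() and search() on made-up strings:

```python
import re

no_match = re.match(r'\d+', 'abc123')          # None: the string does not start with a digit
found = re.search(r'\d+', 'abc123').group()    # '123': search() scans the whole string
at_start = re.match(r'\d+', '123abc').group()  # '123': digits at the very beginning

print(no_match, found, at_start)  # None 123 123
```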

split(): splits a string at every match of the pattern

    print(re.split('k','sdfkwerkryy')) #['sdf', 'wer', 'ryy']
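Unlike str.split(), the separator can be a pattern. A sketch on made-up input, also showing two documented behaviours: a capturing group keeps the separators, and maxsplit limits the number of splits.

```python
import re

# Split on ',' or ';' with optional surrounding whitespace
fields = re.split(r'\s*[,;]\s*', 'a, b;c ,d')
# A capturing group in the pattern keeps the separators in the result
with_seps = re.split(r'(k)', 'sdfkwerkryy')
# maxsplit limits the number of splits
first_only = re.split(r'k', 'sdfkwerkryy', maxsplit=1)

print(fields)      # ['a', 'b', 'c', 'd']
print(with_seps)   # ['sdf', 'k', 'wer', 'k', 'ryy']
print(first_only)  # ['sdf', 'werkryy']
```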

sub(pattern, repl, string, count): replaces matches of pattern in string with repl; count limits the number of replacements (if omitted, all matches are replaced)

    print(re.sub('chen', 'peng', 'chenxiaozanchen', count=1))  # pengxiaozanchen
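A sketch of the default "replace all" behaviour, plus two related features of re: subn(), which also reports the number of replacements, and backreferences such as \1 in the replacement string. The 'user@host' input is made up.

```python
import re

s = 'chenxiaozanchen'
all_replaced = re.sub('chen', 'peng', s)  # no count: every match is replaced
result, n = re.subn('chen', 'peng', s)    # subn() also returns how many were replaced
# \1 and \2 in the replacement refer to the pattern's groups
swapped = re.sub(r'(\w+)@(\w+)', r'\2@\1', 'user@host')

print(all_replaced)  # pengxiaozanpeng
print(result, n)     # pengxiaozanpeng 2
print(swapped)       # host@user
```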

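compile(): compiles a pattern once into a pattern object whose methods (search(), findall(), and so on) can then be called repeatedly; this is slightly more efficient when the same pattern is used many times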
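A minimal sketch of compiling once and reusing the pattern object on several made-up strings. Note that re also caches recently used patterns internally, so the speed gain over calling re.search() directly is usually modest; the main benefit is readability when a pattern is reused.

```python
import re

url_re = re.compile(r'www\.(\w+)\.com')  # compile once...

names = []
for s in ['www.baidu.com', 'www.python.com', 'example.org']:
    m = url_re.search(s)                  # ...then reuse the pattern object's methods
    names.append(m.group(1) if m else None)

print(names)                                  # ['baidu', 'python', None]
print(url_re.findall('www.a.com www.b.com'))  # ['a', 'b']
```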

    

 

finditer(): like findall(), but returns an iterator of match objects instead of a list
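Since finditer() yields match objects, each item carries its position as well as its text. A sketch on a made-up string:

```python
import re

it = re.finditer(r'\d+', 'a1b22c333')
# Each item is a match object, so start() is available alongside group()
found = [(m.group(), m.start()) for m in it]

print(found)  # [('1', 1), ('22', 3), ('333', 6)]
```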

    

 

 

 

 


Source: www.cnblogs.com/chenxiaozan/p/12164382.html