Shanghai day15 - Regular Expressions and the re module

Table of Contents

1. Regular expressions

2. The re module

1. Regular expressions

How the re module relates to regular expressions:

  Regular expressions are a language-independent technology that any programming language can use. To use them in Python, you go through the re module.

Purpose: a regular expression picks out specific content from a string.

Use cases: web crawlers, data analysis.

Character class: [characters]

The characters that may appear at one position form a character class, written in a regular expression with [].

[0123456789] - matches a single digit; the match succeeds if the character being tested is any one of the characters in the class

[0-9] - shorthand for [0123456789]; matches any single digit

[a-z] - matches any lowercase letter from a to z

[A-Z] - matches any uppercase letter

[0-9a-fA-F] - the ranges inside a class are combined with "or", so this class matches one character from 0-9, a-f, or A-F

Summary: a character class matches exactly one character at one position.
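As a quick check of the character classes described above, here is a minimal sketch using re.findall. The sample strings are made up for illustration.

```python
import re

text = "Room 42B, floor 7, code ff0A"

# [0-9] matches exactly one digit at a time, so each digit is a separate match
digits = re.findall(r"[0-9]", text)

# [a-z] matches exactly one lowercase letter
lowercase = re.findall(r"[a-z]", "aXbY")

# [0-9a-fA-F] matches one character that is a digit OR a-f OR A-F
hex_chars = re.findall(r"[0-9a-fA-F]", "ff0A-xyz")

print(digits)     # ['4', '2', '7', '0']
print(lowercase)  # ['a', 'b']
print(hex_chars)  # ['f', 'f', '0', 'A']
```

Every element in the results is a single character, which is the point of the summary: a class on its own never matches more than one position.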

Metacharacters: each one matches the content at a single position only

1、.

  . - matches any character except a newline

2、\w  \s  \d

  \w - word: matches any letter, digit, or underscore

  \s - space: matches any whitespace character

  \d - digit: matches any digit

3、\W  \S  \D

  \W - matches any character that is not a letter, digit, or underscore

  \S - matches any non-whitespace character

  \D - matches any non-digit character

4、\n   \t

  \n - matches a newline

  \t - matches a tab character (the Tab key)

5、\b    ^     $

  \b - matches a word boundary (the start or the end of a word)

  ^ - matches the beginning of the string

  $ - matches the end of the string

6、[...]     [^...]

  [...] - matches any one character in the character class

  [^...] - matches any one character that is not in the character class

7、a|b   ()

  a|b - matches a or b

  () - grouping: the expression inside the parentheses is treated as one group
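The metacharacters listed above can be tried out one at a time. The following is a small sketch with made-up sample strings, not an exhaustive reference.

```python
import re

# . matches any character except a newline
any_char = re.findall(r"a.c", "abc a\nc a-c")

# \w, \d, \s and a negated class each match one character
word_chars = re.findall(r"\w", "a_1 !")
digits = re.findall(r"\d", "id_7 ok")
spaces = re.findall(r"\s", "a b\tc\n")
non_lower = re.findall(r"[^a-z]", "ab1C")

# \b is a word boundary, so only the standalone word "cat" matches
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")

# ^ and $ anchor to the start and end of the string; | means "or"
anchored = re.findall(r"^ab|cd$", "abxxcd")

# () groups "ab|cd" so that the following "e" applies to either branch
grouped = re.search(r"(ab|cd)e", "xcde").group()

print(any_char)     # ['abc', 'a-c']
print(word_chars)   # ['a', '_', '1']
print(digits)       # ['7']
print(spaces)       # [' ', '\t', '\n']
print(non_lower)    # ['1', 'C']
print(whole_words)  # ['cat', 'cat']
print(anchored)     # ['ab', 'cd']
print(grouped)      # cde
```

The patterns are written as raw strings (r"...") so that Python passes backslashes such as \d and \b through to the re module unchanged.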

A literal character, a character class, or a metacharacter on its own matches only one character at one position, which is clearly not enough. This is what quantifiers are for.

Quantifier: placed after a metacharacter to control how many times it may match.

1、*   +   ?

  * - repeat zero or more times

  + - repeat one or more times

  ? - repeat zero or one time

2、{n}   {n,}   {n,m}

  {n} - repeat exactly n times

  {n,} - repeat n or more times

  {n,m} - repeat between n and m times
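To see how each quantifier changes the number of characters consumed, here is a minimal sketch run against made-up sample strings. Quantifiers are greedy by default, so each match takes as many repetitions as the quantifier allows.

```python
import re

text = "a aa aaa aaaa"

one_or_more = re.findall(r"a+", text)        # one or more 'a'
exactly_two = re.findall(r"a{2}", text)      # exactly two 'a'
three_or_more = re.findall(r"a{3,}", text)   # three or more 'a'
two_to_three = re.findall(r"a{2,3}", text)   # between two and three 'a'

# ? makes the preceding 'u' optional; * allows zero or more 'b'
optional_u = re.findall(r"colou?r", "color colour")
zero_or_more = re.findall(r"ab*", "a ab abbb")

print(one_or_more)    # ['a', 'aa', 'aaa', 'aaaa']
print(exactly_two)    # ['aa', 'aa', 'aa', 'aa']
print(three_or_more)  # ['aaa', 'aaaa']
print(two_to_three)   # ['aa', 'aaa', 'aaa']
print(optional_u)     # ['color', 'colour']
print(zero_or_more)   # ['a', 'ab', 'abbb']
```

In the {2,3} case the run "aaaa" yields only 'aaa': the greedy match takes three characters and the single 'a' left over is too short to match again.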

2. The re module

Commonly used functions in the re module: findall(), search(), match()

import re
"""
findall
search
match
"""

findall() usage:

res = re.findall('[a-z]+', 'eva egon jason')
# findall('regular expression', 'string to match')
print(res)  # ['eva', 'egon', 'jason']
# findall finds every part of the string that matches the regular expression and returns them as a list

search() usage:

res = re.search('a', 'eva egon jason')
print(res)  # search does not return the matched text directly; it returns a match object
print(res.group())  # call group() to see the matched text

"""
Note:
    1. search stops at the first match and does not look any further
    2. If nothing matches, search returns None, and calling group() on it raises an error
"""
res1 = re.search('a', 'eva egon jason')
# search('regular expression', 'string to match')
if res1:
    print(res1.group())
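The None check in the notes above can be wrapped in a small helper. The sketch below assumes a hypothetical function named first_match, which is not part of the re module.

```python
import re

def first_match(pattern, text, default=None):
    """Return the text of the first match, or `default` when nothing matches.

    first_match is a hypothetical helper written for this example.
    """
    m = re.search(pattern, text)
    # m is None when there is no match, so guard before calling group()
    return m.group() if m else default

found = first_match(r"[a-z]+", "123 eva egon")
missing = first_match(r"\d+", "eva egon")

print(found)    # eva
print(missing)  # None
```

Only the first word is returned even though two would match, because search stops as soon as it finds a result.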

match() usage:

res = re.match('a', 'eva egon jason')
print(res)  # None, because the string does not start with 'a'
# print(res.group())  # would raise AttributeError, since res is None
"""
Note:
    1. match only matches at the beginning of the string
    2. If the beginning of the string does not fit the pattern, match returns None, and calling group() on it raises an error
"""
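To make the difference between match and search concrete, here is a short sketch that runs both against the same string. It also shows that search with a leading ^ behaves like match for a single-line string.

```python
import re

text = "eva egon jason"

at_start = re.match(r"eva", text)        # pattern is at the beginning: matches
not_at_start = re.match(r"egon", text)   # pattern is later in the string: None
anywhere = re.search(r"egon", text)      # search scans the whole string
anchored = re.search(r"^egon", text)     # ^ forces the start, so this is None too

print(at_start.group())                    # eva
print(not_at_start)                        # None
print(anywhere.group(), anywhere.start())  # egon 4
print(anchored)                            # None
```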

Other commonly used functions in the re module:

split():

ret = re.split('[ab]', 'abcd')  # first split on 'a' to get '' and 'bcd', then split '' and 'bcd' on 'b'
print(ret)  # ['', '', 'cd'], the result is a list

sub():

ret = re.sub(r'\d', 'H', 'eva3egon4yuan4', 1)  # replace digits with 'H'; the argument 1 means only the first match is replaced
# sub('regular expression', 'new content', 'string to modify', n)
"""
sub searches with the regular expression and replaces every match with the new content; the count n limits how many replacements are made
"""
print(ret)  # evaHegon4yuan4

subn():

ret = re.subn(r'\d', 'H', 'eva3egon4yuan4')  # replace digits with 'H'; returns a tuple (new string, number of replacements)
ret1 = re.subn(r'\d', 'H', 'eva3egon4yuan4', 1)  # same, but at most one replacement
print(ret)  # ('evaHegonHyuanH', 3); the second element is the number of replacements

compile():

obj = re.compile(r'\d{3}')  # compile the regular expression into a pattern object; the rule matches 3 digits
ret = obj.search('abc123eeee')  # call search on the pattern object; the argument is the string to match
res1 = obj.findall('347982734729349827384')
print(ret.group())  # result: 123
print(res1)  # result: ['347', '982', '734', '729', '349', '827', '384']

finditer():

import re
ret = re.finditer(r'\d', 'ds3sy4784a')  # finditer returns an iterator over the match objects
print(ret)  # <callable_iterator object at 0x10195f940>
print(next(ret).group())  # first result; next(ret) is equivalent to ret.__next__()
print(next(ret).group())  # second result
print([i.group() for i in ret])  # the remaining results
# calling next(ret) after the iterator is exhausted raises StopIteration
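Because finditer yields match objects, not plain strings, each result also carries its position. Here is a minimal sketch using the same sample string, with next(it, None) as one way to avoid the StopIteration error mentioned above.

```python
import re

# Collect each digit together with the index where it was found
positions = [(m.group(), m.start()) for m in re.finditer(r"\d", "ds3sy4784a")]
print(positions)  # [('3', 2), ('4', 5), ('7', 6), ('8', 7), ('4', 8)]

# Passing a default to next() returns it when the iterator is exhausted,
# where a bare next() would raise StopIteration
it = re.finditer(r"\d", "a1")
first = next(it, None)
second = next(it, None)
print(first.group())  # 1
print(second)         # None
```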

Naming a group in a regular expression: add ?P<name> at the start of the group

import re
res = re.search(r'^[1-9](\d{14})(\d{2}[0-9x])?$', '110105199812067023')
# a group in a regular expression can also be given a name
res = re.search(r'^[1-9](?P<password>\d{14})(?P<username>\d{2}[0-9x])?$', '110105199812067023')
print(res.group())  # 110105199812067023
print(res.group('password'))  # 10105199812067
print(res.group(1))  # same as group('password')
print(res.group('username'))  # 023
print(res.group(2))  # same as group('username')
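Named groups can also be read all at once with groupdict(). The sketch below reuses the ID-number pattern from above, but with the illustrative group names body and tail, which are assumptions made for this example. It also shows what an optional group returns when it takes no part in the match.

```python
import re

# "body" and "tail" are illustrative names chosen for this sketch
pattern = re.compile(r"^[1-9](?P<body>\d{14})(?P<tail>\d{2}[0-9x])?$")

m18 = pattern.search("110105199812067023")  # 18 characters: the optional group matches
m15 = pattern.search("110105199812067")     # 15 characters: the optional group is absent

print(m18.groupdict())    # {'body': '10105199812067', 'tail': '023'}
print(m15.groupdict())    # {'body': '10105199812067', 'tail': None}
print(m15.group("tail"))  # None
```

A group that did not participate in the match is reported as None and raises no error, so code that reads optional groups should check for None.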

Turning off group priority in a regular expression

ret1 = re.findall('www.(baidu|oldboy).com', 'www.oldboy.com')
ret2 = re.findall('www.(?:baidu|oldboy).com', 'www.oldboy.com')  # ?: turns off group priority
print(ret1, ret2)  # ['oldboy'] ['www.oldboy.com']; findall returns the content of the group first, so to get the whole match, turn off group priority with ?:
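The group-priority behaviour of findall extends to patterns with more than one group. The following sketch compares three cases on a made-up input string. The dots are escaped as \. here so that they match a literal dot only.

```python
import re

text = "www.baidu.com www.oldboy.com"

# One capturing group: findall returns only the group's content
one_group = re.findall(r"www\.(baidu|oldboy)\.com", text)

# Non-capturing group (?:...): findall returns the whole match
no_group = re.findall(r"www\.(?:baidu|oldboy)\.com", text)

# Two capturing groups: findall returns one tuple per match
two_groups = re.findall(r"(www)\.(baidu|oldboy)\.com", text)

print(one_group)   # ['baidu', 'oldboy']
print(no_group)    # ['www.baidu.com', 'www.oldboy.com']
print(two_groups)  # [('www', 'baidu'), ('www', 'oldboy')]
```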

How a group () in the regular expression changes the result of split():

ret = re.split(r"\d+", "eva3egon4yuan")
print(ret)  # result: ['eva', 'egon', 'yuan']

ret1 = re.split(r"(\d+)", "eva3egon4yuan")
print(ret1)  # result: ['eva', '3', 'egon', '4', 'yuan'], the captured separators are kept in the list

 

 

 

 

Origin www.cnblogs.com/qinsungui921112/p/11202528.html