I. Matching a single character
Format: lst = re.findall(regular expression, string to match)
Predefined character classes:
.   matches any character except the newline \n
\d  matches a digit
\D  matches a non-digit
\w  matches a letter, digit, or underscore (the re functions also match Chinese characters)
\W  matches anything that is not a letter, digit, or underscore
\s  matches any whitespace character
\S  matches any non-whitespace character
\n  matches a newline
\t  matches a tab
[]  matches one of the characters listed inside the brackets
Character sets [exactly one character must be chosen from the set]
| Character set format | Description |
|---|---|
| [...] | matches any one character listed in the set |
| [^...] | matches any one character that is not listed in the set |
| Character set | Character to match | Result | Description |
|---|---|---|---|
| [0123456789] | 8 | True | The set enumerates every allowed character; the input must be one of them, otherwise there is no match |
| [abcdefg] | 9 | False | "9" is not in the set, so it does not match |
| [0-9] | 7 | True | A hyphen denotes a range; [0-9] means the same as [0123456789] |
| [a-z] | s | True | [a-z] matches any lowercase letter |
| [A-Z] | B | True | [A-Z] matches any uppercase letter |
| [0-9a-fA-F] | e | True | Matches digits and the letters a-f in either case; can be used to validate a hexadecimal digit |
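The rows of the table can be checked mechanically. The sketch below mirrors them in a list and uses `re.fullmatch`, which succeeds only when the whole string matches the pattern. The `rows` list and the variable names are illustrative assumptions, not part of the original.

```python
import re

# (character set, character to test, expected result), mirroring the table
rows = [
    ("[0123456789]", "8", True),
    ("[abcdefg]", "9", False),
    ("[0-9]", "7", True),
    ("[a-z]", "s", True),
    ("[A-Z]", "B", True),
    ("[0-9a-fA-F]", "e", True),
]

# re.fullmatch returns a match object only if the entire string matches
results = [bool(re.fullmatch(pattern, char)) for pattern, char, _ in rows]
print(results)
```

Comparing `results` with the third column of `rows` confirms that each row behaves as described.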
The following examples show each of these in use:
import re
# lst = re.findall(regular expression, string to match)
# \d matches a digit
# \D matches a non-digit
lst = re.findall(r"\d", "123qwe456asd")
print(lst)
# result: ['1', '2', '3', '4', '5', '6']
lst = re.findall(r"\D", "123qwe456asd")
print(lst)
# result: ['q', 'w', 'e', 'a', 's', 'd']
# \w matches letters, digits, and underscores, including Chinese characters
# \W matches anything that is not a letter, digit, or underscore
lst = re.findall(r"\w", "d&*()qh321>wi")
print(lst)
# result: ['d', 'q', 'h', '3', '2', '1', 'w', 'i']
lst = re.findall(r"\W", "d&*()qh321>wi")
print(lst)
# result: ['&', '*', '(', ')', '>']
# \s matches any whitespace character
# \S matches any non-whitespace character
strvar = """
"""
lst = re.findall(r"\s", "     ")
print(lst)
# result: [' ', ' ', ' ', ' ', ' ']
lst = re.findall(r"\S", strvar)
print(lst)
# result: []  # strvar holds only a newline, so there is no non-whitespace character to match
lst = re.findall(r"\S", "dqHh dq")
print(lst)
# result: ['d', 'q', 'H', 'h', 'd', 'q']  # every character except the whitespace
# \n matches a newline
# \t matches a tab
strvar = """
The weather today is sunny
"""
strvar2 = """
\t\tdqwdq
wd\tqq
"""
lst = re.findall(r"\n", strvar)
print(lst)
# result: ['\n', '\n']
print("==========")
lst = re.findall(r"\t", strvar2)
print(lst)
# result: ['\t', '\t', '\t']
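The note that `\w` also matches Chinese characters reflects the fact that, for `str` patterns in Python 3, the predefined classes are Unicode-aware by default. A minimal sketch, assuming you want the ASCII-only behaviour instead, passes the standard `re.ASCII` flag. The sample string is made up for illustration.

```python
import re

sample = "py_3 中文!"

unicode_words = re.findall(r"\w", sample)            # Unicode-aware by default
ascii_words = re.findall(r"\w", sample, re.ASCII)    # restricts \w to [a-zA-Z0-9_]

print(unicode_words)   # ['p', 'y', '_', '3', '中', '文']
print(ascii_words)     # ['p', 'y', '_', '3']
```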
# ### Character set exercises: one character must be chosen from the set, otherwise the match fails
# Example: [] matches one of the characters listed inside the brackets
lst = re.findall("[123]", "qwo1293dboh")
print(lst)
# result: ['1', '2', '3']
# [a-g] in the middle matches any single letter from a through g
print(re.findall("a[a-g]b", "acb adb aab abb"))
# result: ['acb', 'adb', 'aab', 'abb']
# between a and b there must be exactly one digit from 0 to 9
print(re.findall('a[0123456789]b', 'a1b a2b acb ayb a9090909009b'))
# shorter version: 0123456789 => 0-9
print(re.findall('a[0-9]b', 'a1b a2b acb ayb a9090909009b'))
# result: ['a1b', 'a2b']
# between a and b there must be exactly one letter from a through g
print(re.findall('a[abcdefg]b', 'a1b a2b acb ayb adb a3b'))
# shorter version: abcdefg => a-g
print(re.findall('a[a-g]b', 'a1b a2b acb ayb adb a3b'))
# result: ['acb', 'adb']
print("================")
# the middle character may be a digit or an uppercase or lowercase letter
print(re.findall('a[0-9a-zA-Z]b', 'ab aab aAb aWb aqba1b a8d a6b aaa5b231'))
# result: ['aab', 'aAb', 'aWb', 'aqb', 'a1b', 'a6b', 'a5b']
# shorter version: a-zA-Z => A-z covers both cases, but it is flawed: the special symbols between Z and a are matched as well
print(re.findall('[A-z]', '('))
# ( lies outside the A-z range, so nothing matches and the list is empty
# result: []
print(re.findall('[0-Z]', '9'))  # valid syntax, but pointless; do not use it
# between a and b there must be two characters: the first is a digit 0-9, the second is one of * # /
print(re.findall('a[0-9][*#/]b', 'a1/b a29b a56b a456bab a2b'))
# result: ['a1/b']
# ^ at the start of a character set means "anything except"
print(re.findall('a[^-+*/]b', "a%b ccaa*bda&bd"))
# result: ['a%b', 'a&b']
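To make the `[A-z]` flaw concrete, the sketch below lists every ASCII punctuation character that the range accepts. It relies only on the standard `string.punctuation` constant; the variable names `leaked` and `strict` are illustrative.

```python
import re
import string

# Punctuation characters accepted by the range [A-z]
leaked = [ch for ch in string.punctuation if re.fullmatch(r"[A-z]", ch)]
print(leaked)

# The explicit form [A-Za-z] accepts letters only
strict = [ch for ch in string.punctuation if re.fullmatch(r"[A-Za-z]", ch)]
print(strict)
```

The six characters in `leaked` are the ones that sit between `Z` and `a` in the ASCII table, which is why `[A-Za-z]` is the safer spelling.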
II. Matching multiple characters => [metacharacters] quantifiers
Quantifier  Usage
?      repeat 0 or 1 times
+      repeat 1 or more times (at least once)
*      repeat 0 or more times (any number of times)
{n}    repeat exactly n times
{n,}   repeat n or more times (at least n times)
{n,m}  repeat n to m times
.* .+    greedy matching
.*? .+?  non-greedy matching
# Greedy matching: match as many characters as possible; implemented underneath with backtracking
# Non-greedy matching: match as few characters as possible
(1) A question mark after a quantifier makes it non-greedy (lazy)
(2) .*?w matches any characters, of any length, and stops as soon as it meets a w
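The difference between `.*w` and `.*?w` is easiest to see side by side. This is a small sketch on a made-up sample string; only `re.findall` from the standard library is used.

```python
import re

text = "snow and window"

greedy = re.findall(r".*w", text)    # runs on to the last w in the string
lazy = re.findall(r".*?w", text)     # stops at the first w it meets, then starts over

print(greedy)   # ['snow and window']
print(lazy)     # ['snow', ' and w', 'indow']
```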
import re
# (1) ? matches 0 or 1 a
print(re.findall('a?b', 'abbzab abb aab aaxqab'))
# ['ab', 'b', 'ab', 'ab', 'b', 'ab', 'ab']
# (2) + matches 1 or more a; there must be at least one a, otherwise there is no match
print(re.findall('a+b', 'abbzab abb aab aaxqab'))
# ['ab', 'ab', 'ab', 'aab', 'ab']
# (3) * matches 0 or more a
print(re.findall('a*b', 'abbzab abb aab aaxqab'))
# a may be absent, or may occur several times
# ['ab', 'b', 'ab', 'ab', 'b', 'aab', 'ab']
# (4) {m,n} matches from m to n a
print(re.findall('a{1,3}b', 'abbzab abb aab aaxqab aaaaaab'))
# the b must be preceded by one to three a
# ['ab', 'ab', 'ab', 'aab', 'ab', 'aaab']
print(re.findall('a{1}b', 'abbzab abb aab aaxqab aaaaaaaaaab'))
# ['ab', 'ab', 'ab', 'ab', 'ab', 'ab']  matches exactly one a
print(re.findall('a{1,}b', 'abbzab abb aab aaxqab aaaaaaaaaab'))
# ['ab', 'ab', 'ab', 'aab', 'ab', 'aaaaaaaaaab']  matches one or more a
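The shorthand quantifiers are special cases of the brace form: `?` is `{0,1}`, `+` is `{1,}`, and `*` is `{0,}`. The sketch below checks this on one sample string; the string and the `pairs` list are illustrative assumptions.

```python
import re

text = "abbzab abb aab aaxqab"

# Each shorthand quantifier paired with its brace spelling
pairs = [("a?b", "a{0,1}b"), ("a+b", "a{1,}b"), ("a*b", "a{0,}b")]

for short, braces in pairs:
    same = re.findall(short, text) == re.findall(braces, text)
    print(short, braces, same)
```

Each line prints `True` in the last column, since both spellings produce the same list of matches.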
# Greedy and non-greedy matching [syntax: add ? after a quantifier]
Greedy matching: by default, match as many repetitions as possible; implemented underneath with backtracking.
Non-greedy matching: match as few repetitions as possible.
Adding ? after a quantifier makes it non-greedy, for example: .?? .*? .+? {m,n}? Of these, .*? is used most often.
A non-greedy match returns as soon as the first complete match is found.
Backtracking:
Match from left to right and keep scanning forward until nothing more can be matched, then step back and take the rightmost position that still satisfies the pattern.
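The backtracking description can be modelled in a few lines. The two helper functions below are hypothetical and deliberately simplified: they only imitate `.*` or `.*?` followed by a literal, anchored at the start of a text without newlines. They are a sketch of the idea, not how the `re` module is implemented.

```python
import re


def greedy_dot_star(text, literal):
    """Model of '.*' + literal: take the longest prefix first, then step back."""
    for end in range(len(text), -1, -1):       # scan from the right
        if text.startswith(literal, end):
            return text[:end + len(literal)]
    return None


def lazy_dot_star(text, literal):
    """Model of '.*?' + literal: take the shortest prefix first, then extend."""
    for end in range(0, len(text) + 1):        # scan from the left
        if text.startswith(literal, end):
            return text[:end + len(literal)]
    return None


text = "a-b-c-d"
print(greedy_dot_star(text, "-"), re.match(r".*-", text).group())   # a-b-c- a-b-c-
print(lazy_dot_star(text, "-"), re.match(r".*?-", text).group())    # a- a-
```

On this input the model and `re.match` agree: the greedy form keeps the rightmost `-` it can reach, while the lazy form settles for the first one.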
# Common examples:
strvar = "刘能和刘铁锤和刘大棍子12313子"  # three names that begin with 刘 (Liu), joined by 和 (and); 子 (zi) is the last character
lst = re.findall("刘.", strvar)
print(lst)
# ['刘能', '刘铁', '刘大']
# 1. Greedy matching
lst = re.findall("刘.?", strvar)  # .? matches any one character, repeated 0 or 1 times
print(lst)
# ['刘能', '刘铁', '刘大']
lst = re.findall("刘.+", strvar)  # + repeats one or more times; being greedy, . keeps matching to the end of the string
print(lst)
# ['刘能和刘铁锤和刘大棍子12313子']
lst = re.findall("刘.*", strvar)  # * repeats zero or more times; being greedy, . keeps matching to the end of the string
print(lst)
# ['刘能和刘铁锤和刘大棍子12313子']
lst = re.findall("刘.*子", strvar)
print(lst)
# ['刘能和刘铁锤和刘大棍子12313子']
# strvar1 = "刘123456789123456789121子"
lst = re.findall("刘.{1,20}子", strvar)  # at most 20 characters may sit between 刘 and 子; with strvar1 from the line above, the result would be an empty list
print(lst, "<==>")
# ['刘能和刘铁锤和刘大棍子12313子'] <==>
# 2. Non-greedy matching
strvar = "刘能和刘铁锤和刘大棍子12313子"
lst = re.findall("刘.??", strvar)  # non-greedy: ? (0 or 1 times) settles for 0
print(lst)
# ['刘', '刘', '刘']
lst = re.findall("刘.+?", strvar)  # non-greedy: + (one or more times) settles for exactly one
print(lst)
# ['刘能', '刘铁', '刘大']
lst = re.findall("刘.*?", strvar)  # non-greedy: * (zero or more times) settles for 0
print(lst)
# ['刘', '刘', '刘']
# return as soon as the first 子 is matched
lst = re.findall("刘.*?子", strvar)  # from the first 刘 up to the first 子
print(lst)
# ['刘能和刘铁锤和刘大棍子']
lst = re.findall("刘.{1,20}?子", strvar)
# greedy matching would take more, but the non-greedy form returns the shortest string that fits; at most 20 arbitrary characters may sit in between, and anything longer does not match
print(lst)
# ['刘能和刘铁锤和刘大棍子']
III. Matching the beginning and the end => [metacharacters] boundary symbols
1. \b
\b matches a word boundary
\b is also the escape sequence for backspace
When writing a regular expression, you generally prefix the string with r so that escape sequences are not interpreted
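The effect of the `r` prefix can be verified directly. In the sketch below, the plain string turns `\b` into a single backspace character before the regex engine ever sees it, while the raw string passes the backslash through. The sample text is made up for illustration.

```python
import re

plain = "\bw"    # Python reads "\b" as one backspace character (0x08)
raw = r"\bw"     # the regex engine receives \b and reads it as a word boundary

print(len(plain), len(raw))            # 2 3
print(re.findall(plain, "word wow"))   # []
print(re.findall(raw, "word wow"))     # ['w', 'w']
```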
# The right boundary of a word ending in d is matched by d\b; the left boundary of a word starting with w is matched by \bw
# greedy match
lst = re.findall(r".*d\b", "word pwd abc")  # any characters before the d, zero or more times; the greedy match keeps going and returns the longest string that fits
print(lst)
# ['word pwd']
# non-greedy match
lst = re.findall(r".*?d\b", "word pwd abc")  # any characters before the d, zero or more times; the non-greedy match returns the shortest string that fits each time
print(lst)
# ['word', ' pwd']  # note the space before pwd: it is matched by the .
# improved version: \S matches any non-whitespace character, which drops the space between the words
lst = re.findall(r"\S*?d\b", "word pwd abc")
print(lst)
# ['word', 'pwd']
# matching the left boundary of a word
lst = re.findall(r"\bw.* ", "word abc")  # the trailing space in the pattern marks where the match ends; without it the greedy .* would run to the end of the string
print(lst)
# ['word ']
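A common use of `\b` is to find a word only when it stands alone. The following sketch, with a made-up sample sentence, contrasts a bare pattern with one wrapped in `\b` on both sides.

```python
import re

text = "cat concat cats cat."

loose = re.findall(r"cat", text)        # also finds cat inside longer words
whole = re.findall(r"\bcat\b", text)    # only the standalone word

print(len(loose), whole)   # 4 ['cat', 'cat']
```

The punctuation after the last `cat` does not prevent a match, because the position between a letter and `.` is also a word boundary.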
2. ^ the string must begin with the given character; whatever follows does not matter
3. $ the string must end with the given character; whatever precedes does not matter
If the pattern contains ^ or $, the string is treated as a whole
strvar = "大哥大嫂大爷"  # 大哥 = elder brother, 大嫂 = sister-in-law, 大爷 = uncle; each word begins with 大 ("big")
print(re.findall('大.', strvar))
# ['大哥', '大嫂', '大爷']
print(re.findall('^大.', strvar))
# ['大哥']
print(re.findall('大.$', strvar))
# ['大爷']
print(re.findall('^大.$', strvar))  # nothing satisfies both anchors at once, so an empty list is returned
# []
print(re.findall('^大.*?$', strvar))  # the string is treated as a whole, so everything up to the end is matched
# ['大哥大嫂大爷']
print(re.findall('^大.*?大$', strvar))
# []  # the string begins with 大 but does not end with 大, so nothing matches and an empty list is returned
print(re.findall('^大.*?爷$', strvar))
# ['大哥大嫂大爷']
# the string is treated as a whole, so there is at most one result
print (re.findall ( '^g.*? ', 'giveme 1gfive gay '))
# ['giveme ']
print(re.findall('five$', 'aassfive'))
# ['five']
print(re.findall('five$', 'aassfive00'))
# []  # the string does not end with five
print(re.findall('^giveme$', 'giveme'))
# ['giveme']
print(re.findall('^giveme$', 'givemeq'))
# []  # the string starts with g but does not end with e
print(re.findall('^giv.me$', 'giveme'))
# ['giveme']  # starts with g and ends with e; the . in the middle matches any one character
print(re.findall('^giveme$', 'giveme giveme'))
# []  # no match
print(re.findall('giveme', 'giveme giveme'))  # two matches
# ['giveme', 'giveme']  # with no start or end anchor, every occurrence that fits is returned
print(re.findall("^g.*e", 'giveme 1gfive gay'))
# ['giveme 1gfive']  # note the greedy match
print(re.findall("^g.*?e", 'giveme 1gfive gay'))
# ['give']  # non-greedy: the match stops at the first e it meets
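Wrapping a pattern in `^` and `$` is close to, but not the same as, `re.fullmatch`: by default `$` also matches just before a trailing newline. The sketch below compares the two on a few made-up candidate strings.

```python
import re

candidates = ["giveme", "givemeq", "giveme giveme", "giveme\n"]

# ^...$ with re.search versus re.fullmatch on the bare pattern
anchored = [bool(re.search(r"^giveme$", s)) for s in candidates]
full = [bool(re.fullmatch(r"giveme", s)) for s in candidates]

print(anchored)   # [True, False, False, True]
print(full)       # [True, False, False, False]
```

If a trailing newline must be rejected as well, `re.fullmatch` is the stricter choice.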