Python regular expressions collection (reposted)

Copyright notice: This is the blogger's original article, released under the CC 4.0 BY-SA license. If you repost it, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/wujing1_1/article/details/102777316

I. Regular expressions: matching a single character

Format: lst = re.findall(regular expression, string to match)
Predefined character classes

. matches any character except the newline \n

\d matches a digit

\D matches a non-digit

\w matches a letter, digit, or underscore (Python's re functions also match Chinese characters with it)

\W matches anything that is not a letter, digit, or underscore

\s matches any whitespace character

\S matches any non-whitespace character

\n matches a newline

\t matches a tab

[] matches one of the characters listed inside the brackets

 

| Character set format | Description (by default, exactly one character from the set must be chosen) |
|---|---|
| [...] | matches any one character that appears in the set |
| [^...] | matches any one character that does not appear in the set |

 

| Character set | String to match | Result | Description |
|---|---|---|---|
| [0123456789] | 8 | True | The set enumerates the allowed characters; the string must be one of them, otherwise the result is False |
| [abcdefg] | 9 | False | The set does not contain "9", so there is no match |
| [0-9] | 7 | True | A hyphen denotes a range; [0-9] means the same as [0123456789] |
| [a-z] | s | True | [a-z] matches any lowercase letter |
| [A-Z] | B | True | [A-Z] matches any uppercase letter |
| [0-9a-fA-F] | e | True | Matches a digit or the letters a-f in either case; can be used to validate a hexadecimal digit |
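The True/False column in the table above can be reproduced with `re.fullmatch`, which succeeds only when the whole string matches the pattern. This is a minimal sketch; the helper name `matches_set` is made up for illustration and does not come from the original article.

```python
import re

def matches_set(char_set: str, text: str) -> bool:
    """Return True if `text` as a whole matches the character set (hypothetical helper)."""
    return re.fullmatch(char_set, text) is not None

# (character set, string to match) pairs taken from the table rows
table_rows = [
    ("[0123456789]", "8"),
    ("[abcdefg]", "9"),
    ("[0-9]", "7"),
    ("[a-z]", "s"),
    ("[A-Z]", "B"),
    ("[0-9a-fA-F]", "e"),
]
results = [matches_set(pattern, text) for pattern, text in table_rows]
print(results)  # [True, False, True, True, True, True]
```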

 

The examples below show how these are used:

import re

lst = re.findall(regular expression, string to match)

# \d matches a digit
# \D matches a non-digit

lst = re.findall(r"\d", "123qwe456asd")
print(lst)
# result: ['1', '2', '3', '4', '5', '6']
lst = re.findall(r"\D", "123qwe456asd")
print(lst)
# result: ['q', 'w', 'e', 'a', 's', 'd']

# \w matches a letter, digit, or underscore, including Chinese characters (Python's re functions support matching Chinese)
# \W matches anything that is not a letter, digit, or underscore

lst = re.findall(r"\w", "d&*()qh321>wi")
print(lst)
# result: ['d', 'q', 'h', '3', '2', '1', 'w', 'i']
lst = re.findall(r"\W", "d&*()qh321>wi")
print(lst)
# result: ['&', '*', '(', ')', '>']

# \s matches any whitespace character
# \S matches any non-whitespace character

strvar = """
"""
lst = re.findall(r"\s", "     ")
print(lst)
# result: [' ', ' ', ' ', ' ', ' ']
lst = re.findall(r"\S", strvar)
print(lst)
# result: []  # strvar contains only a newline, so there is no non-whitespace character to match and the list is empty
lst = re.findall(r"\S", "DQHh dq")
print(lst)
# result: ['D', 'Q', 'H', 'h', 'd', 'q']  # every character except the whitespace
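"Any whitespace" covers more than the space character. As a small sketch with an invented sample string, \s picks up a space, a tab, and a newline, and `re.split` on `\s+` is one common way to use it for tokenizing.

```python
import re

text = "a b\tc\nd"  # invented sample: space, tab, and newline as separators

whitespace = re.findall(r"\s", text)
non_whitespace = re.findall(r"\S", text)
tokens = re.split(r"\s+", text)  # split on runs of whitespace

print(whitespace)      # [' ', '\t', '\n']
print(non_whitespace)  # ['a', 'b', 'c', 'd']
print(tokens)          # ['a', 'b', 'c', 'd']
```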

# \n matches a newline
# \t matches a tab

strvar = """
The weather today is sunny
"""
strvar2 = """
\t\tdqwdq
 wd\t qq
"""
lst = re.findall(r"\n", strvar)
print(lst)
# result: ['\n', '\n']
print("==========")
lst = re.findall(r"\t", strvar2)
print(lst)
# result: ['\t', '\t', '\t']

# ### Character set exercises: exactly one character must be chosen from the set, otherwise the match fails
# Example: [] matches one of the characters listed inside the brackets
lst = re.findall("[123]", "qwo1293dboh")
print(lst)
# result: ['1', '2', '3']

# The middle [a-g] means any single letter from a to g satisfies the match
print(re.findall("a[a-g]b", "acb adb aab abb"))
# result: ['acb', 'adb', 'aab', 'abb']

# A single digit from 0 to 9 between a and b satisfies the match
print(re.findall('a[0123456789]b', 'a1b a2b acb ayb a9090909009b'))
# Shorter form: 0123456789 => 0-9 is equivalent
print(re.findall('a[0-9]b', 'a1b a2b acb ayb a9090909009b'))
# result: ['a1b', 'a2b']

# A single letter from a to g between a and b satisfies the match
print(re.findall('a[abcdefg]b', 'a1b a2b acb ayb adb a3b'))
# Shorter form: abcdefg => a-g
print(re.findall('a[a-g]b', 'a1b a2b acb ayb adb a3b'))
# result: ['acb', 'adb']

print("================")
# The middle character may be a digit or an uppercase or lowercase letter
print(re.findall('a[0-9a-zA-Z]b', 'ab aab aAb aWb aqba1b a8d a6b aaa5b231'))
# result: ['aab', 'aAb', 'aWb', 'aqb', 'a1b', 'a6b', 'a5b']
# Shorter form: a-zA-Z => A-z covers both cases, but it is flawed: the special symbols between Z and a are matched too

print(re.findall('[A-z]', '('))
# '(' is outside the A-z range, so nothing matches and the list is empty
# result: []

print(re.findall('[0-z]', '9'))  # legal syntax, but pointless; do not use it
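The flaw in [A-z] mentioned above comes from ASCII ordering: six punctuation characters sit between 'Z' and 'a', and the range includes them. A short sketch to make that visible (the variable names are illustrative only):

```python
import re

between_Z_and_a = "[\\]^_`"  # the six ASCII characters between 'Z' and 'a'

loose = re.findall("[A-z]", between_Z_and_a)      # the range also covers the symbols
strict = re.findall("[A-Za-z]", between_Z_and_a)  # letters only

print(loose)   # ['[', '\\', ']', '^', '_', '`']
print(strict)  # []
```

Writing the two ranges out as [A-Za-z] avoids the surprise at the cost of a few extra characters.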

# Two characters are needed between a and b: the first is one of 0-9, the second is one of * # /
print(re.findall('a[0-9][*#/]b', 'a1/b a29b a56b a456bab a2b'))
# result: ['a1/b']

# ^ inside a character set means "anything except"
print(re.findall('a[^-+*/]b', "a%b ccaa*bda&bd"))
# result: ['a%b', 'a&b']
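One detail worth knowing about negated sets: ^ only negates when it is the first character inside the brackets, and a hyphen placed first (or right after ^) is taken literally rather than as a range. The following is a sketch with invented sample strings:

```python
import re

# '^' is first, so the set means "any character except - + * /"
negated = re.findall("a[^-+*/]b", "a%b a-b a*b a&b")

# '^' is not first here, so it is an ordinary character in the set
literal_caret = re.findall("a[+^]b", "a^b a+b a%b")

print(negated)        # ['a%b', 'a&b']
print(literal_caret)  # ['a^b', 'a+b']
```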

II. Matching multiple characters => [metacharacters] quantifier symbols

Quantifiers

Usage notes:

? repeats 0 or 1 times

+ repeats 1 or more times (at least once)

* repeats 0 or more times (any number of times)

{n} repeats exactly n times

{n,} repeats n or more times (at least n times)

{n,m} repeats n to m times

.* and .+ are greedy matches

.*? and .+? are non-greedy matches
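To see what each quantifier accepts, one option is to test a handful of strings against each pattern with `re.fullmatch`. This is only an illustrative sketch; the sample strings and the `accepted` dictionary are not from the original article.

```python
import re

samples = ["", "a", "aa", "aaa", "aaaa"]
patterns = ["a?", "a+", "a*", "a{2}", "a{2,}", "a{1,3}"]

# For every pattern, keep the samples that match it completely
accepted = {p: [s for s in samples if re.fullmatch(p, s)] for p in patterns}

for pattern, matched in accepted.items():
    print(pattern, matched)
# a? ['', 'a']
# a+ ['a', 'aa', 'aaa', 'aaaa']
# a* ['', 'a', 'aa', 'aaa', 'aaaa']
# a{2} ['aa']
# a{2,} ['aa', 'aaa', 'aaaa']
# a{1,3} ['a', 'aa', 'aaa']
```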

 

# Greedy match: matches as many characters as possible; implemented underneath with a backtracking algorithm

# Non-greedy match: matches as few characters as possible

(1) A question mark after a quantifier makes it non-greedy (lazy)

(2) .*?w matches any characters of any length and stops as soon as it meets the first w
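A familiar place where the greedy/non-greedy difference shows up is tag-like text. The sample string below is invented for illustration:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)  # .+ runs to the last '>'
lazy = re.findall(r"<.+?>", html)   # .+? stops at the first '>'

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```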

 

import re
# (1) ? matches 0 or 1 a
print(re.findall('a?b', 'abbzab abb aab aaxqab'))
# ['ab', 'b', 'ab', 'ab', 'b', 'ab', 'ab']
# (2) + matches 1 or more a; at least one a is required, otherwise there is no match
print(re.findall('a+b', 'abbzab abb aab aaxqab'))
# ['ab', 'ab', 'ab', 'aab', 'ab']
# (3) * matches 0 or more a
print(re.findall('a*b', 'abbzab abb aab aaxqab'))
# a may be absent or appear many times
# ['ab', 'b', 'ab', 'ab', 'b', 'aab', 'ab']
# (4) {m,n} matches m to n a
print(re.findall('a{1,3}b', 'abbzab abb aab aaxqab aaaaaab'))
# b must be preceded by 1 to 3 a
# ['ab', 'ab', 'ab', 'aab', 'ab', 'aaab']

print(re.findall('a{1}b', 'abbzab abb aab aaxqab aaaaaaaaaab'))
# ['ab', 'ab', 'ab', 'ab', 'ab', 'ab']  matches exactly one a before b

print(re.findall('a{1,}b', 'abbzab abb aab aaxqab aaaaaaaaaab'))
# ['ab', 'ab', 'ab', 'aab', 'ab', 'aaaaaaaaaab']  one or more a before b

Greedy and non-greedy matching [syntax: add ? after a quantifier]
Greedy matching, the default, matches as many repetitions as possible and uses a backtracking algorithm underneath.
Non-greedy matching matches as few repetitions as possible.
    Adding ? after a quantifier makes it non-greedy, for example: ?? +? *? {m,n}?  Of these, .*? is used most often.
    If the pattern ends with a fixed character, a non-greedy match returns at the first occurrence of that character.
Backtracking:
The engine matches from left to right and keeps consuming characters until it can go no further, then steps back and takes the match that ends closest to the right.
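The effect of backtracking described above can be observed through match spans. In this sketch (sample string invented), the greedy `.*` first runs to the end of the string and then gives characters back until the trailing `A` can match, so it ends at the last `A`; the lazy form grows one character at a time and ends at the first `A`.

```python
import re

text = "xAyAz"

greedy = re.search(r"x.*A", text)  # consumes to the end, then backs up to the last 'A'
lazy = re.search(r"x.*?A", text)   # expands only until the first 'A'

print(greedy.group(), greedy.span())  # xAyA (0, 4)
print(lazy.group(), lazy.span())      # xA (0, 2)
```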
# Common example:
# sample string: three names starting with "Liu", with "zi" appearing twice
strvar = "LiuA LiuB LiuCzi123zi"
lst = re.findall("Liu.", strvar)
print(lst)
# ['LiuA', 'LiuB', 'LiuC']
# 1. Greedy matching
lst = re.findall("Liu.?", strvar)  # .? matches any single character, repeated 0 or 1 times
print(lst)
# ['LiuA', 'LiuB', 'LiuC']

lst = re.findall("Liu.+", strvar)  # + repeats 1 or more times; being greedy, it keeps matching characters to the end
print(lst)
# ['LiuA LiuB LiuCzi123zi']

lst = re.findall("Liu.*", strvar)  # * repeats 0 or more times; being greedy, it keeps matching characters to the end
print(lst)
# ['LiuA LiuB LiuCzi123zi']

lst = re.findall("Liu.*zi", strvar)  # greedy: runs to the last zi
print(lst)
# ['LiuA LiuB LiuCzi123zi']

# strvar1 = "Liu123456789123456789121zi"
lst = re.findall("Liu.{1,20}zi", strvar)  # at most 20 characters may sit between Liu and zi; with strvar1 above (21 characters in between) the output is an empty list
print(lst, "<==>")
# ['LiuA LiuB LiuCzi123zi'] <==>

# 2. Non-greedy matching
strvar = "LiuA LiuB LiuCzi123zi"
lst = re.findall("Liu.??", strvar)  # non-greedy: ? (0 or 1 times) settles for 0 repetitions
print(lst)
# ['Liu', 'Liu', 'Liu']

lst = re.findall("Liu.+?", strvar)  # non-greedy: + (1 or more times) settles for exactly 1 repetition
print(lst)
# ['LiuA', 'LiuB', 'LiuC']

lst = re.findall("Liu.*?", strvar)  # non-greedy: * (0 or more times) settles for 0 repetitions
print(lst)
# ['Liu', 'Liu', 'Liu']

# Return as soon as the first zi is matched
lst = re.findall("Liu.*?zi", strvar)  # from the first Liu up to the first zi
print(lst)
# ['LiuA LiuB LiuCzi']

lst = re.findall("Liu.{1,20}?zi", strvar)
# A longer match exists, but non-greedy picks the shortest one that fits; at most 20 arbitrary characters may sit in the middle, and anything longer does not match
print(lst)
# ['LiuA LiuB LiuCzi']
 

III. Matching the beginning and the end => [metacharacters] boundary symbols

1. \b

\b in a regular expression matches a word boundary
\b in an ordinary string is the backspace escape character
When writing a regular expression, it is common to prefix the string with r (a raw string) so that escape sequences are not interpreted
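The two meanings of \b are easy to confuse, so here is a small sketch of the difference. Without the r prefix, Python turns \b into a single backspace character before the regex engine ever sees it.

```python
import re

print(len("\b"), len(r"\b"))  # 1 2  (one backspace character vs. backslash + 'b')

plain = re.findall("\bw", "word")  # pattern is: backspace character followed by 'w'
raw = re.findall(r"\bw", "word")   # pattern is: word boundary followed by 'w'

print(plain)  # []
print(raw)    # ['w']
```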

# Right boundary of a word ending in d: d\b; left boundary of a word starting with w: \bw
# Greedy match
lst = re.findall(r".*d\b", "word pwd abc")  # .* before d matches 0 or more characters; greedy matching keeps going and returns the longest match that fits
print(lst)
# ['word pwd']

# Non-greedy match
lst = re.findall(r".*?d\b", "word pwd abc")  # .*? before d matches 0 or more characters; non-greedy matching returns the shortest match that fits
print(lst)
# ['word', ' pwd']  # note the space before pwd, which was matched as a character

# Improved version: use \S (any non-whitespace character) so that the spaces between words are left out
lst = re.findall(r"\S*?d\b", "word pwd abc")
print(lst)
# ['word', 'pwd']

# Matching the left boundary of a word
lst = re.findall(r"\bw.* ", "word abc")  # the pattern ends with a space, which marks where the match stops; greedy .* runs up to the last space in the string
print(lst)
# ['word ']
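A typical use of \b is picking out whole words without also hitting the same letters inside longer words. A minimal sketch with an invented sentence:

```python
import re

text = "cat concat cats cat."

whole_word = re.findall(r"\bcat\b", text)    # 'cat' with a boundary on both sides
prefix_only = re.findall(r"\bcat\w*", text)  # words that merely start with 'cat'

print(whole_word)   # ['cat', 'cat']
print(prefix_only)  # ['cat', 'cats', 'cat']
```

The `cat` inside `concat` is skipped in both cases because no word boundary precedes it.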

2. ^ the string must begin with the given character; the characters after it do not matter

 

3. $ the string must end with the given character; the characters before it do not matter

If the regular expression contains ^ or $, the string is treated as a whole

# sample string: three words that each start with "big"
strvar = "bigA bigB bigC"
print(re.findall('big.', strvar))
# ['bigA', 'bigB', 'bigC']

print(re.findall('^big.', strvar))
# ['bigA']

print(re.findall('big.$', strvar))
# ['bigC']

print(re.findall('^big.$', strvar))  # nothing satisfies both anchors, so the output is an empty list
# []

print(re.findall('^big.*?$', strvar))  # the string is treated as a whole, so the match runs from the start to the end
# ['bigA bigB bigC']

print(re.findall('^big.*?big$', strvar))
# []  # the string starts with big but does not end with big, so the match fails and an empty list is returned

print(re.findall('^big.*?C$', strvar))
# ['bigA bigB bigC']

# The string is treated as a whole, so there is only one result
print(re.findall('^g.*? ', 'giveme 1gfive gay '))
# ['giveme ']

print(re.findall('five$', 'aassfive'))
# ['five']

print(re.findall('five$', 'aassfive00'))
# []  # the string does not end with five

print(re.findall('^giveme$', 'giveme'))
# ['giveme']

print(re.findall('^giveme$', 'givemeq'))
# []  # does not satisfy "starts with g and ends with e"

print(re.findall('^giv.me$', 'giveme'))
# ['giveme']  # starts with g, ends with e, and the . in the middle matches any single character

print(re.findall('^giveme$', 'giveme giveme'))
# []  # no match

print(re.findall('giveme', 'giveme giveme'))  # two matches
# ['giveme', 'giveme']  # with no ^ or $, any substring that fits the pattern is matched

print(re.findall("^g.*e", 'giveme 1gfive gay'))
# ['giveme 1gfive']  # note the greedy matching

print(re.findall("^g.*?e", 'giveme 1gfive gay'))
# ['give']  # non-greedy: stops at the shortest matching string
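By default ^ and $ anchor to the whole string, as the examples above show. If per-line anchoring is wanted instead, Python's `re.MULTILINE` flag changes that, and `re.fullmatch` is another way to require that the entire string fits. The sample text below is invented for this sketch.

```python
import re

text = "giveme\ngfive\nno"

single = re.findall(r"^g\w+", text)                        # ^ matches only at the start of the string
per_line = re.findall(r"^g\w+", text, flags=re.MULTILINE)  # ^ matches at the start of every line

print(single)    # ['giveme']
print(per_line)  # ['giveme', 'gfive']

# re.fullmatch requires the whole string to match, similar to wrapping the pattern in ^...$
whole_ok = re.fullmatch("giveme", "giveme") is not None
whole_bad = re.fullmatch("giveme", "giveme giveme") is not None
print(whole_ok, whole_bad)  # True False
```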
