[BOOK] regular expressions


1. OSChina online regular expression test tool: https://tool.oschina.net/regex/

2. Matching rules

 

 

 

3. match()

match() tries to match the regular expression starting at the beginning of the string.

If the pattern does not match at the beginning of the string, match() returns None.
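As a minimal sketch of this behavior (the sample string is made up for illustration), match() succeeds only when the pattern matches at position 0, so the result should be checked for None before calling group():

```python
import re

# Hypothetical sample string, not taken from the examples in this post
text = 'Hello 123'

hit = re.match(r'Hello', text)   # the pattern matches at the start of the string
miss = re.match(r'123', text)    # '123' occurs later, but not at position 0

print(hit.group())   # Hello
print(miss)          # None

# Guard against None before calling group()
if miss is None:
    print('no match at the start of the string')
```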

 

※ Capturing groups: wrap part of the regular expression in parentheses (); the text matched by each parenthesized part can then be retrieved by its position, with group(1), group(2), and so on

 

※ Generic matching

  . matches any character except a newline
  * matches the preceding character any number of times
  .* matches any sequence of characters

import re

content = 'Hello 123 4567 World_lalalalalalal OOO gugu'
# ^Hello: starts with Hello; \s: whitespace; \d: digit; \w{5}: five letters, digits, or underscores
result = re.match(r'^Hello\s(\d\d\d)\s\d{4}\s(\w{5})', content)
print(result)
# <re.Match object; span=(0, 20), match='Hello 123 4567 World'>
print(result.group())
# Hello 123 4567 World, the matched text
print(result.span())
# (0, 20), the range of the match

## capturing groups
print(result.group(1))  # 123, matched by the first pair of parentheses
print(result.group(2))  # World, matched by the second pair of parentheses

## generic matching
# . matches any character except a newline
# * matches the preceding character any number of times
# .* matches any sequence of characters
re1 = re.match(r'^Hello.*gugu$', content)  ## matches the entire string
print(re1.group())

  

※ Greedy and non-greedy matching

  .* Greedy matching: matches as many characters as possible

  .*? Non-greedy matching: matches as few characters as possible

Prefer non-greedy matching where possible, so that a greedy .* does not swallow text you wanted to capture.

When .*? is used at the end of a pattern it may match nothing at all, because it matches as few characters as possible.

import re

content = 'Hello 1234567 World_lalalalalalal OOO gugu'
result = re.match(r'^He.*(\d+).*gu$', content)
print(result.group(1))  # 7
## the greedy .* takes as much as it can, leaving only the last digit 7 for (\d+)

result1 = re.match(r'^He.*?(\d+).*gugu$', content)
print(result1.group(1))  # 1234567
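The note about .*? at the end of a pattern can be checked with a small sketch (the URL-like sample string is an assumption for illustration). A trailing .*? is satisfied by zero characters, so its group comes back empty, while a trailing .* takes everything up to the end of the string:

```python
import re

# Hypothetical sample string for illustration
content = 'http://weibo.com/comment/kEraCN'

lazy = re.match(r'http.*?comment/(.*?)', content)
greedy = re.match(r'http.*?comment/(.*)', content)

print(repr(lazy.group(1)))    # ''
print(repr(greedy.group(1)))  # 'kEraCN'
```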

  

※ Modifiers
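Modifiers are optional flags passed as an extra argument to functions such as re.match(). A minimal sketch with two flags from Python's re module, re.S (let . match newlines too) and re.I (ignore case); the sample string is made up for illustration:

```python
import re

# Hypothetical sample string containing a newline
content = 'Hello 123\nWorld'

# Without re.S, . stops at the newline, so the pattern cannot reach 'World'
print(re.match(r'^Hello.*World$', content))   # None

# re.S lets . match the newline as well
with_s = re.match(r'^Hello.*World$', content, re.S)
print(with_s.group() == content)              # True

# re.I ignores case
with_i = re.match(r'^hello', content, re.I)
print(with_i.group())                         # Hello
```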

 

 

※ Escaped matching

     When the target string contains special characters such as . * ^ that must be matched literally, put a backslash \ in front of each of them

import re

content = '(Baidu)www.baidu.com'
result = re.match(r'\(Baidu\)\w{3}\.\w{5}\..*', content)
print(result.group())  ## to match a literal ( or ), put \ in front of it
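When the literal text comes from a variable, escaping every character by hand is error-prone. As a sketch, the standard library's re.escape() adds the backslashes automatically; the sample string is redefined here so the block is self-contained, and the variable names are illustrative:

```python
import re

content = '(Baidu)www.baidu.com'
literal = '(Baidu)'   # text that should be matched literally

# Unescaped, the parentheses form a group, so the pattern looks for 'Baidu' at position 0
print(re.match('(Baidu)', content))   # None

# re.escape() puts a backslash in front of the special characters
pattern = re.escape(literal) + r'\w{3}\.\w{5}\.\w+'
result = re.match(pattern, content)
print(result.group())                 # (Baidu)www.baidu.com
```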

  

4. search()

search() scans the entire string and returns the first successful match.

 

import re
content = 'Kollo 1234 mm lasokumawali 3434 yaya'
re1 = re.search('mm.*?ya', content)
print(re1.group())
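To make the difference from match() concrete, here is a short sketch on the same sample string: the pattern does not occur at the start, so match() returns None while search() finds it further in.

```python
import re

content = 'Kollo 1234 mm lasokumawali 3434 yaya'

# match() only looks at the start of the string
print(re.match(r'mm.*?ya', content))   # None

# search() scans forward until the pattern matches
found = re.search(r'mm.*?ya', content)
print(found.group())   # mm lasokumawali 3434 ya
print(found.span())    # (11, 34)
```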

  

5. findall()

findall() scans the entire string and returns every successful match.

The result is a list, which can be iterated with a for ... in loop.
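A minimal sketch of findall(), reusing the sample string from the search() example; the patterns are illustrative assumptions. With no capturing group the list holds the matched strings, and with several groups each item is a tuple of the group contents:

```python
import re

content = 'Kollo 1234 mm lasokumawali 3434 yaya'

# No capturing group: a list of matched strings
numbers = re.findall(r'\d+', content)
print(numbers)   # ['1234', '3434']

# The list can be iterated with for ... in
for n in numbers:
    print(n)

# Two capturing groups: a list of tuples
pairs = re.findall(r'(\w+) (\d+)', content)
print(pairs)     # [('Kollo', '1234'), ('lasokumawali', '3434')]
```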

 

6. sub()

sub() replaces the text that matches a pattern. Using it to strip irrelevant content first can simplify the regular expression later passed to findall().

 

import re

content = 'be485a563u85ti544ful45545'
result = re.sub(r'\d+', '', content)
print(result)  # beautiful, all digits removed
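To illustrate how sub() can simplify a later findall(), here is a hedged sketch on a made-up HTML fragment. Once the <a> tags are stripped, one simple pattern covers both list items; without that step the pattern would have to allow for an optional link:

```python
import re

# Hypothetical HTML fragment for illustration
html = '''<li><a href="/1.mp3">Song One</a></li>
<li>Song Two</li>'''

# Step 1: strip the opening <a ...> and closing </a> tags
cleaned = re.sub(r'<a.*?>|</a>', '', html)

# Step 2: a single simple pattern now covers both items
titles = re.findall(r'<li>(.*?)</li>', cleaned)
print(titles)   # ['Song One', 'Song Two']
```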

  

7. compile()

compile() turns a regular expression string into a regular expression object that can be reused in subsequent matching.

import re

con1 = '2019-12-06 12:12'
con2 = '2020-11-12 03:12'
con3 = '2022-03-22 19:45'
pattern = re.compile(r'\d{2}:\d{2}')  ## compile into a regular expression object
# remove the time, keep the date
res1 = re.sub(pattern, '', con1)
res2 = re.sub(pattern, '', con2)
res3 = re.sub(pattern, '', con3)
print(res1, res2, res3)
# 2019-12-06  2020-11-12  2022-03-22 (each result keeps the space that preceded the time)
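A compiled pattern also exposes the matching functions as methods, so the pattern argument can be dropped. A short sketch on the same dates; the list and variable names are illustrative, and the leading \s* is an added assumption that also removes the space before the time:

```python
import re

stamps = ['2019-12-06 12:12', '2020-11-12 03:12', '2022-03-22 19:45']

# \s* also consumes the space in front of the time
strip_time = re.compile(r'\s*\d{2}:\d{2}')
dates = [strip_time.sub('', s) for s in stamps]
print(dates)   # ['2019-12-06', '2020-11-12', '2022-03-22']

# The same compiled object style works for search()
time_pattern = re.compile(r'\d{2}:\d{2}')
times = [time_pattern.search(s).group() for s in stamps]
print(times)   # ['12:12', '03:12', '19:45']
```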

  

 


Origin www.cnblogs.com/motoharu/p/12445185.html