Regular Expressions
1, open-source Chinese - Regular Expression Test Tool: https://tool.oschina.net/regex/
2, matching rules
3、 match()
From the starting position of the string matches the regular expression
If not match from a starting position or None
※ target matching: the regular expression added in () can be obtained according to part matching the position of the brackets
※ generic matches
All the characters except for the early match newline
* Unlimited match in front of character
. * Matches any character
Re Import Content = '123 4567 World_lalalalalalal OOO gugu the Hello' # ^ the Hello: beginning Hello; \ s: Space; \ d: digital; \ w {5}: 5 characters or underlined result = re.match ( '^ Hello \ S (\ D \ D \ D) \ S \. 4 {D} \ S (\ {W}. 5) ', Content) Print (Result) # <re.match Object; span = (0, 20 is), match = '4567 123 the Hello World'> Print (result.group ()) # 4567 the Hello World 123, the matching result Print (result.span ()) # (0, 20 is), matching the range of ## matches the target print (result. group (1)) # 123, the regular expression in brackets is the first portion of the print (result.group (2)) #World , the regular expression in brackets in the second portion ## matches the general # matching first All newline characters other than # * character matches unlimited front #. * matches any character re1 = re.match ( '^ Hello. * gugu $', content) ## to match the entire character print (re1.group () )
※ greedy and non-greedy
* Greedy matching, matching as many characters
. *? Non-greedy matching, matching as few characters as
Using non-greedy match as much as possible to avoid missing matches
. *? Used at the end of the character may not match any content
Re Import Content = '1234567 World_lalalalalalal OOO gugu the Hello' Result = re.match ( 'of He ^. * (\ D +). * $ Gu', Content) Print (result.group (. 1)). 7 # ## First matching greedy the last but one digit 7, and the rest to match. * inside result1 = re.match ( 'of He ^. *? (\ d +). * gugu $', Content) Print (result1.group (1)) # 1234567
※ modifier
※ escaped match
When a group contains a string. * ^ Special characters need to match, in front of these special characters plus \
Re Import Content = '(Baidu) www.baidu.com' Result = re.match ( '\ (Baidu \) \ {W}. 3 \. \. 5} {W \ .. *', Content) Print (Result. when the group ()) ## match () to add the foregoing \
4、 search()
Scanning the entire string match, returning the first successful match
import re content = 'Kollo 1234 mm lasokumawali 3434 yaya' re1 = re.search('mm.*?ya', content) print(re1.group())
5、 findall()
Scanning the entire string match, returns all results that match success
Returns a list type, for in loop iterates
6、 sub()
Remove some unrelated content, simplifying findall () regular expression
Re Import Content = 'be485a563u85ti544ful45545' Result = the re.sub ( '\ + D', '', Content) Print (Result) #beautiful, remove all of the numeric characters
7、 compile()
The regular expression string compiled regular expression object can be reused in subsequent matching
Re Import CON1 = '2019-12-06 12:12' CON2 = '2020-11-12 03:12' CON3 = '2022-03-22 19:45' pattern the re.compile = ( '\ D {2} : \ d {2} ') ## to compile a regular expression object # remove the time, date retention RES1 = the re.sub (pattern,' ', CON1) RES2 = the re.sub (pattern,' ', CON2) RES3 = Re .sub (pattern, '', CON3) Print (RES1, RES2, RES3) # 2019-12-06 2020-11-12 2022-03-22