41. Python basics: the re module

A regular expression (RE) is essentially a small, highly specialized programming language. In Python it is embedded in the language and made available through the re module. A regular expression pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.
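
As a rough illustration of the compile step described above, the sketch below compiles a pattern once with re.compile and reuses the resulting object. The variable name and sample strings are made up for illustration.

```python
import re

# Compile the pattern once; the resulting pattern object can be reused.
digits = re.compile(r'\d+')

print(digits.findall('a1b22c333'))      # ['1', '22', '333']
print(digits.search('abc123').group())  # 123
```

The module-level functions such as re.findall accept the same pattern as a plain string, so compiling explicitly is mostly a convenience when one pattern is used many times.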

Character matching (ordinary characters and metacharacters):

1. Ordinary characters: most letters and characters simply match themselves
              >>> re.findall('alex', 'yuanaleSxalexwupeiqi')
                      ['alex']

2. Metacharacters: . ^ $ * + ? {} [] | () \

The metacharacters . ^ $ * + ? {}

import re
 
ret=re.findall('a..in','helloalvin')
print(ret)#['alvin']
 
 
ret=re.findall('^a...n','alvinhelloawwwn')
print(ret)#['alvin']
 
 
ret=re.findall('a...n$','alvinhelloawwwn')
print(ret)#['awwwn']
 
 
ret=re.findall('abc*','abcccc')#greedy match, [0,+oo]
print(ret)#['abcccc']
 
ret=re.findall('abc+','abccc')#[1,+oo]
print(ret)#['abccc']
 
ret=re.findall('abc?','abccc')#[0,1]
print(ret)#['abc']
 
 
ret=re.findall('abc{1,4}','abccc') 
print(ret)#['abccc'] greedy match

  Note: The preceding *, + and ? are all greedy: they match as much as possible. Appending a ? to any of them makes the match lazy (non-greedy).

ret=re.findall('abc*?','abcccccc')
print(ret)#['ab']
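
The difference between greedy and lazy matching is easier to see on a string with more than one possible end point. The following is a minimal sketch; the sample text and variable names are assumptions chosen for illustration.

```python
import re

text = '<b>one</b><b>two</b>'

# Greedy: .* runs as far as it can, so the match ends at the last </b>.
greedy = re.findall(r'<b>.*</b>', text)

# Lazy: .*? stops as early as it can, so each tag pair is matched separately.
lazy = re.findall(r'<b>.*?</b>', text)

print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```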

The character set metacharacter []:

# -------------------------------------------- character set []
ret = re.findall('a[bc]d', 'acd')
print(ret)  # ['acd']

ret = re.findall('[a-z]', 'acd')
print(ret)  # ['a', 'c', 'd']

ret = re.findall('[.*+]', 'a.cd+')
print(ret)  # ['.', '+']

# Symbols that keep a special meaning inside a character set: - ^ \

ret = re.findall('[1-9]', '45dha3')
print(ret)  # ['4', '5', '3']

ret = re.findall('[^ab]', '45bdha3')
print(ret)  # ['4', '5', 'd', 'h', '3']

ret = re.findall(r'[\d]', '45bdha3')
print(ret)  # ['4', '5', '3']
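
The special symbols inside a character set depend on position. As a small sketch (variable names and sample strings are illustrative only), ^ negates only when it comes first, and - is literal when it sits at the edge of the set:

```python
import re

caret_inside = re.findall(r'[a^]', 'a^b')   # ^ is literal when it is not first
dash_at_end = re.findall(r'[a-]', 'a-b')    # - is literal at the edge of the set
not_digits = re.findall(r'[^0-9]', 'a1b2')  # ^ negates only as the first character

print(caret_inside)  # ['a', '^']
print(dash_at_end)   # ['a', '-']
print(not_digits)    # ['a', 'b']
```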

The escape metacharacter \

A backslash followed by a metacharacter removes its special meaning, e.g. \.
A backslash followed by certain ordinary characters gives them a special function, e.g. \d

\d matches any decimal digit; equivalent to the class [0-9].
\D matches any non-digit character; equivalent to the class [^0-9].
\s matches any whitespace character; equivalent to the class [ \t\n\r\f\v].
\S matches any non-whitespace character; equivalent to the class [^ \t\n\r\f\v].
\w matches any alphanumeric character or underscore; equivalent to the class [a-zA-Z0-9_].
\W matches any non-alphanumeric character; equivalent to the class [^a-zA-Z0-9_].
\b matches a word boundary, i.e. the position between a word character and a special character such as a space, &, #, etc.

ret=re.findall('I\b','I am LIST')
print(ret)#[]
ret=re.findall(r'I\b','I am LIST')
print(ret)#['I']
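
To see several of these escape classes side by side, here is a short sketch run against one sample string. The string and variable names are made up for illustration.

```python
import re

sample = 'id_7 costs $42\n'

digits = re.findall(r'\d', sample)      # decimal digits
non_word = re.findall(r'\W', sample)    # anything that is not [a-zA-Z0-9_]
spaces = re.findall(r'\s', sample)      # whitespace, including the newline
words = re.findall(r'\b\w+\b', sample)  # runs of word characters between boundaries

print(digits)    # ['7', '4', '2']
print(non_word)  # [' ', ' ', '$', '\n']
print(spaces)    # [' ', ' ', '\n']
print(words)     # ['id_7', 'costs', '42']
```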

  Now let's talk about \ itself; look at the following two examples:

# ----------------------------- eg1:
import re
# The next two patterns both reach the regex engine as c\l. Python 3.5 and earlier
# returned [] for them; Python 3.6+ raises re.error (bad escape \l), so they are commented out.
# ret = re.findall('c\l', r'abc\le')
# ret = re.findall('c\\l', r'abc\le')
ret = re.findall('c\\\\l', r'abc\le')
print(ret)  # ['c\\l']
ret = re.findall(r'c\\l', r'abc\le')
print(ret)  # ['c\\l']

# ----------------------------- eg2:
# \b is chosen here because '\b' also has a meaning in the ASCII table (backspace)
m = re.findall('\bblow', 'blow')
print(m)  # []
m = re.findall(r'\bblow', 'blow')
print(m)  # ['blow']
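
The two layers of escaping (Python string literal first, regex engine second) can be checked directly by comparing strings and their lengths. This is only a sketch with illustrative variable names:

```python
import re

plain = '\\d'  # Python collapses \\ into one backslash, so the pattern is \d
raw = r'\d'    # a raw string keeps the backslash exactly as written
same_pattern = plain == raw

# A non-raw '\b' is the single backspace character, not a regex word boundary.
backspace_len = len('\b')
boundary_len = len(r'\b')

# Matching one literal backslash needs \\ at the regex level.
text = 'abc\\le'  # six characters: a b c \ l e
found = re.findall(r'c\\l', text)

print(same_pattern, len(raw))       # True 2
print(backspace_len, boundary_len)  # 1 2
print(found)                        # ['c\\l']
```

Using raw strings for every pattern avoids having to count backslashes at the Python level.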

  

The grouping metacharacter ()

m = re.findall(r'(ad)+', 'add')
print(m)#['ad']
 
ret=re.search(r'(?P<id>\d{2})/(?P<name>\w{3})','23/com')
print(ret.group())#23/com
print(ret.group('id'))#23
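
Named groups can also be read by position or all at once. A minimal sketch reusing the same pattern (the variable name m is arbitrary):

```python
import re

m = re.search(r'(?P<id>\d{2})/(?P<name>\w{3})', '23/com')

print(m.group('name'))         # com
print(m.group(1), m.group(2))  # 23 com
print(m.groupdict())           # {'id': '23', 'name': 'com'}
```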

The metacharacter |

ret = re.search(r'(ab)|\d', 'rabhdg8sd')
print(ret.group())  # ab
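
Because | splits the whole expression on either side of it, grouping changes what the alternatives are. The sketch below, with made-up sample strings, contrasts the two cases and shows that a group in the branch that did not match is None:

```python
import re

# With a (non-capturing) group, only a|e alternates.
grouped = re.findall(r'gr(?:a|e)y', 'gray grey')
# Without a group, the alternatives are the whole of 'gra' and the whole of 'ey'.
ungrouped = re.findall(r'gra|ey', 'gray grey')

print(grouped)    # ['gray', 'grey']
print(ungrouped)  # ['gra', 'ey']

# When the \d branch matches, group 1 took no part in the match.
m = re.search(r'(ab)|\d', '8ab')
print(m.group())   # 8
print(m.group(1))  # None
```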

Commonly used functions in the re module

import re
# 1
re.findall('a', 'alvin yuan')  # returns every match of the pattern, as a list
# 2
re.search('a', 'alvin yuan').group()  # scans the string and returns a match object for the first match only;
                                      # call group() on it to get the matched text; returns None if nothing matches

# 3
re.match('a', 'abc').group()  # same as search, but only matches at the beginning of the string

# 4
ret = re.split('[ab]', 'abcd')  # split on 'a' first, giving '' and 'bcd'; then '' and 'bcd' are each split on 'b'
print(ret)  # ['', '', 'cd']

# 5
ret = re.sub(r'\d', 'abc', 'alvin5yuan6', count=1)
print(ret)  # alvinabcyuan6
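ret = re.subn(r'\d', 'abc', 'alvin5yuan6')
print(ret)  # ('alvinabcyuanabc', 2)

# The lines that produced the match object printed next are missing from the source;
# the search below is an assumed stand-in chosen to yield 123.
ret = re.search(r'\d{3}', 'abc123eeee')
print(ret.group())  # 123
import re
ret = re.finditer(r'\d', 'ds3sy4784a')
print(ret)        # <callable_iterator object at 0x10195f940>

print(next(ret).group())  # 3
print(next(ret).group())  # 4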
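
Since search and match return None when nothing matches, calling .group() directly on the result can raise AttributeError. The sketch below shows one cautious way to handle that, and loops over finditer instead of calling next by hand. Variable names are illustrative.

```python
import re

# match only tries the start of the string, so this gives None.
at_start = re.match(r'\d+', 'abc123')
print(at_start)  # None

# search scans the whole string; check for None before using the result.
anywhere = re.search(r'\d+', 'abc123')
if anywhere is not None:
    print(anywhere.group(), anywhere.span())  # 123 (3, 6)

# finditer yields one match object per match, with position information.
positions = [(m.start(), m.group()) for m in re.finditer(r'\d', 'ds3sy4784a')]
print(positions)  # [(2, '3'), (5, '4'), (6, '7'), (7, '8'), (8, '4')]
```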

  Note:

import re

ret = re.findall('www.(baidu|oldboy).com', 'www.oldboy.com')
print(ret)  # ['oldboy']  When the pattern contains a group, findall returns the group's content instead of the whole match; to get the whole match, make the group non-capturing with ?:

ret = re.findall('www.(?:baidu|oldboy).com', 'www.oldboy.com')
print(ret)  # ['www.oldboy.com']
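
The same rule extends to patterns with several groups, where findall returns one tuple per match. A brief sketch with made-up addresses:

```python
import re

text = 'tom@baidu.com, amy@oldboy.com'

# Two capturing groups: each result is a tuple of the group contents.
pairs = re.findall(r'(\w+)@(\w+)\.com', text)
# No capturing groups: each result is the whole match.
whole = re.findall(r'\w+@\w+\.com', text)

print(pairs)  # [('tom', 'baidu'), ('amy', 'oldboy')]
print(whole)  # ['tom@baidu.com', 'amy@oldboy.com']
```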

  Supplement:

import re

print(re.findall("<(?P<tag_name>\w+)>\w+</(?P=tag_name)>","<h1>hello</h1>"))
print(re.search("<(?P<tag_name>\w+)>\w+</(?P=tag_name)>","<h1>hello</h1>"))
print(re.search(r"<(\w+)>\w+</\1>","<h1>hello</h1>"))
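
The backreference in these patterns, (?P=tag_name) or \1, must repeat exactly what the group captured. A small sketch of what that means in practice (the mismatched sample string is an assumption added for contrast):

```python
import re

pattern = r'<(?P<tag_name>\w+)>\w+</(?P=tag_name)>'

matching = re.search(pattern, '<h1>hello</h1>')
mismatched = re.search(pattern, '<h1>hello</h2>')  # closing tag differs
group_only = re.findall(pattern, '<h1>hello</h1>')  # findall returns the group

print(matching.group())  # <h1>hello</h1>
print(mismatched)        # None
print(group_only)        # ['h1']
```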

  Supplement 2:

# Match all integers
import re

# ret = re.findall(r"\d+{0}]", "1-2*(60+(-40.35/5)-(-4*3))")
ret = re.findall(r"-?\d+\.\d*|(-?\d+)", "1-2*(60+(-40.35/5)-(-4*3))")
ret.remove("")

print(ret)  # ['1', '-2', '60', '5', '-4', '3']
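
Note that list.remove deletes only the first empty string and raises ValueError if there is none, so it works here only because the expression contains exactly one decimal number. One possible alternative, sketched under the same pattern, filters out every empty entry instead:

```python
import re

expr = "1-2*(60+(-40.35/5)-(-4*3))"

# Decimals match the first branch and leave the capturing group empty,
# so they show up as '' in the findall result.
matches = re.findall(r"-?\d+\.\d*|(-?\d+)", expr)

# Keep only the non-empty entries, however many decimals there were.
integers = [m for m in matches if m]

print(matches)   # ['1', '-2', '60', '', '5', '-4', '3']
print(integers)  # ['1', '-2', '60', '5', '-4', '3']
```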

  


Origin www.cnblogs.com/hlc-123/p/10990442.html