Python regular expressions: match, search, and findall, with examples and differences

Contents

match
Matches only at the beginning of the string. Returns a Match object on success and None on failure; at most one match is returned.

search
Scans the whole string (not just the beginning). Returns a Match object on success and None on failure; at most one match (the first) is returned.

findall
Finds all non-overlapping matches in the string and returns them as a list. If the pattern has no parenthesized groups, each item is the full matched text. If it has one group, each item is the text of that group; if it has several groups, each item is a tuple with one entry per group.
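To make the distinction concrete, here is a minimal sketch that runs all three functions against the same input. The sample string and the variable names are chosen for illustration only.

```python
import re

text = 'I love python but hate pig'  # illustrative sample string

# match only tries at position 0, where the character is 'I', not 'p'
m = re.match(r'p[a-z]+', text)
# search scans forward and stops at the first hit
s = re.search(r'p[a-z]+', text)
# findall collects every non-overlapping hit
f = re.findall(r'p[a-z]+', text)

print(m)          # None
print(s.group())  # python
print(f)          # ['python', 'pig']
```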

 

 

1. match

re.match() always matches from the beginning of the string and returns a Match object for the matched text. If the pattern does not match at the very start of the string, re.match() returns None.

eg1

In the example below, only string3 produces a match (on 'p'); for every other string the result is None.

import re

string1 = 'I love python but hate pig'
string2 = 'I love python'
string3 = 'python'
string4 = '123'
# Only string3 starts with 'p', so only it yields a Match object
for s in (string1, string2, string3, string4):
    print(re.match(r'[p]', s))

 

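import re

# Compile the regular expression into a Pattern object
pattern = re.compile(r'hello')

# Match the text with the Pattern; returns None if there is no match
match = pattern.match('hello world, hello word')

if match:
    # Use the Match object to get the matched text
    print(match.group())

output:
hello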

As this shows, re.match() is limited: it matches only at the beginning of the string and returns only one match.

Depending on what you need, that can make it of little use on its own. The subsections below introduce the building blocks of patterns: character classes and quantifiers.

1.1 Matching characters between a and z

string3 = 'python'
string4 = '123'
result = re.match(r'[a-z]', string3)
print(result.group())  # p

1.2 Matching characters between A and Z

string3 = 'Python'
string4 = '123'
result = re.match(r'[A-Z]', string3)
print(result.group())  # P

1.3 Matching characters between 0 and 9

ma = re.match(r'[0-9]', string4)
print(ma.group())  # 1

1.4 a-z, A-Z and 0-9 can be combined

string3 = 'python'
string4 = '123'
result = re.match(r'[a-zA-Z0-9]', string3)
print(result.group())  # p

\w and \W work the same way: \w matches a word character (for ASCII text, [a-zA-Z0-9_]; note that the underscore is included) and \W matches a non-word character.
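As a quick illustration of the two classes, the sketch below splits a short string into word and non-word characters. The input string and variable names are made up for this example.

```python
import re

sample = 'py_3!'  # illustrative input

# \w matches letters, digits and the underscore
word_chars = re.findall(r'\w', sample)
# \W matches everything that \w does not
non_word_chars = re.findall(r'\W', sample)

print(word_chars)      # ['p', 'y', '_', '3']
print(non_word_chars)  # ['!']
```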

 

1.5 Matching digits / non-digits

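string4 = '[];;:'
ma1 = re.match(r'\D', string4)  # match a non-digit
ma2 = re.match(r'\d', string2)  # match a digit; string2 starts with 'I', so ma2 is None
print(ma1.group())    # [
# print(ma2.group())  # raises AttributeError because ma2 is None

1.6 Matching whitespace and non-whitespace characters

\s and \S work the same way: \s matches a whitespace character and \S matches a non-whitespace character.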
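A small sketch of the whitespace classes, assuming an invented input that contains a space, a tab, and a newline:

```python
import re

sample = 'a b\tc\n'  # illustrative input: space, tab, and newline

whitespace = re.findall(r'\s', sample)
non_whitespace = re.findall(r'\S', sample)

print(whitespace)      # [' ', '\t', '\n']
print(non_whitespace)  # ['a', 'b', 'c']
```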

 

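1.7 Matching 0 or more times: * (asterisk)

# string1 starts with the uppercase 'I', so this returns None;
# against string3 = 'python' the same pattern matches 'python'
ma = re.match(r'[a-z][a-z]*', string1)

1.8 Matching 1 or more times: + (plus)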
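The difference between * and + shows up when the first character does not fit the class. The following is a minimal sketch with an invented input string:

```python
import re

sample = '123abc'  # illustrative input that starts with digits

# * allows zero repetitions, so the match succeeds but is empty
star = re.match(r'[a-z]*', sample)
# + requires at least one repetition, so nothing matches at position 0
plus = re.match(r'[a-z]+', sample)

print(repr(star.group()))  # ''
print(plus)                # None
```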

 

1.9 Matching m to n times: {m,n}

ma = re.match(r'[\w]{1,4}', string1)  # any word character (letter, digit, underscore), 1 to 4 times
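To see how the upper bound caps the match, here is a short sketch. The input string is invented, and the {6} form (exactly six repetitions) is shown alongside for comparison.

```python
import re

sample = 'python3 rocks'  # illustrative input

# At most four word characters are consumed, even though more are available
bounded = re.match(r'\w{1,4}', sample)
# A single number means an exact count: {6} is exactly six repetitions
exact = re.match(r'\w{6}', sample)

print(bounded.group())  # pyth
print(exact.group())    # python
```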

2. search

re.search() scans the whole string (not just the beginning). It returns a Match object on success and None on failure, and it returns only the first match.

The pattern syntax is the same as before.

For more pattern syntax, refer to the following blog post:

https://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html

eg

import re
string5 = '[email protected]'
ma6 = re.search(r'[\d]+', string5)  # match digits
print(ma6)
print(ma6.group())


output:
<_sre.SRE_Match object; span=(9, 12), match='676'>
676

import re

string1 = 'I love python but hate pig'
string2 = 'I love python'
string3 = 'python'
string4 = '123'
result = re.search(r'[\w]+', string3)
print(result.group())  # python

result2 = re.search(r'[\w]+', string2)
print(result2.group())  # I

Note that for string2 the search result is 'I'. The space between 'I' and 'love' is not a word character, so \w does not match it and the match ends there.

Put simply, a match ends at the first character the pattern cannot consume, and search returns only that first match.
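The sketch below makes the position of that first match visible with span(), and then resumes the scan from where it ended. It uses the optional start-position argument of a compiled pattern's search method; the variable names are illustrative.

```python
import re

sentence = 'I love python'  # same text as string2

first = re.search(r'\w+', sentence)
print(first.group(), first.span())  # I (0, 1)

# A compiled pattern accepts a start position, so the scan can resume
# right after the first match instead of starting over
pattern = re.compile(r'\w+')
second = pattern.search(sentence, first.end())
print(second.group(), second.span())  # love (2, 6)
```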

string4 = '123 45'
result = re.search(r'[\d]+', string4)
print(result.group())  # 123

Of course, you can also make the pattern span a separator by including the non-word character explicitly.

str2 = 'char|johljh'
ma6 = re.search(r'char[\W][\w]+',str2)
print(ma6.group())

char|johljh

str1 = 'oajfs|char|dhddfgdfg'
str2 = 'char|johljh|jjgkhk'
str3 = 'dlkngldnfk|flmgkdm|char'

ma6 = re.search(r'char[\W][\w]+', str1)  # this pattern matches both str1 and str2
print(ma6.group())  # char|dhddfgdfg

ma6 = re.search(r'[\W]char', str3)  # this pattern matches str3
print(ma6.group())  # |char

3. findall

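re.findall() finds all non-overlapping matches in the string and returns them as a list. If the pattern has no parenthesized groups, each item is the full matched text. If it has one group, each item is the text of that group; if it has several groups, each item is a tuple with one entry per group.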
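How parentheses change what findall returns can be sketched as follows. The input string and variable names are invented for illustration.

```python
import re

pairs = 'x=1, y=22, z=333'  # illustrative input

# No group: each item is the whole match
whole = re.findall(r'[a-z]=\d+', pairs)
# One group: each item is just that group's text
values = re.findall(r'[a-z]=(\d+)', pairs)
# Two groups: each item is a tuple of the groups
items = re.findall(r'([a-z])=(\d+)', pairs)

print(whole)   # ['x=1', 'y=22', 'z=333']
print(values)  # ['1', '22', '333']
print(items)   # [('x', '1'), ('y', '22'), ('z', '333')]
```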

import re

string1='I love python but hate pig'
string2='I love python'
string3='python'
string4='123 45'

result0 = re.findall(r'[p]+', string1)
result1 = re.findall(r'[p][a-z]+', string1)
result2 = re.findall(r'[\w]+', string2)
result4 = re.findall(r'[\d]+', string4)

print(result0)
print(result1)
print(result2)
print(result4)

output:

['p', 'p']
['python', 'pig']
['I', 'love', 'python']
['123', '45']

The second pattern, r'[p][a-z]+', is the one to recommend here: it returns the whole words that start with 'p'.

string2 = '1,2,3,4'
ma = re.findall(r'\d+', string2)
print(ma)

# ['1', '2', '3', '4']

import re
 
p = re.compile(r'\d+')
print (p.findall('one1two2three3four456'))
 
### output ###
# ['1', '2', '3', '456']

 

 

References:

https://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html

https://blog.csdn.net/ali197294332/article/details/50894419

https://blog.csdn.net/tp7309/article/details/72823258

https://blog.csdn.net/djskl/article/details/44357389

Origin blog.csdn.net/Zhou_Dao/article/details/103943192