Contents
match
Matches at the beginning of the string. Returns a Match object on success and None on failure; only one match is returned.
search
Searches anywhere in the string (not only at the beginning). Returns a Match object on success and None on failure; only one match is returned.
findall
Finds all non-overlapping matches in the string and returns them as a list. If the pattern contains groups (parts enclosed in parentheses), each list item holds the groups of one match instead of the whole match.
1. match
re.match() always tries to match from the beginning of the string and returns a Match object for the matched text. If the pattern does not match at the very beginning of the string, re.match() returns None.
eg1
In the example below, only string3 would give the match 'p'; every other string gives None. The code tests string1, so it prints None.
import re
string1='I love python but hate pig'
string2='I love python'
string3='python'
string4='123'
result = re.match(r'[p]', string1)
print(result) # None: string1 does not start with 'p'
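To check the claim for all four strings at once, here is a minimal sketch. The `strings` and `results` dictionaries are illustrative names, not part of the original example.

```python
import re

strings = {
    'string1': 'I love python but hate pig',
    'string2': 'I love python',
    'string3': 'python',
    'string4': '123',
}

# re.match only tries the pattern at position 0 of each string.
results = {name: re.match(r'[p]', text) for name, text in strings.items()}

for name, m in results.items():
    # m is a Match object on success and None on failure.
    print(name, m.group() if m else None)
```

Only string3 starts with 'p', so it is the only entry whose value is a Match object.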
import re
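# Compile the regular expression into a Pattern object
pattern = re.compile(r'hello')
# Match the text with the Pattern; returns None if there is no match
match = pattern.match('hello world, hello word')
if match:
    # Use the Match object to get the matched text
    print(match.group())
output:
hello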
Intuitively, re.match() is of limited use: it matches only at the beginning of the string and returns only one match.
Depending on what you need, it is sometimes of little use. The subsections below go through the pattern elements that can be matched.
1.1 Matching characters between a and z
string3='python'
string4='123'
result = re.match(r'[a-z]', string3)
print(result.group()) # p
1.2 Matching characters between A and Z
string3='Python'
string4='123'
result = re.match(r'[A-Z]', string3)
print(result.group()) # P
1.3 Matching characters between 0 and 9
ma = re.match(r'[0-9]',string4)
print (ma.group()) # 1
1.4 a-z, A-Z and 0-9 can be used in combination
string3='python'
string4='123'
result = re.match(r'[a-zA-Z0-9]', string3)
print(result.group()) # p
\w and \W work the same way: \w matches a word character ([a-zA-Z0-9_], plus other Unicode letters and digits for str patterns in Python 3) and \W matches any non-word character.
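As a small sketch of the two word-character classes, the strings '_name' and '-name' below are made up for illustration:

```python
import re

# \w matches a letter, digit or underscore at the start of the string.
word = re.match(r'\w', '_name')
# \W matches any character that is not a word character.
non_word = re.match(r'\W', '-name')
# \w does not match punctuation, so this returns None.
no_match = re.match(r'\w', '-name')

print(word.group())      # _
print(non_word.group())  # -
print(no_match)          # None
```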
1.5 Matching digits / non-digits
string4 = '[];;:'
ma1 = re.match(r'\D',string4) # match a non-digit
ma2 = re.match(r'\d',string2) # match a digit; string2 starts with 'I', so ma2 is None
print (ma1.group()) # [
# print (ma2.group()) # raises AttributeError because ma2 is None
1.6 Matching whitespace and non-whitespace characters
\s and \S work the same way, matching whitespace and non-whitespace characters respectively.
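This subsection has no example of its own, so here is a minimal sketch. The string `text` is an assumed input, chosen so that it starts with a single space.

```python
import re

text = ' python'  # starts with a single space

# \s matches the leading space.
space = re.match(r'\s', text)
# \S fails at position 0 because the first character is whitespace.
non_space = re.match(r'\S', text)
# One whitespace character followed by a run of non-whitespace characters.
both = re.match(r'\s\S+', text)

print(repr(space.group()))  # ' '
print(non_space)            # None
print(repr(both.group()))   # ' python'
```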
1.7 Match 0 to infinity times: * (asterisk)
ma = re.match(r'[a-z][a-z]*',string1)
print(ma) # None: string1 starts with an uppercase 'I'
1.8 Match 1 to infinity times: + (plus)
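The difference between * and + shows up when the string contains no matching character at all. A minimal sketch, reusing the value of string4 from the earlier examples:

```python
import re

string4 = '123'

# [a-z]* may match zero characters, so it succeeds with an empty match.
star = re.match(r'[a-z]*', string4)
# [a-z]+ needs at least one lowercase letter, so it fails on '123'.
plus = re.match(r'[a-z]+', string4)
# On 'python' the plus form consumes the whole run of letters.
run = re.match(r'[a-z]+', 'python')

print(repr(star.group()))  # ''
print(plus)                # None
print(run.group())         # python
```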
1.9 Match m to n times: {m,n}
ma = re.match(r'[\w]{1,4}',string1) # any word character, 1 to 4 times
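As a sketch of how the bounds behave, the names m1 to m3 below are only for illustration. Note that the quantifier is written without a space after the comma.

```python
import re

# Greedy: takes as many as allowed, up to four word characters.
m1 = re.match(r'\w{1,4}', 'python')
# The run of word characters stops at the space, so only 'I' is matched.
m2 = re.match(r'\w{1,4}', 'I love python')
# At least two word characters are required at the start, so this fails.
m3 = re.match(r'\w{2,4}', 'I love python')

print(m1.group())  # pyth
print(m2.group())  # I
print(m3)          # None
```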
2. search
re.search() searches anywhere in the string (not only at the beginning). It returns a Match object on success and None on failure; only one match is returned.
The pattern syntax is the same as before.
For more pattern syntax, refer to the following blog:
https://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html
eg
import re
string5 = '[email protected]'
ma6 = re.search(r'[\d]+',string5) # match digits
print(ma6)
print(ma6.group())
output:
<_sre.SRE_Match object; span=(9, 12), match='676'>
676
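The contrast with re.match can be sketched as follows. The string `text` is a hypothetical input, not the one used in the original example.

```python
import re

text = 'order id: 676'  # hypothetical input for illustration

# re.match fails because the string starts with a letter, not a digit.
at_start = re.match(r'\d+', text)
# re.search scans forward and stops at the first run of digits.
anywhere = re.search(r'\d+', text)

print(at_start)          # None
print(anywhere.group())  # 676
print(anywhere.span())   # (10, 13)
```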
import re
string1='I love python but hate pig'
string2='I love python'
string3='python'
string4='123'
result = re.search(r'[\w]+', string3)
print(result.group()) # python
result2 = re.search(r'[\w]+', string2)
print(result2.group()) # I
Note that for string2 the search result is only 'I'. The space between 'I' and 'love' is not a word character, so \w cannot match it and the match ends there.
Put bluntly, the match ends at the first character the pattern cannot match, and search returns only that first result.
string4='123 45'
result = re.search(r'[\d]+', string4)
print(result.group()) # 123
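To make the "only one result" behaviour concrete, this sketch runs the same pattern through re.search and re.findall on the same string:

```python
import re

string4 = '123 45'

# search stops after the first run of digits.
first = re.search(r'\d+', string4)
# findall collects every run of digits.
every = re.findall(r'\d+', string4)

print(first.group())  # 123
print(every)          # ['123', '45']
```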
Of course, the pattern can also spell out the separator explicitly, so that the match continues past it.
str2 = 'char|johljh'
ma6 = re.search(r'char[\W][\w]+',str2)
print(ma6.group()) # char|johljh
str1 = 'oajfs|char|dhddfgdfg' # renamed from str to avoid shadowing the built-in
str2 = 'char|johljh|jjgkhk'
str3 = 'dlkngldnfk|flmgkdm|char'
ma6 = re.search(r'char[\W][\w]+',str1) # this pattern matches in str1 and str2
print(ma6.group()) # char|dhddfgdfg
ma6 = re.search(r'[\W]char',str3) # this pattern matches in str3
print(ma6.group()) # |char
3. findall
re.findall() finds all non-overlapping matches in the string and returns them as a list. If the pattern contains groups (parts enclosed in parentheses), each list item holds the groups of one match instead of the whole match.
import re
string1='I love python but hate pig'
string2='I love python'
string3='python'
string4='123 45'
result0 = re.findall(r'[p]+', string1)
result1 = re.findall(r'[p][a-z]+', string1)
result2 = re.findall(r'[\w]+', string2)
result4 = re.findall(r'[\d]+', string4)
print(result0)
print(result1)
print(result2)
print(result4)
output:
['p', 'p']
['python', 'pig'] ---> highly recommended: result1 = re.findall(r'[p][a-z]+', string1)
['I', 'love', 'python']
['123', '45']
string2 = '1,2,3,4'
ma = re.findall(r'\d+',string2)
print (ma)
#['1', '2', '3', '4']
import re
p = re.compile(r'\d+')
print (p.findall('one1two2three3four456'))
### output ###
# ['1', '2', '3', '456']
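The remark about parentheses at the start of this section can be sketched like this. The variable names are illustrative; the string is string1 from the examples above.

```python
import re

text = 'I love python but hate pig'

# No group: each item is the whole match.
whole = re.findall(r'p[a-z]+', text)
# One group: each item is the text captured by that group.
one_group = re.findall(r'p([a-z]+)', text)
# Two groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'(p)([a-z]+)', text)

print(whole)       # ['python', 'pig']
print(one_group)   # ['ython', 'ig']
print(two_groups)  # [('p', 'ython'), ('p', 'ig')]
```

If the whole match is wanted while still using parentheses for grouping, a non-capturing group `(?:...)` keeps findall returning whole matches.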
References:
https://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html
https://blog.csdn.net/ali197294332/article/details/50894419