A detailed explanation of regular expressions with Python's re module

re.match() function

re.match tries to match the pattern at the beginning of the string. If the pattern does not match at the start position, it returns None. When the match succeeds, a match object is returned, and its methods can be used to obtain the matched text.

re.match(pattern, string, flags=0)

pattern : matching regular expression
string : the string to be matched
flags : flag bits that control how the regular expression is matched, such as case-insensitive matching or multi-line matching.

For example, with re.I the match ignores the case of the letters in the string.

import re

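print(re.match('aa','Aaaabc',flags=re.I))    # matches at the start position
print(re.match('aa','bbaabc',flags=re.I))    # no match at the start position, returns None
# span() returns a tuple containing the start and end positions of the match
print(re.match('aa','Aaaabc',flags=re.I).span())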
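
Because re.match returns None when there is no match, calling span() directly on the result raises AttributeError in that case. A minimal sketch of guarding against this follows; the helper name match_span is hypothetical and exists only for illustration.

```python
import re

def match_span(pattern, text, flags=0):
    """Return the (start, end) span of a match at the start of text, or None."""
    m = re.match(pattern, text, flags)
    # re.match returns None on failure, so check before calling span()
    return m.span() if m is not None else None

print(match_span('aa', 'Aaaabc', re.I))   # match at the start position
print(match_span('aa', 'bbaabc', re.I))   # no match at the start position
```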

Use the match object methods group(num) or groups() to get the matched text.

group(num=0) : returns the string matched by the given group number. Several group numbers can be passed at once, in which case a tuple with the corresponding values is returned. The default is 0, which returns the entire match.
groups() : returns a tuple containing the strings of all groups, from group 1 up to the last group in the pattern.

line = "Cats are smarter than dogs"
matchObj = re.match(r'(.*) smarter (.*?) (.*)', line)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(0, 1, 2, 3) : ", matchObj.group(0, 1, 2, 3))
    print("matchObj.groups() : ", matchObj.groups())
else:
    print("No match!!")
print(matchObj)
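
As a check on the example above, the following sketch stores what each call returns for that same pattern and string. The variable names are chosen here for illustration only.

```python
import re

line = "Cats are smarter than dogs"
m = re.match(r'(.*) smarter (.*?) (.*)', line)

whole = m.group()               # group 0: the entire match
first = m.group(1)              # text captured by the first pair of parentheses
several = m.group(0, 1, 2, 3)   # several group numbers return a tuple
all_groups = m.groups()         # every group from 1 onward, as a tuple

print(whole)        # Cats are smarter than dogs
print(first)        # Cats are
print(several)
print(all_groups)   # ('Cats are', 'than', 'dogs')
```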


re.search() function

re.search() scans the entire string and returns the first successful match.

re.search(pattern, string, flags=0)

print(re.search('aa','aaaabc').span())   # match at the start position
print(re.search('aa','bbaaac').span())    # match not at the start position
# a NoneType object has no span(), so print the result itself
print(re.search('aa','bbaccc'))    # match fails

Output:
(0, 2)
(2, 4)
None

The difference between match and search is easy to understand. match only tries to match at the beginning of the string; if that fails, it does not look any further and returns None directly. search looks through the entire string: if there is no match at the start position, it keeps scanning the rest of the string and returns the first successful match, or None if it reaches the end of the string without one.
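
The contrast can be seen by running both functions on the same string. This is a small sketch; the variable names are illustrative.

```python
import re

text = 'bbaabc'

at_start = re.match('aa', text)    # only tries the start position
anywhere = re.search('aa', text)   # scans forward through the string

print(at_start)          # None
print(anywhere.span())   # (2, 4)
```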

re.sub() function

re.sub() replaces all the content in the string that matches the pattern.

re.sub(pattern, repl, string, count=0, flags=0)

repl : the replacement string; it can also be a function.
count : the maximum number of replacements to make; the default is 0, which replaces all matches.

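# '是' means "is": the lookbehind (?<=是) selects everything after it for replacement
line = '女朋友是女神'    # "My girlfriend is a goddess"
print(re.sub(r'(?<=是).*', '仙女', line))    # '仙女' means "fairy"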

Output:
女朋友是仙女

Using a lambda as repl

line = 'A2B2C3'
print(re.sub(r'(\d+)', lambda x: str(int(x.group())*2), line))

Output:
A4B4C6
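
The count parameter and a function used as repl can be combined. The sketch below reuses the string from the example above; the function name double is hypothetical and stands in for the lambda.

```python
import re

line = 'A2B2C3'

def double(match):
    """Return the matched number doubled, as a string."""
    return str(int(match.group()) * 2)

# A named function works the same way as the lambda
doubled_all = re.sub(r'\d+', double, line)

# count=1 stops after the first replacement
doubled_first = re.sub(r'\d+', double, line, count=1)

print(doubled_all)     # A4B4C6
print(doubled_first)   # A4B2C3
```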

re.compile() function

re.compile() compiles a regular expression and produces a regular expression (Pattern) object, which can then be reused through methods such as match(), search() and findall().

re.compile(pattern[, flags])

pattern : a regular expression in the form of a string.

pattern = re.compile(r'\d+')
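print(pattern.match('one12twothree34four'))    # None: the string does not start with a digit
print(pattern.match('one12twothree34four', 3, 10).span())    # start matching at index 3, stop before index 10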

Output:
None
(3, 5)
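
A compiled pattern also carries its flags, so they do not need to be repeated on every call. A brief sketch, with an illustrative pattern and strings that are not from the original text:

```python
import re

# Compile once, with the case-insensitive flag, and reuse the object
word = re.compile(r'[a-z]+', re.I)

first = word.match('Hello123')          # matches at the start position
later = word.search('123 World 456')    # scans for the first match
bounded = word.match('123abc', 3)       # pos=3: start matching at index 3

print(first.group())    # Hello
print(later.group())    # World
print(bounded.span())   # (3, 6)
```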

re.findall() function

re.findall() is used to find all the substrings matched by the regular expression in the string and return a list. If no match is found, an empty list is returned.

re.findall(pattern, string, flags=0)
pattern.findall(string[, pos[, endpos]])

string : the string to be searched.
pos : optional parameter of the compiled pattern's method; the position in the string at which the search starts, the default is 0.
endpos : optional parameter of the compiled pattern's method; the position in the string at which the search ends, the default is the length of the string.

pattern = re.compile(r'\d+')
print(pattern.findall('A1B2C3'))
print(pattern.findall('A1B2C3', 0, 4))

Output:
['1', '2', '3']
['1', '2']
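
When the pattern contains groups, findall returns the group contents instead of the whole match: a list of strings for one group, and a list of tuples for several groups. A short sketch using the module-level function, with illustrative patterns:

```python
import re

text = 'A1B2C3'

no_group = re.findall(r'[A-Z]\d', text)         # whole matches
one_group = re.findall(r'[A-Z](\d)', text)      # only the text of the group
two_groups = re.findall(r'([A-Z])(\d)', text)   # one tuple per match

print(no_group)     # ['A1', 'B2', 'C3']
print(one_group)    # ['1', '2', '3']
print(two_groups)   # [('A', '1'), ('B', '2'), ('C', '3')]
```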

re.finditer() function

re.finditer() finds all the substrings matched by the regular expression in the string and returns them as an iterator.

re.finditer(pattern, string, flags=0)

result = re.finditer(r"\d+", "A1B2C3")
for match in result:
    print(match.group())

Output:
1
2
3
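
Each item the iterator yields is a match object, so positions are available as well as the matched text. A minimal sketch, with an input string chosen here for illustration:

```python
import re

text = 'A1B22C333'

# Collect the matched text and its (start, end) span for every match
found = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]

for matched, span in found:
    print(matched, span)
```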

Reference: Runoob tutorial (菜鸟教程)