Detailed explanation of regular expressions with the re module in Python
re.match() function
re.match tries to match the pattern at the beginning of the string. If the pattern does not match there, even if it would match somewhere later in the string, re.match returns None. When the match succeeds, a match object is returned, and its methods give access to the matched text.
re.match(pattern, string, flags=0)
- pattern : matching regular expression
- string : the string to be matched
- flags : Flag bits that control how the regular expression is matched, such as whether matching is case-sensitive or whether it works in multi-line mode.
For example, with re.I the case of letters is ignored when matching.
import re
print(re.match('aa','Aaaabc',flags=re.I)) # matches at the start of the string
print(re.match('aa','bbaabc',flags=re.I)) # no match at the start, so None
# span() returns a tuple with the start and end positions of the match
print(re.match('aa','Aaaabc',flags=re.I).span())
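As a small sketch of how the flags argument changes the result, the snippet below compares a case-sensitive call with one that uses re.I, and then combines two flags with `|`. The sample text and variable names are made up for illustration; re.search is used for the last call because re.M has no effect on where re.match starts.

```python
import re

text = "Hello\nworld"  # hypothetical two-line sample

# Without flags, matching is case-sensitive, so 'hello' does not match 'Hello'.
case_sensitive = re.match('hello', text)

# re.I (re.IGNORECASE) lets the same pattern match.
ignore_case = re.match('hello', text, flags=re.I)

# Flags are combined with |. re.M (re.MULTILINE) lets ^ match at the
# start of every line, not only at the start of the string.
both = re.search('^WORLD', text, flags=re.I | re.M)

print(case_sensitive)       # None
print(ignore_case.group())  # Hello
print(both.group())         # world
```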
Use the match object methods group(num) or groups() to get the matched text.
- group(num=0) : Returns the string matched by the given group number. If several group numbers are passed, it returns a tuple containing the corresponding value for each of them. The default is 0, which returns the entire text matched by the pattern.
- groups() : Returns a tuple containing the strings of all groups, from group 1 up to the last group in the pattern.
line = "Cats are smarter than dogs"
matchObj = re.match(r'(.*) smarter (.*?) (.*)', line)
if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(0, 1, 2, 3) : ", matchObj.group(0, 1, 2, 3))
    print("matchObj.groups() : ", matchObj.groups())
else:
    print("No match!!")
print(matchObj)
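To make the difference between group() and groups() concrete, here is a minimal sketch on a made-up date string. The pattern and the variable names are illustrative assumptions, not part of the original example.

```python
import re

# Hypothetical input: a date followed by some other text.
m = re.match(r'(\d{4})-(\d{2})-(\d{2})', '2024-03-15 release')

if m:
    whole = m.group()          # same as m.group(0): the entire matched text
    year = m.group(1)          # the first parenthesized group
    month_day = m.group(2, 3)  # several group numbers return a tuple
    parts = m.groups()         # all groups, starting from group 1

    print(whole)      # 2024-03-15
    print(year)       # 2024
    print(month_day)  # ('03', '15')
    print(parts)      # ('2024', '03', '15')
```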
re.search() function
re.search() scans the entire string and returns the first successful match.
re.search(pattern, string, flags=0)
print(re.search('aa','aaaabc').span()) # matches at the start of the string
print(re.search('aa','bbaaac').span()) # matches later in the string
# None has no span() method, so print the result itself
print(re.search('aa','bbaccc')) # no match
Output:
(0, 2)
(2, 4)
None
The difference between match and search is easy to understand. match only tries the beginning of the string: if the pattern does not match there, it does not try any later position and returns None directly. search looks through the entire string: if there is no match at the starting position, it keeps trying the rest of the string until the first successful match, and returns None only if it reaches the end of the string without finding one.
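The contrast can be sketched side by side on the same input. Because both functions may return None, the example also includes a small guard; find_span is a hypothetical helper written for this illustration, not part of the re module.

```python
import re

text = 'bbaabc'

print(re.match('aa', text))          # None: 'aa' is not at the start
print(re.search('aa', text).span())  # (2, 4): found later in the string


def find_span(pattern, string):
    """Return the span of the first match, or None if nothing matches."""
    m = re.search(pattern, string)
    # Calling .span() on None would raise AttributeError, so check first.
    return m.span() if m else None


print(find_span('aa', text))      # (2, 4)
print(find_span('aa', 'bbaccc'))  # None
```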
re.sub() function
re.sub() replaces the parts of the string that match the pattern.
re.sub(pattern, repl, string, count=0, flags=0)
- repl : The replacement string; it can also be a function.
- count : The maximum number of matches to replace. The default is 0, which means all matches are replaced.
# replace everything after '是' ("is") with '仙女' ("fairy")
line = '女朋友是女神'
print(re.sub(r'(?<=是).*', '仙女', line))
Output:
女朋友是仙女
Using a lambda as the replacement:
line = 'A2B2C3'
print(re.sub(r'(\d+)', lambda x: str(int(x.group())*2), line))
Output:
A4B4C6
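The count parameter and a function used as repl can be sketched together as follows. The replacement character '#' and the helper name double are assumptions chosen for the illustration.

```python
import re

line = 'A2B2C3'

# count=1 replaces only the first match; the default count=0 replaces all.
first_only = re.sub(r'\d+', '#', line, count=1)
all_matches = re.sub(r'\d+', '#', line)


def double(match):
    """Receive a match object and return the replacement string."""
    return str(int(match.group()) * 2)


# A named function works the same way as the lambda shown earlier.
doubled = re.sub(r'\d+', double, line)

print(first_only)   # A#B2C3
print(all_matches)  # A#B#C#
print(doubled)      # A4B4C6
```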
re.compile() function
re.compile() compiles a regular expression and returns a regular expression (Pattern) object.
re.compile(pattern[, flags])
- pattern : a regular expression in the form of a string.
pattern = re.compile(r'\d+')
print(pattern.match('one12twothree34four'))  # no digit at index 0, so None
print(pattern.match('one12twothree34four', 3, 10).span())  # start matching at index 3
Output:
None
(3, 5)
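One reason to compile a pattern is to reuse it across several calls. The sketch below compiles a pattern once with a flag and then calls match, search, and sub on the resulting object; the sample text and variable names are made up for illustration.

```python
import re

# Compile once, with a flag, then reuse the Pattern object.
word = re.compile(r'[a-z]+', re.I)

text = 'one12TWO34'  # hypothetical sample

first = word.match(text).group()             # match at the start of the string
after_digits = word.search(text, 3).group()  # start searching at index 3
replaced = word.sub('-', text)               # replace every run of letters

print(first)         # one
print(after_digits)  # TWO
print(replaced)      # -12-34
```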
re.findall() function
re.findall() is used to find all the substrings matched by the regular expression in the string and return a list. If no match is found, an empty list is returned.
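re.findall(pattern, string, flags=0)
pattern.findall(string[, pos[, endpos]])
The first form is the module-level function; the second is the method of a compiled Pattern object, which also accepts pos and endpos.
- string: The string to be matched.
- pos: Optional parameter; the position in the string at which the search starts. The default is 0.
- endpos: Optional parameter; the position in the string at which the search ends. The default is the length of the string.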
pattern = re.compile(r'\d+')
print(pattern.findall('A1B2C3'))
print(pattern.findall('A1B2C3', 0, 4))
Output:
['1', '2', '3']
['1', '2']
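The shape of the list that findall returns depends on how many groups the pattern contains. This is a minimal sketch of that behavior using the same sample string; the three patterns are chosen only for illustration.

```python
import re

text = 'A1B2C3'

# No groups: each item is the whole match.
no_group = re.findall(r'[A-Z]\d', text)

# One group: each item is the text of that group only.
one_group = re.findall(r'[A-Z](\d)', text)

# Several groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'([A-Z])(\d)', text)

print(no_group)    # ['A1', 'B2', 'C3']
print(one_group)   # ['1', '2', '3']
print(two_groups)  # [('A', '1'), ('B', '2'), ('C', '3')]
```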
re.finditer() function
re.finditer() finds all the substrings matched by the regular expression in the string and returns them as an iterator.
re.finditer(pattern, string, flags=0)
result = re.finditer(r"\d+", "A1B2C3")
for match in result:
    print(match.group())
Output:
1
2
3
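Since finditer yields match objects rather than plain strings, the position of each match is available as well. A short sketch, with a made-up sample string:

```python
import re

text = 'A1B22C333'  # hypothetical sample

# Collect the matched text together with its (start, end) position.
found = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]

for value, span in found:
    print(value, span)
# 1 (1, 2)
# 22 (3, 5)
# 333 (6, 9)
```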
Reference: Runoob tutorial (菜鸟教程)