6. Regular expressions: capturing groups and backreferences (what parentheses do, and retrieving group values with \1)
This is mainly used when matching HTML tags in text, to constrain the tag format so that the opening and closing tags must be the same.
Use () to create groups, and \1, \2 to refer back to the value each group captured.
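Before looking at backreferences, it may help to see what "a group's value" is. A minimal sketch, assuming Python's standard `re` module as in the examples below; the sample string is illustrative. Each pair of parentheses captures a substring that the match object exposes by number.

```python
import re

# Two capturing groups: the tag name and the text between the tags
ret = re.match(r"<(\w*)>(.*)</\w*>", "<h1>hahaha</h1>")

print(ret.group(1))   # h1
print(ret.group(2))   # hahaha
print(ret.groups())   # ('h1', 'hahaha')
```

Group numbers start at 1; `group()` or `group(0)` returns the whole match.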
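# Bad example: even when html_str = "<h1>hahaha</h2>", the pattern below still matches and the result is printed
# The opening and closing tags are not forced to be the same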
import re
html_str = "<h1>hahaha</h1>"
ret = re.match(r"<\w*>.*</\w*>", html_str)
print(ret.group())
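The bad example above only uses a well-formed string, so the problem its comment describes is not visible. A minimal sketch that feeds the same pattern a mismatched pair of tags; the string is illustrative. Because each `\w*` matches independently, nothing ties the closing tag to the opening one.

```python
import re

# The unconstrained pattern from the bad example
loose = r"<\w*>.*</\w*>"

mismatched = "<h1>hahaha</h2>"
ret = re.match(loose, mismatched)

# The match succeeds even though <h1> is closed by </h2>
print(ret.group())  # <h1>hahaha</h2>
```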
# Correct example: () creates a group, and \1 in the pattern refers to the text captured by the first group
import re
html_str = "<h1>hahaha</h1>"
ret = re.match(r"<(\w*)>.*</\1>", html_str)
print(ret.group())
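To check that the backreference actually enforces matching tags, a minimal sketch can run the corrected pattern against both a matching and a mismatched string (both illustrative). `\1` must reproduce exactly what `(\w*)` captured, so `</h2>` cannot close `<h1>`.

```python
import re

strict = r"<(\w*)>.*</\1>"

results = {}
for html_str in ["<h1>hahaha</h1>", "<h1>hahaha</h2>"]:
    ret = re.match(strict, html_str)
    # Record the captured tag name, or None when there is no match
    results[html_str] = ret.group(1) if ret else None

print(results)  # {'<h1>hahaha</h1>': 'h1', '<h1>hahaha</h2>': None}
```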
# Correct example: () creates groups; \1 and \2 are numbered in the order their opening parentheses appear
import re
html_str = "<body><h1>hahaha</h1></body>"
ret = re.match(r"<(\w*)><(\w*)>.*</\2></\1>", html_str)
print(ret.group())
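Group numbering follows the order of the opening parentheses, so in the nested pattern `\2` must close before `\1`. A minimal sketch, with illustrative strings, showing the captured values and what happens when the closing tags are in the wrong order:

```python
import re

pattern = r"<(\w*)><(\w*)>.*</\2></\1>"

ok = re.match(pattern, "<body><h1>hahaha</h1></body>")
print(ok.group(1), ok.group(2))  # body h1

# Closing tags swapped: </\2></\1> requires </h1></body>, so this fails
bad = re.match(pattern, "<body><h1>hahaha</body></h1>")
print(bad)  # None
```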
When there are many groups, you can give each one a name and then refer back to it directly by name:
(?P<name>...)  # syntax for naming a group (note that P is uppercase)
(?P=name)  # syntax for referring back to the value of a named group
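To see the two pieces of syntax in isolation, a minimal single-tag sketch (the group name `tag` is chosen here for illustration): it performs the same check as `<(\w*)>.*</\1>`, and a named group can still be read back by number as well as by name.

```python
import re

# Same check as <(\w*)>.*</\1>, but the group is named "tag"
named = r"<(?P<tag>\w*)>.*</(?P=tag)>"

ret = re.match(named, "<h1>hahaha</h1>")
print(ret.group("tag"))  # h1
# A named group is also an ordinary numbered group
print(ret.group(1))      # h1

mismatch = re.match(named, "<h1>hahaha</h2>")
print(mismatch)          # None
```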
import re
html_str = "<body><h1>hahaha</h1></body>"
ret = re.match(r"<(?P<p1>\w*)><(?P<p2>\w*)>.*</(?P=p2)></(?P=p1)>", html_str)
print(ret.group())
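Naming also pays off after the match, since the captured values can be read by name rather than by position. A minimal sketch reusing the pattern above, assuming the same standard `re` module:

```python
import re

pattern = r"<(?P<p1>\w*)><(?P<p2>\w*)>.*</(?P=p2)></(?P=p1)>"
ret = re.match(pattern, "<body><h1>hahaha</h1></body>")

print(ret.group("p1"))  # body
print(ret.group("p2"))  # h1
# All named groups at once, as a dict keyed by group name
print(ret.groupdict())  # {'p1': 'body', 'p2': 'h1'}
```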