Common matching patterns
Pattern | Description |
---|---|
\w | Matches a letter, digit, or underscore |
\W | Matches any character that is not a letter, digit, or underscore |
\s | Matches any whitespace character, equivalent to [ \t\n\r\f\v] for ASCII text |
\S | Matches any non-whitespace character |
\d | Matches any digit, equivalent to [0-9] for ASCII text |
\D | Matches any non-digit character |
\A | Matches the start of the string |
\Z | Matches the end of the string. In Python's re module it matches only at the very end; in some other engines (Perl, PCRE) it also matches just before a final newline. |
\z | Matches only at the very end of the string. Common in other engines; older versions of Python's re module reject it, so use \Z there. |
\G | Matches the position where the previous match ended. Not supported by Python's re module. |
\n | Matches a newline character |
\t | Matches a tab character |
^ | Matches the start of the string (and the start of each line when re.M is set) |
$ | Matches the end of the string, or just before a newline at the end of the string (and the end of each line when re.M is set) |
. | Matches any character except a newline. When the re.DOTALL flag is specified, it matches any character, including a newline. |
[…] | Matches any one character from the set: [amk] matches 'a', 'm', or 'k' |
[^…] | Matches any one character not in the set: [^abc] matches any character except a, b, and c |
* | Matches 0 or more repetitions of the preceding expression (greedy) |
+ | Matches 1 or more repetitions of the preceding expression (greedy) |
? | Matches 0 or 1 repetitions of the preceding expression (greedy). Placed after another quantifier, as in *? or +?, it makes that quantifier non-greedy. |
{n} | Matches exactly n repetitions of the preceding expression |
{n,m} | Matches n to m repetitions of the preceding expression (greedy). Do not put a space after the comma, or the braces are treated as literal text. |
a|b | Matches either a or b |
( ) | Groups the enclosed expression and captures the text it matches |
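A few of the table entries are easier to see in action. The snippet below is a minimal sketch; the sample string `text` is made up for illustration and is not taken from the rest of this article.

```python
import re

# Hypothetical sample text, ending with a newline on purpose
text = 'order_42 shipped on 2024-01-15\n'

digits = re.findall(r'\d+', text)                      # runs of digits
words = re.findall(r'\w+', text)                       # letters, digits, underscores
date = re.search(r'\d{4}-\d{2}-\d{2}', text).group()   # {n}: exactly n repetitions
optional = re.findall(r'colou?r', 'color colour')      # ?: the 'u' may be absent

# $ also matches just before a trailing newline; \Z (in Python) does not
ends_with_dollar = re.search(r'15$', text) is not None
ends_with_Z = re.search(r'15\Z', text) is not None

print(digits)            # ['42', '2024', '01', '15']
print(words)             # ['order_42', 'shipped', 'on', '2024', '01', '15']
print(date)              # 2024-01-15
print(optional)          # ['color', 'colour']
print(ends_with_dollar)  # True
print(ends_with_Z)       # False
```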
re.match matches a pattern from the beginning of the string
re.match attempts to match a pattern at the beginning of the string. If the pattern does not match there, match() returns None. Summary: prefer generic matching with .*, use parentheses to capture the target, prefer non-greedy mode, and pass re.S when the string contains newlines.
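Because match() returns None on failure, calling .group() on the result without a check raises AttributeError. One way to guard against that is sketched below; the helper name `first_number` is hypothetical and chosen only for this illustration.

```python
import re

def first_number(text):
    """Return the digits that follow 'Hello ' at the start of text, or None."""
    m = re.match(r'Hello\s(\d+)', text)
    if m is None:        # nothing matched at position 0
        return None
    return m.group(1)

print(first_number('Hello 1234567 World'))      # 1234567
print(first_number('Say Hello 1234567 World'))  # None
```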
Generic matching
import re
content = 'Hello 123 4567 World_This is a Regex Demo'
result = re.match('^Hello.*Demo$', content)
print(result)
print(result.group())
print(result.span())
<re.Match object; span=(0, 41), match='Hello 123 4567 World_This is a Regex Demo'>
Hello 123 4567 World_This is a Regex Demo
(0, 41)
Matching a target
import re
content = 'Hello 1234567 World_This is a Regex Demo'
result = re.match(r'^Hello\s(\d+)\sWorld.*Demo$', content)
print(result)
print(result.group(1))
print(result.span())
<re.Match object; span=(0, 40), match='Hello 1234567 World_This is a Regex Demo'>
1234567
(0, 40)
Greedy matching
import re
content = 'Hello 1234567 World_This is a Regex Demo'
result = re.match(r'^He.*(\d+).*Demo$', content)
print(result)
print(result.group(1))
<re.Match object; span=(0, 40), match='Hello 1234567 World_This is a Regex Demo'>
7
Non-greedy matching
import re
content = 'Hello 1234567 World_This is a Regex Demo'
result = re.match(r'^He.*?(\d+).*Demo$', content)
print(result)
print(result.group(1))
<re.Match object; span=(0, 40), match='Hello 1234567 World_This is a Regex Demo'>
1234567
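The difference between the two results comes from backtracking. A greedy .* first consumes as much as it can and gives back characters only until (\d+) can match a single digit, which leaves just '7'. A non-greedy .*? consumes as little as possible, so (\d+) gets the whole number. The sketch below compares the two on the same string, and also shows a pitfall: a non-greedy quantifier at the very end of a pattern may match nothing at all.

```python
import re

content = 'Hello 1234567 World_This is a Regex Demo'

greedy = re.match(r'^He.*(\d+)', content).group(1)
lazy = re.match(r'^He.*?(\d+)', content).group(1)

# At the end of a pattern, .*? is satisfied by the empty string
tail_lazy = re.match(r'^Hello (.*?)', content).group(1)
tail_greedy = re.match(r'^Hello (.*)', content).group(1)

print(greedy)             # 7
print(lazy)               # 1234567
print(repr(tail_lazy))    # ''
print(tail_greedy)        # 1234567 World_This is a Regex Demo
```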
Matching modes
import re
content = '''Hello 1234567 World_This
is a Regex Demo
'''
result = re.match(r'^He.*?(\d+).*?Demo$', content, re.S)  # re.S lets . match newlines too
print(result.group(1))
1234567
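To see what re.S changes, the same pattern can be run with and without it. The sketch below also combines flags with | and uses re.I (ignore case) and re.M (multi-line anchors); these two flags are not used elsewhere in this article and are included only as an illustration.

```python
import re

content = 'Hello 1234567 World_This\nis a Regex Demo'
pattern = r'^He.*?(\d+).*?Demo$'

without_flag = re.match(pattern, content)        # . stops at the newline
with_flag = re.match(pattern, content, re.S)     # . crosses the newline

# Flags combine with |: re.I ignores case, re.S lets . match newlines
combined = re.match(r'^hello.*demo$', content, re.S | re.I)

# re.M makes ^ match at the start of every line
line_starts = re.findall(r'^\w+', content, re.M)

print(without_flag)         # None
print(with_flag.group(1))   # 1234567
print(combined is not None) # True
print(line_starts)          # ['Hello', 'is']
```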
Escaping
import re
content = 'price is $5.00'
result = re.match(r'price is \$5\.00', content)
print(result)
<re.Match object; span=(0, 14), match='price is $5.00'>
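When the literal text is not known in advance, escaping each special character by hand is error-prone. The standard library's re.escape does it for you. The sketch below also shows what goes wrong without escaping: an unescaped $ acts as an end-of-string anchor, and an unescaped . matches any character.

```python
import re

literal = 'price is $5.00'
escaped = re.escape(literal)   # backslash-escapes the special characters

unescaped_match = re.match(literal, literal)   # $ is read as an anchor
escaped_match = re.match(escaped, literal)

# An unescaped dot accepts any character in that position
loose = re.match(r'price is \$5.00', 'price is $5x00')
strict = re.match(r'price is \$5\.00', 'price is $5x00')

print(unescaped_match)          # None
print(escaped_match.group())    # price is $5.00
print(loose is not None)        # True
print(strict)                   # None
```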
re.search scans the entire string and returns the first successful match
re.search scans the entire string and returns the first successful match.
import re
content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
result = re.match(r'Hello.*?(\d+).*?Demo', content)
print(result) # None
# Summary: for convenience, prefer search over match whenever possible
import re
content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
result = re.search(r'Hello.*?(\d+).*?Demo', content)
print(result)
print(result.group(1))
<re.Match object; span=(13, 53), match='Hello 1234567 World_This is a Regex Demo'>
1234567
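The two calls above differ only in where the match is allowed to start. As a side-by-side sketch, the snippet below runs the same pattern through re.match, re.search, and re.fullmatch. The last of these is not covered in this article; it requires the pattern to cover the entire string.

```python
import re

content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
pattern = r'Hello.*?(\d+).*?Demo'

at_start = re.match(pattern, content)    # must match at index 0
anywhere = re.search(pattern, content)   # first match at any index
whole = re.fullmatch(pattern, content)   # must cover the whole string

print(at_start)           # None
print(anywhere.span())    # (13, 53)
print(anywhere.group(1))  # 1234567
print(whole)              # None
```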
re.findall returns all matching substrings in list form
Search a string and return all matching substrings in list form.
import re
html = '''<div id="songs-list">
<h2 class="title">经典老歌</h2>
<p class="introduction">
经典老歌列表
</p>
<ul id="list" class="list-group">
<li data-view="2">一路上有你</li>
<li data-view="7">
<a href="/2.mp3" singer="任贤齐">沧海一声笑</a>
</li>
<li data-view="4" class="active">
<a href="/3.mp3" singer="齐秦">往事随风</a>
</li>
<li data-view="6"><a href="/4.mp3" singer="beyond">光辉岁月</a></li>
<li data-view="5"><a href="/5.mp3" singer="陈慧琳">记事本</a></li>
<li data-view="5">
<a href="/6.mp3" singer="邓丽君">但愿人长久</a>
</li>
</ul>
</div>'''
results = re.findall(r'<li.*?>\s*?(<a.*?>)?(\w+)(</a>)?\s*?</li>', html, re.S)
print(results)
for result in results:
    print(result[1])
[('', '一路上有你', ''), ('<a href="/2.mp3" singer="任贤齐">', '沧海一声笑', '</a>'), ('<a href="/3.mp3" singer="齐秦">', '往事随风', '</a>'), ('<a href="/4.mp3" singer="beyond">', '光辉岁月', '</a>'), ('<a href="/5.mp3" singer="陈慧琳">', '记事本', '</a>'), ('<a href="/6.mp3" singer="邓丽君">', '但愿人长久', '</a>')]
一路上有你
沧海一声笑
往事随风
光辉岁月
记事本
但愿人长久
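The result above is a list of tuples because the pattern contains three groups. The shape of what re.findall returns depends on the number of groups in the pattern, as the sketch below shows on a small made-up string. When match positions are needed as well, re.finditer yields match objects instead of strings.

```python
import re

# Hypothetical sample text for illustration
text = 'a=1, b=22, c=333'

no_group = re.findall(r'\w=\d+', text)         # whole matches
one_group = re.findall(r'\w=(\d+)', text)      # contents of the single group
two_groups = re.findall(r'(\w)=(\d+)', text)   # one tuple per match

# finditer gives match objects, so spans are available
spans = [m.span() for m in re.finditer(r'\d+', text)]

print(no_group)    # ['a=1', 'b=22', 'c=333']
print(one_group)   # ['1', '22', '333']
print(two_groups)  # [('a', '1'), ('b', '22'), ('c', '333')]
print(spans)       # [(2, 3), (7, 9), (13, 16)]
```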
re.sub replaces each matching substring in the string and returns the replaced string.
re.sub replaces every substring that matches the pattern and returns the resulting string.
import re
content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
content = re.sub(r'\d+', '', content)
print(content)
# Extra stings Hello  World_This is a Regex Demo Extra stings
import re
content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
content = re.sub(r'(\d+)', r'\1 8910', content)
print(content)
# Extra stings Hello 1234567 8910 World_This is a Regex Demo Extra stings
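Beyond a fixed replacement string, re.sub also accepts a function that computes each replacement, a count that limits how many matches are replaced, and named-group references. The sketch below shows these on a made-up string; the helper `double` is hypothetical. re.subn behaves like re.sub but also reports how many replacements were made.

```python
import re

# Hypothetical sample text for illustration
content = 'Hello 1234567 World 89'

def double(match):
    """Replacement function: return the matched number multiplied by two."""
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, content)
first_only = re.sub(r'\d+', '#', content, count=1)
named = re.sub(r'(?P<num>\d+)', r'[\g<num>]', content)
stripped, n = re.subn(r'\d+', '', content)

print(doubled)         # Hello 2469134 World 178
print(first_only)      # Hello # World 89
print(named)           # Hello [1234567] World [89]
print(repr(stripped))  # 'Hello  World '
print(n)               # 2
```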
re.compile compiles a pattern string into a regular expression object
re.compile compiles a pattern string into a regular expression object that can be reused across calls.
import re
content = '''Hello 1234567 World_This
is a Regex Demo'''
pattern = re.compile(r'Hello.*Demo', re.S)
result = pattern.match(content)
# Equivalent one-off call: result = re.match(r'Hello.*Demo', content, re.S)
print(result)
# <re.Match object; span=(0, 40), match='Hello 1234567 World_This\nis a Regex Demo'>
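A compiled pattern exposes the same operations as the module-level functions (match, search, findall, sub, and so on) as methods, which is convenient when one pattern is applied to many strings. The following is a minimal sketch; the `lines` list is invented for illustration.

```python
import re

number = re.compile(r'\d+')   # compiled once, reused below

# Hypothetical input lines
lines = ['Hello 1234567 World', 'no digits here', 'a1 b22']

found = [number.findall(line) for line in lines]
masked = [number.sub('#', line) for line in lines]

# Pattern methods accept an optional start position
later = number.search('a1 b22', 2)

print(found)          # [['1234567'], [], ['1', '22']]
print(masked)         # ['Hello # World', 'no digits here', 'a# b#']
print(later.group())  # 22
print(number.pattern) # \d+
```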