Import the regular expression module
import re
Common matching syntax
re.match matches only at the beginning of the string
re.search scans the string and returns the first match found anywhere in it
re.findall returns all non-overlapping matches as a list
re.split splits the string, using each match as the separator
re.sub replaces every match with a replacement string
re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)') creates a reusable pattern object
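findall, split, and sub are not demonstrated in the examples further down, so here is a minimal sketch of the three. The sample strings and variable names are made up for illustration only.

```python
import re

text = "apple, banana;cherry  date"  # illustrative sample input

# re.findall returns every non-overlapping match as a list of strings.
words = re.findall(r"[a-z]+", text)

# re.split cuts the string wherever the pattern matches.
parts = re.split(r"[,; ]+", text)

# re.sub replaces every match with the replacement string.
masked = re.sub(r"[aeiou]", "*", "banana")

print(words)   # ['apple', 'banana', 'cherry', 'date']
print(parts)   # ['apple', 'banana', 'cherry', 'date']
print(masked)  # b*n*n*
```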
Common Regular Expression Symbols
. matches any character except \n by default. If flag DOTALL is specified, it matches any character, including newline
^ matches the beginning of the string. If flag MULTILINE is specified, it also matches at the start of each line, so re.search(r"^a", "\nabc\neee", flags=re.MULTILINE) also matches
$ matches the end of the string. If flag MULTILINE is specified, it also matches at the end of each line, so re.search("foo$", "bfoo\nnsdfsf", flags=re.MULTILINE).group() also works
* matches the previous character 0 or more times. re.findall("ab*","cabb3abcbbac") returns ['abb', 'ab', 'a']
+ matches the previous character 1 or more times. re.findall("ab+","ab+cd+abb+bba") returns ['ab', 'abb']
? matches the previous character 0 or 1 times
{m} matches the previous character exactly m times. re.findall("b{3}","ab+cd+abbb+bba") returns ['bbb']
{n,m} matches the previous character n to m times. re.findall("b{1,2}","ab+cd+abbb+bba") returns ['b', 'bb', 'b', 'bb']. Adding ? after a quantifier makes it match as few characters as possible (lazy); without ? it matches as many as possible (greedy)
| matches either the expression on its left or the one on its right. re.findall("b|c","ab+cd+abbb+bba") returns ['b', 'c', 'b', 'b', 'b', 'b', 'b']
(...) group matching
\A matches only at the beginning of the string
\Z matches only at the end of the string
\d matches numbers 0-9
\D matches non-digits
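\w matches word characters [A-Za-z0-9_] (for str patterns it also matches other Unicode letters and digits unless flag ASCII is specified)
\W matches non-word characters, i.e. anything outside [A-Za-z0-9_]
\s matches whitespace characters such as space, \t, \n, \r
\S matches any non-whitespace character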
(?P<name>…) named group matching; the matched text can be retrieved by name
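A few of the symbols listed above are easier to see in action. The following is a small sketch of the DOTALL and MULTILINE flags, greedy versus lazy quantifiers, and a named group. The sample strings and the group names area and number are assumptions chosen for illustration.

```python
import re

# "." does not match a newline unless DOTALL is specified.
no_flag = re.search(r"a.b", "a\nb")
with_flag = re.search(r"a.b", "a\nb", flags=re.DOTALL)
print(no_flag)                # None
print(with_flag is not None)  # True

# With MULTILINE, "^" also matches at the start of every line.
line_starts = re.findall(r"^\w+", "one\ntwo\nthree", flags=re.MULTILINE)
print(line_starts)            # ['one', 'two', 'three']

# Greedy {1,2} takes as many as it can; lazy {1,2}? takes as few as it can.
greedy = re.findall(r"b{1,2}", "abbb")
lazy = re.findall(r"b{1,2}?", "abbb")
print(greedy)                 # ['bb', 'b']
print(lazy)                   # ['b', 'b', 'b']

# Named groups (group names here are illustrative) can be read back by name.
m = re.search(r"(?P<area>\d{3})-(?P<number>\d{3}-\d{4})",
              "my number is 415-555-4242")
print(m.group("area"))        # 415
print(m.groupdict())          # {'area': '415', 'number': '555-4242'}
```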
Matching examples
1. Create a pattern object with the compile method
import re
a = re.compile(r'\d+')
a1 = a.search('gfd12341ahvcnxjbkafa')
print(a1.group()) # .group() returns the matched text rather than the match object
result
12341
2. Match from the beginning of the string with the match method
import re
a = re.match("^w.+", "wdasfdsafdsa1223fdssfd33311")
b = re.match("^[a-z]+", "wdasfdsafdsa1223fdssfd33311")
c = re.search("R[a-zA-Z]+a", "wdasfdsafdsa1223fdssfd33311")
print(a)
print(b)
print(c)
The result is as follows
<_sre.SRE_Match object; span=(0, 27), match='wdasfdsafdsa1223fdssfd33311'>
<_sre.SRE_Match object; span=(0, 12), match='wdasfdsafdsa'>
None
3. Match anywhere in the string with the search method
import re
a = re.search("[a-z]+","abcdefg12345")
print(a.group())
The result is as follows
abcdefg
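The difference between match and search is easiest to see on the same input. A minimal sketch, using a made-up sample string:

```python
import re

text = "abc123"  # illustrative sample input

# match only succeeds if the pattern matches at position 0.
at_start = re.match(r"\d+", text)

# search scans forward until it finds the first match.
anywhere = re.search(r"\d+", text)

print(at_start)          # None
print(anywhere.group())  # 123
print(anywhere.span())   # (3, 6)
```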
4. Match one of several alternatives with the pipe |
import re
hero = re.compile(r'ABC|DEF')
m1 = hero.search('ABC hehe ABC')
print(m1.group())
m2 = hero.search('DEF hehe ABC')
print(m2.group())
The result is as follows
ABC
DEF
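The pipe can also be placed inside parentheses so that the alternatives share a common prefix. A small sketch, assuming an illustrative pattern and sentence that are not part of the example above:

```python
import re

# The prefix "Bat" is written once; the group holds the alternatives.
bat = re.compile(r"Bat(man|mobile|copter)")
mo = bat.search("Batmobile lost a wheel")

print(mo.group())   # Batmobile
print(mo.group(1))  # mobile
```

group() gives the whole match, while group(1) gives only the alternative that matched inside the parentheses.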
5. Group with parentheses () and the group method
import re
phoneNum = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNum.search('my number is 415-555-4242')
print(mo.group(1)) #output the first group
print(mo.group(2)) #output the second group
print(mo.group(0)) #output all
print(mo.group()) #output all
result
415
555-4242
415-555-4242
415-555-4242
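To get all groups at once, the match object also has a groups() method that returns them as a tuple. A short sketch reusing the same pattern; the variable names area_code and main_number are illustrative:

```python
import re

phone_num = re.compile(r"(\d\d\d)-(\d\d\d-\d\d\d\d)")
mo = phone_num.search("my number is 415-555-4242")

# groups() returns every captured group as a tuple, which can be unpacked.
all_groups = mo.groups()
area_code, main_number = all_groups

print(all_groups)   # ('415', '555-4242')
print(area_code)    # 415
print(main_number)  # 555-4242
```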
6. Use question marks for optional matching
import re
b = re.compile(r'Bat(wo)?man')
mo = b.search('The Adventures of Batman')
print(mo.group())
mo1 = b.search('The Adventures of Batwoman')
print(mo1.group())
result
Batman
Batwoman
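When the optional group does not take part in the match, group(1) returns None rather than an empty string. A minimal sketch with the same pattern:

```python
import re

b = re.compile(r"Bat(wo)?man")

# The optional group is skipped for "Batman" and used for "Batwoman".
skipped = b.search("The Adventures of Batman").group(1)
used = b.search("The Adventures of Batwoman").group(1)

print(skipped)  # None
print(used)     # wo
```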
7. Match zero or more times with an asterisk
import re
b = re.compile(r'Bat(wo)*man')
mo = b.search('The Adventures of Batman')
print(mo.group())
mo1 = b.search('The Adventures of Batwoman')
print(mo1.group())
mo2 = b.search('The Adventures of Batwowowowowoman')
print(mo2.group())
result
Batman
Batwoman
Batwowowowowoman
8. Match one or more times with a plus sign
import re
b = re.compile(r'Bat(wo)+man')
mo = b.search('The Adventures of Batwoman')
print(mo.group())
mo1 = b.search('The Adventures of Batwoman')
print(mo1.group())
mo2 = b.search('The Adventures of Batwowowowowoman')
print(mo2.group())
result
Batwoman
Batwoman
Batwowowowowoman
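Unlike the asterisk, the plus sign requires at least one occurrence, so Bat(wo)+man does not match 'Batman'. In that case search returns None, and calling .group() on it raises AttributeError. A minimal sketch of checking the result first; the fallback text 'no match' is just an illustrative choice:

```python
import re

b = re.compile(r"Bat(wo)+man")

# No "wo" between "Bat" and "man", so there is nothing to match.
mo = b.search("The Adventures of Batman")
print(mo)  # None

# Check the match object before calling .group() on it.
result = mo.group() if mo else "no match"
print(result)  # no match
```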