Python notes: regular expressions

Import the re module

import re

Common matching syntax

re.match matches only at the beginning of the string

re.search scans the string and returns the first match found anywhere in it

re.findall returns all non-overlapping matches as a list

re.split uses each match as a separator and returns the list of pieces

re.sub replaces each match with a replacement string

re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)') creates a compiled pattern object
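The list above names findall, split, and sub without showing them in use. A minimal sketch follows; the sample string text is an assumption chosen for illustration and does not come from the notes.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = "a1b22c333"

# findall returns every non-overlapping match as a list of strings.
digits = re.findall(r"\d+", text)

# split cuts the string wherever the pattern matches.
letters = re.split(r"\d+", text)

# sub replaces every match with the replacement string.
masked = re.sub(r"\d+", "#", text)

print(digits)   # ['1', '22', '333']
print(letters)  # ['a', 'b', 'c', '']
print(masked)   # a#b#c#
```

Note the trailing empty string from split: the input ends with a match, so there is an empty piece after it.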

Common Regular Expression Symbols

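. matches any character except \n by default. If the re.DOTALL flag is specified, it matches any character, including newline

^ matches the beginning of the string. If the re.MULTILINE flag is specified, it also matches at the start of each line, so re.search(r"^a", "\nabc\neee", flags=re.MULTILINE) finds a match

$ matches the end of the string (or just before a trailing newline). If the re.MULTILINE flag is specified, it also matches at the end of each line, so re.search("foo$", "bfoo\nnsdfsf", flags=re.MULTILINE).group() returns 'foo'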

* matches the preceding character or group 0 or more times. re.findall("ab*","cabb3abcbbac") returns ['abb', 'ab', 'a']

+ matches the preceding character or group 1 or more times. re.findall("ab+","ab+cd+abb+bba") returns ['ab', 'abb']

? matches the preceding character or group 0 or 1 times

{m} matches the preceding character or group exactly m times. re.findall("b{3}","ab+cd+abbb+bba") returns ['bbb']

{n,m} matches the preceding character or group n to m times. re.findall("b{1,2}","ab+cd+abbb+bba") returns ['b', 'bb', 'b', 'bb']. Adding ? after the quantifier gives the shortest (non-greedy) match; without it the match is the longest possible (greedy)

| matches either the pattern on its left or the one on its right. re.findall("b|c","ab+cd+abbb+bba") returns ['b', 'c', 'b', 'b', 'b', 'b', 'b']

(...) groups the enclosed pattern and captures the text it matches

\A matches only at the beginning of the string

\Z matches only at the end of the string; unlike $, it is not affected by re.MULTILINE

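\d matches a digit, 0-9 (for str patterns, also other Unicode digits unless re.ASCII is set)

\D matches any non-digit

\w matches a word character, [A-Za-z0-9_] (for str patterns, also other Unicode word characters unless re.ASCII is set)

\W matches any non-word character, [^A-Za-z0-9_]

\s matches a whitespace character, such as a space, \t, \n, or \r

\S matches any non-whitespace character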

(?P<name>…) named group; the captured text can be retrieved with group('name')
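Several of the symbols above are easier to see side by side. The following sketch exercises the DOTALL and MULTILINE flags, the \A anchor, non-greedy quantifiers, and named groups. All sample strings and the group names area and number are assumptions made up for illustration.

```python
import re

# Sample strings are illustrative assumptions, not taken from the notes.

# DOTALL lets . cross a newline.
no_flag = re.search(r"a.b", "a\nb")
with_dotall = re.search(r"a.b", "a\nb", flags=re.DOTALL)

# MULTILINE lets ^ match at the start of every line.
line_starts = re.findall(r"^\w+", "abc def\neee fff", flags=re.MULTILINE)

# \A ignores MULTILINE and anchors to the start of the whole string.
string_start = re.findall(r"\A\w+", "abc def\neee fff", flags=re.MULTILINE)

# Adding ? after a quantifier makes it non-greedy (shortest match).
greedy = re.findall(r"b{1,2}", "abbb")
lazy = re.findall(r"b{1,2}?", "abbb")

# (?P<name>...) gives a group a name that group() accepts.
m = re.search(r"(?P<area>\d{3})-(?P<number>\d{4})", "call 555-4242")

print(no_flag)                             # None
print(with_dotall.group() == "a\nb")       # True
print(line_starts)                         # ['abc', 'eee']
print(string_start)                        # ['abc']
print(greedy, lazy)                        # ['bb', 'b'] ['b', 'b', 'b']
print(m.group("area"), m.group("number"))  # 555 4242
```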

Matching examples

1. Create a pattern object with the compile method

import re
a = re.compile(r'\d+')
a1 = a.search('gfd12341ahvcnxjbkafa')
print(a1.group()) # .group() returns the matched text rather than the match object

result

12341
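A compiled pattern can be reused across many strings and exposes the same methods as the module-level functions. A short sketch, where the variable name number and the second input string are assumptions for illustration:

```python
import re

number = re.compile(r"\d+")

# A compiled pattern offers the same methods as the re module functions.
first = number.search("gfd12341ahvcnxjbkafa").group()
every = number.findall("a1b22c333")

# The module-level call compiles the pattern internally and gives the same result.
same = re.search(r"\d+", "gfd12341ahvcnxjbkafa").group()

print(first)          # 12341
print(every)          # ['1', '22', '333']
print(first == same)  # True
```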

2. Match from the beginning with the match method

import re
a = re.match(r"^w.+", "wdasfdsafdsa1223fdssfd33311")
b = re.match(r"^[a-z]+", "wdasfdsafdsa1223fdssfd33311")
c = re.search(r"R[a-zA-Z]+a", "wdasfdsafdsa1223fdssfd33311")  # no uppercase R in the string, so c is None
print(a)
print(b)
print(c)

The result is as follows (Python 3.7 and later show re.Match in place of _sre.SRE_Match)

<_sre.SRE_Match object; span=(0, 27), match='wdasfdsafdsa1223fdssfd33311'>
<_sre.SRE_Match object; span=(0, 12), match='wdasfdsafdsa'>
None
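Because a failed match returns None, calling .group() on the result raises AttributeError. One common way to guard against that is sketched below. The helper first_match is hypothetical and exists only for this illustration.

```python
import re

def first_match(pattern, text):
    """Return the matched text, or None when the pattern is not found.

    first_match is a hypothetical helper for illustration only.
    """
    m = re.search(pattern, text)
    # A match object is truthy; None signals that nothing matched.
    return m.group() if m else None

found = first_match(r"[a-z]+", "wdasfdsafdsa1223fdssfd33311")
missing = first_match(r"R[a-zA-Z]+a", "wdasfdsafdsa1223fdssfd33311")

print(found)    # wdasfdsafdsa
print(missing)  # None
```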

3. Match anywhere in the string with the search method

import re
a = re.search("[a-z]+","abcdefg12345")

print(a.group())

The result is as follows

abcdefg
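The difference between match and search shows up when the text of interest is not at the start of the string. A small sketch, using a sample string s that is assumed for illustration, with fullmatch added for comparison:

```python
import re

s = "12345abcdefg"  # illustrative sample string

at_start = re.match(r"[a-z]+", s)      # the letters are not at position 0
anywhere = re.search(r"[a-z]+", s)     # finds the first run of letters
whole = re.fullmatch(r"\d+[a-z]+", s)  # the pattern must cover the entire string

print(at_start)           # None
print(anywhere.group())   # abcdefg
print(anywhere.span())    # (5, 12)
print(whole is not None)  # True
```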

4. Match one of several alternatives with the pipe |

import re
hero = re.compile(r'ABC|DEF')
m1 = hero.search('ABC hehe ABC')
print(m1.group())
m2 = hero.search('DEF hehe ABC')
print(m2.group())

The result is as follows

ABC
DEF
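search returns only the first match, and alternatives are tried from left to right. The sketch below illustrates both points; the Bat and Batman alternatives are assumptions added for the demonstration.

```python
import re

hero = re.compile(r"ABC|DEF")

# search stops at the first match; findall collects them all in order.
all_hits = hero.findall("DEF hehe ABC")

# Alternatives are tried left to right at each position, so the first
# alternative that matches wins, even if a later one would be longer.
short_first = re.search(r"Bat|Batman", "Batman").group()
long_first = re.search(r"Batman|Bat", "Batman").group()

print(all_hits)     # ['DEF', 'ABC']
print(short_first)  # Bat
print(long_first)   # Batman
```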

5. Group with parentheses () and the group method

import re

phoneNum = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNum.search('my number is 415-555-4242')
print(mo.group(1)) # output the first group
print(mo.group(2)) # output the second group
print(mo.group(0)) # output the entire match
print(mo.group()) # same as group(0)

result

415
555-4242
415-555-4242
415-555-4242
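To get all groups at once, a match object also provides groups(), which returns a tuple that can be unpacked. A brief sketch with the same pattern; the variable names area_code and main_number are assumptions for illustration.

```python
import re

phone_num = re.compile(r"(\d\d\d)-(\d\d\d-\d\d\d\d)")
mo = phone_num.search("my number is 415-555-4242")

# groups() returns every captured group as a tuple, ready for unpacking.
area_code, main_number = mo.groups()

print(mo.groups())   # ('415', '555-4242')
print(area_code)     # 415
print(main_number)   # 555-4242

# span(n) gives the start and end index of group n in the searched string.
print(mo.span(1))    # (13, 16)
```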

6. Use question marks for optional matching

import re
b = re.compile(r'Bat(wo)?man')
mo = b.search('The Adventures of Batman')
print(mo.group())
mo1 = b.search('The Adventures of Batwoman')
print(mo1.group())

result

Batman
Batwoman
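When an optional group does not take part in the match, its value is None rather than an empty string, which matters when the group is read afterwards. A minimal sketch using the same pattern:

```python
import re

b = re.compile(r"Bat(wo)?man")

# When the optional group does not take part in the match,
# group(1) is None rather than an empty string.
without_wo = b.search("The Adventures of Batman").group(1)
with_wo = b.search("The Adventures of Batwoman").group(1)

print(without_wo)  # None
print(with_wo)     # wo
```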

7. Match zero or more times with an asterisk

import re

b = re.compile(r'Bat(wo)*man')
mo = b.search('The Adventures of Batman')
print(mo.group())
mo1 = b.search('The Adventures of Batwoman')
print(mo1.group())
mo2 = b.search('The Adventures of Batwowowowowoman')
print(mo2.group())

result

Batman
Batwoman
Batwowowowowoman

8. Match one or more times with a plus sign

import re

b = re.compile(r'Bat(wo)+man')
mo = b.search('The Adventures of Batman')
print(mo) # None, because + requires at least one 'wo'
mo1 = b.search('The Adventures of Batwoman')
print(mo1.group())
mo2 = b.search('The Adventures of Batwowowowowoman')
print(mo2.group())

result

None
Batwoman
Batwowowowowoman
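The three quantifiers ?, *, and + can be compared directly by running each pattern over the same inputs. The sketch below uses fullmatch so that each word must match completely; the word list is an assumption chosen for illustration.

```python
import re

# Illustrative inputs; the patterns are the Bat(wo)...man patterns from these notes.
patterns = {"?": r"Bat(wo)?man", "*": r"Bat(wo)*man", "+": r"Bat(wo)+man"}
words = ["Batman", "Batwoman", "Batwowoman"]

# For each quantifier, record which words the pattern matches completely.
matched = {
    symbol: [w for w in words if re.fullmatch(pattern, w)]
    for symbol, pattern in patterns.items()
}

for symbol, hits in matched.items():
    print(symbol, hits)
# ? ['Batman', 'Batwoman']
# * ['Batman', 'Batwoman', 'Batwowoman']
# + ['Batwoman', 'Batwowoman']
```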