Advanced Python: the re module

import re

re.match() matches only at the beginning of the string.
Usage: re.match(pattern, string)

import re
pat=r"\d+"  # raw string, so the backslash reaches the regex engine unchanged
s="abc123abc123456"
print(re.match(pat,s))

Output: None

Because re.match() only tries to match at the start of the string, and s does not start with a digit, the result is None.
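For contrast, a minimal sketch (the sample strings are made up for illustration): when the string does start with digits, re.match() succeeds and .group() returns the matched text.

```python
import re

pat = r"\d+"

# The string starts with digits, so re.match() finds a match at position 0.
m = re.match(pat, "123abc")
print(m.group())  # 123

# The digits are not at the start, so re.match() returns None.
print(re.match(pat, "abc123"))  # None
```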

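Unlike re.match(), re.search() scans the whole string, trying the pattern at every position. If it finds a match, it returns the Match object for the first match; otherwise it returns None: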

search=re.search(pat,s)
print(search.group(0))

Output: 123
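Because both functions return None when nothing matches, calling .group() directly on the result raises AttributeError in that case. A minimal sketch of the usual guard; the helper name first_number is hypothetical:

```python
import re

def first_number(text):
    """Hypothetical helper: return the first run of digits in text, or None."""
    m = re.search(r"\d+", text)
    # re.search() returns None when nothing matches, so check before calling .group()
    if m is None:
        return None
    return m.group(0)

print(first_number("abc123abc123456"))  # 123
print(first_number("no digits here"))   # None
```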

re.split() splits a string, using the matches of the given regular expression as separators. Usage:
re.split(pattern, string[, maxsplit])

t=re.split(" +","a b c  d  e")  # split on runs of one or more spaces
print(t)

Output: ['a', 'b', 'c', 'd', 'e']
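The optional maxsplit argument caps the number of splits, and the rest of the string comes back as the last element. A short sketch reusing the example string; it also shows that when the pattern contains a capturing group, the separators themselves are kept in the result (the second input string is made up for illustration):

```python
import re

s = "a b c  d  e"

# Split at most twice; everything after the second separator stays together.
parts = re.split(r" +", s, maxsplit=2)
print(parts)  # ['a', 'b', 'c  d  e']

# With a capturing group, the matched separators are included in the list.
kept = re.split(r"( +)", "a b  c")
print(kept)  # ['a', ' ', 'b', '  ', 'c']
```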

re.sub() replaces the parts of a string that the regular expression matches. Usage: re.sub(pattern, repl, string[, count])

t=re.sub(" +",";","a b c  d   e")
print(t)

Output: a;b;c;d;e
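A sketch of two common variations: count limits how many replacements are made (passed by keyword here, since recent Python versions, 3.13 and later, deprecate passing it positionally), and repl can refer back to captured groups with \1, \2 and so on. The "hello world" input is made up for illustration:

```python
import re

s = "a b c  d   e"

# Replace only the first two runs of spaces.
t = re.sub(r" +", ";", s, count=2)
print(t)  # a;b;c  d   e

# \2 and \1 in the replacement insert the text captured by groups 2 and 1.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)  # world hello
```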

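re.findall() returns a list of all non-overlapping matches as plain strings, not Match objects; if the pattern contains capturing groups, it returns the group contents instead. To get a Match object for every match, use re.finditer().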
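A minimal sketch of the difference, reusing the string from the re.match() example above:

```python
import re

s = "abc123abc123456"

# findall() returns the matched substrings as strings.
nums = re.findall(r"\d+", s)
print(nums)  # ['123', '123456']

# With two or more capturing groups, findall() returns tuples of group contents.
pairs = re.findall(r"([a-z]+)(\d+)", s)
print(pairs)  # [('abc', '123'), ('abc', '123456')]

# finditer() yields Match objects, so positions are available as well.
spans = [m.span() for m in re.finditer(r"\d+", s)]
print(spans)  # [(3, 6), (9, 15)]
```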

The next example returns to re.match() and looks at the Match object it returns in more detail:

s='hello world'
p=r'hello (\w+)'
m=re.match(p,s)
print(m)

Output: <re.Match object; span=(0, 11), match='hello world'> (Python 3.6 and earlier print <_sre.SRE_Match object; span=(0, 11), match='hello world'>)

print(m.group(0))
Output: hello world
print(m.group(1))  # the text matched by the first pair of parentheses
Output: world

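A Match object's .group(0) returns the entire matched text; .group(1), .group(2), .group(3) and so on return the text matched by each parenthesized group, numbered from left to right.
.groups() returns the values of all groups at once as a tuple: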

print(m.groups())
Output: ('world',)
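Besides the matched text, a Match object also records where the whole match and each group were found. A short sketch using span(), start() and end(), which report (start, end) string indices:

```python
import re

m = re.match(r"hello (\w+)", "hello world")

print(m.span())              # (0, 11)  the whole match
print(m.span(1))             # (6, 11)  group 1, i.e. 'world'
print(m.start(1), m.end(1))  # 6 11
```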

With multiple groups:

s='hello world san'
p=r'hello (\w+) (\w+)'
m=re.match(p,s)
print(m.groups())

Output: ('world', 'san')
print(m.group(2))

Output: san
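When a pattern has several groups, counting parentheses gets error-prone. A sketch of the same match using named groups, (?P<name>...), so each group can also be read by name; the names first and second are arbitrary choices for this example:

```python
import re

s = "hello world san"

# (?P<name>...) gives a group a name in addition to its number.
m = re.match(r"hello (?P<first>\w+) (?P<second>\w+)", s)

print(m.group("second"))  # san
print(m.groupdict())      # {'first': 'world', 'second': 'san'}
print(m.groups())         # ('world', 'san'), numbering still works
```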

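In a regular expression, \\ (two backslashes) matches one literal backslash. Python string literals also use the backslash as an escape character, so the literal '\\' is first turned into a single backslash: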

print('\\')
Output: \

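So if the pattern is written as an ordinary string literal, matching one literal backslash takes four backslashes: Python turns '\\\\' into \\, and the regex engine then reads \\ as a single backslash: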

print('\\\\')
Output: \\
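A quick check of the two levels of escaping, written with ordinary string literals only (the path is sample data): the four backslashes in the source become a two-character pattern, and that pattern matches exactly one backslash.

```python
import re

path = "C:\\foo"  # the literal "\\" is one backslash, so path holds C:\foo
print(len(path))  # 6

# The source text "\\\\" becomes the two-character regex \\,
# which the regex engine reads as one literal backslash.
pattern = "\\\\"
print(len(pattern))               # 2
print(re.findall(pattern, path))  # ['\\'], i.e. a list holding one backslash
```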

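This is clumsy, so Python provides raw strings, in which backslashes are not treated as escape characters. Creating one is simple: put an r in front of an ordinary string literal.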

t=re.split(r"\\",r"C:\foo\baz.txt")
print(t)
Output: ['C:', 'foo', 'baz.txt']
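Raw strings matter beyond matching backslashes themselves. In an ordinary string literal, \b is the backspace character, while the regex engine reads the two characters \b as a word boundary. A sketch with made-up sample text:

```python
import re

text = "foo foobar"

# Ordinary string: "\b" becomes a backspace character (\x08), which never appears in text.
plain = re.findall("\bfoo\b", text)
print(plain)  # []

# Raw string: the regex engine receives \b and treats it as a word boundary,
# so only the standalone "foo" matches, not the "foo" inside "foobar".
raw = re.findall(r"\bfoo\b", text)
print(raw)  # ['foo']
```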