Python Quick Start Programming: Regular Expressions

Finding Text Patterns with Regular Expressions

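1) Regular expression matching basics
1. Import the regular expression module with import re.
2. Create a Regex object with the re.compile() function (remember to use a raw string).
3. Pass the string you want to search to the Regex object's search() method, which returns a Match object (or None if nothing matches).
4. Call the Match object's group() method to get the actual matched text as a string.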

>>> import re
>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is 415-555-2456')
>>> print('Phone number found: ' + mo.group())
Phone number found: 415-555-2456
>>> mo.group(1)
'415'
>>> mo.group(2)
'555-2456'

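The example above uses parentheses to create groups: group(1) and group(2) return the text matched by the first and second pair of parentheses.

To retrieve all the groups at once, use the groups() method. Note the plural form of the name.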

>>> mo.groups()
('415', '555-2456')
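
Because groups() returns a tuple, its values can be unpacked in a single assignment. The sketch below reuses the phone number pattern from the example above; the variable names areaCode and mainNumber are chosen here for illustration only.

```python
import re

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-2456')

# groups() returns one string per group, in order, as a tuple,
# so multiple assignment can unpack it directly.
areaCode, mainNumber = mo.groups()
print(areaCode)    # 415
print(mainNumber)  # 555-2456
```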

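If a character you need to match has a special meaning in regular expressions (parentheses, for example), escape it with a backslash so it is matched literally.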

>>> phoneNumRegex = re.compile(r'(\(\d\d\d\))-(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is (415)-555-2456')
>>> print('Phone number found: ' + mo.group())
Phone number found: (415)-555-2456
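
To make the effect of escaping visible, the following sketch prints the individual groups: the escaped \( and \) match literal parentheses, while the unescaped pair around them still defines group 1. It also shows re.escape(), a standard library function that inserts the backslashes automatically; using it here is an addition to the original notes, not something they cover.

```python
import re

# \( and \) match literal parentheses; the outer unescaped pair is group 1.
phoneNumRegex = re.compile(r'(\(\d\d\d\))-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is (415)-555-2456')
print(mo.group(1))  # (415)
print(mo.group(2))  # 555-2456

# re.escape() backslash-escapes the special characters in a string.
escaped = re.escape('(415)')
print(escaped)  # \(415\)
```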

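2) Matching multiple alternatives with the pipe
The | character is called a pipe. Use it when you want to match any one of several expressions; it means 'or'. When more than one alternative occurs in the searched string, the one that appears first in the text is returned.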

>>> heroRegex = re.compile(r'Batman|Tina Fey')
>>> mo1 = heroRegex.search('Batman and Tina Fey.')
>>> mo1.group()
'Batman'

>>> heroRegex = re.compile(r'Batman|Tina Fey')
>>> mo1 = heroRegex.search('Tina Fey and Batman.')
>>> mo1.group()
'Tina Fey'
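
The two examples above suggest that the position in the searched text decides which alternative is returned, not the order of the alternatives in the pattern. A small sketch to check this, using a second pattern with the alternatives swapped (the name swappedRegex is made up for this illustration):

```python
import re

heroRegex = re.compile(r'Batman|Tina Fey')
swappedRegex = re.compile(r'Tina Fey|Batman')  # same alternatives, reversed

text = 'Batman and Tina Fey.'
first = heroRegex.search(text)
second = swappedRegex.search(text)

# Both patterns return the alternative that occurs first in the text.
print(first.group(), first.start())    # Batman 0
print(second.group(), second.start())  # Batman 0
```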

You can also use the pipe inside part of a regular expression to match one of several patterns. For example:

>>> batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
>>> mo = batRegex.search('Batmobile lost a wheel')
>>> mo.group()
'Batmobile'
>>> mo.group(1)
'mobile'

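mo.group() returns the full matched text 'Batmobile', while mo.group(1) returns only the text matched inside the first pair of parentheses, 'mobile'. By combining the pipe character with grouping parentheses, you can specify several alternative patterns for the regular expression to match.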
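
As a rough comparison, the grouped pattern can be read as shorthand for spelling out every alternative in full. The sketch below assumes a hypothetical longRegex that lists the four words explicitly; both patterns find the same text, but only the grouped one records which suffix matched.

```python
import re

batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
# Hypothetical long form: every alternative written out, no groups.
longRegex = re.compile(r'Batman|Batmobile|Batcopter|Batbat')

text = 'Batmobile lost a wheel'
grouped = batRegex.search(text)
spelledOut = longRegex.search(text)

print(grouped.group())      # Batmobile
print(spelledOut.group())   # Batmobile
print(grouped.groups())     # ('mobile',)
print(spelledOut.groups())  # ()
```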

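3) Optional matching with the question mark
The ? character marks the group that precedes it as optional in the pattern. For example, enter the following in the interactive shell: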

>>> batRegex = re.compile(r'Bat(wo)?man')
>>> mo = batRegex.search('The adventures of Batman')
>>> mo.group()
'Batman'
>>> mo1 = batRegex.search('The adventures of Batwoman')
>>> mo1.group()
'Batwoman'

The (wo)? part is optional: wo appears zero times or once.

The same idea works for phone numbers with an optional area code:

>>> phoneNumRegex = re.compile(r'(\d\d\d-)?(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is 555-2456')
>>> mo.group()
'555-2456'
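
For completeness, here is a sketch of the same pattern applied to a number that does include the area code. It also shows that an optional group that did not take part in the match comes back as None from group(1). The variable names withArea and withoutArea are illustrative.

```python
import re

phoneNumRegex = re.compile(r'(\d\d\d-)?(\d\d\d-\d\d\d\d)')

withArea = phoneNumRegex.search('My number is 415-555-2456')
withoutArea = phoneNumRegex.search('My number is 555-2456')

print(withArea.group())      # 415-555-2456
print(withArea.group(1))     # 415-
print(withoutArea.group())   # 555-2456
print(withoutArea.group(1))  # None, because the optional group matched nothing
```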

4) Matching zero or more times with the star

>>> batRegex = re.compile(r'Bat(wo)*man')
>>> mo = batRegex.search('The adventures of Batman')
>>> mo.group()
'Batman'
>>> mo = batRegex.search('The adventures of Batwoman')
>>> mo.group()
'Batwoman'
>>> mo = batRegex.search('The adventures of Batwowowowoman')
>>> mo.group()
'Batwowowowoman'

5) Matching one or more times with the plus

>>> batRegex = re.compile(r'Bat(wo)+man')
>>> mo = batRegex.search('The adventures of Batwoman')
>>> mo.group()
'Batwoman'
>>> mo = batRegex.search('The adventures of Batman')
>>> mo == None
True
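
The three quantifiers from sections 3) to 5) are easier to tell apart side by side. The following sketch runs each pattern against the same three words and records which ones match; the dictionary and list names are chosen for this illustration only.

```python
import re

patterns = {
    '?': r'Bat(wo)?man',  # zero or one 'wo'
    '*': r'Bat(wo)*man',  # zero or more 'wo'
    '+': r'Bat(wo)+man',  # one or more 'wo'
}
words = ['Batman', 'Batwoman', 'Batwowoman']

results = {}
for symbol, pattern in patterns.items():
    regex = re.compile(pattern)
    # Keep only the words in which the pattern is found.
    results[symbol] = [w for w in words if regex.search(w) is not None]

print(results['?'])  # ['Batman', 'Batwoman']
print(results['*'])  # ['Batman', 'Batwoman', 'Batwowoman']
print(results['+'])  # ['Batwoman', 'Batwowoman']
```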

6) Matching specific repetitions with curly braces
(Ha){3,5} matches 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'.

The following two patterns are equivalent:
(Ha){3}
(Ha)(Ha)(Ha)

>>> haRegex = re.compile(r'(Ha){3}')
>>> mo = haRegex.search('HaHaHa')
>>> mo.group()
'HaHaHa'
>>> mo1 = haRegex.search('Ha')
>>> mo1 == None
True
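
The range form can be checked the same way. This sketch tests strings of one to six repetitions of 'Ha' against (Ha){3,5} and against the open-ended form (Ha){3,}, which the notes above do not show but which is standard regex syntax. It uses fullmatch() so that the whole string has to fit the pattern.

```python
import re

haRegex = re.compile(r'(Ha){3,5}')      # 3 to 5 repetitions
atLeastThree = re.compile(r'(Ha){3,}')  # 3 or more repetitions

# fullmatch() succeeds only if the entire string fits the pattern.
boundedCounts = [n for n in range(1, 7) if haRegex.fullmatch('Ha' * n)]
openCounts = [n for n in range(1, 7) if atLeastThree.fullmatch('Ha' * n)]

print(boundedCounts)  # [3, 4, 5]
print(openCounts)     # [3, 4, 5, 6]
```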

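7) Greedy matching is Python's default
In ambiguous cases, such as a range in curly braces, the regex matches as many repetitions as possible.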
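
A short sketch of what greedy matching means in practice: given five repetitions of 'Ha', the pattern (Ha){3,5} could stop after three, four, or five, and it takes five. Adding a ? after the closing brace switches to non-greedy matching, which stops at the minimum; that variant is included here for contrast.

```python
import re

text = 'HaHaHaHaHa'

greedyRegex = re.compile(r'(Ha){3,5}')      # default: longest possible match
nongreedyRegex = re.compile(r'(Ha){3,5}?')  # trailing ?: shortest possible match

greedy = greedyRegex.search(text).group()
nongreedy = nongreedyRegex.search(text).group()

print(greedy)     # HaHaHaHaHa
print(nongreedy)  # HaHaHa
```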

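8) The findall() method
To summarize what findall() returns, remember these two points:
1. When called on a regex with no groups, it returns a list of matched strings.
2. When called on a regex with groups, it returns a list of tuples of strings, one string per group (with exactly one group, it returns a plain list of that group's strings).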

>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> phoneNumRegex.findall('cell:415-555-9878 work: 415-555-2456')
[('415', '555-9878'), ('415', '555-2456')]
>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
>>> phoneNumRegex.findall('cell:415-555-9878 work: 415-555-2456')
['415-555-9878', '415-555-2456']
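
One case worth checking separately is a pattern with exactly one group, where findall() returns a plain list of that group's strings rather than a list of tuples. The sketch below uses made-up phone numbers and the illustrative name areaCodeRegex.

```python
import re

text = 'cell: 415-555-9878 work: 212-555-0000'

# Exactly one group: the result is a list of strings, not tuples.
areaCodeRegex = re.compile(r'(\d\d\d)-\d\d\d-\d\d\d\d')
areaCodes = areaCodeRegex.findall(text)
print(areaCodes)  # ['415', '212']

# Two groups: the result is a list of tuples, one string per group.
twoGroupRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
pairs = twoGroupRegex.findall(text)
print(pairs)  # [('415', '555-9878'), ('212', '555-0000')]
```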

Building your own character classes:

>>> import re
>>> vowelRegex = re.compile(r'[aEiosnvOU]')
>>> vowelRegex.findall('RoBOCup eats baby food.BABAY FOOD')
['o', 'O', 'a', 's', 'a', 'o', 'o', 'O', 'O']
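
Any set of characters can go between the square brackets, which is why the class above matches s as well as some vowels. For comparison, here is a sketch with a class that lists only the ten upper- and lower-case vowels; the sample sentence and the name onlyVowelsRegex are assumptions made for this illustration.

```python
import re

# A character class containing exactly the vowels, in both cases.
onlyVowelsRegex = re.compile(r'[aeiouAEIOU]')
vowels = onlyVowelsRegex.findall('RoboCop eats baby food. BABY FOOD.')
print(vowels)
# ['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o', 'A', 'O', 'O']
```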

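The caret and dollar sign characters:
^ at the start of a regex means the match must occur at the beginning of the searched text.
$ at the end of a regex means the match must end at the end of the searched text.
If you use both, the entire string must match the regular expression.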

>>> WithHello = re.compile(r'^Hello')
>>> WithHello.search(' Hello World!') == None
True
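
The example above only shows a failed match (the leading space prevents ^Hello from matching). The following sketch adds a successful ^ match and a $ example; the pattern \d$ and the sample sentences are chosen here for illustration.

```python
import re

beginsWithHello = re.compile(r'^Hello')
startMatch = beginsWithHello.search('Hello World!')
print(startMatch.group())  # Hello

# \d$ requires the text to end with a digit.
endsWithNumber = re.compile(r'\d$')
endMatch = endsWithNumber.search('Your number is 42')
noEndMatch = endsWithNumber.search('Your number is forty two.')
print(endMatch.group())    # 2
print(noEndMatch is None)  # True
```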

>>> wholeStringisnum = re.compile(r'^\d+$')
>>> wholeStringisnum.search('123456789+')
>>> wholeStringisnum.search('1234567891')
<_sre.SRE_Match object; span=(0, 10), match='1234567891'>
>>> wholeStringisnum.search('123456789+') == None
True
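
Since search() returns None when nothing matches, calling group() directly on the result raises an AttributeError for a non-matching string. One way to guard against that is sketched below; matchedText is a hypothetical helper written for this note, not part of the re module.

```python
import re

wholeStringIsNum = re.compile(r'^\d+$')

def matchedText(regex, text):
    """Return the matched text, or None if the regex does not match."""
    mo = regex.search(text)
    # Only call group() when search() actually returned a Match object.
    return mo.group() if mo is not None else None

print(matchedText(wholeStringIsNum, '1234567891'))  # 1234567891
print(matchedText(wholeStringIsNum, '123456789+'))  # None
```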
