The re module
How regular expressions relate to the re module
1: Regular expressions are an independent technology, not something specific to Python.
2: Regular expressions can be used in almost any programming language.
3: To use regular expressions in Python, you need the re module.
Regular Expressions
A regular expression is a special sequence of characters that lets you easily check whether a string matches a pattern.
- Metacharacters
Metacharacter | Matches |
---|---|
. | Any character except a newline |
\w | A letter, digit, or underscore |
\s | Any whitespace character |
\d | A digit |
\n | A newline |
\t | A tab |
\b | A word boundary (the start or end of a word) |
^ | The start of the string |
$ | The end of the string |
\W | Any character other than a letter, digit, or underscore |
\D | Any non-digit character |
\S | Any non-whitespace character |
a|b | Either a or b |
() | The expression inside the parentheses; also defines a group |
[...] | Any one character from the set |
[^...] | Any one character not in the set |
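A quick way to see several of these metacharacters at work is to run them through re.findall, which returns every match as a list. The sample strings below are made up for illustration; this is a minimal sketch, not an exhaustive tour of the table.

```python
import re

text = 'id_7 cat\tdog 42'  # sample string, made up for this demo; contains a tab

word_chars = re.findall(r'\w', 'a_1 !')     # \w: letter, digit or underscore
digits = re.findall(r'\d', text)            # \d: digits only
spaces = re.findall(r'\s', text)            # \s: spaces and tabs both count
others = re.findall(r'[^a-z\s]', text)      # [^...]: not a lowercase letter, not whitespace
animals = re.findall(r'cat|dog', text)      # a|b: either alternative

print(word_chars)  # ['a', '_', '1']
print(digits)      # ['7', '4', '2']
print(spaces)      # [' ', '\t', ' ']
print(others)      # ['_', '7', '4', '2']
print(animals)     # ['cat', 'dog']
```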
- Quantifiers
Quantifier | Meaning |
---|---|
* | Zero or more times |
+ | One or more times |
? | Zero or one time |
{n} | Exactly n times |
{n,} | n or more times |
{n,m} | Between n and m times |
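To compare the quantifiers side by side, the sketch below applies each one to the letter b in a made-up sample string. Note that every quantifier here repeats only the b, not the whole ab.

```python
import re

s = 'a ab abb abbb'  # sample string for the demo

star = re.findall(r'ab*', s)         # *: b zero or more times
plus = re.findall(r'ab+', s)         # +: b one or more times
optional = re.findall(r'ab?', s)     # ?: b zero or one time
exact = re.findall(r'ab{2}', s)      # {n}: b exactly 2 times
at_least = re.findall(r'ab{2,}', s)  # {n,}: b 2 or more times
between = re.findall(r'ab{1,2}', s)  # {n,m}: b 1 to 2 times

print(star)      # ['a', 'ab', 'abb', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(optional)  # ['a', 'ab', 'ab', 'ab']
print(exact)     # ['abb', 'abb']
print(at_least)  # ['abb', 'abbb']
print(between)   # ['ab', 'abb', 'abb']
```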
Validation with and without regular expressions
# Validation with plain Python
while True:
    phone_number = input('Please enter your phone number>>>: ').strip()
    # Valid: exactly 11 digits, starting with 13, 14, 15, 16, 17 or 18.
    # startswith accepts a tuple, which avoids mixing `and`/`or` without parentheses
    if len(phone_number) == 11 \
            and phone_number.isdigit() \
            and phone_number.startswith(('13', '14', '15', '16', '17', '18')):
        print('Phone number format is correct')
        break
    else:
        print('Phone number format is incorrect')
# Validation with a regular expression
import re
phone_number = input('Please enter your phone number>>>: ').strip()
if re.match(r'^(13|14|15|16|17|18)[0-9]{9}$', phone_number):
    print('Phone number format is correct')
else:
    print('Phone number format is incorrect')
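Because both scripts above read from input(), they are awkward to try non-interactively. A minimal sketch wrapping the same rule in a function follows; the names PHONE_PATTERN and is_valid_phone are hypothetical, and 1[3-8] is simply a shorter way to write (13|14|15|16|17|18).

```python
import re

# Same rule as the example above: 13-18 followed by 9 more digits (11 digits in total)
PHONE_PATTERN = re.compile(r'1[3-8][0-9]{9}')


def is_valid_phone(number: str) -> bool:
    """Return True if the whole string is a phone number in the format above."""
    # fullmatch only succeeds if the pattern covers the entire string,
    # so no ^ and $ anchors are needed here
    return PHONE_PATTERN.fullmatch(number) is not None


for candidate in ['13812345678', '12812345678', '1381234567', '138123456789']:
    print(candidate, is_valid_phone(candidate))
# 13812345678 True
# 12812345678 False
# 1381234567 False
# 138123456789 False
```

One design note: with re.match and ^...$, the $ also matches just before a trailing newline, so '13812345678\n' would pass; fullmatch rejects it. The original script strips the input, so it is not affected.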
Testing regular expressions online
A site for trying out regular expressions: http://tool.chinaz.com/regex/
Common uses of regular expressions: web crawlers and data analysis
Regular expression symbols:
- Character set []
The characters inside the brackets are alternatives: the set matches exactly one character, which may be any one of them.
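A small sketch of how a character set behaves: each [...] consumes exactly one character, and ranges such as 0-9 are allowed. The sample strings are made up for illustration.

```python
import re

# One [...] stands for exactly one character chosen from the set
letters = re.findall(r'[abc]', 'a1b2c3d4')   # any one of a, b, c
digits = re.findall(r'[0-9]', 'a1b2c3d4')    # a range: any digit 0-9
pairs = re.findall(r'[abc]{2}', 'abxcab')    # two characters, each from the set

print(letters)  # ['a', 'b', 'c']
print(digits)   # ['1', '2', '3', '4']
print(pairs)    # ['ab', 'ca']
```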
- ^ used together with $
The string must match whatever is written between them exactly, with not a single character more or less, so this strictly limits what can match.
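The effect of ^ and $ is easiest to see by running the same pattern with and without them; the inputs below are made-up samples.

```python
import re

loose = re.findall(r'\d{3}', '12345')      # finds 3 digits anywhere in the string
strict = re.findall(r'^\d{3}$', '12345')   # the whole string must be exactly 3 digits
exact = re.findall(r'^\d{3}$', '123')

print(loose)   # ['123']
print(strict)  # []
print(exact)   # ['123']
```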
- abc|ab
When using |, put the longer alternative first: the alternatives are tried from left to right, and the first one that matches wins.
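The sketch below shows why the order of alternatives matters: with the shorter branch first, it matches and the longer branch is never tried at that position.

```python
import re

short_first = re.findall(r'ab|abc', 'abc abc')  # 'ab' matches first, so 'c' is left over
long_first = re.findall(r'abc|ab', 'abc abc')   # the longer branch is tried first

print(short_first)  # ['ab', 'ab']
print(long_first)   # ['abc', 'abc']
```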
- ^ versus [^]
Written on its own, outside brackets, ^ anchors the start of the string; inside brackets, [^...] negates the character set.
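The two meanings of ^ side by side, on a made-up sample string:

```python
import re

start = re.findall(r'^a', 'abca')    # ^ outside []: 'a' only at the start of the string
not_a = re.findall(r'[^a]', 'abca')  # ^ inside []: any character except 'a'

print(start)  # ['a']
print(not_a)  # ['b', 'c']
```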
- Grouping ()
When several regex symbols need to be repeated as a whole, or otherwise treated as one unit, wrap them in parentheses to form a group.
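A minimal sketch of grouping: without parentheses the + applies only to the last character, with them it repeats the whole group. It also shows a common surprise: when the pattern contains a group, findall returns the group's content rather than the whole match; the non-capturing form (?:...) avoids this.

```python
import re

no_group = re.search(r'ab+', 'ababab').group()      # + repeats only 'b'
with_group = re.search(r'(ab)+', 'ababab').group()  # + repeats the whole group 'ab'
found = re.findall(r'(ab)+', 'ababab')              # findall returns the group's content
non_capturing = re.findall(r'(?:ab)+', 'ababab')    # (?:...) groups without capturing

print(no_group)       # ab
print(with_group)     # ababab
print(found)          # ['ab']
print(non_capturing)  # ['ababab']
```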
Points to understand:
- \w, \s, \d and \W, \S, \D match complementary sets (each pair together matches any character)
- \t matches a tab (Tab)
- \b matches a word boundary, such as the end of a word
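The points above can be checked on a made-up sample: \b picks out "cat" only where it stands as a whole word, \t finds the tab, and a class combined with its complement matches every character.

```python
import re

text = 'cat catalog\tcat'  # sample string; contains a tab

whole_words = re.findall(r'\bcat\b', text)  # \b: "cat" only as a word of its own
tabs = re.findall(r'\t', text)              # \t: the tab character
everything = re.findall(r'[\d\D]', text)    # a class plus its complement matches any character

print(whole_words)                   # ['cat', 'cat']
print(tabs)                          # ['\t']
print(len(everything) == len(text))  # True
```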
Quantifiers:
- +: On the string 123456, \d alone matches only a single digit, but \d+ matches all six digits at once. + means one or more times, and matching is greedy by default (it takes as many repetitions as possible).
- *: Matches zero or more times
- ?: Matches zero or one time
- {n}: Matches exactly n times
Notes:
1: Quantifiers are greedy by default (they match as much as possible). Adding ? after a quantifier makes it non-greedy, or lazy (it matches as little as possible).
2: A quantifier must follow a regex symbol.
3: A quantifier applies only to the single regex symbol directly before it.
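The difference between greedy and lazy matching is clearest on text with more than one possible end point. The HTML-like string below is a made-up sample, not a recommendation to parse HTML with regular expressions.

```python
import re

html = '<b>bold</b>'  # sample string

greedy = re.findall(r'<.*>', html)           # .* grabs as much as possible
lazy = re.findall(r'<.*?>', html)            # .*? stops at the first possible '>'
digits_greedy = re.findall(r'\d+', 'a123b')  # + takes all three digits
digits_lazy = re.findall(r'\d+?', 'a123b')   # +? takes one digit per match

print(greedy)         # ['<b>bold</b>']
print(lazy)           # ['<b>', '</b>']
print(digits_greedy)  # ['123']
print(digits_lazy)    # ['1', '2', '3']
```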
Basic usage of the re module
Regular expressions are used to match and process strings; to use them in Python, import the re module.
- findall
import re
res = re.findall('a', 'apple apple apple')  # returns all matches as a list
print(res)
# ['a', 'a', 'a']
- search
import re
res = re.search('a', 'Apple apple apple')  # scans the string and returns a Match object for the first match
print(res)
print(res.group())  # group() returns the matched text; if search returned None, this line would raise AttributeError
if res:  # safer: group() is only called when a match was found
    print(res.group())
# <re.Match object; span=(6, 7), match='a'>
# a
# a
- match
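import re
res = re.match('a', 'apple apple apple')
print(res)
print(res.group())
# match only matches at the start of the string: if the pattern matches from the beginning, it returns a Match object (use group() to see the text); otherwise it returns None, and calling group() on None raises an error
# <re.Match object; span=(0, 1), match='a'>
# a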
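To make the difference between match and search concrete, the sketch below runs both on the string used in the search example, along with re.fullmatch (available since Python 3.4), which requires the entire string to match.

```python
import re

s = 'Apple apple apple'

m = re.match('a', s)       # match: only tries at position 0, where the character is 'A'
sr = re.search('a', s)     # search: scans forward until the first 'a'
fm = re.fullmatch('a', s)  # fullmatch: the whole string would have to be 'a'

print(m)          # None
print(sr.span())  # (6, 7)
print(fm)         # None
```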
Other methods
- split
import re
res = re.split('[ab]', 'abcd')  # split at every 'a' or 'b': 'a' gives '' and 'bcd', then 'b' splits 'bcd' into '' and 'cd'
print(res)
# ['', '', 'cd']
- sub
import re
res = re.sub(r'\d', 'A', 'apple1apple2apple3')  # replace every digit with 'A'
res1 = re.sub(r'\d', 'A', 'apple1apple2apple3', count=1)  # count=1: replace only the first match
print(res)
print(res1)
# appleAappleAappleA
# appleAapple2apple3
- subn
import re
res = re.subn(r'\d', 'A', 'apple1apple2apple3')  # like sub, but returns a tuple (new string, number of replacements)
print(res)
# ('appleAappleAappleA', 3)
- compile
import re
obj = re.compile(r'\d{3}')  # compile the pattern (exactly three digits) into a pattern object
res = obj.search('app111app')  # call search on the pattern object, passing the string to scan
print(res.group())
# 111
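A compiled pattern can also carry flags and be reused for several operations. A minimal sketch using the real re.IGNORECASE flag; the sample strings are made up. (The re module also caches recently used patterns internally, so the main benefit of compile is reuse and readability.)

```python
import re

# Compile once with a flag, then reuse the pattern object for several operations
pattern = re.compile(r'apple', re.IGNORECASE)

all_apples = pattern.findall('Apple apple APPLE')
replaced = pattern.sub('pear', 'Apple pie')

print(all_apples)  # ['Apple', 'apple', 'APPLE']
print(replaced)    # pear pie
```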
- finditer
import re
res = re.finditer(r'\d', '1apple2apple3456')  # finditer returns an iterator of Match objects
print(res)
print(next(res).group())  # first match
print(next(res).group())  # second match
print([i.group() for i in res])  # the remaining matches
# <callable_iterator object at 0x00000070AA9CC438>
# 1
# 2
# ['3', '4', '5', '6']
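Since each item yielded by finditer is a Match object, the position of every match is available as well as its text. The sketch below uses \d+ instead of \d so that runs of digits come back as single matches.

```python
import re

# Each item from finditer is a Match object, so span() gives the position too
results = [(m.group(), m.span()) for m in re.finditer(r'\d+', '1apple2apple3456')]

for text, span in results:
    print(text, span)
# 1 (0, 1)
# 2 (6, 7)
# 3456 (12, 16)
```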