The re module and regular expressions

re module

How regular expressions relate to the re module

1: Regular expressions are an independent technology, not part of any one language.

2: Regular expressions can be used in most programming languages.

3: To use regular expressions in Python, import the re module.

Regular Expressions

A regular expression is a special sequence of characters that makes it easy to check whether a string matches a pattern.

  • Metacharacters
Metacharacter   Matches
.               Any character except a newline
\w              A letter, digit, or underscore
\s              Any whitespace character
\d              A digit
\n              A newline
\t              A tab
\b              A word boundary (the start or end of a word)
^               The beginning of the string
$               The end of the string
\W              Any character that is not a letter, digit, or underscore
\D              Any non-digit character
\S              Any non-whitespace character
a|b             Either a or b
()              The expression inside the parentheses; also defines a group
[...]           Any one character in the character set
[^...]          Any one character that is not in the character set
  • Quantifiers
Quantifier      Meaning
*               Repeat zero or more times
+               Repeat one or more times
?               Repeat zero or one time
{n}             Repeat exactly n times
{n,}            Repeat n or more times
{n,m}           Repeat between n and m times
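
To make the two tables concrete, here is a small sketch that combines a few metacharacters and quantifiers with re.findall. The sample sentence is made up for illustration.

```python
import re

# Made-up sample text for illustration.
text = "Order 66 shipped to room_7 on 2019-07-18"

digits = re.findall(r"\d+", text)                    # runs of one or more digits
words = re.findall(r"\w+", text)                     # runs of letters, digits, underscores
spaces = re.findall(r"\s", text)                     # each whitespace character
date = re.findall(r"\d{4}-\d{1,2}-\d{1,2}", text)    # 4 digits, then 1 to 2, then 1 to 2

print(digits)       # ['66', '7', '2019', '07', '18']
print(words)        # ['Order', '66', 'shipped', 'to', 'room_7', 'on', '2019', '07', '18']
print(len(spaces))  # 6
print(date)         # ['2019-07-18']
```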

Validating input with and without regular expressions

# Validation in pure Python
while True:
    phone_number = input('Enter your mobile number>>>:').strip()
    # startswith accepts a tuple, so one call checks every allowed prefix
    if (len(phone_number) == 11
            and phone_number.isdigit()
            and phone_number.startswith(('13', '14', '15', '16', '17', '18'))):
        print('Mobile number format is valid')
    else:
        print('Mobile number format is invalid')
        
# Validation with a regular expression
import re
phone_number = input('Enter your mobile number>>>:').strip()
if re.match(r'^(13|14|15|16|17|18)[0-9]{9}$', phone_number):
    print('Mobile number format is valid')
else:
    print('Mobile number format is invalid')
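
The same check can also be wrapped in a reusable function, which makes it easy to test without typing input by hand. This is only a sketch: the names PHONE_PATTERN and is_valid_phone are hypothetical, and the allowed prefixes 13 to 18 are taken from the example above rather than from any official numbering plan.

```python
import re

# Hypothetical names; prefixes 13-18 come from the example above.
PHONE_PATTERN = re.compile(r"1[3-8][0-9]{9}")

def is_valid_phone(number: str) -> bool:
    """Return True if number is 11 digits and starts with 13 through 18."""
    # fullmatch requires the whole string to match, so ^ and $ are not needed.
    return PHONE_PATTERN.fullmatch(number) is not None

print(is_valid_phone("13812345678"))   # True
print(is_valid_phone("1381234567"))    # False (only 10 digits)
print(is_valid_phone("19812345678"))   # False (prefix 19 is not in the list)
```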

Testing regular expressions online

An online regular expression tester: http://tool.chinaz.com/regex/

Typical uses of regular expressions: web crawlers, data analysis

Regular expression symbols:
  • Character set []

The characters inside a character set are alternatives: the set matches any one of them.
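
A short sketch of character sets, using made-up sample strings:

```python
import re

letters = re.findall(r"[abc]", "a1b2c3d4")   # any one of a, b, or c
digits = re.findall(r"[0-9]", "a1b2c3d4")    # a range: any digit from 0 to 9
mixed = re.findall(r"[a-cx]", "axbycz")      # a range plus one extra character

print(letters)  # ['a', 'b', 'c']
print(digits)   # ['1', '2', '3', '4']
print(mixed)    # ['a', 'x', 'b', 'c']
```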

  • ^ used together with $

Whatever is written between ^ and $ is exactly what the whole string must be, with nothing extra before or after; together they pin down precisely what is matched.
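
For instance, a pattern for exactly three digits could be sketched like this (the test strings are invented):

```python
import re

pattern = r"^\d{3}$"   # the whole string must be exactly three digits

exact = re.search(pattern, "123")
too_long = re.search(pattern, "1234")
embedded = re.search(pattern, "a123")

print(exact is not None)  # True
print(too_long)           # None
print(embedded)           # None
```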

  • abc|ab

When using |, always put the longer alternative first; otherwise the shorter one matches first and the longer one is never tried.
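
A quick sketch of why the order matters:

```python
import re

short_first = re.findall(r"ab|abc", "abc")   # "ab" wins, "abc" is never tried
long_first = re.findall(r"abc|ab", "abc")    # the longer alternative gets its chance

print(short_first)  # ['ab']
print(long_first)   # ['abc']
```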

  • ^ and [^]

Written outside brackets, ^ anchors the match to the beginning of the string; inside brackets, as in [^...], it negates the character set.
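
The two roles of ^ side by side, on a made-up sample word:

```python
import re

at_start = re.findall(r"^b", "banana")     # "b" only at the beginning of the string
not_start = re.findall(r"^a", "banana")    # the string does not begin with "a"
non_a = re.findall(r"[^a]", "banana")      # every character that is not "a"

print(at_start)   # ['b']
print(not_start)  # []
print(non_a)      # ['b', 'n', 'n']
```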

  • Grouping ()

When several regex symbols need to be repeated together, or otherwise treated as a single unit, wrap them in a group.
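
As a small illustration, compare a quantifier applied to one character with the same quantifier applied to a group:

```python
import re

text = "ababab"

ungrouped = re.search(r"ab+", text)    # + applies to "b" only
grouped = re.search(r"(ab)+", text)    # + applies to the whole group "ab"

print(ungrouped.group())  # ab
print(grouped.group())    # ababab
print(grouped.group(1))   # ab  (the last repetition captured by the group)
```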

Points to understand:
  • \w, \s, \d and \W, \S, \D are complementary pairs (combining both halves of a pair, e.g. [\w\W], matches any character)
  • \t matches a tab
  • \b matches a word boundary, such as the end of a given word
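
These points can be checked with a short sketch; the sample strings are invented:

```python
import re

# A pair such as [\w\W] matches every character, including the newline
# that "." skips by default.
any_char = re.findall(r"[\w\W]", "a b\n")
dot_only = re.findall(r".", "a b\n")

# \b keeps "cat" from matching inside longer words.
whole_word = re.findall(r"\bcat\b", "cat concat cats cat.")

print(any_char)    # ['a', ' ', 'b', '\n']
print(dot_only)    # ['a', ' ', 'b']
print(whole_word)  # ['cat', 'cat']
```
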
Quantifiers:
  • +: on its own, \d matches a single digit, so against 123456 it matches one digit at a time; \d+ matches all six digits at once. + means one or more repetitions, and matching is greedy by default (it takes as many as it can).
  • *: matches zero or more times
  • ?: matches zero or one time
  • {n}: matches exactly n times
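
The quantifiers in this list can be compared directly:

```python
import re

one_each = re.findall(r"\d", "123456")      # one digit per match
all_once = re.findall(r"\d+", "123456")     # one or more digits, greedy
pairs = re.findall(r"\d{2}", "123456")      # exactly two digits per match
star = re.findall(r"ab*", "a ab abb")       # "a" followed by zero or more "b"
optional = re.findall(r"ab?", "a ab abb")   # "a" followed by at most one "b"

print(one_each)  # ['1', '2', '3', '4', '5', '6']
print(all_once)  # ['123456']
print(pairs)     # ['12', '34', '56']
print(star)      # ['a', 'ab', 'abb']
print(optional)  # ['a', 'ab', 'ab']
```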

Notes:

1: Matching is greedy by default (it matches as much as possible). Adding ? after a quantifier makes it non-greedy (lazy matching).

2: A quantifier must come after the regex symbol it applies to.

3: A quantifier applies only to the single regex symbol immediately before it.
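
A minimal sketch of notes 1 and 3, using a made-up HTML-like string:

```python
import re

html = "<b>one</b><b>two</b>"

greedy = re.findall(r"<.*>", html)    # .* takes as much as it can
lazy = re.findall(r"<.*?>", html)     # .*? stops at the first ">"

# The quantifier {2} applies only to the "b" right before it, not to "ab".
scope = re.findall(r"ab{2}", "abab abb")

print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>', '</b>', '<b>', '</b>']
print(scope)   # ['abb']
```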

Basic use of the re module

Regular expressions operate on strings; to use them in Python, import the re module.

  • findall
import re

res = re.findall('a', 'apple apple apple')  # Returns every match, collected in a list.
print(res)
# ['a', 'a', 'a']
  • search
import re

res = re.search('a', 'Apple apple apple')  # Scans the string and returns a match object for the first match, or None if nothing matches.
print(res)

if res:  # Guard against None: calling group() on None raises AttributeError
    print(res.group())  # group() returns the matched text
# <re.Match object; span=(6, 7), match='a'>
# a
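  • match
import re

res = re.match('a', 'apple apple apple')
print(res)
print(res.group())
# match only tries to match at the beginning of the string. If the pattern matches there, it returns a match object (call group() to get the text); otherwise it returns None, and calling group() raises an error.
# <re.Match object; span=(0, 1), match='a'>
# a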
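
The difference between match and search, and one way to handle the None case, can be sketched as follows. The helper first_match is hypothetical and not part of the re module.

```python
import re

text = "Apple apple apple"

at_start = re.match(r"a", text)     # None: the string starts with uppercase "A"
anywhere = re.search(r"a", text)    # finds the first lowercase "a"

print(at_start)                             # None
print(anywhere.span(), anywhere.group())    # (6, 7) a

# Hypothetical helper (not part of re): return the matched text or a default.
def first_match(pattern, string, default=None):
    found = re.search(pattern, string)
    return found.group() if found else default

print(first_match(r"\d+", text, default="no digits"))  # no digits
```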

Other methods

  • split
import re

res = re.split('[ab]', 'abcd')  # Split on 'a' to get '' and 'bcd', then split each of those on 'b'
print(res)
# ['', '', 'cd']
  • sub
import re

res = re.sub(r'\d', 'A', 'apple1apple2apple3')  # Replace every digit in the string with 'A'
res1 = re.sub(r'\d', 'A', 'apple1apple2apple3', count=1)  # count=1 replaces only the first match
print(res)
print(res1)
# appleAappleAappleA
# appleAapple2apple3
  • subn
import re

res = re.subn(r'\d', 'A', 'apple1apple2apple3')  # Replace digits with 'A' and return a tuple (new string, number of replacements)
print(res)
# ('appleAappleAappleA', 3)
  • compile
import re

obj = re.compile(r'\d{3}')  # Compile the pattern into a regular expression object; it matches three digits
res = obj.search('app111app')  # Call search on the compiled object, passing the string to search
print(res.group())
# 111
  • finditer
import re

res = re.finditer(r'\d', '1apple2apple3456')  # finditer returns an iterator of match objects
print(res)
print(next(res).group())  # First result
print(next(res).group())  # Second result
print([i.group() for i in res])  # Remaining results
# <callable_iterator object at 0x00000070AA9CC438>
# 1
# 2
# ['3', '4', '5', '6']
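
Because finditer yields match objects rather than plain strings, each result also carries its position. A small sketch on the same sample string, using \d+ instead of \d so that consecutive digits are grouped:

```python
import re

# Collect (start index, matched text) for every run of digits.
positions = [(m.start(), m.group()) for m in re.finditer(r"\d+", "1apple2apple3456")]

print(positions)  # [(0, '1'), (6, '2'), (12, '3456')]
```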


Origin www.cnblogs.com/jincoco/p/11210760.html