Web Crawler Core Fundamentals, Part 2 (Regular Expressions)

Regular Expressions

Introduction to Regular Expressions

A regular expression is a formula for logical operations on strings. It combines predefined special characters, and certain combinations of them, into a "rule string" that expresses filtering logic to apply to other strings.

import re
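pattern = 'python'
text = 'python and java'
# First argument (pattern): the regular expression to apply
# Second argument (text): the string to match against
# Third argument (optional): flags that control how matching is done
result = re.match(pattern, text)
if result:
    print(result.group())   # the matched text
    print(result.start())   # index where the match starts
    print(result.end())     # index just past the end of the match
    print(result.span())    # (start, end) as a tuple
    print(result.string)    # the original string passed to match()
else:
    print('no match')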
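
The comments above mention a third argument for flags, which the snippet never passes. As a minimal sketch, assuming case-insensitive matching is the behavior of interest, the standard `re.IGNORECASE` flag can be supplied like this (the sample text is made up for illustration):

```python
import re

# Hypothetical sample text, chosen only to show the effect of the flag.
text = 'Python and java'

# Without a flag the match is case-sensitive, so 'python' does not match 'Python'.
strict = re.match('python', text)

# re.IGNORECASE (alias re.I) is passed as the third argument.
relaxed = re.match('python', text, re.IGNORECASE)

print(strict)           # None
print(relaxed.group())  # Python
print(relaxed.span())   # (0, 6)
```

Several flags can be combined with `|`, for example `re.IGNORECASE | re.MULTILINE`.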

Matching a single character

1. .  matches any single character except a newline
2. [] matches any one of the characters listed inside the brackets
3. \d matches a digit, 0-9
4. \D matches a non-digit
5. \s matches whitespace (space, tab)
6. \S matches non-whitespace
7. \w matches a word character: a-z, A-Z, 0-9, _
8. \W matches a non-word character

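# Matching a single character
# 1. .  matches any single character except a newline
# 2. [] matches any one of the characters listed inside the brackets
# 3. \d matches a digit, 0-9
# 4. \D matches a non-digit
# 5. \s matches whitespace (space, tab)
# 6. \S matches non-whitespace
# 7. \w matches a word character: a-z, A-Z, 0-9, _
# 8. \W matches a non-word character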

def fn(ptn, lst):
    for x in lst:
        # re.match tries to apply the pattern at the start of the string.
        # It returns a match object, or None if no match is found.
        result = re.match(ptn, x)

        if result:
            print(x, 'matched; result:', result.group())
        else:
            print(x, 'did not match')


# 1. . matches any single character except a newline
# ptn = r'ab.'
# lst = ['abc1', 'ab', 'aba', 'abbcd', 'wwww', 'aab']

# 2. [] matches any one of the characters listed inside the brackets
# ptn = r'm[abcd]n'
# lst = ['man', 'mbn', 'mdn', 'mon', 'nba']

# 3. \d matches a digit, 0-9
# ptn = r'py\d'
# lst = ['py3', 'py4', 'name', 'pyxxxx']

# 4. \D matches a non-digit
# ptn = r'py\D'
# lst = ['pyz', 'py4', 'name', 'pyhhh']

# 5. \s matches whitespace (space, tab)
# ptn = r'hello\sworld'
# lst = ['hello world', 'helloxxx', 'helloworld']

# 6. \S matches non-whitespace
# ptn = r'hello\Sworld'
# lst = ['hello world', 'helloxxx', 'hello,world']

# 7. \w matches a word character: a-z, A-Z, 0-9, _
# ptn = r'\w-age'
# lst = ['1-age', 'a-age', '!-age', '_-age']

# 8. \W matches a non-word character
# (raw strings keep the backslash intact for the regex engine)
ptn = r'\W-age'
lst = ['1-age', 'a-age', '#-age', '_-age']

fn(ptn, lst)
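
Most of the cases above are commented out, so only one runs at a time. The following sketch runs several of them together and collects which strings match. The dictionary layout and the shortened input lists are illustrative choices, not part of the original lesson.

```python
import re

# pattern -> candidate strings (a shortened, illustrative subset of the lists above)
cases = {
    r'ab.':          ['abc1', 'ab', 'aba'],
    r'm[abcd]n':     ['man', 'mon'],
    r'py\d':         ['py3', 'pyx'],
    r'py\D':         ['pyz', 'py4'],
    r'hello\sworld': ['hello world', 'helloworld'],
    r'\W-age':       ['#-age', '_-age'],
}

# Keep only the strings that match at the start, as re.match does.
matched = {
    ptn: [s for s in strings if re.match(ptn, s)]
    for ptn, strings in cases.items()
}

for ptn, hits in matched.items():
    print(ptn, '->', hits)
```

Note that `'ab'` fails against `ab.` because `.` must consume one character, and `'_-age'` fails against `\W-age` because the underscore counts as a word character.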

Matching multiple characters

1. *     matches the preceding character zero or more times (it is optional)
2. +     matches the preceding character one or more times (it must appear at least once)
3. ?     matches the preceding character zero or one time (it appears once or not at all)
4. {m}   matches the preceding character exactly m times
5. {m,n} matches the preceding character from m to n times
6. {m,}  matches the preceding character at least m times

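# * matches the preceding character zero or more times (it is optional)
# ptn = r'h[a-z]*'  # starts with h, followed by zero or more of a-z
# lst = ['hello', 'abc', 'xxx', 'h']

# + matches the preceding character one or more times (at least once)
# ptn = r'h[a-z]+'  # starts with h, followed by one or more of a-z
# lst = ['hello', 'abc', 'xxx', 'h']

# ? matches the preceding character zero or one time
# ptn = r'h[a-z]?'  # starts with h, followed by at most one of a-z
# lst = ['hello', 'abc', 'xxx', 'h']

# {m} matches the preceding character exactly m times
# ptn = r'[\w]{6}'
# lst = ['hell', 'python', '#$%^%', '12_456']

# {m,n} matches the preceding character from m to n times
# ptn = r'[\w]{3,7}'
# lst = ['ab', 'python', '#$%^%', '_yyy123', '12_456789']

# {m,} matches the preceding character at least m times
ptn = r'[\w]{3,}'  # three or more word characters (letters, digits, underscore)
lst = ['ab', 'python', '%^&&^&', '_yyy123', '123456']
# ptn = r'.*?'
# lst = ['e&*^*^*FS']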
fn(ptn, lst)
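
To compare the quantifiers side by side, the sketch below applies each one to the same word and records how much text it consumes. It also shows a point that is easy to miss: `re.match` only anchors the start, so `{3,7}` happily matches the first seven characters of a longer string. Using `re.fullmatch` to require the whole string to fit is a suggestion here, not something the original lesson covers.

```python
import re

word = 'hello'
patterns = [r'h[a-z]*', r'h[a-z]+', r'h[a-z]?', r'\w{3}', r'\w{2,4}', r'\w{3,}']

# How much of 'hello' each quantifier consumes.
matched = {p: re.match(p, word).group() for p in patterns}
for p, text in matched.items():
    print(p, '->', text)

# On the bare string 'h', * still matches (zero repetitions) but + does not.
star_on_h = re.match(r'h[a-z]*', 'h')
plus_on_h = re.match(r'h[a-z]+', 'h')

# re.match accepts a matching prefix; re.fullmatch requires the whole string to match.
prefix = re.match(r'\w{3,7}', '12_456789').group()
whole = re.fullmatch(r'\w{3,7}', '12_456789')
print(prefix, whole)
```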

Anchors and grouping

1. ^ matches the start of the string
2. $ matches the end of the string

Grouping

1. |    matches either the expression on its left or the one on its right
2. (ab) treats the characters in parentheses as a group
3. \num refers back to the string matched by group number num

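# Anchors and grouping
# ^ matches the start of the string
# $ matches the end of the string
# Match an email address that starts with a digit or a letter
# ptn = '[\w][email protected]$'
# lst = ['[email protected]','[email protected]','[email protected]']

# Grouping
# | matches either the expression on its left or the one on its right
# (ab) treats the characters in parentheses as a group
# \num refers back to the string matched by group number num

# | matches either the expression on its left or the one on its right
# ptn = 'hello|hc'  # matches either hello or hc
# lst = ['hello', 'hc', 'ok', 'hello world']

# (ab) treats the characters in parentheses as a group
# Check for a mobile number that starts with 134 or 135
# ptn = '(134|135)[0-9]{8}'
# lst = ['13445646518', '13549848951', '1356486484']

# ptn = '(010-)[0-9]{8}'
# ptn = r'([^-]*)-(\d+)'
# lst = ['010-12345678']

# \num refers back to the string matched by group number num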
ptn = r'<([a-zA-Z]{1,12})>\w*</\1>'
lst = ['<title>hello</title>','<body>bbb</body>','<a>python</b>']

fn(ptn, lst)
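
The phone-number pattern above has no `$`, so it would also accept a number with extra trailing digits. The sketch below adds both anchors and shows how to read individual groups from a match. The extra 12-digit input and the second capturing group in the tag pattern are additions made for illustration.

```python
import re

# Anchored phone check: 134 or 135 followed by exactly 8 more digits.
phone = re.compile(r'^(134|135)\d{8}$')

ok = phone.match('13445646518')         # 11 digits
too_short = phone.match('1356486484')   # 10 digits
too_long = phone.match('134456465189')  # 12 digits, rejected because of $

print(ok.group(1))   # the prefix captured by group 1
print(too_short, too_long)

# \1 must repeat whatever group 1 matched, so the closing tag has to equal the opening tag.
tag = re.compile(r'<([a-zA-Z]{1,12})>(\w*)</\1>')
good = tag.match('<title>hello</title>')
bad = tag.match('<a>python</b>')

print(good.group(1), good.group(2))
print(bad)
```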

Common regex methods

match() and search() methods

match() only matches at the start of the string.
search() can match at any position in the string.

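ptn = 'com'
s = 'www.baidu.com'

# This second pair overrides the first; comment it out to try 'com'.
ptn = r'\d+'  # matches one or more digits
s = 'Current article views: 100'

# result = re.match(ptn, s)
result = re.search(ptn, s)
print(result.group())

# if result:
#     print('matched', result.group())
# else:
#     print('no match')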
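
To make the difference concrete, here is a small sketch that runs both functions on the same string. Because either call can return `None`, it is safer to check the result before calling `.group()`.

```python
import re

s = 'www.baidu.com'

at_start = re.match(r'com', s)    # 'com' is not at index 0, so this is None
anywhere = re.search(r'com', s)   # scans forward until it finds 'com'

print(at_start)
if anywhere:
    print(anywhere.group(), anywhere.span())
```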

findall() method

# findall() returns every match as a list
ptn = r'\d+'  # matches one or more digits
s = 'Current article views: 100, likes: 10'

# result = re.search(ptn, s)
result = re.findall(ptn, s)
print(result[0], result[1])
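
One detail worth knowing: when the pattern contains capturing groups, `findall()` returns the groups instead of the whole match. A minimal sketch, with a made-up input string:

```python
import re

# Hypothetical input used only for illustration.
s = 'reads: 100, likes: 10'

# No groups: a list of matched strings.
numbers = re.findall(r'\d+', s)

# Two groups: a list of (label, value) tuples.
pairs = re.findall(r'(\w+): (\d+)', s)

# The results are strings; convert them if numbers are needed.
counts = {label: int(value) for label, value in pairs}

print(numbers)
print(pairs)
print(counts)
```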

sub() method

# sub() replaces the text that the pattern matches
ptn = r'\d+'

s = 'Views: 666'
def fn2(r):
    # r is the match object; return the replacement text as a string
    x = r.group()
    y = int(x) + 1
    return str(y)

r = re.sub(ptn, fn2, s)
print(r)
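
The example above passes a function as the replacement. `re.sub()` also accepts a plain string, which may contain backreferences such as `\1`, and a `count` argument that limits the number of replacements. A short sketch with made-up inputs:

```python
import re

# Replace every run of digits with '#'.
masked = re.sub(r'\d+', '#', 'a1b22c333')

# Replace only the first run.
first_only = re.sub(r'\d+', '#', 'a1b22c333', count=1)

# Swap the two captured groups using backreferences in the replacement string.
swapped = re.sub(r'(\w+)=(\w+)', r'\2=\1', 'name=jerry')

print(masked, first_only, swapped)
```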

split() method

# split() splits a string
# Compare str.split() with re.split()

s = 'name=jerry,age=30,[email protected]'
# print(s.split(','))

# A character class lets re.split() cut on any of several delimiters
ptn = r'[=,@]'

r = re.split(ptn, s)
print(r)
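
As a sketch of how `re.split()` compares with `str.split()`, the example below uses a shortened, hypothetical input. It also shows the `maxsplit` argument, and how two plain `str.split()` calls can turn the same text into a dictionary.

```python
import re

# Shortened, hypothetical input.
s = 'name=jerry,age=30'

# Split on either '=' or ','.
parts = re.split(r'[=,]', s)

# Stop after the first split.
head_tail = re.split(r'[=,]', s, maxsplit=1)

# Split on ',' first, then on '=' within each pair.
record = dict(pair.split('=') for pair in s.split(','))

print(parts)
print(head_tail)
print(record)
```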

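Greedy and non-greedy matching

# Greedy vs. non-greedy matching

s = r'<div>abc</div><div>bcd</div>'

# Goal: extract <div>abc</div>

# .* is greedy: it runs to the last </div>, so this prints the whole string.
# Use .*? (non-greedy) to stop at the first </div>.
ptn = r'<div>.*</div>'
r = re.match(ptn, s)
print(r.group())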
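
The greedy pattern does not meet the stated goal, because `.*` consumes as much as it can. Adding `?` after the quantifier makes it non-greedy. A minimal sketch comparing the two on the same string:

```python
import re

s = '<div>abc</div><div>bcd</div>'

# Greedy: .* extends to the last </div> in the string.
greedy = re.match(r'<div>.*</div>', s).group()

# Non-greedy: .*? stops at the first </div>.
lazy = re.match(r'<div>.*?</div>', s).group()

# With a group and findall(), the non-greedy form extracts each div's content.
contents = re.findall(r'<div>(.*?)</div>', s)

print(greedy)
print(lazy)
print(contents)
```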

Origin blog.csdn.net/luobofengl/article/details/104465986