2019-7-17 Regular expressions and the re module

I. The relationship between the re module and regular expressions

  Regular expressions are not unique to Python; they are an independent technology.

  Most programming languages support regular expressions.

  To use them in Python, you rely on the standard library's re module.

Formal definition: a regular expression is a logical formula for operating on strings. It uses a predefined set of special characters, and combinations of them, to build a "rule string" that expresses the logic for filtering strings.

In simple terms: a regular expression filters specific content out of a string. If the character at a given position never varies, no regular expression is needed; otherwise, the pattern has to account for every character that may appear at that position.

II. Regular expressions

  1. Character group: [characters]

    The characters that may appear at one position form a character group, written with [] in a regular expression.

    Characters fall into several categories, such as digits, letters, and punctuation.

    The characters inside a group are alternatives (an "or" relationship). For example, [0-9x] matches one character that is either a digit from 0 to 9 or the letter x.
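As a small illustration of character groups, the sketch below runs two patterns through re.findall (covered later in this article). The sample string and variable names are made up for the example.

```python
import re

# Hypothetical sample text, chosen only to illustrate character groups.
sample = "a1 x9 b7 xx"

# [0-9x] matches exactly one character: a digit or the letter x.
digit_or_x = re.findall(r"[0-9x]", sample)
print(digit_or_x)         # ['1', 'x', '9', '7', 'x', 'x']

# [a-z][0-9] needs a lowercase letter immediately followed by a digit.
letter_then_digit = re.findall(r"[a-z][0-9]", sample)
print(letter_then_digit)  # ['a1', 'x9', 'b7']
```

Each character group consumes exactly one character, which is why the second pattern needs two groups to match two characters.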

  2. Metacharacters: characters with a special meaning in a pattern, such as . ^ $ [] [^] () | and \

    Notes on metacharacters:

    1. Used together, ^ and $ force an exact match: the string must be exactly what is written between them, no more and no less.

    2. In an alternation such as abc|ab, put the longer alternative first. Alternatives are tried from left to right, so ab|abc would stop at ab.

    3. Written outside a character group, ^ anchors the start of the string and $ anchors the end.

    4. [^...] matches any character except those listed inside the brackets.

    5. Grouping (): when several regex symbols have to be repeated, or otherwise operated on, as a whole, wrap them in a group.
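The notes above can be checked with a few short calls. This is a minimal sketch with made-up sample strings; the variable names are only illustrative.

```python
import re

# ^ and $ together force the whole string to match the pattern.
exact = bool(re.search(r"^abc$", "abc"))       # True
too_long = bool(re.search(r"^abc$", "abcd"))   # False

# Alternatives are tried left to right, so a shorter one placed first wins.
short_first = re.search(r"ab|abc", "abc").group()   # 'ab'
long_first = re.search(r"abc|ab", "abc").group()    # 'abc'

# [^...] matches any single character that is NOT listed in the brackets.
non_digits = re.findall(r"[^0-9]", "a1b2")          # ['a', 'b']

# A group lets a quantifier repeat several symbols as one unit.
repeated = re.search(r"(ab)+", "ababab").group()    # 'ababab'

print(exact, too_long, short_first, long_first, non_digits, repeated)
```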

      

 

  3. Quantifiers

    Notes on quantifiers:

    1. A quantifier must come directly after the regex symbol it applies to, and it only affects that one symbol immediately before it.

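1. # Regular expression
海.*
# String to search
海燕啊海娇海东
# Result
海燕啊海娇海东    (. matches any character except a newline and * repeats it zero or more times, so the whole string is returned as one match: greedy mode)

2. # Regular expression
海.*?
# String to search
海燕啊海娇海东
# Result
海
海
海    (only the three 海 characters are matched: *? repeats the adjacent . as few times as possible, which here is zero, so the . matches nothing)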

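    2. By default, regex matching is greedy (it matches as much as possible). Adding ? after a quantifier makes it non-greedy.

     Non-greedy quantifiers match as few times as possible: when nothing follows them in the pattern, *? matches 0 times, +? matches 1 time, and ?? matches 0 times.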
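The greedy and non-greedy behaviour can be reproduced in Python with re.findall. The sketch below reuses the sample string from the examples above; the last pattern is an extra illustration (not from the original notes) showing that a non-greedy quantifier matches more than the minimum when the rest of the pattern requires it.

```python
import re

text = "海燕啊海娇海东"

# Greedy: .* takes as many characters as it can.
greedy = re.findall(r"海.*", text)
print(greedy)        # ['海燕啊海娇海东']

# Non-greedy with nothing after it: .*? settles for zero characters.
lazy_star = re.findall(r"海.*?", text)
print(lazy_star)     # ['海', '海', '海']

# +? needs at least one repetition, so each match keeps one extra character.
lazy_plus = re.findall(r"海.+?", text)
print(lazy_plus)     # ['海燕', '海娇', '海东']

# With something after it, .*? expands only until the rest of the pattern matches.
lazy_until = re.findall(r"海.*?娇", text)
print(lazy_until)    # ['海燕啊海娇']
```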

 

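Example topics:

. ^ $

* + ? {}

Character sets   []  [^]     ([^...] also occupies one character position)

Grouping ()    alternation |

Escape character   \

Greedy and non-greedy matching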

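III. Using the re module

  Three methods you need to master

   re.findall   (finds every part of the string that matches the regular expression and returns a list whose elements are the matched results)

import re

ret = re.findall('a', 'eva egon yuan')  # collect every match into a list
print(ret)

# Result   ['a', 'a']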

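  re.search  (returns a match object; call group() on it to get the matched text, and each group() call returns one result)

  Note: search scans the string and stops at the first match. If nothing matches, it returns None, so calling group() on the result raises an AttributeError.

import re
res = re.search('a','eva egon jason')
print(res)  # search does not return the matched text directly; it returns a match object
print(res.group())  # call group() to see the matched text

# Result   a     only one result is returned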
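Because search returns None when nothing matches, it is common to check the result before calling group(). The helper below is a hypothetical sketch (first_match is not part of the re module) showing one way to do that.

```python
import re

def first_match(pattern, text):
    """Return the first match as a string, or None when nothing matches."""
    m = re.search(pattern, text)
    # Only call group() when a match object was actually returned.
    return m.group() if m is not None else None

found = first_match('a', 'eva egon jason')
missing = first_match('z', 'eva egon jason')
print(found)    # a
print(missing)  # None
```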

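  re.match    (match only matches at the start of the string; the string below starts with e but the pattern looks for a, so None is returned)

  Note: match only tries the beginning of the string. On success it returns a match object that you read with group(); on failure it returns None.

import re
res = re.match('a','eva egon jason')
print(res)   # None
print(res.group())   # raises AttributeError, because res is None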
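To make the difference between match and search concrete, the sketch below runs both on the same string. In this simple case (no flags), match behaves like search with the pattern anchored by ^; the variable names are only illustrative.

```python
import re

text = 'eva egon jason'

at_start_a = re.match('a', text)      # None: the string does not start with 'a'
at_start_e = re.match('e', text)      # match object: the string starts with 'e'
anywhere_a = re.search('a', text)     # match object: 'a' appears later in the string
anchored_a = re.search('^a', text)    # None: same outcome as re.match('a', text)

print(at_start_a)            # None
print(at_start_e.group())    # e
print(anywhere_a.start())    # 2
print(anchored_a)            # None
```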

  Other methods

  re.split  (split)

import re
ret = re.split('[ab]', 'abcd')  # split on 'a' to get '' and 'bcd', then split each piece on 'b'
print(ret)  # ['', '', 'cd']  the result is still a list

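  re.sub  (replace)

import re
ret = re.sub(r'\d', 'H', 'eva3egon4yuan4', count=1)  # replace digits with 'H'; count=1 replaces only the first one
# sub(pattern, replacement, string, count=n)
"""
Finds every part of the string that matches the pattern and replaces it with the replacement text; count limits how many replacements are made.
"""
print(ret)  # evaHegon4yuan4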

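  re.subn  (also replaces, but returns a different result)

import re
ret = re.subn(r'\d', 'H', 'eva3egon4yuan4')  # replace digits with 'H'; returns a tuple (new string, number of replacements)
ret1 = re.subn(r'\d', 'H', 'eva3egon4yuan4', count=1)  # same, but at most one replacement
print(ret)  # ('evaHegonHyuanH', 3)  the second element is the number of replacements
print(ret1)  # ('evaHegon4yuan4', 1)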

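  re.compile   (compiles a regular expression into a regular expression object)

import re
obj = re.compile(r'\d{3}')  # compile the pattern into a regular expression object; the rule matches 3 digits
ret = obj.search('abc123eeee')  # call search on the compiled object; the argument is the string to search
res1 = obj.findall('347982734729349827384')
print(ret)  # the result is a match object
print(ret.group())  # Result: 123
print(res1)  # Result: ['347', '982', '734', '729', '349', '827', '384']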
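A compiled pattern is mostly useful when the same rule is applied to many strings. The following sketch assumes a made-up list of inputs and reuses one compiled object for all of them.

```python
import re

# Compile once, reuse for every input.
three_digits = re.compile(r'\d{3}')

# Hypothetical inputs, chosen only for illustration.
samples = ['abc123eeee', 'no digits', 'id 4567']

firsts = []
for s in samples:
    m = three_digits.search(s)
    # search returns None when the string has no run of three digits.
    firsts.append(m.group() if m else None)

print(firsts)  # ['123', None, '456']
```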

  re.finditer   (returns an iterator; read values with next() and group())

import re
ret = re.finditer(r'\d', 'ds3sy4784a')   # finditer returns an iterator of match objects
print(ret)  # <callable_iterator object at 0x10195f940>  (the address varies)
print(next(ret).group())  # first result: 3
print(next(ret).group())  # second result: 4
print([i.group() for i in ret])  # all remaining results: ['7', '8', '4']
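Since finditer yields match objects, each result carries its position as well as its text. A minimal sketch on the same sample string, using a plain loop instead of next():

```python
import re

positions = []
for m in re.finditer(r'\d', 'ds3sy4784a'):
    # group() is the matched text, start() is its index in the string.
    positions.append((m.group(), m.start()))

print(positions)  # [('3', 2), ('4', 5), ('7', 6), ('8', 7), ('4', 8)]
```

A for loop stops cleanly when the iterator is exhausted, whereas calling next() on an exhausted iterator raises StopIteration.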

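  Group priority: how () behaves in Python's re module

import re
res = re.search(r'^[1-9](\d{14})(\d{2}[0-9x])?$','110105199812067023')
print(res.group())  # the whole match: 110105199812067023
print(res.group(1))  # content of the first parenthesized group: 10105199812067
print(res.group(2))    # content of the second group: 023


# findall has no group() method for picking values, so by default it returns the group contents first
ret = re.findall(r'www\.(baidu|oldboy)\.com', 'www.oldboy.com')
print(ret) # ['oldboy'] findall returns the group contents in preference to the whole match; to get the whole match, turn off group priority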

 
 

ret = re.findall(r'www\.(?:baidu|oldboy)\.com', 'www.oldboy.com') # ?: turns off group priority (non-capturing group)
print(ret) # ['www.oldboy.com']
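The shape of findall's result depends on how many capturing groups the pattern has. The sketch below compares three variants on a made-up input with two URLs.

```python
import re

# Hypothetical input containing both sample domains.
text = 'www.baidu.com www.oldboy.com'

# One capturing group: findall returns the group contents.
one_group = re.findall(r'www\.(baidu|oldboy)\.com', text)
print(one_group)    # ['baidu', 'oldboy']

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r'(www)\.(baidu|oldboy)\.com', text)
print(two_groups)   # [('www', 'baidu'), ('www', 'oldboy')]

# No capturing groups (?: only): findall returns the whole matches.
no_groups = re.findall(r'www\.(?:baidu|oldboy)\.com', text)
print(no_groups)    # ['www.baidu.com', 'www.oldboy.com']
```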

 

Using () in split

ret=re.split(r"\d+","eva3egon4yuan")
print(ret) # Result: ['eva', 'egon', 'yuan']

ret1=re.split(r"(\d+)","eva3egon4yuan")
print(ret1) # Result: ['eva', '3', 'egon', '4', 'yuan']  (a capturing group keeps the separators in the list)

  Naming a group   (?P<name>...)

import re
res = re.search(r'^[1-9](?P<password>\d{14})(?P<username>\d{2}[0-9x])?$','110105199812067023')

print(res.group('password'))   # 10105199812067

print(res.group('username'))   # 023
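Named groups can also be read all at once with groupdict(). The sketch below reuses the pattern above and adds a 15-digit input (an assumption made for illustration) to show that an optional group that did not take part in the match comes back as None.

```python
import re

pattern = re.compile(r'^[1-9](?P<password>\d{14})(?P<username>\d{2}[0-9x])?$')

# 18-character input: both named groups take part in the match.
full = pattern.search('110105199812067023').groupdict()
print(full)   # {'password': '10105199812067', 'username': '023'}

# 15-character input: the optional group is skipped, so its value is None.
short = pattern.search('110105199812067').groupdict()
print(short)  # {'password': '10105199812067', 'username': None}
```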

 


Origin www.cnblogs.com/wangcuican/p/11203378.html