Day 14: the re module and regular expressions

Regular Expressions

The relationship between the re module and regular expressions

  Regular expressions are not unique to Python; they are an independent technology.

  Every programming language can use regular expressions.

  To use them in Python, however, you must go through the re module.

  A regular expression filters specific content out of a string.

Where regular expressions are used

  1 Web crawlers

  2 Data analysis

  Note: names that begin with re generally have something to do with regular expressions.

Character group [character set]
The different characters that may appear at the same position form a character set, written in a regular expression with []. You can also list the characters at that position directly, for example the ten digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

 

              Metacharacter      What it matches
              .                  Any character except a newline
              \w                 A letter, digit, or underscore
              \d                 A digit
              \s                 Any whitespace character
              \W                 Anything that is not a letter, digit, or underscore
              \D                 Any non-digit
              \S                 Any non-whitespace character
              \n                 A newline
              \t                 A tab
              \b                 A word boundary (the start or end of a word)
              ^                  The beginning of the string
              $                  The end of the string
              a|b                Either a or b (put the longer alternative first)
              ()                 Groups the expression inside the parentheses
              [...]              Any one character in the set
              [^...]             Any one character not in the set
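
To make the table concrete, here is a small sketch that tries a few of the metacharacters with re.findall. The sample strings are made up for illustration and are not from the original notes.

```python
import re

# Made-up sample text containing spaces, a tab, and a trailing newline.
text = "Tom_1 has 22 cats\tand 3 dogs\n"

# \d matches one digit; \w matches a letter, digit, or underscore.
digits = re.findall(r"\d", text)
words = re.findall(r"\w+", text)

# \s matches whitespace, including the tab and the newline.
spaces = re.findall(r"\s", text)

# \b is a zero-width word boundary, so "cat" is found only as a whole word.
whole_word = re.findall(r"\bcat\b", "cat concat cats")

# a|b tries the alternatives from left to right, so the longer one goes first.
short_first = re.findall(r"ab|abc", "abc")
long_first = re.findall(r"abc|ab", "abc")

print(digits)       # ['1', '2', '2', '3']
print(words)        # ['Tom_1', 'has', '22', 'cats', 'and', '3', 'dogs']
print(whole_word)   # ['cat']
print(short_first)  # ['ab']
print(long_first)   # ['abc']
```

The last two lines show why the table says to put the longer alternative first: with ab|abc the shorter branch wins and the c is never matched.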

^ and $ are used together to pin down the whole string: it must match exactly what is written between them, with nothing extra and nothing missing.
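
A minimal sketch of the difference the two anchors make. The helper name is_three_digits is hypothetical, chosen only for this example.

```python
import re

def is_three_digits(s):
    """Return True when s matches ^\\d{3}$, i.e. three digits and nothing else."""
    return re.search(r"^\d{3}$", s) is not None

# Without the anchors, the pattern is satisfied anywhere inside a longer string.
unanchored = re.search(r"\d{3}", "ab12345") is not None

print(is_three_digits("123"))    # True
print(is_three_digits("1234"))   # False: one character too many
print(is_three_digits("12"))     # False: one character too few
print(unanchored)                # True

# Note: $ also matches just before a trailing newline, so "123\n" passes.
# re.fullmatch is stricter if that matters.
strict = re.fullmatch(r"\d{3}", "123\n") is None
```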

              Pattern            Description
              [0-9]              Any one digit from 0 to 9
              [a-z]              Any one of the 26 lowercase letters
              [A-Z]              Any one of the 26 uppercase letters
              [0-9a-zA-Z]        Any one digit, lowercase letter, or uppercase letter
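
The ranges above can be tried out like this. The sample string is invented for the example, and the last pattern also shows the negated form [^...] from the metacharacter table.

```python
import re

s = "Room 42B, floor-3"  # made-up sample string

lower = re.findall(r"[a-z]", s)          # single lowercase letters
upper = re.findall(r"[A-Z]", s)          # single uppercase letters
alnum = re.findall(r"[0-9a-zA-Z]+", s)   # runs of digits and letters
non_digit = re.findall(r"[^0-9]+", s)    # runs of anything except digits

print(lower)      # ['o', 'o', 'm', 'f', 'l', 'o', 'o', 'r']
print(upper)      # ['R', 'B']
print(alnum)      # ['Room', '42B', 'floor', '3']
print(non_digit)  # ['Room ', 'B, floor-']
```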

 

              Quantifier         Usage
              *                  Repeat zero or more times
              +                  Repeat one or more times
              ?                  Repeat zero or one time
              {n}                Repeat exactly n times
              {n,}               Repeat n or more times
              {n,m}              Repeat n to m times
              <.*>               Greedy by default: matches the longest string possible
              <.*?>              The added ? switches greedy to non-greedy: matches the shortest string possible

 

 

Greedy matching: take as many characters as possible. All quantifiers are greedy by default.
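
A quick sketch of the <.*> and <.*?> patterns on a made-up HTML fragment, to show what "as many as possible" means in practice:

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # made-up sample

# Greedy: .* runs to the last > in the string, so there is one long match.
greedy = re.findall(r"<.*>", html)

# Non-greedy: .*? stops at the first > it can, so each tag matches separately.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```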

 

 

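Non-greedy matching: add ? after a quantifier to turn it from greedy into non-greedy.

*?   Repeat any number of times, but as few as possible

+?   Repeat one or more times, but as few as possible

??   Repeat zero or one time, but as few as possible

{n,m}?   Repeat n to m times, but as few as possible

{n,}?   Repeat n or more times, but as few as possible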
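
Each of the non-greedy forms above can be checked against the same short string. This is only an illustrative sketch; the string 'aaaa' is made up.

```python
import re

s = "aaaa"

# Each lazy quantifier takes the smallest number of repetitions it is allowed.
lazy = {
    "a*?": re.search(r"a*?", s).group(),          # zero repetitions
    "a+?": re.search(r"a+?", s).group(),          # one repetition
    "a??": re.search(r"a??", s).group(),          # zero repetitions
    "a{2,3}?": re.search(r"a{2,3}?", s).group(),  # two repetitions
    "a{2,}?": re.search(r"a{2,}?", s).group(),    # two repetitions
}

# The greedy versions take as many as they can, for comparison.
greedy_star = re.search(r"a*", s).group()
greedy_range = re.search(r"a{2,3}", s).group()

print(lazy)          # {'a*?': '', 'a+?': 'a', 'a??': '', 'a{2,3}?': 'aa', 'a{2,}?': 'aa'}
print(greedy_star)   # aaaa
print(greedy_range)  # aaa
```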

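How .*? works

.  is any character

*  repeats zero to unlimited times

?  makes the match non-greedy

Together, the three take as few arbitrary characters as possible.

  Typical use

    .*?x    Take characters of any kind up to the first x that appears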
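
A small sketch of .*?x next to its greedy counterpart, using a made-up string with two x characters:

```python
import re

s = "abcx123x"  # made-up sample with two x characters

# Non-greedy: stop at the first x.
up_to_first_x = re.search(r".*?x", s).group()

# Greedy: run on to the last x.
up_to_last_x = re.search(r".*x", s).group()

print(up_to_first_x)  # abcx
print(up_to_last_x)   # abcx123x
```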

 

The re module

findall returns every match, collected in a list
import re

s = '0123456789'
print(re.findall('1', s))  # ['1']
print(re.findall('[0-3]', s))  # ['0', '1', '2', '3']
print(re.findall('[0-9]', s))  # ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

print(re.findall('asd', s))  # []  nothing matches, so the list is empty

s1 = 'hello my name is james'
print(re.findall('[a-h]', s1))  # ['h', 'e', 'a', 'e', 'a', 'e']
print(re.findall('[a]', s1))  # ['a', 'a']

print(re.findall('[123]', s1))  # []  no match, empty list
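search also looks for a pattern, but it stops as soon as it finds the first match. The pattern can be several consecutive characters.
Call .group() on the result to get the matched text.
import re

s = 'egon owen mac'
print(re.search('o', s))  # <re.Match object; span=(2, 3), match='o'> on Python 3.7+ (older versions print _sre.SRE_Match); search returns a match object, so call .group() to get the text
print(re.search('k', s))  # None  nothing matched, and without .group() there is no error
print(re.search('o', s).group())  # o
print(re.search('ego', s).group())  # ego
# print(re.search('k', s).group())  # raises AttributeError, because search returned None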
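
Because search returns None when nothing matches, it is usually safer to test the result before calling .group(). A minimal sketch, where the helper name first_match is hypothetical:

```python
import re

def first_match(pattern, text):
    """Return the first matched text, or None when the pattern is not found."""
    m = re.search(pattern, text)
    return m.group() if m else None

s = 'egon owen mac'
print(first_match('o', s))    # o
print(first_match('k', s))    # None, and no exception is raised

# The match object also records where the match was found.
print(re.search('owen', s).span())  # (5, 9)
```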
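match works like search and also returns a match object that has .group(), but it only checks whether the string starts with the pattern. If it does not, match returns None, and calling .group() on that None raises an error.
import re

s = 'egon owen mac'
print(re.match('k', s))  # None  the string does not start with k
print(re.match('e', s).group())  # e  the match succeeded, so .group() returns the matched text
# print(re.match('k', s).group())  # raises AttributeError, because match returned None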
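
To see the difference between match and search side by side, here is a short sketch on the same string:

```python
import re

s = 'egon owen mac'

# 'owen' is in the string but not at the start, so match finds nothing.
at_start = re.match('owen', s)

# search scans the whole string and finds it.
anywhere = re.search('owen', s)

# search with ^ behaves like match here: the pattern must sit at the start.
anchored = re.search('^owen', s)

print(at_start)          # None
print(anywhere.group())  # owen
print(anchored)          # None
```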
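split cuts the string at every match. With [], each character in the set is a separator, and a separator at the very start of the string leaves an empty string in the result.
import re

s = 'asadfaghjkl'
print(re.split('a', s))  # ['', 's', 'df', 'ghjkl']  the leading a leaves an empty string at the front;
                         # the separators themselves are dropped from the result
print(re.split('k', s))  # ['asadfaghj', 'l']  k is not at the start, so there is no empty string
print(re.split('[asa]', s))  # ['', '', '', 'df', 'ghjkl']  [asa] is the same set as [as]: every a or s is a separator,
                             # and adjacent separators leave empty strings between them
print(re.split('[fag]', s))  # ['', 's', 'd', '', '', 'hjkl']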
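
Two more things re.split can do, shown as a sketch: a capturing group keeps the separators in the result, and maxsplit limits the number of cuts. The last string is made up for illustration.

```python
import re

s = 'asadfaghjkl'

# Wrapping the pattern in () keeps each separator in the output list.
with_separators = re.split('(a)', s)

# maxsplit=1 cuts only at the first match.
one_cut = re.split('a', s, maxsplit=1)

# Adding + treats a run of separators as one, which avoids empty strings.
fields = re.split(r'[,;\s]+', 'a, b;c  d')

print(with_separators)  # ['', 'a', 's', 'a', 'df', 'a', 'ghjkl']
print(one_cut)          # ['', 'sadfaghjkl']
print(fields)           # ['a', 'b', 'c', 'd']
```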
subn also replaces, like sub, but it returns a tuple: the new string, followed by the number of replacements made
import re

s = 'aaassaasdfaaghj'
print(re.subn(r'\w', '3', s))  # ('333333333333333', 15)
print(re.subn('a', '3', s))  # ('333ss33sdf33ghj', 7)
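
For comparison, a sketch of re.sub next to re.subn, including the optional count argument that caps the number of replacements:

```python
import re

s = 'aaassaasdfaaghj'

# sub returns only the new string.
replaced = re.sub('a', '3', s)

# count=2 replaces just the first two matches.
first_two = re.sub('a', '3', s, count=2)

# subn with count reports how many replacements were actually made.
first_two_counted = re.subn('a', '3', s, count=2)

print(replaced)           # 333ss33sdf33ghj
print(first_two)          # 33assaasdfaaghj
print(first_two_counted)  # ('33assaasdfaaghj', 2)
```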

 

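For reference

import re

obj = re.compile(r'\d{3}')  # compile the pattern into a regular expression object; the rule matches three digits
ret = obj.search('abc123eeee')  # call search on the compiled object, passing the string to search
print(ret.group())  # 123

ret = re.finditer(r'\d', 'ds3sy4784a')  # finditer returns an iterator of match objects
print(ret)  # <callable_iterator object at 0x10195f940>  (the address varies from run to run)
print(next(ret).group())  # first result: 3
print(next(ret).group())  # second result: 4
print([i.group() for i in ret])  # the remaining results: ['7', '8', '4']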
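
A compiled pattern is mostly useful when the same rule is applied many times. The sketch below reuses one compiled object for both findall and finditer; the variable names are made up for the example.

```python
import re

digits = re.compile(r'\d+')  # compile once, reuse below
text = 'ds3sy4784a'

# The compiled object has the same methods as the module-level functions.
all_numbers = digits.findall(text)

# finditer yields match objects, so the position of each match is available too.
hits = [(m.group(), m.span()) for m in digits.finditer(text)]

print(all_numbers)  # ['3', '4784']
print(hits)         # [('3', (2, 3)), ('4784', (5, 9))]
```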

 


Origin www.cnblogs.com/xuzhaolong/p/11203269.html