Python notes 3: sorting out regular expressions (continually updated)

1, format

  • About escape characters:
    Python string literals use the backslash \ as the escape character. In a string literal prefixed with 'r' (a raw string), the backslash is not handled in any special way. Therefore r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline.
  • Conclusion: it is recommended to write Python regular expressions in the raw-string form r"str".
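The difference can be checked directly. This is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

normal = "\n"    # one character: a newline
raw = r"\n"      # two characters: a backslash followed by 'n'

print(len(normal))  # 1
print(len(raw))     # 2

# Only the raw string keeps regex escapes such as \b intact. In a normal
# string, "\b" is a backspace character, not a word boundary.
word = re.compile(r"\bcat\b")
print(word.findall("cat concat cat."))      # ['cat', 'cat']

not_word = re.compile("\bcat\b")
print(not_word.findall("cat concat cat."))  # []
```

The second pattern silently matches nothing, which is the kind of mistake the r"..." habit avoids.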

2, some special symbols (to be updated):

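- ^ matches the start
- $ matches the end
- * + greedy matching
- *? +? non-greedy (minimal) matching
- [] most special characters inside lose their special meaning and stand for themselves; if the first character in [] is '^', every character that is not in the set is matched
- \s whitespace character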
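A small sketch of each symbol in action. The input strings are invented for illustration, and the module-level re.findall is used only to keep the example short.

```python
import re

text = "<a><b>"

# Greedy * takes as much as possible; non-greedy *? takes as little as possible.
print(re.findall(r"<.*>", text))    # ['<a><b>']
print(re.findall(r"<.*?>", text))   # ['<a>', '<b>']

# ^ anchors at the start of the string, $ at the end.
print(re.findall(r"^\w+", "hello world"))   # ['hello']
print(re.findall(r"\w+$", "hello world"))   # ['world']

# Inside [], '.' and '*' are literal characters; a leading '^' negates the set.
print(re.findall(r"[.*]", "a.b*c"))     # ['.', '*']
print(re.findall(r"[^0-9]", "a1b2"))    # ['a', 'b']

# \s matches whitespace such as spaces and tabs.
print(re.findall(r"\s", "a b\tc"))      # [' ', '\t']
```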

3, regular expression object:

  • pattern
    The rules we want to apply are written as a regular expression string. That string must then be compiled into a regular expression object, which is also called a pattern.
import re

regex = r'specific rule'  # placeholder: put the actual rule here
pattern = re.compile(regex, flags=0)

4, regular expression object (Pattern) supported methods:

  • pattern.search(string[, pos[, endpos]])
    Scans through the string for the first location where the pattern matches and returns the corresponding Match object, or None if nothing matches. pos specifies the position at which the search starts.
  • pattern.match(string[, pos[, endpos]])
    If zero or more characters at the beginning of the string match the regular expression, returns the corresponding Match object. If the string does not match the pattern, None is returned.
  • pattern.fullmatch(string[, pos[, endpos]])
    Same as above, but the whole string must match. (Python 3 only! Not supported in Python 2.)
  • pattern.split(string, maxsplit=0)
    Returns the list obtained by splitting the string at each match of the regex (if maxsplit is nonzero, at most maxsplit splits occur).
regex = r'[0-9]'
pattern = re.compile(regex)
pattern.split('5sdf2ad1adf9adf0')
# ['', 'sdf', 'ad', 'adf', 'adf', '']

regex = r'[0-9]'
pattern = re.compile(regex)
pattern.split('5sdf2ad1adf9adf0',2)
# ['', 'sdf', 'ad1adf9adf0']
  • pattern.findall(string[, pos[, endpos]])
    Scans the string from left to right and returns a list of all matches in the order found.
regex = r'[0-9]+'
pattern = re.compile(regex)
pattern.findall('jh22nf43ns43ho7ah94')
# ['22', '43', '43', '7', '94']

With one group: only the content of the group is returned

regex = r'[0-9]+([0-9])'
pattern = re.compile(regex)
pattern.findall('jh22nf43ns43ho7ah94')
# ['2', '3', '3', '4']

With two groups: a list is returned in which each element is a tuple

regex = r'[0-9]*([0-9])([0-9])'
pattern = re.compile(regex)
pattern.findall('jh22nf43ns43ho7ah94')
# [('2', '2'), ('4', '3'), ('4', '3'), ('9', '4')]
  • pattern.sub(repl, string, count=0)
    An enhanced version of replace: it first finds the parts of the string that match the regular expression, then replaces them with repl. Returns a string.
regex = r'\d{3}'
pattern = re.compile(regex)
pattern.sub('OOO','87has100bmn010vzx686v')
# '87hasOOObmnOOOvzxOOOv'

With a group: repl can reference the group, and is best written in the r'...' form

regex = r'\d{2}(\d)'
pattern = re.compile(regex)
pattern.sub(r'\1','87has103bmn010vzx686v')
# '87has3bmn0vzx6v'

(repl may also be a function that takes a Match object and returns the replacement string.)
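A minimal sketch of a function used as repl. The helper name double and the input string are hypothetical, chosen only for this example.

```python
import re

def double(match):
    # Called once per occurrence with its Match object;
    # the returned string replaces the matched text.
    return str(int(match.group(0)) * 2)

pattern = re.compile(r'\d+')

result = pattern.sub(double, 'a1b20c300')
print(result)  # a2b40c600

# count limits how many occurrences are replaced.
first_only = pattern.sub(double, 'a1b20c300', count=1)
print(first_only)  # a2b20c300
```

A function is useful when the replacement depends on the matched text itself, which a fixed repl string cannot express.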

5, regular expression object attributes:

pattern.flags : pattern = re.compile(r'd',flags=re.IGNORECASE) # ignore case
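To see what the attribute holds, the sketch below tests the IGNORECASE bit rather than printing the raw integer, since the exact number also includes flags that Python sets by default. The variable names are illustrative.

```python
import re

pattern = re.compile(r'd', flags=re.IGNORECASE)
print(bool(pattern.flags & re.IGNORECASE))  # True
print(pattern.findall('dD'))                # ['d', 'D']

plain = re.compile(r'd')
print(bool(plain.flags & re.IGNORECASE))    # False
print(plain.findall('dD'))                  # ['d']
```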

......

6, functions provided by the re module itself:

The steps above are the standard workflow, but the re module also provides search, findall, match and other functions directly as shortcuts
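A short sketch comparing the two styles on the same input string used earlier in these notes. The variable names compiled and shortcut are made up for the comparison.

```python
import re

text = 'jh22nf43ns43ho7ah94'

# Compile first, then call the method on the pattern object ...
compiled = re.compile(r'[0-9]+').findall(text)
# ... or pass the regex string straight to the module-level function.
shortcut = re.findall(r'[0-9]+', text)
print(compiled == shortcut)  # True

print(re.search(r'[0-9]+', text).group(0))  # 22
print(re.match(r'[0-9]+', text))            # None (the string starts with 'j')
print(re.sub(r'[0-9]+', '#', text))         # jh#nf#ns#ho#ah#
print(re.split(r'[0-9]+', text))            # ['jh', 'nf', 'ns', 'ho', 'ah', '']
```

Compiling explicitly is mainly worthwhile when the same pattern is reused many times.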

7, Match object (the match result):

pattern.search() and pattern.match() return a Match object. Besides wrapping the matched substring, it provides further functionality: because parts of a rule can be grouped with capturing parentheses, the matched groups can also be extracted. Since match() and search() return None when nothing matches, a simple if statement is enough to test whether the match succeeded

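regex = r'hello'
pattern = re.compile(regex)  # compile the new rule before using it
pattern.search('myhelloworld')  # also matches: search looks anywhere in the string
match = pattern.search('helloworld')
if match:
    print("ok") # ok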
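The same None test also shows how search, match and fullmatch differ. This is a sketch with made-up inputs, using only the methods listed in section 4.

```python
import re

pattern = re.compile(r'hello')

# search looks anywhere in the string.
print(pattern.search('myhelloworld') is not None)     # True
# match only succeeds at the beginning of the string.
print(pattern.match('myhelloworld') is not None)      # False
print(pattern.match('helloworld') is not None)        # True
# fullmatch requires the whole string to match.
print(pattern.fullmatch('helloworld') is not None)    # False
print(pattern.fullmatch('hello') is not None)         # True
# pos moves the starting point, so match can succeed from index 2.
print(pattern.match('myhelloworld', 2) is not None)   # True
```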

The methods of the Match object are as follows:

match.group([...])
regex = r'[0-9]+([0-9])'
pattern = re.compile(regex)
match = pattern.search('jh23nf43ns43ho7ah94')
if match:
    print(True)
    g_0 = match.group(0) # the whole match
    print(g_0) # 23
    g_1 = match.group(1) # the first parenthesized group
    print(g_1) # 3
    g_0_1 = match.group(0,1) # a tuple of the requested groups
    print(g_0_1) # ('23', '3')

match.groups() returns a tuple containing all the subgroups of the match, starting from group 1
regex = r'[0-9]+([0-9])'
pattern = re.compile(regex)
match = pattern.search('jh23nf43ns43ho7ah94')
if match:
    print(match.groups()) # ('3',)
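With more than one capturing group the difference between group() and groups() is easier to see. A minimal sketch follows; the regex and the input string are hypothetical.

```python
import re

pattern = re.compile(r'([a-z]+)([0-9]+)')
match = pattern.search('jh23nf43')
if match:
    print(match.group(0))     # jh23  (the whole match)
    print(match.group(1))     # jh    (first group)
    print(match.group(2))     # 23    (second group)
    print(match.groups())     # ('jh', '23')  (all groups, without group 0)
    print(match.group(2, 1))  # ('23', 'jh')  (groups in the order requested)
```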

8, examples added:

......


Origin blog.csdn.net/weixin_34072159/article/details/91399071