Chapter 1: Text - re: Regular Expressions - Pattern Syntax (3)

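1.3.4.3 Escape Codes
An even more compact representation uses escape codes for several predefined character sets. The escape codes recognized by re are listed in Table 1-1.
Table 1-1 Regular Expression Escape Codes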

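Code	Meaning
\d	A digit
\D	A non-digit
\s	Whitespace (tab, space, newline, etc.)
\S	Non-whitespace
\w	Alphanumeric (letters, digits, and the underscore)
\W	Non-alphanumeric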
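As a quick check of Table 1-1, the following minimal sketch applies each escape code on its own to the sample sentence used later in this section. It uses re.findall() instead of the test_patterns() helper, and the names text, codes, and results are chosen here for illustration only.

```python
import re

text = 'A prime #1 example!'

# The six escape codes from Table 1-1, written as raw strings.
codes = [r'\d', r'\D', r'\s', r'\S', r'\w', r'\W']

# Collect every single-character match for each code.
results = {code: re.findall(code, text) for code in codes}

for code, found in results.items():
    print('{:>3} matched {} character(s): {!r}'.format(
        code, len(found), ''.join(found)))

# \w also accepts the underscore, not only letters and digits.
word_chars = re.findall(r'\w', 'a_1-')
print(word_chars)
```

Each uppercase code matches exactly the characters its lowercase counterpart rejects, so the two match counts for a pair always add up to the length of the text.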

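Note: An escape is indicated by prefixing the character with a backslash (\). Unfortunately, a backslash must itself be escaped in a normal Python string, which leads to expressions that are hard to read. Raw strings, created by prefixing the literal value with r, eliminate this problem and keep the pattern readable.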
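The difference between normal and raw string literals can be seen in a short sketch. The variable names below are illustrative; the only library call used is re.search().

```python
import re

# In a normal string the backslash must be doubled to reach the
# regular expression engine as a single backslash.
normal = '\\d+'

# In a raw string the backslash is kept exactly as typed.
raw = r'\d+'

same = (normal == raw)
print(same)        # both literals build the same 3-character pattern
print(len(raw))

match = re.search(raw, 'A prime #1 example!')
print(match.group())

# Matching one literal backslash: four backslashes in a normal
# string, but only two in a raw string.
backslash_normal = '\\\\'
backslash_raw = r'\\'
found = re.findall(backslash_raw, r'a\b')
print(found)
```

Both spellings compile to the same pattern; the raw form simply removes one layer of escaping from what the reader has to decode.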

# re_test_patterns.py
import re

def test_patterns(text, patterns):
    """Given source text and a list of patterns, look for
    matches for each pattern within the text and print
    them to stdout.
    """

    # Look for each pattern in the text and print the results.
    for pattern, desc in patterns:
        print("'{}' ({})\n".format(pattern, desc))
        print(" '{}'".format(text))
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            substr = text[s:e]
            n_backslashes = text[:s].count('\\')
            prefix = '.' * (s + n_backslashes)
            print(" {}'{}'".format(prefix, substr))
        print()


if __name__ == '__main__':
    test_patterns('abbaaabbbbaaaaa', [('ab', "'a' followed by 'b'")])

from re_test_patterns import test_patterns

test_patterns(
    'A prime #1 example!',
    [(r'\d+', 'sequence of digits'),
     (r'\D+', 'sequence of non-digits'),
     (r'\s+', 'sequence of whitespace'),
     (r'\S+', 'sequence of non-whitespace'),
     (r'\w+', 'alphanumeric characters'),
     (r'\W+', 'non-alphanumeric')],
)

Output: [image: screenshot of the program output]
To match characters that are themselves part of the regular expression syntax, escape those characters in the search pattern.

from re_test_patterns import test_patterns

test_patterns(
    r'\d+ \D+ \s+',
    [(r'\\.\+', 'escape code')],
)

The pattern in this example escapes the backslash and plus characters, because both are metacharacters with special meaning in a regular expression.
Output: [image: screenshot of the program output]
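Writing the escapes by hand is one option. As a sketch of an alternative that the text above does not cover, the standard library function re.escape() can add the backslashes for a string that should be matched literally. The names manual, literal, escaped, and automatic are illustrative.

```python
import re

text = r'\d+ \D+ \s+'

# Hand-written escapes, as in the example above: a literal backslash,
# any single character, then a literal plus sign.
manual = re.findall(r'\\.\+', text)

# re.escape() escapes the metacharacters in a string so that the
# result matches that string literally.
literal = r'\d+'
escaped = re.escape(literal)
automatic = re.findall(escaped, text)

print(manual)
print(escaped)
print(automatic)
```

The hand-written pattern finds all three escape codes in the text because of the wildcard in the middle, while the pattern built by re.escape() matches only the exact literal it was given.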
