Chapter 1: Text - re: Regular Expressions - Pattern Syntax (1)

1.3.4 Pattern Syntax
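Regular expressions support more powerful patterns than simple literal text strings. Patterns can repeat, they can be anchored to different logical locations within the input, and they can be expressed in compact forms that do not require every literal character to be present in the pattern. All of these features are used by combining literal text values with metacharacters, which are part of the regular expression pattern syntax implemented by re.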
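A minimal sketch of the "compact form" idea, using a made-up sample string: a literal-only pattern matches just the one variant it spells out, while the literals a and b combined with the + metacharacter describe every string of an a followed by one or more b.

```python
import re

text = 'ab abb abbbb'

# Literal text only: the pattern matches exactly the variant written out.
print(re.findall('abbbb', text))  # ['abbbb']

# Literals plus the + metacharacter: one compact pattern covers every variant.
print(re.findall('ab+', text))  # ['ab', 'abb', 'abbbb']
```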

# re_test_patterns.py
import re

def test_patterns(text, patterns):
    """Given source text and a list of patterns, look for
    matches for each pattern within the text and print
    them to stdout.
    """

    # Look for each pattern in the text and print the results.
    for pattern, desc in patterns:
        print("'{}' ({})\n".format(pattern, desc))
        print(" {!r}".format(text))
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            substr = text[s:e]
            # repr() doubles every backslash, so add one extra dot
            # per backslash before the match to keep it aligned.
            n_backslashes = text[:s].count('\\')
            prefix = '.' * (s + n_backslashes)
            print(" {}{!r}".format(prefix, substr))
        print()


if __name__ == '__main__':
    test_patterns('abbaaabbbbaaaaa', [('ab', "'a' followed by 'b'")])
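test_patterns() relies on re.finditer(), which scans the text from left to right and yields a Match object for each non-overlapping match; start() and end() give the slice boundaries that the function turns into a row of dots. A minimal sketch printing the raw spans for the same input as the __main__ block (the variable names are only illustrative):

```python
import re

text = 'abbaaabbbbaaaaa'

# Each Match object records where the match begins and ends in text.
spans = [m.span() for m in re.finditer('ab', text)]
print(spans)  # [(0, 2), (5, 7)]

# Slicing with those boundaries recovers the matched substrings.
print([text[s:e] for s, e in spans])  # ['ab', 'ab']
```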

1.3.4.1 Repetition
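There are five ways to express repetition in a pattern. A pattern followed by the metacharacter * is repeated zero or more times (allowing a pattern to repeat zero times means it does not need to appear at all to match). If the * is replaced with +, the pattern must appear at least once. Using ? means the pattern appears zero times or one time. For a specific number of occurrences, use {m} after the pattern, where m is the number of times the pattern should repeat. Finally, to allow a variable but limited number of repetitions, use {m,n}, where m is the minimum number of repetitions and n is the maximum. Leaving out n ({m,}) means the value must appear at least m times, with no maximum.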

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [('ab*', 'a followed by zero or more b'),
     ('ab+', 'a followed by one or more b'),
     ('ab?', 'a followed by zero or one b'),
     ('ab{3}', 'a followed by three b'),
     ('ab{2,3}', 'a followed by two or three b')],
)

Output:

'ab*' (a followed by zero or more b)

 'abbaabbba'
 'abb'
 ...'a'
 ....'abbb'
 ........'a'

'ab+' (a followed by one or more b)

 'abbaabbba'
 'abb'
 ....'abbb'

'ab?' (a followed by zero or one b)

 'abbaabbba'
 'ab'
 ...'a'
 ....'ab'
 ........'a'

'ab{3}' (a followed by three b)

 'abbaabbba'
 ....'abbb'

'ab{2,3}' (a followed by two or three b)

 'abbaabbba'
 'abb'
 ....'abbb'
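The example above never exercises the open-ended {m,} form described earlier. A minimal sketch on the same input shows it, and also checks that *, + and ? behave like {0,}, {1,} and {0,1}:

```python
import re

text = 'abbaabbba'

# {2,} sets a minimum of two b and no maximum.
print(re.findall('ab{2,}', text))  # ['abb', 'abbb']

# Raising the minimum to 3 rules out the leading 'abb'.
print(re.findall('ab{3,}', text))  # ['abbb']

# *, + and ? are shorthand for {0,}, {1,} and {0,1}.
for short, long_form in [('ab*', 'ab{0,}'), ('ab+', 'ab{1,}'), ('ab?', 'ab{0,1}')]:
    assert re.findall(short, text) == re.findall(long_form, text)
```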

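When processing a repetition instruction, re will usually consume as much of the input as it can while matching the pattern. This so-called "greedy" behavior may result in fewer individual matches, or the matches may include more of the input text than intended. Greediness can be turned off by following the repetition instruction with ?.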

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [('ab*?', 'a followed by zero or more b'),
     ('ab+?', 'a followed by one or more b'),
     ('ab??', 'a followed by zero or one b'),
     ('ab{3}?', 'a followed by three b'),
     ('ab{2,3}?', 'a followed by two to three b')],
)

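Disabling greedy consumption of the input for any of the patterns where zero occurrences of b are allowed means the matched substring does not include any b characters. Patterns that require at least one b (ab+? and ab{2,3}?) now match only the minimum number of b, while ab{3}? is unaffected because it requires exactly three.
Output: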

'ab*?' (a followed by zero or more b)

 'abbaabbba'
 'a'
 ...'a'
 ....'a'
 ........'a'

'ab+?' (a followed by one or more b)

 'abbaabbba'
 'ab'
 ....'ab'

'ab??' (a followed by zero or one b)

 'abbaabbba'
 'a'
 ...'a'
 ....'a'
 ........'a'

'ab{3}?' (a followed by three b)

 'abbaabbba'
 ....'abbb'

'ab{2,3}?' (a followed by two to three b)

 'abbaabbba'
 'abb'
 ....'abb'
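The 'abbaabbba' input does not show the "fewer individual matches" effect of greedy matching very clearly. A minimal sketch on a made-up string of HTML-like tags shows both effects; it uses the . metacharacter, which matches any single character except a newline:

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the end, then backs off only to the last '>'.
greedy = re.findall('<.*>', text)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .*? stops at the first '>' that completes a match.
lazy = re.findall('<.*?>', text)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']
```

The greedy pattern produces one match that swallows the text between the tags, while the non-greedy version yields four separate matches.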
