Chapter 1: Text - re: Regular Expressions - Pattern Syntax (1)

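1.3.4 Pattern Syntax
Regular expressions support more powerful patterns than simple literal text strings. Patterns can repeat, can be anchored to different logical locations within the input, and can be expressed in compact forms that do not require every literal character to be present in the pattern. All of these features are used by combining literal text values with metacharacters, which are part of the regular expression pattern syntax implemented by re.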

# re_test_patterns.py
import re


def test_patterns(text, patterns):
    """Given source text and a list of patterns, look for
    matches for each pattern within the text and print
    them to stdout.
    """
    # Look for each pattern in the text and print the results.
    for pattern, desc in patterns:
        print("'{}' ({})\n".format(pattern, desc))
        print(" '{}'".format(text))
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            substr = text[s:e]
            # Build a prefix of dots from the match's start offset, adding
            # one extra dot for each backslash that precedes the match.
            n_backslashes = text[:s].count('\\')
            prefix = '.' * (s + n_backslashes)
            print(" {}'{}'".format(prefix, substr))
        print()


if __name__ == '__main__':
    test_patterns('abbaaabbbbaaaaa', [('ab', "'a' followed by 'b'")])
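The helper above only prints its results. When you want to inspect matches programmatically, the same information it reads from match.start() and match.end() can be collected into a list instead. This is a minimal sketch; the function name find_spans is hypothetical and is not part of the original re_test_patterns module.

```python
import re


def find_spans(pattern, text):
    """Hypothetical helper: return (start, end, substring) for every
    non-overlapping match of pattern in text, in left-to-right order."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]


if __name__ == '__main__':
    # Same input as the demo call in re_test_patterns.py
    for start, end, substr in find_spans('ab', 'abbaaabbbbaaaaa'):
        print(start, end, substr)
```

For the demo input this reports the two matches at offsets 0-2 and 5-7, which are the same positions the dotted prefixes in test_patterns() encode.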

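1.3.4.1 Repetition
There are five ways to express repetition in a pattern. A pattern followed by the metacharacter * is repeated zero or more times (allowing a pattern to repeat zero times means it does not need to appear at all to match). If the * is replaced with +, the pattern must appear at least once to match. Using ? means the pattern appears zero or one time. For a specific number of occurrences, use {m} after the pattern, where m is the number of times the pattern should repeat. Finally, to allow a variable but limited number of repetitions, use {m,n}, where m is the minimum number of repetitions and n is the maximum. Leaving out n ({m,}) means the value must appear at least m times, with no maximum.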

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [('ab*', 'a followed by zero or more b'),
     ('ab+', 'a followed by one or more b'),
     ('ab?', 'a followed by zero or one b'),
     ('ab{3}', 'a followed by three b'),
     ('ab{2,3}', 'a followed by two or three b')],
)

Output:

'ab*' (a followed by zero or more b)

 'abbaabbba'
 'abb'
 ...'a'
 ....'abbb'
 ........'a'

'ab+' (a followed by one or more b)

 'abbaabbba'
 'abb'
 ....'abbb'

'ab?' (a followed by zero or one b)

 'abbaabbba'
 'ab'
 ...'a'
 ....'ab'
 ........'a'

'ab{3}' (a followed by three b)

 'abbaabbba'
 ....'abbb'

'ab{2,3}' (a followed by two or three b)

 'abbaabbba'
 'abb'
 ....'abbb'
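The repetition paragraph also describes the open-ended form {m,}, which the example above does not exercise. As a hedged illustration that is not part of the original text, the sketch below applies each quantifier to b alone and uses re.fullmatch(), so only whole-string matches count; this makes the accepted repetition counts explicit. The helper name accepted_counts and the SAMPLES list are hypothetical.

```python
import re

SAMPLES = ['', 'b', 'bb', 'bbb', 'bbbb']  # zero to four b characters


def accepted_counts(quantified):
    """Hypothetical helper: return the numbers of b characters (0-4)
    for which the whole sample string matches the quantified pattern."""
    return [len(s) for s in SAMPLES if re.fullmatch(quantified, s)]


for q in ['b*', 'b+', 'b?', 'b{3}', 'b{2,3}', 'b{2,}']:
    print(q, accepted_counts(q))
```

Here b{2,} accepts two, three, and four b characters: it behaves like b{2,3} with the upper limit removed.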

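When processing a repetition instruction, re will usually consume as much of the input as possible while matching the pattern. This so-called greedy behavior may result in fewer individual matches, or the matches may include more of the input text than intended. Greediness can be turned off by following the repetition instruction with ?.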

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [('ab*?', 'a followed by zero or more b'),
     ('ab+?', 'a followed by one or more b'),
     ('ab??', 'a followed by zero or one b'),
     ('ab{3}?', 'a followed by three b'),
     ('ab{2,3}?', 'a followed by two to three b')],
)

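Disabling greedy consumption of the input for any of the patterns where zero occurrences of b are allowed means the matched substring does not include any b characters.

Output: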

'ab*?' (a followed by zero or more b)

 'abbaabbba'
 'a'
 ...'a'
 ....'a'
 ........'a'

'ab+?' (a followed by one or more b)

 'abbaabbba'
 'ab'
 ....'ab'

'ab??' (a followed by zero or one b)

 'abbaabbba'
 'a'
 ...'a'
 ....'a'
 ........'a'

'ab{3}?' (a followed by three b)

 'abbaabbba'
 ....'abbb'

'ab{2,3}?' (a followed by two to three b)

 'abbaabbba'
 'abb'
 ....'abb'
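To make the difference in how much text is consumed more visible, here is a small sketch on a hypothetical markup-like string (the sample input is an assumption, not from the original). The greedy .* runs to the last >, producing a single oversized match, while the lazy .*? stops at the first > that completes a match. The last two calls show that a lazy repetition still consumes more characters when the rest of the pattern, or re.fullmatch(), requires it.

```python
import re

text = '<b>bold</b> and <i>italic</i>'  # hypothetical sample input

greedy = re.findall(r'<.*>', text)   # .* runs to the end, then backs up to the last '>'
lazy = re.findall(r'<.*?>', text)    # .*? stops at the first '>' that completes a match
print(greedy)
print(lazy)

# Lazy means "as few as possible while the overall match still succeeds":
# when the rest of the pattern needs more input, b*? keeps expanding.
whole = re.fullmatch(r'ab*?', 'abbb').group()
to_c = re.search(r'ab*?c', 'abbbc').group()
print(whole, to_c)
```

The greedy version returns one match covering the whole string, which is exactly the "fewer individual matches, more text than intended" effect described above; the lazy version returns the four individual tags.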

Reposted from blog.csdn.net/weixin_43193719/article/details/86663364