Python Practical Techniques, Part 26: Writing a Regular Expression for the Shortest Match

1. Requirement

We are using a regular expression to match a text pattern, but it finds the longest possible match. Instead, we want it to find the shortest possible match.

2. Solution

This problem usually comes up when the text to match is enclosed between a pair of start and end delimiters (quoted strings, for example). To illustrate the problem, consider the following example:

import re

# Greedy: .* matches as much text as possible
str_pat = re.compile(r'\"(.*)\"')
text1 = 'mark say "love"'
text2 = 'mark say "love",jingjing say "yes"'
print(str_pat.findall(text1))
print(str_pat.findall(text2))

result:

['love']
['love",jingjing say "yes']

In this example, the pattern r'\"(.*)\"' attempts to match text enclosed in quotes. However, the * operator in regular expressions is greedy, so the matching process looks for the longest possible match. That is why the second call returns the single result 'love",jingjing say "yes' instead of two separate strings.
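To see exactly what the greedy pattern consumed, the match object's start() and end() positions can be inspected. This is a minimal sketch reusing the pattern and text2 from the example above; the variable name m is only illustrative:

```python
import re

# Greedy pattern from the first example
str_pat = re.compile(r'\"(.*)\"')
text2 = 'mark say "love",jingjing say "yes"'

m = str_pat.search(text2)
# start() is the first quote; end() is just past the LAST quote in the string
print(m.start(), m.end())   # 9 34
print(m.group(0))           # "love",jingjing say "yes"
```

The match ends at the final quote of the string: .* first runs to the end of the text and then backs off only as far as needed to find a closing quote.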

To fix this, simply add the ? modifier after the * operator in the pattern.

Example:

import re

# Non-greedy: .*? matches as little text as possible
str_pat = re.compile(r'\"(.*?)\"')
text1 = 'mark say "love"'
text2 = 'mark say "love",jingjing say "yes"'
print(str_pat.findall(text1))
print(str_pat.findall(text2))

result:

['love']
['love', 'yes']

The ? makes the matching process non-greedy, so it produces the shortest match instead.
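Another common way to get the same shortest-match behavior, not covered in the original text, is a negated character class such as [^"]*, which simply cannot run past a quote. A minimal sketch comparing it with the non-greedy pattern (the names lazy_pat and negated_pat are illustrative):

```python
import re

text2 = 'mark say "love",jingjing say "yes"'

# Non-greedy version from the example above
lazy_pat = re.compile(r'"(.*?)"')
# Alternative: a negated character class that never crosses a quote
negated_pat = re.compile(r'"([^"]*)"')

print(lazy_pat.findall(text2))     # ['love', 'yes']
print(negated_pat.findall(text2))  # ['love', 'yes']
```

One practical difference: [^"] also matches newline characters, whereas . does not unless the re.DOTALL flag is set, so the two patterns can give different results on multi-line text.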

This section addresses a problem that comes up often when writing regular expressions containing the dot (.) character. To turn the longest match into the shortest match, add a ? after the * or + operator.
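The ? modifier works the same way after +, but the two lazy forms are not interchangeable: *? may match zero characters, while +? must consume at least one. A small sketch with a hypothetical input that contains an empty quoted string:

```python
import re

text = 'say "" and "yes"'

# *? may match zero characters, so the empty quoted string is found
print(re.findall(r'"(.*?)"', text))   # ['', 'yes']
# +? must consume at least one character, so it swallows the first closing quote
print(re.findall(r'"(.+?)"', text))   # ['" and ']
```

With +? the empty pair "" cannot satisfy the pattern, so the match starts at the first quote and runs to the next quote it can reach, which here is an unintended result.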


Source: blog.51cto.com/14445003/2429838