Analysis of the Python re module

Grouping in re

Python's re module supports grouping. Grouping means taking content that has already been matched and extracting specific parts of it, which amounts to a second round of filtering.

Groups are created with parentheses (), and their contents are retrieved with the group(), groups(), and groupdict() methods.

Several important functions in the re module handle groups in different ways, so each one is covered separately.

re examples

match() method

Without groups:

import re

origin = "hasdfi123123safd"
# Without groups
r = re.match(r"h\w+", origin)
print(r.group())         # the whole match
print(r.groups())        # tuple of all groups captured by the pattern
print(r.groupdict())     # dict of all named groups captured by the pattern

Result:
hasdfi123123safd
()
{}

With groups (note the parentheses!):

import re

origin = "hasdfi123123safd123"
# With groups
r = re.match(r"h(\w+).*(?P<name>\d)$", origin)
print(r.group())      # the whole match
print(r.group(1))     # what group 1 matched
print(r.group(2))     # what group 2 matched
print(r.groups())     # tuple of all groups captured by the pattern
print(r.groupdict())  # dict of all named groups captured by the pattern

Output:
hasdfi123123safd123
asdfi123123safd12
3
('asdfi123123safd12', '3')
{'name': '3'}

Notes ⚠️:

  • (1) The regular expression h(\w+).*(?P<name>\d)$ contains two pairs of parentheses, so it defines two groups. Matching is done against the whole expression, not against each group on its own.

  • (2) (\w+) is a group that matches one or more word characters, meaning letters, digits, and the underscore. For ASCII text, \w is equivalent to [A-Za-z0-9_]; in Python 3, \w in a str pattern also matches Unicode word characters unless re.ASCII is set.

  • (3) In (?P<name>\d), ?P<name> is special regular expression syntax that gives the group the name "name"; ?P<xxxx> is the fixed form. \d matches a digit character. For ASCII text it is equivalent to [0-9].

  • (4) When retrieving group values, group() and group(0) are identical: both return the entire matched string. From group(1) onward, the groups are numbered from left to right in order of position.
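The numbering rule in note (4) can be checked with a small sketch. The sample string and the nested pattern below are made up for illustration and do not come from the examples above. Groups are numbered by the position of their opening parenthesis, and a named group can be read by name as well as by number.

```python
import re

# Hypothetical sample text and pattern, chosen only to illustrate numbering.
text = "user42@example"
m = re.match(r"((?P<name>[a-z]+)(\d+))@(\w+)", text)

# group() and group(0) both return the whole match.
assert m.group() == m.group(0) == "user42@example"

# Groups are numbered by the position of their opening parenthesis.
print(m.group(1))       # outer group: user42
print(m.group(2))       # named group "name": user
print(m.group("name"))  # same group, read by name: user
print(m.group(3))       # 42
print(m.group(4))       # example
print(m.groups())       # every group, named or not
print(m.groupdict())    # only named groups appear here
```

Note that the named group still has a number, so it shows up in groups() as well as in groupdict().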

search() method

With groups:

import re

origin = "sdfi1ha23123safd123"      # note: the target string has been changed here
# With groups
r = re.search(r"h(\w+).*(?P<name>\d)$", origin)
print(r.group())
print(r.group(0))
print(r.group(1))
print(r.group(2))
print(r.groups())
print(r.groupdict())

Output:
ha23123safd123
ha23123safd123
a23123safd12
3
('a23123safd12', '3')
{'name': '3'}

Notes ⚠️: for groups, search() behaves basically the same as match().
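One difference does show up on this input: the string no longer starts with h, so match() with the same pattern finds nothing. A minimal sketch, reusing the string and pattern from the example above (the variable names found and anchored are only for illustration):

```python
import re

origin = "sdfi1ha23123safd123"
pattern = r"h(\w+).*(?P<name>\d)$"

found = re.search(pattern, origin)    # scans forward until the pattern fits
anchored = re.match(pattern, origin)  # only tries the start of the string

print(found.group())   # ha23123safd123
print(found.start())   # index where the match begins: 5
print(anchored)        # None, because the string starts with "s"
```

Because match() can return None, it is usually worth checking the result before calling group() on it.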

Difference between match() and search()

re.match matches only at the beginning of the string: if the start of the string does not fit the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match.

For example:

#!/usr/bin/env python3
import re

line = "Cats are smarter than dogs"

# match() only tries the start of the string, so "dogs" is not found
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

# search() scans the whole string and finds "dogs" at the end
matchObj = re.search(r'dogs', line, re.M | re.I)
if matchObj:
    print("search --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

Running the code above produces the following output:

No match!!
search --> matchObj.group() :  dogs
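The example passes re.M, which might suggest that match() could also match at the start of a later line. It does not: re.M changes what ^ and $ mean, but match() still only tries the very beginning of the string. A small sketch with a made-up two-line string:

```python
import re

# Hypothetical two-line input; "dogs" starts the second line.
text = "Cats are smart\ndogs are loyal"

# match() is anchored at position 0 even with re.M.
print(re.match(r"dogs", text, re.M))   # None

# search() with ^ and re.M can match at the start of any line.
m = re.search(r"^dogs", text, re.M)
print(m.group(), m.start())            # dogs 15
```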

Going further

A regular expression example:

#!/usr/bin/env python3
import re

line = "Cats are smarter than dogs"
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)
if matchObj:
    print("matchObj.group() : ", matchObj.group())    # the whole match
    print("matchObj.group(1) : ", matchObj.group(1))  # first group
    print("matchObj.group(2) : ", matchObj.group(2))  # second group
else:
    print("No match!!")

Notes ⚠️: about the regular expression r'(.*) are (.*?) .*'

  • (1) The pattern is written as a string literal. The leading r makes it a raw string, so Python leaves backslashes as they are instead of treating them as escape characters. This particular pattern contains no backslash, so the r is optional here.

  • (2) (.*) is the first group. The . matches any character except a newline, and * repeats it zero or more times.

  • (3) (.*?) is the second group. The question mark after .* makes it non-greedy, which means it matches as few characters as possible.

  • (4) The trailing .* is not enclosed in parentheses, so it is not a group. It matches in the same way as the .* in (2), but what it matches is not included in the group results.

  • (5) matchObj.group() is equivalent to matchObj.group(0) and returns the full matched text.

    matchObj.group(1) returns the first group, i.e. what (.*) matched.

    matchObj.group(2) returns the second group, i.e. what (.*?) matched.

    Because the pattern defines only two groups, passing 3 raises an error (IndexError: no such group).
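To see what the non-greedy ? changes, and what the error in note (5) looks like, here is a Python 3 sketch using the same sentence. The greedy variant of the pattern is added here for comparison only.

```python
import re

line = "Cats are smarter than dogs"

lazy = re.match(r"(.*) are (.*?) .*", line)    # pattern from the example
greedy = re.match(r"(.*) are (.*) .*", line)   # same pattern without the ?

print(lazy.groups())    # ('Cats', 'smarter')
print(greedy.groups())  # ('Cats', 'smarter than')

# Asking for a group that the pattern does not define raises IndexError.
try:
    lazy.group(3)
except IndexError as exc:
    print("IndexError:", exc)
```

The non-greedy group stops at the first space that lets the rest of the pattern match, while the greedy one runs as far as it can and then backs off to the last space.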

Origin blog.51cto.com/wutengfei/2430182