Introduction to Python regular expressions

Using the re module

re.match tries to match a pattern against a string, starting at the beginning of the string. It returns a match object if the match succeeds and None if it does not.

import re
# re.match(regular expression, string to match)
result = re.match("hello","helloworld")
print(result) # <re.Match object; span=(0, 5), match='hello'>
# Use the group() method to extract the matched text
print(result.group()) # hello

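Alternatively, compile the pattern first and reuse it:

import re
# Compiling once avoids redefining the regular expression on every call
pattern = re.compile("hello")
result = pattern.match("helloworld")
print(result.group()) # hello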
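Because re.match returns None when the pattern does not match at the start of the string, calling group() directly on the result can raise AttributeError. The following is a minimal sketch of guarding against that; the helper name first_match is made up for this illustration and is not part of the re module.

```python
import re

def first_match(pattern, text):
    """Hypothetical helper: return the matched text, or None if there is no match at the start."""
    m = re.match(pattern, text)
    return m.group() if m is not None else None

hit = first_match("hello", "helloworld")   # pattern matches at position 0
miss = first_match("world", "helloworld")  # "world" is not at the start
print(hit)   # hello
print(miss)  # None
```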

Matching single characters

Character   Meaning
.           Matches any character except a newline (\n)
[ ]         Matches any one of the characters listed inside the brackets
\d          Matches a digit, that is, 0-9
\D          Matches a non-digit, that is, anything that is not a number
\s          Matches whitespace, such as a space, tab, or newline
\S          Matches non-whitespace
\w          Matches a word character, namely 0-9, a-z, A-Z, and the underscore
\W          Matches a non-word character
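As a quick illustration of the table above, the sketch below runs each character class over a short sample string with re.findall. The sample strings are arbitrary choices for this example.

```python
import re

sample = "a1 _B"

digits = re.findall(r"\d", sample)       # digits only
non_digits = re.findall(r"\D", sample)   # everything except digits
spaces = re.findall(r"\s", sample)       # whitespace
word_chars = re.findall(r"\w", sample)   # letters, digits, underscore
vowels = re.findall(r"[aeiou]", sample)  # any one character from the set
dots = re.findall(r".", "a\nb")          # "." skips the newline

print(digits)      # ['1']
print(non_digits)  # ['a', ' ', '_', 'B']
print(spaces)      # [' ']
print(word_chars)  # ['a', '1', '_', 'B']
print(vowels)      # ['a']
print(dots)        # ['a', 'b']
```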

Quantifiers

Character   Meaning
*           Matches the preceding character zero or more times, that is, the character is optional
+           Matches the preceding character one or more times, that is, at least once
?           Matches the preceding character zero or one time, that is, either once or not at all
{m}         Matches the preceding character exactly m times
{m,}        Matches the preceding character at least m times
{m,n}       Matches the preceding character from m to n times
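The quantifiers above can be compared side by side. This is a small sketch using re.match on made-up input strings; the variable names are only for illustration.

```python
import re

star_zero = re.match(r"ab*", "a").group()         # b may be absent
star_many = re.match(r"ab*", "abbb").group()      # b may repeat
plus_none = re.match(r"ab+", "a")                 # + needs at least one b
optional = re.match(r"ab?", "abbb").group()       # at most one b
exactly3 = re.match(r"\d{3}", "12345").group()    # exactly three digits
at_least2 = re.match(r"\d{2,}", "12345").group()  # two or more digits
two_to_4 = re.match(r"\d{2,4}", "12345").group()  # between two and four digits

print(star_zero)  # a
print(star_many)  # abbb
print(plus_none)  # None
print(optional)   # ab
print(exactly3)   # 123
print(at_least2)  # 12345
print(two_to_4)   # 1234
```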

Boundaries

Character   Meaning
^           Matches the beginning of the string
$           Matches the end of the string
\b          Matches a word boundary
\B          Matches a non-word boundary
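A short sketch of how the boundary characters behave. The phone-like format used with ^ and $ is an invented example, not a real validation rule.

```python
import re

# ^ and $ force the pattern to cover the whole string
full_ok = re.match(r"^\d{3}-\d{4}$", "555-1234") is not None
full_bad = re.match(r"^\d{3}-\d{4}$", "555-12345") is not None

# \b only matches "cat" as a standalone word
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")

# \B requires that "cat" does NOT start at a word boundary
inside_word = re.findall(r"\Bcat", "cat concat cats")

print(full_ok)      # True
print(full_bad)     # False
print(whole_words)  # ['cat', 'cat']
print(inside_word)  # ['cat']
```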

Group matching

Character     Meaning
|             Matches either the expression on its left or the one on its right
(ab)          Treats the characters in parentheses as a group
\num          Refers back to the string matched by group number num
(?P<name>)    Gives the group a name
(?P=name)     Refers back to the string matched by the group named name
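The grouping constructs above can be sketched as follows. The e-mail and tag patterns are deliberately simplified for illustration and are not suitable for validating real addresses or parsing real HTML.

```python
import re

# | matches either alternative
either = re.match(r"cat|dog", "dog").group()

# ( ) captures parts of the match; group(n) retrieves them
m = re.match(r"(\w+)@(\w+)\.com", "alice@example.com")
user, domain = m.group(1), m.group(2)

# \1 must repeat whatever group 1 matched
balanced = re.match(r"<(\w+)>.*</\1>", "<b>bold</b>") is not None
unbalanced = re.match(r"<(\w+)>.*</\1>", "<b>bold</i>") is not None

# (?P<tag>...) names the group, (?P=tag) refers back to it
named = re.match(r"<(?P<tag>\w+)>.*</(?P=tag)>", "<h1>title</h1>")
tag_name = named.group("tag")

print(either)        # dog
print(user, domain)  # alice example
print(balanced)      # True
print(unbalanced)    # False
print(tag_name)      # h1
```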

Greedy and non-greedy matching

By default, Python quantifiers are greedy: they always try to match as many characters as possible. Non-greedy quantifiers do the opposite and always try to match as few characters as possible.
Appending ? to *, ?, +, or {m,n} turns a greedy quantifier into a non-greedy one.

import re
re.match(r"aa(\d+)","aa2343ddd").group() # greedy mode, matches aa2343
re.match(r"aa(\d+?)","aa2343ddd").group() # non-greedy mode, matches aa2
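The difference tends to matter most when a pattern has a closing delimiter that can appear more than once. A minimal sketch, using a made-up HTML-like string:

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the LAST </b> in the string
greedy = re.match(r"<b>.*</b>", html).group()

# Non-greedy: .*? stops at the FIRST </b>
lazy = re.match(r"<b>.*?</b>", html).group()

print(greedy)  # <b>one</b><b>two</b>
print(lazy)    # <b>one</b>
```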

Other functions in the re module

search(regular expression, string to match)

  • Scans the whole string and returns a match object for the first place where the pattern matches; returns None if nothing matches.
  • match() and search() do essentially the same job. The difference is that match() only matches at the beginning of the string, whereas search() finds the first match anywhere in the string.
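The contrast between match() and search() can be seen on a string where the pattern does not occur at the start. A minimal sketch with an arbitrary sample string:

```python
import re

text = "say hello"

at_start = re.match("hello", text)    # "hello" is not at position 0
anywhere = re.search("hello", text)   # found later in the string

print(at_start)          # None
print(anywhere.group())  # hello
print(anywhere.span())   # (4, 9)
```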

findall(regular expression, string to match)

  • Scans the whole string and returns every match as a list of strings; returns an empty list if nothing matches.
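A brief sketch of findall on made-up input. Note that when the pattern contains more than one group, findall returns a list of tuples of the groups rather than the whole matches.

```python
import re

numbers = re.findall(r"\d+", "a1b22c333")       # every run of digits
nothing = re.findall(r"\d+", "abc")             # no digits at all
pairs = re.findall(r"(\w)=(\d)", "a=1 b=2")     # two groups -> tuples

print(numbers)  # ['1', '22', '333']
print(nothing)  # []
print(pairs)    # [('a', '1'), ('b', '2')]
```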

sub(regular expression, replacement string, string to match)

  • Replaces each part of the string that matches the pattern with the replacement string and returns the result.
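A minimal sketch of sub on made-up input. Besides a replacement string, re.sub also accepts a count keyword to limit the number of replacements and a function that computes each replacement from the match object.

```python
import re

all_replaced = re.sub(r"\d+", "#", "a1b22c333")             # replace every match
first_only = re.sub(r"\d+", "#", "a1b22c333", count=1)      # replace only the first match

# The replacement can be a function that receives the match object
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1b22")

print(all_replaced)  # a#b#c#
print(first_only)    # a#b22c333
print(doubled)       # a2b44
```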

Match modes

The matching mode is changed through the flags parameter: the second parameter of re.compile(), the third parameter of re.match(), re.search(), and re.findall(), and the flags keyword argument of re.sub() (its fourth positional parameter is count, so pass flags by keyword). The defaults are usually sufficient unless there are special requirements.

Modifier   Description
re.I       Makes the match case-insensitive
re.L       Performs locale-aware matching
re.M       Multi-line matching; affects ^ and $
re.S       Makes . match every character, including newlines
re.U       Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B.
re.X       Verbose mode: allows a more flexible layout, so that regular expressions are easier to read
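The most commonly used flags can be sketched as follows. The sample strings and the phone-like verbose pattern are invented for this illustration.

```python
import re

# re.I: ignore case
ignore_case = re.match("hello", "HELLO", re.I).group()

# re.M: ^ matches at the start of every line, not only the start of the string
first_words_default = re.findall(r"^\w+", "one\ntwo")
first_words_multi = re.findall(r"^\w+", "one\ntwo", re.M)

# re.S: "." also matches the newline
dot_default = re.match(r"a.b", "a\nb")
dot_all = re.match(r"a.b", "a\nb", re.S) is not None

# re.X: whitespace and comments inside the pattern are ignored
phone = re.compile(r"""
    \d{3}   # first three digits
    -       # literal hyphen
    \d{4}   # last four digits
""", re.X)
verbose_hit = phone.match("555-1234").group()

# re.sub takes flags as a keyword argument
replaced = re.sub("hello", "bye", "HELLO world", flags=re.I)

print(ignore_case)          # HELLO
print(first_words_default)  # ['one']
print(first_words_multi)    # ['one', 'two']
print(dot_default)          # None
print(dot_all)              # True
print(verbose_hit)          # 555-1234
print(replaced)             # bye world
```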

Origin blog.csdn.net/qq_39659278/article/details/100054954