Introduction to regular expressions in Python

Using the re module

re.match tries to match a pattern against a string, starting from the first character of the string. It returns a match object if the match succeeds and None if it does not.

import re
# re.match(regular expression, string to match)
result = re.match("hello","helloworld")
print(result) # <re.Match object; span=(0, 5), match='hello'>
# The group method extracts the matched text
print(result.group()) # hello

Alternatively, compile the pattern first:

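import re
# Compiling the pattern once avoids defining the regular expression repeatedly
pattern = re.compile("hello")
# match() returns the match object; the compiled pattern itself has no group()
result = pattern.match("helloworld")
print(result.group()) # hello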

Matching single characters

| Character | Description |
| --- | --- |
| . | Matches any single character except newline (\n) |
| [ ] | Matches any one of the characters listed inside the brackets |
| \d | Matches a digit, i.e. 0-9 |
| \D | Matches a non-digit |
| \s | Matches whitespace: space, tab, newline and similar |
| \S | Matches a non-whitespace character |
| \w | Matches a word character, i.e. 0-9, a-z, A-Z and underscore |
| \W | Matches a non-word character |
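
As a quick check of the table above, here is a minimal sketch that tries each character class with re.match. The sample strings and variable names are made up for illustration.

```python
import re

# "." matches any single character except a newline
any_char = re.match(r".", "a")
newline = re.match(r".", "\n")

# [ ] matches one character out of the listed set
bracket = re.match(r"[hH]ello", "Hello")

# \d matches a digit, \w a word character (underscore included)
digits = re.match(r"\d\d", "42 apples")
word = re.match(r"\w+", "user_01!")

# \s matches whitespace such as a tab; \S is its opposite
space = re.match(r"\s", "\tindent")
non_space = re.match(r"\S", "\tindent")

print(any_char.group())   # a
print(newline)            # None
print(bracket.group())    # Hello
print(digits.group())     # 42
print(word.group())       # user_01
print(space is not None)  # True
print(non_space)          # None
```

Writing the patterns as raw strings (r"...") keeps Python from interpreting the backslashes itself.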

Quantifiers

| Character | Description |
| --- | --- |
| * | Matches the preceding character zero or more times, i.e. the character is optional and may repeat |
| + | Matches the preceding character one or more times, i.e. at least once |
| ? | Matches the preceding character zero or one time, i.e. it appears once or not at all |
| {m} | Matches the preceding character exactly m times |
| {m,} | Matches the preceding character at least m times |
| {m,n} | Matches the preceding character between m and n times |
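
The quantifiers above can be compared side by side with a small sketch. The helper full() is a hypothetical name introduced here; it relies on re.fullmatch so that the whole string has to fit the pattern.

```python
import re

def full(pattern, text):
    """Hypothetical helper: True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

star = full(r"ab*c", "ac")          # b may appear zero times
plus = full(r"ab+c", "ac")          # + needs at least one b
optional = full(r"ab?c", "abbc")    # ? allows at most one b
exact = full(r"\d{4}", "2019")      # exactly four digits
at_least = full(r"\d{2,}", "7")     # at least two digits required
between = full(r"\d{2,4}", "123")   # two to four digits

print(star, plus, optional)       # True False False
print(exact, at_least, between)   # True False True
```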

Boundaries

| Character | Description |
| --- | --- |
| ^ | Matches the beginning of the string |
| $ | Matches the end of the string |
| \b | Matches a word boundary |
| \B | Matches a position that is not a word boundary |
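
A minimal sketch of the boundary characters follows. It uses re.search (covered later in this post) so that the pattern may match anywhere in the string; the sample strings are illustrative only.

```python
import re

starts = re.search(r"^hello", "hello world")
ends = re.search(r"world$", "hello world")

# \b requires a word boundary on each side of "cat"
whole_word = re.search(r"\bcat\b", "a cat sat")
inside_word = re.search(r"\bcat\b", "concatenate")

# \B requires the opposite: "cat" must sit inside a longer word
embedded = re.search(r"\Bcat\B", "concatenate")

print(starts is not None)      # True
print(ends is not None)        # True
print(whole_word is not None)  # True
print(inside_word)             # None
print(embedded.span())         # (3, 6)
```

The raw string matters here: in a normal Python string "\b" is the backspace character, not a word boundary.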

Groups

| Character | Description |
| --- | --- |
| \| | Matches either the expression on its left or the one on its right |
| (ab) | Treats the characters inside the parentheses as a group |
| \num | Refers back to the string matched by group number num |
| (?P<name>) | Gives a group the alias name |
| (?P=name) | Refers back to the string matched by the group with the alias name |
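
The grouping syntax above might be used as in the following sketch. The tag-matching patterns are illustrative assumptions, not a robust way to parse HTML.

```python
import re

# | : either alternative inside the group may match
pet = re.match(r"(cat|dog)s?", "dogs")

# \1 refers back to whatever group 1 matched, so the closing tag must agree
tag_pattern = r"<(\w+)>.*</\1>"
same_tag = re.match(tag_pattern, "<h1>title</h1>")
other_tag = re.match(tag_pattern, "<h1>title</h2>")

# (?P<name>...) names a group; (?P=name) refers back to it by that name
named = re.match(r"<(?P<tag>\w+)>(?P<body>.*)</(?P=tag)>", "<p>text</p>")

print(pet.group(1))        # dog
print(same_tag.group(1))   # h1
print(other_tag)           # None
print(named.group("tag"), named.group("body"))  # p text
```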

Greedy and non-greedy matching

In Python, quantifiers are greedy by default: they always try to match as many characters as possible. Non-greedy quantifiers do the opposite and match as few characters as possible.
Adding ? after *, ?, + or {m,n} turns a greedy quantifier into a non-greedy one.

import re
re.match(r"aa(\d+)","aa2343ddd").group() # greedy mode, matches aa2343
re.match(r"aa(\d+?)","aa2343ddd").group() # non-greedy mode, matches aa2

Other functions in the re module

search(regular expression, string to match)

  • Scans the whole string and returns a match object for the first place where the pattern matches; returns None if nothing matches.
  • match() and search() do essentially the same job. The difference is that match() only matches at the beginning of the string, whereas search() finds the first match anywhere in the string.
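
The difference between match() and search() can be seen in a minimal sketch such as this one; the sample text is made up.

```python
import re

text = "say hello"

# match() only looks at the beginning of the string, so this fails
at_start = re.match(r"hello", text)

# search() scans forward and returns the first match it finds
anywhere = re.search(r"hello", text)

print(at_start)          # None
print(anywhere.span())   # (4, 9)
print(anywhere.group())  # hello
```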

findall(regular expression, string to match)

  • Scans the whole string and returns every match of the pattern as a list; returns an empty list if nothing matches.
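
A short sketch of findall() follows, with illustrative input strings. Note that when the pattern contains groups, the list holds the group contents rather than the full matches.

```python
import re

# Every non-overlapping match is collected into a list of strings
numbers = re.findall(r"\d+", "room 12, floor 3, building 407")

# No match gives an empty list rather than None
nothing = re.findall(r"\d+", "no digits here")

# With two groups in the pattern, each item is a tuple of the groups
pairs = re.findall(r"(\w+)=(\d+)", "a=1, b=22")

print(numbers)  # ['12', '3', '407']
print(nothing)  # []
print(pairs)    # [('a', '1'), ('b', '22')]
```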

sub(regular expression, replacement string, string to match)

  • Replaces every substring matched by the pattern with the replacement string and returns the resulting string.
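
The sketch below shows a few common ways sub() might be called. The input strings and the function double() are hypothetical and only serve as illustration.

```python
import re

# Replace every digit with "*"
masked = re.sub(r"\d", "*", "tel: 555-0199")

# count limits how many replacements are made
first_only = re.sub(r"\d", "*", "tel: 555-0199", count=1)

# The replacement string can refer to groups with \1, \2, ...
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

def double(match):
    """Hypothetical callback: replace a number with twice its value."""
    return str(int(match.group()) * 2)

# The replacement may also be a function that receives each match object
doubled = re.sub(r"\d+", double, "3 apples and 10 pears")

print(masked)      # tel: ***-****
print(first_only)  # tel: *55-0199
print(swapped)     # host@user
print(doubled)     # 6 apples and 20 pears
```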

Match mode

Matching behaviour can be changed through the flags argument: it is the second parameter of re.compile(), the third parameter of re.match, re.search and re.findall, and the fifth parameter of re.sub (it follows count, so it is usually passed as flags=...). The defaults are normally sufficient; flags are only needed for special requirements.

| Modifier | Description |
| --- | --- |
| re.I | Makes matching case-insensitive |
| re.L | Locale-aware matching (in Python 3 it can only be used with bytes patterns) |
| re.M | Multi-line matching; ^ and $ also match at the start and end of each line |
| re.S | Makes . match every character, including newline |
| re.U | Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B and is already the default for str patterns in Python 3 |
| re.X | Verbose mode: whitespace and # comments inside the pattern are ignored, so the regular expression can be laid out to be easier to read |
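
The effect of the most commonly used flags can be sketched as follows. The patterns and strings are illustrative; flags are combined with the | operator.

```python
import re

# re.I: ignore case
ignore_case = re.match(r"hello", "HELLO", re.I)

# re.M: ^ matches at the start of every line, not only the string
line_starts = re.findall(r"^\w+", "one\ntwo\nthree", re.M)

# re.S: "." also matches a newline
dot_default = re.match(r"a.b", "a\nb")
dot_all = re.match(r"a.b", "a\nb", re.S)

# re.X: whitespace and comments inside the pattern are ignored
date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.X)
year_month = date.match("2019-08").groups()

# With re.sub, pass flags by keyword because count comes before it
replaced = re.sub(r"^x", "y", "x\nX", flags=re.I | re.M)

print(ignore_case.group())  # HELLO
print(line_starts)          # ['one', 'two', 'three']
print(dot_default)          # None
print(dot_all is not None)  # True
print(year_month)           # ('2019', '08')
print(repr(replaced))       # 'y\ny'
```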

Origin www.cnblogs.com/lxy0/p/11407150.html