06 Python crawler: the Re (regular expression) library

Regular expressions are a concise way to describe a set of strings.

First, the regular expression syntax

1.1 Common regular expression operators

Operator  Explanation                                                  Example
.         Any single character (except newline)
[ ]       Character set: any single character from the given set       [abc] matches a, b, or c; [a-z] matches any single character from a to z
[^ ]      Negated character set: any single character not in the set   [^abc] matches any single character other than a, b, or c
*         Preceding character repeated zero or more times              abc* matches ab, abc, abcc, abccc, etc.
+         Preceding character repeated one or more times               abc+ matches abc, abcc, abccc, etc.
?         Preceding character repeated zero or one time                abc? matches ab, abc
|         Either the expression on the left or the one on the right    abc|def matches abc, def
{m}       Preceding character repeated exactly m times                 ab{2}c matches abbc
{m,n}     Preceding character repeated m to n times (n included)       ab{1,2}c matches abc, abbc
^         Start of the string                                          ^abc matches abc only at the start of a string
$         End of the string                                            abc$ matches abc only at the end of a string
( )       Grouping; the | operator can be used inside                  (abc) matches abc; (abc|def) matches abc, def
\d        Digit, equivalent to [0-9] for ASCII text
\D        Non-digit
\w        Word character (digit, letter, underscore), equivalent to [A-Za-z0-9_] for ASCII text
\W        Non-word character (not a digit, letter, or underscore)
\s        Whitespace: space, \t, \n
\S        Non-whitespace: not a space, \t, or \n
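
The operators in the table can be checked quickly with re.fullmatch, which succeeds only when the whole string fits the pattern. The following is a minimal sketch; the sample strings and the variable names checks and results are chosen here for illustration and are not part of the original article.

```python
import re

# (pattern, text, expected): does the whole text fit the pattern?
checks = [
    (r"abc*", "abccc", True),        # * : zero or more c
    (r"abc+", "ab", False),          # + : at least one c is required
    (r"abc?", "ab", True),           # ? : the c is optional
    (r"abc|def", "def", True),       # | : either alternative
    (r"ab{2}c", "abbc", True),       # {m} : exactly two b
    (r"ab{1,2}c", "abbbc", False),   # {m,n} : only one or two b allowed
    (r"[^abc]", "d", True),          # [^ ] : any character except a, b, c
]

# Run every pattern and record whether it matched the full text.
results = [(p, s, re.fullmatch(p, s) is not None) for p, s, _ in checks]

for pattern, text, matched in results:
    print(f"{pattern!r:14} {text!r:9} {matched}")
```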

Examples:

. Any character (except newline)

Matches any single character other than a newline.

import re
s = "adasdasdasdasfasf\n\t"
# . : any character (except newline)
print(re.findall(".", s))

['a', 'd', 'a', 's', 'd', 'a', 's', 'd', 'a', 's', 'd', 'a', 's', 'f', 'a', 's', 'f', '\t']

[] Metacharacter (character set)

Matches any one of the characters listed between the brackets, one character at a time.

A range can also be used: [a-z] matches any single character from a to z.

import re
s = "adasdasdasdasfasf\n\t"
# []: match any one of the characters inside the brackets
print(re.findall("[acef]", s))

['a', 'a', 'a', 'a', 'a', 'f', 'a', 'f']

[^] Negated character set

A ^ at the start of [] negates the set: it matches any single character except those listed.

import re
s = "adasdasdasdasfasf\n\t"
# [^]: exclude the characters listed inside []
print(re.findall("[^acef]", s))

['d', 's', 'd', 's', 'd', 's', 'd', 's', 's', '\n', '\t']

* Zero or more repetitions of the preceding character

*: matches the preceding character zero or more times, so the empty string is matched as well.

import re
s = r"abaacaaaaa"
# *: match the character before * zero or more times
print(re.findall("a*", s))     # zero or more a; empty matches are included

['a', '', 'aa', '', 'aaaaa', '']

+ One or more repetitions of the preceding character

+: matches the preceding character one or more times.

import re
s = r"abaacaaaaa"
# +: match the character before + one or more times
print(re.findall("a+", s))     # one or more a

['a', 'aa', 'aaaaa']

? Zero or one repetition of the preceding character

?: matches the preceding character zero times or once.

import re
s = r"abaacaaaaa"
# ?: match the character before ? zero times or once
print(re.findall("a?", s))     # zero or one a

['a', '', 'a', 'a', '', 'a', 'a', 'a', 'a', 'a', '']

| Either the left or the right side

A|B: matches either A or B, so both kinds of match appear in the result.

import re
s = 'abacad'
# A|B: matches of A and matches of B are both returned
print(re.findall('a|b', s))

['a', 'b', 'a', 'a']

{m} Exactly m repetitions of the preceding character

{m}: matches the preceding character exactly m times.

import re
s = r"abaacaaaaa"
# {m}: match the preceding character m times
print(re.findall("a{2}", s))   # exactly two a

['aa', 'aa', 'aa']

{m,n} m to n repetitions of the preceding character (n included)

{m,n}: matches the preceding character between m and n times.

import re
s = r"abaacaaaaa"
# {m,n}: match the preceding character m to n times
print(re.findall("a{2,3}", s))   # two or three a

['aa', 'aaa', 'aa']

^ Metacharacter

If the beginning of the string fits the pattern, it matches; otherwise it does not.

Matches the beginning of the string. In multi-line mode (the re.M flag) it also matches the beginning of each line.

import re
s = '王大炮打炮被大炮打死了 王大炮打炮被大炮打死了'
# ^: match at the beginning
print(re.findall("^王大炮", s))

['王大炮']

$ Metacharacter

If the end of the string fits the pattern, it matches; otherwise it does not.

Matches the end of the string. In multi-line mode (the re.M flag) it also matches the end of each line.

import re
s = '王大炮打炮被大炮打死了 王大炮打炮被大炮打死了'
# $: match at the end
print(re.findall("打死了$", s))

['打死了']

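() Keep only what is inside the parentheses

(): when the pattern contains a group, findall returns only the part captured inside the parentheses.

import re
s = 'abacad'
# (): keep only what the parentheses capture
print(re.findall('a(.)', s))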

['b', 'c', 'd']

\d Matches a single digit (0-9)

\d: matches a single digit.

import re
s = '1#@¥23abc123 \n_def\t456'
# \d: match a single digit (raw string avoids invalid-escape warnings)
print(re.findall(r"\d", s))   # each digit is returned separately

['1', '2', '3', '1', '2', '3', '4', '5', '6']

\D Matches a single non-digit (including \n)

\D: matches a single non-digit character.

import re
s = '1#@¥23abc123 \n_def\t456'
# \D: match a single non-digit
print(re.findall(r"\D", s))   # each non-digit, including \n

['#', '@', '¥', 'a', 'b', 'c', ' ', '\n', '_', 'd', 'e', 'f', '\t']

\w Matches a digit, letter, or underscore

\w: matches a single digit, letter, or underscore.

import re
s = '1#@¥23abc123 \n_def\t456'
# \w: match a digit, letter, or underscore
print(re.findall(r"\w", s))

['1', '2', '3', 'a', 'b', 'c', '1', '2', '3', '_', 'd', 'e', 'f', '4', '5', '6']

\W Matches anything that is not a digit, letter, or underscore

\W: matches a single character that is not a digit, letter, or underscore.

import re
s = '1#@¥23abc123 \n_def\t456'
# \W: match a character that is not a digit, letter, or underscore
print(re.findall(r"\W", s))

['#', '@', '¥', ' ', '\n', '\t']

\s Matches a space, \t, or \n

\s: matches a single whitespace character such as a space, \t, or \n.

import re
s = '1#@¥23abc123 \n_def\t456'
# \s: match a space, \t, or \n
print(re.findall(r"\s", s))

[' ', '\n', '\t']

\S Matches anything that is not a space, \t, or \n

\S: matches a single non-whitespace character.

import re
s = '1#@¥23abc123 \n_def\t456'
# \S: match a character that is not a space, \t, or \n
print(re.findall(r"\S", s))

['1', '#', '@', '¥', '2', '3', 'a', 'b', 'c', '1', '2', '3', '_', 'd', 'e', 'f', '4', '5', '6']

Second, the basic use of the Re library

2.1 Introduction to the Re library

Re is a module of the Python standard library, used mainly for string matching.

Usage: import re

2.2 Main functions of the Re library

Function        Explanation
re.search()     Searches a string for the first position that matches the regular expression; returns a match object
re.match()      Matches the regular expression starting at the beginning of the string; returns a match object
re.findall()    Searches the string and returns all matching substrings as a list
re.split()      Splits the string at every match of the regular expression; returns a list
re.finditer()   Searches the string and returns an iterator over the matches; each element is a match object
re.sub()        Replaces every substring that matches the regular expression and returns the resulting string

re.search(pattern, string, flags=0)

Searches a string for the first position that matches the regular expression and returns a match object (None if nothing matches).

  • pattern: the regular expression, written as a string or a raw string
  • string: the string to be searched
  • flags: control flags that modify how the regular expression is applied
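
As a small illustration of re.search, the sketch below looks for a six-digit number that does not start with 0. The sample text and the variable names are assumptions made up for this example.

```python
import re

text = "BIT 100081 TSU 100084"          # hypothetical sample text
match = re.search(r"[1-9]\d{5}", text)   # first six-digit run not starting with 0

# re.search returns None when nothing matches, so test before using the result.
if match:
    print(match.group(0))   # 100081
```

Only the first occurrence is reported; the second number in the text is ignored.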

re.match(pattern, string, flags=0)

Matches the regular expression starting at the beginning of the string and returns a match object (None if the beginning does not match).

  • pattern: the regular expression, written as a string or a raw string
  • string: the string to be matched
  • flags: control flags that modify how the regular expression is applied
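
The difference from re.search is that re.match only looks at the start of the string. A minimal sketch, with sample strings invented for illustration:

```python
import re

pattern = r"[1-9]\d{5}"

# The digits are not at the start, so re.match finds nothing.
not_at_start = re.match(pattern, "BIT 100081")

# Here the string begins with the digits, so the match succeeds.
at_start = re.match(pattern, "100081 BIT")

print(not_at_start)        # None
print(at_start.group(0))   # 100081
```

Because the result may be None, it is safer to check it before calling .group().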

re.findall(pattern, string, flags=0)

Searches the string and returns all matching substrings as a list.

  • pattern: the regular expression, written as a string or a raw string
  • string: the string to be searched
  • flags: control flags that modify how the regular expression is applied
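
A short sketch of re.findall follows. It also shows that when the pattern contains several groups, each list element becomes a tuple of the captured groups. The sample text is an assumption for illustration.

```python
import re

text = "BIT100081 TSU100084"   # hypothetical sample text

# No groups: the list holds the full matches.
codes = re.findall(r"[1-9]\d{5}", text)

# Two groups: each element is a tuple (letters, digits).
pairs = re.findall(r"([A-Z]+)(\d+)", text)

print(codes)   # ['100081', '100084']
print(pairs)   # [('BIT', '100081'), ('TSU', '100084')]
```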

re.split(pattern, string, maxsplit=0, flags=0)

Splits the string at every match of the regular expression and returns the pieces as a list.

  • pattern: the regular expression, written as a string or a raw string
  • string: the string to be split
  • maxsplit: the maximum number of splits; the remainder of the string is returned as the last element
  • flags: control flags that modify how the regular expression is applied
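
The effect of maxsplit is easiest to see side by side. This is a minimal sketch using an invented sample string:

```python
import re

text = "BIT100081 TSU100084"   # hypothetical sample text
pattern = r"[1-9]\d{5}"

# Split at every match; the trailing empty string is what follows the last match.
all_parts = re.split(pattern, text)

# Split only once; the rest of the string stays in the last element.
one_split = re.split(pattern, text, maxsplit=1)

print(all_parts)   # ['BIT', ' TSU', '']
print(one_split)   # ['BIT', ' TSU100084']
```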

re.finditer(pattern, string, flags=0)

Searches the string and returns an iterator over the matches; each element produced by the iterator is a match object.

  • pattern: the regular expression, written as a string or a raw string
  • string: the string to be searched
  • flags: control flags that modify how the regular expression is applied
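
Since re.finditer yields match objects, the position of every match is available as well as its text. A small sketch with an invented sample string:

```python
import re

text = "BIT100081 TSU100084"   # hypothetical sample text

# Collect the matched text and its (start, end) span for every match.
found = [(m.group(0), m.span()) for m in re.finditer(r"[1-9]\d{5}", text)]

for value, span in found:
    print(value, span)
# 100081 (3, 9)
# 100084 (13, 19)
```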

re.sub(pattern, repl, string, count=0, flags=0)

Replaces every substring that matches the regular expression and returns the string after replacement.

  • pattern: the regular expression, written as a string or a raw string
  • repl: the string that replaces each match
  • string: the string to be searched
  • count: the maximum number of replacements (0 means replace all matches)
  • flags: control flags that modify how the regular expression is applied
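
The following sketch shows re.sub with and without count. The replacement text ":zipcode" and the sample string are assumptions chosen for illustration.

```python
import re

text = "BIT100081 TSU100084"   # hypothetical sample text
pattern = r"[1-9]\d{5}"

# Replace every match.
replaced_all = re.sub(pattern, ":zipcode", text)

# Replace only the first match.
replaced_once = re.sub(pattern, ":zipcode", text, count=1)

print(replaced_all)    # BIT:zipcode TSU:zipcode
print(replaced_once)   # BIT:zipcode TSU100084
```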

2.3 Another, equivalent way to use the Re library

regex = re.compile(pattern, flags=0)

Compiles the string form of a regular expression into a regular expression object.

  • pattern: the regular expression, written as a string or a raw string
  • flags: control flags that modify how the regular expression is applied

Function           Explanation
regex.search()     Searches a string for the first position that matches the regular expression; returns a match object
regex.match()      Matches the regular expression starting at the beginning of the string; returns a match object
regex.findall()    Searches the string and returns all matching substrings as a list
regex.split()      Splits the string at every match of the regular expression; returns a list
regex.finditer()   Searches the string and returns an iterator over the matches; each element is a match object
regex.sub()        Replaces every substring that matches the regular expression and returns the resulting string
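
A compiled object is convenient when the same pattern is applied several times. The sketch below assumes the same invented sample text as a stand-in for real input:

```python
import re

# Compile once, then reuse the object; the pattern is no longer passed to each call.
regex = re.compile(r"[1-9]\d{5}")

text = "BIT100081 TSU100084"   # hypothetical sample text
first = regex.search(text).group(0)
every = regex.findall(text)
masked = regex.sub("******", text)

print(first)    # 100081
print(every)    # ['100081', '100084']
print(masked)   # BIT****** TSU******
```

The calls behave like re.search, re.findall and re.sub with the pattern argument already filled in.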

2.4 Control flags (flags) for regular expressions

Flag   Description
re.I   Ignores case when matching, so [A-Z] also matches lowercase letters
re.L   Makes matching locale-aware (in Python 3 it can only be used with bytes patterns)
re.M   Lets the ^ operator match at the beginning of every line of the string
re.S   Lets the . operator match every character; by default it matches every character except newline
re.U   Interprets characters according to the Unicode character set (the default for str patterns in Python 3); affects \w, \W, \b, \B
re.X   Verbose mode: allows a more flexible layout of the pattern so that regular expressions are easier to read
Third, the Match object of the Re library

A Match object is the result of one match and carries a good deal of information about it.

3.1 Match object attributes

Attribute   Explanation
.string     The text that was searched
.re         The pattern object (regular expression) used for the match
.pos        The position in the text where the search started
.endpos     The position in the text where the search ended
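
The attributes can be inspected on any match object. The sketch below uses a compiled pattern so that a start position can be passed to search; the sample text and the start index 10 are assumptions for illustration.

```python
import re

text = "BIT100081 TSU100084"   # hypothetical sample text
regex = re.compile(r"[1-9]\d{5}")

# Start searching at index 10, so only the second number can be found.
m = regex.search(text, 10)

print(m.string)          # BIT100081 TSU100084
print(m.re.pattern)      # [1-9]\d{5}
print(m.pos, m.endpos)   # 10 19
print(m.group(0))        # 100084
```

Here .endpos equals the length of the text because no end position was given.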

3.2 Match object methods

Method      Explanation
.group(0)   Returns the matched string
.start()    Returns the start position of the matched string in the original string
.end()      Returns the end position of the matched string in the original string
.span()     Returns (.start(), .end())
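
A brief sketch of these methods on one match, using an invented sample string:

```python
import re

text = "BIT100081 TSU100084"   # hypothetical sample text
m = re.search(r"[1-9]\d{5}", text)

print(m.group(0))   # 100081
print(m.start())    # 3
print(m.end())      # 9
print(m.span())     # (3, 9)

# The span can be used to slice the matched text out of the original string.
print(text[m.start():m.end()])   # 100081
```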

Fourth, greedy and minimal matching in the Re library

By default the Re library uses greedy matching, i.e. it returns the longest substring that matches.

4.1 Minimal matching

Operator   Explanation
*?         Preceding character repeated zero or more times, minimal match
+?         Preceding character repeated one or more times, minimal match
??         Preceding character repeated zero or one time, minimal match
{m,n}?     Preceding character repeated m to n times (n included), minimal match

Any operator whose match length can vary can be turned into a minimal match by appending ? to it.
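
To make the table concrete, the sketch below runs the greedy and the minimal form of three operators on the same input. The sample strings and the dictionary name shortest_vs_longest are invented for illustration.

```python
import re

# Each entry maps a pattern to the text it matched in the sample string.
shortest_vs_longest = {
    "a+":      re.search(r"a+", "aaa").group(0),        # greedy: as many a as possible
    "a+?":     re.search(r"a+?", "aaa").group(0),       # minimal: a single a is enough
    "ab?":     re.search(r"ab?", "ab").group(0),        # greedy: takes the optional b
    "ab??":    re.search(r"ab??", "ab").group(0),       # minimal: skips the optional b
    "a{2,3}":  re.search(r"a{2,3}", "aaaa").group(0),   # greedy: three a
    "a{2,3}?": re.search(r"a{2,3}?", "aaaa").group(0),  # minimal: two a
}

for pattern, matched in shortest_vs_longest.items():
    print(f"{pattern:8} -> {matched}")
```

In each pair the minimal form stops at the shortest text that still satisfies the pattern.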

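.* Greedy mode

.*: greedy mode (maximal). After finding a match it keeps going, so the result is as long as possible.

import re
s = 'abbbcabc'
# .*: greedy mode (maximal): keep searching after a match so the result is as long as possible
print(re.findall('a.*c', s))
print(re.findall('a.+c', s))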

['abbbcabc']
['abbbcabc']

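.*? Non-greedy mode (minimal match)

.*?: non-greedy mode (minimal). It stops as soon as a match is found.

import re
s = 'abbbcabc'
# .*?: non-greedy mode (minimal): stop as soon as a match is found
print(re.findall('a.*?c', s))
print(re.findall('a.+?c', s))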

['abbbc', 'abc']
['abbbc', 'abc']

Origin www.cnblogs.com/XuChengNotes/p/11404529.html