Getting Started with Python's Built-in Modules: the re Module

1. The re module

(1) What are regular expressions?

A regular expression is a combination of symbols with special meanings that describes a class of strings. Put another way, a regular expression is a rule that describes a category of things. In Python, regular expressions are built into the language and provided by the re module. A regular expression pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.

Metacharacter   What it matches
\w      Matches a letter (including Chinese characters), a digit, or an underscore
\W      Matches any character that is not a letter (including Chinese characters), a digit, or an underscore
\s      Matches any whitespace character
\S      Matches any non-whitespace character
\d      Matches a digit
\D      Matches a non-digit
\A      Matches at the beginning of the string
\Z      Matches only at the very end of the string
\n      Matches a newline
\t      Matches a tab
^       Matches at the beginning of the string
$       Matches at the end of the string, or just before a newline that ends the string
.       Matches any character except a newline; when the re.DOTALL flag is given, it matches any character, including a newline
[...]   Matches any one character in the set
[^...]  Matches any one character not in the set
*       Matches the preceding expression zero or more times (greedy)
+       Matches the preceding expression one or more times (greedy)
?       Matches the preceding expression zero or one time (greedy); placed after another quantifier, as in *? or +?, it makes that quantifier non-greedy
{n}     Matches the preceding expression exactly n times
{n,m}   Matches the preceding expression from n to m times (greedy)
a|b     Matches a or b
()      Matches the expression inside the parentheses and also defines a group

------------------------------ Matching patterns ------------------------------

import re

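<1> \w matches letters, digits, underscores, and Chinese characters

print(re.findall(r"\w", "小明-Marry_dsb123啸天吃D早餐"))   # \w: letters, digits, underscores, Chinese characters

<2> \W matches anything that is not a letter, digit, underscore, or Chinese character

print(re.findall(r"\W", "小明-Marry_dsb123啸天吃D早餐"))   # \W: anything other than letters, digits, underscores, Chinese characters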
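Whether \w matches Chinese characters depends on the flags in use. As a small sketch (the sample string is shortened from the one above, and the variable names are only illustrative), the re.ASCII flag restricts \w to [a-zA-Z0-9_]:

```python
import re

text = "小明-Marry_dsb123"

# Default for str patterns: \w is Unicode-aware, so Chinese characters match.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w only matches [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)  # ['小明', 'Marry_dsb123']
print(ascii_words)    # ['Marry_dsb123']
```

If only English identifiers should match, passing re.ASCII is usually simpler than spelling out the character set by hand.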

<3> \d matches digits

print(re.findall(r"\d", "十10⑩"))                       # \d matches digits

<4> \D matches non-digits

print(re.findall(r"\D", "十10⑩"))                       # \D matches non-digits
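It may be worth seeing exactly which characters count as a digit. In the sketch below (the full-width digit is an extra character added for illustration), \d matches Unicode decimal digits by default, while the Chinese numeral and the circled number are not decimal digits and are skipped:

```python
import re

# Chinese numeral, ASCII digits, circled number, full-width digit
text = "十10⑩３"

unicode_digits = re.findall(r"\d", text)          # Unicode decimal digits
ascii_digits = re.findall(r"\d", text, re.ASCII)  # only 0-9

print(unicode_digits)  # ['1', '0', '３']
print(ascii_digits)    # ['1', '0']
```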

<5> \A matches at the beginning of the string; ^ is the more common way to write it

print(re.findall(r"\Aa", "asfdasdfasdfalex"))
print(re.findall("^a", "alex"))                        # what the string starts with

<6> \Z matches at the end of the string; $ is the more common way to write it

print(re.findall(r"d\Z", "asfdasdfasdfalex"))          # [] because the string ends with x, not d
print(re.findall("x$", "alex"))                        # what the string ends with
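The two pairs of anchors are not fully interchangeable. The following sketch (sample strings are made up for illustration) shows where they differ: with a trailing newline, and under the re.MULTILINE flag:

```python
import re

text = "alex\n"        # ends with a newline
lines = "alex\nalex"   # two lines

dollar = re.findall(r"x$", text)     # $ also matches just before a trailing newline
upper_z = re.findall(r"x\Z", text)   # \Z matches only at the very end

caret_multi = re.findall(r"^a", lines, re.MULTILINE)     # ^ matches at every line start
upper_a_multi = re.findall(r"\Aa", lines, re.MULTILINE)  # \A still matches only at the string start

print(dollar, upper_z)             # ['x'] []
print(caret_multi, upper_a_multi)  # ['a', 'a'] ['a']
```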

<7> \n matches a newline

print(re.findall(r"\n", "alex\nwusir"))

<8> \t matches a tab

print(re.findall(r"\t", "alex\twusir"))

<9> A literal string matches itself

print(re.findall("alex", "alex\twusiralex"))

<10> [...] matches any one character in the set

print(re.findall('[0-9]', "小明-Marry_dsb123啸天吃D早餐"))
print(re.findall('[a-z]', "小明-Marry_dsb123啸天吃D早餐"))
print(re.findall('[A-Z]', "小明-Marry_dsb123啸天吃D早餐"))

<11> [^...] matches any one character not in the set

print(re.findall("[^0-9a-z]", "123alex456"))           # [] here: every character is a digit or a lowercase letter

<12> * matches the character to its left zero or more times (greedy)

print(re.findall("a*", "marry,aa,aaaa,bbbbaaa,aaabbbaaa"))    # the character left of * zero or more times; greedy
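Because * accepts zero repetitions, the result list contains empty strings at every position where no "a" is found. A minimal sketch on a shorter, made-up string makes this easier to see, with + shown for comparison:

```python
import re

text = "baab"

star = re.findall(r"a*", text)   # zero-length matches are reported too
plus = re.findall(r"a+", text)   # at least one "a" is required

print(star)  # ['', 'aa', '', ''] on Python 3.7+
print(plus)  # ['aa']
```

When the empty strings are not wanted, + is usually the better choice.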

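<13> + matches the character to its left one or more times (greedy)

print(re.findall("a+", "alex,aa,aaaa,bbbbaaa,aaabbbaaa"))  # the character left of + one or more times; greedy

<14> ? matches the character to its left zero or one time (greedy; written after another quantifier, as in +?, it makes that quantifier non-greedy)

print(re.findall("a?", "alex,aa,aaaa,bbbbaaa,aaabbbaaa"))  # the character left of ? zero or one time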
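On its own, ? is a greedy quantifier: it takes the optional character whenever it is present. The non-greedy behaviour comes from writing a second ? after a quantifier. A small sketch with a made-up string:

```python
import re

text = "abb a"

greedy = re.findall(r"ab?", text)   # ? takes the optional "b" when it is there
lazy = re.findall(r"ab??", text)    # ?? prefers to take nothing

print(greedy)  # ['ab', 'a']
print(lazy)    # ['a', 'a']
```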

<15> {n} matches the preceding expression exactly n times

print(re.findall("[0-9]{11}", "18612239999,18612239998,136133333323")) # specify exactly how many repetitions to match
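Note that {11} only fixes the length of the match, not the length of the number it sits in: the 12-digit value above still yields its first 11 digits. One possible way to require exactly 11 digits, sketched here with the word-boundary anchor \b (not covered in the table above), is:

```python
import re

numbers = "18612239999,18612239998,136133333323"

loose = re.findall(r"[0-9]{11}", numbers)        # matches inside longer numbers too
strict = re.findall(r"\b[0-9]{11}\b", numbers)   # \b requires a boundary on both sides

print(loose)   # ['18612239999', '18612239998', '13613333332']
print(strict)  # ['18612239999', '18612239998']
```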

<16> {n,m} matches the preceding expression from n to m times (greedy)

print(re.findall("a{3,8}", "alex,aaaabbbaaaaabbbbbbaaa,aaaaaaaaabb,ccccddddaaaaaaaa"))

<17> a|b matches a or b

print(re.findall("a|b", "alexdsb"))

<18> () matches the expression inside the parentheses and also defines a group

print(re.findall("<a>(.+)</a>", "<a>alex</a> <a>wusir</a>"))    # grouping: findall returns only the group's content
print(re.findall("<a>(.+?)</a>", "<a>alex</a> <a>wusir</a>"))   # .+? makes the match non-greedy
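Groups change what findall returns. As a sketch (the sample strings are illustrative), a pattern with two groups produces tuples, \1 refers back to the first group, and (?:...) groups without capturing:

```python
import re

html = "<a>alex</a> <b>wusir</b>"

# Two groups: findall returns one tuple per match.
# \1 refers back to whatever the first group matched.
pairs = re.findall(r"<(\w+)>(.+?)</\1>", html)

# (?:...) groups without capturing, so findall returns the whole match.
repeats = re.findall(r"(?:ab)+", "ababab ab")

print(pairs)    # [('a', 'alex'), ('b', 'wusir')]
print(repeats)  # ['ababab', 'ab']
```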

<19> . matches any character except a newline; with re.DOTALL it also matches a newline

print(re.findall("a.c", "abc,aec,a\nc,a,c"))           # any single character (except \n)
print(re.findall("a.c", "abc,aec,a\nc,a,c", re.DOTALL))

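<20> \. matches a literal dot; once escaped, the dot no longer means "any character"

s = "1-2*(60+(-40.35/5)-(-4*3))"   # sample input assumed for illustration; the original does not define s here
print(re.findall(r"-\d+\.\d+|-[0-9]|\d+", s))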
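The difference between the escaped and unescaped dot is easiest to see side by side. This is a minimal sketch with a made-up string; re.escape is shown as one way to build an escaped pattern from literal text:

```python
import re

text = "1.5 105 1x5"

any_char = re.findall(r"1.5", text)   # unescaped dot: any character
literal = re.findall(r"1\.5", text)   # escaped dot: only "."
escaped = re.escape("1.5")            # escapes the special characters for you

print(any_char)  # ['1.5', '105', '1x5']
print(literal)   # ['1.5']
print(escaped)   # 1\.5
```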

<21> \s matches whitespace

print(re.findall(r"\s", "alex\tdsbrimocjb"))            # \s matches whitespace

<22> \S matches non-whitespace

print(re.findall(r"\S", "alex\tdsbrimocjb"))            # \S matches non-whitespace

Question:

Given the string 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb', find everything that is followed by _sb.

Answer:

s = 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb'
print(re.findall("(.+?)_sb", s))
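One detail of this answer is that . also matches the space between items, so every result after the first starts with a space. A possible variant, sketched here as an alternative rather than as the author's answer, uses \w+ so that the separators are never captured:

```python
import re

s = 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb'

with_dot = re.findall(r"(.+?)_sb", s)   # . also matches the separating space
with_word = re.findall(r"(\w+)_sb", s)  # \w never matches a space

print(with_dot)   # ['alex', ' ale123', ' wu12sir', ' wusir', ' ritian']
print(with_word)  # ['alex', 'ale123', 'wu12sir', 'wusir', 'ritian']
```

The greedy \w+ first swallows the whole item, including _sb, and then backtracks until the literal _sb can match.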

------------------------------ Common methods ------------------------------

(1) findall - finds every match and returns them as a list

print(re.findall("alex", "alexdsb,alex_sb,alexnb,al_ex"))

(2) search - scans the string and stops at the first match, wherever it is; it returns a match object (or None if nothing matches), and the matched content is obtained with .group()

print(re.search("a.+", "lexaaaa,bssssaaaasa,saaasaasa").group())

(3) match - matches only from the beginning of the string

print(re.match("a.+", "alexalexaaa,bssssaaaasa,saaasaasa").group())

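Question

What is the difference between search and match?

search looks for a match anywhere in the string.

match looks only at the beginning of the string; if the pattern does not match there, it does not look any further.

Both return a match object, and .group() is used to view the matched content.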
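The difference shows up as soon as the string does not start with the pattern. In the sketch below (the sample string is shortened for illustration), match returns None, so .group() should only be called after checking the result; re.fullmatch is included as the variant that requires the whole string to match:

```python
import re

text = "lexaaaa,bssss"

found = re.search(r"a.+", text)        # scans forward until a match is found
at_start = re.match(r"a.+", text)      # None: the string starts with "l"
whole = re.fullmatch(r"\w+", "alex")   # the whole string must match

# Check for None before calling .group(); calling it on None raises AttributeError.
print(found.group() if found else None)        # aaaa,bssss
print(at_start.group() if at_start else None)  # None
print(whole.group() if whole else None)        # alex
```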

(4) split - splits the string; put the delimiters in a character set [] to split on any of them

print(re.split("[:;,.!#]","alex:dsb#wusir.djb"))
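A few variations on split may be useful here. This is a sketch using the same sample string; it shows that a single delimiter works without [], that a capturing group keeps the delimiters in the result, and that maxsplit limits the number of splits:

```python
import re

text = "alex:dsb#wusir.djb"

plain = re.split(":", "alex:dsb")                   # a single delimiter needs no []
parts = re.split(r"[:#.]", text)                    # split on any of the three
with_delims = re.split(r"([:#.])", text)            # a group keeps the delimiters
first_only = re.split(r"[:#.]", text, maxsplit=1)   # stop after the first split

print(plain)        # ['alex', 'dsb']
print(parts)        # ['alex', 'dsb', 'wusir', 'djb']
print(with_delims)  # ['alex', ':', 'dsb', '#', 'wusir', '.', 'djb']
print(first_only)   # ['alex', 'dsb#wusir.djb']
```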

(5) sub - replaces matches; count limits how many are replaced

s = "alex:dsb#wusir.djb"
print(re.sub("d","e",s,count=1))
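Beyond a fixed replacement string, sub also accepts group references and a function. The sketch below (the extra sample strings are illustrative) shows both, along with re.subn, which additionally reports how many replacements were made:

```python
import re

s = "alex:dsb#wusir.djb"

# subn returns the new string and the number of replacements.
replaced, count = re.subn("d", "e", s)

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+):(\w+)", r"\2:\1", "alex:dsb")

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1b22")

print(replaced, count)  # alex:esb#wusir.ejb 2
print(swapped)          # dsb:alex
print(doubled)          # a2b44
```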

(6) compile - defines a matching rule in advance so it can be reused

s = re.compile(r"\w")
print(s.findall("alex:dsb#wusir.djb"))
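A compiled pattern exposes the same methods as the module-level functions, and any flags are fixed when it is compiled. A short sketch, with hypothetical pattern names phone and tag:

```python
import re

phone = re.compile(r"\d{11}")                     # compile once, reuse many times
tag = re.compile(r"<A>(.+?)</A>", re.IGNORECASE)  # flags are given at compile time

all_phones = phone.findall("18612239999,18612239998")
first_phone = phone.search("call 18612239999").group()
tag_content = tag.findall("<a>alex</a>")

print(all_phones)     # ['18612239999', '18612239998']
print(first_phone)    # 18612239999
print(tag_content)    # ['alex']
print(phone.pattern)  # \d{11}
```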

(7) finditer - returns an iterator of match objects

s = re.finditer(r"\w", "alex:dsb#wusir.djb")   # what comes back is an iterator
print(next(s).group())
print(next(s).group())
for i in s:
    print(i.group())
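Since finditer yields match objects rather than strings, each result also carries its position. As a sketch on the same sample string, using \w+ instead of \w so that whole words are matched:

```python
import re

text = "alex:dsb#wusir.djb"

# Each match object knows what it matched and where.
spans = [(m.group(), m.span()) for m in re.finditer(r"\w+", text)]

print(spans)
# [('alex', (0, 4)), ('dsb', (5, 8)), ('wusir', (9, 14)), ('djb', (15, 18))]
```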

(8) search - naming a group with ?P<name>

ret = re.search(r"<(?P<tag_name>\w+)>\w+</\w+>", "<h1>hello</h1>")                # name only the tag
ret = re.search(r"<(?P<tag_name>\w+)>(?P<content>\w+)</\w+>", "<h1>hello</h1>")   # name both the tag and the content
print(ret.group("tag_name"))
print(ret.group("content"))
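Named groups can also be referenced inside the pattern itself. The following sketch is a variation on the pattern above: it uses (?P=tag_name) so that the closing tag must repeat the opening one, and groupdict() to collect all named groups at once:

```python
import re

# (?P=tag_name) must match the same text that the tag_name group matched.
pattern = r"<(?P<tag_name>\w+)>(?P<content>\w+)</(?P=tag_name)>"

ok = re.search(pattern, "<h1>hello</h1>")
bad = re.search(pattern, "<h1>hello</h2>")   # closing tag differs

print(ok.groupdict())  # {'tag_name': 'h1', 'content': 'hello'}
print(bad)             # None
```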

Origin www.cnblogs.com/caiyongliang/p/11539822.html