Getting Started with Python's Built-in Modules: the re Module
1. The re module
(1) What is a regular expression?
A regular expression is a string that describes a pattern of text. It is built from ordinary characters and from symbols that have special meanings. Put another way, a regular expression is a rule that describes a class of strings. Python supports regular expressions through the built-in re module. Each pattern is compiled into a series of bytecodes, which a matching engine written in C then executes.
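The snippet below is a minimal sketch of the "rule that describes a class of strings" idea. It compiles the pattern once and then tests each candidate string against it. The pattern (three digits, a hyphen, four digits) and the sample strings are made up for illustration. re.compile and the pattern object's fullmatch method are part of the standard re module.

```python
import re

# Hypothetical rule: "three digits, a hyphen, four digits".
# The r"" (raw string) prefix stops Python from treating backslashes as string escapes.
pattern = re.compile(r"\d{3}-\d{4}")

candidates = ["555-1234", "5551234", "555-12345"]

# fullmatch() succeeds only if the WHOLE string belongs to the class the pattern describes
members = [c for c in candidates if pattern.fullmatch(c)]
print(members)  # ['555-1234']
```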
Metacharacter | What it matches |
---|---|
\w | A letter (including Chinese characters), a digit, or an underscore |
\W | Any character that is not a letter (including Chinese characters), a digit, or an underscore |
\s | Any whitespace character |
\S | Any non-whitespace character |
\d | A decimal digit |
\D | Any character that is not a decimal digit |
\A | The start of the string only |
\Z | The end of the string only |
\n | A newline |
\t | A tab |
^ | The start of the string |
$ | The end of the string, or just before a newline at the end of the string |
. | Any character except a newline; with the re.DOTALL flag, any character including a newline |
[...] | Any one character in the set |
[^...] | Any one character not in the set |
* | Zero or more repetitions of the preceding item (greedy) |
+ | One or more repetitions of the preceding item (greedy) |
? | Zero or one repetition of the preceding item (greedy); placed after another quantifier, as in *? or +?, it makes that quantifier non-greedy |
{n} | Exactly n repetitions of the preceding item |
{n,m} | Between n and m repetitions of the preceding item (greedy) |
a\|b | Either a or b |
() | The expression inside the parentheses, captured as a group |
---------------------------------------- Matching patterns ----------------------------------------
import re
<1> \w: letters, digits, underscores, and Chinese characters
print(re.findall(r"\w", "小明-Marry_dsb123啸天吃D早餐"))  # \w matches letters, digits, underscores, and Chinese characters
<2> \W: anything except letters, digits, underscores, and Chinese characters
print(re.findall(r"\W", "小明-Marry_dsb123啸天吃D早餐"))  # \W matches anything that is not a letter, digit, underscore, or Chinese character
<3> \d matches digits
print(re.findall(r"\d", "十10⑩"))  # \d matches digits
<4> \D matches non-digits
print(re.findall(r"\D", "十10⑩"))  # \D matches non-digits
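In the \d examples, only 1 and 0 in "十10⑩" count as digits. The sketch below shows what \d does with other Unicode digits. It assumes Python 3 and str patterns. The full-width digits in the sample string are added for illustration. re.ASCII is the standard flag that narrows \d to 0-9.

```python
import re

# In Python 3, \d in a str pattern matches any Unicode decimal digit (category Nd),
# not just 0-9. "十" (a CJK numeral) and "⑩" (a circled number) are not decimal digits.
text = "十10⑩１２"  # the last two characters are full-width digits U+FF11 and U+FF12
print(re.findall(r"\d", text))            # ['1', '0', '１', '２']
# The re.ASCII flag restricts \d to the ASCII digits 0-9
print(re.findall(r"\d", text, re.ASCII))  # ['1', '0']
```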
<5> \A matches at the start of the string; ^ is the more common way to write this
print(re.findall(r"\Aa", "asfdasdfasdfalex"))
print(re.findall("^a", "alex"))  # matches what the string starts with
<6> \Z matches at the end of the string; $ is more common, though it also matches just before a trailing newline
print(re.findall(r"x\Z", "asfdasdfasdfalex"))
print(re.findall("x$", "alex"))  # matches what the string ends with
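In Python, \Z and $ are close but not identical. The sketch below uses a made-up string with a trailing newline, which is the one case where they differ:

```python
import re

s = "alex\n"  # made-up string that ends with a newline

# $ also matches just before a newline at the very end of the string
print(re.findall(r"x$", s))   # ['x']
# \Z matches only at the absolute end, so the "x" followed by "\n" is not matched
print(re.findall(r"x\Z", s))  # []
```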
<7> \n matches a newline
print(re.findall("\n","alex\nwusir"))
<8> \t matches a tab
print(re.findall("\t","alex\twusir"))
<9> An ordinary string matches that exact text
print(re.findall("alex","alex\twusiralex"))
<10> [...] matches any one character in the set
print(re.findall('[0-9]',"小明-Marry_dsb123啸天吃D早餐"))
print(re.findall('[a-z]',"小明-Marry_dsb123啸天吃D早餐"))
print(re.findall('[A-Z]',"小明-Marry_dsb123啸天吃D早餐"))
<11> [^...] matches any one character not in the set
print(re.findall("[^0-9a-z]","123alex456"))
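Every character in "123alex456" is a digit or a lowercase letter, so the example above returns an empty list. The sketch below uses a sample string, made up for illustration, that also contains other characters. The negated set picks those characters out:

```python
import re

# [^0-9a-z] matches any single character that is NOT a digit or a lowercase letter
print(re.findall(r"[^0-9a-z]", "123-Alex_456!"))  # ['-', 'A', '_', '!']
```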
<12> * matches the character to its left zero or more times (greedy)
print(re.findall("a*","marry,aa,aaaa,bbbbaaa,aaabbbaaa"))  # matches the character left of * zero or more times, greedily
<13> + matches the character to its left one or more times (greedy)
print(re.findall("a+","alex,aa,aaaa,bbbbaaa,aaabbbaaa"))  # matches the character left of + one or more times, greedily
<14> ? matches the character to its left zero times or once (greedy)
print(re.findall("a?","alex,aa,aaaa,bbbbaaa,aaabbbaaa"))  # matches the character left of ? zero times or once, taking it when it can
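All of the quantifiers above are greedy. Adding ? after a quantifier switches it to non-greedy (lazy) matching, which takes as few characters as possible. The sketch below shows this; the string "aaa" is just an illustration.

```python
import re

s = "aaa"
# Plain quantifiers are greedy: they take as many characters as they can
print(repr(re.search(r"a?", s).group()))   # 'a'
print(re.findall(r"a+", s))                # ['aaa']
# A ? after a quantifier makes it lazy: it takes as few characters as it can
print(repr(re.search(r"a??", s).group()))  # '' (the empty string)
print(re.findall(r"a+?", s))               # ['a', 'a', 'a']
```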
<15> {n} matches the preceding expression exactly n times
print(re.findall("[0-9]{11}","18612239999,18612239998,136133333323"))  # exactly 11 digits per match; the 12-digit number still yields its first 11
<16> {n,m} matches the preceding expression at least n and at most m times (greedy)
print(re.findall("a{3,8}","alex,aaaabbbaaaaabbbbbbaaa,aaaaaaaaabb,ccccddddaaaaaaaa"))
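{n,m} is greedy in the same way. The sketch below uses a made-up run of ten a's to compare the greedy form with the lazy form {n,m}?:

```python
import re

s = "a" * 10  # ten a's
# {3,8} is greedy: the first match takes 8 a's, and the remaining 2 are too few to match again
print(re.findall(r"a{3,8}", s))   # ['aaaaaaaa']
# {3,8}? is lazy: each match takes only 3 a's, and the last single "a" is left over
print(re.findall(r"a{3,8}?", s))  # ['aaa', 'aaa', 'aaa']
```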
<17> a|b matches either a or b
print(re.findall("a|b","alexdsb"))
<18> () matches the expression inside the parentheses and also defines a group
print(re.findall("<a>(.+)</a>","<a>alex</a> <a>wusir</a>"))  # grouping; greedy .+ gives ['alex</a> <a>wusir']
print(re.findall("<a>(.+?)</a>","<a>alex</a> <a>wusir</a>"))  # ? after + makes it non-greedy: ['alex', 'wusir']
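What findall returns depends on how many groups the pattern has:

- With no groups, it returns the whole matches.
- With one group, it returns that group's text.
- With two or more groups, it returns tuples.

The sketch below shows each case. The tags and text are made up for illustration.

```python
import re

s = "<a>alex</a> <b>wusir</b>"
# No groups: each item is the whole match
print(re.findall(r"<\w+>.+?</\w+>", s))      # ['<a>alex</a>', '<b>wusir</b>']
# One group: each item is that group's text
print(re.findall(r"<\w+>(.+?)</\w+>", s))    # ['alex', 'wusir']
# Two groups: each item is a tuple with one element per group
print(re.findall(r"<(\w+)>(.+?)</\w+>", s))  # [('a', 'alex'), ('b', 'wusir')]
```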
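<19> . matches any character except a newline; with re.DOTALL it matches a newline too
print(re.findall("a.c","abc,aec,a\nc,a,c"))  # matches any single character (except \n)
print(re.findall("a.c","abc,aec,a\nc,a,c",re.DOTALL))
<20> \. (an escaped dot) has no special meaning and matches a literal "."
s = "-1.5+2-3*40"  # hypothetical example string; the original does not show the value of s
print(re.findall(r"-\d+\.\d+|-[0-9]|\d+", s))  # ['-1.5', '2', '-3', '40']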
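Because . matches any character, a literal dot must be written as \. wherever a real decimal point is meant. The sketch below uses a made-up string:

```python
import re

s = "3.14 and 3x14"
# An unescaped . matches any character, so "3x14" matches as well
print(re.findall(r"\d+.\d+", s))   # ['3.14', '3x14']
# \. matches only a literal dot
print(re.findall(r"\d+\.\d+", s))  # ['3.14']
```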
<21> \s matches a whitespace character
print(re.findall(r"\s", "alex\tdsbrimocjb"))  # \s matches whitespace
<22> \S matches a non-whitespace character
print(re.findall(r"\S", "alex\tdsbrimocjb"))  # \S matches non-whitespace
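\s matches more than the space character. The quick sketch below runs on a made-up string that contains a space, a tab, and a newline:

```python
import re

# \s matches any whitespace character: space, tab, newline, and others
print(re.findall(r"\s", "a b\tc\nd"))  # [' ', '\t', '\n']
# \S matches everything else
print(re.findall(r"\S", "a b\tc\nd"))  # ['a', 'b', 'c', 'd']
```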
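Exercise:
Given the string 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb', find every name that is followed by _sb (alex, wusir, and so on).
Answer:
s = 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb'
print(re.findall(r"(\w+?)_sb", s))  # ['alex', 'ale123', 'wu12sir', 'wusir', 'ritian']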
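The result of this exercise depends on what the group is allowed to match. The sketch below compares two candidate patterns. With (.+?)_sb, each match after the first begins at the space left over from the previous match, so those results carry a leading space. \w excludes the space, so the names come out clean.

```python
import re

s = 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb'
# . also matches the space, so every result after the first starts with " "
print(re.findall(r"(.+?)_sb", s))   # ['alex', ' ale123', ' wu12sir', ' wusir', ' ritian']
# \w does not match the space, so the names have no leading space
print(re.findall(r"(\w+?)_sb", s))  # ['alex', 'ale123', 'wu12sir', 'wusir', 'ritian']
```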
---------------------------------------- Common functions ----------------------------------------
(1) findall: finds all matches and returns them as a list
print(re.findall("alex","alexdsb,alex_sb,alexnb,al_ex"))
(2) search: scans the string and stops at the first match found anywhere. It returns a match object (or None if nothing matches); call .group() on it to get the matched text
print(re.search("a.+","lexaaaa,bssssaaaasa,saaasaasa").group())
(3) match: matches only at the beginning of the string
print(re.match("a.+","alexalexaaa,bssssaaaasa,saaasaasa").group())
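Question:
What is the difference between search and match?
search looks for a match anywhere in the string.
match only tries at the beginning of the string; if the string does not start with a match, it does not keep looking and returns None.
When they succeed, both return a match object; call .group() on it to see the matched text.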
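Both search and match return None when they find nothing, so calling .group() directly raises AttributeError on a miss. The defensive sketch below uses a made-up string that contains a match but does not start with one:

```python
import re

s = "lexalex"
found = re.search(r"a\w+", s)    # scans the whole string
anchored = re.match(r"a\w+", s)  # only tries at position 0

# Check for None before calling .group(); .group() on None raises AttributeError
print(found.group() if found else None)        # alex
print(anchored.group() if anchored else None)  # None
```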
(4) split: splits the string wherever the pattern matches; a character set [...] is a convenient way to list several delimiters
print(re.split("[:;,.!#]","alex:dsb#wusir.djb"))
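re.split has two further options, sketched below on the same string:

- A capturing group in the pattern keeps the delimiters in the result.
- The maxsplit keyword limits how many splits are made.

```python
import re

s = "alex:dsb#wusir.djb"
# A capturing group keeps the delimiters in the result (. is literal inside [...])
print(re.split(r"([:#.])", s))            # ['alex', ':', 'dsb', '#', 'wusir', '.', 'djb']
# maxsplit=1 stops after the first split
print(re.split(r"[:#.]", s, maxsplit=1))  # ['alex', 'dsb#wusir.djb']
```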
(5) sub: replaces matches; count=1 limits it to the first replacement
s = "alex:dsb#wusir.djb"
print(re.sub("d","e",s,count=1))
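Without count, sub replaces every match. The sketch below shows three things:

- sub with no count limit.
- re.subn, which also reports the number of replacements.
- A function used as the replacement.

The "a1b22" string and the doubling lambda are illustrative.

```python
import re

s = "alex:dsb#wusir.djb"
# Without count, every match is replaced
print(re.sub("d", "e", s))   # alex:esb#wusir.ejb
# subn returns (new_string, number_of_replacements)
print(re.subn("d", "e", s))  # ('alex:esb#wusir.ejb', 2)
# The replacement can be a function that receives each match object
print(re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1b22"))  # a2b44
```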
(6) compile: compiles a matching rule into a reusable pattern object
s = re.compile(r"\w")
print(s.findall("alex:dsb#wusir.djb"))
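A compiled pattern object offers the same methods as the module-level functions, so the pattern is written once and then reused. A sketch:

```python
import re

# Compile once, then call the pattern object's methods
word = re.compile(r"\w+")
s = "alex:dsb#wusir.djb"
print(word.findall(s))         # ['alex', 'dsb', 'wusir', 'djb']
print(word.search(s).group())  # alex
print(word.sub("*", s))        # *:*#*.*
print(word.pattern)            # \w+
```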
(7) finditer: returns an iterator of match objects
s = re.finditer(r"\w", "alex:dsb#wusir.djb")  # returns an iterator
print(next(s).group())
print(next(s).group())
for i in s:  # continues from the third match
    print(i.group())
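Besides .group(), each match object from finditer records where it was found. A sketch on the same string:

```python
import re

# Each item from finditer is a match object that also knows its position
for m in re.finditer(r"\w+", "alex:dsb#wusir.djb"):
    print(m.group(), m.start(), m.end(), m.span())
# alex 0 4 (0, 4)
# dsb 5 8 (5, 8)
# wusir 9 14 (9, 14)
# djb 15 18 (15, 18)
```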
(8) search with named groups: (?P<name>...)
ret = re.search(r"<(?P<tag_name>\w+)>\w+</\w+>", "<h1>hello</h1>")  # one named group
ret = re.search(r"<(?P<tag_name>\w+)>(?P<content>\w+)</\w+>", "<h1>hello</h1>")  # two named groups
print(ret.group("tag_name"))
print(ret.group("content"))
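Named groups are still numbered from left to right, and groupdict() collects all of them at once. A sketch reusing the pattern above:

```python
import re

ret = re.search(r"<(?P<tag_name>\w+)>(?P<content>\w+)</\w+>", "<h1>hello</h1>")
# Named groups can also be read by position
print(ret.group(1), ret.group(2))  # h1 hello
# groupdict() returns every named group as a dict
print(ret.groupdict())             # {'tag_name': 'h1', 'content': 'hello'}
```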