(Python) The re regular expression module

    A regular expression is a special sequence of characters that lets us check whether a string matches a given pattern. Python has included the re module since version 1.5; it provides Perl-style regular expression patterns (Perl being a high-level programming language) and brings full regular expression capabilities to the Python language.

1. findall (most commonly used)

    Finds all non-overlapping substrings of a string that match a regular expression and returns them as a list, or an empty list if there are no matches.

The syntax format is:

        re.findall(pattern, string, flags=0)

        pattern: the regular expression to match

        string: the string to match

        flags: Flag bits used to control how regular expressions are matched.
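
A minimal sketch of the call described above; the sample text and variable names are made up for illustration. It shows that findall returns every match as a list, that the flags argument changes how the pattern is applied, and that no match gives an empty list rather than None.

```python
import re

text = "Apple pie, apple juice, APPLE tart"  # sample text, assumed for illustration

# Without flags the match is case-sensitive.
exact = re.findall(r"apple", text)

# re.IGNORECASE is passed through the flags parameter.
any_case = re.findall(r"apple", text, flags=re.IGNORECASE)

# No match yields an empty list rather than None.
missing = re.findall(r"banana", text)

print(exact)     # ['apple']
print(any_case)  # ['Apple', 'apple', 'APPLE']
print(missing)   # []
```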

2. search

    Scans the string and matches only the first substring that matches the regular expression. If there is a match, a match object is returned (displayed as _sre.SRE_Match in older Python versions and as re.Match in recent ones), and the matched text can be retrieved with group(). Returns None if there is no match.
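
The following sketch illustrates this behaviour with re.search(pattern, string, flags=0); the sample text is an assumption for illustration only. Because the result may be None, it is checked before group() is called.

```python
import re

text = "order 66 shipped, order 77 pending"  # sample text, assumed for illustration

m = re.search(r"\d+", text)
if m is not None:
    print(m.group())  # '66'; only the first match is reported
    print(m.span())   # (6, 8)

# The text contains no uppercase letter, so there is nothing to match.
no_match = re.search(r"[A-Z]", text)
print(no_match)  # None
```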



3. match

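    Tries to match the regular expression only at the beginning of the string, starting from the first character on the left. If it matches, a match object is returned (displayed as _sre.SRE_Match in older Python versions and as re.Match in recent ones), and the matched text can be retrieved with group(). Returns None if there is no match.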



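It is equivalent to calling search with the pattern anchored to the start of the string by ^ (as long as re.MULTILINE is not set).

If the string does not match starting from its first character, None is returned.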
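
A small sketch comparing the two functions on an assumed sample string. It relies on the default flags; with re.MULTILINE, ^ would also match after each newline, so the comparison would no longer hold.

```python
import re

text = "hello world"  # sample text, assumed for illustration

at_start = re.match(r"hello", text)          # the pattern is found at position 0
not_at_start = re.match(r"world", text)      # 'world' exists, but not at the start
anchored = re.search(r"^world", text)        # search anchored with ^ behaves the same way
found_anywhere = re.search(r"world", text)   # unanchored search still finds it

print(at_start.group())        # 'hello'
print(not_at_start)            # None
print(anchored)                # None
print(found_anywhere.group())  # 'world'
```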


4. sub (less used)

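Replaces the substrings of a string that match the regular expression. The syntax format is re.sub(pattern, repl, string, count=0, flags=0).

repl: the replacement string.

count: the maximum number of replacements to make; the default, 0, replaces every match.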
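
As a hedged illustration of repl and count, the sketch below uses a made-up date string. The last call also shows that repl can refer to captured groups with \1, \2, and so on.

```python
import re

text = "2023-01-15 and 2024-12-31"  # sample text, assumed for illustration

# Replace every run of digits.
all_replaced = re.sub(r"\d+", "N", text)

# count=2 limits the replacement to the first two matches.
first_two = re.sub(r"\d+", "N", text, count=2)

# repl may refer to captured groups with \1, \2, ...
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

print(all_replaced)  # 'N-N-N and N-N-N'
print(first_two)     # 'N-N-15 and 2024-12-31'
print(swapped)       # '15/01/2023 and 31/12/2024'
```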


5. split

Splits the string at every substring that matches the regular expression and returns the pieces as a list.
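
A minimal sketch of re.split(pattern, string, maxsplit=0, flags=0) on assumed sample strings. It also shows two details that are easy to miss: maxsplit limits the number of splits, and a capturing group in the pattern keeps the separators in the result.

```python
import re

text = "a, b;c   d"  # sample text, assumed for illustration

# Split on a comma, semicolon, or whitespace character, plus any whitespace that follows it.
parts = re.split(r"[,;\s]\s*", text)

# maxsplit=1 stops after the first split; the remainder is returned unsplit.
head_tail = re.split(r"[,;\s]\s*", text, maxsplit=1)

# If the pattern contains a capturing group, the separators are kept in the result.
with_seps = re.split(r"([,;])", "a,b;c")

print(parts)      # ['a', 'b', 'c', 'd']
print(head_tail)  # ['a', 'b;c   d']
print(with_seps)  # ['a', ',', 'b', ';', 'c']
```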


6. compile

compile turns a regular expression into a pattern object, so that the same expression can be reused without being written out again.
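
A short sketch of reuse, with made-up sample data. The pattern object returned by re.compile exposes the same operations (findall, search, match, sub, split) as methods. The re module also caches recently used patterns internally, so the gain is mostly in readability when one pattern is used in many places.

```python
import re

# Compile once; the pattern object offers the module-level functions as methods.
word = re.compile(r"[a-z]+_sb")

lines = ["alex_sb here", "nothing", "lxx_sb and wxx_sb"]  # sample data, assumed for illustration
hits = [word.findall(line) for line in lines]
print(hits)  # [['alex_sb'], [], ['lxx_sb', 'wxx_sb']]

first = word.search(lines[2])
print(first.group())  # 'lxx_sb'
```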



Here are some common regular expression patterns:

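Pattern         Description
^               Matches the beginning of the string.
$               Matches the end of the string, or just before a newline at the end of the string.
.               Matches any character except a newline; when the re.DOTALL flag is specified, it matches any character including a newline.
[...]           Represents a set of characters listed individually: [amk] matches 'a', 'm', or 'k'.
[^...]          Matches a character not in the set: [^abc] matches any character other than a, b, or c.
*               Matches 0 or more repetitions of the preceding expression.
+               Matches 1 or more repetitions of the preceding expression.
?               Matches 0 or 1 occurrence of the preceding expression (greedy). Placed after another quantifier, as in *?, +?, or {n,m}?, it makes that quantifier non-greedy.
re{n}           Matches exactly n occurrences of the preceding expression. For example, "o{2}" does not match the "o" in "Bob", but does match the two o's in "food".
re{n,}          Matches at least n occurrences of the preceding expression. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
re{n,m}         Matches from n to m occurrences of the preceding expression, greedily.
a|b             Matches a or b.
(re)            Matches the expression inside the parentheses and also captures it as a group.
(?imx)          Turns on the i, m, or x option flags. In Python's re these inline flags apply to the whole pattern and should appear at its start.
(?-imx)         Turns off the i, m, or x option flags (Perl syntax; Python's re accepts only the scoped form (?-imx:re)).
(?:re)          Like (...), but does not create a group.
(?imx:re)       Turns on the i, m, or x option flags inside the parentheses only.
(?-imx:re)      Turns off the i, m, or x option flags inside the parentheses only.
(?#...)         A comment.
(?=re)          Positive lookahead assertion. Succeeds if the contained expression matches at the current position and fails otherwise. The assertion consumes no characters: the rest of the pattern is then tried from the same position.
(?!re)          Negative lookahead assertion. The opposite of the positive assertion: succeeds when the contained expression does not match at the current position.
(?>re)          Atomic group: matches re as an independent pattern with no backtracking into it (supported by Python's re since version 3.11).
\w              Matches a letter, digit, or underscore.
\W              Matches any character that is not a letter, digit, or underscore.
\s              Matches any whitespace character; for ASCII text, equivalent to [ \t\n\r\f\v].
\S              Matches any non-whitespace character.
\d              Matches any digit; for ASCII text, equivalent to [0-9].
\D              Matches any non-digit.
\A              Matches only at the start of the string (the same as ^ unless re.MULTILINE is set).
\Z              Matches only at the end of the string. Unlike $, it does not match before a trailing newline.
\z              Matches the end of the string (Perl syntax; older Python versions do not accept \z, use \Z instead).
\G              Matches the position where the last match finished (Perl syntax; not supported by Python's re module).
\b              Matches a word boundary. 'er\b' matches the 'er' in "never", but not the 'er' in "verb".
\B              Matches a non-word boundary. 'er\B' matches the 'er' in "verb", but not the 'er' in "never".
\n              Matches a newline character.
\t              Matches a tab character.
\1...\9         Matches the content of the nth group.
\10             Matches the content of group 10 if that group exists; otherwise it is interpreted as an octal character code.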
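
A few of the less obvious entries in the table can be checked with a short sketch. The sample strings and variable names are assumptions chosen for illustration; only constructs that Python's re module supports are used.

```python
import re

# (?:re) groups without capturing, so findall returns the whole match.
non_capturing = re.findall(r"(?:ab)+", "ababab ab")

# (?=re) positive lookahead: 'foo' only when 'bar' follows; 'bar' itself is not consumed.
lookahead = re.findall(r"foo(?=bar)", "foobar foobaz")

# (?!re) negative lookahead: 'foo...' only when 'bar' does not follow.
neg_lookahead = re.findall(r"foo(?!bar)\w*", "foobar foobaz")

# \1 refers back to the text captured by group 1: finds doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")

print(non_capturing)  # ['ababab', 'ab']
print(lookahead)      # ['foo']
print(neg_lookahead)  # ['foobaz']
print(doubled)        # ['is', 'test']
```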
 
 
import re

# Repetition matching:
# .   ?   *   +  {m,n}  .*  .*?
# 1. .: matches any single character except a newline
print(re.findall('a.c','abc a1c aAc aaaaaca\nc'))
print(re.findall('a.c','abc a1c aAc aaaaaca\nc',re.DOTALL))

# 2. ?: the single character to its left occurs 0 or 1 times
print(re.findall('ab?','a ab abb abbb abbbb abbbb'))

# 3. *: the single character to its left occurs 0 or more times
print(re.findall('ab*','a ab abb abbb abbbb abbbb a1bbbbbbb'))

# 4. +: the single character to its left occurs 1 or more times
print(re.findall('ab+','a ab abb abbb abbbb abbbb a1bbbbbbb'))

# 5. {m,n}: the single character to its left occurs m to n times
print(re.findall('ab?','a ab abb abbb abbbb abbbb'))
print(re.findall('ab{0,1}','a ab abb abbb abbbb abbbb'))

print(re.findall('ab*','a ab abb abbb abbbb abbbb a1bbbbbbb'))
print(re.findall('ab{0,}','a ab abb abbb abbbb abbbb a1bbbbbbb'))

print(re.findall('ab+','a ab abb abbb abbbb abbbb a1bbbbbbb'))
print(re.findall('ab{1,}','a ab abb abbb abbbb abbbb a1bbbbbbb'))

print(re.findall('ab{1,3}','a ab abb abbb abbbb abbbb a1bbbbbbb'))

# 6. .*: matches any characters, of any length =====> greedy matching
print(re.findall('a.*c','ac a123c aaaac a *123)()c asdfasfdsadf'))
#['ac a123c aaaac a *123)()c']

# 7. .*?: non-greedy matching
print(re.findall('a.*?c','a123c456c'))
#['a123c']

# (): grouping; when the pattern contains a group, findall returns only the group's content
print(re.findall('(alex)_sb','alex_sb asdfsafdafdaalex_sb'))

print(re.findall(
    'href="(.*?)"',
    '<li><a id="blog_nav_sitehome" class="menu" href="http://www.cnblogs.com/">博客园</a></li>')
)
# Result: ['http://www.cnblogs.com/']

# []: matches one character from a specified set (the character must be one of those defined inside the brackets)
print(re.findall('a[0-9][0-9]c','a1c a+c a2c a9c a11c a-c acc aAc'))

# When - must be matched as an ordinary character, put it at the far left or far right inside [] (or escape it as \-)
print(re.findall('a[-+*]c','a1c a+c a2c a9c a*c a11c a-c acc aAc'))
print(re.findall('a[a-zA-Z]c','a1c a+c a2c a9c a*c a11c a-c acc aAc'))

# ^ inside [] negates the set
print(re.findall('a[^a-zA-Z]c','a c a1c a+c a2c a9c a*c a11c a-c acc aAc'))
print(re.findall('a[^0-9]c','a c a1c a+c a2c a9c a*c a11c a-c acc aAc'))

print(re.findall('([a-z]+)_sb','egon alex_sb123123wxxxxxxxxxxxxx_sb,lxx_sb'))
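
The grouping examples above rely on a detail of findall that is worth spelling out: what it returns depends on how many capturing groups the pattern has. A minimal sketch, using an assumed sample string:

```python
import re

text = "alex_sb lxx_nb"  # sample text, assumed for illustration

# One capturing group: findall returns only the group's text.
one_group = re.findall(r"([a-z]+)_sb", text)

# Several capturing groups: findall returns one tuple per match.
two_groups = re.findall(r"([a-z]+)_([a-z]+)", text)

# A non-capturing group (?:...) makes findall return the whole match again.
no_capture = re.findall(r"(?:[a-z]+)_sb", text)

print(one_group)   # ['alex']
print(two_groups)  # [('alex', 'sb'), ('lxx', 'nb')]
print(no_capture)  # ['alex_sb']
```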

