I. Introduction
This article introduces regular expressions. Their rules are not specific to Python: most mainstream programming languages support them, and they are used widely in everyday work, so they are well worth learning. After reading this article, readers should understand what a regular expression is, the rules of regular expressions, some common regular expression examples, and how to use Python's regular expression functions.
II. The Concept of Regular Expressions
A regular expression is a special pattern used to match substrings within a string; the matched substrings can then be extracted, replaced, or otherwise processed.
For example, given the string zszxz666, suppose the knowledge seeker wants the substring zszxz. This requires a pattern that matches that substring. A regular expression pattern can take many forms; here the knowledge seeker uses one of the simplest, [a-z]*, and then obtains zszxz through Python's regular expression matching functions. This approach is simpler than ordinary string functions and applies much more widely.
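The example above can be sketched in a few lines. This is a minimal illustration; the variable names are chosen here and are not from the original article.

```python
import re

text = "zszxz666"

# [a-z]* matches zero or more lowercase letters, starting at the
# beginning of the string because re.match anchors there.
match = re.match(r"[a-z]*", text)
prefix = match.group()

print(prefix)  # zszxz
```

Because `*` also allows zero repetitions, this pattern never fails to match; on a string that starts with a digit it would simply return an empty string.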
III. Commonly Used Regular Expression Patterns
Common regular expression patterns are listed below. If any of them is unclear, refer to a regular expression manual; such manuals also contain examples for everyday use, such as matching user names, passwords, email addresses, and URLs.
Pattern | Meaning |
---|---|
^ | Matches the beginning of the string |
$ | Matches the end of the string |
. | Matches any character except a newline |
+ | Matches the preceding subexpression one or more times |
? | Matches the preceding subexpression zero or one time; placed after another quantifier, makes it non-greedy |
* | Matches the preceding subexpression zero or more times |
\ | Escapes a special character |
\d | Matches any digit; equivalent to [0-9] for ASCII input |
\D | Matches any non-digit |
\s | Matches any whitespace character (space, tab, line feed, carriage return, form feed, vertical tab); equivalent to [ \f\n\r\t\v] for ASCII input |
\S | Matches any non-whitespace character; equivalent to [^ \f\n\r\t\v] for ASCII input |
\w | Matches a letter, digit, or underscore |
\W | Matches any character that is not a letter, digit, or underscore |
[…] | Represents a set of characters; [amk] matches 'a', 'm', or 'k' |
[^…] | Matches any character not in the set; [^amk] matches any character other than 'a', 'm', or 'k' |
{n} | Matches the preceding subexpression exactly n times |
{n,} | Matches the preceding subexpression at least n times |
{n,m} | Matches the preceding subexpression at least n times and at most m times |
\| | Alternation (or); a\|b matches a or b |
\b | Matches a word boundary, for example the position between a word and a space |
\B | Matches a position that is not a word boundary |
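A few of the patterns in the table can be tried out as follows. This is only a sketch: the sample strings and variable names are invented for illustration.

```python
import re

sample = "tel: 010-12345678, zip: 100000"

# \d{3}-\d{8}: exactly three digits, a hyphen, then exactly eight digits
phones = re.findall(r"\d{3}-\d{8}", sample)

# \b\d{6}\b: a run of exactly six digits with a word boundary on each side
zips = re.findall(r"\b\d{6}\b", sample)

# Greedy (.*) versus non-greedy (.*?) matching
html = "<b>one</b><b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)

print(phones)  # ['010-12345678']
print(zips)    # ['100000']
print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```

The greedy form consumes as much as it can and so spans both tags, while the trailing `?` makes the quantifier stop at the first possible closing tag.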
IV. Common Python Regular Expression Flags
Flag | Meaning |
---|---|
re.I | Makes matching case-insensitive |
re.L | Performs locale-aware matching |
re.M | Multi-line matching; affects ^ and $ |
re.S | Makes . match any character, including newlines |
re.U | Interprets characters according to the Unicode character set; affects \w, \W, \b, and \B |
re.X | Verbose mode: allows whitespace and comments inside the pattern so that the regular expression is easier to read |
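The effect of the most frequently used flags can be seen in a short sketch like the one below. The sample text and variable names are assumptions made for the illustration.

```python
import re

text = "Hello\nhello world"

# re.I: ignore case
case_insensitive = re.findall(r"hello", text, re.I)

# re.M: ^ matches at the start of every line, not only the whole string
with_multiline = re.findall(r"^\w+", text, re.M)
without_multiline = re.findall(r"^\w+", text)

# re.S: . also matches the newline character
with_dotall = re.search(r"Hello.hello", text, re.S)
without_dotall = re.search(r"Hello.hello", text)

# re.X: whitespace and comments inside the pattern are ignored
number = re.compile(r"""
    \d+   # integer part
    \.    # decimal point
    \d+   # fractional part
""", re.X)
decimal = number.search("pi is 3.14").group()

print(case_insensitive)   # ['Hello', 'hello']
print(with_multiline)     # ['Hello', 'hello']
print(without_multiline)  # ['Hello']
print(without_dotall)     # None
print(decimal)            # 3.14
```

Several flags can be combined with the `|` operator, for example `re.I | re.M`.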
V. Common Python Regular Expression Functions
- pattern: the regular expression pattern
- string: the string to be matched
- flags: optional flags; any of the modifiers defined in section IV can be used
- count: the maximum number of replacements to make
- repl: the replacement string; it can also be a function
- pos: the start position
- endpos: the end position
- maxsplit: the maximum number of splits
Function | Meaning |
---|---|
re.findall(pattern, string, flags=0) | Finds all matching substrings and returns them as a list; returns an empty list if nothing matches. The compiled form pattern.findall(string, pos, endpos) also accepts a start and end position |
re.match(pattern, string, flags=0) | Matches the pattern at the start of the string; returns None if the match fails |
re.search(pattern, string, flags=0) | Scans the whole string and returns the first successful match; returns None if nothing matches |
re.compile(pattern, flags=0) | Compiles a regular expression into a regular expression (Pattern) object |
re.sub(pattern, repl, string, count=0, flags=0) | Finds and replaces matches |
re.finditer(pattern, string, flags=0) | Similar to findall, but returns an iterator |
re.split(pattern, string, maxsplit=0, flags=0) | Splits the string at each match and returns a list |
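As a small sketch of how the module-level functions relate to a compiled pattern, and of what pos and endpos do, consider the following. The variable names are illustrative only.

```python
import re

pattern = re.compile(r"\d+")
text = "451zszxz666"

# The module-level function and the compiled-pattern method give the same result
module_level = re.findall(r"\d+", text)
compiled = pattern.findall(text)

# pos and endpos are accepted only by the compiled-pattern methods
from_index_3 = pattern.findall(text, 3)   # look only at "zszxz666"
first_two = pattern.findall(text, 0, 2)   # look only at "45"

print(module_level)  # ['451', '666']
print(from_index_3)  # ['666']
print(first_two)     # ['45']
```

Compiling once is convenient when the same pattern is reused many times, which is why the examples in the next section call re.compile first.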
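VI. Examples of Common Functions
6.1 match function
The group(num=0) function extracts the matched text; a group number can be passed to extract the corresponding group. The knowledge seeker wants to obtain the run of digits at the start of the string;
import re
# Pattern: match at least one digit
pattern = re.compile(r'\d+')
# Input string
mat = pattern.match("451zszxz666")
# Get the matched value
g = mat.group()
# 451
print(g)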
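Since match returns None when the start of the string does not fit the pattern, calling group() directly can raise an AttributeError. A defensive version might look like the sketch below; the helper name `leading_digits` is hypothetical and not part of the re module.

```python
import re

pattern = re.compile(r"\d+")

def leading_digits(text):
    """Return the digits at the start of text, or None if there are none.

    Hypothetical helper, written only to illustrate the None check.
    """
    mat = pattern.match(text)
    if mat is None:
        return None
    return mat.group()

print(leading_digits("451zszxz666"))  # 451
print(leading_digits("zszxz666"))     # None
```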
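6.2 search function
The knowledge seeker wants to obtain a specified string; the first match is enough;
import re
# Match nhzszxz, nh666, or nhnh
pattern = re.compile(r'nh(zszxz|666|nh)')
ser = pattern.search('nhzszxzkkk nh666 llll nhnh')
# group() with no argument returns the whole match
g_0 = ser.group()
# nhzszxz
print(g_0)
# group(1) returns the text captured by the first pair of parentheses
g_1 = ser.group(1)
# zszxz
print(g_1)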
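The difference between match and search is easiest to see on a string that does not begin with the wanted text. A minimal sketch, with an input string chosen for illustration:

```python
import re

text = "zszxz666"

# match() only tries at the very start of the string, so it fails here
at_start = re.match(r"\d+", text)

# search() scans forward until the pattern matches somewhere
anywhere = re.search(r"\d+", text)

print(at_start)          # None
print(anywhere.group())  # 666
print(anywhere.span())   # (5, 8)
```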
6.3 findall function
The knowledge seeker wants to obtain all the numbers in a string;
import re
pattern = re.compile(r'\d+')
# Input string
mat = pattern.findall("451zszxz666")
# ['451', '666']
print(mat)
# 666
print(mat[1])
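One detail worth knowing is that findall returns the captured group rather than the whole match when the pattern contains a capturing group. The sketch below reuses the string from the search example to show this; a non-capturing group `(?:...)` restores the whole-match behaviour.

```python
import re

text = "nhzszxzkkk nh666 llll nhnh"

# With a capturing group, findall returns only what the group captured
captured = re.findall(r"nh(zszxz|666|nh)", text)

# With a non-capturing group (?:...), findall returns the whole match
whole = re.findall(r"nh(?:zszxz|666|nh)", text)

print(captured)  # ['zszxz', '666', 'nh']
print(whole)     # ['nhzszxz', 'nh666', 'nhnh']
```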
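6.6 sub function
The knowledge seeker wants to keep only the non-digit parts of a string;
import re
# Named text rather than str to avoid shadowing the built-in str type
text = '8556gfggs5555dfg'
# Replace every digit with an empty string
result = re.sub(r'\d', '', text)
# gfggsdfg
print(result)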
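The count and repl parameters described in section V can be sketched as follows. The replacement function `double` is a hypothetical example, not something from the original article.

```python
import re

text = "8556gfggs5555dfg"

# count=1 limits the replacement to the first match only
first_only = re.sub(r"\d+", "#", text, count=1)

def double(match):
    """Hypothetical replacement function: double the matched number."""
    return str(int(match.group()) * 2)

# repl can be a function that receives each match object
doubled = re.sub(r"\d+", double, text)

print(first_only)  # #gfggs5555dfg
print(doubled)     # 17112gfggs11110dfg
```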
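6.7 split function
The knowledge seeker wants to split a string on commas;
import re
# Named text rather than str to avoid shadowing the built-in str type
text = '123,456,zszxz,666'
result = re.split(',', text)
# ['123', '456', 'zszxz', '666']
print(result)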
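For a single fixed separator, the plain str.split method would do the same job; re.split becomes useful when the separator itself varies. A short sketch with invented input strings:

```python
import re

# Split on any run of commas, semicolons, or whitespace
messy = "123, 456;zszxz  666"
parts = re.split(r"[,;\s]+", messy)

# maxsplit=1 stops after the first split and leaves the rest intact
head_and_rest = re.split(",", "123,456,zszxz,666", maxsplit=1)

print(parts)          # ['123', '456', 'zszxz', '666']
print(head_and_rest)  # ['123', '456,zszxz,666']
```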
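6.8 finditer function
The knowledge seeker wants to obtain the numbers 451 and 666;
import re
pattern = re.compile(r'\d+')
# Input string
mat = pattern.finditer("451zszxz666")
# Each item is a match object; prints 451, then 666
for it in mat:
    print(it.group())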
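Unlike findall, finditer yields match objects, so the position of each match is available as well as its text. A minimal sketch, with a variable name chosen for illustration:

```python
import re

pattern = re.compile(r"\d+")

# Collect the matched text together with its start and end index
found = [(m.group(), m.start(), m.end())
         for m in pattern.finditer("451zszxz666")]

print(found)  # [('451', 0, 3), ('666', 8, 11)]
```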
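VII. The Right Way for Beginners to Use Regular Expressions
When beginners use regular expressions, the match results will inevitably differ from what they expect at times. It helps to get the pattern working in an online tool first and only then write the code. Commonly used online regular expression testers are as follows;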