Basics
\d matches a digit
\w matches a letter, a digit, or _
\s matches a whitespace character (space, Tab, etc.)
. matches any single character
* means any number of the preceding element (including 0)
+ means at least one of the preceding element
? means 0 or 1 of the preceding element
{n} means exactly n of the preceding element
{n,m} means n to m of the preceding element
^ represents the beginning of the line; ^\d means the line must begin with a digit.
$ represents the end of the line; \d$ means the line must end with a digit. (Note that \W and \S are the negations of \w and \s.)
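The tokens listed above can be tried out with Python's re module. This is a minimal sketch: the sample strings are invented for illustration, and re.fullmatch is used so that the whole sample has to fit the pattern.

```python
import re

# Each entry: (pattern, sample string, whether the whole sample should match).
# The sample strings are illustrative only.
cases = [
    (r'\d', '7', True),         # one digit
    (r'\w', '_', True),         # letter, digit, or underscore
    (r'\s', '\t', True),        # whitespace, including Tab
    (r'py.', 'py!', True),      # . matches any single character
    (r'\d*', '', True),         # * allows zero repetitions
    (r'\d+', '', False),        # + needs at least one
    (r'\d?', '5', True),        # ? allows zero or one
    (r'\d{3}', '010', True),    # exactly three digits
    (r'\d{3,8}', '12', False),  # fewer than three digits
]

results = [re.fullmatch(pattern, sample) is not None
           for pattern, sample, _ in cases]

for (pattern, sample, _), got in zip(cases, results):
    print(pattern, repr(sample), got)

# ^ and $ tie the pattern to the start and the end of the string.
starts_with_digit = re.search(r'^\d', '3 apples') is not None
ends_with_digit = re.search(r'\d$', 'apples 3') is not None
```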
Here is a more complex example: \d{3}\s+\d{3,8}. Reading it from left to right:
1. \d{3} matches three digits, such as '010';
2. \s matches one whitespace character (space, Tab, etc.), so \s+ means at least one whitespace character, for example ' ' or '  ';
3. \d{3,8} matches 3 to 8 digits, for example '1234567'.
Taken together, the regular expression above matches a phone number whose area code is separated from the local number by any amount of whitespace.
What if you want to match a number such as '010-12345'? The hyphen is only special inside a character class [], so escaping it with \ is optional here: the pattern can be written \d{3}\-\d{3,8} or simply \d{3}-\d{3,8}.
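The two phone-number patterns can be checked with a short sketch. Apart from '010-12345', the sample numbers are invented for illustration.

```python
import re

spaced = re.compile(r'\d{3}\s+\d{3,8}')   # area code, whitespace, local number
hyphen = re.compile(r'\d{3}-\d{3,8}')     # area code, hyphen, local number
escaped = re.compile(r'\d{3}\-\d{3,8}')   # same pattern with the hyphen escaped

# fullmatch requires the entire string to fit the pattern.
print(bool(spaced.fullmatch('010 1234567')))   # True
print(bool(spaced.fullmatch('010   12345')))   # True: several spaces are fine
print(bool(spaced.fullmatch('010-12345')))     # False: a hyphen is not whitespace
print(bool(hyphen.fullmatch('010-12345')))     # True
print(bool(escaped.fullmatch('010-12345')))    # True: escaping changes nothing here
print(bool(hyphen.fullmatch('010-12')))        # False: local number is too short
```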
Advanced
For more precise matching, use [] to denote a range, for example:
[0-9a-zA-Z_] matches one digit, letter, or underscore;
[0-9a-zA-Z_]+ matches a string of at least one digit, letter, or underscore, such as 'a100', '0_Z', 'Py3000';
[a-zA-Z_][0-9a-zA-Z_]* matches a string that starts with a letter or underscore, followed by any number of digits, letters, or underscores, i.e. a valid Python variable name;
[a-zA-Z_][0-9a-zA-Z_]{0,19} limits the length of the variable name more precisely to 1-20 characters (1 character in front + up to 19 characters after it).
A|B matches A or B, so (P|p)ython matches 'Python' or 'python'.
py can also match inside 'python', but ^py$ turns it into a whole-line match that can only match 'py'.
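The character-class patterns above can be exercised as follows. This is only a sketch: is_short_identifier is a hypothetical helper name, and the pattern is ASCII-only, so it approximates rather than defines what Python accepts as a variable name.

```python
import re

identifier = re.compile(r'[a-zA-Z_][0-9a-zA-Z_]{0,19}')   # 1 to 20 characters

def is_short_identifier(name):
    """Hypothetical helper: True if name fits the 1-20 character pattern."""
    return identifier.fullmatch(name) is not None

print(is_short_identifier('Py3000'))    # True
print(is_short_identifier('3py'))       # False: starts with a digit
print(is_short_identifier('a' * 21))    # False: 21 characters

# Alternation: either form of the first letter is accepted.
print(bool(re.fullmatch(r'(P|p)ython', 'python')))   # True

# Without anchors 'py' is found inside 'python'; with ^ and $ it is not.
print(bool(re.search(r'py', 'python')))     # True
print(bool(re.search(r'^py$', 'python')))   # False
print(bool(re.search(r'^py$', 'py')))       # True
```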
re module
Since Python strings themselves use \ for escaping, pay special attention:
s = 'ABC\\-001' # Python string
# the corresponding regular expression string becomes:
# 'ABC\-001'
We therefore strongly recommend Python's r prefix, so that escaping does not have to be considered:
s = r'ABC\-001' # Python string
# the corresponding regular expression string is unchanged:
# 'ABC\-001'
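A quick sketch confirms that the two spellings produce the same string and that the resulting pattern matches a literal hyphen. The variable names are chosen for illustration.

```python
import re

escaped = 'ABC\\-001'   # backslash doubled by hand
raw = r'ABC\-001'       # raw string: backslash written once

print(escaped == raw)   # True: both hold the 8 characters ABC\-001
print(len(raw))         # 8
print(bool(re.fullmatch(raw, 'ABC-001')))   # True: \- matches a literal hyphen
```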
re.match() tries to match the pattern at the start of the string; if the match succeeds it returns a Match object, otherwise it returns None. The common way to test is:
test = 'user-entered string'
if re.match(r'regular expression', test):
    print('ok')
else:
    print('failed')
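The same test can be wrapped in a function to see both outcomes. In this sketch, check is a hypothetical helper, and the patterns and inputs are examples only.

```python
import re

def check(pattern, text):
    """Hypothetical helper: report whether text matches pattern from its start."""
    if re.match(pattern, text):   # a Match object is truthy, None is falsy
        return 'ok'
    return 'failed'

print(check(r'\d{3}-\d{3,8}$', '010-12345'))   # ok
print(check(r'\d{3}-\d{3,8}$', '010 12345'))   # failed
print(check(r'\d{3}', '010-12345'))            # ok: only the start has to match
print(check(r'12345', '010-12345'))            # failed: re.match does not scan ahead
```

Because re.match only anchors at the start, a trailing $ (or re.fullmatch) is needed when the whole input must fit the pattern.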
Splitting strings
Splitting a string with a regular expression is more flexible than splitting on a fixed character. Here is the normal splitting code:
>>> 'a b   c'.split(' ')
['a', 'b', '', '', 'c']
Hmm, it cannot cope with consecutive spaces. Try a regular expression instead:
>>> re.split(r'\s+', 'a b   c')
['a', 'b', 'c']
No matter how many spaces there are, the string is split properly. Add a comma and try:
>>> re.split(r'[\s,]+', 'a,b, c  d')
['a', 'b', 'c', 'd']
Then add a semicolon and try:
>>> re.split(r'[\s,\;]+', 'a,b;; c  d')
['a', 'b', 'c', 'd']
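One detail worth knowing when splitting user input: a separator at the very start or end of the string produces an empty string in the result. The sketch below uses a hypothetical helper, split_fields, that filters those out. The inputs are made up.

```python
import re

text = 'a,b;; c  d'
print(text.split(' '))                # ['a,b;;', 'c', '', 'd']
print(re.split(r'[\s,;]+', text))     # ['a', 'b', 'c', 'd']

# Leading or trailing separators leave empty strings behind.
print(re.split(r'[\s,;]+', ' a, b ;'))   # ['', 'a', 'b', '']

def split_fields(text):
    """Hypothetical helper: split on whitespace, commas, semicolons; drop empties."""
    return [part for part in re.split(r'[\s,;]+', text) if part]

print(split_fields(' a, b ;'))        # ['a', 'b']
```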
Groups
Besides simply testing for a match, regular expressions can also extract substrings. The parts to extract are marked as groups with (). For example, ^(\d{3})-(\d{3,8})$ defines two groups, so the area code and the local number can be extracted directly from the matched string:
>>> m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
>>> m
<_sre.SRE_Match object; span=(0, 9), match='010-12345'>
>>> m.group(0)
'010-12345'
>>> m.group(1)
'010'
>>> m.group(2)
'12345'
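In the session above, group(0) is the whole matched string and group(1), group(2) are the parenthesised parts. The groups() method returns all numbered groups at once as a tuple. A small sketch, where parse_phone is a hypothetical helper name:

```python
import re

def parse_phone(text):
    """Hypothetical helper: return (area_code, local_number) or None."""
    m = re.match(r'^(\d{3})-(\d{3,8})$', text)
    if m is None:
        return None
    return m.groups()   # tuple of group 1, group 2, ...

print(parse_phone('010-12345'))   # ('010', '12345')
print(parse_phone('010 12345'))   # None
```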
Greedy matching
Regular expression matching is greedy by default, that is, it matches as many characters as possible. In the following example we try to capture the 0s at the end of a number:
>>> re.match(r'^(\d+)(0*)$', '102300').groups()
('102300', '')
Because \d+ is greedy, it consumes all the trailing 0s as well, so 0* can only match the empty string.
\d+ has to be made non-greedy (matching as few characters as possible) for the trailing 0s to be captured. Adding a ? makes \d+ non-greedy:
>>> re.match(r'^(\d+?)(0*)$', '102300').groups()
('1023', '00')
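The difference can be seen side by side. In this sketch the inputs other than '102300' are invented to show how the non-greedy group grows only as far as the rest of the pattern requires.

```python
import re

greedy = re.compile(r'^(\d+)(0*)$')
lazy = re.compile(r'^(\d+?)(0*)$')

print(greedy.match('102300').groups())   # ('102300', '')
print(lazy.match('102300').groups())     # ('1023', '00')

# With no trailing zeros the lazy group still has to take every digit.
print(lazy.match('1234').groups())       # ('1234', '')

# \d+? must match at least one digit, so the leading 1 stays in group 1.
print(lazy.match('1000').groups())       # ('1', '000')
```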
Compile
When we use a regular expression in Python, the re module does two things internally:
1. It compiles the regular expression; if the regular expression string itself is invalid, an error is raised;
2. It uses the compiled regular expression to match the string.
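The first step can be observed directly: an invalid pattern fails at compile time with re.error, before any string is matched. A minimal sketch with a deliberately broken pattern:

```python
import re

compile_failed = False
try:
    re.compile(r'(\d{3}')   # unbalanced parenthesis: not a valid pattern
except re.error as exc:
    compile_failed = True
    print('invalid pattern:', exc)
```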
If a regular expression is going to be reused thousands of times, we can precompile it for efficiency. The compile step is then skipped on later uses and matching happens directly:
>>> import re
# Compile:
>>> re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')
# Use:
>>> re_telephone.match('010-12345').groups()
('010', '12345')
>>> re_telephone.match('010-8086').groups()
('010', '8086')
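When a compiled pattern is applied to many inputs, some of them may not match, and calling .groups() on None raises AttributeError. The sketch below guards against that. The input list is invented for illustration.

```python
import re

re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')   # compiled once

numbers = ['010-12345', '010-8086', 'not a number']   # sample inputs, invented
parsed = []
for number in numbers:
    m = re_telephone.match(number)    # reuses the compiled pattern
    parsed.append(m.groups() if m else None)

print(parsed)   # [('010', '12345'), ('010', '8086'), None]
```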