Reading regular expressions

Copyright: Joy lasts because it is shared, knowledge endures because it is shared; I share, so I am happy ... https://blog.csdn.net/walter247443819/article/details/79820705

Basics

\d matches one digit
\w matches one letter, digit, or underscore (_)
\s matches one whitespace character (space, Tab, etc.)
. matches any single character
* means any number of the preceding item (including 0)
+ means at least one of the preceding item
? means 0 or 1 of the preceding item
{n} means exactly n of the preceding item
{n,m} means n to m of the preceding item
^ marks the beginning of a line; ^\d means the line must begin with a digit.
$ marks the end of a line; \d$ means the line must end with a digit. (Note that \W and \S are the negations of \w and \s.)
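
The items above can be checked with a small sketch using Python's built-in re module. The sample strings are made up for illustration and are not from the original article; re.fullmatch requires the whole string to match, while re.search looks anywhere in the string.

```python
import re

# Each entry: (pattern, text, whether the whole text should match).
# The sample texts are illustrative assumptions.
samples = [
    (r'\d', '7', True),          # one digit
    (r'\w', '_', True),          # letter, digit, or underscore
    (r'\s', '\t', True),         # whitespace, including Tab
    (r'py.', 'py!', True),       # . matches any single character
    (r'\d*', '', True),          # * allows zero characters
    (r'\d+', '', False),         # + needs at least one
    (r'\d?', '12', False),       # ? allows at most one
    (r'\d{3}', '010', True),     # exactly three
    (r'\d{3,8}', '12', False),   # too short for 3 to 8
]

results = [bool(re.fullmatch(p, t)) == expected for p, t, expected in samples]
print(all(results))

# ^ and $ anchor the beginning and end of the line.
starts_with_digit = bool(re.search(r'^\d', '3 apples'))
ends_with_digit = bool(re.search(r'\d$', 'room 101'))
print(starts_with_digit, ends_with_digit)
```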

Here is a more complex example: \d{3}\s+\d{3,8}. Reading it from left to right:

1. \d{3} matches three digits, such as '010';

2. \s matches one whitespace character (space, Tab, etc.), so \s+ means at least one whitespace character, for example ' ' or '  ';

3. \d{3,8} matches 3 to 8 digits, for example '1234567'.

Taken together, the regular expression above matches a phone number with an area code, where the two parts are separated by any amount of whitespace.
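
To try the pattern out, here is a minimal sketch. The helper name is_phone and the test strings are assumptions made for illustration; re.fullmatch is used so that the entire string has to fit the pattern.

```python
import re

# \d{3}    three digits (area code)
# \s+      one or more whitespace characters
# \d{3,8}  three to eight digits (local number)
PHONE = r'\d{3}\s+\d{3,8}'

def is_phone(text):
    """Return True if the whole text matches PHONE (hypothetical helper)."""
    return re.fullmatch(PHONE, text) is not None

print(is_phone('010 12345'))       # single space
print(is_phone('010    1234567'))  # several spaces
print(is_phone('010\t12345'))      # a Tab also counts as whitespace
print(is_phone('01012345'))        # no separator at all
print(is_phone('010 12'))          # local number too short
```

The first three calls succeed and the last two fail, because \s+ needs at least one whitespace character and \d{3,8} needs at least three digits.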

What if you want to match a number such as '010-12345'? Escape the - with \, so the regular expression above becomes \d{3}\-\d{3,8}. Strictly speaking, - is special only inside [], where it marks a range; elsewhere the escape is harmless but optional.
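
The following sketch compares the escaped and unescaped hyphen, and shows where the hyphen really is special: inside a character class. The sample strings are illustrative assumptions.

```python
import re

escaped = r'\d{3}\-\d{3,8}'    # hyphen escaped
unescaped = r'\d{3}-\d{3,8}'   # same meaning outside [...]

number = '010-12345'
both_match = (re.fullmatch(escaped, number) is not None
              and re.fullmatch(unescaped, number) is not None)
print(both_match)

# Inside [...], an unescaped - between two characters forms a range.
as_range = re.findall(r'[a-c]', 'a-c b')     # a, b, or c
as_literal = re.findall(r'[a\-c]', 'a-c b')  # a, -, or c
print(as_range)
print(as_literal)
```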

Advanced

For more precise matching, use [] to specify a range, for example:

[0-9a-zA-Z_] matches one digit, letter, or underscore;

[0-9a-zA-Z_]+ matches a string of at least one digit, letter, or underscore, such as 'a100', '0_Z', 'Py3000';

[a-zA-Z_][0-9a-zA-Z_]* matches a string that starts with a letter or underscore, followed by any number of digits, letters, or underscores, i.e. a valid Python variable name;

[a-zA-Z_][0-9a-zA-Z_]{0,19} limits the length of the variable name more precisely to 1-20 characters (1 leading character plus up to 19 following characters).
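
A short sketch of the identifier patterns follows. The helper is_identifier is a hypothetical name; the patterns are ASCII-only and do not exclude keywords, so this is an approximation rather than Python's full rule (str.isidentifier and the keyword module are closer to that). The last lines illustrate one pitfall: a quantifier written with a space, such as {0, 2}, is read by the re module as literal text.

```python
import re

# ASCII-only identifier patterns.
identifier = re.compile(r'[a-zA-Z_][0-9a-zA-Z_]*')
short_identifier = re.compile(r'[a-zA-Z_][0-9a-zA-Z_]{0,19}')

def is_identifier(name, pattern=identifier):
    """Return True if the whole name matches the pattern (hypothetical helper)."""
    return pattern.fullmatch(name) is not None

checks = {
    'Py3000': is_identifier('Py3000'),
    '_private': is_identifier('_private'),
    '3d': is_identifier('3d'),                              # starts with a digit
    '20 chars': is_identifier('a' * 20, short_identifier),  # at the limit
    '21 chars': is_identifier('a' * 21, short_identifier),  # one too many
}
print(checks)

# With a space inside the braces, {0, 2} is literal text, not a quantifier.
space_in_braces = re.fullmatch(r'a{0, 2}', 'aa')
print(space_in_braces)
```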

A|B matches A or B, so (P|p)ython matches 'Python' or 'python'.

py can also match inside 'python', but ^py$ matches the entire line, so it can only match 'py'.
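
Alternation and whole-line anchoring can be sketched as follows; the string 'jython' is an illustrative assumption used as a non-matching case.

```python
import re

# (P|p)ython accepts either capitalization, and nothing else.
alternation = {
    word: re.fullmatch(r'(P|p)ython', word) is not None
    for word in ('Python', 'python', 'jython')
}
print(alternation)

# Without anchors, py is found inside a longer string.
found_inside = re.search(r'py', 'python') is not None
# With ^ and $, the whole line has to be exactly 'py'.
whole_line_python = re.search(r'^py$', 'python') is not None
whole_line_py = re.search(r'^py$', 'py') is not None
print(found_inside, whole_line_python, whole_line_py)
```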

re module

Python strings themselves also use \ as the escape character, so take special care:

s = 'ABC\\-001' # Python string
# the corresponding regular expression string becomes:
# 'ABC\-001'
We therefore strongly recommend Python's r prefix, so that escaping is no longer a concern:

s = r'ABC\-001' # Python string
# the corresponding regular expression string is unchanged:
# 'ABC\-001'
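
A minimal sketch comparing an ordinary string literal with doubled backslashes against a raw string. Both spell the same eight characters, so both produce the same regular expression.

```python
import re

plain = 'ABC\\-001'   # ordinary literal: \\ is one backslash
raw = r'ABC\-001'     # raw literal: the backslash is kept as typed

same_text = plain == raw
length = len(raw)     # A, B, C, \, -, 0, 0, 1
print(same_text, length)
print(raw)

# Either spelling matches the text 'ABC-001'.
matches = re.fullmatch(raw, 'ABC-001') is not None
print(matches)
```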
The re.match() function tests whether a regular expression matches; if the match succeeds it returns a Match object, otherwise None. The common way to check is:

import re

test = 'user-entered string'
if re.match(r'regular expression', test):
    print('ok')
else:
    print('failed')
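
The same check can be wrapped in a small function. The name check is hypothetical, and the phone patterns are borrowed from earlier in the article. One detail worth noting: re.match only anchors the beginning of the string, so a pattern without $ can succeed on a prefix.

```python
import re

def check(pattern, text):
    """Return 'ok' if pattern matches at the start of text, else 'failed'."""
    if re.match(pattern, text):
        return 'ok'
    return 'failed'

print(check(r'^\d{3}\-\d{3,8}$', '010-12345'))  # full pattern, valid input
print(check(r'^\d{3}\-\d{3,8}$', '010 12345'))  # space instead of hyphen

# re.match anchors only the start: a prefix match is enough.
print(check(r'\d{3}', '010-12345'))
print(check(r'\d{3}', 'tel 010'))
```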

Splitting strings

Splitting a string with a regular expression is more flexible than splitting on a fixed character. Look at the ordinary splitting code:

>>> 'a b   c'.split(' ')
['a', 'b', '', '', 'c']
Well, it cannot handle consecutive spaces. Try a regular expression instead:

>>> re.split(r'\s+', 'a b   c')
['a', 'b', 'c']
Now the string is split properly. Add , and try:

>>> re.split(r'[\s\,]+', 'a,b, c  d')
['a', 'b', 'c', 'd']
Then add ; and try:

>>> re.split(r'[\s\,\;]+', 'a,b;; c  d')
['a', 'b', 'c', 'd']
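
One case the examples above do not cover is a separator at the very beginning or end of the input, which makes re.split return empty strings. The sketch below shows this and filters them out; split_fields and SEPARATORS are hypothetical names, and the inputs are illustrative.

```python
import re

# Runs of whitespace, commas, or semicolons act as one separator.
SEPARATORS = re.compile(r'[\s,;]+')

def split_fields(text):
    """Split on separator runs and drop empty fields (hypothetical helper)."""
    return [field for field in SEPARATORS.split(text) if field]

fields = split_fields('a,b;; c  d')
print(fields)

# Leading or trailing separators yield empty strings from a plain split.
raw_split = SEPARATORS.split(' a,b ')
cleaned = split_fields(' a,b ')
print(raw_split)
print(cleaned)
```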

Groups

Besides simply testing for a match, regular expressions can also extract substrings. The parts to extract are marked as groups with (). For example, ^(\d{3})-(\d{3,8})$ defines two groups, which extract the area code and the local number directly from the matched string:

>>> m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
>>> m
<_sre.SRE_Match object; span=(0, 9), match='010-12345'>
>>> m.group(0)
'010-12345'
>>> m.group(1)
'010'
>>> m.group(2)
'12345'
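
Because re.match returns None on failure, calling group() directly on the result can raise AttributeError. A minimal sketch of a guarded extraction follows; parse_phone is a hypothetical helper name.

```python
import re

def parse_phone(text):
    """Return (area_code, local_number), or None if text does not match."""
    m = re.match(r'^(\d{3})-(\d{3,8})$', text)
    if m is None:
        return None
    # group(0) is the whole match; group(1) and group(2) are the two groups.
    return m.group(1), m.group(2)

good = parse_phone('010-12345')
bad = parse_phone('010 12345')
print(good)
print(bad)

# groups() returns all groups at once as a tuple.
all_groups = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345').groups()
print(all_groups)
```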

Greedy matching

Regular expressions are greedy by default, that is, they match as many characters as possible. For example, to match the trailing 0s of a number:

>>> re.match(r'^(\d+)(0*)$', '102300').groups()
('102300', '')
Because \d+ is greedy, it consumes all the trailing 0s, so 0* can only match the empty string.

\d+ must be made non-greedy (that is, match as few characters as possible) for the trailing 0s to be matched by 0*. Adding a ? after \d+ makes it non-greedy:

>>> re.match(r'^(\d+?)(0*)$', '102300').groups()
('1023', '00')
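
The same contrast shows up in other patterns. The sketch below repeats the trailing-zero case and adds a second, assumed example with angle-bracket tags, where the greedy form swallows everything between the first < and the last >.

```python
import re

# Trailing zeros: greedy \d+ takes everything, lazy \d+? gives the zeros back.
greedy = re.match(r'^(\d+)(0*)$', '102300').groups()
lazy = re.match(r'^(\d+?)(0*)$', '102300').groups()
print(greedy)
print(lazy)

# Illustrative second case: tags in a short string.
text = '<b>bold</b>'
greedy_tags = re.findall(r'<.+>', text)   # one long match
lazy_tags = re.findall(r'<.+?>', text)    # each tag separately
print(greedy_tags)
print(lazy_tags)
```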

Compile

When we use a regular expression in Python, the re module does two things internally:

1. It compiles the regular expression; if the regular expression string itself is invalid, an error is raised;

2. It matches the string against the compiled regular expression.

If a regular expression is going to be reused thousands of times, we can precompile it for efficiency, so that the compile step is skipped on later uses and matching happens directly:

>>> import re
# Compile:
>>> re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')
# Use:
>>> re_telephone.match('010-12345').groups()
('010', '12345')
>>> re_telephone.match('010-8086').groups()
('010', '8086')
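
Both steps can be seen in a short sketch: a compiled pattern reused over several inputs, and an invalid pattern rejected at compile time with re.error. The list of input lines and the deliberately broken pattern are illustrative assumptions.

```python
import re

# Compile once, reuse for every line.
re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')

lines = ['010-12345', '010-8086', 'not a number', '021-7654321']
parsed = [m.groups() for m in map(re_telephone.match, lines) if m]
print(parsed)

# An invalid pattern fails at the compile step, before any matching.
try:
    re.compile(r'(\d{3}')   # missing closing parenthesis
    compile_failed = False
except re.error:
    compile_failed = True
print(compile_failed)
```

A compiled pattern object carries its own match, search, split, and findall methods, so the pattern string does not need to be passed again.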
