Strings are the data structure programmers deal with most often, and the need to manipulate them comes up almost everywhere. Take checking whether a string is a valid Email address: it is possible to extract the substrings before and after the @ and then check separately whether they are a valid name and a valid domain, but doing so is cumbersome and the code is hard to reuse.
Regular expressions are a powerful tool for matching strings. The idea is to use a descriptive language to define a rule for strings: any string that conforms to the rule is considered a "match"; otherwise, the string is not valid.
So the way to check whether a string is a valid Email address is:
1. Create a regular expression that matches Email addresses;
2. Use that regular expression to match the user's input and decide whether it is valid.
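The two steps above can be sketched in Python as follows. This is only an illustration: the pattern is a deliberately simplified assumption, not a complete Email validator, and the names EMAIL_PATTERN and is_valid_email are hypothetical.

```python
import re

# Step 1: a simplified, assumed pattern of the form name@domain.tld.
# Real Email syntax is far more permissive than this.
EMAIL_PATTERN = r'^[0-9a-zA-Z_.]+@[0-9a-zA-Z\-]+(\.[0-9a-zA-Z\-]+)+$'

def is_valid_email(text):
    """Step 2: match the user's input against the pattern."""
    return re.match(EMAIL_PATTERN, text) is not None

print(is_valid_email('someone@example.com'))  # True
print(is_valid_email('someone-at-example'))   # False
```

Keeping the rule in one pattern string is what makes the check easy to reuse, compared with slicing the string around the @ by hand.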
Because a regular expression is itself written as a string, we first need to understand how characters are used to describe characters.
First, matching a single character
In a regular expression, a character given literally is matched exactly.
\d matches one digit
\w matches one letter or digit (it also matches the underscore)
. matches any single character (except a newline, by default)
For example, '00\d' matches '007' but not '00A'.
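The single-character rules above can be tried out with a small sketch. The helper name matches is hypothetical and exists only to turn the result of re.match into True or False.

```python
import re

def matches(pattern, text):
    """Return True if pattern matches at the start of text."""
    # re.match returns a Match object on success and None on failure.
    return re.match(pattern, text) is not None

print(matches(r'00\d', '007'))    # True: \d matches '7'
print(matches(r'00\d', '00A'))    # False: 'A' is not a digit
print(matches(r'\w\w\d', 'py3'))  # True: two word characters, then a digit
print(matches(r'py.', 'py!'))     # True: . matches any single character
```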
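Second, matching variable-length strings
To match a variable number of characters, use * for any number of characters (including zero), + for at least one, ? for zero or one, {n} for exactly n, and {n,m} for n to m. For example, \d{3}\s+\d{3,8} means three digits, then at least one whitespace character, then 3 to 8 digits.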
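As a rough check of the pattern above, the following sketch anchors it with ^ and $ so that the whole string has to fit. The names PATTERN and fits are hypothetical, and the sample inputs are made up for illustration.

```python
import re

# Three digits, at least one whitespace character, then 3 to 8 digits.
# ^ and $ are added here so the entire string must match.
PATTERN = r'^\d{3}\s+\d{3,8}$'

def fits(text):
    return re.match(PATTERN, text) is not None

print(fits('010 12345'))       # True
print(fits('010    7654321'))  # True: \s+ accepts several spaces
print(fits('010 12'))          # False: only two digits after the space
print(fits('01012345'))        # False: no whitespace after the first three digits
```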
Third, advanced matching
For more precise matching, use [] to denote a range of characters. For example:
- [0-9a-zA-Z\_] matches one digit, letter, or underscore.
- [0-9a-zA-Z\_]+ matches a string of at least one digit, letter, or underscore, for example 'a100', '0_Z', 'Py3000' and the like.
- [a-zA-Z\_][0-9a-zA-Z\_]* matches a letter or underscore followed by any number of digits, letters, or underscores, i.e. a valid Python variable name.
- [a-zA-Z\_][0-9a-zA-Z\_]{0,19} additionally limits the variable name to 1-20 characters (1 leading character plus up to 19 characters after it).
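The length-limited variable-name rule from the list above could be wrapped in a small helper like the one below. The names NAME_PATTERN and is_short_identifier are hypothetical, and the sketch does not check for Python keywords.

```python
import re

# One letter or underscore, then up to 19 digits, letters, or underscores.
# Note that {0,19} must be written without a space inside the braces.
NAME_PATTERN = r'^[a-zA-Z_][0-9a-zA-Z_]{0,19}$'

def is_short_identifier(name):
    return re.match(NAME_PATTERN, name) is not None

print(is_short_identifier('Py3000'))    # True
print(is_short_identifier('_private'))  # True
print(is_short_identifier('3abc'))      # False: starts with a digit
print(is_short_identifier('a' * 21))    # False: longer than 20 characters
```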
A|B matches A or B, so (P|p)ython matches 'Python' or 'python'.
^ denotes the beginning of a line; ^\d means the line must start with a digit.
$ denotes the end of a line; \d$ means the line must end with a digit.
You may have noticed that py can also match 'python', but with ^py$ the pattern has to match the whole line, so it can only match 'py'.
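The effect of |, ^ and $ can be seen in a short sketch. It uses re.search, which looks for the pattern anywhere in the string, so that the anchors are what restrict the match. The helper name found is hypothetical.

```python
import re

def found(pattern, text):
    """Return True if pattern occurs somewhere in text."""
    return re.search(pattern, text) is not None

print(found(r'(P|p)ython', 'Python'))  # True
print(found(r'(P|p)ython', 'python'))  # True
print(found(r'py', 'python'))          # True: 'py' occurs inside 'python'
print(found(r'^py$', 'python'))        # False: the whole line must be 'py'
print(found(r'^py$', 'py'))            # True
print(found(r'^\d', '7abc'))           # True: starts with a digit
print(found(r'\d$', '7abc'))           # False: does not end with a digit
```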
Fourth, the re module
Let's look at how to test whether a regular expression matches a string:
>>> import re
>>> re.match(r'^\d{3}\-\d{3,8}$', '010-12345')
<_sre.SRE_Match object at 0x1026e18b8>
>>> re.match(r'^\d{3}\-\d{3,8}$', '010 12345')
>>>
The match() method tests whether the string matches. On success it returns a Match object; otherwise it returns None. The common way to test is:
test = 'string entered by the user'
if re.match(r'regular expression', test):
    print('ok')
else:
    print('failed')
Substring search
The search() method tests whether the string contains a matching substring. If it does, use group() to see the result; if it does not, search() returns None.
>>> m = re.search('[0-9]','abcd3ef')
>>> print(m.group(0))
3
>>> m = re.search('[0-9]','abcdef')
>>> print(m)
None
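Because search() returns None when nothing is found, calling group() on the result without a check raises an error. One possible way to guard against that is sketched below; the function name first_digit is hypothetical.

```python
import re

def first_digit(text):
    """Return the first digit found in text, or None if there is none."""
    m = re.search(r'[0-9]', text)
    # Only call group() when search() actually found something.
    return m.group(0) if m else None

print(first_digit('abcd3ef'))  # 3
print(first_digit('abcdef'))   # None
```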
Substring replacement
result = re.sub(pattern, replacement, string) # Search string for matches of the regular expression pattern, replace each match with the string replacement, and return the new string.
>>> result = re.sub('[0-9]','u','ab2c1def')
>>> result
'abucudef'
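As a further illustration of re.sub, the sketch below collapses runs of whitespace into a single space and uses the optional count argument to limit how many matches are replaced. The function name normalize_spaces and the sample strings are made up for this example.

```python
import re

def normalize_spaces(text):
    """Replace every run of whitespace in text with one space."""
    return re.sub(r'\s+', ' ', text)

print(normalize_spaces('a  b \t c'))  # a b c

# count=1 replaces only the first match and leaves the rest untouched.
first_only = re.sub('[0-9]', 'u', 'ab2c1def', count=1)
print(first_only)  # abuc1def
```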
Splitting strings
>>> 'a b   c'.split(' ')
['a', 'b', '', '', 'c']
The ordinary split() cannot handle consecutive spaces. Try a regular expression instead:
>>> re.split(r'\s+', 'a b   c')
['a', 'b', 'c']
No matter how many spaces there are, the string is split correctly. Now add , to the pattern and try:
>>> re.split(r'[\s\,]+', 'a,b, c d')
['a', 'b', 'c', 'd']
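The same idea extends to more separators. The sketch below also accepts ; as a delimiter and drops the empty strings that re.split produces when the input starts or ends with a delimiter. The function name split_fields is hypothetical.

```python
import re

def split_fields(text):
    """Split text on runs of whitespace, commas, or semicolons."""
    parts = re.split(r'[\s,;]+', text)
    # Leading or trailing delimiters yield empty strings; filter them out.
    return [p for p in parts if p]

print(re.split(r'[\s,;]+', 'a,b;; c  d'))  # ['a', 'b', 'c', 'd']
print(split_fields(' a,b ;c '))            # ['a', 'b', 'c']
```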
Groups (extracting substrings)
Parentheses () mark a group to be extracted. For example, ^(\d{3})-(\d{3,8})$ defines two groups, so the area code and the local number can be extracted directly from the matched string:
>>> m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
>>> m
<_sre.SRE_Match object at 0x1026fb3e8>
>>> m.group(0)
'010-12345'
>>> m.group(1)
'010'
>>> m.group(2)
'12345'
If the regular expression defines groups, you can call group() on the Match object to extract the substrings.
Note that group(0) is always the whole matched string, while group(1), group(2), ... are the 1st, 2nd, ... substrings.
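Group extraction can be wrapped in a function that also copes with input that does not match. This is a minimal sketch; the function name split_phone is hypothetical.

```python
import re

def split_phone(number):
    """Return (area_code, local_number), or None if number does not match."""
    m = re.match(r'^(\d{3})-(\d{3,8})$', number)
    if m is None:
        return None
    # group(1) and group(2) are the first and second parenthesized groups.
    return m.group(1), m.group(2)

print(split_phone('010-12345'))  # ('010', '12345')
print(split_phone('010 12345'))  # None
```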
Fifth, compiling
When a regular expression is used many times, it can be compiled once in advance and reused:
>>> import re
# Compile:
>>> re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')
# Use:
>>> re_telephone.match('010-12345').groups()
('010', '12345')
>>> re_telephone.match('010-8086').groups()
('010', '8086')
A compiled regular expression is an object that already contains the pattern, so there is no need to pass the pattern string when calling its methods.
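A compiled pattern is most useful when the same expression is applied to many inputs. The sketch below reuses one compiled object in a loop; the list candidates and its contents are made up for illustration.

```python
import re

# Compile once, outside the loop.
re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')

candidates = ['010-12345', '010-8086', '010 12345']
results = []
for number in candidates:
    m = re_telephone.match(number)
    # groups() returns all captured groups as a tuple.
    results.append(m.groups() if m else None)

print(results)  # [('010', '12345'), ('010', '8086'), None]
```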