Learning Python from scratch (7)
1 Regular expressions
1.1 Introduction
A regular expression is a special sequence of characters that makes it easy to check whether a string matches a given pattern. The
re module gives Python full regular expression support.
1.2 re.match function
re.match tries to match a pattern at the beginning of the string. If the pattern does not match at the beginning, match() returns None.
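Syntax:
re.match(pattern, string, flags=0)
Parameters:
pattern the regular expression to match
string the string to match against
flags optional flags that control how the regular expression is matched
On success, re.match returns a match object; otherwise it returns None.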
Example
#!/usr/bin/python
#-*- coding: UTF-8 -*-
import re
print(re.match('www', 'www.runoob.com').span())  # pattern matches at the start
print(re.match('com', 'www.runoob.com'))  # pattern is not at the start
Running the example above produces:
(0, 3)
None
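Because re.match returns None on failure, calling .span() directly on the result raises an AttributeError when nothing matches. The sketch below shows one way to guard against that and also illustrates the flags parameter; match_span is a hypothetical helper name chosen for this example, not part of the re module.

```python
import re

def match_span(pattern, text, flags=0):
    """Return the (start, end) span of a match at the start of text, or None."""
    m = re.match(pattern, text, flags)
    # re.match returns None on failure, so check before calling .span()
    return m.span() if m is not None else None

print(match_span('www', 'www.runoob.com'))                 # (0, 3)
print(match_span('com', 'www.runoob.com'))                 # None
# Matching is case-sensitive unless a flag such as re.IGNORECASE is passed
print(match_span('WWW', 'www.runoob.com'))                 # None
print(match_span('WWW', 'www.runoob.com', re.IGNORECASE))  # (0, 3)
```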
1.3 re.search function
re.search scans the entire string and returns the first successful match.
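Syntax:
re.search(pattern, string, flags=0)
Parameters:
pattern the regular expression to match
string the string to search
flags optional flags that control how the regular expression is matched
On success, re.search returns a match object; otherwise it returns None.
Example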
#!/usr/bin/python
#-*- coding: UTF-8 -*-
import re
print(re.search('www', 'www.runoob.com').span())  # pattern matches at the start
print(re.search('com', 'www.runoob.com').span())  # pattern is not at the start
Running the example above produces:
(0, 3)
(11, 14)
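The difference between the two functions is easiest to see when both are run on the same input. The following is a small sketch using the same sample string as above; the variable names found and first_o are chosen for illustration only.

```python
import re

text = 'www.runoob.com'

# re.match only succeeds if the pattern matches at position 0
print(re.match('runoob', text))   # None

# re.search finds the pattern anywhere in the string
found = re.search('runoob', text)
print(found.span())               # (4, 10)
print(found.group())              # runoob

# When the pattern occurs several times, only the first match is returned
first_o = re.search('o', text)
print(first_o.span())             # (7, 8)
```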
1.4 Examples of regular expressions
1.4.1 Character Class
- [0-9] matches any digit. Equivalent to [0123456789]
- [a-z] matches any lowercase letter
- [A-Z] matches any uppercase letter
- [a-zA-Z0-9] matches any letter or digit
- [^aeiou] matches any character except the letters a, e, i, o, u
- [^0-9] matches any character except a digit
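As a quick check of the character classes listed above, the sketch below runs re.search with each one against a short sample string and records the first character that matches. The sample strings and the names samples and first_hits are made up for this illustration.

```python
import re

# Each pattern is paired with a sample string chosen for illustration
samples = {
    '[0-9]': 'room 42',
    '[a-z]': 'Hello',
    '[A-Z]': 'hello World',
    '[a-zA-Z0-9]': '--x--',
    '[^aeiou]': 'aeiXou',
    '[^0-9]': '123a45',
}

first_hits = {}
for pattern, text in samples.items():
    m = re.search(pattern, text)
    # Store the first matching character, or None if nothing matched
    first_hits[pattern] = m.group() if m else None
    print(pattern, repr(text), '->', first_hits[pattern])
```

Each class matches exactly one character per position, so the first hit is always a single character: '4', 'e', 'W', 'x', 'X', and 'a' for the samples above.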
1.4.2 Special Character Class
- . matches any single character except "\n". Inside square brackets a dot is a literal dot, so to match any character including '\n', pass the re.DOTALL flag or use a pattern like '[\s\S]'
- \d matches a digit character. Equivalent to [0-9] under the re.ASCII flag; by default, Python 3 str patterns also match other Unicode decimal digits
- \D matches a non-digit character. Equivalent to [^0-9] under re.ASCII
- \s matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v] under re.ASCII
- \S matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v] under re.ASCII
- \w matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]' under re.ASCII
- \W matches any non-word character. Equivalent to '[^A-Za-z0-9_]' under re.ASCII
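The special classes can be tried out the same way. This sketch uses a made-up ASCII sample string and raw strings (the r'' prefix) so that the backslashes reach the re module unchanged; it also shows how the re.DOTALL flag changes what the dot matches.

```python
import re

text = 'id_7 ok\nnext'  # illustrative sample containing a newline

print(re.search(r'\d', text).group())    # 7
print(re.search(r'\D', text).group())    # i
print(re.search(r'\s', text).span())     # (4, 5)
print(re.search(r'\S+', text).group())   # id_7
print(re.search(r'\w+', text).group())   # id_7
print(re.search(r'\W', text).span())     # (4, 5)

# By default '.' stops at a newline; with re.DOTALL it also matches '\n'
print(re.match(r'.+', text).group())                     # id_7 ok
print(re.match(r'.+', text, re.DOTALL).group() == text)  # True
```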