Online regex tester: http://tool.chinaz.com/regex
\ Escape character
[abc] Matches one character from the set in brackets (a, b, or c)
[a-c] Matches one character in the range a to c
[a-dm-p] Matches one character in the range a to d or m to p
. Matches any single character except a newline (\n)
\w Matches a word character: letters (a-z, A-Z), digits, and underscore
\W Matches any character that \w does not match
\s Matches any whitespace character (newline \n, carriage return \r, tab \t, vertical tab \v, form feed \f)
\S Matches any character that \s does not match
\d Matches a digit (0-9)
\D Matches any character that \d does not match
\n Matches a newline
\t Matches a tab
\b Matches a word boundary; for example, ab\b matches ab at the end of a word [the boundary can be a space, comma, hyphen, or period]
\B Matches a non-word boundary; for example, ab\B matches ab only when it is not at the end of a word
^ Matches the start of the string; ^gh matches gh at the head of the string
[^x] Matches any character except x (inside brackets, a leading ^ means "not")
[^abc] Matches any character except the three letters a, b, and c
$ Matches the end of the string; gh$ matches gh at the tail of the string
ae|b Matches ae or b [alternatives are tried from left to right: ab|abc matches ab and never reaches abc]
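The character classes, anchors, and alternation listed above can be tried out with a short sketch like the following. The sample strings and variable names are made up for illustration.

```python
import re

text = "id_42: a-b\tc\n"  # hypothetical sample text

# Character sets and shorthand classes.
in_range = re.findall(r"[a-c]", text)      # ['a', 'b', 'c']
not_abc = re.findall(r"[^abc]", "aXbYc")   # ['X', 'Y']
digits = re.findall(r"\d", text)           # ['4', '2']
words = re.findall(r"\w+", text)           # ['id_42', 'a', 'b', 'c']
spaces = re.findall(r"\s", text)           # [' ', '\t', '\n']
dot = re.findall(r".", "a\nb")             # ['a', 'b'], the newline is skipped

# Anchors and word boundaries.
starts = re.search(r"^gh", "ghost") is not None             # True
ends = re.search(r"gh$", "high") is not None                # True
at_word_end = re.findall(r"\w*ab\b", "cab, abs tab-let")    # ['cab', 'tab']
inside_word = re.findall(r"ab\B", "cab, abs tab-let")       # ['ab'] (from "abs")

# Alternation tries the left branch first.
short_first = re.search(r"ab|abc", "abc").group()   # 'ab'
long_first = re.search(r"abc|ab", "abc").group()    # 'abc'

print(in_range, not_abc, digits, words, spaces, dot)
print(starts, ends, at_word_end, inside_word)
print(short_first, long_first)
```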
() Grouping
Groups are numbered by their left parenthesis, from left to right. \1 matches the same text that the first group matched, so later content that must repeat group 1 is written as \1. Note that there is an implicit group 0, which is the entire match
(?P<name>...) Gives the group a name in addition to its number; later content that must repeat it is written as (?P=name)
Example: (\w+) \1 matches a doubled word such as "the the"
In findall and split, groups take priority (findall returns the group contents, split keeps the captured separators)
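As a rough illustration of numbered and named backreferences, the sketch below uses made-up sample strings. The group name `quote` is only an example name.

```python
import re

# \1 must repeat exactly what group 1 captured.
m = re.search(r"(\w+) \1", "it was the the end")
whole, first = m.group(0), m.group(1)   # 'the the', 'the'

# A named group, referenced later in the pattern with (?P=name).
q = re.search(r"(?P<quote>['\"]).*?(?P=quote)", "say 'hi' now")
quoted, mark = q.group(0), q.group("quote")   # "'hi'", "'"

print(whole, first)
print(quoted, mark)
```

Group 0 is the whole match, so `m.group(0)` and `m.group()` return the same text.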
{n} Repeats exactly n times
{n,} Repeats n or more times
{n,m} Repeats n to m times
{n}? Repeats exactly n times (the ? changes nothing here)
{n,}? Repeats n or more times, as few times as possible
{n,m}? Repeats n to m times, as few times as possible
* Repeats 0 or more times [greedy: matches as much as possible]
*? Repeats 0 or more times, as few times as possible
+ Repeats 1 or more times [greedy: matches as much as possible]
+? Repeats 1 or more times, as few times as possible
? Repeats 0 or 1 times
Note: a ? placed after a quantifier makes it a lazy (non-greedy) match
.*?x Matches as little as possible and stops at the first x
r Raw-string prefix: in r'...' the characters after it, including backslashes, are ordinary characters to Python
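The difference between greedy and lazy quantifiers is easiest to see side by side. This is a small sketch with invented sample strings, not a complete treatment.

```python
import re

html = "<b>one</b><b>two</b>"  # hypothetical sample text

greedy = re.findall(r"<b>.*</b>", html)    # ['<b>one</b><b>two</b>']
lazy = re.findall(r"<b>.*?</b>", html)     # ['<b>one</b>', '<b>two</b>']

most = re.search(r"\d{2,4}", "123456").group()     # '1234'
fewest = re.search(r"\d{2,4}?", "123456").group()  # '12'

up_to_x = re.search(r".*?x", "aaxbbx").group()     # 'aax'

# In a raw string the backslash stays an ordinary character.
plain_len, raw_len = len("\n"), len(r"\n")          # 1, 2

print(greedy, lazy)
print(most, fewest, up_to_x)
print(plain_len, raw_len)
```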
(?=...) Positive lookahead [forward condition]: matches only if this position is followed by the condition; otherwise there is no match
Example: \d{3}-\d{8}(?=Microsoft) [Microsoft is checked but not included in the result]
(?!...) Negative lookahead [negated forward condition]: matches only if this position is not followed by the condition
Example: \d{3}-\d{8}(?!Microsoft) [matches only when Microsoft does not follow]
(?<=...) Positive lookbehind: matches only if this position is preceded by the condition; otherwise there is no match
Example: (?<=girl)\d{3}-\d{3} [the condition comes first]
| Alternation: alternatives are tried from left to right, and once the left one matches the right one is not tried, so put the longer alternative on the left
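The lookaround examples above can be checked with a sketch like this one. The phone-like sample strings are invented for illustration.

```python
import re

text = "010-12345678Microsoft 020-87654321Other"  # hypothetical sample text

# Lookahead: the condition is tested but is not part of the result.
followed = re.findall(r"\d{3}-\d{8}(?=Microsoft)", text)      # ['010-12345678']
not_followed = re.findall(r"\d{3}-\d{8}(?!Microsoft)", text)  # ['020-87654321']

# Lookbehind: the condition must appear just before the match.
after_girl = re.findall(r"(?<=girl)\d{3}-\d{3}", "girl123-456 boy789-012")  # ['123-456']

print(followed, not_followed, after_girl)
```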
Regex methods:
Flags
re.I Ignore case
re.L Makes \w, \W, \b, \B, \s, \S depend on the current locale (bytes patterns only in Python 3)
re.M Multi-line mode: affects ^ and $, so that every line has its own start and end
re.S Makes '.' match any character, including a newline (by default '.' does not match a newline)
re.U Makes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database (the default for str patterns in Python 3)
re.X Verbose mode: for readability, whitespace in the pattern and comments after '#' are ignored
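A short sketch of what the most common flags change, using made-up sample strings. The name `phone` is just an illustrative variable.

```python
import re

ignore_case = re.findall(r"abc", "ABC abc", re.I)      # ['ABC', 'abc']

single_line = re.findall(r"^\w+", "one\ntwo")          # ['one']
multi_line = re.findall(r"^\w+", "one\ntwo", re.M)     # ['one', 'two']

dot_default = re.search(r"a.b", "a\nb")                # None
dot_all = re.search(r"a.b", "a\nb", re.S)              # matches 'a\nb'

# re.X lets the pattern be spread over lines with comments.
phone = re.compile(r"""
    \d{3}   # area code
    -       # separator
    \d{8}   # local number
""", re.X)
verbose = phone.search("tel 010-12345678").group()     # '010-12345678'

print(ignore_case, single_line, multi_line)
print(dot_default, dot_all is not None, verbose)
```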
re.compile function
Compiles a regular expression and returns a regular expression object
Syntax: p = re.compile(pattern[, flags])
Parameter 1 pattern: the regular expression as a string
Parameter 2 flags: optional; flags that set the matching mode, such as ignoring case or multi-line mode
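A minimal sketch of compiling once and reusing the object, with two flags combined using `|`. The pattern and sample text are assumptions chosen for the example.

```python
import re

# Compile once, then reuse the object for several calls.
word = re.compile(r"^[a-z]+$", re.I | re.M)

sample = "Ab\ncd\n12"  # hypothetical sample text
lines = word.findall(sample)          # ['Ab', 'cd']
first = word.search(sample).group()   # 'Ab'

print(lines, first)
```

Compiling is mostly worthwhile when the same pattern is applied many times, since the compiled object can be stored and passed around.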
re.findall
Finds all substrings of the string that match the regular expression and returns them as a list; if there is no match, an empty list is returned
Syntax 1: s = re.findall(r'\d', 'liming tan23guwu468zhong')
Parameter 1: regular expression
Parameter 2: string
Note: when the expression passed to findall contains (), the contents of the group take priority in the result; to cancel group priority, add ?: at the start of the group
Example: s = re.findall(r'www\.(?:baidu|oldboy)\.com', 'jhgfwww.oldboy.comhgffd')
Groups can also be named: (?P<name>\d+)
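Group priority in findall is easy to misread, so here is a small sketch comparing a capturing and a non-capturing group on the same string. The `a=1 b=22` sample and the group name `num` are invented for illustration.

```python
import re

s = "jhgfwww.oldboy.comhgffd"

# With a capturing group, findall returns only the group contents.
captured = re.findall(r"www\.(baidu|oldboy)\.com", s)      # ['oldboy']
# With (?:...), the group does not capture and the whole match is returned.
whole = re.findall(r"www\.(?:baidu|oldboy)\.com", s)       # ['www.oldboy.com']

# Several groups give a list of tuples.
pairs = re.findall(r"(\w+)=(\d+)", "a=1 b=22")             # [('a', '1'), ('b', '22')]

# A named group can be read back by name from a match object.
named = re.search(r"(?P<num>\d+)", "abc123").group("num")  # '123'

print(captured, whole, pairs, named)
```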
re.finditer
Like findall, it finds all substrings of the string that match the regular expression, but returns them as an iterator
Parameter 1: regular expression
Parameter 2: string
Parameter 3: optional flags
The return value is an iterator of match objects; call group() on each object to get the result
Example:
import re
s = re.finditer(r'\d+', 'adff34lkjhg87nhbvf90')
for i in s:
    print(i.group())  # 34, then 87, then 90
findall syntax 2 (on a compiled object): s = p.findall(string[, pos[, endpos]])
p: a compiled regular expression object
Parameter 1: string
Parameter 2: start position
Parameter 3: end position (not included)
Example: s = p.findall('2345lm0987MING654abc87654', 5, 18)
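The notes do not say which pattern `p` was compiled from, so the sketch below assumes `\d+` purely for illustration. It shows how the start and end positions limit the search to the slice from index 5 up to, but not including, index 18.

```python
import re

p = re.compile(r"\d+")  # assumed pattern; the original does not specify it
text = "2345lm0987MING654abc87654"

everything = p.findall(text)     # ['2345', '0987', '654', '87654']
window = p.findall(text, 5, 18)  # searches only 'm0987MING654a'

print(everything)
print(window)                    # ['0987', '654']
```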
re.search method
Scans the whole string and returns the first successful match
Syntax: re.search(pattern, string, flags=0)
Parameter 1: regular expression
Parameter 2: the string to search
Parameter 3: optional flags that control how the regular expression matches, such as case sensitivity or multi-line matching
On success, re.search returns a match object; otherwise it returns None
Call group() on the match object to get the matched text; if the result is None, calling group() raises an error
group() also accepts a group number n; groups are numbered in order, and the first () in the expression is group 1
Example:
import re
s = re.search(r'[A-Z]+', '2345lm098KH7654abc87654')
if s is not None:
    ss = s.group()
    print(ss)  # KH
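To illustrate group numbers with search, the sketch below adds two groups to a pattern over the same sample string. The two-group pattern is an assumption made for the example.

```python
import re

m = re.search(r"([A-Z]+)(\d+)", "2345lm098KH7654abc87654")

whole = m.group(0)      # 'KH7654', the entire match
letters = m.group(1)    # 'KH', first group
number = m.group(2)     # '7654', second group
both = m.groups()       # ('KH', '7654')

# No match gives None, so test before calling group().
missing = re.search(r"[A-Z]+", "abc")

print(whole, letters, number, both, missing)
```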
re.match function
re.match tries to match the pattern at the start of the string; if the pattern does not match at the start, match() returns None
On success it returns a match object, and group() returns the matched text
If the result is None, calling group() raises an error
Example:
import re
s = re.match(r'[A-Z]+', 'B2345lm098KH7654abc87654')
if s is not None:
    ss = s.group()
    print(ss)  # B
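The practical difference between match and search shows up when the text does not start with the pattern. A small sketch with shortened sample strings:

```python
import re

at_start = re.match(r"[A-Z]+", "B2345lm098KH")     # matches 'B'
not_at_start = re.match(r"[A-Z]+", "2345lm098KH")  # None: the string starts with a digit
found_later = re.search(r"[A-Z]+", "2345lm098KH")  # search still finds 'KH'

print(at_start.group(), not_at_start, found_later.group())
```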
re.split: splitting
Uses the matches as delimiters and returns the list of pieces
Syntax: s = re.split(pattern, string[, maxsplit=0, flags=0])
Parameter 1: regular expression
Parameter 2: string
Parameter 3: optional maximum number of splits; maxsplit=1 splits once; the default 0 means no limit
Parameter 4: optional flags that control how the regular expression matches, such as case sensitivity or multi-line matching
Note on grouping:
If the pattern contains (), the delimiters captured by the group are kept in the result
Example:
s = re.split(r'(\d+)', 'adjg24kjbvc76lkjh89uytrc94mhb')
Result: ['adjg', '24', 'kjbvc', '76', 'lkjh', '89', 'uytrc', '94', 'mhb']
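For comparison, this sketch runs the same string through split without a group, with a group, and with maxsplit=1. The variable names are illustrative.

```python
import re

s = "adjg24kjbvc76lkjh89uytrc94mhb"

plain = re.split(r"\d+", s)              # delimiters dropped
kept = re.split(r"(\d+)", s)             # delimiters kept because of the group
once = re.split(r"\d+", s, maxsplit=1)   # split at the first match only

print(plain)  # ['adjg', 'kjbvc', 'lkjh', 'uytrc', 'mhb']
print(kept)   # ['adjg', '24', 'kjbvc', '76', 'lkjh', '89', 'uytrc', '94', 'mhb']
print(once)   # ['adjg', 'kjbvc76lkjh89uytrc94mhb']
```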
re.sub: replacing matches in a string
Syntax: s = re.sub(pattern, repl, string, count=0)
Parameter 1: the regular expression whose matches are replaced
Parameter 2: the replacement string; it can also be a function
Parameter 3: the original string
Parameter 4: maximum number of replacements; the default 0 replaces all matches
Returns the string after replacement
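A brief sketch of the sub parameters, including a function as the replacement. The helper `double` and the `user@host` sample are hypothetical names used only for this example.

```python
import re

text = "liming2lixueqian8 Li"

all_replaced = re.sub(r"\d", "|", text)           # 'liming|lixueqian| Li'
first_only = re.sub(r"\d", "|", text, count=1)    # 'liming|lixueqian8 Li'

def double(match):
    """Hypothetical helper: replace each digit with twice its value."""
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d", double, text)             # 'liming4lixueqian16 Li'

# The replacement string can refer to groups with \1, \2, ...
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")  # 'host@user'

print(all_replaced, first_only, doubled, swapped, sep="\n")
```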
re.subn: s = re.subn(r'\d', '|', 'liming2lixueqian8 Li')
Parameter 1: the regular expression whose matches are replaced
Parameter 2: the replacement string; it can also be a function
Parameter 3: the original string
Parameter 4: maximum number of replacements; the default 0 replaces all matches
Returns a tuple: the first item is the string after replacement, the second item is the number of replacements made
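The returned tuple can be unpacked directly, as in this short sketch on the same sample string:

```python
import re

new_text, count = re.subn(r"\d", "|", "liming2lixueqian8 Li")

print(new_text)  # liming|lixueqian| Li
print(count)     # 2
```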