A regular expression describes the pattern that a set of strings follows, much as a function expression does in mathematics.
In Python, regular expressions are provided by the re module. It is part of the standard library, so it only needs to be imported; third-party packages are the ones that must be installed in advance.
Its purpose is to perform searches (findall), replacements (sub), and splits (split) that ordinary string methods cannot handle.
The commonly used functions are:
A. Search with findall, for substrings that cannot be extracted by slicing. It returns a list of all matches.
findall(pattern,string,flags=0)
Here pattern is the regular expression, string is the string to search, and flags sets the matching mode; for example, re.I makes the match case-insensitive.
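As a small illustration of findall and the re.I flag, here is a minimal sketch. The sample text is made up for the example.

```python
import re

text = "Apple pie, apple juice and APPLE cake"

# Without flags the match is case-sensitive.
exact = re.findall("apple", text)

# re.I (short for re.IGNORECASE) ignores case.
any_case = re.findall("apple", text, flags=re.I)

print(exact)     # ['apple']
print(any_case)  # ['Apple', 'apple', 'APPLE']
```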
B. Replace with sub, for replacements that the string method replace cannot do, that is, when the target is not a fixed value or does not sit at a fixed position.
sub(pattern,repl,string,count=0,flags=0)
Here repl is the replacement text and count is the maximum number of replacements (0 means replace every match); the other parameters are the same as for findall.
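The following sketch shows sub with and without count. The phone-number text is invented for the example.

```python
import re

text = "Call 555-1234 or 555-9876 today"

# Replace every run of digits with '#'.
masked_all = re.sub(r"\d+", "#", text)

# count=1 replaces only the first match.
masked_first = re.sub(r"\d+", "#", text, count=1)

print(masked_all)    # Call #-# or #-# today
print(masked_first)  # Call #-1234 or 555-9876 today
```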
C. Split with split, for splitting a string on several different delimiters at once, which the string method split cannot do.
split(pattern,string,maxsplit=0,flags=0)
maxsplit is the maximum number of splits (0 means no limit).
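A minimal sketch of splitting on several delimiters in one pass, using an invented sample string:

```python
import re

text = "a,b;c d"

# Split on a comma, a semicolon, or a space in a single call.
parts = re.split(r"[,; ]", text)

# maxsplit=2 stops after two splits; the rest stays in one piece.
limited = re.split(r"[,; ]", text, maxsplit=2)

print(parts)    # ['a', 'b', 'c', 'd']
print(limited)  # ['a', 'b', 'c d']
```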
The universal pattern .*?
The dot . stands for any character, the asterisk * means the preceding character is matched zero or more times, and the trailing question mark ? turns off greedy matching, so the match stops as soon as the part of the pattern that follows it can match.
.*? solves most problems.
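To see what the trailing question mark changes, this sketch compares the greedy .* with the non-greedy .*? on the same input. The HTML-like string is only an illustration.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs as far as it can, up to the LAST </b>.
greedy = re.findall(r"<b>(.*)</b>", html)

# Non-greedy: .*? stops at the FIRST </b> it can reach.
lazy = re.findall(r"<b>(.*?)</b>", html)

print(greedy)  # ['one</b> and <b>two']
print(lazy)    # ['one', 'two']
```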
###################################### The following are the important points ######################################
The pattern argument of the regular expression functions is built from 9 kinds of elements. By number of matches they fall into three groups: match once, match an unlimited number of times, and match a limited number of times. All symbols are the English (half-width) forms.
- Match once: literal characters, the dot ., the backslash escape \, square brackets [], and parentheses ()
- Match an unlimited number of times: the question mark ?, the plus sign +, and the asterisk *
- Match a limited number of times: braces {}
Match once
a. Literal characters: match the same substring in the string, exactly as written.
b. The dot .: matches any single character (digits, letters, punctuation, and so on) except the newline \n.
Putting a backslash \ before a special character escapes it, so that it stands for its literal meaning.
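A short sketch of the dot versus an escaped dot, and of the dot refusing to cross a newline. The sample string is made up.

```python
import re

text = "a.c\nabc"

# An unescaped dot matches any character except the newline.
dot_any = re.findall(r"a.c", text)

# An escaped dot matches only a literal full stop.
dot_literal = re.findall(r"a\.c", text)

# 'c', one character, 'a': the only candidate in between is '\n', so no match.
across_newline = re.findall(r"c.a", text)

print(dot_any)         # ['a.c', 'abc']
print(dot_literal)     # ['a.c']
print(across_newline)  # []
```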
c. The backslash \: the escape character, which changes the meaning of the symbol that follows it.
\n: a newline (line break)
\t: a tab (one indent)
\d: any digit from 0 to 9
\s: any whitespace character (space, tab, newline, and so on)
\w: any letter, digit, or underscore; in ASCII that is 63 characters (52 letters, 10 digits, and the underscore). For str patterns in Python 3, \w also matches non-ASCII letters and digits unless the re.ASCII flag is set.
\.: a literal full stop
\\: a literal backslash
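The escapes above can be tried one at a time, as in this sketch. The sample strings are invented, and the patterns are written as raw strings (r"...") so that Python itself does not interpret the backslashes before re sees them.

```python
import re

digits = re.findall(r"\d", "room 42b")           # each digit separately
spaces = re.findall(r"\s", "a b\tc\nd")          # space, tab, newline
word_chars = re.findall(r"\w", "a_1-!")          # letters, digits, underscore
dots = re.findall(r"\.", "v1.2.3")               # literal full stops only
backslashes = re.findall(r"\\", "C:\\tmp\\x")    # literal backslashes only

print(digits)            # ['4', '2']
print(spaces)            # [' ', '\t', '\n']
print(word_chars)        # ['a', '_', '1']
print(dots)              # ['.', '.']
print(len(backslashes))  # 2
```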
d. Square brackets []
Define a character set: use brackets when a position must match one of several specific characters. For example, if the character can only be 5, 6, 7, or 8, use [5678].
The elements inside [] must not be separated by commas: write [a-zA-Z0-9], with no commas in between.
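This sketch illustrates character sets, including what happens if a comma is put inside the brackets by mistake: it simply becomes one more character in the set. The input strings are made up.

```python
import re

# Only the characters 5, 6, 7 and 8 are accepted.
only_5678 = re.findall(r"[5678]", "1234567890")

# Ranges are written back to back, with no separators.
alnum = re.findall(r"[a-zA-Z0-9]", "a-B,9!")

# A comma inside the brackets is treated as a member of the set.
with_comma = re.findall(r"[a-z,0-9]", "a,9")

print(only_5678)   # ['5', '6', '7', '8']
print(alnum)       # ['a', 'B', '9']
print(with_comma)  # ['a', ',', '9']
```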
e. Parentheses ()
Capture a specific part of the match. For example, an age can be extracted by wrapping \d (with a quantifier such as +) in parentheses.
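A minimal sketch of capturing with parentheses, using an invented sentence. With one group, findall returns just the captured text; with several groups, it returns tuples.

```python
import re

text = "Tom is 23 years old, Ann is 31 years old"

# One group: only the part inside the parentheses is returned.
ages = re.findall(r"is (\d+) years", text)

# Two groups: each match becomes a (name, age) tuple.
pairs = re.findall(r"(\w+) is (\d+)", text)

print(ages)   # ['23', '31']
print(pairs)  # [('Tom', '23'), ('Ann', '31')]
```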
Unlimited number of matches: each symbol applies only to the single element immediately before it (a character, a [] set, or a () group)
f. The question mark ?
Matches the preceding character 0 or 1 times; those are the only two possibilities.
g. The plus sign +
Matches the preceding character 1 or more times.
h. The asterisk *
Matches the preceding character 0 or more times.
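The three symbols can be compared side by side on the same input, as in this sketch. The word list is invented; in each pattern the quantifier applies only to the letter u.

```python
import re

words = "color colour colouur colr"

# ? : the u may appear 0 or 1 times.
optional_u = re.findall(r"colou?r", words)

# + : the u must appear at least once.
one_or_more = re.findall(r"colou+r", words)

# * : the u may appear any number of times, including zero.
zero_or_more = re.findall(r"colou*r", words)

print(optional_u)    # ['color', 'colour']
print(one_or_more)   # ['colour', 'colouur']
print(zero_or_more)  # ['color', 'colour', 'colouur']
```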
Limited number of matches
i. Braces {}: fix the number of matches
Match the preceding character a specific number of times, or a number of times within a range.
{m}: matches the preceding character exactly m times
{m,}: matches the preceding character at least m times
{m,n}: matches the preceding character from m to n times
{,n}: matches the preceding character at most n times
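The four brace forms can be checked with a sketch like the one below. The helper matching is hypothetical and exists only for this example; it uses re.fullmatch so that a sample counts only when the whole string fits the pattern. The braces are written without spaces.

```python
import re

samples = ["1", "22", "333", "4444", "55555"]

def matching(pattern):
    """Return the samples that the pattern matches in full (illustrative helper)."""
    return [s for s in samples if re.fullmatch(pattern, s)]

exactly_3 = matching(r"\d{3}")      # exactly 3 digits
at_least_3 = matching(r"\d{3,}")    # 3 or more digits
from_2_to_4 = matching(r"\d{2,4}")  # between 2 and 4 digits
at_most_2 = matching(r"\d{,2}")     # up to 2 digits

print(exactly_3)    # ['333']
print(at_least_3)   # ['333', '4444', '55555']
print(from_2_to_4)  # ['22', '333', '4444']
print(at_most_2)    # ['1', '22']
```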