- Regular expression concepts
- The findall, match and search methods
- The use and role of metacharacters
Regular expression concepts
A regular expression is a logical formula for operating on strings: a pattern that expresses a filtering rule for text.
It can check whether a string matches a given format.
It can extract information in a specified format from a string.
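Both uses can be sketched with Python's standard `re` module. The "three digits, hyphen, four digits" format and the sample strings are made up for illustration.

```python
import re

# Hypothetical format: three digits, a hyphen, then four digits.
pattern = r"\d{3}-\d{4}"

# 1. Check whether a whole string has the given format.
is_valid = re.fullmatch(pattern, "555-1234") is not None   # True
is_invalid = re.fullmatch(pattern, "5551234") is not None  # False

# 2. Extract every substring in that format from a longer text.
text = "Call 555-1234 or 555-9876 after 6pm."
found = re.findall(pattern, text)  # ['555-1234', '555-9876']

print(is_valid, is_invalid, found)
```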
re module
findall method
Finds all substrings of the string that match the regular expression and returns them as a list; if no match is found, an empty list is returned.
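A small sketch of findall; the sample text is invented for illustration.

```python
import re

text = "apple 12, banana 7, cherry 305"

# Every run of one or more digits, in order of appearance.
numbers = re.findall(r"\d+", text)  # ['12', '7', '305']

# Nothing matches, so an empty list is returned.
nothing = re.findall(r"xyz", text)  # []

print(numbers, nothing)
```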
match method
Matches only at the start of the string.
If the match succeeds, it returns a match object (which contains information about the match); if the pattern does not match at the start of the string, match() returns None.
On the match object, group() extracts the matched content and span() returns the start and end indices of the match.
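A minimal sketch of match together with group() and span(); the strings are made up for illustration.

```python
import re

m = re.match(r"\d+", "2024 is a year")
matched_text = m.group()  # '2024'
matched_span = m.span()   # (0, 4): start index, end index (exclusive)

# The digits are not at position 0, so match() fails and returns None.
miss = re.match(r"\d+", "year 2024")

print(matched_text, matched_span, miss)
```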
search method
Scans the entire string; if a match is found anywhere, it returns a match object (which contains information about the match).
search returns as soon as it finds the first match; it does not continue looking for further matches.
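A sketch showing that search stops at the first match; the sample sentence is invented.

```python
import re

text = "order 17 shipped, order 42 pending"
s = re.search(r"\d+", text)

# Only the first match is reported; "42" is not returned.
first = s.group()      # '17'
first_span = s.span()  # (6, 8)

print(first, first_span)
```

To collect every match instead, findall is the appropriate tool.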
The difference between re.match and re.search
re.match: matches only from the beginning of the string; if the beginning of the string does not fit the regular expression, the match fails and None is returned.
re.search: searches the entire string; if no match is found anywhere, None is returned.
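The contrast can be sketched on a single made-up string:

```python
import re

text = "say hello world"

m = re.match(r"hello", text)   # None: text does not start with "hello"
s = re.search(r"hello", text)  # found further into the string

print(m)                   # None
print(s.group(), s.start())  # hello 4
```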
sub method
Replaces every matched substring; if nothing matches, nothing is replaced and the original string is returned.
re.S is a flag (an alias of re.DOTALL) that makes "." match every character, including "\n".
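A sketch of sub, with and without the re.S flag; the strings, including the small HTML snippet, are made up for illustration.

```python
import re

# Replace every run of digits (count defaults to 0, meaning "replace all").
masked = re.sub(r"\d+", "#", "pin 1234, code 99")  # 'pin #, code #'

# No match: the original string comes back unchanged.
unchanged = re.sub(r"\d+", "#", "no digits here")  # 'no digits here'

html = "<p>line one\nline two</p>"
# Without re.S, "." stops at "\n", so the pattern cannot span both lines.
no_dotall = re.sub(r"<p>.*</p>", "[para]", html)  # same as html
# With re.S, "." also matches "\n", so the whole paragraph is replaced.
with_dotall = re.sub(r"<p>.*</p>", "[para]", html, flags=re.S)  # '[para]'

print(masked, unchanged, repr(no_dotall), with_dotall)
```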
Metacharacters
Single-character matching
. matches any single character (except \n)
[] matches any one of the characters listed inside the brackets
\d matches a digit, i.e. 0-9
\D matches a non-digit character
\s matches whitespace, e.g. a space or a tab
\S matches a non-whitespace character
\w matches a word character, i.e. a-z, A-Z, 0-9 and _
\W matches a non-word character
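Each single-character metacharacter applied to one invented test string, as a quick sketch:

```python
import re

s = "a1 b_2\tC!"

dot = re.findall(r"a.", s)            # ['a1']
dot_newline = re.findall(r".", "x\ny")  # ['x', 'y']: "." skips "\n"
chars = re.findall(r"[abC]", s)       # ['a', 'b', 'C']
digits = re.findall(r"\d", s)         # ['1', '2']
non_digits = re.findall(r"\D", s)     # ['a', ' ', 'b', '_', '\t', 'C', '!']
spaces = re.findall(r"\s", s)         # [' ', '\t']
word = re.findall(r"\w", s)           # ['a', '1', 'b', '_', '2', 'C']
non_word = re.findall(r"\W", s)       # [' ', '\t', '!']

print(dot, chars, digits, spaces, word, non_word)
```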
Metacharacters for repetition
* matches the preceding character zero or more times, i.e. it is optional and may repeat
+ matches the preceding character one or more times, i.e. at least once
? matches the preceding character zero or one time, i.e. it appears once or not at all
{m} matches the preceding character exactly m times
{m,} matches the preceding character at least m times
{m,n} matches the preceding character from m to n times
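A sketch of each quantifier; the sample strings are invented. fullmatch is used for the brace forms so that each candidate string must be matched in full.

```python
import re

star = re.findall(r"ab*", "a ab abbb")            # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")            # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour")  # ['color', 'colour']

candidates = ["1", "12", "123", "1234"]
exact = [c for c in candidates if re.fullmatch(r"\d{3}", c)]       # ['123']
at_least = [c for c in candidates if re.fullmatch(r"\d{2,}", c)]   # ['12', '123', '1234']
between = [c for c in candidates if re.fullmatch(r"\d{2,3}", c)]   # ['12', '123']

print(star, plus, optional, exact, at_least, between)
```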
Metacharacters for boundaries
^ matches the beginning of the string
$ matches the end of the string
\b matches a word boundary
\B matches a position that is not a word boundary
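A sketch of the boundary metacharacters on one made-up string; finditer is used for \b and \B so the position of each match is visible.

```python
import re

text = "cat concat category"

first_word = re.findall(r"^\w+", text)  # ['cat']
last_word = re.findall(r"\w+$", text)   # ['category']

# "cat" as a whole word: only the standalone word at index 0.
whole = [m.start() for m in re.finditer(r"\bcat\b", text)]  # [0]
# "cat" not preceded by a word boundary: only the one inside "concat".
inner = [m.start() for m in re.finditer(r"\Bcat", text)]    # [7]

print(first_word, last_word, whole, inner)
```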
Group matching
| matches either the expression on its left or the expression on its right
(ab) treats the characters inside the parentheses as a single group
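A sketch of | and grouping with invented strings. It also reads a group back with group(n), a standard match-object method.

```python
import re

either = re.findall(r"cat|dog", "cat, dog, cow")  # ['cat', 'dog']

# Without parentheses "+" repeats only "b"; with them it repeats "ab".
b_repeated = re.fullmatch(r"ab+", "abbb") is not None       # True
ab_repeated = re.fullmatch(r"(ab)+", "ababab") is not None  # True
ab_on_abbb = re.fullmatch(r"(ab)+", "abbb") is not None     # False

# A group also limits the scope of "|", and its content can be read back.
m = re.match(r"gr(a|e)y", "grey")
print(m.group(), m.group(1))  # grey e
```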
Greedy and non-greedy
By default, regular expressions match greedily: they consume as much text as possible while still letting the overall pattern match.
In non-greedy mode, the shortest possible match is taken instead.
Appending ? to a quantifier (*?, +?, ??, {m,n}?) makes it non-greedy.
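A sketch contrasting greedy and non-greedy quantifiers; the HTML-like string is made up for illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: ".+" runs to the last ">" in the string, so there is one big match.
greedy = re.findall(r"<.+>", html)  # ['<b>bold</b> and <i>italic</i>']
# Non-greedy: ".+?" stops at the first ">" that lets the pattern succeed.
lazy = re.findall(r"<.+?>", html)   # ['<b>', '</b>', '<i>', '</i>']

# The same applies to braces: {2,4}? takes as few digits as allowed.
fewest = re.match(r"\d{2,4}?", "12345").group()  # '12'

print(greedy, lazy, fewest)
```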