Python regular expressions: the re module

  1. Regular expression concepts
  2. The findall, match, and search methods
  3. Metacharacters and their uses

Regular expression concepts

A regular expression is a logical formula for operating on strings: a pattern that acts as a filter over text.

It can check whether a given string matches a particular format.

It can extract the parts of a string that fit a specified format.
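To make these two uses concrete, here is a minimal sketch. The phone-number-style pattern and the sample text are made up for illustration: re.fullmatch checks whether a whole string fits the format, and re.findall pulls every occurrence of the format out of a longer text.

```python
import re

# Hypothetical format for this sketch: three digits, a hyphen, four digits.
pattern = r"\d{3}-\d{4}"

# Use 1: check whether an entire string has the expected format.
is_valid = re.fullmatch(pattern, "555-1234") is not None
is_invalid = re.fullmatch(pattern, "55-12345") is not None

# Use 2: extract every substring in that format from a longer text.
text = "Call 555-1234 or 555-9876 after 6pm."
numbers = re.findall(pattern, text)

print(is_valid, is_invalid)  # True False
print(numbers)               # ['555-1234', '555-9876']
```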

 

 

 

re module

findall method

Finds every substring of the string that matches the regular expression and returns them as a list; if nothing matches, an empty list is returned.
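A short example of findall on an invented sample string. It also shows a detail worth knowing: when the pattern contains one capturing group, findall returns the text of that group rather than the whole match.

```python
import re

text = "apple 12, banana 7, cherry 305"

# Every run of digits in the string.
digits = re.findall(r"\d+", text)
print(digits)      # ['12', '7', '305']

# No match at all: an empty list, not None.
missing = re.findall(r"x+", text)
print(missing)     # []

# With one capturing group, only the group's text is collected.
fruits = re.findall(r"([a-z]+) \d+", text)
print(fruits)      # ['apple', 'banana', 'cherry']
```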

 

match method

Tries to match only at the start of the string.

If the match succeeds, it returns a match object (which holds information about the match); if the pattern does not match at the start of the string, match() returns None.

On a match object, group() returns the matched text and span() returns the (start, end) indices of the match.
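A minimal sketch of match(), group() and span(), using made-up sample strings. The second call shows that digits later in the string do not help, because match() only looks at position 0.

```python
import re

m = re.match(r"\d+", "2024 is a year")
if m:                    # always test for None before using the result
    print(m.group())     # 2024
    print(m.span())      # (0, 4)

# The digits exist, but not at position 0, so match() fails.
no_match = re.match(r"\d+", "year 2024")
print(no_match)          # None
```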

 

search method

Scans through the entire string; if the match succeeds, it returns a match object (which holds information about the match).

search stops at the first substring that fits the pattern and returns it; it does not keep looking for later matches.
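The example below (sample string invented for illustration) shows that search() returns only the first match even when there are several; findall is used for comparison to list all of them.

```python
import re

s = "id: 7, qty: 42, price: 99"

# search() scans forward and stops at the first match.
m = re.search(r"\d+", s)
first = m.group()
where = m.span()
print(first, where)    # 7 (4, 5)

# To get every match instead, use findall (or finditer).
all_numbers = re.findall(r"\d+", s)
print(all_numbers)     # ['7', '42', '99']
```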

 

The difference between re.match and re.search

re.match: matches only at the beginning of the string; if the start of the string does not fit the regular expression, the match fails and None is returned.

re.search: scans the whole string and returns the first match; if nothing is found, it returns None.
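A side-by-side sketch of the two functions on the same made-up string: the pattern "world" is present, but not at position 0, so only search() finds it.

```python
import re

text = "hello world"

match_result = re.match(r"world", text)    # only tries position 0
search_result = re.search(r"world", text)  # tries every position

print(match_result)          # None
print(search_result.span())  # (6, 11)

# Both succeed when the pattern sits at the very start.
both = (re.match(r"hello", text).group(), re.search(r"hello", text).group())
print(both)                  # ('hello', 'hello')
```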

 

sub method

Replaces every matching substring and returns the resulting string; if nothing matches, no replacement happens and the original string is returned unchanged.
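A minimal sketch of re.sub on an invented date string. It also uses the optional count argument of re.sub to limit the number of replacements, and shows that the input string itself is never modified.

```python
import re

text = "2019-10-31"

slashes = re.sub(r"-", "/", text)              # every match is replaced
unchanged = re.sub(r"x", "/", text)            # no match: same text comes back
first_only = re.sub(r"-", "/", text, count=1)  # optional limit on replacements

print(slashes)      # 2019/10/31
print(unchanged)    # 2019-10-31
print(first_only)   # 2019/10-31
print(text)         # 2019-10-31 (strings are immutable; sub returns a new one)
```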

 

re.S (also spelled re.DOTALL) is a flag that makes "." match every character in the string, including "\n".
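The effect of re.S is easiest to see on text that spans several lines. In this sketch (the HTML snippet is made up), the same pattern fails without the flag because "." stops at the newline, and succeeds with it.

```python
import re

html = "<p>line one\nline two</p>"

# Without re.S, "." cannot cross the newline, so nothing matches.
without_s = re.findall(r"<p>(.*)</p>", html)
# With re.S, "." also matches "\n" and the whole paragraph is captured.
with_s = re.findall(r"<p>(.*)</p>", html, re.S)

print(without_s)  # []
print(with_s)     # ['line one\nline two']
```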

 

 

 

Metacharacters

Single-character matching

.      matches any single character except \n

[ ]    matches any one of the characters listed inside the brackets

\d     matches a digit, i.e. 0-9          \D     matches a non-digit

\s     matches whitespace, e.g. a space, a tab, or a newline          \S     matches non-whitespace

\w     matches a word character, i.e. a-z, A-Z, 0-9 and _          \W     matches a non-word character

(In Python 3, \d and \w in str patterns also match non-ASCII Unicode digits and letters.)
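Each single-character metacharacter can be tried with findall on one made-up sample string that mixes letters, digits, a space, a tab, "_" and "-". Because findall returns each matching character separately, it is easy to see which class accepts which character.

```python
import re

s = "A1 b_2\tc-3"   # made-up sample with letters, digits, space, tab, "_" and "-"

dot = re.findall(r"c.3", s)               # ['c-3']   "." matched "-"
dot_newline = re.findall(r"a.b", "a\nb")  # []        "." does not match "\n"
in_set = re.findall(r"[abc]", s)          # ['b', 'c']  (uppercase "A" is not in the set)
digits = re.findall(r"\d", s)             # ['1', '2', '3']
non_digits = re.findall(r"\D", s)         # ['A', ' ', 'b', '_', '\t', 'c', '-']
spaces = re.findall(r"\s", s)             # [' ', '\t']
non_spaces = re.findall(r"\S", s)         # ['A', '1', 'b', '_', '2', 'c', '-', '3']
words = re.findall(r"\w", s)              # ['A', '1', 'b', '_', '2', 'c', '3']
non_words = re.findall(r"\W", s)          # [' ', '\t', '-']
```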

 

Metacharacters that specify quantity

*        matches the preceding character zero or more times, i.e. it is optional

+        matches the preceding character one or more times, i.e. at least once

?        matches the preceding character zero times or once, i.e. it appears once or not at all

{m}      matches the preceding character exactly m times

{m,}     matches the preceding character at least m times

{m,n}    matches the preceding character from m to n times (write it without a space; "{m, n}" is treated as literal text, not as a quantifier)
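All six quantifiers can be compared on one invented string containing "color", "colour" and "colouur", applying each quantifier to the letter "u":

```python
import re

s = "color colour colouur"

star = re.findall(r"colou*r", s)         # zero or more "u"
plus = re.findall(r"colou+r", s)         # one or more "u"
question = re.findall(r"colou?r", s)     # zero or one "u"
exactly = re.findall(r"colou{2}r", s)    # exactly two "u"
at_least = re.findall(r"colou{1,}r", s)  # one or more, same as +
between = re.findall(r"colou{0,1}r", s)  # zero to one, same as ?

print(star)      # ['color', 'colour', 'colouur']
print(plus)      # ['colour', 'colouur']
print(question)  # ['color', 'colour']
print(exactly)   # ['colouur']
print(at_least)  # ['colour', 'colouur']
print(between)   # ['color', 'colour']
```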

 

Boundary metacharacters

^    matches the beginning of the string

$    matches the end of the string

\b   matches a word boundary

\B   matches a position that is not a word boundary
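Boundary metacharacters match positions rather than characters, so this sketch reports where each match starts. The sample string is made up: "cat" appears as a whole word, at the end of "concat", and at the start of "catalog". Note the raw strings: without the r prefix, "\b" would be a backspace character.

```python
import re

s = "cat concat catalog"

at_start = re.search(r"^cat", s).start()                       # 0
at_end = re.search(r"log$", s).start()                         # 15
whole_word = [m.start() for m in re.finditer(r"\bcat\b", s)]   # [0]
word_prefix = [m.start() for m in re.finditer(r"\bcat", s)]    # [0, 11]
inside_word = [m.start() for m in re.finditer(r"\Bcat", s)]    # [7]
```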

 

Grouping

|      matches either the expression on its left or the one on its right

(ab)   treats the characters inside the parentheses as one group
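A sketch of alternation and grouping, using invented sample strings. It shows that a quantifier after a group repeats the whole group, that groups can be read back individually with group(n) or groups(), and that "|" inside parentheses only chooses between the grouped alternatives.

```python
import re

# "|" : either alternative
pets = re.findall(r"cat|dog", "cat, dog, bird")
print(pets)                # ['cat', 'dog']

# "( )" : the quantifier applies to the whole group "ab"
m = re.fullmatch(r"(ab)+", "ababab")
print(m.group())           # ababab
print(m.group(1))          # ab  (a repeated group keeps its last repetition)

# Groups also pick out parts of a match
date = re.match(r"(\d{4})-(\d{2})-(\d{2})", "2019-10-31")
print(date.groups())       # ('2019', '10', '31')

# "|" inside a group only chooses between the grouped alternatives
gray = re.search(r"gr(a|e)y", "a grey cat").group()
print(gray)                # grey
```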

 

 

 

Greedy and non-greedy

By default, regular expressions match greedily: a quantifier consumes as much text as possible while still letting the overall pattern match.

In non-greedy mode, the shortest possible match is taken instead.

Adding ? after a quantifier (*?, +?, ??, {m,n}?) makes it non-greedy.
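The classic illustration is extracting text between tags. In this sketch (the HTML string is made up), the greedy ".*" runs to the last closing tag, while ".*?" stops at the first one:

```python
import re

html = "<b>one</b> and <b>two</b>"

greedy = re.findall(r"<b>(.*)</b>", html)   # runs to the LAST "</b>"
lazy = re.findall(r"<b>(.*?)</b>", html)    # stops at the FIRST "</b>"
print(greedy)  # ['one</b> and <b>two']
print(lazy)    # ['one', 'two']

# The same idea with digits
greedy_digits = re.match(r"\d+", "12345").group()
lazy_digits = re.match(r"\d+?", "12345").group()
print(greedy_digits, lazy_digits)  # 12345 1
```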

 


Source: www.cnblogs.com/jiyu-hlzy/p/11772788.html