Python - re module (regular expressions)

Online regex tester: http://tool.chinaz.com/regex

 

\  escape character

[abc]  matches one character listed in the brackets

[a-c]  matches one character in the range a to c

[a-dm-p]  matches one character in the range a to d or m to p

.  matches any single character except the newline \n

\w  matches a word character: letters (a-z, A-Z), digits, underscore

\W  matches any character that \w does not match

\s  matches any whitespace character (newline \n, carriage return \r, tab \t, vertical tab \v, form feed \f)

\S  matches any character that \s does not match

\d  matches a digit (0-9)

\D  matches any character that \d does not match

\n  matches a newline

\t  matches a tab

\b  matches a word boundary; ab\b matches ab at the end of a word [the boundary can be a space, comma, hyphen, or period]

\B  matches a non-word boundary; ab\B matches ab only when it is not at the end of a word
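A minimal sketch of the character classes and boundary markers listed above. The sample text and variable names are made up for illustration.

```python
import re

text = "cat_9 scatter cat, concat"

# \w+ grabs runs of letters, digits, and underscores
words = re.findall(r"\w+", text)

# \d matches one digit, \s matches one whitespace character
digits = re.findall(r"\d", text)
spaces = re.findall(r"\s", text)

# cat\b: "cat" followed by a word boundary (the word ends here)
cat_at_word_end = [m.start() for m in re.finditer(r"cat\b", text)]

# cat\B: "cat" followed by more word characters (the word goes on)
cat_inside_word = [m.start() for m in re.finditer(r"cat\B", text)]

print(words)            # ['cat_9', 'scatter', 'cat', 'concat']
print(cat_at_word_end)  # [14, 22]
print(cat_inside_word)  # [0, 7]
```

Note that the underscore in cat_9 counts as a word character, so there is no boundary after that first "cat".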

^  matches the start of the string; ^gh matches gh at the start of the string

[^x]  matches any character except x (inside brackets, ^ means "not")

[^abc]  matches any character except a, b, and c

$  matches the end of the string; gh$ matches gh at the end of the string

ae|b  matches ae or b [alternatives are tried from left to right: ab|abc matches ab and never reaches abc]
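The anchors, the negated set, and the left-to-right rule for | can be checked with a short sketch like this one. The sample strings are illustrative only.

```python
import re

# ^gh matches only when the string starts with gh
starts = re.search(r"^gh", "ghost") is not None
not_start = re.search(r"^gh", "sigh") is not None

# gh$ matches only when the string ends with gh
ends = re.search(r"gh$", "sigh") is not None

# [^abc] matches any one character other than a, b, or c
others = re.findall(r"[^abc]", "aXbYc")

# Alternatives are tried left to right, so ab wins before abc is tried
short_first = re.search(r"ab|abc", "abc").group()
long_first = re.search(r"abc|ab", "abc").group()

print(starts, not_start, ends)   # True False True
print(others)                    # ['X', 'Y']
print(short_first, long_first)   # ab abc
```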

()  grouping

Groups are numbered from left to right by the position of their opening parenthesis. \1 matches the same text that the first group matched, and so on for later groups. Note that there is also an implicit group 0, which is the whole regular expression.

(?P<name>...)  gives the group a name in addition to its number; later in the pattern, (?P=name) matches the same text as that group


In findall and split, groups take priority: findall returns the group contents, and split keeps the delimiters captured by the group
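As a rough illustration of group numbering, backreferences, and named groups, here is a small sketch. The patterns and sample strings are invented for the example.

```python
import re

# \1 must repeat whatever group 1 captured, so this finds a doubled word
m = re.search(r"\b(\w+) \1\b", "it is is fine")
whole = m.group(0)   # group 0 is the whole match
first = m.group(1)   # group 1 is the first ( ... )

# A named group: (?P=quote) must match the same quote character again
q = re.search(r"(?P<quote>['\"])(.*?)(?P=quote)", "say 'hi' now")
quote_char = q.group("quote")
inner = q.group(2)   # named groups still get a number as well

print(whole, "|", first)   # is is | is
print(quote_char, inner)   # ' hi
```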

{n}  repeat exactly n times

{n,}  repeat n or more times

{n,m}  repeat n to m times

{n}?  repeat n times, as few as possible (the count is fixed, so this behaves like {n})

{n,}?  repeat n or more times, as few as possible

{n,m}?  repeat n to m times, as few as possible

*  repeat 0 or more times [as many as possible: greedy match]

*?  repeat 0 or more times, as few as possible

+  repeat 1 or more times [as many as possible: greedy match]

+?  repeat 1 or more times, as few as possible

?  repeat 0 or 1 times

Note: a quantifier followed by ? is a lazy (non-greedy) match

.*?x  matches as few characters as possible, stopping at the first x
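The difference between greedy and lazy quantifiers is easiest to see side by side. This is a small sketch with made-up input strings.

```python
import re

text = "<a><b>"

# Greedy: .* takes as much as it can, so one match spans both tags
greedy = re.findall(r"<.*>", text)

# Lazy: .*? takes as little as it can, so each tag is matched separately
lazy = re.findall(r"<.*?>", text)

# {2,4} prefers 4 digits, {2,4}? prefers 2
digits_greedy = re.search(r"\d{2,4}", "123456").group()
digits_lazy = re.search(r"\d{2,4}?", "123456").group()

# .*?x stops at the first x it reaches
stop_at_x = re.search(r".*?x", "abxcdx").group()

print(greedy)                      # ['<a><b>']
print(lazy)                        # ['<a>', '<b>']
print(digits_greedy, digits_lazy)  # 1234 12
print(stop_at_x)                   # abx
```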

r  in a raw string r'...', the characters after r are ordinary characters: backslashes are not treated as Python string escapes

(?=...)  positive lookahead: the match succeeds at this position only if the text that follows satisfies the condition; the condition itself is not part of the match

Example: \d{3}-\d{8}(?=Microsoft)  matches the number only when Microsoft follows; Microsoft is not included in the result

(?!...)  negative lookahead: the match succeeds at this position only if the text that follows does not satisfy the condition

Example: \d{3}-\d{8}(?!Microsoft)  matches the number only when Microsoft does not follow

(?<=...)  positive lookbehind: the match succeeds at this position only if the text before it satisfies the condition

Example: (?<=girl)\d{3}-\d{3}  the condition comes before the text to match
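The three lookaround examples above can be tried out as follows. The input strings are invented for this sketch; only the patterns come from the text.

```python
import re

text = "010-12345678Microsoft 021-87654321Google"

# Lookahead: keep the number only when Microsoft follows it
followed = re.findall(r"\d{3}-\d{8}(?=Microsoft)", text)

# Negative lookahead: keep the number only when Microsoft does not follow
not_followed = re.findall(r"\d{3}-\d{8}(?!Microsoft)", text)

# Lookbehind: keep the number only when girl comes right before it
after_girl = re.findall(r"(?<=girl)\d{3}-\d{3}", "girl123-456 boy789-012")

print(followed)      # ['010-12345678']
print(not_followed)  # ['021-87654321']
print(after_girl)    # ['123-456']
```

In each case the condition text (Microsoft, girl) is checked but never appears in the result.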

|  or: alternatives are tried from left to right; once the left one matches, the right one is not tried, so put the longer alternative on the left

 

Regex methods:

 

 

Flags

re.I  ignore case

re.L  makes \w, \W, \b, \B, \s, \S depend on the current locale

re.M  multi-line mode: affects ^ and $, so that every line has its own start and end

re.S  makes '.' match any character, including the newline (by default '.' does not match a newline)

re.U  makes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database (the default for str patterns in Python 3)

re.X  verbose mode: to improve readability, whitespace in the pattern and comments after '#' are ignored
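A short sketch of the most commonly used flags. The sample strings and variable names are assumptions made for the example.

```python
import re

# re.I: case is ignored
any_case = re.findall(r"python", "Python PYTHON python", re.I)

# re.M: ^ matches at the start of every line, not only the whole string
text = "first line\nsecond line"
one_line = re.findall(r"^\w+", text)
every_line = re.findall(r"^\w+", text, re.M)

# re.S: '.' also matches the newline
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"a.b", "a\nb", re.S)

# re.X: whitespace and # comments inside the pattern are ignored
phone = re.compile(r"""
    \d{3}   # area code
    -       # separator
    \d{8}   # number
""", re.X)
found = phone.search("call 010-12345678").group()

print(any_case)              # ['Python', 'PYTHON', 'python']
print(one_line, every_line)  # ['first'] ['first', 'second']
print(dot_default is None, dot_all is not None)  # True True
print(found)                 # 010-12345678
```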

 

 

compile function

Compiles a regular expression and returns a regular expression object

Syntax: p = re.compile(pattern[, flags])

Parameter 1, pattern: the regular expression, as a string

Parameter 2, flags: optional; sets the matching mode, such as ignoring case or multi-line mode
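A minimal sketch of compiling a pattern once and reusing the resulting object. The pattern and input strings are illustrative.

```python
import re

# Compile once, with the ignore-case flag baked in
p = re.compile(r"[a-z]+", re.I)

# The compiled object has the same methods as the module-level functions
letters = p.findall("Hello World 42")
first = p.search("42 abc").group()

print(letters)  # ['Hello', 'World']
print(first)    # abc
```

Compiling is mainly useful when the same pattern is applied many times, or when the start and end position arguments of the compiled methods are needed.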

 

 

 

 

findall

Finds every substring of the string that the regular expression matches and returns them as a list; if nothing matches, an empty list is returned

Syntax 1: s = re.findall(r'\d', 'liming tan23guwu468zhong')

Parameter 1: regular expression

Parameter 2: string

Note: when the expression passed to findall contains (), the content inside () takes priority by default and only that content is returned; to cancel group priority, add ?: at the start of the group

Example: s = re.findall('www.(?:baidu|oldboy).com', 'jhgfwww.oldboy.comhgffd')

Groups can be named: (?P<name>\d+)
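The group-priority behavior of findall can be seen by running the same pattern with and without ?:. This sketch escapes the dots (an addition to the pattern above) so that they match only a literal period; the other inputs are invented.

```python
import re

text = "jhgfwww.oldboy.comhgffd"

# With a capturing group, findall returns only what the group matched
with_group = re.findall(r"www\.(baidu|oldboy)\.com", text)

# With (?:...), the group does not capture, so the whole match is returned
without_group = re.findall(r"www\.(?:baidu|oldboy)\.com", text)

# With several groups, findall returns one tuple per match
pairs = re.findall(r"(\w+)=(\d+)", "a=1, b=22")

# A named group can be read by name from a match object
named = [m.group("num") for m in re.finditer(r"(?P<num>\d+)", "tan23guwu468")]

print(with_group)     # ['oldboy']
print(without_group)  # ['www.oldboy.com']
print(pairs)          # [('a', '1'), ('b', '22')]
print(named)          # ['23', '468']
```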

 

re.finditer

Like findall, it finds every substring of the string that the regular expression matches, but returns them as an iterator

Parameter 1: regular expression

Parameter 2: string

Parameter 3: optional; flags

The return value is an iterator of match objects; call group() on each one to get the matched text

example

import re

s = re.finditer(r'\d+', 'adff34lkjhg87nhbvf90')
for i in s:
    print(i.group())
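Because finditer yields match objects rather than plain strings, the position of each match is available too. A small sketch using the same input string:

```python
import re

# Collect the matched text together with its start and end index
spans = [
    (m.group(), m.start(), m.end())
    for m in re.finditer(r"\d+", "adff34lkjhg87nhbvf90")
]

print(spans)  # [('34', 4, 6), ('87', 11, 13), ('90', 18, 20)]
```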

 

 

Syntax 2: s = p.findall('2345lm0987MING654abc87654', 5, 18)

p: compiled regular expression object

Parameter 1: string

Parameter 2: start position

Parameter 3: end position (not included)

Example: s = p.findall('2345lm0987MING654abc87654', 5, 18)
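The text does not show which pattern p was compiled from, so this sketch assumes \d+ purely for illustration. It compares a search over the whole string with a search limited to positions 5 up to (not including) 18.

```python
import re

# Assumption: the pattern behind p is not given in the text; \d+ is a stand-in
p = re.compile(r"\d+")
text = "2345lm0987MING654abc87654"

all_numbers = p.findall(text)
windowed = p.findall(text, 5, 18)   # only looks at text[5:18]

print(all_numbers)  # ['2345', '0987', '654', '87654']
print(windowed)     # ['0987', '654']
```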

 

re.search method

Scans the whole string and returns the first successful match

Syntax: re.search(pattern, string, flags=0)

Parameter 1: regular expression

Parameter 2: the string to search

Parameter 3: optional; flags that control how the regular expression matches, such as case sensitivity or multi-line matching

If the match succeeds, re.search returns a match object; otherwise it returns None

Call group() on the match object to get the matched text; if the result was None, calling group() raises an error

group(n) takes a group number as its argument; groups are numbered in order, and the first () in the expression is number 1

 

example

import re

s = re.search(r'[A-Z]+', '2345lm098KH7654abc87654')
if s is not None:
    ss = s.group()
    print(ss)
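To show group(n) on a search result, here is a sketch with a two-group pattern on the same input string. The pattern itself is made up for the example.

```python
import re

m = re.search(r"([a-z]+)(\d+)", "2345lm098KH7654abc87654")

whole = m.group()      # same as m.group(0): the entire match
letters = m.group(1)   # first ( ... )
number = m.group(2)    # second ( ... )
both = m.groups()      # all groups as a tuple

# When nothing matches, search returns None, so test before calling group()
missing = re.search(r"[A-Z]+", "no capitals")

print(whole, letters, number)  # lm098 lm 098
print(both)                    # ('lm', '098')
print(missing)                 # None
```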

 

re.match function

re.match tries to match the pattern at the start of the string; if the pattern does not match at the start, match() returns None

On success it returns a match object; call group() on it to get the matched text

If the result was None, calling group() raises an error

example

import re

s = re.match(r'[A-Z]+', 'B2345lm098KH7654abc87654')
if s is not None:
    ss = s.group()
    print(ss)
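The difference between match and search shows up when the pattern first matches somewhere after the start. A short sketch with an invented input string:

```python
import re

text = "2345lm098KH"

# match only tries position 0, where there is a digit, not a capital letter
at_start = re.match(r"[A-Z]+", text)

# search scans forward until it finds a match
anywhere = re.search(r"[A-Z]+", text).group()

# match succeeds when the pattern fits at the very beginning
leading_digits = re.match(r"\d+", text).group()

print(at_start)        # None
print(anywhere)        # KH
print(leading_digits)  # 2345
```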

 

re.split: splitting

Uses the matches as delimiters and returns the list of pieces

Syntax: s = re.split(pattern, string[, maxsplit=0, flags=0])

Parameter 1: regular expression

Parameter 2: string

Parameter 3: optional; maximum number of splits; maxsplit=1 splits once; the default 0 means no limit

Parameter 4: optional; flags that control how the regular expression matches, such as case sensitivity or multi-line matching

Note on grouping:

If the pattern contains (), the delimiters matched inside the group are kept in the result

example

s = re.split(r'(\d+)', 'adjg24kjbvc76lkjh89uytrc94mhb')

Result: ['adjg', '24', 'kjbvc', '76', 'lkjh', '89', 'uytrc', '94', 'mhb']
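For comparison, this sketch runs split with and without the group, and with maxsplit, on a shortened version of the string above.

```python
import re

text = "adjg24kjbvc76lkjh"

# No group: the digit runs are dropped
plain = re.split(r"\d+", text)

# With a group: the digit runs are kept between the pieces
kept = re.split(r"(\d+)", text)

# maxsplit=1: split at the first delimiter only
once = re.split(r"\d+", text, maxsplit=1)

print(plain)  # ['adjg', 'kjbvc', 'lkjh']
print(kept)   # ['adjg', '24', 'kjbvc', '76', 'lkjh']
print(once)   # ['adjg', 'kjbvc76lkjh']
```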

 

 

re.sub: replaces the matches in a string

Syntax: s = re.sub(pattern, repl, string, count=0)

Parameter 1: regular expression; the text it matches is what gets replaced

Parameter 2: the replacement string; it can also be a function

Parameter 3: the original string

Parameter 4: maximum number of replacements; the default 0 replaces all matches

Returns the string after replacement
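A sketch of the different forms the replacement can take: a plain string, a limited count, a function, and a backreference. The helper name double and the sample strings are made up for illustration.

```python
import re

text = "liming2lixueqian8"

# Replace every digit
all_replaced = re.sub(r"\d", "|", text)

# Replace only the first match
first_only = re.sub(r"\d", "|", text, count=1)

# A function receives each match object and returns the replacement text
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, text)

# \1 and \2 in the replacement refer to the captured groups
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

print(all_replaced)  # liming|lixueqian|
print(first_only)    # liming|lixueqian8
print(doubled)       # liming4lixueqian16
print(swapped)       # host@user
```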

 

re.subn: like re.sub, but also returns the number of replacements

Example: s = re.subn(r'\d', '|', 'liming2lixueqian8 Li')

Parameter 1: regular expression; the text it matches is what gets replaced

Parameter 2: the replacement string; it can also be a function

Parameter 3: the original string

Parameter 4: maximum number of replacements; the default 0 replaces all matches

Returns a tuple: the first item is the string after replacement, the second item is the number of replacements made
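The returned tuple can be unpacked directly, as in this small sketch based on the call above.

```python
import re

# subn returns (new_string, number_of_replacements)
new_text, count = re.subn(r"\d", "|", "liming2lixueqian8 Li")

print(new_text)  # liming|lixueqian| Li
print(count)     # 2
```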

 

 

 

 

 

 

 


Origin www.cnblogs.com/liming19680104/p/11323115.html