The Road to Data - Python Web Scraping - Regular Expressions

I. Commonly used matching patterns

\w matches letters, digits, and underscores
\W matches any character that is not a letter, digit, or underscore
\s matches any whitespace character, equivalent to [ \t\n\r\f\v] for ASCII text
\S matches any non-whitespace character
\d matches any digit, equivalent to [0-9] for ASCII text
\D matches any non-digit character
\A matches the beginning of the string
\Z matches the end of the string (in Perl-style engines it also matches just before a final newline; in Python's re it matches only at the very end)
\z matches the very end of the string (Perl-style; may not be available in Python's re)
\G matches the position where the previous match ended (Perl-style; not supported by Python's re)
\n matches a newline
\t matches a tab
^ matches the beginning of the string
$ matches the end of the string
. matches any character except a newline; with the re.DOTALL flag, it matches any character including a newline
[...] matches one character from a set listed individually: [amk] matches 'a', 'm', or 'k'
[^...] matches any character not listed in the brackets: [^abc] matches any character other than a, b, or c
* matches 0 or more repetitions of the preceding expression
+ matches 1 or more repetitions of the preceding expression
? matches 0 or 1 repetition of the preceding expression; placed after a quantifier (as in *? or +?), it makes that quantifier non-greedy
{n} matches exactly n repetitions of the preceding expression
{n,m} matches n to m repetitions of the preceding expression, greedily
a|b matches a or b
(...) matches the expression inside the parentheses and also defines a group
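A few of the patterns above can be tried directly with re.findall() and re.search(). This is a small sketch; the sample strings are made up for illustration. It also shows the practical difference between a greedy .* and a non-greedy .*?.

```python
import re

text = "Order 66 shipped on 2019-08-02 to room_7"

# \d+ : one or more digits
digits = re.findall(r"\d+", text)

# \d{4}-\d{2}-\d{2} : exactly 4, 2 and 2 digits separated by hyphens
date = re.search(r"\d{4}-\d{2}-\d{2}", text).group()

# \w+ : runs of letters, digits and underscores (spaces and hyphens break them)
words = re.findall(r"\w+", text)

# [amk] : one character from the set; [^aeiou ] : any character NOT in the set
amk = re.findall(r"[amk]", "make a mark")
non_vowels = re.findall(r"[^aeiou ]", "regex")

# ^ and $ anchor the pattern to the start and end of the string
starts_with_order = re.search(r"^Order", text) is not None
ends_with_digit = re.search(r"\d$", text) is not None

# Greedy .* grabs as much as possible; non-greedy .*? as little as possible
html = "<b>bold</b> and <b>more</b>"
greedy = re.search(r"<b>.*</b>", html).group()
lazy = re.search(r"<b>.*?</b>", html).group()

# (...) defines a group; cat|dog matches either alternative
m = re.search(r"(cat|dog)s?", "two dogs")

print(digits, date, words, amk, non_vowels)
print(starts_with_order, ends_with_digit)
print(greedy)
print(lazy)
print(m.group(), m.group(1))
```

The greedy version runs to the last </b> in the string, while the non-greedy version stops at the first one, which is usually what you want when scraping HTML.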

II. Commonly used regex methods

1. match() method

match() tries to match the pattern at the beginning of the string. If the pattern does not match at the start, match() returns None.

Syntax: re.match(pattern, string, flags=0)

On a successful match, result.group() returns the matched text, and result.span() returns a (start, end) tuple giving the position of the match within the string.

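import re

content = "hello 123 4567 World_This is a regex Demo"
# Raw string (r'...') so the backslashes reach the regex engine unchanged
result = re.match(r'^hello\s\d\d\d\s\d{4}\s\w{10}.*Demo$', content)
print(result)           # the Match object
print(result.group())   # the matched text
print(result.span())    # (start, end) of the match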

Matching flags

re.I makes matching case-insensitive
re.L makes matching locale-aware (in Python 3 this is only allowed with bytes patterns)
re.M enables multi-line matching, which affects ^ and $
re.S makes . match every character, including a newline
re.U interprets characters using the Unicode character set; this affects \w, \W, \b, \B, \d, \D, \s and \S (for str patterns it is already the default in Python 3)
re.X enables verbose mode, which allows whitespace and comments inside the pattern so regular expressions are easier to read
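The effect of the most common flags is easiest to see side by side. The sketch below uses invented sample text; it compares matching with and without re.I, re.M and re.S, shows a verbose (re.X) pattern with inline comments, and combines flags with the | operator.

```python
import re

# re.I: case-insensitive matching
ci = re.findall(r"python", "Python PYTHON python", re.I)

text = "first line\nsecond line\nthird line"

# Without re.M, ^ only matches at the very start of the string
no_m = re.findall(r"^\w+", text)
# With re.M, ^ also matches right after each newline
with_m = re.findall(r"^\w+", text, re.M)

# Without re.S, . stops at a newline, so this pattern cannot span lines
no_s = re.search(r"first.*third", text)
# With re.S, . also matches the newline
with_s = re.search(r"first.*third", text, re.S)

# re.X: whitespace in the pattern is ignored and # starts a comment
phone = re.compile(r"""
    (\d{3})   # three-digit prefix
    -
    (\d{4})   # four-digit number
""", re.X)
parts = phone.search("call 555-1234 now").groups()

# Flags can be combined with the | operator
combined = re.findall(r"^line", "Line one\nline two", re.I | re.M)

print(ci, no_m, with_m, no_s, with_s.group(), parts, combined)
```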

The text being matched often contains line breaks. Because . does not match a newline by default, pass the re.S flag so that . can match across lines:

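import re

content = """hello 123456 world_this
my name is zhaofan
"""
# re.S lets .*? cross the newline; (\d+) captures the digits as group 1
result = re.match(r'^he.*?(\d+).*?zhaofan$', content, re.S)
print(result)
print(result.group())   # the whole match
print(result.group(1))  # the captured digits: 123456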

2. search() method

search() scans the entire string and returns the first successful match, or None if the pattern matches nowhere. Syntax: re.search(pattern, string, flags=0)
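The difference from match() shows up as soon as the text does not begin with the pattern. In this short sketch (the sample string is invented), match() fails because position 0 is a letter, while search() finds the first run of digits and ignores later ones.

```python
import re

content = "Total: 42 items, shipped 7 days ago"

# match() only tries position 0, where the string starts with "T"
m = re.match(r"\d+", content)

# search() scans forward and returns the FIRST place the pattern fits
s = re.search(r"\d+", content)

print(m)                     # None
print(s.group(), s.span())   # 42 (7, 9)
```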

3. findall() method

findall() searches the entire string and returns every non-overlapping match as a list. Syntax: re.findall(pattern, string, flags=0)
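The shape of findall()'s result depends on how many groups the pattern has: with no groups you get the full matched strings, with one group you get that group's text, and with several groups you get a list of tuples. The HTML snippet below is a made-up example of the kind of markup a scraper might process.

```python
import re

html = '<li><a href="/a.html">Alpha</a></li><li><a href="/b.html">Beta</a></li>'

# No groups: a list of the full matched strings
numbers = re.findall(r"\d+", "3 apples, 12 pears, 100 plums")

# One group: a list of that group's text
titles = re.findall(r'<a href=".*?">(.*?)</a>', html)

# Several groups: a list of tuples, one element per group
links = re.findall(r'<a href="(.*?)">(.*?)</a>', html)

print(numbers)
print(titles)
print(links)
```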

4. sub() method

sub() replaces every match of the pattern with a replacement string, for example to remove all digits from a piece of text. Syntax: re.sub(pattern, repl, string, count=0, flags=0)
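A short sketch of sub() on an invented string: replacing digit runs with an empty string removes them, a backreference such as \1 reuses a captured group in the replacement, and count limits how many replacements are made (passed by keyword here).

```python
import re

content = "Room 101, floor 3, building 42"

# Remove every run of digits by replacing it with an empty string
no_digits = re.sub(r"\d+", "", content)

# \1 in the replacement refers to group 1 of each match
wrapped = re.sub(r"(\d+)", r"[\1]", content)

# count=1 replaces only the first match
first_only = re.sub(r"\d+", "#", content, count=1)

print(no_digits)
print(wrapped)
print(first_only)
```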

5. compile() method

compile() turns a pattern string into a regular expression object that can be reused in subsequent matches.

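import re

content = """hello 12345 world_this
123 fan
"""
# Compile the pattern once (re.S is stored in the pattern object)
pattern = re.compile(r"hello.*fan", re.S)
# Call match() on the compiled object instead of passing it to re.match()
result = pattern.match(content)
print(result)
print(result.group())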
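The benefit of compile() is reuse: a compiled pattern object offers the same methods as the module (match, search, findall, sub, split), but takes only the string to work on. The sketch below applies one hypothetical date pattern to several invented lines.

```python
import re

# Compile once; the object is reused for every line below
date_re = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

lines = ["released 2019-08-02", "no date here", "updated 2020-01-15 and 2020-02-01"]

# First date on each line, or None when the line has no date
firsts = []
for line in lines:
    m = date_re.search(line)
    firsts.append(m.groups() if m else None)

# Every date on the last line, as (year, month, day) tuples
all_dates = date_re.findall(lines[2])

# Reorder the groups to day/month/year in the replacement
swapped = date_re.sub(r"\3/\2/\1", lines[0])

print(firsts)
print(all_dates)
print(swapped)
```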

6. split() method

split() splits the string at every match of the pattern and returns the resulting pieces as a list. Syntax: re.split(pattern, string, maxsplit=0, flags=0)
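Compared with str.split(), re.split() can split on several different separators at once. In this sketch (sample strings invented), the separators include optional whitespace, maxsplit caps the number of splits, and a capturing group in the pattern keeps the separators in the result.

```python
import re

# Split on commas or semicolons, swallowing surrounding whitespace
parts = re.split(r"\s*[,;]\s*", "apple, pear;plum ,  fig")

# Split on runs of non-word characters
words = re.split(r"\W+", "Hello, world! How are you")

# maxsplit=2: at most two splits, the rest stays in the last piece
first_two = re.split(r"\s+", "a b c d", maxsplit=2)

# A capturing group keeps the separators in the returned list
with_seps = re.split(r"([,;])", "a,b;c")

print(parts, words, first_two, with_seps)
```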

 

Source: www.cnblogs.com/Iceredtea/p/11286133.html