A commonly used pattern matching
\ w matching alphanumeric and underscores \ W matches non-alphanumeric underscores f \ s matches any whitespace character, equivalent to [\ T \ n-\ R & lt \ f] \ S matches any non-null character \ d matches any number \ D Match any non-numeric \ a matches the beginning of the string \ z matches the end of the string, if present wrap, before the end of the string match only newline \ z matches the end of the string \ G match the last position alignment complete \ n matches a newline \ t a matching tab ^ beginning of string matching end $ matched string . matches any character except when the line breaks, re.DOTALL flag is specified, any character can match comprises newline [....] using to represent a group of characters, listed separately: [amk] matches a, m or K [ ^ ...] is not [] characters: [^ ABC] in addition to matching a, b, c character * matches 0 one or more expression + match one or more of the expressions ? match 0 or 1 by the preceding regular expression defined fragments, non-greedy manner {n} n represents an exact match in front {m, m} n to m times to match the regular expression by the preceding definition segment, greedy a| B matches a or b expression in () parentheses matching, also represents a group
Second, the regularization method used
1.match () method
Matching a pattern from a starting position of the string, if not matched, then the starting position, match () returns None
Syntax: re.match (pattern, string, flags = 0)
Results for a matching result.group (), result.span () is eligible to match the length of the string
import re content= "hello 123 4567 World_This is a regex Demo" result = re.match('^hello\s\d\d\d\s\d{4}\s\w{10}.*Demo$',content) print(result) print(result.group()) print(result.span())
Match mode
re.I the match is not case sensitive re.L doing localization Recognition (locale - Aware) match re.M multi-line matching, affecting ^ and $ . re.S the match all characters include newline, including re.U according to parse character Unicode character set. This flag affect \ w, \ W, \ b, and \ B re.X the mark by giving you more flexibility in format so that you will write regular expressions easier to understand
Many times the content is matching wrap problem of existence, this time we need to use pattern matching to match the content re.S wrap
import re content = """hello 123456 world_this my name is zhaofan """ result =re.match('^he.*?(\d+).*?zhaofan$',content,re.S) print(result) print(result.group()) print(result.group(1))
2.search () method
When matching scans the entire string, and then returning the first successful match. re.search (regular expressions, the original string)
3.findall () method
Search the entire string, and then return all matching regular expression content. re.findall (regular expressions, the original string)
4.sub () method
All figures in the text string are removed. the re.sub (regex, replace string, the original string)
5.compile () method
The string that will be translated into a regular expression objects to be reused in subsequent matching .
import re content= """hello 12345 world_this 123 fan """ pattern =re.compile("hello.*fan",re.S) result = re.match(pattern,content) print(result) print(result.group())
6.split () method
The string dividing the given string that matches a regular expression, returns the result list after segmentation .