First, an introduction to regular expressions
Regular expression tutorial: https://www.runoob.com/regexp/regexp-tutorial.html
A regular expression (often abbreviated in code as regex, regexp or RE) is a concept in computer science. It uses a single string to describe, and match, a whole series of strings that follow a given syntactic rule. In many text editors, regular expressions are used to search for and replace text that matches a pattern.
In short: an expression that matches text according to certain rules.
Second, a brief overview of regular expressions
Regular expressions are a tool for matching strings and for extracting content from strings.
1. Checking whether a given string matches a format (for example, validating the format of a user account name)
2. Extracting information in a specified format from a string (for example, pulling out a phone number)
    import re

    str1 = 'fijiooe18814726275iufdrrrrdf18814726275fsdssa'
    # Define the search rule: here the phone number to find is already known
    p = '18814726275'
    # search(): scans from front to back and returns the first match by default;
    # it does not keep looking after that
    res = re.search(p, str1).group()
    print(res)
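The first use listed above, checking whether a string fits a format, might be sketched as follows. The account rule used here (a letter followed by 5 to 11 word characters) and the names `ACCOUNT_PATTERN` and `is_valid_account` are assumptions made up for illustration, not rules taken from this article.

```python
import re

# Hypothetical rule: starts with a letter, 6 to 12 characters in total,
# made of letters, digits or underscores
ACCOUNT_PATTERN = re.compile(r'[A-Za-z]\w{5,11}')


def is_valid_account(name):
    """Return True if the whole string fits the account format."""
    # fullmatch() requires the pattern to cover the entire string
    return ACCOUNT_PATTERN.fullmatch(name) is not None


print(is_valid_account('alice_2024'))  # True
print(is_valid_account('1alice'))      # False: starts with a digit
print(is_valid_account('ab'))          # False: too short
```

re.fullmatch is used rather than re.search so that extra characters before or after the account name make the check fail.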
Third, metacharacters
Matching a single character
Character | Function |
. | Matches any single character (except \n) |
[] | Matches any one of the characters listed inside [] |
\d | Matches a digit, that is, 0-9 |
\D | Matches a non-digit, that is, anything that is not a number |
\s | Matches whitespace, such as a space or a tab |
\S | Matches non-whitespace |
\w | Matches a word character, that is, a-z, A-Z, 0-9, _ |
\W | Matches a non-word character |
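A small sketch of the single-character metacharacters in the table above, using re.findall to list every character each one matches. The sample string is made up for illustration.

```python
import re

text = 'Room 42, floor_3!'

digits = re.findall(r'\d', text)        # digits only
spaces = re.findall(r'\s', text)        # whitespace only
non_word = re.findall(r'\W', text)      # anything that is not a-z, A-Z, 0-9, _
vowels = re.findall(r'[aeiou]', text)   # any one character listed in []
any_char = re.findall(r'.', 'a\nb')     # . skips the newline

print(digits)    # ['4', '2', '3']
print(spaces)    # [' ', ' ']
print(non_word)  # [' ', ',', ' ', '!']
print(vowels)    # ['o', 'o', 'o', 'o']
print(any_char)  # ['a', 'b']
```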
Matching a number of repetitions
These control how many times the preceding character may appear.
Character | Function |
* | Matches the preceding character zero or more times, that is, it is optional |
+ | Matches the preceding character one or more times, that is, at least once |
? | Matches the preceding character zero times or once, that is, either once or not at all |
{m} | Matches the preceding character exactly m times |
{m,} | Matches the preceding character at least m times |
{m,n} | Matches the preceding character from m to n times |
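To make the quantifiers above concrete, the sketch below checks which of a few sample strings each pattern accepts in full. The helper name `accepted` and the sample list are assumptions for illustration only.

```python
import re

samples = ['', 'a', 'aa', 'aaa', 'aaaa']


def accepted(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]


print(accepted(r'a*'))      # ['', 'a', 'aa', 'aaa', 'aaaa']
print(accepted(r'a+'))      # ['a', 'aa', 'aaa', 'aaaa']
print(accepted(r'a?'))      # ['', 'a']
print(accepted(r'a{2}'))    # ['aa']
print(accepted(r'a{2,}'))   # ['aa', 'aaa', 'aaaa']
print(accepted(r'a{2,3}'))  # ['aa', 'aaa']
```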
Matching boundaries
Character | Function |
^ | Matches the beginning of the string |
$ | Matches the end of the string |
\b | Matches a word boundary |
\B | Matches a non-word boundary |
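The boundary characters above match positions rather than characters. A minimal sketch, with a made-up sample string:

```python
import re

text = 'cat concat cats'

whole_word = re.findall(r'\bcat\b', text)  # 'cat' standing alone as a word
word_start = re.findall(r'\bcat', text)    # 'cat' at the start of a word
inside_word = re.findall(r'\Bcat', text)   # 'cat' not at the start of a word

print(whole_word)   # ['cat']
print(word_start)   # ['cat', 'cat']  (from 'cat' and 'cats')
print(inside_word)  # ['cat']         (from 'concat')

starts_with_cat = re.search(r'^cat', text) is not None
ends_with_cats = re.search(r'cats$', text) is not None
starts_with_concat = re.search(r'^concat', text) is not None

print(starts_with_cat, ends_with_cats, starts_with_concat)  # True True False
```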
Matching groups
Character | Function |
| | Matches either the expression on its left or the one on its right |
(ab) | Treats the characters in the parentheses as a group |
\num | References the string matched by group number num |
(?P<name>) | Gives the group an alias |
(?P=name) | References the string matched by the group whose alias is name |
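The group syntax above can be sketched as follows. The sample strings, the group names `year`, `month` and `word`, and the variable names are all made up for illustration.

```python
import re

# | matches either the expression on its left or the one on its right
pets = re.findall(r'cat|dog', 'cat, dog, bird')
print(pets)  # ['cat', 'dog']

# ( ) groups part of the pattern; groups are numbered from 1
date = re.search(r'(\d{4})-(\d{2})', 'Released 2023-07')
print(date.group(1), date.group(2))  # 2023 07

# \1 refers back to whatever group 1 matched, so the closing tag must agree
tag_pattern = r'<(\w+)>.*</\1>'
good = re.search(tag_pattern, '<b>bold</b>')
bad = re.search(tag_pattern, '<b>bold</i>')
print(good.group(1), bad)  # b None

# (?P<name>...) names a group; (?P=name) refers back to it by name
named = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', 'Released 2023-07')
print(named.group('year'), named.group('month'))  # 2023 07

repeated = re.search(r'(?P<word>\w+) (?P=word)', 'it is is wrong')
print(repeated.group())  # is is
```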
Fourth, the re module
- re.match function
re.match tries to match the pattern at the starting position of the string. If the match succeeds, it returns a match object (which holds the information about the match); if the pattern does not match at the starting position, match() returns None.
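A minimal sketch of re.match. Because a failed match returns None, calling .group() directly on the result would raise an AttributeError, so the result is checked first. The helper name `leading_digits` is an assumption for illustration.

```python
import re


def leading_digits(text):
    """Return the run of digits at the start of text, or None if there is none."""
    m = re.match(r'\d+', text)
    if m is None:
        return None
    return m.group()


hit = re.match(r'\d+', '123abc')
print(hit.group(), hit.span())   # 123 (0, 3)

print(leading_digits('123abc'))  # 123
print(leading_digits('abc123'))  # None: the digits are not at the start
```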
- re.search method
re.search() scans the entire string and returns the first successful match.
- Difference between re.match and re.search
re.match only matches at the beginning of the string: if the beginning of the string does not fit the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match.
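The difference can be seen by running both functions with the same pattern on the same string. The sample string is made up for illustration.

```python
import re

text = 'abc123'

match_result = re.match(r'\d+', text)    # digits must be at position 0
search_result = re.search(r'\d+', text)  # digits may be anywhere

print(match_result)           # None
print(search_result.group())  # 123
print(search_result.start())  # 3
```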
- findall method
Finds all substrings of the string that match the regular expression and returns them as a list. If no match is found, it returns an empty list.
- Note: match and search find a single match; findall finds all matches.
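A short sketch contrasting search and findall on the sample string from the earlier phone-number example. The pattern `1\d{10}` (a 1 followed by ten more digits) is an assumption about what a phone number looks like here.

```python
import re

str1 = 'fijiooe18814726275iufdrrrrdf18814726275fsdssa'
phone = r'1\d{10}'  # assumed format: 11 digits starting with 1

first = re.search(phone, str1).group()   # only the first match
every = re.findall(phone, str1)          # all non-overlapping matches
none_found = re.findall(phone, 'no numbers here')

print(first)       # 18814726275
print(every)       # ['18814726275', '18814726275']
print(none_found)  # []
```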
- sub method
Replaces parts of a string; the substrings to replace are selected with a regular expression.
re.sub(pattern, repl, string, count=0)
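- pattern: the regular expression pattern string;
- repl: the replacement (can be either a string or a function)
- string: the string to be processed, in which the replacements are made
- count: the maximum number of replacements (the default 0 replaces every match)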
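A sketch of the re.sub parameters listed above, including a function passed as repl. The sample string and the helper name `double` are made up for illustration.

```python
import re

text = 'a1b22c333'

# repl as a string: every run of digits becomes '#'
all_replaced = re.sub(r'\d+', '#', text)

# count limits how many matches are replaced, from left to right
two_replaced = re.sub(r'\d+', '#', text, count=2)


def double(match):
    """Receive a match object and return the replacement text."""
    return str(int(match.group()) * 2)


# repl as a function: called once per match
doubled = re.sub(r'\d+', double, text)

print(all_replaced)  # a#b#c#
print(two_replaced)  # a#b#c333
print(doubled)       # a2b44c666
```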
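Fifth, greedy mode
In Python, quantifiers are greedy by default and always try to match as many characters as possible; non-greedy mode is the opposite and always tries to match as few characters as possible.
Adding ? after *, ?, +, {m,} or {m,n} turns greedy mode into non-greedy mode.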
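The effect of the trailing ? can be sketched by running the greedy and non-greedy forms of the same pattern side by side. The sample strings are made up for illustration.

```python
import re

html = '<p>one</p><p>two</p>'

greedy = re.findall(r'<p>.*</p>', html)   # .* grabs as much as it can
lazy = re.findall(r'<p>.*?</p>', html)    # .*? stops at the first </p>

print(greedy)  # ['<p>one</p><p>two</p>']
print(lazy)    # ['<p>one</p>', '<p>two</p>']

greedy_digits = re.match(r'\d{2,4}', '123456').group()
lazy_digits = re.match(r'\d{2,4}?', '123456').group()

print(greedy_digits)  # 1234
print(lazy_digits)    # 12
```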