A regular expression (often shortened to regex) is a string that describes a syntax rule for matching other strings. It is commonly used to find and replace text that fits a given pattern (rule).
In simple terms: you define a rule as a string, then use that rule to check whether other strings match it.
For example, to verify that a string is a valid email address, you only need to write one regular expression that matches email addresses.
The rule ^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$ checks whether a string is in a standard email format.
Without regular expressions, making the same check with if/else branches is very cumbersome.
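As a minimal sketch of that email check in Python, the pattern quoted above can be compiled and wrapped in a small helper. The helper name is_email and the sample addresses are illustrative assumptions, not part of the original text.

```python
import re

# The email rule quoted in the text, compiled once so it can be reused.
EMAIL_PATTERN = re.compile(r'^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$')


def is_email(text):
    """Hypothetical helper: True if text matches the email rule."""
    return EMAIL_PATTERN.match(text) is not None


print(is_email("user.name@example.com"))  # True
print(is_email("not-an-email"))           # False
```

A single pattern replaces what would otherwise be a chain of manual checks for the @ sign, the dots, and the allowed characters.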
Three basic methods of regular expressions
Python supports regular expressions through the re module, and matching is built on three basic functions in that module:
match, search, and findall.
re.match(pattern, string)
Matches from the beginning of the string. On success it returns a match object (which carries the match information); on failure it returns None.
re.search(pattern, string)
Searches the whole string from front to back and stops at the first match; it does not continue past it.
If nothing in the string matches, it returns None.
re.findall(pattern, string)
Searches the whole string and returns all matches as a list.
If nothing matches, it returns an empty list: []
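The difference between the three functions is easiest to see on one string. The following is a small sketch; the sample string is an assumption chosen for illustration.

```python
import re

s = "python itheima python itcast"

# match only succeeds when the pattern matches at position 0.
m = re.match("python", s)
print(m.span(), m.group())      # (0, 6) python
print(re.match("itheima", s))   # None, "itheima" is not at the start

# search scans forward and stops at the first hit.
hit = re.search("itheima", s)
print(hit.span())               # (7, 14)

# findall collects every non-overlapping hit into a list.
print(re.findall("python", s))  # ['python', 'python']
print(re.findall("java", s))    # []
```

Because match and search return None on failure, it is safer to test the result before calling methods such as group() or span() on it.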
Metacharacter matching
The most powerful feature of regular expressions is their metacharacter matching rules. Single-character matching:
Character | Function |
. | Matches any single character except \n; \. matches a literal dot |
[ ] | Matches any one of the characters listed inside [ ] |
\d | Matches a digit, i.e. 0-9 |
\D | Matches a non-digit |
\s | Matches whitespace, e.g. space, tab, newline |
\S | Matches non-whitespace |
\w | Matches a word character, i.e. a-z, A-Z, 0-9, _ |
\W | Matches a non-word character |
Example:
Given the string s = "itheima1@@python2!!666 ##itcast3"
- Find all digits: re.findall(r'\d', s)
The r prefix marks the string as a raw string: backslashes inside it are not treated as escape sequences but as ordinary characters.
- Find special characters:
re.findall(r'\W', s)
- Find all English letters:
re.findall(r'[a-zA-Z]', s)
Inside [] you can combine ranges, as in [a-zA-Z0-9], or list individual characters, such as
[aceDFG135]
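Run as a script, the three lookups on that string might look like the sketch below. The variable names digits, specials, and letters are illustrative; the last line shows what the r prefix changes.

```python
import re

s = "itheima1@@python2!!666 ##itcast3"

digits = re.findall(r'\d', s)
print(digits)            # ['1', '2', '6', '6', '6', '3']

# \W also matches the space, because a space is not a word character.
specials = re.findall(r'\W', s)
print(specials)          # ['@', '@', '!', '!', ' ', '#', '#']

letters = re.findall(r'[a-zA-Z]', s)
print(''.join(letters))  # itheimapythonitcast

# Without r, '\n' is one newline character; with r, it is a backslash plus 'n'.
print(len('\n'), len(r'\n'))  # 1 2
```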
Quantifier matching:
Character | Function |
* | The preceding item occurs 0 or more times |
+ | The preceding item occurs 1 or more times |
? | The preceding item occurs 0 or 1 time |
{m} | The preceding item occurs exactly m times |
{m,} | The preceding item occurs at least m times |
{m,n} | The preceding item occurs m to n times |
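To see how each quantifier behaves, here is a short sketch that applies them to the same text. The sample string is an assumption; in every pattern the quantifier applies only to the b that precedes it.

```python
import re

s = "a ab abb abbb"

print(re.findall(r'ab*', s))      # ['a', 'ab', 'abb', 'abbb']   b: 0 or more
print(re.findall(r'ab+', s))      # ['ab', 'abb', 'abbb']        b: 1 or more
print(re.findall(r'ab?', s))      # ['a', 'ab', 'ab', 'ab']      b: 0 or 1
print(re.findall(r'ab{2}', s))    # ['abb', 'abb']               b: exactly 2
print(re.findall(r'ab{2,}', s))   # ['abb', 'abbb']              b: at least 2
print(re.findall(r'ab{1,2}', s))  # ['ab', 'abb', 'abb']         b: 1 to 2
```

Note that ? stops after at most one b, so "abb" and "abbb" each contribute only "ab".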
Boundary matching:
Character | Function |
^ | Matches the beginning of the string |
$ | Matches the end of the string |
\b | Matches a word boundary |
\B | Matches a non-word boundary |
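Boundary characters match positions rather than characters. The sketch below, on an assumed sample string, uses spans to show where each match lands.

```python
import re

s = "cat concat cats"

# \bcat\b: "cat" as a whole word only.
whole = [m.span() for m in re.finditer(r'\bcat\b', s)]
print(whole)   # [(0, 3)]

# \Bcat\b: "cat" at the end of a longer word (here, inside "concat").
suffix = [m.span() for m in re.finditer(r'\Bcat\b', s)]
print(suffix)  # [(7, 10)]

# ^ anchors to the start of the string, $ to the end.
print(re.search(r'^cat', s).span())   # (0, 3)
print(re.search(r'cats$', s).span())  # (11, 15)
```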
Group matching:
Character | Function |
| | Matches either the expression on its left or the one on its right |
() | Treats the characters inside the parentheses as one group |
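A brief sketch of alternation and grouping follows. The sample string and variable names are assumptions; group() is the match-object method that returns the text captured by each pair of parentheses.

```python
import re

s = "qq.com 163.com sina.com"

# | picks whichever alternative matches.
either = re.findall(r'qq|163', s)
print(either)      # ['qq', '163']

# () captures parts of the match; group(0) is the whole match.
m = re.search(r'(\w+)\.(com)', s)
print(m.group(0))  # qq.com
print(m.group(1))  # qq
print(m.group(2))  # com

# With one group in the pattern, findall returns the group's text only.
providers = re.findall(r'(qq|163)\.com', s)
print(providers)   # ['qq', '163']
```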
Examples
- Match an account name that may contain only letters and digits and must be 6 to 10 characters long
The rule is: ^[0-9a-zA-Z]{6,10}$
Do not put a space inside the braces: Python's re module treats {6, 10} as literal text, not as a quantifier.
- Match a QQ number: digits only, length 5 to 11, and the first digit is not 0
The rule is: ^[1-9][0-9]{4,10}$
[1-9] matches the first digit, and [0-9]{4,10} matches the remaining 4 to 10 digits
- Match email addresses, allowing only qq, 163, and gmail
The rule is: ^[\w-]+(\.[\w-]+)*@(qq|163|gmail)(\.[\w-]+)+$
- [\w-]+ means one or more of the characters a-z, A-Z, 0-9, _ and -
- (\.[\w-]+)* is a group: a dot followed by one or more of a-z, A-Z, 0-9, _ and -, with the whole group repeated 0 or more times
It matches the ced.efg part in [email protected]
- @ matches the @ symbol itself
- (qq|163|gmail) restricts the match to these 3 mailbox providers
- (\.[\w-]+)+ is the same kind of group, a dot followed by one or more of a-z, A-Z, 0-9, _ and -, repeated 1 or more times
It matches the .com.cn part in [email protected]
The trailing + requires at least one occurrence, for example: .com
Longer suffixes also match, such as: .com.cn.eu
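The three rules in this section can be tried out with a short script. This is a sketch under stated assumptions: the helper name check and all sample inputs are illustrative, the quantifiers are written without a space inside the braces, and $ is used as the end anchor, since that is what Python's re module expects.

```python
import re

# Account: letters and digits only, 6 to 10 characters.
ACCOUNT = re.compile(r'^[0-9a-zA-Z]{6,10}$')
# QQ number: 5 to 11 digits, first digit not 0.
QQ = re.compile(r'^[1-9][0-9]{4,10}$')
# Email: only the qq, 163, and gmail providers are accepted.
EMAIL = re.compile(r'^[\w-]+(\.[\w-]+)*@(qq|163|gmail)(\.[\w-]+)+$')


def check(pattern, text):
    """Hypothetical helper: True if the whole text fits the pattern."""
    return pattern.match(text) is not None


print(check(ACCOUNT, "abc123"))          # True
print(check(ACCOUNT, "abc12"))           # False, only 5 characters
print(check(QQ, "10001"))                # True
print(check(QQ, "012345"))               # False, starts with 0
print(check(EMAIL, "a.b.c@qq.com.cn"))   # True
print(check(EMAIL, "user@outlook.com"))  # False, provider not allowed
```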