Python Advanced Skills: Regular Expressions

A regular expression (often shortened to regex) is a string that describes a syntax rule for matching other strings. It is commonly used to search for and replace text that fits a certain pattern.

In simple terms: a regular expression uses a string to define a rule, and the rule is then used to check whether other strings match.

For example, to verify whether a string is a valid email address, you only need to write a regular expression that matches email addresses.

The rule (^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$) can check whether a string is in a standard email format.

Without regular expressions, making the same check on the string with chains of if/else is very cumbersome.
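As a minimal sketch of that check, the rule can be compiled and applied with the re module. The helper name looks_like_email and the sample addresses are illustrative assumptions, not part of the original text.

```python
import re

# Pattern taken from the text; the constant and function names are hypothetical.
EMAIL_PATTERN = re.compile(r'^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$')

def looks_like_email(text):
    """Return True if text fits the email rule from the article."""
    return EMAIL_PATTERN.match(text) is not None

print(looks_like_email("user.name@example.com"))  # True
print(looks_like_email("user@@example.com"))      # False (two @ signs)
print(looks_like_email("user@example"))           # False (no .suffix)
```

One line of pattern replaces what would otherwise be several nested checks on the position of @ and each dot.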

Three basic methods of regular expressions

Python supports regular expressions through the re module. Matching is built on three basic functions in that module: match, search, and findall.

re.match(matching rule, string to be matched)

Matches from the beginning of the string. If the match succeeds, it returns a match object (which contains the match information); if it fails, it returns None.
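A short sketch of re.match follows; the sample string is an assumption chosen so that the rule appears both at the start and later in the text.

```python
import re

text = "python itheima python"

# The rule is found at the very start, so a match object is returned.
m = re.match("python", text)
print(m.span())   # (0, 6)
print(m.group())  # python

# "itheima" occurs in the string, but not at the beginning.
print(re.match("itheima", text))  # None
```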

 

re.search(matching rule, string to be matched)

Searches the whole string for a match, scanning from front to back. It stops at the first match and does not continue any further.

If nothing in the string matches, it returns None.
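The difference from re.match can be sketched as follows; the sample string is an assumption for illustration.

```python
import re

text = "1python666itheima666python"

# The string does not start with "python", so match fails but search succeeds.
print(re.match("python", text))   # None

found = re.search("python", text)
print(found.span())   # (1, 7), only the first occurrence is reported
print(found.group())  # python

print(re.search("java", text))    # None
```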

re.findall(matching rule, string to be matched)

Searches the whole string and returns all matches as a list.

If nothing matches, it returns an empty list: []
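A brief sketch of re.findall, using an assumed sample string that contains the rule twice:

```python
import re

text = "1python666itheima666python"

all_hits = re.findall("python", text)
print(all_hits)                    # ['python', 'python']

# No match gives an empty list rather than None.
print(re.findall("java", text))    # []
```

Because the result is always a list, it can be passed to len() or looped over without a None check.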

Metacharacter matching

The most powerful feature of regular expressions is matching with metacharacters. The metacharacters for matching a single character are:

Single character match
Character   Function
.           Matches any single character except \n; \. matches a literal dot
[ ]         Matches any one of the characters listed inside [ ]
\d          Matches a digit, i.e. 0-9
\D          Matches a non-digit
\s          Matches whitespace, e.g. space, tab
\S          Matches non-whitespace
\w          Matches a word character, i.e. a-z, A-Z, 0-9, _
\W          Matches a non-word character

Example:

Given the string s = "itheima1@@python2!!666 ##itcast3"

  • Find all digits: re.findall(r'\d', s)

The r prefix marks the string as a raw string: backslash escape sequences inside it are not processed, and the backslash is treated as an ordinary character.

  • Find special characters: re.findall(r'\W', s)

  • Find all English letters: re.findall(r'[a-zA-Z]', s)

Inside [ ] you can combine ranges, as in [a-zA-Z0-9], or list individual characters, such as [aceDFG135].
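The three searches on s can be run together as a quick check. This is a sketch; the variable names are illustrative, and the last two lines are an added demonstration of what the r prefix changes.

```python
import re

s = "itheima1@@python2!!666 ##itcast3"

digits = re.findall(r'\d', s)
specials = re.findall(r'\W', s)
letters = re.findall(r'[a-zA-Z]', s)

print(digits)            # ['1', '2', '6', '6', '6', '3']
print(specials)          # ['@', '@', '!', '!', ' ', '#', '#']
print("".join(letters))  # itheimapythonitcast

# In a raw string the backslash stays a literal character.
print(r'\d' == '\\d')          # True
print(len(r'\n'), len('\n'))   # 2 1
```

Note that \W also reports the space, since a space is not a word character.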

Quantity match
Character   Function
*           The preceding rule appears 0 or more times
+           The preceding rule appears 1 or more times
?           The preceding rule appears 0 or 1 times
{m}         The preceding rule appears exactly m times
{m,}        The preceding rule appears at least m times
{m,n}       The preceding rule appears m to n times
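A small sketch of the quantifiers in action; the sample strings are assumptions chosen to make each result easy to verify by eye.

```python
import re

s = "itheima1@@python2!!666 ##itcast3"

plus_hits = re.findall(r'\d+', s)       # one or more digits in a row
exact_hits = re.findall(r'\d{3}', s)    # exactly three digits
least_hits = re.findall(r'\d{2,}', s)   # at least two digits
print(plus_hits)    # ['1', '2', '666', '3']
print(exact_hits)   # ['666']
print(least_hits)   # ['666']

# ? makes the preceding character optional (0 or 1 times).
optional_hits = re.findall(r'colou?r', "color colour")
print(optional_hits)  # ['color', 'colour']

# * allows the preceding character 0 or more times.
star_hits = re.findall(r'ab*', "a ab abbb")
print(star_hits)      # ['a', 'ab', 'abbb']
```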

Boundary match
Character   Function
^           Matches the beginning of the string
$           Matches the end of the string
\b          Matches a word boundary
\B          Matches a non-word boundary
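The boundary characters can be sketched like this; the sample words are assumptions for illustration.

```python
import re

# \b on both sides: only "cat" as a whole word is reported.
whole_words = re.findall(r'\bcat\b', "cat concat cats cat")
print(whole_words)   # ['cat', 'cat']

# \B before the rule: only "cat" that starts inside a word (here, in "concat").
inner = re.findall(r'\Bcat', "cat concat cats")
print(inner)         # ['cat']

# ^ and $ together force the rule to cover the entire string.
all_digits = re.search(r'^\d+$', "12345") is not None
not_all_digits = re.search(r'^\d+$', "123a5") is not None
print(all_digits, not_all_digits)   # True False
```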

Group matching
Character   Function
|           Matches either the expression on its left or the one on its right
( )         Treats the characters in parentheses as a group
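A brief sketch of | and ( ); the sample strings are assumptions. Match objects expose each group through group(n), with group(0) being the whole match.

```python
import re

# | picks either alternative.
either = re.findall(r'cat|dog', "cat dog bird")
print(either)        # ['cat', 'dog']

# ( ) captures parts of the match for later use.
m = re.search(r'(\w+)@(\w+)', "user@example")
print(m.group(0))    # user@example
print(m.group(1))    # user
print(m.group(2))    # example

# With several groups, findall returns one tuple per match.
pairs = re.findall(r'(\d+)-(\d+)', "10-20 30-40")
print(pairs)         # [('10', '20'), ('30', '40')]
```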

Case studies

  • Match an account name that may contain only letters and digits, with a length of 6 to 10 characters

The rule is: ^[0-9a-zA-Z]{6,10}$

  • Match a QQ number: digits only, 5 to 11 digits long, and the first digit is not 0

The rule is: ^[1-9][0-9]{4,10}$

[1-9] matches the first digit, and [0-9]{4,10} matches the remaining 4 to 10 digits. Do not put a space inside the braces: Python's re module does not treat {4, 10} as a quantifier.
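The two rules so far can be tried out as below. This is a sketch; the constant names, the helper is_valid, and the sample inputs are illustrative assumptions.

```python
import re

# Rules from the text, written without spaces inside the braces.
ACCOUNT_RULE = re.compile(r'^[0-9a-zA-Z]{6,10}$')
QQ_RULE = re.compile(r'^[1-9][0-9]{4,10}$')

def is_valid(rule, text):
    """Return True if the whole text satisfies the compiled rule."""
    return rule.match(text) is not None

print(is_valid(ACCOUNT_RULE, "abc123"))       # True  (6 characters)
print(is_valid(ACCOUNT_RULE, "abc12"))        # False (too short)
print(is_valid(ACCOUNT_RULE, "abc_1234"))     # False (underscore not allowed)

print(is_valid(QQ_RULE, "10000"))             # True  (5 digits)
print(is_valid(QQ_RULE, "012345"))            # False (starts with 0)
print(is_valid(QQ_RULE, "123456789012"))      # False (12 digits)
```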

  • Match email addresses, allowing only the qq, 163, and gmail providers

The rule is: ^[\w-]+(\.[\w-]+)*@(qq|163|gmail)(\.[\w-]+)+$

  • [\w-]+ means one or more of the characters a-z, A-Z, 0-9, _ and -
  • (\.[\w-]+)* is a group: a dot followed by one or more of a-z, A-Z, 0-9, _ and -. The group may appear zero or more times

It matches a dotted part before the @ sign, such as .ced.efg

  • @ matches the @ symbol itself
  • (qq|163|gmail) matches only these 3 email providers
  • (\.[\w-]+)+ is the same kind of group, but it must appear one or more times

It matches a suffix after the provider name, such as .com.cn

The final + means the group must appear at least once, for example .com. It can also repeat, as in .com.cn.eu.
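A sketch of the provider-restricted rule follows. The constant name and the sample addresses are assumptions for illustration; group(2) is the provider because it is the second pair of parentheses in the rule.

```python
import re

# Rule from the text, ending in $ so the whole string must fit.
PROVIDER_EMAIL = re.compile(r'^[\w-]+(\.[\w-]+)*@(qq|163|gmail)(\.[\w-]+)+$')

ok = PROVIDER_EMAIL.match("first.last@163.com.cn")
print(ok.group(2))   # 163
print(ok.group(3))   # .cn (a repeated group keeps only its last repetition)

print(PROVIDER_EMAIL.match("user@qq.com") is not None)       # True
print(PROVIDER_EMAIL.match("user@outlook.com") is not None)  # False (provider not allowed)
print(PROVIDER_EMAIL.match("user@gmail") is not None)        # False (no .suffix)
```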


Origin blog.csdn.net/qq1226546902/article/details/132061133