Python Regular Expressions: A Detailed Tutorial for Complete Beginners (Part 1)



What is a regular expression?

A regular expression is a special sequence of characters that lets you easily check whether a string matches a certain pattern. Python has included the re module since version 1.5, and it provides Perl-style regular expression patterns. A pattern can be compiled into a pattern object whose methods perform matching and substitution; the re module also provides module-level functions that do exactly what those methods do, taking the pattern string as their first argument. This article introduces the regular expression features most commonly used in Python.
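The sketch below illustrates the equivalence between the module-level functions and the methods of a compiled pattern: `re.search(pattern, text)` and `re.compile(pattern).search(text)` find the same match. The sample text and the email-like pattern are illustrative assumptions and are not taken from the original article.

```python
import re

text = "Contact: support@example.com"
pattern = r"\w+@\w+\.com"  # hypothetical, simplified email-like pattern

# Module-level function: the pattern string is the first argument
m1 = re.search(pattern, text)

# Equivalent method call on a compiled Pattern object
compiled = re.compile(pattern)
m2 = compiled.search(text)

print(m1.group())  # support@example.com
print(m2.group())  # support@example.com
```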

1. Metacharacters

Metacharacters are special symbols with fixed meanings. The common ones are listed in the table below:

Metacharacter    Meaning
.                Matches any character except a newline
\w               Matches a letter, digit, or underscore
\W               Matches any character that is not a letter, digit, or underscore
\d               Matches a digit
\D               Matches any character that is not a digit
[ ]              Character class: matches any one of the characters listed inside the brackets
[^ ]             Matches any character that is NOT listed inside the brackets
^                Matches the start of the string
$                Matches the end of the string
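A short sketch of each metacharacter from the table, using `re.findall` to collect every match. The sample strings are made up for illustration.

```python
import re

text = "cat_7 Dog!"

# \w: letters, digits and underscore; \W: everything else
word_chars = re.findall(r"\w", text)
non_word = re.findall(r"\W", text)

# \d: digits; \D: non-digits (joined back into a string here)
digits = re.findall(r"\d", text)
non_digits = "".join(re.findall(r"\D", text))

# [ ]: any one listed character; [^ ]: any character NOT listed
vowels = re.findall(r"[aeiou]", text)
not_lower = re.findall(r"[^a-z]", text)

# .: any character except a newline
dot = re.findall(r"a.b", "a-b a\nb")

# ^ and $: anchor to the start and end of the string
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_bang = re.search(r"!$", text) is not None

print(word_chars)  # ['c', 'a', 't', '_', '7', 'D', 'o', 'g']
print(non_word)    # [' ', '!']
print(digits)      # ['7']
print(non_digits)  # cat_ Dog!
print(vowels)      # ['a', 'o']
print(not_lower)   # ['_', '7', ' ', 'D', '!']
print(dot)         # ['a-b']
print(starts_with_cat, ends_with_bang)  # True True
```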

2. Quantifiers

A quantifier controls how many times the metacharacter (or character) immediately before it may occur.

Quantifier    Meaning
*             Repeat zero or more times
+             Repeat one or more times
?             Repeat zero or one time
{n}           Repeat exactly n times
{n,}          Repeat n or more times
{n,m}         Repeat n to m times
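The following sketch applies each quantifier to the single token in front of it (`b` or `\d`). The input strings are illustrative only.

```python
import re

# *, + and ? applied to the single character "b"
star = re.findall(r"ab*", "a ab abbb")      # zero or more b
plus = re.findall(r"ab+", "a ab abbb")      # one or more b
optional = re.findall(r"ab?", "a ab abbb")  # zero or one b

# {n}, {n,} and {n,m} applied to \d (one digit)
exactly3 = re.findall(r"\d{3}", "12 345 6789")
at_least3 = re.findall(r"\d{3,}", "12 345 6789")
two_to_3 = re.findall(r"\d{2,3}", "12 345 6789")

print(star)       # ['a', 'ab', 'abbb']
print(plus)       # ['ab', 'abbb']
print(optional)   # ['a', 'ab', 'ab']
print(exactly3)   # ['345', '678']
print(at_least3)  # ['345', '6789']
print(two_to_3)   # ['12', '345', '678']
```

Note that these quantifiers are greedy by default, so `\d{2,3}` takes three digits whenever three are available.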

3. Greedy matching and lazy matching

Matching mode       Pattern
Lazy matching       .*?
Greedy matching     .*

In short: lazy matching runs from the start character to the FIRST occurrence of the end character, so it matches as little (as short a string) as possible. Greedy matching runs from the start character to the LAST occurrence of the end character, so it matches as much (as long a string) as possible.
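The difference is easiest to see with a start character `<` and an end character `>`. The HTML-like sample string here is an illustrative assumption.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs on to the LAST ">" in the string
greedy = re.search(r"<.*>", html).group()

# Lazy: .*? stops at the FIRST ">" after "<"
lazy = re.search(r"<.*?>", html).group()

# Lazy matching with findall collects every tag separately
all_tags = re.findall(r"<.*?>", html)

print(greedy)    # <b>bold</b> and <i>italic</i>
print(lazy)      # <b>
print(all_tags)  # ['<b>', '</b>', '<i>', '</i>']
```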

4. re.match function

re.match tries to match the pattern only at the beginning of the string. If the pattern does not match at the beginning, match() returns None.
Function syntax:

re.match(pattern,string,flags=0)
Parameter    Description
pattern      The regular expression to match
string       The string to match against
flags        Flags that control the matching mode, e.g. case-insensitive matching or multiline matching

If the match succeeds, re.match() returns a match object; otherwise it returns None.
You can call the match object's group(num) or groups() methods to retrieve the matched text.

Match object method    Description
group(num=0)           Returns the string matched by the entire expression (group 0), or by the given numbered group. Several group numbers can be passed at once, in which case a tuple of the corresponding group values is returned.
groups()               Returns a tuple containing the strings of all groups, from group 1 up to the last group.

import re
print(re.match('www','www.runoob.com').span())  # matches at the start of the string
print(re.match('com','www.runoob.com'))         # not at the start, so no match

Output result:
(0, 3)
None
Explanation: for match() to succeed, the regular expression must match at the very beginning of the string; otherwise match() returns None.
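A minimal sketch of the group(num) and groups() methods from the table above, using the same example string. The pattern with two capturing groups is an illustrative choice, not one from the original article.

```python
import re

# Two capturing groups: the host prefix and the domain name
m = re.match(r"(\w+)\.(\w+)\.com", "www.runoob.com")

print(m.group())      # www.runoob.com  (group 0: the whole match)
print(m.group(1))     # www
print(m.group(1, 2))  # ('www', 'runoob')  (several numbers -> tuple)
print(m.groups())     # ('www', 'runoob')
print(m.span())       # (0, 14)
```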

5. re.search function

re.search scans the entire string and returns the first successful match.
Function syntax:

re.search(pattern,string,flags=0)

The parameters have the same meaning as for match().
If the match succeeds, search() returns a match object; otherwise it returns None.
You can again use the match object's group(num) or groups() methods to retrieve the matched text.

import re
print(re.search('www','www.runoob.com').span())  # match found at the start
print(re.search('com','www.runoob.com').span())  # match found later in the string

Output result:
(0, 3)
(11, 14)
Summary of the difference between match() and search():
match() only tries to match at the beginning of the string; if the beginning of the string does not match the regular expression, the match fails and the function returns None. search(), by contrast, scans the whole string and returns the first match it finds.
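The sketch below puts the two functions side by side on the article's example string. The `^` anchor line is an extra illustration: for this pattern (and without the re.M flag), anchoring search() to the start makes it behave like match().

```python
import re

url = "www.runoob.com"

at_start = re.match(r"com", url)   # match() only looks at the beginning
anywhere = re.search(r"com", url)  # search() scans the whole string
anchored = re.search(r"^com", url) # ^ forces search() to the start

print(at_start)                          # None
print(anywhere.group(), anywhere.span()) # com (11, 14)
print(anchored)                          # None
```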

6. re.findall function

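re.findall(pattern, string, flags=0) returns a list of all non-overlapping matches as strings. If the pattern contains one capturing group, the list holds that group's text; with several groups, it holds tuples. A minimal sketch, with made-up sample strings:

```python
import re

# No groups: a list of the matched strings
numbers = re.findall(r"\d+", "a1b22c333")

# Two groups: a list of (group1, group2) tuples
emails = "alice@example.com, bob@test.com"
pairs = re.findall(r"(\w+)@(\w+)\.com", emails)

print(numbers)  # ['1', '22', '333']
print(pairs)    # [('alice', 'example'), ('bob', 'test')]
```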

7. re.finditer function

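re.finditer(pattern, string, flags=0) works like findall(), but it returns an iterator of match objects instead of a list of strings, so each match's group() and span() are available. A minimal sketch on an illustrative string:

```python
import re

text = "a1b22c333"
found = []
for m in re.finditer(r"\d+", text):
    # Each item is a match object, so group() and span() can be called on it
    found.append((m.group(), m.span()))

print(found)  # [('1', (1, 2)), ('22', (3, 5)), ('333', (6, 9))]
```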

8. re.compile function

compile() compiles a regular expression ahead of time into a Pattern object. That object's match() and search() methods (among others) can then be called repeatedly without re-parsing the pattern.
The syntax format is:

re.compile(pattern[,flags])

Parameter meaning:

  • pattern: a regular expression in the form of a string
  • flags: optional; sets the matching mode. The available flags are:
      re.I   Ignore case
      re.L   Make \w, \W, \b, \B and case-insensitive matching depend on the current locale (in Python 3, only allowed with bytes patterns)
      re.M   Multiline mode: ^ and $ also match at the start and end of each line
      re.S   Make . match any character, including a newline
      re.U   Make \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database (already the default for str patterns in Python 3)
      re.X   Verbose mode: for readability, ignore whitespace in the pattern and treat text after # as a comment
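A sketch of compiling patterns and using the flags listed above. Flags can be combined with `|`. The patterns and sample strings are illustrative assumptions, not from the original article.

```python
import re

# re.I: compile once, then reuse the Pattern object
hello = re.compile(r"hello", re.I).match("HELLO world").group()

# re.M: ^ matches at the start of every line, not just the string
lines = "one\ntwo\nthree"
first_only = re.findall(r"^\w+", lines)
every_line = re.findall(r"^\w+", lines, re.M)

# re.S: . also matches a newline
no_dotall = re.search(r"a.b", "a\nb")
dotall = re.search(r"a.b", "a\nb", re.S).group()

# re.X: whitespace and # comments inside the pattern are ignored
phone = re.compile(r"""
    (\d{3})   # area code
    -         # separator
    (\d{4})   # line number
""", re.X)
phone_parts = phone.search("call 555-1234").groups()

# Combining flags with |
combined = re.compile(r"^hello", re.I | re.M).findall("Hello\nhello\nHELLO")

print(hello)          # HELLO
print(first_only)     # ['one']
print(every_line)     # ['one', 'two', 'three']
print(no_dotall)      # None
print(repr(dotall))   # 'a\nb'
print(phone_parts)    # ('555', '1234')
print(combined)       # ['Hello', 'hello', 'HELLO']
```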




Origin blog.csdn.net/m0_52423924/article/details/122442430