Detailed explanation of Python3 regular expressions (2)

Previous: Detailed explanation of Python3 regular expressions (1)

This article is translated from: https://docs.python.org/3.4/howto/regex.html

The blogger has made some comments and modifications to this ^_^

Using regular expressions

Now let's start writing some simple regular expressions. Python provides an interface to the regular expression engine through the re module, and allows you to compile regular expressions into pattern objects and use them for matching.

Remarks: The matching engine behind the re module is written in C, so matching itself is fast. Compiling a regular expression once and reusing the resulting object improves efficiency further. For simple fixed-string tasks, however, ordinary string methods are usually simpler and at least as fast. We will often refer to "patterns" later; this means the pattern object into which the regular expression is compiled.

Compiling regular expressions

Regular expressions are compiled into a pattern object, which has various methods for you to manipulate strings, such as finding pattern matches or performing string replacements.

re.compile() also accepts an optional flags parameter, which is used to enable various special features and syntax variations. We will introduce them one by one later.

Now let's look at a simple example:
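A minimal sketch of such an example, assuming the pattern ab* (chosen here purely for illustration) and the re.IGNORECASE flag as one example of the flags parameter:

```python
import re

# Compile a regular expression into a pattern object
p = re.compile('ab*')
print(p.pattern)   # ab*

# The optional flags argument changes how the pattern behaves;
# re.IGNORECASE makes matching case-insensitive
p_ci = re.compile('ab*', re.IGNORECASE)
print(p_ci.match('ABBB') is not None)   # True
print(p.match('ABBB') is not None)      # False
```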

The regular expression is passed to re.compile() as a string argument. Since regular expressions are not a core part of the Python language, there is no special syntax for them, so they can only be represented as strings. Some applications don't need regular expressions at all, so the Python community has not considered it necessary to build them into the language core. Instead, the re module is simply included with Python as a C extension module, just like the socket and zlib modules.

Representing regular expressions as strings keeps the Python language simple, but it also has a drawback, which we discuss below.

Troublesome backslashes

As we mentioned in the previous article, regular expressions use the '\' character either to give some ordinary characters a special meaning (such as \d to match any decimal digit) or to strip some special characters of their special meaning (such as \[ to match the opening square bracket '['). This conflicts with Python string literals, which use the same character for the same purpose.

Comment: It's a mouthful, and you'll understand by looking at the example~

Suppose you need a regular expression that matches the string '\section' in a LaTeX file. Because the backslash is itself a special character in regular expressions, you must put an extra backslash in front of it to strip its special meaning. So the regular expression is written as '\\section'.

But don't forget that Python also gives backslashes a special meaning in string literals. So to pass '\\section' to re.compile() intact, each of the two backslashes has to be escaped once more...

Characters     Stage
\section       String to match
\\section      Regular expressions use '\\' to match the character '\'
\\\\section    Unfortunately, Python strings also use '\\' for the character '\'

In short, to match a single literal backslash we need to write four backslashes in an ordinary Python string literal. Using backslashes frequently in regular expressions therefore causes a backslash storm, which makes your strings extremely difficult to read.
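The layers of escaping can be checked directly. This is only a small sketch; the sample text "\section{Intro}" and the variable names are made up for illustration:

```python
import re

# The text to search contains ONE literal backslash: \section{Intro}
text = "\\section{Intro}"

# Four backslashes in an ordinary string literal become two characters in
# the actual string; the regex engine reads those two as one literal backslash
pattern_source = "\\\\section"
print(pattern_source)        # \\section
print(len(pattern_source))   # 9

m = re.compile(pattern_source).match(text)
print(m.group())             # \section
```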

The solution is to use Python's raw strings to represent regular expressions (that is, put an r prefix in front of the string literal):

Regular string    Raw string
"ab*"             r"ab*"
"\\\\section"     r"\\section"
"\\w+\\s+\\1"     r"\w+\s+\1"

Note: It is strongly recommended to use raw strings to express regular expressions.
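To convince yourself that a raw string and its heavily escaped counterpart are the same string, you can compare them. A short sketch (the sample sentence searched at the end is invented for illustration):

```python
import re

# Each pair holds a regular string literal and its raw-string equivalent
pairs = [
    ("ab*", r"ab*"),
    ("\\\\section", r"\\section"),
    ("\\w+\\s+\\1", r"\w+\s+\1"),
]
same = [regular == raw for regular, raw in pairs]
print(same)   # [True, True, True]

# The raw form is easier to read and still matches a literal \section
latex = re.compile(r"\\section")
found = latex.search(r"see \section{Intro}")
print(found.group())   # \section
```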

Performing matches

Once you have compiled the regular expression, you get a pattern object. So what can you do with it? Pattern objects have many methods and attributes; the most important ones are listed below:

Method        Function
match()       Determines whether the regular expression matches at the beginning of the string
search()      Scans the string for the first position where the regular expression matches
findall()     Finds all substrings where the regular expression matches and returns them as a list
finditer()    Finds all substrings where the regular expression matches and returns them as an iterator

If no match is found, match() and search() return None. If the match succeeds, a match object is returned, containing all the information about the match: where it starts, where it ends, the matched substring, and so on.

Next we explain step by step:

Now you can try using the regular expression [a-z]+ to match various strings.

For example:
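A minimal sketch, trying the pattern [a-z]+ against the empty string:

```python
import re

p = re.compile('[a-z]+')

# Try to match the empty string
result = p.match("")
print(result)   # None
```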

Because + means match one or more times, the empty string cannot be matched. Therefore, match() returns None.

Let's try another string that can match:
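A minimal sketch, assuming the sample string 'tempo' (any run of lowercase letters would do):

```python
import re

p = re.compile('[a-z]+')

# 'tempo' consists only of lowercase letters, so the pattern matches
m = p.match('tempo')
print(m is not None)   # True
```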

In this example, match() returns a match object, which we store in the variable m for later use.

Next, let's take a look at what information the match object holds. The match object has many methods and attributes; the following are the most important:

Method     Function
group()    Returns the matched string
start()    Returns the starting position of the match
end()      Returns the end position of the match
span()     Returns a tuple representing the matching position (start, end)

Let's take a look:
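A short sketch that queries a match object, again assuming the pattern [a-z]+ and the sample string 'tempo':

```python
import re

p = re.compile('[a-z]+')
m = p.match('tempo')

print(m.group())   # tempo
print(m.start())   # 0
print(m.end())     # 5
print(m.span())    # (0, 5)
```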

Since match() only checks if the regular expression matches at the beginning of the string, start() always returns 0.

However, the search() method is different:
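A sketch of the difference, assuming the sample string '::: message', whose first lowercase letter is not at position 0:

```python
import re

p = re.compile('[a-z]+')

# match() only looks at the beginning of the string, which is ':' here
at_start = p.match('::: message')
print(at_start)    # None

# search() scans forward until it finds a match
m = p.search('::: message')
print(m.group())   # message
print(m.span())    # (4, 11)
```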

In practice, the most common approach is to store the match object in a local variable and then check whether it is None.

The form is usually as follows:

p = re.compile(...)
m = p.match('String goes here')
if m:
    print('Match found:', m.group())
else:
    print('No match')
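Here is a concrete, runnable variant of the same form. The helper name describe_match and the pattern [a-z]+ are only assumptions made for illustration:

```python
import re

def describe_match(pattern, text):
    """Report whether `pattern` matches at the beginning of `text`."""
    p = re.compile(pattern)
    m = p.match(text)
    if m:
        # A match object is always truthy, so `if m:` is enough
        return 'Match found: ' + m.group()
    return 'No match'

print(describe_match('[a-z]+', 'tempo'))   # Match found: tempo
print(describe_match('[a-z]+', '12345'))   # No match
```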

There are two methods to return all matching results, one is findall() and the other is finditer().

findall() returns a list:
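A minimal sketch, assuming the pattern \d+ and a sample sentence containing three numbers:

```python
import re

p = re.compile(r'\d+')

# findall() collects every non-overlapping match into a list of strings
numbers = p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
print(numbers)   # ['12', '11', '10']
```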

findall() needs to create a list before returning, while finditer() returns the matched objects as an iterator:
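A sketch of finditer(), assuming the same pattern \d+ and a shortened sample string:

```python
import re

p = re.compile(r'\d+')

# finditer() yields match objects one at a time instead of building a list
iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')

spans = []
for match in iterator:
    spans.append(match.span())
    print(match.group(), match.span())

print(spans)   # [(0, 2), (22, 24), (29, 31)]
```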

Note: If there are many matches, an iterator is more memory-efficient, because the matches are produced one at a time instead of all being stored in a list first.

(End of this article)

Next: Detailed explanation of Python3 regular expressions (3)

