Please don't use re.compile in Python anymore

This article explains why there is no need to call re.compile in Python, using sample code taken from the source of the re module. It should be a useful reference for anyone learning or using Python.

If you search for Python regular expressions on the Internet, you will see a lot of junk articles that write code like this:

import re

pattern = re.compile('regular expression')
text = 'a string'
result = pattern.findall(text)

The authors of these articles may have been influenced by habits from other languages, or misled by other junk articles, and copied this pattern without thinking.

In Python, there is really no need to use re.compile!

To prove this, let's look at the Python source code.

Enter in PyCharm:

import re
 
re.search

Then hold down the Ctrl key (Windows) or the Command key (Mac) and left-click search. PyCharm will jump to the source of Python's re module. There you will see that the regular expression functions we commonly use, whether findall, search, sub, or match, are all written like this:

_compile(pattern, flags).corresponding_method(string)

For example:

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

Then we look at compile:



def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a Pattern object."
    return _compile(pattern, flags)
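If PyCharm is not at hand, the same two functions can be read from a plain Python session. The following is a minimal sketch that assumes a CPython installation whose standard library ships as .py source files; the variable names are only illustrative.

```python
import inspect
import re

# Fetch the source text of two public functions from the standard re module.
# Assumption: the interpreter has access to the .py source of the stdlib.
findall_source = inspect.getsource(re.findall)
compile_source = inspect.getsource(re.compile)

print(findall_source)
print(compile_source)
```

Both printed bodies should end in a call to _compile, which is the point the article is making.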
 

Do you see the problem?

The regular expression functions we commonly use already call _compile themselves!

There is no need to call re.compile first and then call the regular expression method on the result.
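As a quick check of this claim, the two styles can be run side by side. This is only a small sketch; the pattern and the sample text are made up for illustration and do not come from the re source.

```python
import re

# Hypothetical sample data, chosen only for illustration.
pattern_text = r'\d+'
text = 'order 66 shipped 3 items on day 12'

# Style 1: compile first, then call the method on the Pattern object.
compiled_result = re.compile(pattern_text).findall(text)

# Style 2: call the module-level function directly.
direct_result = re.findall(pattern_text, text)

print(compiled_result)  # ['66', '3', '12']
print(direct_result)    # ['66', '3', '12']
```

Both calls go through the same _compile function, so the results are identical.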

At this point, someone may object:

If I have a million strings and use a certain regular expression to match, then I can write the code like this:

texts = [...]  # a list containing one million strings
pattern = re.compile('regular expression')
for text in texts:
    pattern.search(text)

In this case, re.compile is executed only once. But if you write the code like this:


texts = [...]  # a list containing one million strings
for text in texts:
    re.search('regular expression', text)
 

This is equivalent to calling re.compile one million times on the same regular expression under the hood.

Talk is cheap, show me the code.

Let's go back to the source code. re.compile calls _compile, so let's look at the source code of _compile, as shown in the following figure:

The code in the red box shows that _compile comes with its own cache. It automatically stores up to 512 entries, keyed by (type(pattern), pattern, flags). As long as the regular expression and the flags are the same, the second call to _compile reads the result straight from the cache instead of compiling again.
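The cache can also be observed from the outside. The sketch below relies on an implementation detail of CPython (a cache hit hands back the very same Pattern object), so treat it as an illustration rather than a guarantee of the language; the pattern itself is hypothetical.

```python
import re

pattern_text = r'[a-z]+@[a-z]+\.com'  # hypothetical pattern, for illustration only

first = re.compile(pattern_text)
second = re.compile(pattern_text)

# Same pattern and same flags: in CPython the second call is served from
# the cache, so both names refer to the same Pattern object.
same_object = first is second
print(same_object)

# Different flags form a different cache key, so a separate Pattern
# object is compiled.
with_flag = re.compile(pattern_text, re.IGNORECASE)
print(with_flag is first)
```

Because re.search, re.findall and the other module-level functions go through the same _compile call, repeated calls with the same pattern reuse the cached object in the same way.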

To sum up, please stop manually calling re.compile. This is a bad habit brought over from other languages.

Summary

That is all for this article. I hope it is a useful reference for your study or work.


Origin blog.csdn.net/pyjishu/article/details/105413660