Python tutorial: same Python, big differences we must learn

Last weekend the differences between Python 2 and Python 3 nearly drove me mad!

Discovering the problem

On Friday a colleague from QA asked me: "How come your username field also accepts Chinese?" My first thought was that the test itself must be wrong. I filter that parameter with the regular expression \w, so how could it fail, unless Python's regex engine were broken, which is surely impossible. Still, to be rigorous, I tested it myself first, expecting to find nothing wrong and then push back. But when I ran the test I was dumbfounded: Chinese really did pass validation. That made no sense, since I had always filtered parameters this way and testing had never shown a problem. After thinking it over for a long time, the only difference I could find was that I was now using Python 3.

I searched online and found no article that distinguishes how regular expressions handle strings in Python 2 and Python 3; they all treat the two versions the same. Only after rereading the official documentation did I understand what was going on.

Reproducing the problem

As we all know, Python regular expressions have the rule \w, and almost every blog post online tells you that it matches letters, digits and the underscore. That is not the whole story.

Here is the Python 2 session:

~|⇒ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> aa = '捕蛇者说'
>>> re.match('\w{1,20}', aa)
>>> bb = 'abc123ADB'
>>> re.match('\w{1,20}', bb)
<_sre.SRE_Match object at 0x1031b0b28>

As we can see, in Python 2 \w does not match Chinese. So what happens when the same code runs under Python 3?

~|⇒ python3
Python 3.7.1 (default, Nov 28 2018, 11:55:14)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> aa = '捕蛇者说'
>>> re.match('\w{1,20}', aa)
<re.Match object; span=(0, 4), match='捕蛇者说'>
>>> bb = 'abc123ADB'
>>> re.match('\w{1,20}', bb)
<re.Match object; span=(0, 9), match='abc123ADB'>

In Python 3, however, \w does match Chinese. How does that happen? To answer this, we have to go back to the official Python documentation.

Solving the problem

Reading the official documentation, you will find that the same rule \w is defined quite differently in Python 2 and Python 3. Let's look at Python 2 first:

When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.

In other words: when neither the LOCALE (re.L) nor the UNICODE (re.U) flag is set, \w matches ASCII letters, digits and the underscore. With LOCALE (re.L), it matches digits, the underscore and whatever the current locale defines as alphanumeric. With UNICODE (re.U), it matches digits, the underscore and any character classified as alphanumeric in the Unicode character database.

And now Python 3:

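For Unicode (str) patterns:
Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched.
For 8-bit (bytes) patterns:
Matches characters considered alphanumeric in the ASCII character set; this is equivalent to [a-zA-Z0-9_]. If the LOCALE flag is used, matches characters considered alphanumeric in the current locale and the underscore.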
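
To make the two cases in the Python 3 documentation concrete, here is a small sketch (Python 3 only; the variable names are my own and not part of the original session) that checks \w against a str pattern, a str pattern with re.ASCII, and a bytes pattern:

```python
import re

text = "捕蛇者说"

# str pattern, no flags: \w is Unicode-aware, so Chinese characters match.
str_default = re.match(r"\w{1,20}", text)

# str pattern with re.ASCII: \w is restricted to [a-zA-Z0-9_].
str_ascii = re.match(r"\w{1,20}", text, re.ASCII)

# bytes pattern: \w only covers ASCII alphanumerics and the underscore,
# so the UTF-8 encoded Chinese text does not match.
bytes_chinese = re.match(rb"\w{1,20}", text.encode("utf-8"))
bytes_ascii = re.match(rb"\w{1,20}", b"abc123ADB")

print(str_default is not None)    # str, default flags
print(str_ascii is not None)      # str, re.ASCII
print(bytes_chinese is not None)  # bytes, Chinese text
print(bytes_ascii is not None)    # bytes, ASCII text
```

Only the first and last checks should produce a match, which lines up with the documentation quoted above.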

So, as I understand it: with no flags set, \w in Python 2 matches only ASCII letters, digits and the underscore, while \w in Python 3 (for str patterns) matches digits, the underscore and word characters from the whole Unicode character set. Therefore, to ease migration, state the intent explicitly. To match only ASCII characters, specify the flag re.A (available in Python 3 only); to match Unicode characters, specify re.U.
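
Applied to my username check, the fix might look like the sketch below. The function name is_valid_username and the compiled pattern USERNAME_RE are hypothetical names chosen for illustration, and the 1 to 20 length limit is simply taken from the pattern in the sessions above. This is one possible approach for Python 3, not the only one.

```python
import re

# Hypothetical validator for illustration; re.ASCII limits \w to [a-zA-Z0-9_].
USERNAME_RE = re.compile(r"\w{1,20}", re.ASCII)


def is_valid_username(name: str) -> bool:
    """Return True if name consists of 1 to 20 ASCII letters, digits or underscores."""
    # fullmatch requires the whole string to match, not just a prefix.
    return USERNAME_RE.fullmatch(name) is not None


print(is_valid_username("abc123ADB"))
print(is_valid_username("捕蛇者说"))
print(is_valid_username("abc捕蛇"))
```

One design choice worth noting: the sketch uses fullmatch rather than match. re.match only anchors at the start of the string, so a name such as 'abc捕蛇' would still be accepted on the strength of its ASCII prefix, even with re.ASCII set.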

Summary

With that, my problem was completely solved, and I took away two lessons:

  • When reading online tutorials, pay close attention to the differences between the tutorial's environment and your own
  • Read the official documentation more often

There are many other differences between Python 2 and Python 3, so be careful when using them! I won't list them all here; feel free to leave a comment to discuss.

More Python tutorials are on the way!

Origin www.cnblogs.com/cherry-tang/p/10968947.html