90% of people don't know that Python already supports Chinese variable names.

Recently I was flipping through two fairly new Python books, and both made the same very basic mistake!

The two books are "Python Programming: From Beginning to Practice" and "The Programming Journey of Father and Son". Both are bestsellers, both received new editions in October 2020, and both use Python 3.7+ syntax.

However, in their sections on variable naming rules, both make the same mistake: they repeat the wording of the Python 2 era and wrongly state that names may only combine "letters, numbers and underscores".

In fact, Python 3.x accepts a wide range of Unicode characters in identifiers, so you can, for example, use Chinese for variable names.

>>> 姓名 = "Python猫"
>>> print(f"我是{姓名},欢迎关注!")
我是Python猫,欢迎关注!

Since I have no other samples at hand, I can't say how many recent editions still repeat the old rules. Translated books, however, are quite likely to have this problem, and less rigorous domestic books may make the same mistake by borrowing from outdated material.

As a result, I'm afraid some newcomers to Python will pick up a wrong understanding. It may not be a serious problem, but it is one that should be avoided, and easily can be.

Therefore, I think this topic is worth talking about.

Programming languages share a very common concept, the identifier, often simply called a name, which is used to name variables, constants, functions, classes and other entities.

There are some basic rules that must be considered when defining identifiers:

    What characters can it be made of?
    Is it case sensitive?
    Are some special words excluded? (i.e. keywords/reserved words)
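For the second and third questions, Python's own answers are easy to check. The snippet below is only a small sketch: the variable names and the `is_usable_name` helper are made up for illustration, while `keyword.iskeyword` and `str.isidentifier` come from the standard library.

```python
import keyword

# Case sensitivity: these are two distinct names.
name = "lowercase"
Name = "capitalized"
names_are_distinct = name != Name

# Reserved words: the keyword module knows which words cannot be used as names.
for_is_keyword = keyword.iskeyword("for")
# print is an ordinary built-in function in Python 3, not a keyword.
print_is_keyword = keyword.iskeyword("print")


def is_usable_name(candidate: str) -> bool:
    """Hypothetical helper: a valid identifier that is not a reserved word."""
    # str.isidentifier() alone is not enough, because "class".isidentifier() is True.
    return candidate.isidentifier() and not keyword.iskeyword(candidate)


print(names_are_distinct, for_is_keyword, print_is_keyword)
print(is_usable_name("total"), is_usable_name("class"), is_usable_name("2nd"))
```

So Python is case sensitive, and a word like `class` is shaped like an identifier but is still off limits because it is reserved.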

For the first question, most programming languages followed this rule in their early versions: identifiers consist of letters, digits, and underscores, and cannot start with a digit. A few languages are exceptions and also use special symbols such as $, @ and % (e.g. PHP, Ruby, Perl).

Versions of Python before 3.0 followed the rule above. Here is the description from the official documentation:
 

identifier ::=  (letter|"_") (letter | digit | "_")*
letter     ::=  lowercase | uppercase
lowercase  ::=  "a"..."z"
uppercase  ::=  "A"..."Z"
digit      ::=  "0"..."9"
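The grammar above maps almost directly onto a regular expression. The following is only an illustrative sketch of that old ASCII-only rule, not how any Python version actually implements it; `ASCII_IDENTIFIER` and `is_ascii_identifier` are names invented here.

```python
import re

# ASCII-only rule from the grammar above: a letter or underscore,
# followed by any number of letters, digits, or underscores.
ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_ascii_identifier(candidate: str) -> bool:
    """Return True if the whole string matches the pre-3.0 identifier grammar."""
    return ASCII_IDENTIFIER.fullmatch(candidate) is not None


for candidate in ["user_name1", "_private", "1st", "姓名"]:
    print(candidate, is_ascii_identifier(candidate))
```

Under this rule `姓名` is rejected, which is exactly the behaviour the outdated books describe. Note that the sketch does not exclude keywords.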

 

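However, this rule has changed since version 3.0. The Python 3 language reference describes identifiers like this instead:

identifier   ::=  xid_start xid_continue*
id_start     ::=  <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
id_continue  ::=  <all characters in id_start, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
xid_start    ::=  <all characters in id_start whose NFKC normalization is in "id_start xid_continue*">
xid_continue ::=  <all characters in id_continue whose NFKC normalization is in "id_continue*">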

 

With the spread of the Internet, the languages of every country have entered an international context, and the demand for internationalization in programming languages has grown with it.

The Unicode standard was first published in 1991 and was gradually adopted by mainstream programming languages. At the time of writing, at least 73 programming languages are listed as supporting Unicode variable names (data source: https://rosettacode.org/wiki/Unicode_variable_names).

In 2007, while the landmark version 3.0 was being designed, the Python developers also took up support for Unicode identifiers, which is how the important "PEP 3131 - Supporting Non-ASCII Identifiers" came about.
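PEP 3131 defines identifiers in terms of Unicode character categories and says that identifiers are converted to the NFKC normal form while parsing. Here is a rough sketch of both points using the standard `unicodedata` module; the sample characters and the `namespace` dictionary are just illustrative choices.

```python
import unicodedata

# General categories of a few characters: Lo/Ll/Lu are letter categories,
# Nd is a decimal digit, Pc is connector punctuation (the underscore).
categories = {ch: unicodedata.category(ch) for ch in "姓ψΔ9_"}
print(categories)

# Four full-width Latin letters, written as escapes so they are easy to tell
# apart from the ASCII ones. NFKC normalization folds them to plain "name".
fullwidth = "\uff4e\uff41\uff4d\uff45"
normalized = unicodedata.normalize("NFKC", fullwidth)

# Assign to the full-width spelling, then look the value up under the ASCII one.
namespace = {}
exec(f"{fullwidth} = 'cat'", namespace)
print(normalized, namespace["name"])
```

One practical consequence is that two spellings that look different on screen can refer to the same variable, which is one more reason to be careful with non-ASCII names.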


 

In fact, besides Chinese, which we care about most, the Unicode character set contains a great many other characters.

When naming variables, all of the following are legal (use with caution; if your colleagues come after you, this cat takes no responsibility...):

 

>>> ψ = 1
>>> Δ = 1
>>> ಠ_ಠ = "hello"
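If you are unsure whether a particular string would be accepted as a name, `str.isidentifier()` can tell you without having to try an assignment. The candidate list below is an arbitrary selection for illustration.

```python
# Some strings that look like plausible names, and some that do not qualify.
candidates = ["ψ", "Δ", "ಠ_ಠ", "姓名", "2fast", "my-var", "😀"]

# str.isidentifier() applies the Python 3 identifier rules to each string.
validity = {candidate: candidate.isidentifier() for candidate in candidates}

for candidate, ok in validity.items():
    print(candidate, ok)
```

The first four are accepted. `2fast` starts with a digit, `my-var` contains a hyphen, and the emoji is a symbol rather than a letter, so those three are rejected: Unicode support does not mean that every character is allowed.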

All in all, the variable naming rules in some Python books are outdated, so don't be misled by them!

Python 3, as a modern, internationalized language, has good support for Unicode. Whether you should actually use Chinese identifiers in a project is another question...
