Variables and string encoding

Variable declaration:  

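    • Variable names may contain only letters, digits, and underscores.
    • The first character of a variable name cannot be a digit.
    • The following keywords are reserved in Python and cannot be used as variable names (this is the Python 2 list; as of Python 3.7, exec and print are no longer keywords, while False, None, True, nonlocal, async, and await are):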

['and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'exec', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'not', 'or', 'pass', 'print', 'raise', 'return', 'try', 'while', 'with', 'yield']
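The naming rules and the keyword list can also be checked in code. This is a minimal sketch that assumes Python 3. str.isidentifier() tests the "letters, digits, underscores, not starting with a digit" rule (Python 3 also accepts non-ASCII letters), and the standard keyword module lists the reserved words of the running interpreter. The helper name is_valid_variable_name is illustrative, not a standard function.

```python
import keyword

def is_valid_variable_name(name: str) -> bool:
    """Hypothetical helper: a legal identifier that is not a reserved keyword."""
    # isidentifier() enforces the letters/digits/underscore rule and
    # rejects names that start with a digit; it does not check keywords.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["user_name", "_count2", "2fast", "my-name", "class"]:
    print(candidate, is_valid_variable_name(candidate))

# Keyword list of the running interpreter (differs between Python 2 and 3)
print(keyword.kwlist)
```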

Assignment of variables:
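A minimal sketch of variable assignment in Python 3, with illustrative names and values. "=" binds a name to an object, several names can be assigned in one statement, and rebinding one name does not change another name that referred to the old object.

```python
name = "Alice"      # bind the name 'name' to a string object
age = 18            # bind 'age' to an integer object
a = b = 0           # chained assignment: both names refer to the same object
x, y = 1, 2         # tuple unpacking assigns two names in one statement
x, y = y, x         # swap the values without a temporary variable

name2 = name        # name2 now refers to the same string as name
name = "Bob"        # rebinding name leaves name2 unchanged
print(name, name2, age, x, y)   # Bob Alice 18 2 1
```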

Character Encoding:

When the Python interpreter loads the code in a .py file, it decodes the content using a source encoding (ASCII by default in Python 2; Python 3 defaults to UTF-8).
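Python 3's standard tokenize module applies the same rule as the interpreter. tokenize.detect_encoding() returns UTF-8 unless the source starts with a BOM or carries a PEP 263 coding declaration such as "# -*- coding: gbk -*-" in its first two lines. This is a minimal sketch assuming Python 3; source_encoding is an illustrative helper name.

```python
import io
import tokenize

def source_encoding(source: bytes) -> str:
    """Hypothetical helper: the encoding Python 3 would use to decode this source."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding

plain = b"x = 1\n"
declared = "# -*- coding: gbk -*-\ns = '中文'\n".encode("gbk")

print(source_encoding(plain))     # utf-8 (the Python 3 default)
print(source_encoding(declared))  # gbk (taken from the coding declaration)

# compile() honours the declaration when given bytes, so the GBK bytes decode correctly
namespace = {}
exec(compile(declared, "<demo>", "exec"), namespace)
print(namespace["s"] == "中文")   # True
```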

ASCII (American Standard Code for Information Interchange) is a character encoding system based on the Latin alphabet, mainly used to display modern English; 8-bit extensions of it cover other Western European languages. ASCII itself uses 7 bits and defines 128 characters (codes 0 to 127), and each character is normally stored in one 8-bit byte. A byte has 2**8 = 256 possible values (0 to 255), so even an 8-bit extension can represent at most 256 symbols.
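These limits are easy to check with Python 3's built-in ascii codec. This is a small sketch; the variable name ascii_chars is illustrative.

```python
print(ord("A"), chr(97))          # 65 a
ascii_chars = bytes(range(128)).decode("ascii")
print(len(ascii_chars))           # 128: ASCII is a 7-bit code
print(2 ** 8)                     # 256 values fit in one byte (0 to 255)

# Byte values 128 to 255 are outside ASCII, so decoding them fails
try:
    bytes([200]).decode("ascii")
except UnicodeDecodeError as exc:
    print(exc.reason)             # ordinal not in range(128)

# Chinese characters cannot be encoded as ASCII at all
try:
    "中".encode("ascii")
except UnicodeEncodeError as exc:
    print(exc.reason)             # ordinal not in range(128)
```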

About Chinese

To handle Chinese characters, programmers designed GB2312 for Simplified Chinese and Big5 for Traditional Chinese.

GB2312 (1980) contains 7445 characters in total: 6763 Chinese characters and 682 other symbols. In its internal code, the Chinese character area uses high bytes B0 to F7 and low bytes A1 to FE, giving 72 * 94 = 6768 code points; 5 of them (D7FA to D7FE) are vacant, which leaves the 6763 Chinese characters.
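The code-point arithmetic can be reproduced in Python 3, and the built-in gb2312 codec confirms that 啊, the first character of the area, sits at B0A1. This is a sketch; the variable names are illustrative.

```python
high_bytes = range(0xB0, 0xF7 + 1)   # high (first) byte: B0 to F7
low_bytes = range(0xA1, 0xFE + 1)    # low (second) byte: A1 to FE

code_points = len(high_bytes) * len(low_bytes)
print(len(high_bytes), len(low_bytes), code_points)   # 72 94 6768
print(code_points - 5)   # 6763, after removing the 5 vacancies D7FA to D7FE

print("啊".encode("gb2312").hex())   # b0a1
```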

GB2312 supports too few Chinese characters. GBK 1.0, the Chinese character extension specification of 1995, contains 21886 symbols divided into a Chinese character area and a graphic symbol area; the Chinese character area holds 21003 characters. GB18030, issued in 2000, is the official national standard that replaces GBK 1.0. It includes 27484 Chinese characters as well as Tibetan, Mongolian, Uyghur, and other major minority scripts. PC platforms are required to support GB18030, while embedded products are not, so mobile phones and MP3 players generally support only GB2312.
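The difference in coverage shows up with Python 3's built-in gbk and gb18030 codecs. This is a minimal sketch; the Tibetan letter U+0F40 is an illustrative choice of a minority-script character. GBK has no code for it, while GB18030 represents it with a four-byte sequence.

```python
tibetan_ka = "\u0f40"  # TIBETAN LETTER KA

try:
    tibetan_ka.encode("gbk")
except UnicodeEncodeError:
    print("GBK has no code for U+0F40")

encoded = tibetan_ka.encode("gb18030")
print(len(encoded))                        # 4 bytes in GB18030
print(encoded.decode("gb18030") == tibetan_ka)   # True: round trip works

# Common Chinese characters stay two bytes in GB18030, as in GBK
print(len("中".encode("gb18030")))         # 2
```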

From ASCII through GB2312 and GBK to GB18030, each encoding is backward compatible with the previous one: a character always has the same code in every scheme that contains it, and later standards add more characters. These encodings handle English and Chinese uniformly; a Chinese character is recognized by the highest bit of its high (first) byte being 1 rather than 0. In programmers' terms, GB2312 and GBK are double-byte character sets (DBCS); GB18030 keeps their two-byte codes but also adds four-byte sequences.

Some Chinese editions of Windows still use GBK (code page 936) as the default internal code, which can be upgraded to GB18030 with the GB18030 upgrade package. Ordinary users rarely need the characters that GB18030 adds beyond GBK, however, so GBK is still the usual name for the internal code of Chinese Windows.

Clearly, ASCII cannot represent every character and symbol in the world, so a new encoding that could represent all of them was needed: Unicode.

Unicode (also called the universal code) is a character encoding standard for computers, created to overcome the limitations of traditional character encoding schemes. It assigns a unified, unique binary code to every character in every language, and its original 16-bit form (UCS-2, today UTF-16) represents each character and symbol with at least 16 bits (2 bytes), which gives 2**16 = 65536 possible values.
Note: 2 bytes is the minimum; characters beyond those 65536 values (Unicode code points go up to U+10FFFF) take 4 bytes in UTF-16.
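The 16-bit arithmetic and the note can be checked in Python 3 with the built-in utf-16-le codec. Little-endian is chosen here so that no byte-order mark is added. A character inside the 65536-value range takes 2 bytes, while one beyond it takes 4 bytes as a surrogate pair. The emoji U+1F600 is an illustrative choice.

```python
print(2 ** 16)                           # 65536 values fit in 16 bits

zhong = "中"                             # U+4E2D, inside the 16-bit range
print(hex(ord(zhong)))                   # 0x4e2d
print(len(zhong.encode("utf-16-le")))    # 2

grin = "\U0001F600"                      # U+1F600, beyond 0xFFFF
print(hex(ord(grin)))                    # 0x1f600
print(len(grin.encode("utf-16-le")))     # 4 (a surrogate pair)
```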

UTF-8 is a compact, variable-length encoding of Unicode. Instead of always using at least 2 bytes, it sorts characters and symbols into classes: ASCII characters are stored in 1 byte, European characters in 2 bytes, East Asian characters in 3 bytes, and characters beyond the 16-bit range in 4 bytes.
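These byte classes follow from UTF-8's bit layout. Below is a minimal hand-written encoder for a single code point, assuming Python 3. utf8_encode is an illustrative name, and the function skips validity checks such as rejecting surrogates, so real code should call str.encode("utf-8") instead. The loop compares the sketch with the built-in codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: encode one code point (assumed <= 0x10FFFF) as UTF-8."""
    if cp < 0x80:        # 1 byte: 0xxxxxxx (ASCII)
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx (e.g. accented Latin letters)
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (e.g. Chinese)
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (beyond the 16-bit range)
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in "a\u00e9\u4e2d\U0001F600":   # a, é, 中, 😀
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", len(encoded), encoded.hex())
```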

Comments:

    # Single-line comment (shortcut: Ctrl + /, the key marked "/?", in editors such as PyCharm and VS Code)

    ''' Multi-line comment '''
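A short sketch of both comment styles in Python 3. Strictly speaking, a triple-quoted block is a string literal rather than a comment. On its own line it is evaluated and discarded. As the first statement of a function it becomes that function's docstring. The function greet is illustrative.

```python
# Everything after '#' on a line is ignored by the interpreter.
count = 3  # a comment can also follow code on the same line

'''
A "multi-line comment": this string literal is not assigned to
any name, so it has no effect when the program runs.
'''

def greet(name):
    '''Return a greeting; as the first statement, this string is the docstring.'''
    return "Hello, " + name

print(greet("Tom"))   # Hello, Tom
print(greet.__doc__)  # Return a greeting; as the first statement, this string is the docstring.
```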