Strings in Python 3 are divided into str and bytes

The often-confused relationship between Unicode, UTF-8, GBK, and GB2312

Unicode: Unicode assigns a unique number (a code point), conventionally written in hexadecimal, to every character in the world. For example, the code point of the simplified Chinese character "渣" ("slag") is U+6E23, which is written "\u6e23" in a Python string literal (u"\u6e23" in Python 2). Unicode itself only defines the number of each character; it does not define how that number is stored as bytes, which is why encoding formats such as UTF-8 exist. UTF-8 is one implementation of Unicode: it keeps the Unicode code points and only defines how they are stored. GBK and GB2312 are different: they are separate Chinese national encodings with their own numbering, not implementations of Unicode, although Python's codecs can convert between them and Unicode str. My simple understanding is that an encoding defines how characters are stored as bytes.
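As a small illustration of the point above, the sketch below looks at the same character as a Unicode number and as stored bytes. The variable names are only for illustration, and the character is written as an escape so the snippet does not depend on the file's own encoding.

```python
# "\u6e23" is the character from the example above, written as an escape.
ch = "\u6e23"

code_point = ord(ch)              # the Unicode number of the character
utf8_bytes = ch.encode("utf-8")   # how UTF-8 stores that character
gbk_bytes = ch.encode("gbk")      # how GBK stores the same character

print(hex(code_point))            # 0x6e23
print(utf8_bytes)                 # b'\xe6\xb8\xa3'
print(len(gbk_bytes))             # 2
```

The character is the same in both cases; only the bytes used to store it differ between encodings.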

In python3, strings are divided into two types: str and bytes

str to bytes: use encode() (encoding)
bytes to str: use decode() (decoding)
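A minimal round trip between the two types might look like the following; the sample text and variable names are chosen only for illustration.

```python
text = "中文"                      # a str: a sequence of Unicode characters

data = text.encode("utf-8")        # str -> bytes
restored = data.decode("utf-8")    # bytes -> str

print(type(data).__name__, data)   # bytes b'\xe4\xb8\xad\xe6\x96\x87'
print(type(restored).__name__)     # str
print(restored == text)            # True
```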

Note that the bytes type in Python 3 corresponds to the str type in Python 2, and the str type in Python 3 corresponds to the unicode type in Python 2; Python 3 has no separate unicode type. This is tied to the default encoding: Python 3 defaults to UTF-8, while Python 2 defaults to ASCII. ASCII contains only 128 characters (English letters, Arabic numerals, punctuation marks, control characters, and so on) and no Chinese. Chinese is written with logographic characters, and each one needs several bytes to represent, so ASCII cannot represent Chinese. As a result, if a Python 2 source file does not declare a different encoding, Chinese text in str literals (which can also be written as unicode literals) causes an error, because the CPython 2 interpreter cannot recognize the non-ASCII bytes.
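The following sketch, run under Python 3, checks the default encoding and shows that ASCII cannot represent Chinese text. The flag names are hypothetical and exist only to record what happened.

```python
import sys

# On Python 3 this reports 'utf-8'.
default_encoding = sys.getdefaultencoding()

# Plain English text fits in ASCII.
ascii_ok = "hello".encode("ascii")

# Chinese characters are outside ASCII's 128 characters, so encoding fails.
try:
    "中文".encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(default_encoding)   # utf-8
print(ascii_ok)           # b'hello'
print(ascii_failed)       # True
```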

I will not go into the full relationship and differences between ASCII, Unicode, and UTF-8 here; you can read up on them yourself. In short, UTF-8 is one implementation of Unicode, and my personal understanding is the following relationship: characters (str) <---> Unicode code points <---> UTF-8 bytes. In the end, data is still transmitted byte by byte in binary form.

By default, a string in Python 3 is of type str. Web frameworks typically convert str to bytes automatically before returning the response to the front end.
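To make the str-to-bytes step concrete, here is a rough sketch of what a framework does on your behalf, written as a bare WSGI-style callable. The names app and fake_start_response are hypothetical, and a real framework does considerably more than this.

```python
def app(environ, start_response):
    """A minimal WSGI-style application that returns UTF-8 encoded bytes."""
    body_text = "你好, world"            # the response as str
    body = body_text.encode("utf-8")     # WSGI bodies must be bytes
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),  # length in bytes, not characters
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app directly with a stand-in for the server's start_response.
captured = {}

def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

chunks = app({}, fake_start_response)
```

Note that Content-Length counts bytes, so it is computed after encoding rather than from the length of the str.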

To convert bytes in one encoding to bytes in another encoding, first decode them with the original encoding into str, then encode that str with the new encoding to get bytes.

For example, if a variable my_bt holds GBK-encoded bytes and needs to be converted to UTF-8, do the following:

my_str = my_bt.decode("gbk")  # decode the GBK bytes into str

my_bt = my_str.encode("utf-8")  # re-encode the str as UTF-8 bytes
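The same two steps can be wrapped in a small helper. This is only a sketch: transcode is a hypothetical name, and the sample GBK bytes are produced here just so the snippet runs on its own.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Decode data from the src encoding, then re-encode it as dst."""
    return data.decode(src).encode(dst)


# Sample GBK-encoded input, created here for the sake of the example.
my_bt = "中文".encode("gbk")
utf8_bt = transcode(my_bt, "gbk", "utf-8")

# Decoding with the wrong encoding fails instead of converting silently.
try:
    my_bt.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(utf8_bt.decode("utf-8"))   # 中文
print(wrong_codec_failed)        # True
```

Going through str in the middle is what makes the conversion safe: the bytes are first interpreted as characters, and only then written out in the new encoding.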

 


Origin blog.csdn.net/chaishen10000/article/details/103168183