Python 3, Unicode, UTF-8, and GBK: encoding, decoding, and Chinese display problems

Character encoding in Python 3 is a headache.
This is not an introduction to how GBK, UTF-8, and Unicode represent English and Chinese characters.
Many online articles already cover that, and there is no need to study what each individual bit means.

Objective:
Understand clearly why Python 3 text displays correctly or incorrectly under different combinations of encoding, decoding, and operating system (Windows or Linux).

Prerequisite:
Understand that different encodings represent the same character with different byte values and different byte lengths.
In Python 3, conversion between encodings always goes through Unicode as the intermediate form. To convert GBK to UTF-8, first decode the GBK bytes to a Unicode string, then encode that string as UTF-8.
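
As a minimal sketch of that two-step conversion, the snippet below takes the GBK bytes of "中", decodes them to a Unicode str, and re-encodes the str as UTF-8. The variable names are illustrative only.

```python
# GBK bytes -> Unicode str -> UTF-8 bytes
gbk_bytes = "中".encode("gbk")      # the character as GBK bytes
text = gbk_bytes.decode("gbk")      # step 1: decode GBK to a Unicode str
utf8_bytes = text.encode("utf-8")   # step 2: encode the str as UTF-8

print(gbk_bytes)    # b'\xd6\xd0'
print(utf8_bytes)   # b'\xe4\xb8\xad'
```

There is no direct bytes-to-bytes conversion here: the str in the middle is the Unicode form.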

Analysis:
Four distinct encoding and decoding concepts need to be told apart:

1. The encoding used when the code file is written.
In Notepad++, select the encoding in the "Encoding" menu; the status bar shows the current encoding.
In PyCharm, the default encoding can be set under "File", "Editor", "File Encoding"; the status bar also shows the current encoding.

2. The decoding format declared in the Python 3 code.
For example: # coding=gbk
The encoding declaration tells the Python interpreter which encoding to use to decode the .py file. It does not change the interpreter's default encoding or the local default encoding,
nor does it set the encoding the file is saved in; it only declares how the file is to be decoded. In other words,
the encoding of the code file depends on the editor you use, and how the file is decoded depends on the encoding declaration at the top of the file.
The two should normally match: the encoding chosen in the editor and the encoding declaration should be the same.

Note: PyCharm automatically changes the file's encoding to match the declared encoding, so the encoding and decoding types stay consistent. This is an advantage of PyCharm.
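
To see the gap between "how the file was saved" and "how the file is declared", here is a small sketch using the standard library's tokenize.detect_encoding, which reads the declaration from raw source bytes. The in-memory source and the sample string '你好' are assumptions made for illustration, not taken from a real file.

```python
import io
import tokenize

# Hypothetical source: saved by the editor as UTF-8, but declared as GBK.
source = "# coding=gbk\ns = '你好'\n".encode("utf-8")

# detect_encoding reports the declared encoding, not the one used to save.
declared, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(declared)  # gbk

as_declared = source.decode(declared)  # what the declaration asks for
as_saved = source.decode("utf-8")      # what the editor actually wrote
print(as_declared == as_saved)         # False: the string literal is garbled
```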

3. The default encoding of the Python interpreter.
When the Python interpreter reads a .py file that has no encoding declaration, it decodes the file with its default encoding, which is UTF-8 in Python 3.
View it with: import sys; sys.getdefaultencoding()
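
A quick check of both statements, as a sketch: sys.getdefaultencoding() reports the interpreter default, and tokenize.detect_encoding falls back to UTF-8 when the source bytes carry no declaration. The one-line source below is a made-up example.

```python
import io
import sys
import tokenize

print(sys.getdefaultencoding())  # utf-8

# Hypothetical source with no encoding declaration.
no_declaration = b"x = 1\n"
encoding, _ = tokenize.detect_encoding(io.BytesIO(no_declaration).readline)
print(encoding)  # utf-8
```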

4. The local default encoding.
"Local" means the operating system, so the local default encoding is the operating system's default encoding.
The Python interpreter's default encoding is the same on every operating system, while the operating system's default encoding varies from system to system.
View it with: import locale; locale.getdefaultlocale(). On Chinese-language Windows it is typically GBK (cp936); on Linux it is usually UTF-8.
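
The sketch below prints the two defaults side by side. It uses locale.getpreferredencoding(False) rather than locale.getdefaultlocale(), since the latter is deprecated as of Python 3.11; the printed locale value depends on the machine, so no expected output is shown for it.

```python
import locale
import sys

# Operating-system (locale) default: varies by machine.
local_default = locale.getpreferredencoding(False)

# Interpreter default: the same on every platform in Python 3.
interpreter_default = sys.getdefaultencoding()

print("local default:      ", local_default)
print("interpreter default:", interpreter_default)
```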

Example 1:
A Chinese string (str = '中') in a Python 3 file is saved as UTF-8, the encoding declaration says GBK, and the code writes the string to a file f2 using GBK (via the file write method). The result is then displayed on Linux.
Analysis:

  1. The code file stores the Chinese character "中" in UTF-8 as b'\xe4\xb8\xad'.
  2. The Python 3 interpreter decodes b'\xe4\xb8\xad' with the declared GBK, producing a garbled string (held in memory as a Unicode string). Suppose the garbled character is %.
  3. The code encodes the character % with GBK, which gives back b'\xe4\xb8\xad', and writes it to file f2.
  4. The Linux terminal opens b'\xe4\xb8\xad' as UTF-8 and displays "中" correctly.
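
The four steps can be simulated without an actual mis-declared source file. This is only a sketch under assumptions: it uses the sample string "你好" instead of "中", because its six UTF-8 bytes happen to form valid GBK byte pairs, whereas the three bytes of a single "中" may not decode cleanly as GBK on their own. The temporary path for f2 is likewise made up.

```python
import os
import tempfile

original = "你好"                       # sample text (assumption, see above)
utf8_bytes = original.encode("utf-8")   # step 1: bytes saved by the editor
garbled = utf8_bytes.decode("gbk")      # step 2: decoded per the GBK declaration

# Step 3: write the garbled str to f2 with GBK.
path = os.path.join(tempfile.mkdtemp(), "f2")
with open(path, "w", encoding="gbk") as f:
    f.write(garbled)

# Step 4: a UTF-8 terminal reads the raw bytes of f2 as UTF-8.
with open(path, "rb") as f:
    written = f.read()

print(written == utf8_bytes)                # True: GBK round trip restored the bytes
print(written.decode("utf-8") == original)  # True: displays correctly
```

Decoding and re-encoding with the same codec cancel out, which is why the wrong declaration does no visible harm in this case.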

Example 2:
A Chinese string (str = '中') in a Python 3 file is saved as UTF-8, the encoding declaration says GBK, and the code writes the string to a file f2 using UTF-8 (via the file write method). The result is then displayed on Linux.
Analysis:

  1. The code file stores the Chinese character "中" in UTF-8 as b'\xe4\xb8\xad'.
  2. The Python 3 interpreter decodes b'\xe4\xb8\xad' with the declared GBK, producing a garbled string (held in memory as a Unicode string). Suppose the garbled character is %.
  3. The code encodes the character % with UTF-8, which gives the UTF-8 bytes of % (certainly not b'\xe4\xb8\xad'), and writes them to file f2.
  4. The Linux terminal opens the file as UTF-8, but the bytes are no longer b'\xe4\xb8\xad', so the text is not displayed correctly.
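
The same kind of simulation shows why this case fails. As a sketch, it again assumes the sample string "你好" (whose UTF-8 bytes decode as GBK without error) in place of "中", and works on bytes in memory rather than on a real file f2.

```python
original = "你好"                       # sample text (assumption)
utf8_bytes = original.encode("utf-8")   # step 1: bytes saved by the editor
garbled = utf8_bytes.decode("gbk")      # step 2: decoded per the GBK declaration

rewritten = garbled.encode("utf-8")     # step 3: written out as UTF-8
shown = rewritten.decode("utf-8")       # step 4: what a UTF-8 terminal displays

print(rewritten == utf8_bytes)  # False: the bytes on disk have changed
print(shown == garbled)         # True: the garbled text is what gets displayed
print(shown == original)        # False
```

Here the decode (GBK) and the encode (UTF-8) use different codecs, so the garbling from step 2 is preserved faithfully instead of being undone.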


Origin blog.51cto.com/jsahz/2480981