Some questions about character encoding

ASCII

American Standard Code for Information Interchange. It cannot represent Chinese characters.
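
As a small illustration of that limit, the sketch below tries to encode an English string and a Chinese string as ASCII. It assumes Python 3, where str is already unicode (the other snippets in this post are Python 2), and the helper name can_encode is made up for the example.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

ascii_ok = can_encode("abc123", "ascii")   # letters and digits fit in ASCII
chinese_ok = can_encode("中国", "ascii")    # Chinese lies outside the 128 ASCII codes
print(ascii_ok, chinese_ok)
```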

GB2312

A Chinese extension of ASCII.

GBK

An extension of GB2312 that adds nearly 20,000 new characters (including traditional Chinese characters) and symbols.
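
A quick way to see the relationship, assuming Python 3 and its built-in gb2312 and gbk codecs: a simplified character keeps the same bytes in both encodings, while a traditional character can only be encoded with gbk. The two sample characters are my own choice for the illustration.

```python
simplified = "体"    # simplified form, present in GB2312
traditional = "體"   # traditional form of the same character, added by GBK

# Characters already in GB2312 keep the same bytes in GBK.
same_bytes = simplified.encode("gb2312") == simplified.encode("gbk")

# The traditional character encodes with GBK only.
gbk_bytes = traditional.encode("gbk")
try:
    traditional.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False

print(same_bytes, len(gbk_bytes), in_gb2312)
```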

GB18030

An extension of GBK that adds several thousand new characters from ethnic minority scripts.
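
The sketch below, assuming Python 3 and its built-in gbk and gb18030 codecs, takes one minority-script character (a Tibetan letter, picked only as an example) and shows that gbk rejects it while gb18030 accepts it, and that text gbk can already encode is unchanged.

```python
tibetan = "\u0f40"   # TIBETAN LETTER KA, a character from a minority script

try:
    tibetan.encode("gbk")
    in_gbk = True
except UnicodeEncodeError:
    in_gbk = False

gb18030_bytes = tibetan.encode("gb18030")   # GB18030 uses a four-byte form here
round_trip = gb18030_bytes.decode("gb18030") == tibetan

# Text that GBK can already encode keeps the same bytes in GB18030.
compatible = "中国".encode("gbk") == "中国".encode("gb18030")

print(in_gbk, len(gb18030_bytes), round_trip, compatible)
```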

UNICODE

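Defined by the Unicode Consortium and kept in sync with ISO/IEC 10646 from ISO (the International Organization for Standardization). It gives the characters of all countries one unified numbering. It is a character set, not an encoding.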
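
To make the "character set, not an encoding" point concrete, here is a minimal sketch assuming Python 3. A character has one Unicode code point, but the bytes depend on which encoding is applied to it.

```python
ch = "中"
code_point = ord(ch)   # the number Unicode assigns to the character

# The same code point becomes different bytes under different encodings.
as_utf8 = ch.encode("utf-8")
as_utf16 = ch.encode("utf-16-be")
as_gbk = ch.encode("gbk")

print(hex(code_point), as_utf8, as_utf16, as_gbk)
```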

UTF-8

Addresses the problem of transmitting Unicode over the Internet by sending data in 8-bit units. It is one implementation (encoding) of unicode.
It was designed as a transmission encoding with no national boundaries, so it can represent the characters of every culture in the world.
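
A short sketch of the 8-bit-unit idea, assuming Python 3: each character becomes between one and four bytes, and plain ASCII text is unchanged. The sample characters are arbitrary picks for the illustration.

```python
# One ASCII letter, one accented Latin letter, one Chinese character, one emoji.
samples = ["A", "\u00e9", "中", "\U0001F600"]

# Number of 8-bit units (bytes) that utf-8 uses for each character.
lengths = [len(s.encode("utf-8")) for s in samples]

# ASCII text has identical bytes in ascii and utf-8.
ascii_compatible = "abc123".encode("utf-8") == "abc123".encode("ascii")

print(lengths, ascii_compatible)
```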

Tips:

utf-8 is not supported in the Windows cmd window; to display Chinese there, the text must be converted to gbk or unicode.
Typing s = "中文" (a Chinese string literal) directly in Python IDLE or in cmd gives a gbk-encoded string.
Python IDLE supports all three encodings (gbk, utf-8 and unicode).
Garbled Chinese text is caused by inconsistent encodings: for example, text stored as utf-8 comes out garbled when printed as gbk. To avoid garbled text, keep the encoding consistent everywhere as far as possible; using unicode throughout is recommended.
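
The mismatch described above can be reproduced in a few lines. This is only a sketch and assumes Python 3, where bytes and unicode text are separate types; errors="replace" is used so that the wrong decode never raises.

```python
text = "中国"
stored = text.encode("utf-8")   # the bytes as they would be saved in utf-8

garbled = stored.decode("gbk", errors="replace")   # read back with the wrong codec
correct = stored.decode("utf-8")                   # read back with the matching codec

print(repr(garbled), repr(correct))
```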

Setting the default encoding

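import sys
# Python 2 only: reload(sys) restores setdefaultencoding, which is removed at startup.
reload(sys)
# Use utf-8 for implicit conversions between str and unicode.
sys.setdefaultencoding('utf-8')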
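
The reload(sys) approach exists only in Python 2. An alternative that does not depend on any default is to name the encoding explicitly wherever text is read or written. The sketch below uses io.open, which accepts an encoding argument in both Python 2 and Python 3; the temporary file and its name are made up for the example.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")   # throwaway file for the demo

# Write unicode text, stating the encoding explicitly.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"中国")

# Read it back with the same encoding.
with io.open(path, "r", encoding="utf-8") as f:
    content = f.read()

# The raw bytes on disk are the utf-8 form of the text.
with io.open(path, "rb") as f:
    raw = f.read()

print(repr(content), repr(raw))
```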

Strings cannot be converted directly from one encoding to another; decode to unicode first, then encode to the target encoding.
The prompt string passed to raw_input must be gbk-encoded.
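
The rule about going through unicode can be sketched as follows, assuming Python 3 (in Python 2 the same decode and encode calls apply to str and unicode objects). The function name transcode is hypothetical.

```python
def transcode(data, source, target):
    """Convert bytes from one encoding to another by way of unicode."""
    text = data.decode(source)    # bytes in the source encoding -> unicode
    return text.encode(target)    # unicode -> bytes in the target encoding

gbk_bytes = "中国".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")

print(repr(gbk_bytes), repr(utf8_bytes))
```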

chardet can detect the encoding of a string.

>>> import chardet
>>> chardet.detect('abc123')
{'confidence': 1.0, 'encoding': 'ascii'}
>>> chardet.detect('中国')
{'confidence': 0.7525, 'encoding': 'utf-8'}

Origin blog.csdn.net/weixin_43199103/article/details/89575420