Python character encoding and transcoding

  

Further reading:

http://www.cnblogs.com/yuanchenqi/articles/5956943.html

http://www.diveintopython3.net/strings.html

Notes:

1. In Python 2 the default encoding is ASCII; in Python 3 strings are Unicode by default and the default encoding is UTF-8.

2. Unicode has several encodings: UTF-32 (4 bytes per character), UTF-16 (2 or 4 bytes per character) and UTF-8 (1 to 4 bytes per character). UTF-16 is widely used as an in-memory representation, but files are usually stored as UTF-8 because it saves space for mostly-ASCII text.

3. In Python 3, encode() converts a str into bytes, and decode() converts bytes back into a str.
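The three notes above can be checked with a short Python 3 sketch (assuming CPython 3; the sample characters are chosen only for illustration). It prints the default encoding, the number of bytes each Unicode encoding needs for an ASCII letter, a Chinese character and an emoji, and an encode/decode round trip. The "-le" codec variants are used only so that Python does not prepend a byte-order mark.

```python
import sys

# Note 1: CPython 3 always reports "utf-8" here (Python 2 reports "ascii")
default_encoding = sys.getdefaultencoding()
print("default encoding:", default_encoding)

# Note 2: bytes per character in each Unicode encoding.
# "-le" (little-endian) variants avoid a leading byte-order mark (BOM).
samples = ["A", "\u6211", "\U0001F600"]  # ASCII letter, "我", emoji outside the BMP
sizes = {
    ch: {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
    for ch in samples
}
for ch, per_encoding in sizes.items():
    print(ascii(ch), per_encoding)

# Note 3: encode() turns str into bytes, decode() turns bytes back into str
msg = "我爱北京天安门"
data = msg.encode("utf-8")
text = data.decode("utf-8")
print(type(msg).__name__, len(msg))    # str, length counted in characters
print(type(data).__name__, len(data))  # bytes, length counted in bytes
print(text == msg)
```

UTF-16 is not always two bytes: characters outside the Basic Multilingual Plane, such as the emoji, take a four-byte surrogate pair. UTF-8 is also not smaller in every case; "我" takes 3 bytes in UTF-8 but only 2 in UTF-16.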

 

The figure applies only to Python 2.

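# -*- coding: utf-8 -*-
# Python 2 only: a plain "..." literal is a byte string holding UTF-8 bytes
__author__ = 'Alex Li'

import sys
print(sys.getdefaultencoding())  # 'ascii' in Python 2

msg = "我爱北京天安门"  # UTF-8 bytes, because the file is declared as UTF-8
# UTF-8 bytes -> unicode -> GB2312 bytes
msg_gb2312 = msg.decode("utf-8").encode("gb2312")
# GBK is a superset of GB2312, so GB2312 bytes decode as GBK without loss
gb2312_to_gbk = msg_gb2312.decode("gbk").encode("gbk")

print(msg)
print(msg_gb2312)      # looks garbled on a UTF-8 terminal
print(gb2312_to_gbk)   # same bytes as msg_gb2312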

In Python 2
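For comparison, here is a minimal Python 3 sketch of the same transcoding; it is not from the original post. Since a Python 3 str is already Unicode, the initial decode("utf-8") step disappears. The helper name transcode is hypothetical and simply makes the "decode to Unicode, then encode" rule explicit.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: decode bytes from `src` to str, then encode to `dst`."""
    return data.decode(src).encode(dst)


msg = "我爱北京天安门"

# str -> GB2312 bytes (no decode step needed: msg is already Unicode)
msg_gb2312 = msg.encode("gb2312")

# GBK is a superset of GB2312, so GB2312 bytes decode cleanly as GBK
gb2312_to_gbk = transcode(msg_gb2312, "gbk", "gbk")

# UTF-8 bytes -> GB2312 bytes, going through Unicode in the middle
utf8_to_gb2312 = transcode(msg.encode("utf-8"), "utf-8", "gb2312")

print(msg_gb2312)
print(gb2312_to_gbk == msg_gb2312, utf8_to_gb2312 == msg_gb2312)

# Decoding GB2312 bytes with the wrong codec raises an error
try:
    msg_gb2312.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 decode failed:", exc.reason)
```

Keeping Unicode (str) as the single intermediate form means any source encoding can be converted to any target encoding with one decode and one encode.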

  


Source: www.cnblogs.com/wjcoding/p/10991091.html