Python basics: character encoding conversion

Python 2

# -*- coding: utf-8 -*-
# In Python 2 a str holds encoded bytes. To change its encoding, first decode it to unicode, then encode the unicode object into the target encoding.
str_utf8 = "我就是我"  # "I am me"; stored as UTF-8 bytes because of the coding declaration above
print("str_utf8: " + str_utf8)
# Decode the UTF-8 bytes to unicode
str_utf8_to_unicode = str_utf8.decode("utf-8")
print(str_utf8_to_unicode)
# Encode the unicode object as GBK
str_utf8_to_unicode_to_gbk = str_utf8_to_unicode.encode("gbk")

Whether the printed text displays correctly also depends on the encoding of the terminal in which the program runs.
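The dependence on the terminal can be checked before printing. The following is a minimal Python 3 sketch, not part of the original listing: the helper name can_display is hypothetical, and the fallback to "utf-8" when the stream reports no encoding is an assumption.

```python
import sys


def can_display(text, encoding=None):
    """Return True if `text` can be encoded with `encoding`.

    `can_display` is a hypothetical helper. When no encoding is given it
    uses the encoding reported by sys.stdout, falling back to "utf-8"
    (an assumption) if the stream reports none.
    """
    encoding = encoding or sys.stdout.encoding or "utf-8"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


text = "我就是我"  # "I am me"
print(sys.stdout.encoding)         # encoding of the current terminal or pipe
print(can_display(text, "gbk"))    # GBK can represent these characters
print(can_display(text, "ascii"))  # ASCII cannot
```

If the check returns False for the terminal's encoding, print would typically raise UnicodeEncodeError for that text.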

 

Python 3

Converting between strings and bytes

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Converting between a string and bytes only changes how the data is stored; the text itself is not modified. Every encode/decode call should specify the matching encoding.
str1 = "我就是我"  # "I am me"
# Convert the string to bytes with encode. The encoding given here determines the resulting bytes; choosing one other than the file's encoding (for example GBK) amounts to transcoding.
str1_byte = str1.encode(encoding="utf-8")
# Convert the bytes back to a string with decode. If the wrong encoding is given, the bytes may fail to decode and the program raises an error.
byte_str1 = str1_byte.decode(encoding="utf-8")
print(str1, str1_byte, byte_str1)
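The failure mentioned in the comments can be reproduced with a short sketch. It assumes the sample string is 我就是我 ("I am me") and deliberately decodes GBK bytes as UTF-8; the variable names are illustrative only.

```python
text = "我就是我"  # "I am me"
gbk_bytes = text.encode("gbk")

# Decoding with the wrong encoding: these GBK bytes are not valid UTF-8,
# so decode raises UnicodeDecodeError.
try:
    gbk_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("decode failed:", exc.reason)

# errors="replace" avoids the exception but loses data: every undecodable
# byte sequence becomes U+FFFD, the replacement character.
lossy = gbk_bytes.decode("utf-8", errors="replace")
print("\ufffd" in lossy)

# Decoding with the encoding that produced the bytes restores the string.
round_trip = gbk_bytes.decode("gbk")
print(round_trip == text)
```

Passing errors="replace" keeps the program running, but the original text cannot be recovered from the result, so it is mainly useful for logging or diagnostics.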

Character encoding conversion in Python 3

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Strings in Python 3 are Unicode by default, so no decode step is needed first.
# A header such as -*- coding: gbk -*- only declares the encoding of the source file itself; string variables inside the program are still Unicode.
str2_utf8 = "我就是我"  # "I am me"
# In Python 3, str2_utf8 is a Unicode string (str has no decode method), so encode it directly; encode also converts it to the bytes type.
str2_utf8_to_gbk = str2_utf8.encode(encoding="gbk")
# Print the GBK-encoded bytes of the string
print(str2_utf8_to_gbk)
# Print the string those GBK bytes represent; with the correct encoding specified, the bytes decode back to the string
print(str2_utf8_to_gbk.decode(encoding="gbk"))
# Print the UTF-8-encoded bytes of the string; encoding the string as UTF-8 is all that is required
str2_utf8_to_utf8_byte = str2_utf8.encode(encoding="utf-8")
print(str2_utf8_to_utf8_byte)
# Convert the GBK bytes to UTF-8. Unicode is the intermediary, so decode the GBK bytes to Unicode first, then encode as UTF-8.
# encode returns bytes in Python 3, so this prints the same value as str2_utf8_to_utf8_byte; decode with that same encoding to get a string back.
print(str2_utf8_to_gbk.decode(encoding="gbk").encode(encoding="utf-8"))
# Decode the UTF-8 bytes back to a string
print(str2_utf8_to_gbk.decode(encoding="gbk").encode(encoding="utf-8").decode("utf-8"))

Result:

b'\xce\xd2\xbe\xcd\xca\xc7\xce\xd2'
我就是我
b'\xe6\x88\x91\xe5\xb0\xb1\xe6\x98\xaf\xe6\x88\x91'
b'\xe6\x88\x91\xe5\xb0\xb1\xe6\x98\xaf\xe6\x88\x91'
我就是我
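The decode-then-encode chain used for the GBK to UTF-8 conversion can be wrapped in a small function. This is only a sketch: transcode is a hypothetical helper name, and the sample string is assumed to be 我就是我 ("I am me").

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode `data` from `source` to `target`, using Unicode as the intermediary.

    `transcode` is a hypothetical helper, not a standard library function.
    """
    return data.decode(source).encode(target)


gbk_bytes = "我就是我".encode("gbk")  # "I am me" as GBK bytes
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")

print(gbk_bytes)   # b'\xce\xd2\xbe\xcd\xca\xc7\xce\xd2'
print(utf8_bytes)  # b'\xe6\x88\x91\xe5\xb0\xb1\xe6\x98\xaf\xe6\x88\x91'
```

Going through Unicode keeps the function independent of any particular pair of encodings, at the cost of raising an error when the source encoding is wrong or the target cannot represent a character.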

 

Key point:

Compared with Python 2, decode in Python 3 not only converts data from its original encoding to Unicode but also converts the bytes type to a string; likewise, encode not only converts Unicode to the target encoding but also converts a string to the bytes type.
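The type change can be observed directly in Python 3. A minimal sketch, assuming the sample string 我就是我 ("I am me"):

```python
text = "我就是我"  # "I am me"; a str holds Unicode text
data = text.encode("utf-8")  # encode: str -> bytes

# In Python 3, str offers only encode and bytes offers only decode.
print(type(text).__name__, hasattr(text, "decode"))  # str False
print(type(data).__name__, hasattr(data, "encode"))  # bytes False

# decode: bytes -> str, which restores the original text
restored = data.decode("utf-8")
print(restored == text)  # True
```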

Origin www.cnblogs.com/flags-blog/p/11823946.html