[python] Strings and bytes in Python

Python 3 never converts between str and bytes implicitly.
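As a minimal sketch of what "no implicit conversion" means in practice, the snippet below mixes a str with a bytes object. The variable names and the sample text are only illustrative assumptions, not taken from any particular program.

```python
# Hypothetical sample values, chosen only for illustration.
text = '哈哈'                    # str: a sequence of Unicode code points
data = text.encode('utf-8')      # bytes: the UTF-8 encoded form of text

# Mixing the two types raises TypeError instead of converting silently.
try:
    combined = text + data
except TypeError as exc:
    combined = None
    print('TypeError:', exc)

# A str and a bytes object never compare equal, even when one is the
# encoded form of the other.
print(text == data)                  # False

# The conversion has to be spelled out with an explicit codec.
print(text + data.decode('utf-8'))   # 哈哈哈哈
```

The explicit decode() call is the point: the codec always has to be named by the programmer, so Python never has to guess one.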

To start with, this article is a good read:

http://www.51testing.com/html/63/524463-817888.html

Keep this diagram in mind:

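Python 3 does not distinguish between the parts inside the green box. In other words, to Python 3, Unicode simply is the text (the str type), and encoding and decoding always happen between Unicode and some other encoding.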

Next, the test code:

#encode: unicode --> other codec
#decode: other codec --> unicode

#In Python 3, str holds Unicode text; it is the intermediate form between any two byte encodings

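#The string '哈哈' as Unicode text
str_obj = '哈哈'
uni_obj = u'哈哈'  #in Python 3.3+ the u prefix is accepted and changes nothing
#code points: \u54c8\u54c8
utf8_obj = uni_obj.encode('UTF-8')
#UTF-8 bytes: b'\xe5\x93\x88\xe5\x93\x88'
gbk_obj  = uni_obj.encode('gbk')
#GBK bytes: b'\xb9\xfe\xb9\xfe'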

#Works in Python 2.x only (the name unicode does not exist in Python 3)
'''
if isinstance(str_obj, unicode):
    print('str_obj is a unicode string')
if isinstance(uni_obj, unicode):
    print('uni_obj is a unicode string')
'''


print("type of '哈哈': " + str(type(str_obj)))
print("type of u'哈哈': " + str(type(uni_obj)))
print("type after encoding to utf8: " + str(type(utf8_obj)))
print("type after encoding to gbk:  " + str(type(gbk_obj)))
print()



#This line prints 哈哈, because uni_obj is a str (Unicode text)
print('print unicode as str:'+uni_obj)
#print('print unicode of uni_obj:'+bytes('\xe5\x93\x88\xe5\x93\x88'))
print()

print()
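#This line prints b'\xe5\x93\x88\xe5\x93\x88': str() called on bytes without an encoding returns the repr, not decoded text
print('print utf8 as bytes:'+str(utf8_obj))  #str() does not decode bytes into text
print('print utf8 to str(decoded by utf8):'+str(utf8_obj.decode('utf-8')))
#Decoding with the wrong codec does not fail here; it yields garbled text (mojibake)
print('print utf8 to str(decoded by gbk):'+str(utf8_obj.decode('gbk')))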

print()
print('print gbk as bytes:'+str(gbk_obj))
#Decoding GBK bytes as UTF-8 raises UnicodeDecodeError
#print('print gbk to str(decoded by utf8):'+str(gbk_obj.decode('utf-8')))
print('print gbk to str(decoded by utf8):'+'Error! The GBK bytes are not a valid UTF-8 sequence (0xb9 is an invalid start byte)')
print('print gbk to str(decoded by gbk):'+str(gbk_obj.decode('gbk')))
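The failing decode that is commented out above can also be observed without crashing the script. The following is a rough sketch, assuming the same '哈哈' sample; the names gbk_bytes, failed and replaced are hypothetical and used only for this illustration.

```python
gbk_bytes = '哈哈'.encode('gbk')      # b'\xb9\xfe\xb9\xfe'

# Strict decoding (the default) raises UnicodeDecodeError, because these
# bytes do not form a valid UTF-8 sequence.
try:
    gbk_bytes.decode('utf-8')
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print('decode failed:', exc.reason)

# errors='replace' substitutes U+FFFD for the undecodable bytes
# instead of raising, at the cost of losing the original text.
replaced = gbk_bytes.decode('utf-8', errors='replace')
print(replaced)
```

Whether to catch the exception or use errors='replace' depends on whether losing the original text is acceptable; when the real encoding is known, decoding with that codec is the better fix.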
