[Python] str and bytes in Python

Python 3 never converts between str and bytes implicitly.
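As a minimal sketch of what "no implicit conversion" means in practice (assuming Python 3; the variable names are only illustrative): mixing the two types raises a TypeError, values that look alike do not compare equal, and any conversion has to name an encoding.

```python
text = 'abc'     # str: Unicode text
data = b'abc'    # bytes: raw byte values

# Mixing the two types is an error rather than a silent conversion
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# A str and a bytes object do not compare equal, even with the same ASCII content
same = (text == data)

# Conversion has to be explicit, with a named encoding
joined = text + data.decode('ascii')

print(mixed_ok, same, joined)
```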

First, this article is a good introduction:

http://www.51testing.com/html/63/524463-817888.html

Keep this diagram in mind:


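Python 3 makes no distinction between the items in the green box. In other words, to Python 3 a str simply is Unicode text, and encoding and decoding always happen between Unicode and some other encoding.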
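One consequence is that converting bytes from one encoding to another goes through str (Unicode) in the middle. A small sketch, where `transcode` is a hypothetical helper name and not something from the original article:

```python
def transcode(data, src, dst):
    """Re-encode bytes from src to dst, passing through str (Unicode)."""
    text = data.decode(src)   # bytes in the source encoding -> str
    return text.encode(dst)   # str -> bytes in the target encoding

gbk_bytes = '哈哈'.encode('gbk')
utf8_bytes = transcode(gbk_bytes, 'gbk', 'utf-8')

print(gbk_bytes)
print(utf8_bytes)
```

There is no direct GBK-to-UTF-8 step here: the str object is the only common ground between the two byte representations.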

Next, the test code:

# encode: Unicode (str) --> another encoding (bytes)
# decode: another encoding (bytes) --> Unicode (str)

# Unicode is the intermediate form: every Python 3 str is a sequence of Unicode code points

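# The string '哈哈' as Unicode text
str_obj = '哈哈'
uni_obj = u'哈哈'  # the u prefix is accepted in Python 3.3+ but changes nothing
# Unicode code points: \u54c8\u54c8
utf8_obj = uni_obj.encode('UTF-8')
# UTF-8 bytes: b'\xe5\x93\x88\xe5\x93\x88'
gbk_obj  = uni_obj.encode('gbk')
# GBK bytes: b'\xb9\xfe\xb9\xfe'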

# Python 2.x only: the unicode type does not exist in Python 3
'''
if isinstance(str_obj, unicode):
    print('str_obj is a unicode string')
if isinstance(uni_obj, unicode):
    print('uni_obj is a unicode string')
'''


print("type of '哈哈': " + str(type(str_obj)))
print("type of u'哈哈': " + str(type(uni_obj)))
print("type after encoding to utf8: " + str(type(utf8_obj)))
print("type after encoding to gbk:  " + str(type(gbk_obj)))
print()



# This prints 哈哈: uni_obj is already a str (Unicode text), so it can be concatenated directly
print('print unicode as str:'+uni_obj)
#print('print unicode of uni_obj:'+bytes('\xe5\x93\x88\xe5\x93\x88'))
print()

print()
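# This prints b'\xe5\x93\x88\xe5\x93\x88': str() on a bytes object returns its repr, it does not decode
print('print utf8 as bytes:'+str(utf8_obj))  # str() does not turn bytes into text
print('print utf8 to str(decoded by utf8):'+str(utf8_obj.decode('utf-8')))
# Decoding UTF-8 bytes as GBK does not raise here, but the result is garbled text, not 哈哈
print('print utf8 to str(decoded by gbk):'+str(utf8_obj.decode('gbk')))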

print()
print('print gbk as bytes:'+str(gbk_obj))
# Decoding the GBK bytes as UTF-8 raises UnicodeDecodeError
#print('print gbk to str(decoded by utf8):'+str(gbk_obj.decode('utf-8')))
print('print gbk to str(decoded by utf8):'+'Error! The GBK bytes are not a valid UTF-8 sequence (0xb9 cannot start a UTF-8 character)')
print('print gbk to str(decoded by gbk):'+str(gbk_obj.decode('gbk')))
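The failing case that is commented out above can also be handled explicitly instead of skipped. The following is only a sketch of two common options, catching `UnicodeDecodeError` or passing `errors='replace'` to `decode`; the variable names are illustrative.

```python
gbk_bytes = '哈哈'.encode('gbk')

# Option 1: catch the exception raised by a mismatched codec
try:
    gbk_bytes.decode('utf-8')
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print('decode failed:', exc.reason)

# Option 2: errors='replace' substitutes U+FFFD for undecodable bytes
# instead of raising, at the cost of losing the original text
lossy = gbk_bytes.decode('utf-8', errors='replace')
print(repr(lossy))
```

Neither option recovers the text; only decoding with the codec that produced the bytes (here GBK) does.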



Reposted from blog.csdn.net/DSbatigol/article/details/12656097