Byte strings and text strings in python3

Python strings have long been a major source of trouble for me, and I believe even the experts have felt the dread of being overwhelmed by all kinds of encoding errors. But that's okay: I believe that after reading this article, Python strings will suddenly become clear to you!
Code link: https://github.com/princewen/professional-python3
First, string types
python3:
The Python language has two different kinds of strings: one for storing text and one for storing raw bytes.
Text strings are stored internally as Unicode, while byte strings store raw bytes and are displayed as ASCII.

In python3, the text string type is named str and the byte string type is named bytes.
Normally, instantiating a string literal gives a str instance; if you want a bytes instance, add the character b before the literal.

text_str = 'The quick brown fox jumped over the lazy dogs'
print (type(text_str)) #output : <class 'str'>


byte_str = b'The quick brown fox jumped over the lazy dogs'
print (type(byte_str)) #output : <class 'bytes'>
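
The difference between the two types becomes visible as soon as non-ASCII characters are involved. The following is a minimal sketch (the variable names are only illustrative) showing that a str counts characters while a bytes counts raw bytes, and that indexing a bytes yields an integer rather than a one-character string:

```python
# A text string holds Unicode characters; a byte string holds raw bytes.
text_str = '\u03b1 is for alpha'       # starts with the Greek letter alpha
byte_str = text_str.encode('utf-8')    # alpha takes two bytes in UTF-8

print(len(text_str))  # 14 characters
print(len(byte_str))  # 15 bytes

# Indexing a str gives a one-character str; indexing bytes gives an int.
print(type(text_str[0]))  # <class 'str'>
print(type(byte_str[0]))  # <class 'int'>

# Non-ASCII bytes are shown as \x escapes, ASCII bytes as plain characters.
print(byte_str)  # b'\xce\xb1 is for alpha'
```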

python2:

python2 also has two kinds of strings. However, python3's str class is named unicode in python2, and python3's bytes class is named str in python2.
This means that the str class is a text string in python3 but a byte string in python2.
If a string is instantiated without a prefix, a str instance is returned (here a byte string!); if you want a text string, you must put the character u before the string.


byte_str = 'The quick brown fox jumped over the lazy dogs'
#output : <type 'str'>
print type(byte_str)

text_str = u'The quick brown fox jumped over the lazy dogs'
#output : <type 'unicode'>
print type(text_str)

Second, string conversion
python3:

str and bytes can be converted into each other. The str class has an encode method that converts it to bytes using a particular encoding. Similarly, the bytes class has a decode method that takes an encoding as its argument (defaulting to UTF-8 in python3) and returns a str. Another thing to note is that python3 never tries to convert between str and bytes implicitly; you must call str.encode or bytes.decode explicitly.

#output :  b'The quick brown fox jumped over the lazy dogs'
print (text_str.encode('utf-8'))

#output : The quick brown fox jumped over the lazy dogs
print (byte_str.decode('utf-8'))

#output : False
print ('foo' == b'foo')

#Output : KeyError: b'foo'
d={'foo':'bar'}
print (d[b'foo'])

#Output : TypeError: Can't convert 'bytes' object to str implicitly (the exact wording varies between python3 versions)
print ('foo'+b'bar')

#Output : TypeError: %b requires bytes, or an object that implements __bytes__, not 'str'
print (b'foo %s' % 'bar')

#output : bar b'foo'
print ('bar %s' % b'foo')
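
Because the conversion is always explicit, the encoding used for decode has to match the one used for encode. The sketch below is only an illustration (the sample word and variable names are my own choice): it shows the same text producing different bytes under two encodings, and what happens when the bytes are decoded with the wrong one.

```python
text_str = 'caf\u00e9'  # "cafe" with an accented e

# The same text produces different bytes under different encodings.
utf8_bytes = text_str.encode('utf-8')      # b'caf\xc3\xa9'
latin1_bytes = text_str.encode('latin-1')  # b'caf\xe9'

# Decoding with the encoding that produced the bytes restores the text.
print(utf8_bytes.decode('utf-8') == text_str)      # True
print(latin1_bytes.decode('latin-1') == text_str)  # True

# Decoding with the wrong encoding raises an error here.
failed = False
try:
    latin1_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    failed = True
    print('decode failed:', exc.reason)

# The optional errors argument substitutes U+FFFD instead of raising.
replaced = latin1_bytes.decode('utf-8', errors='replace')
print(ascii(replaced))  # 'caf\ufffd'
```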

python2:

Unlike python3, python2 attempts implicit conversion between text strings and byte strings.
The mechanism is this: when the interpreter encounters an operation that mixes the two string types, it first converts the byte string into a text string and then performs the operation on the text strings.
The interpreter decodes the byte string implicitly using the default encoding, which in python2 is almost always ASCII.
We can check the default encoding with the sys.getdefaultencoding method.

#output :  foobar
print 'foo'+u'bar'

import sys
#output : ascii
print sys.getdefaultencoding()

#output : True
print 'foo'==u'foo'

#Output : bar
d = {u'foo':'bar'}
print d['foo']
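
To make the hidden step visible, here is a rough sketch written in python3 syntax that imitates what python2 does when the two types are mixed. The function name py2_style_concat is hypothetical, and the 'ascii' default simply mirrors the value reported by sys.getdefaultencoding; this is an imitation of the behaviour, not the interpreter's actual implementation.

```python
def py2_style_concat(byte_str, text_str, default_encoding='ascii'):
    """Imitate python2's 'bytes + text': decode the bytes first, then join."""
    return byte_str.decode(default_encoding) + text_str


# Pure ASCII bytes decode without trouble, so the mix appears to "just work".
print(py2_style_concat(b'foo', 'bar'))  # foobar

# Non-ASCII bytes cannot be decoded with the ASCII default, so the same
# operation fails as soon as the byte string contains them.
implicit_decode_failed = False
try:
    py2_style_concat('\u03b1'.encode('utf-8'), 'bar')
except UnicodeDecodeError as exc:
    implicit_decode_failed = True
    print('implicit decode failed:', exc.reason)
```

This is why such code in python2 often works during testing with English data and only breaks later on real input.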

In python2, the encode method can be called on either type of string to get a byte string, and the decode method can be called on either type to get a text string.
In actual use this easily confuses people and leads to disaster. Consider the following example.
After the first encode, the string has already been converted to a UTF-8 byte string. Because encode is then called again, an implicit decoding step happens first to turn the byte string back into a text string.
That implicit conversion uses the default encoding, i.e. the one returned by sys.getdefaultencoding(), which is ASCII here, so the statement below is equivalent to:

text_str.encode('utf-8').decode('ascii').encode('utf-8')
text_str = u'\u03b1 is for alpha'

# Traceback (most recent call last):
#   File "/Users/shixiaowen/python3/python高级编程/字符串与unicode/python2字符串.py", line 48, in <module>
#     print text_str.encode('utf-8').encode('utf-8')
# UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)

print text_str.encode('utf-8').encode('utf-8')
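
The same failure can be reproduced in python3 by writing the hidden decode step out by hand. This is only a sketch under that assumption (python3 itself never inserts the step, and its bytes type has no encode method at all):

```python
text_str = '\u03b1 is for alpha'
utf8_bytes = text_str.encode('utf-8')

# python3 bytes objects have no encode method, so the mistake cannot be
# written by accident.
print(hasattr(utf8_bytes, 'encode'))  # False

# Writing out the decode that python2 performs silently shows the error.
error = None
try:
    utf8_bytes.decode('ascii').encode('utf-8')
except UnicodeDecodeError as exc:
    error = exc

# The codec, the offending byte and its position match the traceback above.
print(error.encoding, hex(error.object[error.start]), error.start)  # ascii 0xce 0
```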

Third, reading files
python3:

Files always store bytes. Therefore, to work with text data read from a file, the bytes must first be decoded into a text string.
In python3, text is normally decoded for you automatically, so opening and reading a file gives a text string.
The encoding used depends on the system. On mac os and most linux systems the preferred encoding is utf-8, but this is not necessarily the case on windows.
You can use the locale.getpreferredencoding() method to get the system's default encoding.

# <class 'str'>
# Python中有两种不同的字符串数据,文本字符串与字节字符串,两种字符串之间可以互相转换
# 本章将会学到文本字符串和字节字符串的区别,以及这两类字符串在python2和python3中的区别。
with open('字符串与unicode','r') as f:
    text_str=f.read()
    print (type(text_str))
    print (text_str)

import locale
#output : UTF-8
print (locale.getpreferredencoding())

You can also declare the file's encoding explicitly by passing the encoding keyword argument to open when reading the file.

# <class 'str'>
# Python中有两种不同的字符串数据,文本字符串与字节字符串,两种字符串之间可以互相转换
# 本章将会学到文本字符串和字节字符串的区别,以及这两类字符串在python2和python3中的区别。
with open('字符串与unicode','r',encoding='utf-8') as f:
    text_str = f.read()
    print(type(text_str))
    print(text_str)
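
To see why the explicit encoding matters, the following sketch writes a small temporary file as UTF-8 and reads it back twice, once with the matching encoding and once with a different one. The temporary file, the sample text and the choice of latin-1 as the "wrong" encoding are assumptions made only for illustration.

```python
import os
import tempfile

content = '\u4e2d\u6587 text'  # two Chinese characters followed by ASCII

# Write the text to a temporary file with an explicit encoding.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with open(path, 'w', encoding='utf-8') as f:
    f.write(content)

# Reading with the same encoding restores the original text.
with open(path, 'r', encoding='utf-8') as f:
    same = f.read()

# Reading with a different encoding succeeds here but yields garbled text.
with open(path, 'r', encoding='latin-1') as f:
    garbled = f.read()

os.remove(path)

print(same == content)             # True
print(garbled == content)          # False
print(len(content), len(garbled))  # 7 11
```

Note that the mismatched read does not raise an error in this case, which is what makes relying on the system default risky when files move between machines.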

"""
If you want to read the file as a byte string, use the following approach
"""

# <class 'bytes'>
# b'Python\xe4\xb8\xad\xe6\x9c\x89\xe4\xb8\xa4\xe7\xa7\x8d\xe4\xb8\x8d\xe5\x90\x8......
with open('字符串与unicode','rb') as f:
    text_str=f.read()
    print (type(text_str))
    print (text_str)
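
Reading in binary mode simply skips the decoding step, so the result can still be turned into text by calling decode yourself. A minimal sketch, again using a temporary file and sample text that are only illustrative:

```python
import os
import tempfile

content = 'Python \u4e2d'  # ASCII text followed by one Chinese character

# Store the UTF-8 bytes of the text in a temporary file.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb') as f:
    f.write(content.encode('utf-8'))

# 'rb' returns the raw bytes exactly as stored on disk.
with open(path, 'rb') as f:
    raw = f.read()
os.remove(path)

print(type(raw))  # <class 'bytes'>
print(raw)        # b'Python \xe4\xb8\xad'

# Decoding by hand gives the same text that text mode would have returned.
decoded = raw.decode('utf-8')
print(decoded == content)  # True
```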

python2:

In python2, no matter how the file is opened, the read method always returns a byte string.

# <type 'str'>
# Python中有两种不同的字符串数据,文本字符串与字节字符串,两种字符串之间可以互相转换
# 本章将会学到文本字符串和字节字符串的区别,以及这两类字符串在python2和python3中的区别。
with open('字符串与unicode','r') as f:
    text_str=f.read()
    print (type(text_str))
    print text_str
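
If a text string is wanted in python2, the bytes returned by read have to be decoded explicitly. One option is io.open from the standard library, which accepts an encoding argument in python2 as well. The sketch below is written so that it should run on both versions; the temporary file and the variable names are only illustrative.

```python
import io
import os
import tempfile

content = u'\u4e2d\u6587 text'

# Create a small UTF-8 file to read back.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(content)

# Option 1: read the raw bytes and decode them explicitly.
with io.open(path, 'rb') as f:
    decoded = f.read().decode('utf-8')

# Option 2: let io.open do the decoding while reading.
with io.open(path, 'r', encoding='utf-8') as f:
    text_str = f.read()

os.remove(path)

print(decoded == content)   # True
print(text_str == content)  # True
```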

 

Origin blog.csdn.net/u012501054/article/details/91543357