Python Notes 3: Strings and Encodings

Unicode was created to solve the problem of incompatible character encodings. It unifies all languages into a single encoding, which eliminates garbled text. However, Unicode characters are usually stored in two bytes, so if your text is mostly English, storing it in Unicode takes twice as much space as ASCII, which also makes transmitting it uneconomical.
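To make the space argument concrete, here is a minimal sketch comparing the size of the same English text in ASCII and in a two-byte Unicode encoding. UTF-16 little-endian ('utf-16-le', which avoids a byte-order mark) stands in for "two-byte Unicode" here; that choice is an assumption of the example, and the sample text is arbitrary.

```python
# Compare storage for plain English text in ASCII vs. a two-byte Unicode
# encoding (UTF-16 little-endian, chosen to avoid a byte-order mark).
text = 'Hello, world'

ascii_bytes = text.encode('ascii')
utf16_bytes = text.encode('utf-16-le')

print(len(ascii_bytes))  # 12
print(len(utf16_bytes))  # 24: every ASCII character now takes two bytes
```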

This is where UTF-8, a "variable-length" encoding, comes in. UTF-8 encodes a Unicode character into 1 to 4 bytes depending on the size of its code point (the original design allowed up to 6 bytes, but the current standard limits it to 4).
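A quick way to see the variable length is to encode a few characters whose code points grow in size. The sample characters are arbitrary picks for illustration.

```python
# Show how many bytes UTF-8 uses for characters with increasingly
# large code points.
samples = ['A', 'é', '中', '😀']

for ch in samples:
    encoded = ch.encode('utf-8')
    print(f'{ch} U+{ord(ch):04X} -> {len(encoded)} byte(s)')

# Collect the sizes: 1, 2, 3 and 4 bytes respectively
sizes = [len(ch.encode('utf-8')) for ch in samples]
```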

Several functions related to strings
Python strings support multiple languages:
>>> print('str containing 中文')
str containing 中文

single character encoding
ord() function: get the integer (Unicode code point) of a character
chr() function: convert an integer code point to the corresponding character

>>> ord('A')
65
>>> ord('中')
20013
>>> chr(66)
'B'
>>> chr(25991)
'文'
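Since ord() and chr() are inverses, a string can be turned into its list of code points and rebuilt from them. A short sketch with an arbitrary sample string:

```python
# Convert a string to its code points with ord(), then rebuild it with chr().
text = '中文 ABC'
codes = [ord(ch) for ch in text]
print(codes)  # [20013, 25991, 32, 65, 66, 67]

restored = ''.join(chr(code) for code in codes)
print(restored == text)  # True
```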

If you know the integer code of a character, you can also write the str with hexadecimal escapes, like this:
>>> '\u4e2d\u6587'
'中文'
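Going the other way, the escape for each character can be built from ord(). The helper to_escapes below is hypothetical, written only for these notes; it switches to the eight-digit \U form for characters beyond U+FFFF, which do not fit in four hex digits.

```python
def to_escapes(text):
    """Return the Python escape sequence for every character in text (hypothetical helper)."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            # Characters outside the Basic Multilingual Plane need \U plus 8 hex digits
            parts.append(f'\\U{code:08x}')
        else:
            parts.append(f'\\u{code:04x}')
    return ''.join(parts)

print(to_escapes('中文'))         # \u4e2d\u6587
print('\u4e2d\u6587' == '中文')   # True
```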

Python's string type is str, which is represented in Unicode in memory, where one character may correspond to several bytes. If you want to transmit a string over the network or save it to disk, you need to convert the str into bytes.
Python writes data of type bytes as a single- or double-quoted literal with a b prefix:
x = b'abc'

Each character of a bytes object occupies exactly one byte.
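A short sketch of how bytes differ from str in practice: each element is a single byte, and indexing returns it as an integer from 0 to 255 rather than as a character.

```python
x = b'ABC'
print(type(x))   # <class 'bytes'>
print(x[0])      # 65: indexing bytes gives an int, not a one-character string
print(x == 'ABC')                   # False: bytes never compare equal to str
print(x == 'ABC'.encode('ascii'))   # True: same bytes after encoding
```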

A str can be encoded into bytes of a specified encoding with the encode() method, for example:
>>> 'ABC'.encode('ascii')
b'ABC'
>>> '中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'
>>> '中文'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
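When a character cannot be represented in the target encoding, encode() raises UnicodeEncodeError. A minimal sketch of catching it, and of the built-in errors argument ('strict' is the default; 'replace' substitutes a placeholder instead of raising):

```python
text = '中文'
failed_part = None

try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    # exc.start and exc.end delimit the characters that could not be encoded
    failed_part = text[exc.start:exc.end]
    print('cannot encode', repr(failed_part), '-', exc.reason)

# The errors argument picks a fallback instead of raising an exception
replaced = text.encode('ascii', errors='replace')
print(replaced)  # b'??'
```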

To convert bytes to str, you need to use the decode() method:
>>> b'ABC'.decode('ascii')
'ABC'
>>> b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
'中文'
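Encoding and decoding with the same codec round-trips a string, while decoding bytes that were cut in the middle of a multi-byte character fails. A sketch with an arbitrary sample string:

```python
original = '中文 ABC'
data = original.encode('utf-8')

# Decoding with the same codec restores the original str
restored = data.decode('utf-8')
print(restored == original)  # True

# Slicing the bytes in the middle of '中' leaves an incomplete UTF-8 sequence
decode_failed = False
try:
    data[:2].decode('utf-8')
except UnicodeDecodeError as exc:
    decode_failed = True
    print('decode failed:', exc.reason)
```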

len() function: counts the characters of a str, or the bytes of a bytes object
>>> len('ABC')
3

>>> len(b'ABC')
3
>>> len(b'\xe4\xb8\xad\xe6\x96\x87')
6
>>> len('中文'.encode('utf-8'))
6
(A Chinese character usually takes up three bytes in UTF-8.)
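Because len() counts characters for str but bytes for bytes, a size limit given in bytes cannot be enforced by slicing the str. The helper truncate_utf8 below is hypothetical, a sketch assuming the goal is to keep at most max_bytes of UTF-8 without splitting a character:

```python
def truncate_utf8(text, max_bytes):
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes."""
    data = text.encode('utf-8')[:max_bytes]
    # The cut may leave a partial multi-byte sequence at the end;
    # errors='ignore' drops those incomplete trailing bytes.
    return data.decode('utf-8', errors='ignore')

s = '中文ABC'
print(len(s), len(s.encode('utf-8')))  # 5 9
print(truncate_utf8(s, 4))             # 中 (文 would need bytes 4 through 6)
print(truncate_utf8(s, 7))             # 中文A
```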

format
>>> 'hi, %s, you have $%d.' % ('donglei', 100000)
'hi, donglei, you have $100000.'

%d integer, %f floating-point number, %s string, %x hexadecimal integer
If you are not sure which one to use, %s always works, because it converts any data type to a string:
>>> 'Age: %s. Gender: %s' % (25, True)
'Age: 25. Gender: True'
To include a literal % in the string, escape it as %%:
>>> 'growth rate: %d %%' % 7
'growth rate: 7 %'
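The placeholders listed above, applied side by side. The width and precision modifiers (%05d, %.2f) are standard printf-style options that the list above does not mention; they are included only as an illustration.

```python
values = (42, 3.14, 'hi', 255)
print('%d | %f | %s | %x' % values)  # 42 | 3.140000 | hi | ff

# Optional width and precision modifiers
print('%05d' % 42)       # 00042: zero-padded to width 5
print('%.2f' % 3.14159)  # 3.14: two digits after the decimal point
```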
