Cornerstone Qinnengbuzhuo labyrinth tour - the seventh day (Python character encoding and three kinds of strings)

I. Character encoding

1. The three core pieces of computer hardware

    CPU: processes the data and presents the results to the user
    Memory: temporary data storage; contents are lost on power-off
    Hard drive: permanent data storage; contents survive power-off

2. What is character encoding?

    Humans recognize high-level characters, while a computer recognizes only 0s and 1s. For information to pass between people and machines, some medium has to convert between the two kinds of symbols (that is, define a correspondence between them).

    The structure that holds this correspondence is called an encoding table.
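
As a small illustration of such a correspondence, Python's built-in ord and chr convert between a character and its number in the table, and format writes that number as the 0s and 1s the machine stores. The helper name to_bits is made up for this sketch.

```python
def to_bits(ch):
    """Return the code point of one character as an 8-digit binary string."""
    return format(ord(ch), "08b")

code = ord("A")      # character -> number in the table
char = chr(65)       # number -> character
bits = to_bits("A")  # the same number written as binary digits

print(code, char, bits)
```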

History of encoding tables

1. ASCII: maps letters, digits, and symbols to the computer's 0/1 codes
Question: how many binary bits does it take to identify all 128 characters?
Binary: 7 bits are enough (2^7 = 128), and each ASCII character is stored in 1 byte = 8 binary bits, whose largest value is 1111 1111 => 255
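
The arithmetic above can be checked directly. This is only a sketch; the variable names are chosen for illustration.

```python
# The largest value that fits in 8 binary bits (1 byte).
max_byte = int("11111111", 2)

# ASCII defines 128 characters, numbered 0..127, so 7 bits are enough.
ascii_count = 2 ** 7

# Every ASCII character encodes to exactly one byte whose value is below 128.
all_ascii = "".join(chr(i) for i in range(128))
encoded = all_ascii.encode("ascii")
fits_in_7_bits = all(b < 128 for b in encoded)

print(max_byte, ascii_count, len(encoded), fits_in_7_bits)
```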

2. China: maps Chinese characters to the computer's 0/1 codes: GB2312 => GBK (commonly used) => GB18030
Japan: Shift_JIS
Korea: EUC-KR

Garbled text (mojibake): happens when the encoding table used to store the data is not the same as the one used to read it
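
A minimal sketch of how garbled text arises, assuming the data was stored as UTF-8 and is then read back as GBK. The sample text is arbitrary.

```python
text = "你好"

# Store the text using one encoding table (UTF-8)...
stored = text.encode("utf-8")

# ...then read it back using a different table (GBK): the result is garbled.
garbled = stored.decode("gbk", errors="replace")

# Reading with the same table that was used for storing recovers the text.
restored = stored.decode("utf-8")

print(repr(garbled), restored)
```

The fix is always the same: read the data with the table it was written with.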

    How software opens a file and reads its data:

  1. Start the software
  2. The software issues a command to open the file, and the file is opened
  3. The data that was read is rendered to the user

    How the Python interpreter opens a .py file:

  1. Start the software (the Python interpreter)
  2. The interpreter issues a command to open the file, and the file is opened
  3. The contents of the file are interpreted line by line (if the encoding used to read the file differs from the one it was saved with, the contents cannot be interpreted and the program crashes), and the results of execution are presented to the user

    Why Python can fail to interpret a file:
    Python 2 uses ASCII by default to interpret the file contents; Python 3 uses UTF-8 by default
    File header: # encoding: GBK
    Purpose: tells the Python interpreter which encoding to use when interpreting the file contents
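
The effect of the header can be sketched without writing a real file. Here the file contents are a hypothetical two-line script held in memory as GBK bytes; tokenize.detect_encoding from the standard library reads the header the same way the interpreter does.

```python
import io
import tokenize

# Hypothetical source file contents, saved as bytes with the GBK table.
source_text = "# encoding: GBK\nmessage = '你好'\n"
source_bytes = source_text.encode("gbk")

# Reading those bytes with the Python 3 default (UTF-8) fails.
try:
    source_bytes.decode("utf-8")
    readable_as_utf8 = True
except UnicodeDecodeError:
    readable_as_utf8 = False

# The header tells the reader which table to use instead.
declared, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
namespace = {}
exec(source_bytes.decode(declared), namespace)

print(readable_as_utf8, declared, namespace["message"])
```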

A universal ("all-nations") encoding table was created that maps the characters of every country to the computer's 0/1 codes

Encoding table: the Unicode table
    Python 2: ASCII by default; it does not use the universal encoding because that default dates from before Unicode was widely adopted
    Python 3: UTF-8 by default; it uses the universal encoding to interpret text
Question: what is the relationship between Unicode and UTF-8?
    Unicode (in its fixed-width 2-byte form, e.g. UCS-2): a Chinese character takes 2 bytes and a letter also takes 2 bytes; it occupies more space, but fixed-width characters are efficient to read
    UTF-8: 1 to 4 bytes per character; a letter takes 1 byte and a common Chinese character takes 3 bytes; it occupies less space for mostly-English data, but variable-width characters are less efficient to read
    Summary: data in memory is handled as Unicode, while data on disk and in transmission is stored as UTF-8. Unicode and UTF-8 share the same encoding table; UTF-8 is one way of representing Unicode that stores characters with a variable length.
    Advantage of variable-length storage: much real-world data is in English, so UTF-8 takes less space and is faster to transfer.
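
The size difference can be measured with a short sketch. It assumes UTF-16 (little-endian, no byte-order mark) as a stand-in for the fixed 2-byte form described above; the sample strings are arbitrary.

```python
samples = ["a", "中"]

for ch in samples:
    utf8_len = len(ch.encode("utf-8"))       # variable: 1 byte for 'a', 3 for '中'
    utf16_len = len(ch.encode("utf-16-le"))  # 2 bytes for both of these characters
    print(ch, hex(ord(ch)), utf8_len, utf16_len)

# For mostly-English data, UTF-8 needs half the space of the 2-byte form.
english = "hello world" * 100
utf8_size = len(english.encode("utf-8"))
utf16_size = len(english.encode("utf-16-le"))
print(utf8_size, utf16_size)
```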

II. Three kinds of strings

1. Examples of the three kinds of strings

(1) Unicode string, the default string type
s1 = u'abc Hello\nbad'
(2) Byte string
s2 = b'abc123\xb7\xb7'
(3) Raw string: nothing inside the string is processed (e.g. \n is not converted to a newline)
s3 = r'abc Hello\nbad'
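
A quick way to see the difference between the three kinds is to check their types and lengths. The literals below are shortened, made-up variants of the examples above.

```python
s1 = u'abc\ndef'     # text string; the u prefix is optional in Python 3
s2 = b'abc\xb7\xb7'  # byte string; each \xb7 is one raw byte
s3 = r'abc\ndef'     # raw string; \n stays as a backslash followed by n

print(type(s1).__name__, type(s2).__name__, type(s3).__name__)
print(len(s1), len(s2), len(s3))
```

The raw string is one character longer than the ordinary one because its backslash and n are kept as two separate characters.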

2. Encoding and Decoding

# Encode: str -> bytes
s = '123呵呵'
b = bytes(s, encoding='utf-8')
print(b)  # b'123\xe5\x91\xb5\xe5\x91\xb5'

# Decode: bytes -> str
b = b'123\xe5\x91\xb5\xe5\x91\xb5'
n_s = str(b, encoding='utf-8')
print(n_s)  # 123呵呵

or

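# Encode a u string into a b string: original text to binary
u''.encode()
print(u'你好'.encode('utf-8'))

# Decode a b string into a u string: binary back to original text
b''.decode()
print(b'\xe4\xbd\xa0\xe5\xa5\xbd'.decode('utf-8'))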
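
Putting the two directions together gives a round trip. The sketch below also shows what happens when the bytes are decoded with a table other than the one used to encode them; the choice of GBK for the mismatch is just an example.

```python
text = "123呵呵"

data = text.encode("utf-8")  # str -> bytes
back = data.decode("utf-8")  # bytes -> str, same table, so the text survives

# Bytes written with GBK are not valid UTF-8, so decoding them as UTF-8 fails.
gbk_data = text.encode("gbk")
try:
    gbk_data.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(data, back, decode_failed)
```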


Origin blog.csdn.net/weixin_43860025/article/details/88830881