6. Character encoding

Encoding:
ascii:
Early computers used ASCII encoding, which covers the English letters and special characters. Each character is stored in 8 bits (one byte); 8 bits = 1 byte (an 8-bit binary value
such as 01010100 holds eight binary digits)
e.g.: H ...... 01001000 ... one byte, which has 2 ** 8 = 256 possibilities
Unicode: the universal code. Advantage: it can represent all current languages, which solves the problem that early ASCII could only handle English. In its 32-bit form each character takes 4 bytes, giving 2 ** 32 possibilities. Cons: it takes up space and
consumes a lot of memory

UTF-8: a compressed form of Unicode that uses 8 bits as its smallest unit, which saves memory space (a Chinese character takes 3 bytes)
e.g.: H ..... - - - - - - - - - - 01001000, the blank area in front (the leading empty bytes) can be dropped

gbk: supports Chinese and English (a Chinese character takes two bytes, an English character one byte)
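The byte counts in the summary above can be checked directly. This is a minimal sketch using Python's built-in codecs; "utf-32-be" is used here as an assumed stand-in for the fixed 32-bit form of Unicode mentioned in the notes.

```python
# Compare how many bytes each encoding needs for the same character.
codecs_to_try = ("utf-8", "gbk", "utf-32-be")

sizes_by_char = {}
for ch in ("H", "中"):
    sizes_by_char[ch] = {name: len(ch.encode(name)) for name in codecs_to_try}
    print(ch, sizes_by_char[ch])

# H  {'utf-8': 1, 'gbk': 1, 'utf-32-be': 4}
# 中 {'utf-8': 3, 'gbk': 2, 'utf-32-be': 4}
```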

" ' "
1. The three core pieces of hardware needed to run a program
    CPU
    memory
    hard disk

To run any program, it must first be loaded from the hard disk into memory; the CPU then fetches instructions from memory and executes them.
Any data a running application produces is created in memory first.

2. Steps the Python interpreter takes to run a py file (xxx.py)
    1. The Python interpreter's own code is read from the hard disk into memory
    2. The interpreter reads xxx.py into memory as an ordinary text file
    3. The interpreter parses the file contents as Python syntax and performs the corresponding operations
    ps: the first two steps are the same for an ordinary text editor and the Python interpreter

. "" "
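Steps 2 and 3 above can be imitated from inside Python. This is only an illustrative sketch, not how the interpreter is implemented internally: the file name xxx.py comes from the notes, while the file contents and the temporary directory are made up for the demo.

```python
import os
import tempfile

# Hypothetical script contents, written to a temporary file for the demo.
source_text = "result = 1 + 2\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xxx.py")
    with open(path, "w", encoding="utf-8") as f:
        f.write(source_text)

    # Step 2: read the file into memory as ordinary text (nothing runs yet).
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

# Step 3: only now is the text parsed as Python syntax and executed.
namespace = {}
exec(compile(text, "xxx.py", "exec"), namespace)
print(namespace["result"])  # 3
```

Up to the end of step 2 the file is just characters; a text editor would stop there and display them.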
# character encoding

" ""
Character encoding is about text.
Do video files, audio files and other non-text files need to take it into account? No.
Character encoding is only relevant to text files.


A text editor's input and output are two separate processes

When operating a computer, people type characters that humans can read,
but the computer only recognizes binary data such as 010101
characters typed in >>> (character code table) >>> binary numbers


A character code table is a mapping between characters and numbers
with 1 bit:
    a 0
    b 1
with 2 bits:
    a 00
    b 01
    c 11
    d 10
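A code table like this is just a lookup in both directions. The sketch below models the 2-bit example as a dictionary; the names code_table, toy_encode and toy_decode are invented for illustration.

```python
# A toy 2-bit character code table, modeled on the 2-bit example in the notes.
code_table = {"a": "00", "b": "01", "c": "11", "d": "10"}
reverse_table = {bits: ch for ch, bits in code_table.items()}


def toy_encode(text):
    """Characters -> bit string, looked up in the table."""
    return "".join(code_table[ch] for ch in text)


def toy_decode(bits):
    """Bit string -> characters, reading 2 bits at a time."""
    return "".join(reverse_table[bits[i:i + 2]] for i in range(0, len(bits), 2))


encoded = toy_encode("abcd")
print(encoded)              # 00011110
print(toy_decode(encoded))  # abcd
```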

ASCII code table
uses 8 bits of binary to represent one English character; all English characters + symbols add up to 128 at most
    0000 0000
    1111 1111
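Python exposes this table through ord() and chr(). A small sketch that prints the 8-bit pattern for a few characters and shows that a Chinese character has no place in the ASCII table:

```python
# ord() gives a character's number in the table; chr() goes the other way.
for ch in "Hi!":
    print(ch, ord(ch), format(ord(ch), "08b"))

print(chr(72))  # H

# ASCII has no entry for Chinese characters, so encoding fails.
try:
    "中".encode("ascii")
    ascii_can_store_chinese = True
except UnicodeEncodeError:
    ascii_can_store_chinese = False
print(ascii_can_store_chinese)  # False
```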


GBK
uses 2 Bytes to represent a Chinese character and 1 Byte to represent an English character
    0000 0000 0000 0000
    1111 1111 1111 1111    2 Bytes can represent up to 65,536 (2 ** 16) characters

Following the derivation above, any country that wants computers to support its own language must create its own mapping between characters and numbers
    Japan: Shift_JIS
    Korea: EUC-KR
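Python ships codecs for these national tables, so the idea can be tried out. In this sketch the sample characters are arbitrary choices; the codec names "gbk", "shift_jis" and "euc_kr" are Python's standard encoding names.

```python
# One sample character per national encoding (sample characters are arbitrary).
samples = {"gbk": "中", "shift_jis": "あ", "euc_kr": "한"}

round_trips = {}
for codec, ch in samples.items():
    raw = ch.encode(codec)
    round_trips[codec] = raw.decode(codec) == ch
    print(codec, ch, raw, len(raw))

# A table built for one language has no slot for another language's characters.
try:
    "한".encode("shift_jis")
    korean_fits_in_shift_jis = True
except UnicodeEncodeError:
    korean_fits_in_shift_jis = False
print(korean_fits_in_shift_jis)  # False
```

Each table works for its own language, but text cannot be exchanged between them, which is the problem Unicode addresses.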



unicode (universal code)
uniformly uses 2 Bytes to represent every character (the original 16-bit design)
    a 0000 0000 0110 0001

Drawbacks:
1. wastes storage space
2. more I/O operations, which lowers program efficiency (fatal)


When Unicode-format data in memory is saved to the hard disk, it is encoded as utf-8
(Unicode Transformation Format)

utf-8 shrinks English characters from the original 2 Bytes to 1 Byte
utf-8 expands Chinese characters from the original 2 Bytes to 3 Bytes
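To see where the 1 Byte and 3 Bytes come from, here is a hand-written sketch of the UTF-8 bit layout for code points up to U+FFFF. It is a simplification for illustration (no 4-byte form, no surrogate checks); the function name utf8_encode_code_point is invented, and real code should just call str.encode("utf-8").

```python
def utf8_encode_code_point(cp):
    """Hand-rolled UTF-8 for code points up to U+FFFF (1 to 3 bytes)."""
    if cp < 0x80:
        # 0xxxxxxx: plain ASCII, one byte
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx: two bytes
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx: three bytes (covers common Chinese characters)
    return bytes([
        0xE0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


for ch in ("a", "中"):
    manual = utf8_encode_code_point(ord(ch))
    print(ch, manual, len(manual), manual == ch.encode("utf-8"))
```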




On today's computers
memory uses unicode
the hard disk uses utf-8



(need to know)
two features of unicode
1. when the user types, any character can be entered, because unicode is compatible with the characters of every country
2. when data in another country's encoding is read from the hard disk into memory, it can be converted, because unicode has a mapping to each of those encodings
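Feature 2 is what makes conversion between encodings possible: Unicode sits in the middle. A minimal sketch, assuming the bytes originally came from a GBK-encoded file:

```python
# Pretend these bytes were read from a GBK-encoded file on disk.
gbk_bytes = "中文".encode("gbk")

text = gbk_bytes.decode("gbk")      # GBK bytes -> Unicode str in memory
utf8_bytes = text.encode("utf-8")   # Unicode str -> UTF-8 bytes

print(len(gbk_bytes), len(utf8_bytes))  # 4 6
print(text)                             # 中文
```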

(*****)
saving data from memory to the hard disk
    1. unicode-format binary data in memory >>> encode >>> utf-8 format binary data

reading data from the hard disk into memory
    1. utf-8 format binary data on the hard disk >>> decode >>> unicode-format binary data in memory
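Both directions can be observed with an ordinary file. In this sketch the file name demo.txt and the sample text are made up; open() does the encode step on write and the decode step on read.

```python
import os
import tempfile

text = "hello 上"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # memory -> hard disk: the str is encoded to UTF-8 bytes on write
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # what is actually stored on disk
    with open(path, "rb") as f:
        raw = f.read()

    # hard disk -> memory: the bytes are decoded back into a str on read
    with open(path, "r", encoding="utf-8") as f:
        loaded = f.read()

print(raw)     # b'hello \xe4\xb8\x8a'
print(loaded)  # hello 上
```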






(******)
to make sure text is never garbled
decode a text file with the same encoding it was encoded with
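A short sketch of what goes wrong when the rule is broken: the same bytes are decoded once with the wrong table and once with the right one. errors="replace" is passed only so the wrong decode cannot raise an exception in the demo.

```python
text = "你好"
raw = text.encode("utf-8")                      # saved as UTF-8

garbled = raw.decode("gbk", errors="replace")   # read back with the wrong table
correct = raw.decode("utf-8")                   # read back with the same table

print(garbled)          # unreadable characters, not the original text
print(correct)          # 你好
print(garbled == text)  # False
```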





python2
    reads the py file into the interpreter as a text file using ASCII by default (because Unicode was not yet prevalent when the python2 interpreter was developed)
python3
    reads the py file into the interpreter as a text file using utf-8 by default

file header
# coding: utf-8
1. because every encoding supports English characters, the file header itself can always be read correctly and take effect
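The standard library can report which encoding a source file declares. This sketch uses tokenize.detect_encoding on in-memory bytes; the helper name declared_encoding and the sample sources are invented for the demo.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python 3 would use to read this source file."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


with_header = b"# coding: gbk\nx = 1\n"
without_header = b"x = 1\n"

print(declared_encoding(with_header))     # gbk
print(declared_encoding(without_header))  # utf-8 (the Python 3 default)
```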

In software developed on the python2 interpreter, any string containing Chinese needs a u in front of it
This is because of how python2 stores strings (when no file header is specified, string data is stored as ASCII by default; if a file header is specified, the data is stored in the encoding named in the header)





in python3, strings are unicode by default
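In Python 3 this is easy to confirm: a str counts characters, and bytes only appear after an explicit encode. The u prefix is still accepted but changes nothing.

```python
s = "中"
print(type(s), len(s), hex(ord(s)))  # <class 'str'> 1 0x4e2d

b = s.encode("utf-8")
print(type(b), len(b))               # <class 'bytes'> 3

# The python2-style u prefix is allowed in Python 3 and gives the same str.
same = (u"中" == "中")
print(same)                          # True
```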





Additional notes:
1. the PyCharm terminal uses utf-8
2. the Windows terminal uses gbk (on Chinese-language systems)
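Rather than relying on memory, the encodings in use can be queried at runtime. The values printed depend on the machine and terminal, so no expected output is shown.

```python
import locale
import sys

# Encoding used when printing to the current terminal (may be None if
# stdout has been replaced by an object without an encoding).
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Encoding the operating system prefers for text files.
preferred = locale.getpreferredencoding(False)

print("stdout:", stdout_encoding)
print("preferred:", preferred)
```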

Garbled text: characters do not display properly because the encoding used to read them differs from the one used to save them

an 8-digit binary number is also called 8 bits (******)
8 bits = 1 Byte
1024 Bytes = 1KB
1024KB = 1MB
1024MB = 1GB
1024GB = 1TB
1024TB = 1PB
....
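The unit ladder above translates into a small conversion helper. The function name human_size and the one-decimal formatting are choices made for this sketch.

```python
UNITS = ["Bytes", "KB", "MB", "GB", "TB", "PB"]


def human_size(num_bytes):
    """Convert a byte count to the largest unit that keeps the value below 1024."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


print(human_size(500))        # 500.0 Bytes
print(human_size(1536))       # 1.5 KB
print(human_size(1024 ** 3))  # 1.0 GB
```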

x = '上'
res1 = x.encode('utf-8')  # encode the unicode string into utf-8 binary data for storage and transmission
print(res1)  # b'\xe4\xb8\x8a'
# the bytes type: you can think of it as a string of binary data
res2 = res1.decode('utf-8')  # decode utf-8 binary data (e.g. from the hard disk) back into a unicode string
print(res2)  # 上
























































Origin www.cnblogs.com/wukai66/p/11140118.html