Encoding format
The earliest encoding: ASCII, which contains only letters, digits, and special characters
0000 0001
0000 0101 ...
The leftmost bit is always 0 (reserved)
8 bits == 1 byte
With 7 usable bits, it can represent only 2 ^ 7 = 128 different characters
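A minimal Python sketch of the ASCII points above, assuming Python 3 and its built-in `ascii` codec. The variable names are illustrative and not part of the original notes.

```python
# Every ASCII character has a code point below 128, so it fits in 7 bits
# and the leftmost bit of its byte stays 0.
code = ord("A")                    # numeric code of the letter
bits = format(code, "08b")         # the same value written as 8 bits
ascii_bytes = "A".encode("ascii")  # one byte per ASCII character

# A character outside the 128 ASCII values cannot be encoded as ASCII.
try:
    "中".encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(code, bits, len(ascii_bytes), encodable)
```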
GBK (GB): contains ASCII + Chinese
A letter: 1 byte 0000 0001
A Chinese character: 2 bytes 0000 0001 0100 0001
2 ^ 16 = 65536
Two bytes can represent at most 65,536 different values
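The byte counts for GBK can be checked with a short sketch, assuming Python 3's built-in `gbk` codec. The sample characters are chosen only for illustration.

```python
# GBK keeps ASCII letters at 1 byte and uses 2 bytes for a Chinese character.
letter_bytes = "a".encode("gbk")
chinese_bytes = "中".encode("gbk")

print(len(letter_bytes))   # bytes used by an English letter
print(len(chinese_bytes))  # bytes used by a Chinese character
```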
Unicode (universal code): records all the world's scripts in one code book
At first: one character was represented by 2 bytes
A letter: 0000 0001 0000 0011
A Chinese character: 0000 0001 0000 0000
Because: that was not enough (Japanese and Kanji: 90,000+, 120,000+ characters)
Later: one character was represented by 4 bytes
A letter: 0000 0001 0000 0011 0000 0001 0000 0011
A Chinese character: 0000 0001 0000 0011 0000 0001 0000 0011
⬆ Wastes space and resources
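The fixed-width idea can be sketched in Python 3. This assumes the built-in `utf-16-be` and `utf-32-be` codecs as stand-ins for the 2-byte and 4-byte forms described above (the notes do not name a specific codec); the big-endian variants are used so that no byte-order mark is added.

```python
# Compare the 2-byte and 4-byte fixed-width forms for a letter and a
# Chinese character.
sizes = {}
for ch in ("a", "中"):
    two_byte_form = ch.encode("utf-16-be")
    four_byte_form = ch.encode("utf-32-be")
    sizes[ch] = (len(two_byte_form), len(four_byte_form))

# In the 4-byte form, a plain letter carries three zero bytes of padding.
padded_letter = "a".encode("utf-32-be")

print(sizes)
print(padded_letter)
```

The padding bytes in `padded_letter` are the wasted space the notes point at.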
UTF-8: uses a minimum of 8 bits (1 byte) to represent a character.
English: 1 byte 0000 0011
European: 2 bytes 0000 0011 0000 0011
Chinese: 3 bytes 0000 0011 0000 0011 0000 0011
e.g. '中国12ab': GBK: 8 bytes
'中国12ab': UTF-8: 10 bytes
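The example above can be reproduced with a small sketch, assuming Python 3's built-in `gbk` and `utf-8` codecs. The character `é` is only an illustrative pick for a European letter and is not from the original notes.

```python
text = "中国12ab"

# GBK: 2 Chinese characters * 2 bytes + 4 ASCII characters * 1 byte
gbk_len = len(text.encode("gbk"))

# UTF-8: 2 Chinese characters * 3 bytes + 4 ASCII characters * 1 byte
utf8_len = len(text.encode("utf-8"))

# UTF-8 is variable width: bytes needed per character.
per_char = {ch: len(ch.encode("utf-8")) for ch in "aé中"}

print(gbk_len, utf8_len, per_char)
```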
8 bits = 1 byte
1024 bytes = 1 KB
1024 KB = 1 MB
1024 MB = 1 GB
1024 GB = 1 TB
MB to bits:
7.6 MB ---> 7.6 * 1024 * 1024 * 8 = 63753420.8 (bits)
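The same conversion as a small Python helper. The function name `mb_to_bits` is hypothetical, and it assumes the 1024-based units listed above.

```python
def mb_to_bits(megabytes: float) -> float:
    """Convert megabytes to bits using 1 MB = 1024 KB, 1 KB = 1024 bytes."""
    kilobytes = megabytes * 1024
    byte_count = kilobytes * 1024
    return byte_count * 8  # 8 bits per byte


bits_in_7_6_mb = mb_to_bits(7.6)
print(bits_in_7_6_mb)
```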