Getting to know encoding formats

Encoding format

  1. ASCII: the earliest encoding. It contains only English letters, digits, and special characters.

    0000 0001

    0000 0101 ...

    The leftmost bit is always 0; it is reserved.

    8 bits == 1 byte

    With 7 usable bits, ASCII can represent only 2 ^ 7 = 128 different characters.

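  2. GBK (national standard, GB): contains ASCII + Chinese

    A letter: 1 byte 0000 0001

    A Chinese character: 2 bytes 0000 0001 0100 0001

    2 ^ 16 = 65536

    Two bytes can represent at most 65,536 characters (an upper bound; GBK does not use every 2-byte combination).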

  3. Unicode (universal code): one code table meant to record all the text in the world

    At first: every character was represented by 2 bytes

    A letter: 0000 0001 0000 0011

    A Chinese character: 0000 0001 0000 0000

    Problem: 2 bytes (65,536 values) are not enough. Japanese and Chinese characters (kanji) alone are counted at 90,000+, 120,000+.

    Later: every character was represented by 4 bytes

    A letter: 0000 0001 0000 0011 0000 0001 0000 0011

    A Chinese character: 0000 0001 0000 0011 0000 0001 0000 0011

    ⬆ This wastes space and resources, since even a plain letter takes 4 bytes.

  4. UTF-8: represents a character with a minimum of 8 bits (1 byte), using more bytes only when needed.

    English: 1 byte 0000 0011

    European scripts: 2 bytes 0000 0011 0000 0011

    Chinese: 3 bytes 0000 0011 0000 0011 0000 0011
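
The byte widths listed above can be checked with a short sketch. Python is an assumption here, since these notes do not name a language. The helper names `byte_length` and `bit_pattern` are hypothetical, and mapping the "2-byte" and "4-byte" Unicode forms to the `utf-16-be` and `utf-32-be` codecs is my reading rather than something the notes state. The bit patterns written in the list are placeholders; the sketch prints the real ones.

```python
# Illustrative sketch: how many bytes one character takes in each encoding.
samples = ["a", "é", "中"]
encodings = ["ascii", "gbk", "utf-16-be", "utf-32-be", "utf-8"]


def byte_length(ch, encoding):
    """Return the encoded size in bytes, or None if the character cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


def bit_pattern(ch, encoding):
    """Return the encoded bytes as space-separated groups of 8 bits."""
    return " ".join(f"{b:08b}" for b in ch.encode(encoding))


# Map each sample character to its size under every encoding.
table = {ch: {enc: byte_length(ch, enc) for enc in encodings} for ch in samples}

for ch, sizes in table.items():
    print(ch, sizes)

print(bit_pattern("a", "ascii"))   # 01100001 (leftmost bit is 0)
print(bit_pattern("中", "utf-8"))  # 11100100 10111000 10101101 (3 bytes)
```

A `None` in the table means the encoding has no code for that character, for example "中" in ASCII.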

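e.g.  '中国12ab' : GBK   : 8 bytes  (2 Chinese characters * 2 bytes + 4 ASCII characters * 1 byte)

      '中国12ab' : UTF-8 : 10 bytes (2 Chinese characters * 3 bytes + 4 ASCII characters * 1 byte)
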
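
The two byte counts in this example can be reproduced with a few lines of Python (again an assumption about language; the variable names are only for illustration). `str.encode` returns the raw bytes, so `len` on the result gives the size in bytes.

```python
text = "中国12ab"

gbk_bytes = text.encode("gbk")     # 2 bytes per Chinese character, 1 per ASCII character
utf8_bytes = text.encode("utf-8")  # 3 bytes per Chinese character, 1 per ASCII character

print(len(gbk_bytes))   # 8
print(len(utf8_bytes))  # 10

# Decoding with the same encoding that produced the bytes restores the text.
assert gbk_bytes.decode("gbk") == text
assert utf8_bytes.decode("utf-8") == text
```

Decoding GBK bytes as UTF-8 (or the reverse) either raises `UnicodeDecodeError` or yields garbled text, which is why the encoding has to be known on both sides.
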
8 bits = 1 byte

1024 bytes = 1 KB

1024 KB = 1 MB

1024 MB = 1 GB

1024 GB = 1 TB

Converting MB to bits:

7.6 MB ---> 7.6 * 1024 * 1024 * 8 = 63753420.8 (bits)
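
The same conversion as a small Python sketch. The function name `megabytes_to_bits` and the constants are hypothetical, and it assumes the 1024-based units used in the table above.

```python
BITS_PER_BYTE = 8
STEP = 1024  # each unit in the table above is 1024 of the next smaller one


def megabytes_to_bits(megabytes):
    """Convert MB to bits: MB -> KB -> bytes -> bits."""
    return megabytes * STEP * STEP * BITS_PER_BYTE


bits = megabytes_to_bits(7.6)
print(bits)  # 63753420.8
```

The fractional result only reflects that 7.6 MB is not a whole number of bytes; a real file always occupies a whole number of bits.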

Origin www.cnblogs.com/pandaa/p/12025071.html