Character encoding and decoding

All files are stored as bytes.

Characters (char) - (encoding) -> bytes (byte)

Bytes (byte) - (decoding) -> characters (char)

Bytes: what machines work with.

Characters: what people work with.

The root cause of garbled text: the encoding used to write the bytes and the encoding used to decode them are not the same.
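
A small Python sketch of how this mismatch produces garbled text. The sample string and the UTF-8/GBK pairing are assumptions chosen for illustration; `errors="replace"` is used only so that the wrong decode cannot raise an exception.

```python
original = "中文"
stored = original.encode("utf-8")                # bytes written as UTF-8

wrong = stored.decode("gbk", errors="replace")   # decoded with a different encoding
right = stored.decode("utf-8")                   # decoded with the same encoding

print(wrong)  # garbled: not the text that was written
print(right)  # the original text
```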

 

ASCII (American Standard Code for Information Interchange)
    uses 7 bits to represent a character, so it covers 2^7 = 128 characters.
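
As a rough Python check of the 7-bit limit (the sample characters are assumptions for illustration): a plain English letter fits in ASCII, while an accented letter does not.

```python
ascii_count = 2 ** 7            # number of values that fit in 7 bits
code_a = ord("A")               # code of "A", well below 128
fits = "A".encode("ascii")      # encodes to a single byte

try:
    "é".encode("ascii")         # "é" is outside the 128 ASCII characters
    outside_ascii_rejected = False
except UnicodeEncodeError:
    outside_ascii_rejected = True

print(ascii_count, code_a, fits, outside_ascii_rejected)
```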

ISO-8859-1 (Latin-1, the Western European encoding standard; a superset of ASCII)
    uses 8 bits (one byte) to represent a character, so it covers 2^8 = 256 characters.
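
A short Python sketch of these two properties, using the standard `iso-8859-1` codec (the sample characters are assumptions for illustration): ASCII characters keep the same byte value, and every one of the 256 byte values maps to exactly one character.

```python
# ASCII characters are encoded identically in ISO-8859-1
same_as_ascii = "A".encode("iso-8859-1") == "A".encode("ascii")

# A Western European character that ASCII lacks fits in one byte
e_acute = "é".encode("iso-8859-1")

# All 256 possible byte values decode to one character each
all_chars = bytes(range(256)).decode("iso-8859-1")

print(same_as_ascii, e_acute, len(all_chars))
```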

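Chinese: GB2312 < GBK < GB18030 (each one is a superset of the previous one)
    GB2312 and GBK use 16 bits (two bytes) to represent a Chinese character, which allows at most 2^16 = 65536 values; ASCII characters still take one byte. GB18030 additionally uses four-byte sequences for characters outside GBK.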
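
The byte counts can be checked with Python's built-in `gb2312`, `gbk`, and `gb18030` codecs. This is only a sketch; the sample characters are assumptions for illustration.

```python
# An ASCII character stays one byte
ascii_len = len("A".encode("gbk"))

# A common Chinese character takes two bytes in all three encodings
hanzi_lens = {c: len("中".encode(c)) for c in ("gb2312", "gbk", "gb18030")}

# A character outside GBK uses the four-byte form of GB18030
emoji_len = len("😀".encode("gb18030"))

print(ascii_len, hanzi_lens, emoji_len)
```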

Unicode: a character set that contains characters from around the world.
    Its original encoding uses a fixed 16 bits (two bytes) per character. An English character that used to be encoded in a single byte becomes two bytes; the high byte is simply filled with zeros.
    Advantage: a single scheme can represent the characters of all languages. (Characters beyond the first 65536 code points need two 16-bit units in UTF-16.)
    Disadvantage: it takes more memory. (A character that originally needed one byte of storage now needs two.)
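
The zero-filled high byte is easy to see in Python with the big-endian UTF-16 codec. This is a sketch under the assumption that `utf-16-be` stands in for the two-byte encoding described here; the sample characters are chosen for illustration.

```python
english = "A".encode("utf-16-be")    # high byte 0x00, low byte 0x41
chinese = "中".encode("utf-16-be")   # code point U+4E2D, two bytes

# Storage doubles for English text compared with a one-byte encoding
one_byte = len("hello".encode("ascii"))
two_byte = len("hello".encode("utf-16-be"))

print(english, chinese, one_byte, two_byte)
```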

UTF-8 (8-bit Unicode Transformation Format) is a variable-length encoding for Unicode; it uses 1 to 4 bytes to encode each character.
    Feature: variable byte length.
    In general, a Chinese character takes three bytes and an English character takes one byte.
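
The variable length can be observed directly in Python. This is a minimal sketch; the four sample characters are assumptions picked to cover each length.

```python
samples = ["A", "é", "中", "😀"]

# Number of UTF-8 bytes needed for each character
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in utf8_lengths.items():
    print(ch, n)
```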

Origin www.cnblogs.com/chen--biao/p/11329763.html