All files are stored as bytes.
Byte (byte) --(decoding)--> character (char)
Byte (byte) <--(encoding)-- character (char)
Bytes: what the machine works with.
Characters: what people work with.
The root cause of garbled text: the encoding used to write the bytes and the encoding used to decode them are not the same.
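Garbled text can be reproduced on purpose by decoding with a different charset than the one used for encoding. This is only a sketch: the sample string, UTF-8 as the writer's encoding, and ISO-8859-1 as the reader's encoding are assumptions chosen because ISO-8859-1 accepts any byte value and therefore never raises an error.

```python
text = "中文"
data = text.encode("utf-8")            # written as UTF-8: 6 bytes

# Read back with the wrong charset: every byte becomes one unrelated character.
garbled = data.decode("iso-8859-1")
print(len(text), len(garbled))         # 2 6

# Because ISO-8859-1 maps each byte to exactly one character, the damage
# can be undone here by reversing the wrong step and decoding correctly.
recovered = garbled.encode("iso-8859-1").decode("utf-8")
print(recovered == text)               # True
```

With other wrong charsets the decode step may raise an error or lose bytes, so this recovery trick is not general.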
ASCII (American Standard Code for Information Interchange).
Uses 7 bits to represent a character: 2^7 = 128 characters.
ISO-8859-1 (the Western European encoding standard, a superset of ASCII).
Uses 8 bits (one byte) to represent a character: 2^8 = 256 characters.
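The relationship between the two single-byte encodings can be checked with a short sketch. The sample characters are assumptions: "A" lies in the 7-bit range, while "é" needs the eighth bit.

```python
print(2 ** 7, 2 ** 8)  # 128 256

# A 7-bit character has the same byte value in both encodings.
ascii_bytes = "A".encode("ascii")
latin1_bytes = "A".encode("iso-8859-1")
print(ascii_bytes == latin1_bytes)     # True

# "é" fits in one byte in ISO-8859-1 (value 0xE9, above 127) ...
e_acute = "é".encode("iso-8859-1")
print(list(e_acute))                   # [233]

# ... but ASCII has no code for it, so encoding fails.
try:
    "é".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)                        # False
```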
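Chinese: GB2312 < GBK < GB18030 (each is a superset of the one before).
A Chinese character is represented with 16 bits (two bytes), which allows at most 2^16 = 65536 code values. GB18030 additionally uses four-byte sequences for characters outside the two-byte range.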
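As a rough check of the byte counts, the sketch below encodes a few sample characters with Python's built-in `gb2312`, `gbk`, and `gb18030` codecs. The sample characters are assumptions picked for illustration; the emoji is included only to show a character that lies outside the two-byte range of GB18030.

```python
ch = "中"
sizes = {name: len(ch.encode(name)) for name in ("gb2312", "gbk", "gb18030")}
print(sizes)        # {'gb2312': 2, 'gbk': 2, 'gb18030': 2}

# ASCII characters stay one byte in this family of encodings.
ascii_size = len("A".encode("gbk"))
print(ascii_size)   # 1

# A character outside the two-byte range takes four bytes in GB18030.
emoji_size = len("😀".encode("gb18030"))
print(emoji_size)   # 4
```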
Unicode: contains the characters of writing systems around the world.
In its 16-bit form (UCS-2/UTF-16), each character is represented with 16 bits (two bytes). English characters that were originally encoded in a single byte become two bytes, with the high byte filled with zeros.
Advantage: one encoding can represent all characters.
Disadvantage: it takes more storage. (A character that originally needed only one byte now needs two.)
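The zero high byte is easy to see by encoding an English letter with a 16-bit encoding. This sketch assumes Python's `utf-16-be` codec (big-endian, no byte order mark) as the concrete 16-bit form; the sample characters are also assumptions.

```python
# English letter: one byte in ASCII, two bytes in the 16-bit form.
one_byte = "A".encode("ascii")
two_bytes = "A".encode("utf-16-be")
print(list(one_byte))    # [65]
print(list(two_bytes))   # [0, 65]  (high byte is all zeros)

# A Chinese character uses both bytes.
zhong = "中".encode("utf-16-be")
print(zhong.hex())       # 4e2d

# Storage cost for plain English text doubles.
english = "hello"
print(len(english.encode("ascii")), len(english.encode("utf-16-be")))  # 5 10
```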
UTF-8 (8-bit Unicode Transformation Format) is a variable-length encoding for Unicode characters; it uses 1 to 4 bytes to encode each character.
Feature: the number of bytes per character varies.
In general, a Chinese character takes three bytes and an English character takes one byte.