Basic knowledge collection (5): Character encoding

One byte = 8 binary digits (bits): 1 byte = 8 bits

1 ASCII code (American Standard Code for Information Interchange). It defines 128 characters, and each character occupies one byte.
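As a small illustration of the one-byte claim, the following Python sketch looks up the ASCII code of the letter A and prints it as 8 bits. The variable names are only illustrative.

```python
# Look up the ASCII code of a character and show it as one byte (8 bits).
code = ord("A")                    # numeric code of the character
bits = format(code, "08b")         # zero-padded 8-bit binary string
ascii_bytes = "A".encode("ascii")  # the bytes actually stored

print(code, bits, len(ascii_bytes))  # 65 01000001 1
```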

2 Unicode encoding. ASCII is enough for English text, but it cannot represent Chinese characters, which need at least 2 bytes each. China therefore defined the GB2312 code, and Japan, South Korea, and others defined their own code sets. Unicode was created to unify these into one international standard. It usually represents a character in 2 bytes (rare characters need 4).
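A minimal Python sketch of why ASCII is not enough, using the Chinese character 中 (U+4E2D) as the sample. The variable names are only illustrative, and the `gb2312` codec is the one shipped with Python's standard library.

```python
# A Chinese character has a Unicode code point but no ASCII code.
ch = "\u4e2d"          # the character 中
code_point = ord(ch)   # its Unicode code point

try:
    ch.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

gb_bytes = ch.encode("gb2312")  # national code set: 2 bytes

print(hex(code_point), fits_in_ascii, len(gb_bytes))  # 0x4e2d False 2
```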

3 UTF-8 encoding (variable-length encoding). If a document contains only English characters, storing every character in 2 bytes is wasteful. UTF-8 encodes each Unicode character in a different number of bytes depending on the size of its code value: 1 byte for English characters, usually 3 bytes for Chinese characters, and 4 bytes for rare characters. (The original design allowed up to 6 bytes; the current standard, RFC 3629, limits UTF-8 to 4 bytes.)
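The variable length can be checked directly. This sketch counts the UTF-8 bytes for four sample characters chosen here for illustration: A, é (U+00E9), 中 (U+4E2D), and an emoji (U+1F600).

```python
# Count how many bytes UTF-8 uses for characters of increasing code value.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```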

character  ASCII code  Unicode code         UTF-8
A          01000001    00000000 01000001    01000001
中         (none)      01001110 00101101    11100100 10111000 10101101
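The bit patterns in the table can be reproduced with a short Python sketch. It assumes the 2-byte Unicode column corresponds to big-endian UTF-16 (`utf-16-be`), and takes the second row to be the Chinese character 中 (U+4E2D). The helper `to_bits` is a hypothetical name introduced here.

```python
def to_bits(data: bytes) -> str:
    """Render each byte as 8 binary digits, separated by spaces."""
    return " ".join(format(b, "08b") for b in data)

for ch in ["A", "\u4e2d"]:
    unicode_bits = to_bits(ch.encode("utf-16-be"))  # 2-byte form
    utf8_bits = to_bits(ch.encode("utf-8"))         # variable-length form
    print(ch, "|", unicode_bits, "|", utf8_bits)
```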


In memory, text is generally handled as Unicode.
If you have a document A saved in UTF-8,
then when the computer reads A, it first decodes the UTF-8 bytes into Unicode and loads the result into memory.
When saving, the reverse happens: the Unicode text in memory is encoded back to UTF-8.
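The read and save steps can be sketched in Python, where `str` holds Unicode text in memory and the `encoding` argument of `open` does the conversion. The file name `a.txt` and the sample text are made up for this example.

```python
import os
import tempfile

text = "A\u4e2d"  # Unicode text in memory: "A" followed by 中

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")

    # Saving: Unicode text is encoded to UTF-8 bytes on disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # The raw bytes on disk: 1 byte for "A", 3 bytes for 中.
    with open(path, "rb") as f:
        raw = f.read()

    # Reading: UTF-8 bytes are decoded back to Unicode text.
    with open(path, "r", encoding="utf-8") as f:
        loaded = f.read()

print(len(raw), loaded == text)  # 4 True
```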
