History of Coding Development

Before we talk about bytes and str, we need to talk about how character encoding has evolved.

In the early days of computing, English-speaking countries, led by the United States, dominated the computer industry, and English text needs only the 26 letters of the alphabet plus digits, punctuation, and a few control codes. The earliest widely adopted character encoding standard was therefore ASCII. ASCII is a 7-bit code that defines 128 characters; in practice each character is stored in one 8-bit byte, which is enough to cover the needs of English text.

What is an encoding? An encoding is a rule for representing characters in binary. Everything, whether English, Chinese, or symbols, is ultimately stored on disk as a sequence like 01010101. Inside a computer, reading and storing data boils down to a bit stream of 0s and 1s. Humans cannot read these bit streams, so how do we make them readable? This is the job of a character encoding. It acts as a translator inside the computer that transparently turns the bit stream into text humans can understand. Ordinary users do not need to know how this process works, but programmers must understand it clearly.

Take ASCII as an example. Each character is stored in 1 byte (8 bits), so the data is as wide as "00000000" per character and can be interpreted one byte at a time. For example, 01000001 represents the capital letter A, and for convenience we often write the decimal value 65 instead. Eight bits can represent 2 to the 8th power (256) distinct values; ASCII itself defines only the first 128 of them (0 to 127).
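The mapping between the letter A, the number 65, and the bit pattern 01000001 can be checked directly. The following is a minimal sketch in Python (the language is an assumption here, chosen because bytes and str are Python types); the variable names are illustrative only.

```python
# Look at the ASCII code of a character and its 8-bit pattern.
letter = "A"

code = ord(letter)             # integer code of the character
bits = format(code, "08b")     # the same value written as 8 binary digits
raw = letter.encode("ascii")   # the single byte that would be stored

print(code)        # 65
print(bits)        # 01000001
print(list(raw))   # [65]

# Going the other way: from the bit pattern back to the character.
print(chr(0b01000001))   # A

# One byte can hold 2**8 distinct values; ASCII uses 0 to 127.
print(2 ** 8)      # 256
```

Characters outside the ASCII range cannot be encoded this way: calling `encode("ascii")` on such a character raises `UnicodeEncodeError`.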

Later, as computers spread around the world, Chinese, Japanese, Korean, and other scripts also had to be represented. The 128 characters of ASCII were far from enough, so standards bodies developed a universal character set called Unicode, which assigns a unique number (a code point) to every character regardless of language. Early Unicode encodings such as UCS-2 stored every character in 2 bytes, and its successor UTF-16 uses 2 bytes for most characters, including English letters and common Chinese characters, and 4 bytes for the rest. Although this met everyone's requirements, it is not byte-compatible with ASCII, and it takes up more disk space and memory: much of the text in the computer world consists of English letters, which fit in 1 byte but now take 2.
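To see why a 2-byte Unicode encoding is wasteful for English text, one can inspect code points and encoded sizes. This is a rough sketch that assumes Python and uses its built-in `utf-16-be` codec as a stand-in for a 2-byte encoding; `utf16_size` is a hypothetical helper name.

```python
def utf16_size(ch: str) -> int:
    """Number of bytes ch occupies in UTF-16 (big-endian, no byte order mark)."""
    return len(ch.encode("utf-16-be"))

for ch in ["A", "中"]:
    # Print the Unicode code point and the encoded size of each character.
    print(f"U+{ord(ch):04X}", utf16_size(ch))
# U+0041 2
# U+4E2D 2

# The letter A now needs a leading zero byte, so the result is
# no longer the single byte that ASCII would store.
print("A".encode("utf-16-be"))   # b'\x00A'
```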

So the UTF-8 encoding came into being. It is a variable-length encoding of Unicode that uses 1 to 4 bytes per character: English letters and other ASCII characters take 1 byte, most common Chinese characters take 3 bytes, and so on. Because its 1-byte range is identical to ASCII, UTF-8 is compatible with ASCII and can decode earlier ASCII documents. UTF-8 was soon widely adopted.
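The variable-length behavior of UTF-8 and its compatibility with ASCII can be illustrated with a short sketch. Python is assumed here, and `utf8_size` and the sample strings are hypothetical names chosen for illustration.

```python
def utf8_size(ch: str) -> int:
    """Number of bytes ch occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

english = "A"
chinese = "中"

print(utf8_size(english))   # 1
print(utf8_size(chinese))   # 3
print(chinese.encode("utf-8"))   # b'\xe4\xb8\xad'

# For ASCII characters, UTF-8 produces exactly the same byte as ASCII.
same_bytes = english.encode("utf-8") == english.encode("ascii")
print(same_bytes)   # True

# So bytes written long ago as ASCII decode correctly as UTF-8.
old_document = b"Hello"
print(old_document.decode("utf-8"))   # Hello
```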

During the development of character encoding, encodings designed specifically for Chinese were also created, such as GB2312, GBK, and Big5. They are used mainly for Chinese text and are rarely used elsewhere. In GBK, a Chinese character occupies 2 bytes.
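The size difference between GBK and UTF-8 for the same Chinese character, and what happens when the wrong encoding is used to decode, can be sketched as follows. This assumes Python and its built-in `gbk` codec; the variable names are illustrative.

```python
text = "中"

gbk_bytes = text.encode("gbk")
utf8_bytes = text.encode("utf-8")

print(len(gbk_bytes))    # 2
print(len(utf8_bytes))   # 3

# Decoding with the encoding that produced the bytes restores the text.
print(gbk_bytes.decode("gbk") == text)   # True

# Decoding GBK bytes as UTF-8 fails, because the byte patterns differ.
decode_failed = False
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)     # True
```

This is why the encoding used to write a file must be known when reading it back.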
