A little common knowledge about character sets

character set

Question 1: Garbled characters

1. Causes
Computers only work with 0s and 1s, but
human languages use a huge number of characters.

When people first used computers, programs were written directly in machine language, as strings of bits such as 00000011100001111.
Later, to make computers easier to use, mnemonics and similar conventions were introduced, so that richer information such as characters could be represented.
Each character is assigned a number, for example 'A' is decimal 65, and that number is then stored in binary.
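The character-to-number-to-binary mapping described above can be sketched in a few lines of Python. This is only an illustration; the variable names are made up for the example.

```python
# Sketch: a character is stored as a number, and the number as bits.
ch = "A"
code = ord(ch)                 # numeric code of the character
bits = format(code, "08b")     # the same number written as 8 binary digits
print(ch, code, bits)          # A 65 01000001

# The reverse direction: bits -> number -> character.
restored = chr(int(bits, 2))
print(restored)                # A
```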
The earliest code, ASCII, can represent only 128 characters.
Later, computers spread beyond the United States to other countries.
Each country or region added another 128 characters on top of ASCII, for 256 in total, but the added 128 differ from one extension to another and are not universal.
When computers reached Asia, 256 codes were far too few, so multi-byte encodings began to appear,
such as GB2312 and later GBK in mainland China, and Big5 in Taiwan and other regions.
As character encodings multiplied, exchanging documents between countries became a problem.
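A small sketch of why these regional encodings do not mix. It assumes Python's built-in codecs `latin-1`, `cp1251` and `cp437` as stand-ins for three different 256-character extensions of ASCII (the article does not name any specific ones), and uses the character 中 as an arbitrary sample for the multi-byte encodings.

```python
# Part 1: one byte above 127 means different things in different
# single-byte extensions of ASCII (the codec names are example choices).
high_byte = bytes([0xE9])
single_byte = {name: high_byte.decode(name)
               for name in ("latin-1", "cp1251", "cp437")}
print(single_byte)

# Part 2: one character gets different byte sequences in different
# multi-byte encodings.
ch = "中"
multi_byte = {name: ch.encode(name) for name in ("gb2312", "gbk", "big5")}
for name, data in multi_byte.items():
    print(name, data.hex(), len(data), "bytes")

# The original 128 ASCII characters stay the same everywhere.
ascii_bytes = {name: "A".encode(name) for name in ("latin-1", "gbk", "big5")}
print(ascii_bytes)
```

GB2312 and GBK agree on this character because GBK extends GB2312, while Big5 assigns it a different pair of bytes.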

To let computers around the world communicate without barriers, a universal code table, Unicode, was introduced.
It aims to cover the characters of all the world's writing systems, and it gives each character its own unique number (code point).

The Unicode table only says which number represents each character. The range of these numbers is very large, though, so a
character may need 1, 2, 3, or 4 bytes, and this causes a problem in network transmission.
For example, when I receive 4 bytes of data, do they represent 4 characters or 1 character?
To solve this transmission problem, encoding schemes such as UTF-8 appeared. They specify
how many bytes make up each character, so the receiver can tell where one character ends and the next begins.
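The "4 bytes: 4 characters or 1?" question can be made concrete with a rough sketch of how UTF-8 answers it: the first byte of each character announces how many bytes that character occupies. The helper names `utf8_sequence_length` and `count_characters` are invented for this example, and the code assumes well-formed input (it does not validate continuation bytes).

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return how many bytes a UTF-8 character occupies, judged by its first byte."""
    if first_byte < 0x80:            # 0xxxxxxx: single byte (ASCII)
        return 1
    if first_byte >> 5 == 0b110:     # 110xxxxx: 2-byte character
        return 2
    if first_byte >> 4 == 0b1110:    # 1110xxxx: 3-byte character
        return 3
    if first_byte >> 3 == 0b11110:   # 11110xxx: 4-byte character
        return 4
    raise ValueError("not a valid UTF-8 leading byte")


def count_characters(data: bytes) -> int:
    """Count characters by hopping from leading byte to leading byte."""
    index = 0
    count = 0
    while index < len(data):
        index += utf8_sequence_length(data[index])
        count += 1
    return count


# Characters that need 1, 2, 3 and 4 bytes respectively.
for ch in ("A", "\u00e9", "\u4e2d", "\U0001F600"):
    encoded = ch.encode("utf-8")
    print(repr(ch), encoded.hex(), len(encoded), "byte(s)")

# Two different 4-byte messages: four characters versus one character.
four_ascii = b"abcd"
one_emoji = "\U0001F600".encode("utf-8")
print(count_characters(four_ascii), count_characters(one_emoji))
```

In real code the built-in `bytes.decode("utf-8")` does this work, including validation; the manual loop is only there to show the rule.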

The same bytes decode to different characters under different encodings, and that is where garbled text comes from.
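A minimal sketch of how garbled text arises: the sample text 中文 is written as UTF-8 and then read back with the wrong encoding. The choice of `latin-1` and `gbk` as the "wrong" encodings is an assumption made for the demonstration.

```python
text = "中文"
data = text.encode("utf-8")            # bytes written by the sender

# Reading with the encoding that was used for writing gives the text back.
correct = data.decode("utf-8")

# Reading the same bytes with other encodings gives garbled characters.
as_latin1 = data.decode("latin-1")     # latin-1 maps every byte to a character
as_gbk = data.decode("gbk", errors="replace")  # undecodable bytes become U+FFFD

print(repr(correct))
print(repr(as_latin1))
print(repr(as_gbk))
```

Nothing is wrong with the bytes themselves; only the reader's assumption about the encoding differs.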

2. Solution
Use one encoding consistently: the encoding used to read a file must match the encoding used to write it.

To change the character encoding of the current source file:
Method 1: [Encoding] menu -> [Convert to ANSI] (ANSI here means the encoding used by the current operating system)

To have source files created later use ANSI encoding as well, change the default for new files:
Method: [Settings] menu -> [Preferences] -> [New] -> [ANSI] encoding
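The same conversion can also be scripted instead of done through the editor menus. The following is a hedged sketch rather than the article's method: `convert_encoding` and `system_encoding` are hypothetical helper names, and `locale.getpreferredencoding(False)` is used as an approximation of what the editor calls "ANSI" (on some systems or Python configurations it reports UTF-8 instead).

```python
import locale
from pathlib import Path


def system_encoding() -> str:
    """Approximate the operating system's default ("ANSI") encoding name."""
    return locale.getpreferredencoding(False)


def convert_encoding(path, source_encoding: str, target_encoding: str) -> None:
    """Rewrite a text file in place from one encoding to another."""
    file_path = Path(path)
    # Work on raw bytes so that line endings are preserved unchanged.
    text = file_path.read_bytes().decode(source_encoding)
    # Encode before writing: if a character does not exist in the target
    # encoding, this raises UnicodeEncodeError and the file is left untouched.
    converted = text.encode(target_encoding)
    file_path.write_bytes(converted)
```

For example, `convert_encoding("main.c", "utf-8", system_encoding())` would rewrite a hypothetical UTF-8 file `main.c` in the system's default encoding. The source encoding has to be known in advance; guessing it wrong produces garbled text.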

Reposted from: blog.csdn.net/qq_42481940/article/details/106950532