What is a character set? Unicode character set and ASCII character set

A character set is a collection of characters. There are many character sets, and each contains a different number of characters. Common ones include the ASCII character set, the GBK character set, and the Unicode character set (commonly encoded as UTF-8). Let's look at each in detail.

ASCII character set:

ASCII (American Standard Code for Information Interchange) includes digits, English letters, and symbols. An ASCII character is stored in 1 byte (8 bits), but only the low 7 bits are used, so ASCII defines 128 characters in total, which is sufficient for representing English and digits.
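As a small illustration of the one-byte-per-character rule, the sketch below encodes a short ASCII string and prints the numeric value of each byte. The class name `AsciiDemo` and the helper `toAscii` are made up for this example.

```java
import java.nio.charset.StandardCharsets;

class AsciiDemo {
    // Hypothetical helper: encodes text as ASCII, one byte per character.
    static byte[] toAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] bytes = toAscii("A1?");
        // Three characters produce three bytes, each in the range 0..127.
        for (byte b : bytes) {
            System.out.println(b); // prints 65, then 49, then 63
        }
    }
}
```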

GBK:

GBK is China's code table. It contains more than 20,000 Chinese characters and other symbols, and it is compatible with ASCII encoding. In GBK, a Chinese character is generally stored as two bytes. In UTF-8, a Chinese character is generally stored as three bytes, and UTF-8 is also compatible with the ASCII table. Developers are generally advised to use UTF-8 encoding.
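The byte counts above can be checked with a short sketch. It assumes the Java runtime ships the GBK charset (standard JDK builds do); the class name `ByteLengthDemo` and the helper `encodedLength` are invented for illustration. The Chinese character is written as a Unicode escape so the result does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Hypothetical helper: number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumes GBK is available in this JDK
        String chinese = "\u4e2d";            // the character U+4E2D ("middle")

        System.out.println(encodedLength(chinese, gbk));                    // 2
        System.out.println(encodedLength(chinese, StandardCharsets.UTF_8)); // 3
        // ASCII characters take one byte in both encodings.
        System.out.println(encodedLength("a", gbk));                        // 1
        System.out.println(encodedLength("a", StandardCharsets.UTF_8));     // 1
    }
}
```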

Unicode character set:

Unicode, also known as the universal code, is an industry standard in computer science that assigns a unique code point to the characters of most of the world's writing systems. UTF-8 is a common encoding of Unicode. The character set used for decoding must be the same as the one used for encoding; otherwise garbled characters will appear.
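A minimal sketch of that mismatch: the text is encoded as UTF-8 and then decoded twice, once with the matching charset and once with GBK. The class name `MojibakeDemo` and the helper `decode` are hypothetical, and the example assumes the GBK charset is available in the JDK. It prints comparison results rather than the characters themselves, so the output does not depend on the console's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Hypothetical helper: decodes bytes using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters (U+4E2D, U+6587)
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        String matched = decode(utf8Bytes, StandardCharsets.UTF_8);
        String mismatched = decode(utf8Bytes, Charset.forName("GBK"));

        System.out.println(matched.equals(original));    // true: same charset both ways
        System.out.println(mismatched.equals(original)); // false: the text is garbled
    }
}
```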

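For example, when Chinese characters are stored (encoded) as UTF-8 bytes but displayed (decoded) as GBK, the bytes are grouped into the wrong characters and the text appears garbled.

Note: English letters and digits are not garbled in any ASCII-compatible encoding, such as GBK or UTF-8, because these encodings all store ASCII characters as the same single byte.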
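The note can be checked with a quick sketch that encodes an ASCII-only string in one charset and decodes it in another, for several ASCII-compatible charsets. The class name `AsciiSafeDemo` and the helper `survives` are invented for this example; encodings that are not ASCII-compatible, such as UTF-16, are deliberately left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AsciiSafeDemo {
    // Hypothetical helper: true if text is unchanged after encoding with one
    // charset and decoding with another.
    static boolean survives(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith).equals(text);
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            Charset.forName("GBK") // assumes GBK is available in this JDK
        };
        // Every encode/decode pairing keeps ASCII-only text intact.
        for (Charset enc : charsets) {
            for (Charset dec : charsets) {
                System.out.println(enc + " -> " + dec + ": " + survives("Hello123", enc, dec));
            }
        }
    }
}
```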

In a program, you can use the String class to encode and decode text: the getBytes methods encode a string into bytes, and the String constructors decode bytes back into a string. The specific methods are as follows:

String encoding
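A minimal sketch of encoding with `String.getBytes()` (platform default charset) and `String.getBytes(String charsetName)`, both of which are part of the standard `String` API. The class name `EncodeDemo` and the wrapper `encode` are hypothetical, and the example assumes the JDK provides the GBK charset.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

class EncodeDemo {
    // Hypothetical wrapper around String.getBytes(String charsetName).
    static byte[] encode(String text, String charsetName) throws UnsupportedEncodingException {
        return text.getBytes(charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "a\u4e2d"; // one ASCII letter plus one Chinese character (U+4E2D)

        // No argument: uses the platform default charset, so the result varies by system.
        byte[] defaultBytes = text.getBytes();
        System.out.println(defaultBytes.length);

        // Named charset: the result is the same on every system.
        System.out.println(Arrays.toString(encode(text, "GBK")));   // [97, -42, -48]
        System.out.println(Arrays.toString(encode(text, "UTF-8"))); // [97, -28, -72, -83]
    }
}
```

Java bytes are signed, which is why values above 127 print as negative numbers.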

String decoding
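A minimal sketch of decoding with the standard constructors `String(byte[] bytes)` (platform default charset) and `String(byte[] bytes, String charsetName)`. The class name `DecodeDemo` and the wrapper `decode` are hypothetical, and the byte values are the GBK encoding of an ASCII "a" followed by the character U+4E2D.

```java
import java.io.UnsupportedEncodingException;

class DecodeDemo {
    // Hypothetical wrapper around the String(byte[], String) constructor.
    static String decode(byte[] bytes, String charsetName) throws UnsupportedEncodingException {
        return new String(bytes, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] gbkBytes = {97, (byte) 0xD6, (byte) 0xD0}; // "a" + U+4E2D encoded as GBK

        // Decoding with the charset that produced the bytes restores the text.
        String decoded = decode(gbkBytes, "GBK");
        System.out.println(decoded.equals("a\u4e2d")); // true

        // No charset argument: uses the platform default, which may or may not be GBK.
        String platformDecoded = new String(gbkBytes);
        System.out.println(platformDecoded.length());
    }
}
```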



Origin blog.csdn.net/Blue92120/article/details/132445003