A thorough look: can a char in Java represent a Chinese character?

Can a char in Java represent a Chinese character?

Yes, it can.

In C, a char occupies 1 byte, which is 8 bits on virtually all platforms, so it can hold at most 2^8 = 256 distinct values. Values 0-127 are the 128 characters of the ASCII table, which leaves only 128 more values, far too few for Chinese. A single char in C therefore cannot represent a Chinese character.

In Java, a char is a 16-bit (two-byte) unsigned value that holds one UTF-16 code unit, and the Unicode character set includes Chinese. Every Chinese character in the Basic Multilingual Plane, which covers nearly all Chinese characters in everyday use, is encoded in UTF-16 as a single two-byte code unit, so one Java char can represent it. Rare characters outside that plane are encoded as two code units (a surrogate pair) and therefore need two char values.
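As a minimal sketch of the point above, the following stores a Chinese character in a char and checks how many char values a piece of text needs. The class name ChineseCharDemo and the helper fitsInOneChar are made up for illustration, and Unicode escapes are used so the source compiles regardless of file encoding (\u4E2D is the character "zhong", meaning "middle").

```java
final class ChineseCharDemo {
    // True when the text is exactly one UTF-16 code unit, i.e. fits in a single char.
    // (Hypothetical helper, not part of any library.)
    static boolean fitsInOneChar(String text) {
        return text.length() == 1;
    }

    public static void main(String[] args) {
        char zhong = '\u4E2D';                       // a common Chinese character in the BMP
        System.out.println((int) zhong);             // 20013, the code unit as a number
        System.out.println(Character.BYTES);         // 2, the size of a char in bytes

        // U+20000 is a rare ideograph outside the BMP; it needs two char values.
        String rare = new String(Character.toChars(0x20000));
        System.out.println(fitsInOneChar("\u4E2D")); // true
        System.out.println(fitsInOneChar(rare));     // false
    }
}
```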

Are two bytes enough to represent Chinese characters?

For everyday use, yes. Fewer than 10,000 Chinese characters are in common use, and GB2312 contains only 6,763 of them, which is already nearly enough for daily needs. Two bytes provide 65,536 possible values.

What is the BMP?

BMP stands for Basic Multilingual Plane. Its code point range is 0x0000-0xFFFF, and it contains the most commonly used characters in the world.
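The BMP boundary can be checked from Java with methods on java.lang.Character. The sketch below is illustrative only; the class name BmpDemo and the helper charsNeeded are assumptions, and U+20000 is used simply as an example of a code point above 0xFFFF.

```java
final class BmpDemo {
    // Number of char values (UTF-16 code units) needed to store one code point.
    // (Hypothetical helper wrapping Character.charCount.)
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int inBmp = 0x4E2D;      // a common Chinese character
        int outsideBmp = 0x20000; // a rare ideograph above 0xFFFF

        System.out.println(Character.isBmpCodePoint(inBmp));                // true
        System.out.println(Character.isSupplementaryCodePoint(outsideBmp)); // true
        System.out.println(charsNeeded(inBmp));                             // 1
        System.out.println(charsNeeded(outsideBmp));                        // 2

        // A string holding one supplementary character has length 2 but one code point.
        String s = new String(Character.toChars(outsideBmp));
        System.out.println(s.length());                                     // 2
        System.out.println(s.codePointCount(0, s.length()));                // 1
        System.out.println(Character.isHighSurrogate(s.charAt(0)));         // true
    }
}
```

Code that has to handle rare characters correctly would iterate over code points (for example with String.codePoints()) rather than over individual char values.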

What character sets are there? Why does Java use Unicode?

Common character sets include ASCII, OEM, GB2312, GBK, GB18030, ISO-8859-1, and Unicode.

  • ASCII can represent only 128 characters.
  • OEM character sets can represent only 256 characters, and there are many incompatible vendor-specific versions.
  • ISO-8859-1, also known as Latin-1, is a single-byte character set that can represent only 256 characters. It includes Latin letters, digits, punctuation marks, and other common characters, as well as some special symbols. ISO-8859-1 is mainly used for Western European languages such as English, French, and German, and is also widely used in web pages and in text transmission such as e-mail.
  • GB2312 is the earliest Chinese character set. A Chinese character occupies two bytes, that is, 16 bits. To stay compatible with ASCII, the highest bit of each of the two bytes is set to 1 (otherwise the bytes would collide with ASCII). GB2312 contains 6,763 Chinese characters and 682 special symbols, which covers the Chinese characters most commonly used in daily life.
  • GBK is a double-byte character set compatible with GB2312 and ASCII. It can represent 20,902 Chinese characters, including traditional characters, plus 984 Chinese punctuation marks, radicals, and so on.
  • GB18030 extends GBK; the additional characters are encoded in 4 bytes. The GB18030-2005 edition contains 70,244 Chinese characters and also covers the scripts of China's ethnic minorities. GB18030 can encode every Unicode code point.
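The byte counts in the list above can be observed from Java by encoding the same character with different charsets. This is a rough sketch: the class name EncodingSizeDemo and the helper encodedLength are made up, and the GB charsets are looked up by name and skipped if the running JDK does not ship them, since only a small set of charsets is guaranteed on every Java platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSizeDemo {
    // Number of bytes the text occupies in the given charset.
    // String.getBytes replaces unmappable characters with the charset's
    // replacement byte sequence ('?' for ASCII and ISO-8859-1).
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D";                                  // a common Chinese character
        String rare = new String(Character.toChars(0x20000));     // a rare ideograph outside the BMP

        // Single-byte sets cannot hold the character; it degrades to one '?' byte.
        System.out.println(encodedLength(zhong, StandardCharsets.US_ASCII));   // 1
        System.out.println(encodedLength(zhong, StandardCharsets.ISO_8859_1)); // 1

        // The GB family, if this JDK provides it.
        for (String name : new String[] {"GB2312", "GBK", "GB18030"}) {
            if (Charset.isSupported(name)) {
                Charset cs = Charset.forName(name);
                System.out.println(name + " common: " + encodedLength(zhong, cs));
                System.out.println(name + " rare:   " + encodedLength(rare, cs));
            }
        }
    }
}
```

With GB18030 available, the common character should come out as 2 bytes and the rare one as 4 bytes, matching the description of the 4-byte extension.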

The Unicode character set aims to cover all characters used by humans. It numbers every character uniformly, assigning each one a unique code point. The Unicode code space is divided into 17 planes, each containing 2^16 = 65,536 code points. Plane 0, the BMP, covers nearly all characters in common use in the world today. The other planes hold historic scripts, rarer ideographs, and additional symbols, or are reserved for future expansion. The Unicode characters we usually deal with are generally located in the BMP. A large part of the Unicode code space is still unassigned.
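The plane arithmetic can be worked through in a few lines of Java. This is an illustrative sketch; PlaneDemo, PLANE_SIZE, planeOf, and planeCount are names invented here, while Character.MAX_CODE_POINT (0x10FFFF) and Character.isValidCodePoint are standard library members.

```java
final class PlaneDemo {
    // Width of one plane: 2^16 = 65,536 code points.
    static final int PLANE_SIZE = 0x10000;

    // Index of the plane that contains the given code point (0 is the BMP).
    static int planeOf(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + codePoint);
        }
        return codePoint / PLANE_SIZE;
    }

    // Total number of planes, derived from the largest valid code point.
    static int planeCount() {
        return (Character.MAX_CODE_POINT + 1) / PLANE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(planeCount());      // 17
        System.out.println(planeOf(0x4E2D));   // 0, a common Chinese character in the BMP
        System.out.println(planeOf(0x20000));  // 2, a rare ideograph
        System.out.println(planeOf(0x10FFFF)); // 16, the last code point
    }
}
```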

Why does Java choose the Unicode character set? Those things about character encoding
ISO-8859-1 and GBK
Understanding the ASCII, GB2312, GBK, and GB18030 encodings

How do we know that Java uses the Unicode character set?

Official Oracle article: https://www.oracle.com/technical-resources/articles/javase/supplementary.html
Stack Overflow discussion: https://stackoverflow.com/questions/2533097/java-unicode-encoding

What is the relationship between Unicode and UTF-16?

Unicode is a standard that defines a character set and assigns each character a code point. UTF-32, UTF-16, and UTF-8 are three different encoding forms that turn those code points into bytes.
What is the relationship between Unicode, UTF-32, UTF-16, and UTF-8?
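To make the distinction concrete, the sketch below encodes the same code points in all three forms and prints the byte counts. The class name UtfFormsDemo and the helper byteLength are hypothetical. UTF-32BE is looked up by name because it is not one of the constants in StandardCharsets; it is assumed to be available, as it is on common OpenJDK builds. The big-endian variants are used so that no byte-order mark is added to the output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UtfFormsDemo {
    // Bytes needed to encode a single code point in the given charset.
    static int byteLength(int codePoint, Charset charset) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        Charset utf32 = Charset.forName("UTF-32BE"); // assumption: provided by this JDK
        int[] samples = {0x41, 0x4E2D, 0x20000};     // 'A', a common Chinese character, a rare ideograph

        for (int cp : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16: %d  UTF-32: %d%n",
                    cp,
                    byteLength(cp, StandardCharsets.UTF_8),
                    byteLength(cp, StandardCharsets.UTF_16BE),
                    byteLength(cp, utf32));
        }
        // U+0041  UTF-8: 1  UTF-16: 2  UTF-32: 4
        // U+4E2D  UTF-8: 3  UTF-16: 2  UTF-32: 4
        // U+20000  UTF-8: 4  UTF-16: 4  UTF-32: 4
    }
}
```

The code point stays the same in every row; only the number of bytes used to store it changes with the encoding form.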


Origin blog.csdn.net/zhangjin1120/article/details/131588357