1. The number of bytes occupied by Chinese characters
ASCII and GBK/GB2312 encoding:
English letters (upper and lower case alike) occupy one byte
Chinese characters occupy two bytes in GBK/GB2312 (plain ASCII has no Chinese characters)
A byte is the basic unit of data in a computer, generally an 8-bit binary number. Converted to decimal, its minimum value is 0 and its maximum value is 255. For example, one ASCII character occupies one byte.
UTF-8 encoding:
English characters occupy one byte
Chinese characters (including traditional) usually occupy three bytes; rare characters outside the Basic Multilingual Plane occupy four
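Unicode encoding (UTF-16, the form used by Java's char):
English characters occupy two bytes
Chinese characters (including traditional) occupy two bytes; rare characters outside the Basic Multilingual Plane occupy four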
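The byte counts listed above can be checked with String.getBytes. The following is a minimal sketch, not a definitive reference: the class name ByteLengthDemo and the helper byteLength are made up for illustration, "\u65e9" is the escape for the character 早, and UTF_16BE is used instead of UTF_16 so that no byte order mark is added to the count. GBK is an optional charset, so the sketch checks for it before using it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Hypothetical helper: number of bytes the text occupies in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String english = "z";
        String chinese = "\u65e9"; // the character U+65E9

        System.out.println(byteLength(english, StandardCharsets.UTF_8));    // 1
        System.out.println(byteLength(chinese, StandardCharsets.UTF_8));    // 3
        System.out.println(byteLength(english, StandardCharsets.UTF_16BE)); // 2
        System.out.println(byteLength(chinese, StandardCharsets.UTF_16BE)); // 2

        // GBK is not guaranteed to exist on every Java runtime, so check first.
        if (Charset.isSupported("GBK")) {
            System.out.println(byteLength(chinese, Charset.forName("GBK"))); // 2
        }
    }
}
```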
2. Test
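public class CharTest {
    public static void main(String[] args) {
        // A char holds one 16-bit code unit, enough for a common Chinese character
        char c1 = '早';
        System.out.println(c1);
        char c2 = 'z';
        System.out.println(c2);
    }
}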
早
z
The char type can store a Chinese character because Java represents characters in Unicode: no particular byte encoding is selected, and the char holds the character's number in the character set directly. A char occupies 2 bytes (16 bits), so it can hold one Chinese character.
Assigning two or three Chinese characters to a single char variable is a compile-time error.
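A small sketch of what does and does not fit in a char follows. The class name CharCapacityDemo and the helper charsNeeded are hypothetical names chosen for this example; "\u65e9" and "\u4e0a" are the escapes for 早 and 上. It also illustrates one assumption worth stating: a character outside the Basic Multilingual Plane, such as U+20000, needs two char values and therefore does not fit in a single char either.

```java
class CharCapacityDemo {
    // Hypothetical helper: number of char values (UTF-16 code units)
    // needed to represent one Unicode code point.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        char c = '\u65e9';              // one Chinese character fits in one char
        System.out.println((int) c);    // 26089, the character's Unicode number

        // A char literal with two characters is rejected by the compiler,
        // so several characters belong in a String instead.
        String s = "\u65e9\u4e0a";
        System.out.println(s.length()); // 2

        System.out.println(charsNeeded(0x65E9));  // 1
        System.out.println(charsNeeded(0x20000)); // 2, stored as a surrogate pair
    }
}
```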
3. Encoding conversion
Because Java uses Unicode internally, a character is represented differently inside and outside the JVM. Inside the JVM, characters are always Unicode. When a character is transferred from the JVM to the outside (for example, stored in the file system), encoding conversion is required. This is why Java has both byte streams and character streams, plus conversion streams that bridge the two, such as InputStreamReader and OutputStreamWriter. These two classes are adapters between byte streams and character streams, and they carry out the encoding conversion.
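The conversion described above can be sketched with in-memory streams standing in for a file. This is only an illustration under stated assumptions: the class ConversionStreamDemo and its encode and decode helpers are hypothetical names, UTF-8 is chosen as the external encoding, and "\u65e9" is the escape for 早.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConversionStreamDemo {
    // Characters to bytes: OutputStreamWriter encodes text leaving the JVM.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and flushed) before the bytes are read out.
        return bytes.toByteArray();
    }

    // Bytes to characters: InputStreamReader decodes bytes entering the JVM.
    static String decode(byte[] data, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int ch;
            while ((ch = reader.read()) != -1) {
                text.append((char) ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] stored = encode("\u65e9", StandardCharsets.UTF_8);
        System.out.println(stored.length); // 3 bytes outside the JVM
        String restored = decode(stored, StandardCharsets.UTF_8);
        System.out.println(restored.length()); // 1 char inside the JVM
    }
}
```

Reading the bytes back with a different charset than the one used for writing is the usual cause of garbled Chinese text, which is why both helpers take the charset explicitly.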