Introduction to Java character encoding

In a computer, all text is stored using some encoding. The encodings most commonly encountered in Java development are ISO8859-1, GBK/GB2312, Unicode, and UTF.

The common encodings are as follows:

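	ISO8859-1: a single-byte encoding that can represent only 256 characters (code values 0 to 255).
	
	GBK/GB2312: Chinese national standard encodings used to represent Chinese characters; each Chinese character takes two bytes. GBK can represent both Simplified and Traditional Chinese, whereas GB2312 can represent only Simplified Chinese. GBK is compatible with GB2312.
	
	Unicode: a character set standard designed to provide one universal encoding for the characters of all languages. UTF-8 and UTF-16 are two implementations of this standard. Java uses Unicode internally: a char is a 16-bit UTF-16 code unit, so the in-memory form is not byte-compatible with ISO8859-1.
	
	UTF: UTF-8 is compatible with ASCII (the lower half of ISO8859-1) and can represent the characters of every language. It is a variable-length encoding: the original design allowed 1 to 6 bytes per character, and the current standard limits it to 1 to 4. It is widely used for web pages, including Chinese ones, and saves space for text that is mostly ASCII.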
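The size differences between these encodings can be checked directly. The following is a minimal sketch, not part of the original article: the class name EncodingSizes and the helper encodedLength are made up for illustration, and the GBK line is guarded because GBK is an extended charset that a trimmed-down runtime may not ship.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of bytes the text occupies when encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "A";
        String han = "\u4e2d"; // the Chinese character 中, written as an escape

        System.out.println("ISO-8859-1, A:  " + encodedLength(ascii, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8, A:       " + encodedLength(ascii, StandardCharsets.UTF_8));
        System.out.println("UTF-8, han:     " + encodedLength(han, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE, han:  " + encodedLength(han, StandardCharsets.UTF_16BE));

        // GBK may be absent from a reduced runtime image, so check before using it.
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK, han:       " + encodedLength(han, Charset.forName("GBK")));
        }
    }
}
```

The escape is used instead of a literal Chinese character so that the result does not depend on the encoding of the source file itself.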

If character encoding is not handled correctly, garbled text may appear in a program. For example, if the machine's default encoding is GBK but the program uses ISO8859-1, the text will come out garbled. It is like two people talking, one in Chinese and the other in English: they cannot understand each other. To avoid garbled text, the encoding used by the program should match the local default encoding; more generally, text must be read with the same encoding it was written with.
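The mismatch can be reproduced in memory without any files. This is an illustrative sketch only: the class MismatchDemo and the helper transcode are hypothetical names, and UTF-8 stands in for the "writer" encoding so that the example does not depend on GBK being installed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MismatchDemo {
    // Encode the text with one charset, then decode the bytes with another.
    static String transcode(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // the two Chinese characters 中文

        // Same charset on both sides: the text survives.
        String matched = transcode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // Different charsets: every UTF-8 byte is misread as one Latin-1 character.
        String garbled = transcode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(matched.equals(original)); // true
        System.out.println(garbled.equals(original)); // false
        System.out.println(garbled.length());         // 6 (one char per UTF-8 byte)
    }
}
```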

The local default encoding can be viewed through the System class. The System class in Java gives access to system properties, so it can be used directly to look up the default encoding. The relevant method is:

public static String getProperty(String key)

To view the JVM default encoding, use the following code:

public class Test {
    public static void main(String[] args) {
        // Get the current default encoding
        System.out.println("Default encoding: " + System.getProperty("file.encoding"));
    }
}

The result is as follows:

Default encoding: GBK

As can be seen, the default encoding on this machine is GBK.
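The same information is also available through java.nio.charset.Charset. The sketch below is an addition for illustration (the class name DefaultCharsetInfo is made up). Note that the value printed depends on the JDK version: since JDK 18 the default charset is UTF-8 regardless of the operating system, and the native.encoding property, added in JDK 17, reports the operating system's encoding instead.

```java
import java.nio.charset.Charset;

class DefaultCharsetInfo {
    // Name of the charset the JVM uses when no charset is given explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        // native.encoding exists only on JDK 17 and later; older runtimes print null.
        System.out.println("native.encoding: " + System.getProperty("native.encoding"));
    }
}
```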

The following example shows how garbled text arises. The local default encoding is GBK, and the text is converted using the ISO8859-1 encoding instead. The conversion can be done with the getBytes(String charsetName) method of the String class, which encodes the string in the named encoding. Its signature is:

public byte[] getBytes(String charsetName) throws UnsupportedEncodingException
Sample code is as follows:
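import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class Test {
    public static void main(String[] args) throws Exception {
        File f = new File("D:" + File.separator + "test.txt");
        // Encode the text as ISO8859-1, which cannot represent Chinese characters
        byte[] b = "百度搜索引擎!".getBytes("ISO8859-1");
        // Open the output stream; try-with-resources closes it automatically
        try (OutputStream out = new FileOutputStream(f)) {
            // Save the converted data
            out.write(b);
        }
    }
}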

The result is that the content saved in test.txt is garbled. ISO8859-1 has no codes for Chinese characters, so they are lost during the conversion (in practice each one is typically written as "?").

Garbled text is a common problem in Java development, and it has one root cause: the encoding used to output the content differs from the encoding used to read it.
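One way to avoid the problem is to name the encoding explicitly on both the writing and the reading side instead of relying on defaults. The following is a minimal sketch under that assumption, not the article's own code: the class ConsistentEncoding and the helper writeThenRead are hypothetical, and a temporary file replaces D:\test.txt so the example runs on any system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ConsistentEncoding {
    // Write the text to a temporary file and read it back, using the same charset both times.
    static String writeThenRead(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(file, text.getBytes(charset));
                return new String(Files.readAllBytes(file), charset);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The article's sample text 百度搜索引擎! written with Unicode escapes
        String text = "\u767e\u5ea6\u641c\u7d22\u5f15\u64ce!";

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(writeThenRead(text, StandardCharsets.UTF_8).equals(text)); // true
        // ISO-8859-1 cannot, so each Chinese character is replaced before it reaches the file.
        System.out.println(writeThenRead(text, StandardCharsets.ISO_8859_1)); // ??????!
    }
}
```

Matching encodings on both sides is necessary but not sufficient: the chosen encoding must also be able to represent the text, which is why ISO8859-1 loses the Chinese characters even when it is used for both writing and reading.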

Origin blog.csdn.net/weixin_45743799/article/details/104709175