Character encoding in IO streams

Character encoding issues come up in almost every programming language. At the lowest level a computer works only with the two digits 0 and 1, so richer information has to be expressed as combinations of bits. To display text, every character must therefore be encoded as a number. The earliest widely used encoding was ASCII, but it covers only a basic set of characters. Chinese, among other languages, needs a much larger code range, so this section looks at encoding from the point of view of handling Chinese text.

Suppose we save a program's source file in a different encoding. When we then compile and run it with the javac and java commands, errors are reported. This is an encoding problem.
While writing the Java program we paid no attention to character encoding. The Windows command line (on a Chinese-language system) works with GBK by default and does not handle other encodings. So even if the code itself is correct, the program still cannot be compiled and run, because the source file is decoded with the wrong encoding.
In day-to-day development, the following encodings are the most common:
GBK / GB2312: the Chinese national-standard (GB) encodings. GBK covers both Simplified and Traditional Chinese, while GB2312 covers Simplified Chinese only;
ISO8859-1: a single-byte encoding in international use that can describe Latin-alphabet text; ideographic text such as Chinese cannot be represented in it directly and needs transcoding;
UNICODE: encodes characters as hexadecimal code values and can describe the text of virtually every language in the world, whether it needs one byte or several. The drawback of a fixed-width form is that not every character needs such a long code: for letters of the alphabet, for example, the extra bytes only waste bandwidth;
UTF encodings: can be loosely understood as "ISO8859-1 + Unicode", combining the advantages of both. Longer codes are used only for the characters that need them, and plain letters take a single byte, as they do in ISO8859-1. This makes UTF better suited to network transmission, and the most common form is UTF-8.
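
The size difference between these encodings can be checked directly. The following is a minimal sketch, not part of the original program: the class name `EncodingLengths` and the helper `byteLength` are made up for illustration, and `UTF_16BE` is used here as an assumed stand-in for the fixed-width "UNICODE" form described above. Only charsets from `StandardCharsets` are used, so GBK is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingLengths {

    // Number of bytes the given text occupies in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String letter = "A";
        // One Chinese character (U+4E2D), written as an escape so that the
        // encoding of this source file does not matter.
        String hanzi = "\u4e2d";

        Charset[] charsets = {
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_16BE, // assumption: stands in for fixed-width "UNICODE"
            StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name()
                    + ": letter=" + byteLength(letter, cs) + " byte(s)"
                    + ", hanzi=" + byteLength(hanzi, cs) + " byte(s)");
        }
    }
}
```

In UTF-16BE the letter costs two bytes, while in UTF-8 it costs one and the Chinese character costs three. ISO-8859-1 reports one byte for the Chinese character only because it cannot represent it and substitutes a single replacement byte.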

From now on, whenever you set up a development tool for programming, first switch every encoding setting the tool supports to UTF-8. To configure the encoding correctly, you also need to know what the current system's default encoding is.

package com.sicau.demo;
public class CharacterEncoding {

    public static void main(String[] args) {
        System.getProperties().list(System.out);

    }
}

file.encoding = UTF-8 (obtained when run from IDEA)
file.encoding = GBK (obtained when run from the command-line tool, which is why UTF-8 cannot be used directly on the Windows command line)
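
Listing every system property produces a lot of output. As a hedged alternative, the sketch below reads only the encoding-related values; the class name `DefaultEncoding` and its helper method are hypothetical. It uses `System.getProperty("file.encoding")` and `Charset.defaultCharset()`, and the values printed depend on the environment it runs in.

```java
import java.nio.charset.Charset;

class DefaultEncoding {

    // The charset the JVM falls back to when no encoding is given explicitly.
    static Charset defaultCharset() {
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // The raw system property, the same value that
        // System.getProperties().list(System.out) prints.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharset().name());
    }
}
```

If the source file is saved as UTF-8 but the command line defaults to GBK, one option is to tell the compiler the source encoding explicitly with `javac -encoding UTF-8`.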

With the encodings clearly defined, garbled text is easy to explain. In essence, garbling occurs when the encoding used to encode the data and the encoding used to decode it do not match.
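
The mismatch can be reproduced in memory without touching a file. This is an illustrative sketch only: `GarbledDemo` and `transcode` are invented names, and UTF-8 and ISO-8859-1 are chosen simply because both are always available through `StandardCharsets`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GarbledDemo {

    // Encode text with one charset and decode the resulting bytes with another.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] data = text.getBytes(encodeWith);
        return new String(data, decodeWith);
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D U+6587), written as escapes so that
        // the encoding of this source file does not matter.
        String message = "\u4e2d\u6587";

        String same = transcode(message, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = transcode(message, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println("matching charsets keep the text: " + message.equals(same));
        System.out.println("mismatched charsets keep the text: " + message.equals(mixed));
        System.out.println("mismatched result has " + mixed.length()
                + " chars instead of " + message.length());
    }
}
```

Decoding the six UTF-8 bytes as ISO-8859-1 turns each byte into its own character, so two Chinese characters come back as six unrelated ones.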

package com.sicau.demo;
import java.io.*;
public class CharacterEncoding {

    public static void main(String[] args) throws IOException {
        File file = new File("G:" + File.separator + "message.txt");
        // try-with-resources closes the stream even if the write fails
        try (OutputStream output = new FileOutputStream(file)) {
            String message = "这是乱码的测试学习";
            // No charset is given, so the platform default encoding is used
            byte[] data = message.getBytes();
            output.write(data);
        }
    }

}

The program now runs normally and the file content displays correctly.
Here the string is converted to a byte array without specifying an encoding, so the default encoding is used. As long as the system reading the file works with that same default encoding, the data naturally comes back correct.
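
Relying on the default encoding ties the result to the machine the program runs on. A possible alternative, sketched under assumptions, is to name the charset on both the write and the read. The class `ExplicitCharsetFile` and method `roundTrip` are hypothetical, and a temporary file stands in for the G: drive path so the example can run on any system.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetFile {

    // Write text to a file in the given charset, then read it back
    // using the same charset.
    static String roundTrip(Path file, String text, Charset charset) throws IOException {
        Files.write(file, text.getBytes(charset));
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temporary file replaces the fixed path used in the article.
        Path file = Files.createTempFile("message", ".txt");
        try {
            String message = "\u4e2d\u6587"; // two Chinese characters as escapes
            String readBack = roundTrip(file, message, StandardCharsets.UTF_8);
            System.out.println("round trip intact: " + message.equals(readBack));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because the same charset is named on both sides, the result no longer depends on whether the program is started from the IDE or from the command line.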

Example: force the output into a different encoding (that is, specify the encoding of the output ourselves) and check whether the result is garbled on the Windows platform.

package com.sicau.demo;
import java.io.*;
public class CharacterEncoding {

    public static void main(String[] args) throws IOException {
        File file = new File("G:" + File.separator + "message.txt");
        // try-with-resources closes the stream even if the write fails
        try (OutputStream output = new FileOutputStream(file)) {
            String message = "这是乱码的测试学习";
            // Force ISO8859-1, which cannot represent Chinese characters
            byte[] data = message.getBytes("ISO8859-1");
            output.write(data);
        }
    }
}

ISO8859-1 has no way to represent these Chinese characters, so the content cannot be encoded correctly and the file ends up garbled.
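
What actually lands in the file can be inspected in memory. In the sketch below (the names `LossyEncoding` and `throughLatin1` are made up for illustration), the text is encoded to ISO-8859-1 and decoded again with the same charset, so any damage comes from the encoding step alone. `String.getBytes` replaces characters the charset cannot map with the charset's replacement byte, which for ISO-8859-1 is `?`.

```java
import java.nio.charset.StandardCharsets;

class LossyEncoding {

    // Encode to ISO-8859-1, then decode again with the same charset.
    static String throughLatin1(String text) {
        byte[] data = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D U+6587), written as escapes.
        String message = "\u4e2d\u6587";
        // Each unmappable character comes back as a question mark.
        System.out.println(throughLatin1(message)); // prints "??"
        // Plain Latin letters survive unchanged.
        System.out.println(throughLatin1("abc"));   // prints "abc"
    }
}
```

The original characters are lost at encoding time, so no choice of decoder can recover them from the file afterwards.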

Origin www.cnblogs.com/zrcblog/p/12526662.html