String's getBytes() method and new String()

In Java, String's getBytes() method returns a byte array encoded in the platform's default charset. This means that the returned bytes can differ from one operating system or JVM configuration to another!
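A minimal sketch of this platform dependence, assuming a standard JDK. The class name DefaultCharsetDemo and the helper encodeUtf8 are made up for illustration. The character "中" is written as the escape \u4e2d so that the example does not depend on the source file's own encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Hypothetical helper: encode with an explicit charset so the
    // result is the same on every platform.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character U+4E2D

        // The default charset depends on the JVM and platform settings.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // The length of this array varies with the default charset.
        System.out.println("getBytes() length: " + zhong.getBytes().length);

        // With an explicit charset the length is always 3.
        System.out.println("UTF-8 length: " + encodeUtf8(zhong).length);
    }
}
```

Naming the charset explicitly, as encodeUtf8 does, is one way to avoid depending on the platform default.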

The String.getBytes(String decode) method returns the byte array representation of a string in the encoding named by decode, for example:
byte[] b_gbk = "中".getBytes("GBK");
byte[] b_utf8 = "中".getBytes("UTF-8");
byte[] b_iso88591 = "中".getBytes("ISO8859-1");
These calls return the byte array representation of the Chinese character "中" in GBK, UTF-8 and ISO8859-1 respectively. At this point:

The length of b_gbk is 2,

The length of b_utf8 is 3,

The length of b_iso88591 is 1 (a single replacement byte, 0x3F, which is "?").
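The lengths above can be checked by printing the bytes in hexadecimal. This is a small sketch, not part of the original article: the class EncodedLengthDemo and its helpers toHex and encodeHex are hypothetical names, and it assumes a JDK that ships the GBK charset (standard JDK builds do).

```java
import java.nio.charset.Charset;

public class EncodedLengthDemo {
    // Hypothetical helper: format a byte array as upper-case hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Hypothetical helper: encode s in the named charset and return hex.
    // Charset.forName throws an unchecked exception for unknown names.
    static String encodeHex(String s, String charsetName) {
        return toHex(s.getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character U+4E2D
        System.out.println("GBK:       " + encodeHex(zhong, "GBK"));       // D6D0
        System.out.println("UTF-8:     " + encodeHex(zhong, "UTF-8"));     // E4B8AD
        System.out.println("ISO8859-1: " + encodeHex(zhong, "ISO8859-1")); // 3F
    }
}
```

The ISO8859-1 result 3F is the byte for "?", which the encoder substitutes for a character it cannot represent.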

Conversely, the character "中" can be restored with new String(byte[], decode), which uses the specified encoding decode to parse the byte[] into a string:
String s_gbk = new String(b_gbk,"GBK");
String s_utf8 = new String(b_utf8,"UTF-8");
String s_iso88591 = new String(b_iso88591,"ISO8859-1");
If you print s_gbk, s_utf8 and s_iso88591, you will find that s_gbk and s_utf8 are both "中", and only s_iso88591 is an unrecognized character (it can be understood as garbled text; in practice it prints as "?"). Why can't "中" be restored after encoding it with ISO8859-1 and decoding it again? The reason is simple: the ISO8859-1 code table contains no Chinese characters at all, so "中".getBytes("ISO8859-1") cannot produce a correct encoded value for "中", and new String() therefore has nothing to restore it from.
Therefore, when obtaining a byte[] through String.getBytes(String decode), make sure that the characters in the String actually exist in the code table of the decode encoding. Only then can the resulting byte[] be restored correctly.
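One way to perform this check before encoding is CharsetEncoder.canEncode from java.nio.charset. The sketch below is an illustration rather than the article's own method. The class CanEncodeDemo and the helpers canEncode and roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;

public class CanEncodeDemo {
    // Hypothetical helper: true if every character of s exists
    // in the named charset's code table.
    static boolean canEncode(String s, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    // Hypothetical helper: encode and then decode with the same charset.
    static String roundTrip(String s, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character U+4E2D
        for (String name : new String[] {"GBK", "UTF-8", "ISO8859-1"}) {
            System.out.println(name
                    + " canEncode=" + canEncode(zhong, name)
                    + " restored=" + zhong.equals(roundTrip(zhong, name)));
        }
    }
}
```

For GBK and UTF-8 both values are true. For ISO8859-1 both are false, and the round trip yields "?".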

Note:

Sometimes, to make Chinese text satisfy certain special requirements (for example, an HTTP header requires its content to be encoded in ISO8859-1), the Chinese characters can be carried byte by byte, for example:
String s_iso88591 = new String("中".getBytes("UTF-8"),"ISO8859-1");
The s_iso88591 string obtained this way is actually three ISO8859-1 characters. After these characters are passed to the destination, the destination program applies the reverse operation
String s_utf8 = new String(s_iso88591.getBytes("ISO8859-1"),"UTF-8");
to get the correct Chinese character "中". This both complies with the protocol and supports Chinese.
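The round trip described in this note can be sketched as two small helpers. The class Latin1TunnelDemo and the method names wrap and unwrap are hypothetical. The sketch relies on the fact that ISO8859-1 maps every byte value 0x00 to 0xFF to exactly one character, so no byte is lost on the way.

```java
import java.nio.charset.StandardCharsets;

public class Latin1TunnelDemo {
    // Sender side: reinterpret the UTF-8 bytes as ISO8859-1 characters.
    static String wrap(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.ISO_8859_1);
    }

    // Receiver side: recover the original bytes and decode them as UTF-8.
    static String unwrap(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character U+4E2D
        String carried = wrap(zhong);

        System.out.println("carried length: " + carried.length());          // 3
        System.out.println("restored: " + zhong.equals(unwrap(carried)));   // true
    }
}
```

The carried string is not readable text. It only becomes "中" again once the receiver applies unwrap with the same pair of encodings.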
