How many bytes a character occupies is easy to misremember, because the answer depends on the encoding: the same character takes a different number of bytes under different encodings. Below is a brief look at how many bytes a character occupies under a given encoding.
Examples
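import java.io.UnsupportedEncodingException;

public class CharBytes {
    public static void main(String[] args) {
        String s = "情系IT";
        try {
            // Encode the string as GBK and print each byte as unsigned hex
            byte[] bytes1 = s.getBytes("gbk");
            for (byte b : bytes1) {
                System.out.print(Integer.toHexString(b & 0xff) + " ");
            }
            System.out.println();
            // Encode the same string as UTF-8 and print each byte as unsigned hex
            byte[] bytes2 = s.getBytes("utf-8");
            for (byte b : bytes2) {
                System.out.print(Integer.toHexString(b & 0xff) + " ");
            }
        } catch (UnsupportedEncodingException e) {
            // Thrown only if the named charset is not available in this JRE
            e.printStackTrace();
        }
    }
}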
Here are the results (first line GBK, second line UTF-8):
c7 e9 cf b5 49 54
e6 83 85 e7 b3 bb 49 54
Analysis
- Integer.toHexString(int i) is a Java API method that returns a string representation of its integer argument as an unsigned integer in base 16 (hexadecimal).
- Why use b & 0xff?
- Integer.toHexString(int i) takes a parameter of type int, so each byte is widened to int before the call.
- 0xff is the hexadecimal representation of binary 11111111.
- A byte is one byte (8 bits) and an int is four bytes (32 bits), so the value has to be widened from 8 bits to 32. If there were no sign bit, the upper 24 bits could simply be filled with 0. Java's byte is signed, however, so a negative byte is sign-extended and its upper 24 bits are filled with 1. & 0xff clears those upper 24 bits, so the sign bit is treated as an ordinary data bit.
- For example, -127 is 11111111 in sign-magnitude form and 10000001 in two's complement (invert the magnitude bits, then add 1). 10000001 is 81 in hexadecimal and, read as an unsigned number, 129 in decimal. Without the mask, Integer.toHexString would return ffffff81 instead of 81.
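The effect of the mask is easiest to see side by side. The following is a minimal sketch; the class name MaskDemo and its two helper methods are made up for illustration, and only Integer.toHexString and the & 0xff idiom come from the text above.

```java
public class MaskDemo {
    // Hypothetical helper: hex string after plain widening to int (sign-extended).
    public static String signExtendedHex(byte b) {
        return Integer.toHexString(b);
    }

    // Hypothetical helper: hex string after clearing the upper 24 bits with & 0xff.
    public static String maskedHex(byte b) {
        return Integer.toHexString(b & 0xff);
    }

    public static void main(String[] args) {
        byte b = (byte) -127;                    // bit pattern 10000001
        System.out.println(signExtendedHex(b));  // ffffff81
        System.out.println(maskedHex(b));        // 81
        System.out.println(b & 0xff);            // 129
    }
}
```

On Java 8 and later, Byte.toUnsignedInt(b) performs the same conversion as b & 0xff and may read more clearly.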
Summary
From the results we can see:
In UTF-8, each Chinese character in the string takes three bytes and each English letter takes one byte.
In GBK, each Chinese character takes two bytes and each English letter takes one byte.
To see how other encodings behave, copy the code above and change the string or the charset name.
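One way to compare several encodings at once is sketched below. The class EncodingSizes and the helper byteLength are hypothetical names chosen for this example. It passes Charset objects instead of charset names, which avoids the checked UnsupportedEncodingException, and it assumes the JRE ships the GBK charset (full JDK builds normally do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Hypothetical helper: number of bytes the string occupies in the given charset.
    public static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "情系IT";
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            Charset.forName("GBK")   // assumption: GBK is available in this JRE
        };
        // Print the total byte count of the same string under each charset
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + byteLength(s, cs) + " bytes");
        }
    }
}
```

Adding another encoding to the comparison only requires adding one more entry to the charsets array.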
Note: if no encoding is specified when calling getBytes(), the project's default encoding is used.
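To check which default applies on a given machine, something like the following sketch may help. The class DefaultCharsetDemo and the method encodeWithDefault are illustrative names; Charset.defaultCharset() and the no-argument String.getBytes() are standard JDK calls. Since JDK 18 the default is UTF-8 unless overridden, while older versions take it from the platform or project settings.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    // Hypothetical helper: encode without naming a charset, so the default is used.
    public static byte[] encodeWithDefault(String s) {
        return s.getBytes();
    }

    public static void main(String[] args) {
        String s = "情系IT";
        // Show which charset getBytes() falls back to on this JVM
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("byte count with default: " + encodeWithDefault(s).length);
    }
}
```

Because the default can differ between machines, naming the charset explicitly keeps the byte counts reproducible.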
You are welcome to follow my official account, The feelings of IT, which pushes technical articles daily for everyone to learn from.