Character encoding problems in Java

Garbled text is a problem we often run into during development. It is not a big problem, but it is an annoying one. Today I am writing down the lessons learned, in the hope that they will be of some help.

Some concepts:
Character: a sign or symbol in the abstract sense, as people use it. For example: '1', '中', 'A'.

Byte: the unit in which a computer stores data, an 8-bit binary number. It is a very concrete piece of storage space.

Character set: which characters are in use, that is, which Chinese characters, letters and symbols are included in the standard. The collection of included characters is called a "character set".

Encoding: the rule that specifies whether each character is stored in one byte or in several bytes, and which byte values are used to store it. This rule is called an "encoding".

The names we usually call "character sets", such as GB2312, GBK and JIS, mean more than "a collection of characters": they also carry the meaning of an "encoding".
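To make the difference between a character and its bytes concrete, here is a minimal sketch. The class name CharsetVsEncodingDemo and the helper toHex are made up for illustration; only standard JDK charsets are used. It encodes the single character '中' under two encodings and prints the bytes each one produces.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one character, two encodings, two different byte sequences.
class CharsetVsEncodingDemo {

    // Encode the text with the given charset and render the bytes as hex, e.g. "E4 B8 AD".
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so a negative byte prints as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // '中' written as an escape so the encoding of this source file does not matter.
        String ch = "\u4E2D";
        System.out.println(toHex(ch, StandardCharsets.UTF_8));    // E4 B8 AD
        System.out.println(toHex(ch, StandardCharsets.UTF_16BE)); // 4E 2D
    }
}
```

The character is the same in both calls; only the rule for turning it into bytes changes.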

Various encodings:
ASCII
  Computers only recognize numbers, so all data inside a computer has to be represented as numbers. Because English has a limited number of characters, the rule is that the highest bit of the byte is 0, so each character takes one byte holding a number from 0 to 127. For example, 'A' corresponds to 65 and 'a' corresponds to 97. This is the American Standard Code for Information Interchange, or ASCII.
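The numbers above are easy to check in Java. The following is a small sketch (AsciiDemo and isSevenBit are illustrative names, not part of any library) that encodes "Aa" as ASCII and confirms that the highest bit of every byte is 0.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the ASCII rules described above.
class AsciiDemo {

    // True when every byte has its highest bit clear, i.e. its value lies in 0..127.
    static boolean isSevenBit(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] bytes = "Aa".getBytes(StandardCharsets.US_ASCII);
        System.out.println(bytes[0]);          // 65
        System.out.println(bytes[1]);          // 97
        System.out.println(isSevenBit(bytes)); // true
    }
}
```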

GB2312
  As computers spread around the world, many countries and regions brought their own characters into the computer, Chinese characters among them. At that point it turned out that the range of numbers one byte can represent is too small to hold all the Chinese characters, so the rule became to use two bytes to represent one Chinese character.

  The rules: the original ASCII characters keep their codes and still use one byte. To tell one Chinese character apart from two ASCII characters, the highest bit of each byte of a Chinese character is set to 1 (so in Java, where byte is signed, each byte of a Chinese character is negative). This encoding is GB2312.
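The rule "one byte for ASCII, two negative bytes for a Chinese character" can be sketched as follows. Gb2312Demo and countCharacters are illustrative names, and the example assumes the running JDK ships the "GB2312" charset (the usual full JDK distributions do, but it is not one of the charsets every Java platform must provide).

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class for the GB2312 byte layout described above.
class Gb2312Demo {

    // Count characters in GB2312-encoded bytes: a non-negative byte is one ASCII
    // character, a negative byte (highest bit 1) starts a two-byte Chinese character.
    static int countCharacters(byte[] gb2312Bytes) {
        int count = 0;
        int i = 0;
        while (i < gb2312Bytes.length) {
            i += (gb2312Bytes[i] < 0) ? 2 : 1;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "Aa帅哥" written with escapes so the encoding of this source file does not matter.
        // Assumption: Charset.forName("GB2312") is available on this JDK.
        byte[] bytes = "Aa\u5E05\u54E5".getBytes(Charset.forName("GB2312"));
        System.out.println(Arrays.toString(bytes)); // two positive values, then four negative ones
        System.out.println(bytes.length);           // 6
        System.out.println(countCharacters(bytes)); // 4
    }
}
```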

GBK
  Because there are so many Chinese characters, more of them were added on top of GB2312. This encoding is GBK.
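Since GBK is built on top of GB2312, text that GB2312 can already represent should produce the same bytes under both. A quick sketch to check this (GbkCompatibilityDemo and sameBytesInBoth are illustrative names; it assumes the JDK provides both the "GB2312" and "GBK" charsets):

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class: GBK keeps the GB2312 codes for characters GB2312 already had.
class GbkCompatibilityDemo {

    // Encode the same text with both charsets and compare the resulting bytes.
    static boolean sameBytesInBoth(String text) {
        // Assumption: both charsets are available on this JDK.
        byte[] gb2312 = text.getBytes(Charset.forName("GB2312"));
        byte[] gbk = text.getBytes(Charset.forName("GBK"));
        return Arrays.equals(gb2312, gbk);
    }

    public static void main(String[] args) {
        // "Aa帅哥": common characters that are present in GB2312.
        System.out.println(sameBytesInBoth("Aa\u5E05\u54E5")); // true
    }
}
```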

  Problem: inside China everyone can read Chinese characters, but in other countries the local code table does not include them, so on those computers they are displayed as garbled text or as other characters.

  Solution: to remove the effects of each country's localized character encodings, all the characters in the world were given one unified encoding, Unicode. Under it a given character has the same fixed code anywhere in the world. For example, the Chinese character '哥' is hexadecimal 54E5 everywhere. Unicode was originally designed as a two-byte code; it has since grown beyond what two bytes can hold, so not every character fits in two bytes today.
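In Java the Unicode number of a character can be read with String.codePointAt. The sketch below (CodePointDemo and codePointLabel are illustrative names) prints the code of '哥' in the usual U+XXXX notation.

```java
// Hypothetical demo class: look up the Unicode code point of a character.
class CodePointDemo {

    // Format the code point of the first character as "U+XXXX".
    static String codePointLabel(String text) {
        return String.format("U+%04X", text.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(codePointLabel("\u54E5")); // '哥' -> U+54E5
        System.out.println(codePointLabel("A"));      // U+0041
    }
}
```

Whatever encoding later turns this character into bytes, the code point itself stays the same.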

UTF-8
  UTF-8 is a variable-length character encoding for Unicode and one of the ways Unicode is implemented. Its single-byte range is identical to ASCII, so software written to handle ASCII text can keep working with no changes or only small ones. As a result it has gradually become the preferred encoding for e-mail, web pages and other applications that store or transmit text. The Internet Engineering Task Force (IETF) requires all Internet protocols to support UTF-8.
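"Variable length" means that different characters take a different number of bytes. A minimal sketch to see this (Utf8LengthDemo and utf8Length are illustrative names; the sample characters are my own choice, not from the text above):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: UTF-8 uses 1 to 4 bytes per character.
class Utf8LengthDemo {

    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // 1 (ASCII letter)
        System.out.println(utf8Length("\u00E9"));       // 2 (accented Latin letter, U+00E9)
        System.out.println(utf8Length("\u54E5"));       // 3 (Chinese character, U+54E5)
        System.out.println(utf8Length("\uD83D\uDE00")); // 4 (emoji, U+1F600)

        // ASCII text produces exactly the same bytes under UTF-8 and ASCII.
        byte[] asUtf8 = "Aa".getBytes(StandardCharsets.UTF_8);
        byte[] asAscii = "Aa".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.equals(asUtf8, asAscii)); // true
    }
}
```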

Character encoding and decoding
Information travels across a computer network in the form of bytes. So how does text become bytes? That is the encoding process. And once a computer receives those bytes, how does the user get to read them? The bytes must be converted into a string that a human can recognize, which is the decoding process.

  Encoding: convert a string into a byte array

  Decoding: convert a byte array into a string

Note: encoding and decoding must use the same format; otherwise the result is garbled.
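The two directions can be sketched as a pair of helpers. EncodeDecodeDemo, encode and decode are illustrative names wrapping String.getBytes and the String constructor, and UTF-8 is chosen here only because every JDK provides it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: encoding and decoding as two opposite conversions.
class EncodeDecodeDemo {

    // Encoding: string -> byte array.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Decoding: byte array -> string.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "Aa\u5E05\u54E5"; // "Aa帅哥"
        byte[] wire = encode(original, StandardCharsets.UTF_8);
        System.out.println(wire.length); // 8 (1 + 1 + 3 + 3)

        // Same charset on both sides: the text survives the round trip.
        System.out.println(decode(wire, StandardCharsets.UTF_8).equals(original));    // true
        // Different charset on the decoding side: the text does not come back.
        System.out.println(decode(wire, StandardCharsets.UTF_16BE).equals(original)); // false
    }
}
```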

The following code produces garbled output:

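String str = "Aa帅哥";
// Encode: turn the string into bytes using GBK
byte[] strByte = str.getBytes("GBK");

// Decode with a different charset (ISO-8859-1): the Chinese characters come out garbled
String str2 = new String(strByte, "ISO-8859-1");
System.out.println(str2);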
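Why the output is garbled, and how it can sometimes be undone, can be sketched like this. ISO-8859-1 maps every byte to exactly one character, so the six GBK bytes turn into six unrelated characters, but no information is lost and the original bytes can be recovered. MojibakeDemo, decodeWrongly and recover are illustrative names, and the example assumes the JDK provides the "GBK" charset. The recovery trick relies on ISO-8859-1 being the wrong charset; it is not a general fix for other mismatches.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reproduce the garbling above and reverse it.
class MojibakeDemo {

    // Encode as GBK but decode as ISO-8859-1: each byte becomes its own character.
    static String decodeWrongly(String original) {
        byte[] gbkBytes = original.getBytes(Charset.forName("GBK"));
        return new String(gbkBytes, StandardCharsets.ISO_8859_1);
    }

    // Re-encode the garbled text as ISO-8859-1 to get the raw bytes back,
    // then decode those bytes with the charset that was really used.
    static String recover(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        String original = "Aa\u5E05\u54E5"; // "Aa帅哥"
        String garbled = decodeWrongly(original);
        System.out.println(garbled.length());                  // 6, one character per GBK byte
        System.out.println(garbled.startsWith("Aa"));          // true, the ASCII part is unaffected
        System.out.println(recover(garbled).equals(original)); // true
    }
}
```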

To avoid garbled output, the encoding and decoding formats must match:

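String str = "Aa帅哥";
// Encode: turn the string into bytes using GBK
byte[] strByte = str.getBytes("GBK");

// Decode with the same charset (GBK): the original text comes back
String str3 = new String(strByte, "GBK");
System.out.println(str3);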

 

Origin www.linuxidc.com/Linux/2019-12/161837.htm