Summary of garbled-character problems between the front end and back end in Java Web

Several common encoding formats in Java and what they mean:

ASCII

Anyone who has studied computing knows ASCII. It defines 128 characters, represented by the lower 7 bits of a byte. Codes 0~31 and 127 are control characters such as line feed, carriage return, and delete; 32~126 are printable characters, which can be typed on a keyboard and displayed.

ISO-8859-1
128 characters are obviously not enough, so ISO developed a series of standards that extend ASCII: the ISO-8859 family (ISO-8859-1 through ISO-8859-16; part 12 was never published). ISO-8859-1 covers most Western European languages and is the most widely used of them. It is still a single-byte encoding, so it can represent at most 256 characters.
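As a small illustration of the single-byte limit, the sketch below encodes text as ISO-8859-1 and decodes it again. The class and method names (Latin1Demo, roundTrip) are made up for this example. It relies on the documented behavior of String.getBytes(Charset), which substitutes the charset's replacement byte for characters it cannot map.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are not from the original article.
class Latin1Demo {
    // Encode the text as ISO-8859-1, then decode the bytes with the same charset.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "caf\u00e9" contains U+00E9, which ISO-8859-1 can represent.
        System.out.println(roundTrip("caf\u00e9").equals("caf\u00e9")); // prints true
        // "\u4e2d\u6587" is two Chinese characters, which ISO-8859-1 cannot represent.
        System.out.println(roundTrip("\u4e2d\u6587")); // prints ??
    }
}
```

Once a Chinese character has been replaced by '?', the original text cannot be recovered from the bytes.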
 
GB2312
  Its full name is "Basic Set of Chinese Characters Coded for Information Interchange". It is a double-byte encoding whose lead byte ranges from A1 to F7: A1-A9 is the symbol area, containing 682 symbols, and B0-F7 is the Chinese character area, containing 6763 Chinese characters.
 
GBK
  Its full name is "Chinese Character Internal Code Extension Specification". It is a Chinese character internal code specification formulated by the State Bureau of Technical Supervision for Windows 95, created to extend GB2312 with more Chinese characters. Its coding range is 8140~FEFE (excluding XX7F), giving 23940 code points in total, of which 21003 are Chinese characters. GBK is compatible with GB2312: text encoded with GB2312 can be decoded with GBK without garbled characters.
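The compatibility claim can be checked with a short sketch: encode with GB2312, then decode the same bytes with GBK. The class and method names are hypothetical. The example also assumes the JRE ships the "GB2312" and "GBK" charsets, which a full JDK normally does but a trimmed runtime image may not.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; assumes the GB2312 and GBK charsets are installed in the JRE.
class GbkCompatDemo {
    // Encode with GB2312, then decode the same bytes with GBK.
    static String gb2312ToGbk(String text) {
        byte[] bytes = text.getBytes(Charset.forName("GB2312"));
        return new String(bytes, Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, both in GB2312
        System.out.println(text.equals(gb2312ToGbk(text))); // prints true
    }
}
```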
 
GB18030
 Its full name is "Chinese Characters Coded Character Set for Information Interchange", and it is a mandatory national standard in China. A character may be encoded in one, two, or four bytes. It is compatible with GB2312. Although it is a national standard, it is not widely used in actual application systems.
 
UTF-16
 Any discussion of UTF has to start with Unicode (Universal Code). Its goal is a single universal character set, a kind of dictionary covering all the world's languages, so that text in any language can be represented and exchanged. It is easy to imagine how large such a dictionary is; see the Unicode specification for the details. Unicode is the basis of Java and XML. The following describes how Unicode is stored in the computer.
  UTF-16 defines how Unicode characters are stored and read in the computer. It uses 16-bit (two-byte) code units, hence the name UTF-16. Every character in the Basic Multilingual Plane, which covers the vast majority of characters in common use, takes exactly one code unit, that is, two bytes. Characters beyond U+FFFF take a pair of code units (a surrogate pair), that is, four bytes. Because most characters are a single two-byte unit, UTF-16 is convenient for string operations, and this is an important reason why Java uses UTF-16 as the in-memory storage format for characters.
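A minimal sketch of how Java strings expose UTF-16 code units. Note that a character beyond U+FFFF occupies two code units (a surrogate pair), so String.length() counts code units rather than characters. The class and method names are invented for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are not from the original article.
class Utf16Demo {
    // Bytes needed to store the text as UTF-16 (big-endian, without a byte order mark).
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Number of Unicode characters (code points), as opposed to 16-bit code units.
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String bmp = "\u4e2d";                                          // U+4E2D, inside the BMP
        String supplementary = new String(Character.toChars(0x20000)); // U+20000, outside the BMP

        System.out.println(utf16Bytes(bmp));            // prints 2
        System.out.println(utf16Bytes(supplementary));  // prints 4
        System.out.println(supplementary.length());     // prints 2 (code units)
        System.out.println(codePoints(supplementary));  // prints 1 (character)
    }
}
```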
 
UTF-8
 UTF-16 uses at least two bytes for every character. This is simple and convenient, but it has a drawback: a large share of characters could be represented in one byte, yet they take two, which doubles the storage space. With network bandwidth still limited, this increases network traffic unnecessarily. UTF-8 uses a variable-length scheme in which different ranges of characters have different lengths. A character takes 1 to 4 bytes (the original design allowed up to 6, but the current standard, RFC 3629, limits it to 4).
 
UTF-8 encoding rules:
    1. If the highest bit (8th bit) of a byte is 0, the byte is an ASCII character (00 - 7F). It follows that all ASCII-encoded text is already valid UTF-8.
    2. If a byte starts with 11, the number of consecutive leading 1s gives the number of bytes in the character. For example, 110xxxxx is the first byte of a two-byte UTF-8 character.
    3. If a byte starts with 10, it is not the first byte of a character; scan backward to find the first byte of the current character.
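The three rules above can be sketched as a small classifier for a single byte. This is an illustration only: the class and method names are hypothetical, and the function follows the 4-byte limit of RFC 3629, so lead bytes of the older 5- and 6-byte forms are reported as invalid.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are not from the original article.
class Utf8LeadByteDemo {
    // Returns the sequence length announced by a lead byte (1 to 4),
    // 0 for a continuation byte (10xxxxxx), or -1 for a byte that cannot start a sequence.
    static int sequenceLength(byte b) {
        int v = b & 0xFF;                      // treat the byte as unsigned
        if ((v & 0x80) == 0x00) return 1;      // 0xxxxxxx: ASCII
        if ((v & 0xC0) == 0x80) return 0;      // 10xxxxxx: continuation byte
        if ((v & 0xE0) == 0xC0) return 2;      // 110xxxxx
        if ((v & 0xF0) == 0xE0) return 3;      // 1110xxxx
        if ((v & 0xF8) == 0xF0) return 4;      // 11110xxx
        return -1;                             // 11111xxx: not valid under RFC 3629
    }

    public static void main(String[] args) {
        // 'A' (1 byte), U+00E9 (2 bytes), U+4E2D (3 bytes)
        byte[] bytes = "A\u00e9\u4e2d".getBytes(StandardCharsets.UTF_8);
        for (byte b : bytes) {
            System.out.printf("%02X -> %d%n", b & 0xFF, sequenceLength(b));
        }
    }
}
```

For the sample string the bytes are 41, C3 A9, and E4 B8 AD, so the classifier reports 1, then 2 and 0, then 3, 0 and 0.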
 
Comparison of different encoding formats
 Four of the encodings above can handle Chinese characters: GB2312, GBK, UTF-16, and UTF-8. GB2312 and GBK have similar encoding rules, but GBK covers a much larger range of Chinese characters, so GBK should be preferred over GB2312. UTF-16 and UTF-8 both encode Unicode, but with different rules. Relatively speaking, UTF-16 is the most efficient to process: converting between characters and bytes is simple, and string operations are easier. It suits use between local disk and memory, where fast conversion between characters and bytes matters; Java's in-memory encoding is UTF-16, for example. It is less suitable for network transmission, because a byte stream can be damaged in transit, and a damaged UTF-16 stream is hard to recover. UTF-8 is better suited to network transmission: first bytes and continuation bytes are distinguishable, so a decoder can resynchronize at the next character, and damage to one character does not affect the others. Its encoding efficiency lies between GBK and UTF-16. UTF-8 therefore balances encoding efficiency and encoding safety, and is an ideal encoding for Chinese.
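To make the size side of this comparison concrete, the sketch below counts the bytes the same mixed Chinese and ASCII string needs in GBK, UTF-16, and UTF-8. The class and method names are hypothetical, and the example assumes the JRE provides the "GBK" charset. It measures storage size only, not conversion speed.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; assumes the GBK charset is installed in the JRE.
class EncodingSizeDemo {
    // Number of bytes the text occupies in the named charset.
    static int size(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587abc"; // two Chinese characters followed by "abc"
        System.out.println(size(text, "GBK"));      // prints 7  (2 + 2 + 1 + 1 + 1)
        System.out.println(size(text, "UTF-16BE")); // prints 10 (2 bytes per character, no BOM)
        System.out.println(size(text, "UTF-8"));    // prints 9  (3 + 3 + 1 + 1 + 1)
    }
}
```

For text that is mostly ASCII, UTF-8 is the smallest of the Unicode encodings; for text that is mostly Chinese, each character takes 3 bytes in UTF-8 against 2 in GBK or UTF-16.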
 
Chinese garbled solution:
1. Tomcat's built-in default encoding is ISO-8859-1, which is not compatible with Chinese. Read the parameter with the same encoding (ISO-8859-1) to get the original bytes back, then decode those bytes with an encoding that can represent the text (UTF-8). After processing, send the result to the front end. When sending to the front end, set:
res.setContentType("text/html;charset=utf-8"); // Sets the character encoding of the page so that Chinese text is displayed correctly; call it before res.getWriter()
 
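2. req.setCharacterEncoding("utf-8"); // Must be called first, before any request parameter is read, because it determines how the data is decoded; otherwise the data will be wrong. It applies to the request body only (for example, POST form data).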

3. Spring provides a CharacterEncodingFilter, which can be used to solve the garbled-character problem.
When using CharacterEncodingFilter, pay attention to the following:
The form data is submitted by POST;
The CharacterEncodingFilter is configured in web.xml;
The page encoding and the encoding specified for the filter must be the same.

CharacterEncodingFilter configuration example:
<filter>
  <filter-name>encodingFilter</filter-name>
  <filter-class>
    org.springframework.web.filter.CharacterEncodingFilter
  </filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>encodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
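The "receive as ISO-8859-1, then convert to UTF-8" approach from solution 1 can be sketched without a servlet container. The class and method names below are hypothetical. The misdecode method only simulates what happens when a container decodes UTF-8 bytes as ISO-8859-1; in a real servlet the garbled string would come from req.getParameter(...).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are not from the original article.
class RedecodeDemo {
    // Simulates a container that decodes UTF-8 request bytes as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Recovers the text: ISO-8859-1 maps every byte to exactly one char,
    // so getBytes(ISO_8859_1) returns the original bytes, which are then decoded as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";       // two Chinese characters
        String garbled = misdecode(original);
        System.out.println(garbled.length());   // prints 6 (one char per UTF-8 byte)
        System.out.println(original.equals(repair(garbled))); // prints true
    }
}
```

This repair only works when the text really was decoded as ISO-8859-1. If the request encoding is already set correctly (solution 2 or 3), applying it would garble the text instead.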
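The above is a summary of what I looked up and worked out while running into these problems in my own code. This is all I know so far; there are surely other solutions.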
