Java Web (2): Solving garbled characters in Servlet requests and responses

1. Garbled request parameters

    get request:

        The parameters of a get request are appended to the url, that is, they travel in the request line.

          

          

        MyServlet is an ordinary servlet. The browser accesses it with a get request and submits the parameter name=Xiaoming (written in Chinese characters). doGet reads the parameter value and prints it to the console, and the printed value is garbled.

        Reasons for garbled characters:

              Prerequisite knowledge: you need to know what three terms mean: code table, encoding, and decoding. Briefly:

                  Code table: a set of rules for converting between the language we understand and the language the computer understands. There are many code tables, such as ISO-8859-1, GBK, UTF-8, and UTF-16. GBK, UTF-8, and UTF-16 can all represent Chinese characters. ISO-8859-1 cannot, but it is enough if you only need to represent English.

                  Encoding: converting the language we understand into the language the computer understands.

                  Decoding: converting the language the computer understands back into a language we can read.

                    Please refer to this blog post for details.
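
To make the three terms concrete, here is a minimal sketch using only the JDK. The class name CodeTableDemo and its two helper methods are made up for illustration; the Chinese name is written with \u escapes so the file compiles regardless of source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodeTableDemo {
    // Encoding: text we can read -> bytes the computer stores or sends.
    // (Hypothetical helper, not part of any servlet API.)
    static byte[] encode(String text, Charset codeTable) {
        return text.getBytes(codeTable);
    }

    // Decoding: bytes -> text, interpreted with the given code table.
    static String decode(byte[] bytes, Charset codeTable) {
        return new String(bytes, codeTable);
    }

    public static void main(String[] args) {
        String name = "\u5c0f\u660e"; // the two Chinese characters of "Xiao Ming"

        byte[] utf8Bytes = encode(name, StandardCharsets.UTF_8);
        byte[] utf16Bytes = encode(name, StandardCharsets.UTF_16BE);

        // The same text becomes different bytes under different code tables.
        System.out.println("UTF-8 length:    " + utf8Bytes.length);
        System.out.println("UTF-16BE length: " + utf16Bytes.length);

        // Decoding only recovers the text when it uses the code table that encoded it.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).equals(name));
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_16BE).equals(name));
    }
}
```

The last two lines are the whole garbling problem in miniature: the bytes are fine, but reading them with the wrong code table produces different characters.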

                  The above describes a single round of encoding. In some programs, a Chinese character or a letter is encoded several times in a row with different code tables. The first encoding works as described above. The second takes data that is already in computer-readable form and converts it again under different rules. Decoding then has to undo each step in reverse order, so two encodings require two decodings. The following example illustrates this.

              The browser encodes with the UTF-8 code table and sends the data over the http protocol, whose request line is traditionally treated as ISO-8859-1. When the data arrives at the server, the ISO-8859-1 code table is also used by default. See the picture

              

              That is, there are three stages. The data has gone through two encodings, so two decodings are required.

              1. The browser encodes "Xiao Ming" with the UTF-8 code table. Xiao Ming is written in Chinese characters, so a code table that can represent Chinese is needed; this is the encoding that can also be set manually in the browser. If a code table that cannot represent Chinese were used, the characters would have no entry in the table and would be replaced by other symbols such as ??. Call the result of this encoding 1234. It is sent over the http protocol.

              2. During transmission, only symbols from the ISO-8859-1 code table can be used, so the original 1234 is converted again, this time with ISO-8859-1. The result is the garbled ????, which is what reaches the servlet.

              3. The data the server receives has been through two encodings, first UTF-8 and then ISO-8859-1, so it must be decoded in the reverse order: first with ISO-8859-1, then with UTF-8. Only then is the correct data recovered.

                  ????.getBytes("ISO-8859-1"); // the first decoding: back to 1234, the form the computer recognizes

                  new String(1234,"UTF-8"); // the second decoding: back to a language we can read

              solution code
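
The repair can be sketched without a servlet container. In the block below, garble() stands in for what the server does to the parameter by default, and repair() is the new String(xxx.getBytes("ISO-8859-1"),"UTF-8") fix from the text. The class and method names are invented for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class GetParamRepair {
    // Simulates the default server behavior described above: the UTF-8 bytes
    // sent by the browser are read as if they were ISO-8859-1 text.
    static String garble(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // The fix: turn the garbled string back into the original bytes with
    // ISO-8859-1, then decode those bytes with UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u5c0f\u660e"; // the two Chinese characters of "Xiao Ming"
        String garbled = garble(name);

        System.out.println(garbled.equals(name));         // false: this is the garbled value
        System.out.println(repair(garbled).equals(name)); // true: the round trip is lossless
    }
}
```

The round trip works because ISO-8859-1 maps every byte value to exactly one character, so no information is lost in the wrong decoding. In a real doGet the same idea would be written as new String(request.getParameter("name").getBytes("ISO-8859-1"), "UTF-8"). This only applies when the server really decoded the request line with ISO-8859-1; as far as I know, Tomcat 8 and later default to UTF-8 for the URI, in which case applying the fix would itself garble the value.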

                

                

                

 

    Post request:

          The parameters of a post request are in the request body. This is much simpler than a get request because the body does not go through the ISO-8859-1 handling of the request line, so it is only necessary to make the server decode with the same code table the browser used to encode. Here the browser encodes with the UTF-8 code table, so setting the server's decoding code table to UTF-8 is enough.

          Set the server to use UTF-8 code table decoding

              request.setCharacterEncoding("UTF-8"); // Tell Tomcat to decode the request body with the UTF-8 code table instead of the default ISO-8859-1.

          For this reason, this call is very often the first statement of the doPost method: it only takes effect if it runs before any request parameter is read.
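
What the setting changes can be simulated with the JDK's URLEncoder and URLDecoder. This is a sketch, not Tomcat's actual implementation: browserEncode() and serverDecode() are hypothetical helpers, and the code table passed to serverDecode() plays the role of the value given to request.setCharacterEncoding().

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class PostBodyDecoding {
    // Browser side: build a form body, percent-encoding the value's bytes
    // in the given code table. (Hypothetical helper for illustration.)
    static String browserEncode(String value, String codeTable) {
        try {
            return "name=" + URLEncoder.encode(value, codeTable);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown code table: " + codeTable, e);
        }
    }

    // Server side: decode the value with whichever code table the server
    // has been told to use. (Stands in for the container's parameter parsing.)
    static String serverDecode(String body, String codeTable) {
        String rawValue = body.substring(body.indexOf('=') + 1);
        try {
            return URLDecoder.decode(rawValue, codeTable);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown code table: " + codeTable, e);
        }
    }

    public static void main(String[] args) {
        String name = "\u5c0f\u660e"; // the two Chinese characters of "Xiao Ming"
        String body = browserEncode(name, "UTF-8");
        System.out.println(body);

        // Default server behavior: wrong code table, garbled value.
        System.out.println(serverDecode(body, "ISO-8859-1").equals(name));
        // After telling the server to use UTF-8: correct value.
        System.out.println(serverDecode(body, "UTF-8").equals(name));
    }
}
```

Only one decoding step is involved, so choosing the right code table up front is enough and no repair is needed afterwards.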

 

     Summary of garbled request parameters

          Garbled Chinese is handled differently for get requests and post requests.

            get: The request parameters are in the request line, which involves the http protocol, so the garbling has to be fixed by hand. Once the root cause is known, the cure follows: the data went through two encodings, so it needs two decodings.

              new String(xxx.getBytes("ISO-8859-1"),"UTF-8");

            post: The request parameters are in the request body, and the Servlet API solves the problem. The data is encoded once and decoded once; just tell Tomcat which code table to decode with.

              request.setCharacterEncoding("UTF-8");

            

2. Garbled Chinese characters in the response displayed by the browser

      First, how does the response object send data to the browser? There are two methods: getOutputStream and getWriter.

        ServletOutputStream getOutputStream(); // Returns the output byte stream. It provides two kinds of output methods, write() and print().

        PrintWriter getWriter(); // Returns the output character stream. It also provides write() and print().

          Under the hood, print() calls write(). In other words, print() wraps write() so that developers can output values quickly without having to think about how they are converted to bytes.

      1. ServletOutputStream getOutputStream();

          Chinese cannot be output directly with this stream: calling print() with Chinese text throws an exception.

                

           exception source code

            

          Solution:

            resp.getOutputStream().write("Hahaha, I want to output to the browser".getBytes("UTF-8"));

            Here we encode the Chinese text with UTF-8 ourselves instead of leaving the encoding to Tomcat. If the browser decodes with the UTF-8 code table, the text is displayed correctly. If the browser is not using UTF-8, the text is still garbled. The result therefore depends on the code table the browser happens to use, which is not ideal. Also note that write(byte[]) has to be used here, because print() has no overload that accepts a byte array.
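
The dependence on the browser's code table can be shown with a small JDK-only sketch. A ByteArrayOutputStream stands in for the response stream and browserDecode() stands in for the browser; both names are assumptions made for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteStreamResponse {
    // Server side: encode the text ourselves and write raw bytes,
    // the way write(byte[]) is used on the response stream in the text.
    static byte[] writeAsUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stand-in for the response stream
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        out.write(encoded, 0, encoded.length);
        return out.toByteArray();
    }

    // Browser side: decode the received bytes with whatever code table
    // the browser is currently using. (Hypothetical helper.)
    static String browserDecode(byte[] received, Charset browserCodeTable) {
        return new String(received, browserCodeTable);
    }

    public static void main(String[] args) {
        String text = "\u54c8\u54c8\u54c8"; // "hahaha" in Chinese characters
        byte[] onTheWire = writeAsUtf8(text);

        // Browser set to UTF-8: the text comes out intact.
        System.out.println(browserDecode(onTheWire, StandardCharsets.UTF_8).equals(text));
        // Browser set to another code table: same bytes, garbled text.
        System.out.println(browserDecode(onTheWire, StandardCharsets.ISO_8859_1).equals(text));
    }
}
```

Nothing in the bytes themselves says which code table produced them, which is why this approach alone cannot guarantee correct display.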

 

      2. PrintWriter getWriter();

          Writing Chinese directly does not throw an exception, but the output is certain to be garbled: the default ISO-8859-1 code table cannot represent Chinese, so the encoding is wrong from the start and no choice of decoding on the browser side can recover it.
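
Why no decoding can help is easy to see in a JDK-only simulation. Below, a PrintWriter over an OutputStreamWriter plays the role of the response writer, and sendThroughWriter() is a made-up helper that encodes with the server's code table and decodes with the browser's.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterEncodingDemo {
    // Writes text through a character stream that encodes with serverCodeTable,
    // then decodes the resulting bytes with browserCodeTable.
    // (Hypothetical helper; a stand-in for the response writer plus the browser.)
    static String sendThroughWriter(String text, Charset serverCodeTable, Charset browserCodeTable) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(wire, serverCodeTable));
        writer.write(text);
        writer.flush();
        return new String(wire.toByteArray(), browserCodeTable);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // the word "Chinese" in Chinese characters

        // Server encodes with ISO-8859-1: each Chinese character is replaced by '?'
        // before it ever leaves the server, so the browser's setting is irrelevant.
        System.out.println(sendThroughWriter(text, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
        System.out.println(sendThroughWriter(text, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));

        // Server and browser agree on UTF-8: the text survives.
        System.out.println(sendThroughWriter(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text));
    }
}
```

The first two calls both yield "??", while the last one returns the original text. That is the goal of the solutions that follow: make the server encode with a code table that can represent Chinese, and make sure the browser decodes with the same one.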

          There are three ways to make it output Chinese correctly

          1. Use the Servlet API response.setCharacterEncoding()

              response.setCharacterEncoding("UTF-8"); // Make Tomcat encode the Chinese text we send to the browser with UTF-8 instead of the default ISO-8859-1. Whether it displays correctly still depends on whether the browser is using the UTF-8 code table, so it has the same flaw as the approach above.

            

          2. Notify Tomcat and the browser to use the same code table

              response.setHeader("content-type","text/html;charset=utf-8"); // Set the response header by hand, telling both Tomcat and the browser to use utf-8 for encoding and decoding.

                  The charset=utf-8 part is equivalent to response.setCharacterEncoding("UTF-8"); // tells Tomcat to encode with utf-8

                  response.setHeader("content-type","text/html;charset=utf-8"); // Taken as a whole, it tells Tomcat to encode with utf-8 and also tells the browser to decode with UTF-8.

              response.setContentType("text/html;charset=utf-8"); // The Servlet API way of telling Tomcat and the browser to use UTF-8 for encoding and decoding. Underneath it does the same as the setHeader line above; it is just a simple wrapper.

              

 

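          3. Notify Tomcat through the Servlet API, and notify the browser with an html <meta> tag in the page source. Note: <meta> only suggests which encoding the browser should use; it cannot force it.

              Two steps: first tell Tomcat which code table to encode with, then write the <meta> tag into the html that is sent to the browser.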

                  

 

          So for the response, the garbling problem is solved as long as Tomcat and the browser are told to use the same code table. The second method is the one generally used.

 

 

3. Summary

      The explanation above may seem cumbersome, but once the principle is understood it is very simple. Here is a summary.

      request garbled

          get request:

              It has been encoded twice, so it has to be decoded twice

              First decoding: xxx.getBytes("ISO-8859-1"); get yyy

              Second decoding: new String(yyy,"utf-8");

              Written in one line: new String(xxx.getBytes("ISO-8859-1"),"UTF-8");

          post request:

              It is encoded only once, so it only needs to be decoded once, using the Servlet API request.setCharacterEncoding();

              request.setCharacterEncoding("UTF-8"); // Not guaranteed to work: it depends on which code table the browser encoded with. If the browser uses UTF-8, write UTF-8 here.

       response garbled

          getOutputStream();

              Using this byte output stream, you cannot directly output Chinese, and an exception will occur. If you want to output Chinese, the solution is as follows

              Solution: getOutputStream().write(xxx.getBytes("UTF-8")); // Encode the Chinese text by hand with the UTF-8 code table and send it as bytes. Once it is bytes, no exception is thrown, and Tomcat does not encode it again because it is already encoded. If the browser then decodes with the UTF-8 code table, the Chinese is displayed correctly; otherwise it is garbled. This method therefore cannot fully guarantee that Chinese is not garbled.

          getWriter();

              With the character output stream, Chinese can be written directly and no exception occurs, but the output is garbled. There are three ways to solve this; in practice, always use the second.

              Solution: notify Tomcat and the browser to use the same code table.

                response.setContentType("text/html;charset=utf-8"); // Tell Tomcat to encode with UTF-8 and the browser to decode with UTF-8

                  This tells both Tomcat and the browser to use UTF-8 for encoding and decoding. Underneath, the method comes down to this line: response.setHeader("content-type","text/html;charset=utf-8");

 

          Note: getOutputStream() and getWriter() cannot both be used on the same response. Only one of them can be used; otherwise an exception is thrown.

 
