    response.setContentType("text/html;charset=UTF-8");
    request.setCharacterEncoding("UTF-8");
    // response.setCharacterEncoding("UTF-8");
    OutputStream os = null;
    InputStream is = null;
    try {
        StringBuilder sb = new StringBuilder("");
        is = request.getInputStream();
        byte[] buffer = new byte[1024];
        while (is.read(buffer) != -1) {
            // Decodes each 1024-byte chunk separately, with the platform
            // default charset, ignoring the count returned by read()
            sb.append(new String(buffer));
        }
        String xml = sb.toString().trim();
    } catch (Exception e) {
        e.printStackTrace();
    }
The code above reads a large request body that contains Chinese characters, and some of those characters come out garbled.
Cause analysis:
- UTF-8 is a variable-length encoding of Unicode (also called the "universal code"). A single UTF-8 web page can display Simplified and Traditional Chinese together with other languages such as English, Japanese, and Korean.
- In UTF-8, common Chinese characters (about 20,000 of them) take 3 bytes each, while most characters in the extended CJK sets take 4 bytes (Unicode assigns more than 50,000 Chinese characters starting at U+20000). English letters and digits take 1 byte.
- The program reads the body through a fixed 1024-byte buffer and converts each buffer to a String on its own. When a multi-byte character straddles the boundary between two reads, each chunk holds only part of its bytes, so that character is garbled when decoded.
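The byte counts and the chunk-boundary effect can be checked with a small standalone program. This is a sketch: the class and method names are illustrative, a chunk size of 3 stands in for the 1024-byte buffer, and it decodes with an explicit UTF-8 charset (unlike the original new String(buffer)) so that only the boundary problem is shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8SplitDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Imitates the original loop: cuts the bytes into fixed-size chunks and
    // turns each chunk into a String separately, so a character whose bytes
    // fall into two chunks cannot be decoded.
    static String decodeChunkByChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int start = 0; start < data.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, data.length);
            sb.append(new String(Arrays.copyOfRange(data, start, end), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));      // 1
        System.out.println(utf8Length("\u4E2D")); // 3 (U+4E2D, a common Chinese character)
        System.out.println(utf8Length(new String(Character.toChars(0x20000)))); // 4 (U+20000)

        // "A" + U+4E2D is 4 bytes; 3-byte chunks cut U+4E2D after its 2nd byte.
        byte[] data = "A\u4E2D".getBytes(StandardCharsets.UTF_8);
        String broken = decodeChunkByChunk(data, 3);
        System.out.println(broken.equals("A\u4E2D"));     // false
        System.out.println(broken.indexOf('\uFFFD') >= 0); // true: replacement character(s)
    }
}
```

With a chunk size of 4 the same bytes decode correctly, because U+4E2D then arrives in one piece; the failure depends only on where the buffer boundary happens to fall.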
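Solution: wrap the request's byte stream in an InputStreamReader with an explicit UTF-8 charset. The reader's decoder keeps an incomplete multi-byte sequence between reads, so a character whose bytes arrive in different reads is still decoded correctly.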
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    // Inside the servlet method:
    response.setContentType("text/html;charset=UTF-8");
    request.setCharacterEncoding("UTF-8");
    // response.setCharacterEncoding("UTF-8");
    StringBuilder sb = new StringBuilder();
    // InputStreamReader decodes the byte stream as UTF-8 and holds on to an
    // incomplete multi-byte character until its remaining bytes are read.
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(request.getInputStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append('\n'); // readLine() drops the line terminator
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    String xml = sb.toString().trim();
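The fix can be tried outside a servlet container. In the sketch below, a ByteArrayInputStream stands in for request.getInputStream(), and TrickleInputStream is a hypothetical test helper that hands out at most a few bytes per read, so multi-byte characters are guaranteed to be split between reads. readBody reads into a char buffer instead of using readLine(), which keeps line breaks as they are; readBodyAllBytes shows another option that collects all bytes first and decodes once (InputStream.readAllBytes() needs Java 9 or later). All class and method names here are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8BodyReader {
    // Reads the whole stream as UTF-8 text. InputStreamReader keeps any
    // incomplete multi-byte sequence until the remaining bytes arrive.
    static String readBody(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n); // append only the chars actually read
            }
        }
        return sb.toString();
    }

    // Alternative (Java 9+): collect every byte first, then decode once.
    static String readBodyAllBytes(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Hypothetical test helper: returns at most maxPerRead bytes per read,
    // imitating a network stream that delivers the body in small pieces.
    static class TrickleInputStream extends FilterInputStream {
        private final int maxPerRead;

        TrickleInputStream(InputStream in, int maxPerRead) {
            super(in);
            this.maxPerRead = maxPerRead;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxPerRead));
        }
    }

    // Decodes data that arrives maxPerRead bytes at a time.
    static String decodeTrickled(byte[] data, int maxPerRead) {
        try {
            return readBody(new TrickleInputStream(new ByteArrayInputStream(data), maxPerRead));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Contains U+4E2D, U+6587 (3 bytes each) and U+20000 (4 bytes).
        String text = "<msg>A\u4E2D\u6587</msg>\n<msg>\uD840\uDC00</msg>";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeTrickled(data, 2).equals(text)); // true
    }
}
```

In a servlet, request.getReader() is a further option: it decodes the body with the encoding set by request.setCharacterEncoding(), which has no effect on bytes read through getInputStream().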