    response.setContentType("text/html;charset=UTF-8");
    request.setCharacterEncoding("UTF-8");
    // response.setCharacterEncoding("UTF-8");
    OutputStream os = null;
    InputStream is = null;
    try {
        StringBuilder sb = new StringBuilder("");
        is = request.getInputStream();
        byte[] buffer = new byte[1024];
        while (is.read(buffer) != -1) {
            sb.append(new String(buffer));
        }
        String xml = sb.toString().trim();
    } catch (Exception e) {
        e.printStackTrace();
    }
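When the code above reads a large amount of data that contains Chinese characters, a few of those characters come out garbled.
Cause:
- UTF-8 is a variable-length character encoding for Unicode. On web pages it lets a single page display Simplified and Traditional Chinese alongside other languages (such as English, Japanese, and Korean).
- Commonly used Chinese characters (a little over 20,000 of them) take 3 bytes each in UTF-8, but most of the Chinese characters in the extended character sets take 4 bytes (in Unicode, more than 50,000 Chinese characters are encoded from U+20000 onward). English letters and digits take 1 byte.
- The program reads into a 1024-byte buffer, so the bytes of one Chinese character can be split between two reads. Each buffer is decoded on its own, and decoding only part of a character produces garbled text.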
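The effect can be reproduced without a servlet. The following is a minimal sketch, not the original program: the class name `SplitDecodeDemo`, the helper `decodeInChunks`, and the chunk size of 2 are assumptions chosen for illustration. It decodes a UTF-8 byte array chunk by chunk, the way each buffer is decoded above, and compares the result with decoding the whole array at once.

```java
import java.nio.charset.StandardCharsets;

class SplitDecodeDemo {
    // Hypothetical helper: decode the bytes in fixed-size chunks,
    // each chunk on its own, without regard to character boundaries.
    static String decodeInChunks(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a" followed by two CJK characters (U+4E2D, U+6587): 1 + 3 + 3 = 7 bytes.
        String text = "a\u4e2d\u6587";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);

        String whole = new String(data, StandardCharsets.UTF_8);
        String chunked = decodeInChunks(data, 2); // chunk edges fall inside characters

        System.out.println(whole.equals(text));   // true
        System.out.println(chunked.equals(text)); // false
    }
}
```

With a 1024-byte buffer the same thing happens only at the few places where a character straddles a buffer boundary, which would explain why only individual characters are garbled.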
Solution:
    response.setContentType("text/html;charset=UTF-8");
    request.setCharacterEncoding("UTF-8");
    // response.setCharacterEncoding("UTF-8");
    OutputStream os = null;
    InputStream is = null;
    InputStreamReader isr = null;
    BufferedReader br = null;
    try {
        StringBuilder sb = new StringBuilder("");
        is = request.getInputStream();
        // Let the reader decode UTF-8, so a character split across reads is reassembled.
        isr = new InputStreamReader(is, "UTF-8");
        br = new BufferedReader(isr);
        String line = null;
        // Note: readLine() strips the line terminators.
        while ((line = br.readLine()) != null) {
            sb.append(line);
        }
        String xml = sb.toString().trim();
    } catch (Exception e) {
        e.printStackTrace();
    }
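Because `readLine()` drops line terminators, the approach above joins all lines together. If the line breaks need to be kept, one possible variant is to read through the same kind of `InputStreamReader` into a `char[]` buffer instead. This is a sketch under assumptions: the class name `BodyReader` and the helper `readAll` are hypothetical, and a `ByteArrayInputStream` stands in for `request.getInputStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BodyReader {
    // Hypothetical helper: reads the whole stream as text in the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream.
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n); // append only the chars actually read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Build a body larger than the buffer: "a" shifts the alignment so that
        // 3-byte characters do not line up with 1024-byte boundaries.
        StringBuilder expected = new StringBuilder("a");
        for (int i = 0; i < 2000; i++) {
            expected.append("\u4e2d\u6587\n");
        }
        byte[] body = expected.toString().getBytes(StandardCharsets.UTF_8);

        String text = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        System.out.println(text.equals(expected.toString())); // true
    }
}
```

In the servlet this would be called as `BodyReader.readAll(request.getInputStream(), StandardCharsets.UTF_8).trim()`. The decoding is still done by the reader, so characters that straddle a read boundary are handled the same way as in the `BufferedReader` version.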