A few Chinese characters come out garbled when reading data with IO streams

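        response.setContentType("text/html;charset=UTF-8");
        request.setCharacterEncoding("UTF-8");
//      response.setCharacterEncoding("UTF-8");
        OutputStream os = null;
        InputStream is = null;
        try {
            StringBuilder sb = new StringBuilder("");
            is = request.getInputStream();
            byte[] buffer = new byte[1024];
            // Problem: each 1024-byte chunk is decoded on its own, so a multi-byte
            // character that straddles two chunks is decoded as two broken halves.
            // new String(buffer) also ignores how many bytes read() returned and
            // decodes with the platform default charset.
            while (is.read(buffer) != -1) {
                sb.append(new String(buffer));
            }
            String xml = sb.toString().trim();
        } catch (Exception e) {
            e.printStackTrace();
        }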

 When the code above reads a large amount of data that contains Chinese characters, a few of those characters come out garbled.

 

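Root cause analysis:
  1. UTF-8 is a variable-length character encoding for Unicode (Unicode itself is sometimes called 万国码, the "universal code"). Used on web pages, it lets a single page display Simplified and Traditional Chinese together with other languages such as English, Japanese, and Korean.
  2. Commonly used Chinese characters (a little over 20,000 of them) take 3 bytes each in UTF-8, while the much larger set of Chinese characters in the extended ranges starting at U+20000 (more than 50,000 characters) takes 4 bytes each. English letters and digits take 1 byte each.
  3. The buffer in the program is 1024 bytes, and each chunk is decoded into a String on its own. When the bytes of one Chinese character fall on both sides of a chunk boundary, each chunk holds only part of that character, and decoding an incomplete byte sequence produces garbled output. That is why only a few characters are affected, and only when the data is large enough to span several chunks.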
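The points above can be reproduced in a small standalone program. This is only a sketch: the class name `SplitCharDemo` and the helper `decodePerChunk` are made up for illustration, and a chunk size of 2 stands in for the 1024-byte buffer so that the boundary is guaranteed to fall inside a character.

```java
import java.nio.charset.StandardCharsets;

class SplitCharDemo {
    // Hypothetical helper: decodes the data chunk by chunk, the way the loop in
    // the post does, so a character that straddles a chunk boundary is decoded
    // from two incomplete byte sequences.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a" followed by two common Chinese characters (U+4E2D, U+6587).
        String text = "a\u4E2D\u6587";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(data.length); // 7: 1 byte + 3 bytes + 3 bytes

        // U+20000 lies outside the Basic Multilingual Plane and needs 4 bytes.
        String rare = "\uD840\uDC00";
        System.out.println(rare.getBytes(StandardCharsets.UTF_8).length); // 4

        // Chunked decoding breaks the characters that cross a boundary.
        System.out.println(decodePerChunk(data, 2).equals(text)); // false

        // Decoding all bytes in one go keeps every character intact.
        System.out.println(new String(data, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

Java's decoder replaces each incomplete sequence with U+FFFD, which is what shows up as the garbled characters.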
Solution:
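                // Needs java.io.InputStream, java.io.OutputStream,
                // java.io.InputStreamReader and java.io.BufferedReader.
                response.setContentType("text/html;charset=UTF-8");
                request.setCharacterEncoding("UTF-8");
//              response.setCharacterEncoding("UTF-8");
                OutputStream os = null;
                InputStream is = null;
                InputStreamReader isr = null;
                BufferedReader br = null;
                try {
                        StringBuilder sb = new StringBuilder("");
                        is = request.getInputStream();
                        // The reader decodes UTF-8 across read boundaries, so a character
                        // split between two underlying reads is still decoded correctly.
                        isr = new InputStreamReader(is, "UTF-8");
                        br = new BufferedReader(isr);
                        String line = null;
                        while ((line = br.readLine()) != null) {
                                // readLine() strips the line terminator; put one back so
                                // text on adjacent lines does not run together.
                                sb.append(line).append('\n');
                        }
                        String xml = sb.toString().trim();
                } catch (Exception e) {
                        e.printStackTrace();
                }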
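BufferedReader.readLine() does not return line terminators, so the original line endings are not preserved exactly. If the payload has to be kept as sent, another option is to keep the byte buffer but collect all the bytes first and decode them once at the end. The following is a minimal sketch, not the original author's approach: `ReadAllDemo` and `readAll` are hypothetical names, and a `ByteArrayInputStream` stands in for `request.getInputStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadAllDemo {
    // Hypothetical helper: gathers the raw bytes first and decodes them in a
    // single step, so no character is decoded from a partial byte sequence.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // copy only the bytes actually read
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Build a payload larger than the buffer; the leading "a" shifts the
        // 3-byte characters so that some of them straddle a 1024-byte boundary.
        StringBuilder sb = new StringBuilder("a");
        for (int i = 0; i < 1000; i++) {
            sb.append("\u4E2D\u6587\n"); // U+4E2D, U+6587, newline
        }
        String text = sb.toString();
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in).equals(text)); // true
    }
}
```

The trade-off is that the whole request body is held in memory twice for a moment (once as bytes, once as a String), which is usually acceptable for XML payloads of moderate size.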
 

Reposted from ysbwsx2017.iteye.com/blog/2397414