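A character set problem: when using URLConnection to open an HTTP connection and fetch data, the data is GBK-encoded but comes out garbled on my side.
Original method:
(The commented-out part saves the data to a file.)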
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public void download(String URLString) {
    FileOutputStream out = null;
    InputStream in = null;
    try {
        URL url = new URL(URLString);
        HttpURLConnection httpURLConnection = (HttpURLConnection) url.openConnection();
        // true -- allows reading from the connection
        httpURLConnection.setDoInput(true);
        // default is GET
        httpURLConnection.setRequestMethod("GET");
        // 1 min
        httpURLConnection.setConnectTimeout(60000);
        // 1 min
        // httpURLConnection.setReadTimeout(60000);
        // connect to server (tcp)
        httpURLConnection.connect();
        in = httpURLConnection.getInputStream(); // send request to server

        // File file = new File(localFileName); // you can save into your local file
        // if (!file.exists()) {
        //     file.createNewFile();
        // }
        //
        // out = new FileOutputStream(file);
        // byte[] buffer = new byte[4096];
        // int readLength = 0;
        // while ((readLength = in.read(buffer)) > 0) {
        //     byte[] bytes = new byte[readLength];
        //     System.out.println(readLength);
        //     System.arraycopy(buffer, 0, bytes, 0, readLength);
        //     out.write(bytes);
        // }
        //
        // out.flush();

        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        StringBuilder builder = new StringBuilder();
        String line = null;
        while ((line = reader.readLine()) != null) {
            String[] ele = line.split(",");
            for (String e : ele) {
                e = new String(e.getBytes("GBK"), "GBK");
                System.out.println(e);
            }
            builder.append(line);
            builder.append("\n"); // append a newline
        }
        System.out.println(new String(builder.toString().getBytes(), "GBK"));
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            if (in != null) {
                in.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        // try {
        //     if (out != null) {
        //         out.close();
        //     }
        // } catch (IOException e) {
        //     e.printStackTrace();
        // }
    }
}
The data sent by the other side is:
王镔,1,北京市海淀区西三旗建材城西二里
The code above parses it as:
???? 1 ?????к???????????????????? 锟斤拷锟斤拷,1,锟斤拷锟斤拷锟叫猴拷锟斤拷锟斤拷锟斤拷锟斤拷锟届建锟侥筹拷锟斤拷锟斤拷锟斤拷
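So the data really is GBK-encoded. I searched online and read "Java 正确的做字符串编码转换" (an article on doing string encoding conversion correctly in Java).
Basically every source says that the charset in new String(bytes, charset) specifies how the bytes are to be read. But the conversion result here was clearly still wrong.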
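To make that statement concrete, here is a minimal sketch of what new String(bytes, charset) does. The class name CharsetRoundTrip and its helper methods are made up for illustration, and it assumes the JDK in use ships the GBK charset. The sample text is written with Unicode escapes so the file compiles regardless of source encoding.

```java
import java.nio.charset.Charset;

public class CharsetRoundTrip {
    // Assumption: the running JDK provides the GBK charset.
    public static final Charset GBK = Charset.forName("GBK");

    // Decode raw bytes; the charset says how those bytes are to be interpreted.
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Mirrors the new String(e.getBytes("GBK"), "GBK") call in the original method:
    // it encodes an already-decoded String and decodes it straight back.
    public static String reencode(String s) {
        return new String(s.getBytes(GBK), GBK);
    }

    public static void main(String[] args) {
        String name = "\u738b\u9554"; // the name from the sample data
        byte[] raw = name.getBytes(GBK); // two bytes per character in GBK

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(decode(raw, GBK).equals(name));

        // Re-encoding a correctly decoded String changes nothing, so it cannot
        // repair a String that was already decoded incorrectly.
        System.out.println(reencode(name).equals(name));
    }
}
```

The point of the sketch is that the charset argument only matters at the moment raw bytes become a String. Calling getBytes and new String on a String that is already wrong just round-trips the wrong characters.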
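Later I consulted "于java字符集编码问题 URLConnection" (a post on Java charset encoding problems with URLConnection).
When the byte stream InputStream is wrapped in the character stream InputStreamReader, just pass in the charset the data was originally encoded with.
The reason is that Java always holds strings in memory in one uniform encoding (Unicode). InputStreamReader converts the incoming bytes into that form as it reads them, and without an explicit charset it decodes them with the platform default. Once the GBK bytes have been decoded with the wrong charset, the later conversions with getBytes() and new String() can no longer restore them.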
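The following sketch illustrates why the damage cannot be undone afterwards. It assumes UTF-8 stands in for the platform default charset (the actual default on the original machine is not stated), and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyDecodeDemo {
    // Assumption: the running JDK provides the GBK charset.
    public static final Charset GBK = Charset.forName("GBK");

    // Decode GBK bytes with the wrong charset. UTF-8 is used here as a
    // stand-in for whatever the platform default happens to be.
    public static String misdecode(String original) {
        return new String(original.getBytes(GBK), StandardCharsets.UTF_8);
    }

    // Attempt the kind of after-the-fact repair the original method tried:
    // turn the damaged String back into bytes and decode those as GBK.
    public static String repair(String damaged) {
        return new String(damaged.getBytes(StandardCharsets.UTF_8), GBK);
    }

    public static void main(String[] args) {
        String original = "\u738b"; // first character of the sample data
        String damaged = misdecode(original);

        // The invalid byte sequences were replaced by U+FFFD, so the
        // original bytes are no longer present in the String.
        System.out.println(damaged.contains("\uFFFD"));

        // Re-decoding therefore cannot bring the original text back.
        System.out.println(repair(damaged).equals(original));
    }
}
```

Under this assumption the decoder substitutes the replacement character U+FFFD for bytes it cannot interpret. That would be consistent with the runs of ? and 锟斤拷 in the output above, although the exact garbage depends on the default charset of the machine.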
So in the code, change the reader to:
BufferedReader reader = new BufferedReader(new InputStreamReader(in,"GBK"));
and delete the getBytes() calls. That's all it takes.
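Putting the fix together, the download method might look roughly like the sketch below. This is not the original author's final code: the class name GbkDownload, the readAll helper and the choice to return the text instead of printing it are all assumptions made for illustration, as is the hard-coded GBK charset (a real server may declare a different one in its Content-Type header).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;

public class GbkDownload {
    // Assumption: the server always sends GBK, as in the case described above.
    public static final Charset GBK = Charset.forName("GBK");

    // Read the whole stream as text, decoding with the given charset
    // at the point where bytes become characters.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder builder = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                builder.append(line).append('\n');
            }
        }
        return builder.toString();
    }

    // Fetch the URL with GET and return the body decoded as GBK.
    public static String download(String urlString) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(urlString).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(60000); // 1 min
        conn.setReadTimeout(60000);    // 1 min
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, GBK);
        } finally {
            conn.disconnect();
        }
    }
}
```

Because the decoding happens once, inside InputStreamReader, the returned String can be split and printed directly with no further getBytes() or new String() calls.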