Java: why a garbled "?" appears at the start of the first line when reading a txt (or other) file as UTF-8, and how to fix it

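Last year, while building a random-joke feature, I ran a crawler on my server and had it write the scraped jokes into an ordered txt file. I needed to read that file remotely so the joke data could be accessed from outside, and on reading it the very first character of the first line came out as a garbled "?". At first I assumed "\u000000" was the culprit, but printing the text showed I was wrong. I then thought of printing the raw bytes, and that is what finally cracked it. Today, while writing a tracking lookup covering more than 150 courier companies, I had to read my own txt list of couriers from the server and wrap it as XML or JSON for my own use, and I ran into the same problem again. So I think it is worth sharing both the fix and the cause.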

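A UTF-8 file written by Java itself reads back as exactly the data it contains. If the same content is saved as UTF-8 with Windows Notepad, however, the program reads one extra invisible character at the very start of the file, and the console shows it as "?". The reason is that Notepad prepends a byte order mark (BOM, the three bytes EF BB BF), and Java's UTF-8 decoder does not strip it, so it is decoded as the character U+FEFF. This looks like a bug in how Java handles UTF-8 files with a BOM; it shows up in JDK 1.5 through 1.7 (I am on 1.7). The workaround is to go down to the byte level: drop the BOM bytes and rebuild the content from the bytes that remain.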
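To make the cause concrete, here is a minimal sketch that decodes a BOM-prefixed byte array and then removes the resulting character. The class name `BomDemo` and the helper `stripBom` are made up for this illustration, and it works on the decoded string rather than on the bytes, so treat it as one possible way to do it rather than the only one.

```java
import java.nio.charset.StandardCharsets;

final class BomDemo {

    /** Removes a single leading U+FEFF (the decoded BOM), if present. */
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 BOM, followed by the ASCII bytes for "abc".
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c'};
        String decoded = new String(raw, StandardCharsets.UTF_8);

        System.out.println(decoded.length());                 // 4: the BOM became one char
        System.out.println(decoded.charAt(0) == '\uFEFF');    // true
        System.out.println(stripBom(decoded));                // abc
    }
}
```

The invisible U+FEFF is what many consoles render as "?", and it is also what can break a JSON or XML parser that expects the text to start with `{` or `<`.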

Let's walk through an example.

The original, unprocessed version:

	// Needs java.io.BufferedReader, java.io.InputStream, java.io.InputStreamReader,
	// java.net.HttpURLConnection and java.net.URL
	public static void HttpRequest(String Request){
		StringBuffer buffer = new StringBuffer();
		try {
			URL url = new URL(Request);
			HttpURLConnection connection = (HttpURLConnection) url.openConnection();
			// A GET request has no body, so setDoOutput(true) is not needed
			connection.setDoInput(true);
			connection.setUseCaches(false);
			connection.setRequestMethod("GET");
			InputStream input = connection.getInputStream();
			InputStreamReader read = new InputStreamReader(input,"UTF-8");
			BufferedReader reader = new BufferedReader(read);
			String line;
			while((line=reader.readLine())!=null){
				buffer.append(line);
			}
			String lines = buffer.toString();
			byte[] allbytes = lines.getBytes("UTF-8");
			System.out.print("Original byte array, not yet rearranged, length "+allbytes.length+":");
			for (int i=0; i < allbytes.length; i++)
			{
				// Mask the byte to 0..255 and print it as two uppercase hex digits,
				// separated by a space
				System.out.print(" "+String.format("%02X", allbytes[i] & 0xFF));
			}
			System.out.println("\n"+new String(allbytes,"UTF-8")+"\n");
			// Closing the outermost reader also closes the wrapped streams
			reader.close();
			connection.disconnect();
		} catch (Exception e) {
			// Do not swallow the error silently
			e.printStackTrace();
		}
	}

The version with the fix:

	// JSONObject.fromObject suggests net.sf.json.JSONObject (json-lib); the post does not say
	public static void HttpRequest2(String Request){
		JSONObject jsonObject = null;
		StringBuffer buffer = new StringBuffer();
		try {
			URL url = new URL(Request);
			HttpURLConnection connection = (HttpURLConnection) url.openConnection();
			connection.setDoInput(true);
			connection.setUseCaches(false);
			connection.setRequestMethod("GET");
			InputStream input = connection.getInputStream();
			InputStreamReader read = new InputStreamReader(input,"UTF-8");
			BufferedReader reader = new BufferedReader(read);
			String line;
			while((line=reader.readLine())!=null){
				buffer.append(line);
			}
			String lines = buffer.toString();
			byte[] allbytes = lines.getBytes("UTF-8");
			// Skip the 3-byte UTF-8 BOM (EF BB BF) only when it is really there;
			// dropping 3 bytes unconditionally would corrupt BOM-free input
			int skip = 0;
			if (allbytes.length >= 3 && (allbytes[0] & 0xFF) == 0xEF
					&& (allbytes[1] & 0xFF) == 0xBB && (allbytes[2] & 0xFF) == 0xBF) {
				skip = 3;
			}
			byte[] new_allbytes = new byte[allbytes.length-skip];
			System.out.print("\nByte array after dropping the BOM, length "+new_allbytes.length+":");
			for (int i=skip; i < allbytes.length; i++)
			{
				// With a BOM, index 3 (the fourth byte) of the original array
				// becomes index 0 of the new array, and so on
				new_allbytes[i-skip] = allbytes[i];
				// Two uppercase hex digits per byte, separated by spaces
				System.out.print("  "+String.format("%02X", new_allbytes[i-skip] & 0xFF));
			}
			String content = new String(new_allbytes,"UTF-8");
			System.out.print("\n"+content);
			// Closing the outermost reader also closes the wrapped streams
			reader.close();
			connection.disconnect();
			// Parse the BOM-free text; buffer.toString() would still start with the BOM
			jsonObject = JSONObject.fromObject(content);
		} catch (Exception e) {
			// Do not swallow the error silently
			e.printStackTrace();
		}
	}
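An alternative to trimming bytes after the whole response has been read is to skip the BOM on the stream itself, before decoding. The sketch below is a minimal illustration of that idea using `PushbackInputStream`; the class `BomSkippingReader` and its methods `open` and `readAll` are hypothetical names, not part of the code above, and it only handles the UTF-8 BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class BomSkippingReader {

    /** Wraps the stream in a UTF-8 reader, consuming a leading EF BB BF if present. */
    static Reader open(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read() may return fewer than 3 bytes, so loop until 3 or EOF.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return new InputStreamReader(pushback, StandardCharsets.UTF_8);
    }

    /** Reads the whole stream as text, without a leading BOM. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = open(in)) {
            char[] buf = new char[4096];
            int len;
            while ((len = reader.read(buf)) != -1) {
                sb.append(buf, 0, len);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '{', '}'};
        System.out.println(readAll(new ByteArrayInputStream(withBom))); // prints {}
    }
}
```

With this approach, `connection.getInputStream()` could be passed to `open` and the resulting reader wrapped in a `BufferedReader` as before, so the rest of the reading loop stays unchanged and input without a BOM passes through untouched.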

Execution and comparison of results: [screenshots not preserved]

Author: Engineer-Jsp
