The character encoding problem when simulating HTTP access with a socket and reading the returned data

The code that simulates the request over a socket is as follows:

 

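import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class HttpTest {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes the socket even if the request fails
        try (Socket socket = new Socket("www.baidu.com", 80)) {
            OutputStream os = socket.getOutputStream();
            StringBuilder sb = new StringBuilder();
            // Origin-form request target; the host name goes in the Host header
            sb.append("GET / HTTP/1.1\r\n");
            sb.append("Host: www.baidu.com\r\n");
            // Ask the server to close the connection so that read() eventually returns -1
            sb.append("Connection: close\r\n");
            sb.append("\r\n");
            // The request is plain ASCII; name the charset instead of relying on the platform default
            os.write(sb.toString().getBytes(StandardCharsets.US_ASCII));
            os.flush();
            InputStream iStream = socket.getInputStream();
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] bss = new byte[1024];
            int len;
            // Copy the raw response bytes (headers and body) without interpreting them
            while ((len = iStream.read(bss)) != -1) {
                baos.write(bss, 0, len);
            }
            // Decode with the charset the server used for the body ("gbk" in my case)
            System.out.println(new String(baos.toByteArray(), "gbk"));
        }
    }
}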
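
The listing above hard-codes the charset. As a rough sketch of an alternative, the charset can be read from the response's Content-Type header before decoding. The class name `ResponseCharset` and the method `detect` are hypothetical names introduced for this illustration, and the sketch assumes the header carries a `charset=` parameter; it does not look at HTML meta tags or handle chunked bodies.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of the original post.
final class ResponseCharset {

    // Returns the charset named in the Content-Type header, or the fallback
    // when no charset is declared or the name is not known to this JVM.
    static Charset detect(byte[] response, Charset fallback) {
        // ISO-8859-1 maps every byte to one char, so the ASCII headers survive intact.
        String text = new String(response, StandardCharsets.ISO_8859_1);
        int end = text.indexOf("\r\n\r\n");
        String head = end >= 0 ? text.substring(0, end) : text;
        for (String line : head.split("\r\n")) {
            String lower = line.toLowerCase(Locale.ROOT);
            if (!lower.startsWith("content-type:")) {
                continue;
            }
            int i = lower.indexOf("charset=");
            if (i < 0) {
                return fallback;
            }
            String name = line.substring(i + "charset=".length()).trim();
            int semi = name.indexOf(';');
            if (semi >= 0) {
                name = name.substring(0, semi).trim();
            }
            name = name.replace("\"", "");
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Covers both illegal and unsupported charset names.
                return fallback;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Sample response made up for this demonstration.
        byte[] sample = ("HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/html; charset=utf-8\r\n"
                + "\r\n"
                + "<html></html>").getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(sample, StandardCharsets.ISO_8859_1));
    }
}
```

With such a helper, the last line of the request code could become `new String(data, ResponseCharset.detect(data, Charset.forName("GBK")))`, where `data` is the array returned by `baos.toByteArray()`.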

 

The question here is why "gbk" is passed when constructing the String. My Eclipse workspace defaults to UTF-8 while my operating system's default charset is GBK, but neither setting decides how the response is encoded. The socket delivers exactly the bytes the server sent; it does not transcode them, and ByteArrayOutputStream stores them unmodified. The charset passed to new String(bytes, charset) therefore has to match the encoding the server used for the body, which is normally declared in the Content-Type response header or in the page's meta tag. "gbk" gives readable output only when the response really is GBK-encoded; for a UTF-8 response it would produce garbled text.

Then why is there no error when the result is displayed in Eclipse, which is set to UTF-8? Because new String(bytes, charset) only tells the JVM which encoding to use when interpreting the bytes. For example, the two bytes 0xB3 0xC2 (46018 when read as one 16-bit number) are interpreted by GBK as the character 陈 (Chen). Inside the virtual machine a String is held as Unicode (UTF-16) regardless of where its bytes came from, so this character is stored as U+9648. It is encoded again, for example as UTF-8, only when it is written to the console or to a file.
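
The "Chen" example can be checked with a few lines of Java. This is only an illustrative sketch: `DecodeDemo` and `decode` are hypothetical names, and it assumes the JDK in use ships the GBK charset (standard JDK distributions do, but a stripped-down runtime might not).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
final class DecodeDemo {

    // Interprets the same bytes under a caller-chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // 0xB3 0xC2 is 46018 when read as one 16-bit number.
        byte[] bytes = {(byte) 0xB3, (byte) 0xC2};

        String asGbk = decode(bytes, Charset.forName("GBK"));
        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);

        // Print the code point in hex so the output does not depend on the console encoding.
        System.out.println("GBK decodes to U+"
                + Integer.toHexString(asGbk.charAt(0)).toUpperCase());
        // The same bytes are not valid UTF-8, so this decoding yields different text.
        System.out.println("Same text under UTF-8: " + asGbk.equals(asUtf8));
        // Encoding the decoded character again gives different bytes per charset.
        System.out.println("Length as UTF-8 bytes: "
                + asGbk.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The point of the sketch is that the charset argument changes only how the bytes are interpreted; once decoded, the String holds the same Unicode character no matter which encoding the editor or console is set to.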
