Python Crawler Primer 6: Handling HTTP Message Body Compression When Simulating Browser Access to Web Pages

☞ Go to the LaoYuanPython blog: https://blog.csdn.net/LaoYuanPython

I. Introduction

The previous chapter introduced how to access web pages using the request module of the urllib package. It also specifically noted that it is best not to set Accept-Encoding in the HTTP request header; otherwise the server may, depending on that field and its own configuration, compress the HTTP message body. If the crawler application does not support decompression, it will be unable to interpret the body of the response message it receives. This section briefly introduces how to handle compression of the response message body.

When a crawler requests a web page with "'Accept-Encoding': 'gzip'" in the request header, the server may compress the message body with gzip, and the client must be able to decompress that body in order to read it. No extra installation is required: the gzip module is part of Python's standard library. The crawler simply imports it, checks whether the server compressed the body when the HTTP response message arrives, decompresses the body if it did, and otherwise reads the body directly.
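To make the compression round trip concrete, here is a minimal sketch using only the standard-library gzip module. The HTML snippet is a made-up placeholder; the point is that gzip.decompress reverses what the server's gzip compression produced.

```python
import gzip

# A stand-in for an HTML page body (placeholder content).
original = b'<html><body>hello</body></html>'

# What a gzip-capable server would transmit on the wire.
compressed = gzip.compress(original)

# What the crawler must do before it can parse the page.
restored = gzip.decompress(compressed)
assert restored == original
```

The compressed bytes are unreadable as HTML until decompressed, which is exactly why a crawler that advertises gzip support must also implement this step.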

II. Processing steps for a crawler that supports compressed HTTP response message bodies

To support a compressed HTTP response message body, the crawler application needs to perform the following steps:

  1. Set the supported compression format(s) in the Accept-Encoding field of the request message's HTTP header
  2. After receiving the response, check the Content-Encoding field in the response message header to determine the compression format
  3. Call the corresponding decompression method to decompress the message body.
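Steps 2 and 3 above can be captured in a small helper. This is a sketch, not the article's own code: the function name decode_body is invented here, and only gzip is handled, matching the format used in this section.

```python
import gzip

def decode_body(raw: bytes, content_encoding: str) -> bytes:
    """Decompress a response body according to the Content-Encoding header.

    raw              -- the body bytes as read from the response
    content_encoding -- the Content-Encoding header value ('' if absent)
    """
    if content_encoding.lower() == 'gzip':
        # Server compressed the body with gzip (step 3: decompress it).
        return gzip.decompress(raw)
    # No recognized compression: return the bytes unchanged.
    return raw
```

In a real crawler, content_encoding would come from something like resp.headers.get('Content-Encoding', '') on the response object.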

III. Case study

  1. Import related modules:
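The article is cut off here, but given the steps above, the case presumably relies on the standard-library urllib.request and gzip modules. A hedged sketch of the imports and of step 1 (advertising gzip support in the request header) follows; the URL is just a placeholder.

```python
import gzip                # decompress a gzip-encoded response body
import urllib.request      # build the request and fetch the page

# Step 1: advertise gzip support in the request header.
url = 'https://blog.csdn.net/LaoYuanPython'  # placeholder URL
req = urllib.request.Request(url, headers={'Accept-Encoding': 'gzip'})
```

Passing req to urllib.request.urlopen would then return a response whose Content-Encoding header tells the crawler whether decompression is needed.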


Origin blog.csdn.net/LaoYuanPython/article/details/113068701