Solving the Python web crawler error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position"

While writing a Python 3.x crawler, I ran into the error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte". I spent a long time looking for a problem in the file itself. Finally, after a hint from another user, the cause turned out to be one of my request headers:

 

“'Accept-Encoding': 'gzip, deflate'”

 

I had copied this header directly from Fiddler. Why does the page load normally in a browser, but fail when I imitate the same request in Python?
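
The 0x8b in the message is itself a hint. A gzip stream always begins with the two bytes 0x1f 0x8b, so decoding a gzip-compressed body as UTF-8 fails at position 1. A minimal sketch that reproduces the message without any network access (the page body here is made up for illustration):

```python
import gzip

# Hypothetical page body, compressed the way a server would for "gzip" encoding.
body = gzip.compress("<html>hello</html>".encode("utf-8"))

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
print(body[:2].hex())  # 1f8b

error = None
try:
    # Treating the compressed bytes as text is what triggers the error.
    body.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```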

 

Putting together the explanations I found online:

 

This header tells the server that the client accepts compressed data, so the server sends the response body compressed with gzip (a format that uses the DEFLATE algorithm internally). A browser decompresses the body automatically, but the program does not, so it ends up trying to decode the compressed bytes as UTF-8. If you want to keep the header and decompress the response yourself, see https://www.crifan.com/set_accept_encoding_header_to_gzip_deflate_return_messy_code/ for how to set it up.
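
As a rough sketch of the do-it-yourself route, assuming the crawler uses the standard library's urllib.request: check the Content-Encoding response header and decompress before decoding. The names decode_body and fetch are hypothetical helpers written for this illustration, and the charset is assumed to be UTF-8.

```python
import gzip
import urllib.request
import zlib


def decode_body(raw, content_encoding, charset="utf-8"):
    """Decompress raw bytes according to Content-Encoding, then decode to str."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        try:
            raw = zlib.decompress(raw)  # zlib-wrapped deflate
        except zlib.error:
            raw = zlib.decompress(raw, -zlib.MAX_WBITS)  # raw deflate stream
    return raw.decode(charset)


def fetch(url):
    """Hypothetical helper: request a page while still accepting compression."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})
    with urllib.request.urlopen(req) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))


# Offline check with a made-up page body (no network needed).
page = "<html>hello</html>"
print(decode_body(gzip.compress(page.encode("utf-8")), "gzip"))  # <html>hello</html>
```

The fallback for raw deflate is there because some servers send a deflate stream without the zlib wrapper; whether a given site does so is something to verify case by case.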

 

Summary: When writing a crawler, leave out 'Accept-Encoding': 'gzip, deflate' so that the server sends the original, uncompressed content.
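
One way to apply this when headers are copied wholesale from a tool such as Fiddler is to filter the header out before building the request. This is only a sketch, assuming urllib.request is the client; build_request, the URL, and the header values are placeholders invented for the example.

```python
import urllib.request


def build_request(url, copied_headers):
    """Return a Request that omits any Accept-Encoding header."""
    headers = {
        name: value
        for name, value in copied_headers.items()
        if name.lower() != "accept-encoding"
    }
    return urllib.request.Request(url, headers=headers)


# Placeholder headers standing in for ones copied from a captured request.
copied = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Encoding": "gzip, deflate",
}
req = build_request("http://example.com/", copied)
print(req.has_header("Accept-encoding"))  # False
```

When no Accept-Encoding header is set, urllib.request sends "Accept-Encoding: identity" by default, which asks the server for an uncompressed body.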

 

