Problem description: when using Python's requests.request() to fetch web page data, the Chinese text is displayed as garbled characters.
Example:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# @Time : 2021/9/7 9:35
# @Author : Sun
# @Email : [email protected]
# @File : sun_test.py
# @Software: PyCharm
import requests
from bs4 import BeautifulSoup
url = "https://www.****.cc/*******/"
header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
}
response = requests.request(method="GET", url=url, headers=header)
result = response.text
print(result)
# Inspect the Content-Type entry in the response headers
print(response.headers)
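Result: the Chinese text in the printed page comes out garbled.
The reason: the response's Content-Type header (printed above) does not declare a charset, so requests falls back to ISO-8859-1 when decoding the body into response.text, even though the page is actually UTF-8. Latin-1 (also written latin1) is an alias of ISO-8859-1. Because ISO-8859-1 maps every byte value 0-255 to exactly one character, this wrong decoding loses nothing: encoding the garbled text back with latin1 recovers the original bytes, which can then be decoded correctly as UTF-8. Therefore, to get the Chinese text, first encode and then decode: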
response = requests.request(method="GET", url=url, headers=header)
# Re-encode with latin1 to recover the raw bytes, then decode them as UTF-8
result2 = response.text.encode("latin1").decode("utf-8")
print(result2)
Result: the Chinese text is now displayed correctly.
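To see why the latin1 round trip works without needing network access, here is a minimal, self-contained sketch. It assumes the server sends UTF-8 bytes and that requests decodes them as ISO-8859-1, as described above; the sample string is made up purely for the demo. It simulates the raw bytes (what response.content holds), reproduces the garbled text, and then shows both the encode/decode fix and the simpler option of decoding the raw bytes directly.

```python
# Offline demonstration: no requests call is made here.

# Made-up page body for the demo; any Chinese text behaves the same way.
page = "<title>中文测试</title>"
raw_bytes = page.encode("utf-8")  # stands in for response.content

# Simulate the fallback described above: UTF-8 bytes decoded as ISO-8859-1.
garbled = raw_bytes.decode("iso-8859-1")  # stands in for response.text
print(garbled)  # mojibake instead of Chinese

# Fix 1 (the approach in this post): latin1 maps each byte 0-255 to exactly
# one character, so encoding back with latin1 restores the original bytes.
restored_bytes = garbled.encode("latin1")
fixed = restored_bytes.decode("utf-8")
print(fixed)

# Fix 2: skip the round trip and decode the raw bytes as UTF-8 directly.
direct = raw_bytes.decode("utf-8")
print(direct)
```

With requests itself, the equivalent of Fix 2 is response.content.decode("utf-8"), or setting response.encoding = "utf-8" before reading response.text. Both avoid relying on the lossless latin1 round trip and state the intended encoding explicitly.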