Common fixes for garbled text (mojibake) in web scrapers

Method 1: Set the response's encoding

import requests
res = requests.get("https://www.baidu.com/")
# apparent_encoding is guessed from the response body, not taken from the headers
res.encoding = res.apparent_encoding
print(res.text)
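
To see why setting the encoding matters, the sketch below reproduces the garbling offline, without any network request. It assumes the most common cause: the body is UTF-8 but is decoded as ISO-8859-1, which is what requests falls back to for text/* responses whose Content-Type header names no charset. The sample string and variable names are illustrative only.

```python
# Bytes as a server might send them: UTF-8 encoded Chinese text (sample data).
raw = "百度一下".encode("utf-8")

# Decoding with the wrong codec does not raise; it silently yields mojibake.
garbled = raw.decode("iso-8859-1")

# Decoding with the right codec recovers the original text.
fixed = raw.decode("utf-8")

print(repr(garbled))
print(fixed)
```

Assigning res.encoding before reading res.text makes requests perform the second kind of decode. Keep in mind that apparent_encoding comes from a charset-detection library, so it is a guess and can be wrong on very short pages.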

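Method 2: Decode the response's content yourself

import requests
res = requests.get("https://www.baidu.com/")
# 'gbk' is only an example; pass the encoding the target page actually uses
txt = res.content.decode('gbk')
print(txt)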
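
If the page's encoding is not known in advance, a hard-coded decode() raises UnicodeDecodeError whenever the guess is wrong. A minimal sketch of one workaround follows, assuming the page is either UTF-8 or a GBK-family encoding; decode_with_fallback and its candidates parameter are hypothetical names introduced here, not part of requests.

```python
def decode_with_fallback(raw: bytes, candidates=("utf-8", "gb18030")) -> str:
    """Try each candidate encoding in order; return the first clean decode."""
    for name in candidates:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: decode anyway, marking bad bytes with U+FFFD.
    return raw.decode(candidates[0], errors="replace")


print(decode_with_fallback("中文".encode("gbk")))    # GBK-encoded input
print(decode_with_fallback("中文".encode("utf-8")))  # UTF-8-encoded input
```

With a real response, this would be called as decode_with_fallback(res.content). UTF-8 goes first because it is strict and rejects most non-UTF-8 input, while gb18030 accepts far more byte sequences and would mask a UTF-8 page if tried first.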

Method 3: Use chardet

import requests
import chardet
res = requests.get("https://www.baidu.com/")
# detect() returns a dict; 'encoding' can be None, so fall back to utf-8
encoding = chardet.detect(res.content)['encoding'] or 'utf-8'
print(res.content.decode(encoding))

Method 4: Use cchardet

import requests
import cchardet
res = requests.get("https://www.baidu.com/")
# cchardet mirrors chardet's detect() API; 'encoding' can be None here too
encoding = cchardet.detect(res.content)['encoding'] or 'utf-8'
print(res.content.decode(encoding))
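
Methods 3 and 4 differ only in which detector they call, and both detectors report a confidence alongside the encoding. The sketch below wraps that shared pattern so a missing or low-confidence guess falls back to a default. decode_detected, min_confidence, and fake_detect are hypothetical names made up for this illustration, and the 0.5 threshold is an arbitrary assumption, not a recommended value.

```python
from typing import Callable, Optional


def decode_detected(raw: bytes,
                    detect: Callable[[bytes], dict],
                    default: str = "utf-8",
                    min_confidence: float = 0.5) -> str:
    """Decode raw with the detector's guess, or with default if the guess is weak."""
    guess = detect(raw)
    encoding: Optional[str] = guess.get("encoding")
    confidence = guess.get("confidence") or 0.0
    if not encoding or confidence < min_confidence:
        encoding = default
    return raw.decode(encoding, errors="replace")


# Stand-in detector for demonstration only; it always reports "unknown".
def fake_detect(raw: bytes) -> dict:
    return {"encoding": None, "confidence": 0.0}


print(decode_detected("编码".encode("utf-8"), fake_detect))
```

In a scraper, the detector argument would be chardet.detect or cchardet.detect, for example decode_detected(res.content, chardet.detect).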

Method 5: encode + decode

import requests
import cchardet
res = requests.get("https://www.baidu.com/")
res_encoding = res.encoding  # encoding requests used to decode res.text
con_encoding = cchardet.detect(res.content)['encoding']  # encoding detected from the body
print(res.text.encode(res_encoding).decode(con_encoding))  # re-encode the text, then decode it correctly
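
The encode + decode round trip only works when the first, wrong decode lost no bytes. The offline sketch below shows both outcomes under that assumption; redecode is a hypothetical helper name, and the sample strings stand in for a real response body.

```python
def redecode(text: str, wrong: str, right: str) -> str:
    """Undo a decode done with `wrong`, then redo it with `right`."""
    return text.encode(wrong).decode(right)


# Case 1: ISO-8859-1 maps every byte to a character, so nothing is lost
# and the round trip restores the original text.
utf8_raw = "中文".encode("utf-8")
garbled = utf8_raw.decode("iso-8859-1")
repaired = redecode(garbled, "iso-8859-1", "utf-8")
print(repaired)

# Case 2: GBK bytes decoded as UTF-8 with errors="replace" turn into U+FFFD,
# so the original bytes cannot be rebuilt from the text.
gbk_raw = "中文".encode("gbk")
lossy = gbk_raw.decode("utf-8", errors="replace")
print(lossy.encode("utf-8") == gbk_raw)
```

requests builds res.text with errors="replace", so case 2 applies whenever res.encoding is not a byte-preserving codec such as ISO-8859-1. In that situation, decoding res.content directly (Methods 2 to 4) is the safer route.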

Reposted from blog.csdn.net/m0_46521785/article/details/127116209