A Detailed Look at Encoding Issues in Python Web Scraping (requests)

Category: Other · 2020-04-18 08:40:29

For details, see: https://blog.csdn.net/Likianta/article/details/101293915

import requests

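def get_text(resp):
    # Prefer the encoding predicted by chardet (apparent_encoding);
    # fall back to the encoding declared in the HTTP headers (resp.encoding).
    source_encoding = resp.apparent_encoding or resp.encoding
    if source_encoding is None:
        # No encoding at all: probably a binary file such as a PDF or JPEG.
        raise ValueError('no text encoding detected; response may be binary')
    if source_encoding.upper() == 'GB2312':
        # GBK is a superset of GB2312, so it can decode more characters.
        source_encoding = 'GBK'
    return resp.content.decode(source_encoding, errors="ignore")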

# Test against a "problem" page
url = 'http://www.most.gov.cn/ztzl/gjkxjsjldh/jldh2002/zrj/zrjml.htm'
response = requests.get(url)
text = get_text(response)
# text = response.text  # no longer used

# Save the result to a file
with open('result.html', 'w', encoding='utf-8') as f:
    f.write(text)
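
The GB2312-to-GBK substitution in get_text can be illustrated offline. The following is a minimal sketch, not part of the original post: the sample string is an assumption chosen because the character 镕 is encodable in GBK but not in GB2312. It suggests that bytes from a page labeled GB2312 may still contain GBK-only characters, which a GB2312 decode with errors="ignore" would silently lose.

```python
# Illustrative sample (an assumption, not from the original post):
# the middle character exists in GBK but not in GB2312.
text = "朱镕基"
raw = text.encode("gbk")

# A strict GB2312 decode is expected to fail on the GBK-only character.
try:
    raw.decode("gb2312")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With errors="ignore" the decode succeeds but the text is damaged.
lossy = raw.decode("gb2312", errors="ignore")

# Decoding the same bytes as GBK recovers the original text.
as_gbk = raw.decode("gbk")

print("strict GB2312 decode succeeded:", strict_ok)
print("GB2312 with errors='ignore':", lossy)
print("GBK:", as_gbk)
```

Because GBK keeps the GB2312 code points unchanged, decoding GB2312 content as GBK should not alter text that is genuinely pure GB2312.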

Author: 柴神 (CSDN blog expert)

Reposted from blog.csdn.net/chaishen10000/article/details/103164372
