Python Web Scraping: Crawling a Novel from a Website - 代码天地


Category: Other · 2020-04-11 11:39:51

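In my previous post I used a class-based approach to scrape a single chapter from a single page. When I tried to extend that to a whole novel I ran into some difficulties, so here I first try a plain function-based version to see what it produces.
The code is as follows: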

'''
Function-based version:
scrape the novel 龙井迷案 from the 17K novel site (17k.com)
'''

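# Imports: requests, lxml and fake_useragent are third-party; time is standard library
import requests
from lxml import etree
import time
from fake_useragent import UserAgent
# Pick a random User-Agent once at startup; the same headers are reused for every request
ua = UserAgent()
headers = {'User-Agent': ua.random}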


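# Fetch a URL and return its HTML text
def get_html(url):
    time.sleep(1)  # pause before each request to go easy on the server
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()  # fail early on HTTP error responses
    html = resp.content.decode()  # decode() defaults to UTF-8
    return html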


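# Parse the table-of-contents page and collect the chapter links
def paser_html(html):
    novel = {}
    e = etree.HTML(html)
    # The code assumes the hrefs are site-relative, so the domain is prefixed
    href = e.xpath('//dl[@class="Volume"]/dd/a/@href')
    href = ["https://www.17k.com" + i for i in href]
    novel["href"] = href
    return novel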


# Fetch every chapter page and collect its paragraphs
def paser_detail(novel):
    text1 = []
    for url in novel["href"]:
        # get_html already sleeps for 1 second before each request
        parg = get_html(url)
        e = etree.HTML(parg)
        paragraphs = e.xpath('//div[@class="p"]/p/text()')
        text1.extend(paragraphs)
    return text1


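# Save the collected text to a file
def save_page(text):
    # Open the file once, with an explicit encoding so the Chinese text
    # is written the same way on every platform
    with open('不愿负你，孤独一生.txt', 'a', encoding='utf-8') as f:
        for line in text:
            f.write(line + '\n')  # one paragraph per line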


# Main function: fetch the chapter list, scrape each chapter, save the text
def main():
    url = "https://www.17k.com/list/3080392.html"
    html = get_html(url)
    novel = paser_html(html)
    text = paser_detail(novel)
    save_page(text)


# Run the script
if __name__ == '__main__':
    main()
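
In `paser_html` above, chapter links are made absolute by prefixing `https://www.17k.com` to each href. That works only while every href is site-relative. As a minimal sketch of an alternative, the standard library's `urllib.parse.urljoin` resolves each href against the listing page URL and leaves already-absolute links unchanged. The helper name `absolutize` and the sample hrefs are hypothetical, chosen only for illustration; they are not taken from the site.

```python
from urllib.parse import urljoin

# The listing page URL used in main() above.
BASE_URL = "https://www.17k.com/list/3080392.html"


def absolutize(hrefs, base_url=BASE_URL):
    """Resolve each href against base_url (hypothetical helper)."""
    return [urljoin(base_url, h) for h in hrefs]


# Hypothetical hrefs for illustration: one site-relative, one already absolute.
sample = [
    "/chapter/3080392/1.html",
    "https://www.17k.com/chapter/3080392/2.html",
]
links = absolutize(sample)
print(links)
```

With this approach the list comprehension in `paser_html` could call `urljoin` instead of concatenating strings, which avoids doubled domains if the site ever switches to absolute links.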

Scraping result:
(Two screenshots in the original post; images not preserved.)
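
The save step can be checked without touching the network by running it on a few sample paragraphs. The following is a minimal sketch, not the code from the post: the function name `save_paragraphs`, the sample strings, and the temporary output path are all assumptions made for illustration. Unlike `save_page`, it opens the file in `'w'` mode so a rerun does not append a second copy, strips surrounding whitespace (including the full-width indent common in Chinese prose), and skips empty paragraphs.

```python
import os
import tempfile


def save_paragraphs(paragraphs, path):
    """Write non-empty paragraphs to path, one per line, as UTF-8.

    Returns the number of paragraphs written. (Hypothetical helper.)
    """
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for p in paragraphs:
            p = p.strip()  # also removes full-width spaces (U+3000)
            if p:
                f.write(p + "\n")
                written += 1
    return written


# Made-up sample paragraphs: an indented one, a blank one, one with a newline.
sample = ["\u3000\u3000第一段。", "   ", "第二段。\n"]
out_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
count = save_paragraphs(sample, out_path)

# Read the file back to confirm what was written.
with open(out_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(count, lines)
```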

warm...


Reposted from blog.csdn.net/qq_46292926/article/details/104818107
