import threading
import requests
import re
def parse_page(url):
    """Fetch one listing page of gushiwen.org and print every poem on it.

    Parameters
    ----------
    url : str
        Full URL of one listing page of the poetry site.

    Side effects: performs an HTTP GET and prints one dict per poem
    (title, dynasty, author, body) to stdout. Returns None.
    """
    headers = {
        # Browser-like User-Agent so the site does not reject the request.
        # (Original key/value were whitespace-mangled: 'user - agent',
        # 'Mozilla / 5.0(...' — reconstructed to the standard form.)
        'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/81.0.4044.129 Safari/537.36')
    }
    # BUG FIX: the original called requests.get(url, headers), which binds
    # the dict to the positional `params` argument (query string), so the
    # header was never sent.  It must be passed as keyword `headers=`.
    response = requests.get(url, headers=headers)
    text = response.text
    # re.DOTALL lets '.' cross newlines — the markup spans multiple lines.
    titles = re.findall(r'<div\sclass="cont">.*?<b>(.*?)</b>', text, re.DOTALL)
    dynasties = re.findall(r'<p\sclass="source">.*?<a\s.*?>(.*?)</a>', text)
    authors = re.findall(
        r'<p\sclass="source">.*?<a\s.*?>.*?</a>.*?<a\s.*?>(.*?)</a>', text)
    contents = re.findall(r'<div class="contson" .*?>(.*?)</div>', text, re.DOTALL)
    # Strip residual inline HTML tags from each poem body.
    cleaned = [re.sub(r'<.*?>', "", c).strip() for c in contents]
    poems = []
    for title, dynasty, author, content in zip(titles, dynasties, authors, cleaned):
        poems.append({
            '名字': title,    # title
            '朝代': dynasty,  # dynasty
            '作者': author,   # author
            '诗句': content,  # poem text
        })
    for poem in poems:
        print(poem)
def splider():
    """Scrape listing pages 1-10 of gushiwen.org concurrently.

    Starts one thread per page running ``parse_page`` and waits for all
    of them to finish before returning.

    BUG FIX: the original re-created the ``thread`` list on every loop
    iteration, so the final join-loop only ever waited on the LAST thread
    started; the other nine could still be running when splider() returned.
    The list is now created once, before the loop, so every worker is
    tracked and joined.
    """
    threads = []
    for page in range(1, 11):
        url = 'https://www.gushiwen.org/default_%s.aspx' % page
        worker = threading.Thread(target=parse_page, args=(url,))
        threads.append(worker)
        worker.start()
    # Wait for every page-worker, not just the last one.
    for worker in threads:
        worker.join()
if __name__ == '__main__':
    # Entry point: run the multi-threaded scraper, then print a separator
    # line so the end of the output is easy to spot.
    splider()
    separator = '+' * 20
    print(separator)
# python scraper for the classical-poetry site gushiwen.org
# ("You may also like" / "Today's picks" / "Weekly ranking" — CSDN page chrome)
# Reposted from blog.csdn.net/weixin_45949073/article/details/106102323