Python crawler: scrape the titles and links of all posts in the ChainNode blockchain community and save them to a JSON file - Code World


Others 2021-11-28 06:08:52

import requests
from lxml import etree
import json


class BtcSpider:
    def __init__(self):
        # Listing pages are addressed as base_url + page number
        self.base_url = "https://www.chainnode.com/forum/61-"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36"
        }
        self.data_list = []

    # 1. Send the request
    def get_response(self, url):
        # The timeout keeps a stalled connection from hanging the whole crawl
        response = requests.get(url, headers=self.headers, timeout=10)
        data = response.content.decode("utf-8")
        return data

    # 2. Parse the response
    def parse_data(self, data):
        x_data = etree.HTML(data)
        # Read the title and the href from the same <a> element so the two always stay paired
        a_list = x_data.xpath('//a[@class="link-dark-major font-bold bbt-block"]')
        url = "https://www.chainnode.com"
        for a in a_list:
            news = {
                # Strip leading and trailing whitespace from the title
                'name': a.xpath('string(.)').strip(),
                'url': url + a.get('href', ''),
            }
            self.data_list.append(news)

    # 3. Save the data
    def save_data(self):
        data_str = json.dumps(self.data_list, ensure_ascii=False)
        with open('001.json', 'w', encoding="utf-8") as f:
            f.write(data_str)

    # 4. Run the crawl
    def run(self):
        for i in range(1000):
            url = self.base_url + str(i)
            print(url)
            data = self.get_response(url)
            self.parse_data(data)
            # Rewrite the file after every page so the results collected so far are on disk
            self.save_data()


if __name__ == "__main__":
    BtcSpider().run()
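
Once the crawl has written its output, the file can be read back as a list of records with `name` and `url` keys. The following is only a minimal sketch of that step. The helper names `load_posts` and `dedupe_by_url` are hypothetical, the sample records and their URLs are made up for the demo, and the deduplication assumes that the same post may show up on more than one listing page (for example a pinned post), which the original post does not say.

```python
import json
import os
import tempfile


def load_posts(path):
    """Read the saved JSON file and return its list of {'name', 'url'} records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def dedupe_by_url(posts):
    """Keep the first record seen for each URL, preserving the original order."""
    seen = set()
    unique = []
    for post in posts:
        if post["url"] not in seen:
            seen.add(post["url"])
            unique.append(post)
    return unique


# Demo with made-up records written to a temporary file,
# serialized the same way as save_data (ensure_ascii=False, UTF-8).
sample = [
    {"name": "帖子 A", "url": "https://www.chainnode.com/post/1"},
    {"name": "帖子 B", "url": "https://www.chainnode.com/post/2"},
    {"name": "帖子 A", "url": "https://www.chainnode.com/post/1"},
]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.json")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False))

posts = dedupe_by_url(load_posts(demo_path))
print(len(posts))
```

To use it on real output, pass `'001.json'` to `load_posts` after the crawler has run.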


Origin blog.csdn.net/weixin_43912367/article/details/105001687
