Scraping Qidian with Scrapy

# coding=utf-8
import scrapy
from qidian.items import QidianItem


class QiDian(scrapy.Spider):
    name = 'qidian_spider'
    start_urls = ['https://www.qidian.com/all']

    def parse(self, response):
        '''Request the listing pages: the page just fetched plus pages 2-4.'''
        # Re-request the first listing page so it is handled by item_parse;
        # dont_filter=True prevents the dupefilter from dropping the repeated URL.
        yield scrapy.Request(response.url, callback=self.item_parse, dont_filter=True)
        item_links = ['https://www.qidian.com/all?orderId=&style=1&pageSize=20&siteid=1&pubflag=0&hiddenField=0&page={}'.format(i) for i in range(2, 5)]
        for item_link in item_links:
            yield scrapy.Request(item_link, callback=self.item_parse)

    def item_parse(self, response):
        '''Extract the URL of every book on a listing page.'''
        urls = response.xpath('//*[@class="all-img-list cf"]/li/div/h4/a/@href').extract()
        for url in urls:
            yield scrapy.Request(response.urljoin(url), callback=self.book_parse)

    def book_parse(self, response):
        '''Scrape the book title and author from a book's detail page.'''
        item = QidianItem()
        # extract_first() returns None instead of raising IndexError if a node is missing
        item['book_name'] = response.xpath('//*[@class="book-info "]/h1/em/text()').extract_first()
        item['Author'] = response.xpath('//*[@class="book-info "]/h1/span/a/text()').extract_first()
        yield item
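
The spider imports QidianItem from qidian.items. The item class itself is not shown in the original post; based on the two fields the spider assigns, a minimal items.py would look roughly like this (an assumed sketch, not the author's exact file):

# qidian/items.py -- assumed definition matching the fields used above
import scrapy


class QidianItem(scrapy.Item):
    book_name = scrapy.Field()  # book title, taken from the <em> tag inside .book-info
    Author = scrapy.Field()     # author name, taken from the <a> tag inside .book-info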

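With the project in place, the spider can be run from the project root with the standard Scrapy command; the output file name here is just an example:

scrapy crawl qidian_spider -o books.json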

Reproduced from blog.csdn.net/qq_18525247/article/details/82287368