Scrapy Request method

Other 2020-03-19 10:34:08

# -*- coding: utf-8 -*-
import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = ['yeves.cn']
    start_urls = ['https://yeves.cn/']

    def parse(self, response):
        # Select the article blocks on the home page; each holds a title and a link.
        articles = response.xpath('//*[@id="article"]//div')

        for article in articles:
            title = article.xpath('./div/article/div/header/h2/a/text()').extract_first()
            href = article.xpath('./div/article/div/header/h2/a/@href').extract_first()
            if title is not None and href is not None:
                # urljoin resolves both relative and absolute hrefs against the page URL.
                url = response.urljoin(href)
                # Follow the link to the detail page and carry the title along in meta.
                yield scrapy.Request(url, callback=self.parse_detail, meta={"title": title})

    def parse_detail(self, response):
        self.logger.debug('Parsing detail page %s', response.url)
        detail = {}
        # The title was attached to the request in parse() through meta.
        detail['title'] = response.meta.get('title')

        # Extract the fields from the detail page.
        created_at = response.xpath('/html/body/section/div/div/header/div/span[1]/time/text()').extract_first()
        category = response.xpath('/html/body/section/div/div/header/div/span[2]/a/text()').extract_first()
        # Join every text node of the article body; extract_first() returns only the first one.
        content = ''.join(response.xpath('/html/body/section/div/div/article//text()').extract()).strip()

        detail['created_at'] = created_at
        detail['category'] = category
        detail['content'] = content
        yield detail

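
Whether a scraped `href` is a site-relative path or a full URL changes how the request URL should be built. The sketch below compares plain string formatting with the standard library's `urllib.parse.urljoin`; the `/archives/1.html` path and the `build_detail_url` helper are hypothetical and only serve the illustration.

```python
from urllib.parse import urljoin

PAGE_URL = "https://yeves.cn/"


def build_detail_url(href):
    # urljoin resolves a relative href against the page URL and
    # leaves an already absolute href unchanged.
    return urljoin(PAGE_URL, href)


relative_href = "/archives/1.html"                  # hypothetical path
absolute_href = "https://yeves.cn/archives/1.html"  # hypothetical URL

from_relative = build_detail_url(relative_href)
from_absolute = build_detail_url(absolute_href)

# Plain string formatting only works when the href is relative.
naive = "https://yeves.cn{}".format(absolute_href)

print(from_relative)
print(from_absolute)
print(naive)
```

Both joined results point at the same page, while the formatted string for the absolute href ends up with the domain repeated and is not a usable URL.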

Reposted from www.cnblogs.com/php-linux/p/12522364.html
