scrapy - Request callback is not executed, or is executed only once

Category: Other | 2018-10-27 08:55:54

Copyright notice: if you repost this article, please keep the source: https://blog.csdn.net/xc_zhou/article/details/82839374

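In scrapy, suppose a request is issued like this:

scrapy.Request(url, headers=self.header, callback=self.parse)

While debugging, I found that the callback parse was never called. The likely cause is that the request was filtered out: the offsite/filtered entry in scrapy's output log shows how many requests were filtered. How can this be fixed? The FAQ in the manual (https://doc.scrapy.org/en/latest/faq.html?highlight=offsite%2Ffiltered) covers this problem. These log messages are emitted by one of scrapy's middlewares. Unless you have customized it, that middleware is the default Offsite Spider Middleware, whose purpose is to filter out requests for domains that are not in the allowed_domains list.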

See also the section on OffsiteMiddleware in the manual (https://doc.scrapy.org/en/latest/topics/spider-middleware.html#scrapy.spidermiddlewares.offsite.OffsiteMiddleware)

There are two ways to keep requests from being filtered:

1. Add the domain of the url to allowed_domains (a domain such as example.com, not a full URL)
2. Pass dont_filter=True to scrapy.Request()

The following is quoted from the manual:

If the spider doesn’t define an allowed_domains attribute, or the attribute is empty, the offsite middleware will allow all requests.

If the request has the dont_filter attribute set, the offsite middleware will allow the request even if its domain is not listed in allowed domains
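The rules quoted above can be summarized in a few lines of plain Python. This is a simplified illustration, not Scrapy's actual implementation; the function name is_allowed and the sample URLs are assumptions made for the example.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains, dont_filter=False):
    """Simplified model of the offsite check (not Scrapy's real code)."""
    # Rule from the manual: dont_filter lets the request through.
    # Rule from the manual: empty or missing allowed_domains allows everything.
    if dont_filter or not allowed_domains:
        return True
    host = (urlparse(url).hostname or "").lower()
    # A host passes if it equals an allowed domain or is a subdomain of one.
    return any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


print(is_allowed("https://blog.example.com/a", ["example.com"]))
print(is_allowed("https://example.org/", ["example.com"]))
print(is_allowed("https://example.org/", ["example.com"], dont_filter=True))
```

In this model the first and third calls pass the check and the second is rejected. It also shows why putting a full URL into allowed_domains does not help: an entry such as "https://example.com/" never equals a bare hostname.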
