Integrating Selenium into Scrapy - 代码天地


Other · 2020-03-22 16:22:31

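Scrapy's documentation says a downloader middleware's process_request method is "called for each request that goes through the downloader middleware". In other words, it runs for every request the spider sends. That makes process_request the place to integrate Selenium: load the URL in a real browser, let the page render, and return the rendered HTML as an HtmlResponse. When process_request returns a response, Scrapy skips any remaining process_request methods and its own download for that request. The process_response methods of the installed middlewares still run.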
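The dispatch rule described above can be illustrated with a small pure-Python toy model. It is not Scrapy's implementation. ToyResponse, PassThrough, BrowserLike and toy_download are hypothetical names invented for this sketch. The model only mirrors the documented behavior:

- request hooks run in ascending priority order;
- the first hook that returns a response short-circuits both the later request hooks and the real download;
- every response hook then runs, in reverse order.

```python
# Toy model of Scrapy's downloader-middleware dispatch (NOT Scrapy's real code).
# All class and function names here are hypothetical, for illustration only.
from typing import List, Optional


class ToyResponse:
    """Minimal stand-in for a Scrapy response."""

    def __init__(self, url: str, body: str, produced_by: str):
        self.url = url
        self.body = body
        self.produced_by = produced_by


class PassThrough:
    """Middleware whose process_request returns None, so the chain continues."""

    def __init__(self, name: str, log: List[str]):
        self.name = name
        self.log = log

    def process_request(self, url: str) -> Optional[ToyResponse]:
        self.log.append(f"{self.name}.process_request")
        return None

    def process_response(self, url: str, response: ToyResponse) -> ToyResponse:
        self.log.append(f"{self.name}.process_response")
        return response


class BrowserLike(PassThrough):
    """Plays the role of the Selenium middleware: it answers the request itself."""

    def process_request(self, url: str) -> Optional[ToyResponse]:
        self.log.append(f"{self.name}.process_request")
        return ToyResponse(url, "<html>rendered</html>", produced_by=self.name)


def toy_download(url, middlewares, real_download):
    """Run request hooks in order, then response hooks in reverse order."""
    response = None
    for mw in middlewares:  # ascending priority number
        response = mw.process_request(url)
        if response is not None:
            break  # remaining request hooks and the real download are skipped
    if response is None:
        response = real_download(url)
    for mw in reversed(middlewares):  # every response hook still runs
        response = mw.process_response(url, response)
    return response


if __name__ == "__main__":
    log: List[str] = []
    chain = [PassThrough("A", log), BrowserLike("Selenium", log), PassThrough("B", log)]
    resp = toy_download("https://example.com", chain,
                        lambda u: ToyResponse(u, "raw", produced_by="downloader"))
    print(resp.produced_by)  # Selenium
    print(log)
```

If BrowserLike is removed from the chain, toy_download falls back to the real_download callable. This corresponds to a Scrapy project where the Selenium middleware is not enabled.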

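import time

from scrapy import signals
from scrapy.http import HtmlResponse
from selenium import webdriver


class SeleniumSpiderDownloaderMiddleware(object):
    """Downloader middleware that renders pages with Selenium (headless Chrome)."""

    def __init__(self):
        option = webdriver.ChromeOptions()
        # Chrome's switch is --user-agent=...; "User-Agent=..." is not a recognized switch
        option.add_argument('--user-agent=Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50')
        # Drop the "enable-automation" switch (hides the automation info bar)
        option.add_experimental_option('excludeSwitches', ['enable-automation'])
        prefs = {
            "profile.managed_default_content_settings.images": 2  # 2 = block images
        }
        option.add_experimental_option('prefs', prefs)  # do not load images
        option.add_argument("--headless")  # run Chrome without a window

        self.driver = webdriver.Chrome(options=option)

    @classmethod
    def from_crawler(cls, crawler):
        middleware = cls()
        # Quit the browser when the spider finishes, otherwise Chrome keeps running
        crawler.signals.connect(middleware.spider_closed, signal=signals.spider_closed)
        return middleware

    def spider_closed(self, spider):
        self.driver.quit()

    def process_request(self, request, spider):
        # Load the page in the browser so its JavaScript runs
        self.driver.get(request.url)
        time.sleep(1)  # crude fixed wait for dynamic content to render
        source = self.driver.page_source
        if not source:
            spider.logger.warning("Empty page source for %s", request.url)
        # Returning a response here makes Scrapy skip its own download of this request
        return HtmlResponse(url=self.driver.current_url, body=source.encode("utf-8"),
                            request=request, encoding="utf-8")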
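The middleware above only takes effect once it is enabled in the project settings. A minimal settings.py sketch follows. The module path myproject.middlewares is a hypothetical placeholder, so substitute your own project's module. The number 543 is simply the value Scrapy's startproject template uses for its own middleware stub, not a value this middleware requires.

```python
# settings.py (fragment)
# "myproject.middlewares" is a hypothetical module path; use your project's own.
DOWNLOADER_MIDDLEWARES = {
    # Lower numbers sit closer to the engine, higher numbers closer to the downloader.
    "myproject.middlewares.SeleniumSpiderDownloaderMiddleware": 543,
}
```

Scrapy merges this dict with DOWNLOADER_MIDDLEWARES_BASE and sorts the combined entries by number. Middlewares with lower numbers therefore get their process_request called first.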

go_flush


Reposted from blog.csdn.net/weixin_44224529/article/details/103922355
