Crawling dynamic pages with the Scrapy framework and the Splash rendering engine

1. Start Docker, then run the following on the command line:

docker run -p 8050:8050 scrapinghub/splash

This runs the Splash engine in a Docker container and exposes it on port 8050.
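Before wiring Splash into Scrapy, it can help to confirm that the container is actually answering requests. The following is a minimal sketch, assuming Splash is reachable at http://localhost:8050 and using its render.html HTTP endpoint with the url and wait parameters. The helper names build_render_url and fetch_rendered_html are made up for this illustration and are not part of Splash or Scrapy.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash was started with "-p 8050:8050" on the local machine.
SPLASH_URL = "http://localhost:8050"


def build_render_url(target_url, wait=0.5, splash_url=SPLASH_URL):
    """Build the render.html URL that asks Splash to render target_url."""
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_url}/render.html?{query}"


def fetch_rendered_html(target_url, wait=0.5, timeout=30):
    """Return the HTML of target_url after Splash has executed its JavaScript."""
    with urlopen(build_render_url(target_url, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With the container running, calling fetch_rendered_html("http://example.com") should return the rendered page source. A connection error at this point usually means the container is not up or the port mapping is different.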
2. Next, write the crawler.
First, configure settings.py:

SPLASH_URL = 'http://localhost:8050'
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

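DOWNLOADER_MIDDLEWARES = {
    # The Splash middlewares must run before HttpProxyMiddleware (750 by default).
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    # HttpCompressionMiddleware is moved to 810 so Splash responses are processed correctly.
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}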

Also enable your item pipeline (ITEM_PIPELINES) in the same file.
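A minimal sketch of the remaining settings.py entries follows. The pipeline path myproject.pipelines.MyprojectPipeline is a hypothetical placeholder for whatever pipeline class your own project defines. The SPIDER_MIDDLEWARES and HTTPCACHE_STORAGE lines are not in the original post; they are the additional settings the scrapy-splash README recommends, and the cache storage line only matters if you use Scrapy's HTTP cache.

```python
# settings.py (continued)

# Hypothetical pipeline path: replace with the class defined in your project.
ITEM_PIPELINES = {
    "myproject.pipelines.MyprojectPipeline": 300,
}

# Recommended by the scrapy-splash README: deduplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Only relevant when the Scrapy HTTP cache is enabled.
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```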
3. When writing the spider file, add this import at the top:

from scrapy_splash import SplashRequest

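We then use SplashRequest in place of a plain Request, so that each page we want to parse is submitted to the Splash engine for rendering first.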

Reposted from blog.csdn.net/weixin_43434223/article/details/85414557