[Crawler] Study notes day 58: 7. scrapy-redis in practice + building a Scrapy-Redis distributed crawler from scratch + Scrapy-Redis distributed strategy + installing Redis + modifying the configuration file + Redis desktop management tools

7. scrapy-redis in practice

Building a Scrapy-Redis distributed crawler from scratch

Scrapy-Redis distributed strategy:

Suppose there are four computers: Windows 10, Mac OS X, Ubuntu 16.04, and CentOS 7.2. Any of them can act as the Master or as a Slave, for example:

  • Master (core server): runs Windows 10 and hosts a Redis database. It does no crawling; it only handles URL-fingerprint deduplication, Request distribution, and data storage.
  • Slave (crawler execution side): Mac OS X, Ubuntu 16.04, and CentOS 7.2 run the crawler programs and submit new Requests to the Master as they go.

  1. A Slave first takes a task (Request, url) from the Master and crawls the data; while crawling, any new Requests it produces are submitted back to the Master for processing;
  2. There is only one Redis database, on the Master. It deduplicates unprocessed Requests and allocates tasks, adds the processed Requests to the to-crawl queue, and stores the crawled data.
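
The two steps above can be sketched in plain Python. This is only an in-memory illustration: the `Master` class, `slave_step` function, and the `LINKS` graph are hypothetical names, and a Python set and deque stand in for the structures that Scrapy-Redis actually keeps in Redis.

```python
import hashlib
from collections import deque


def fingerprint(url: str) -> str:
    """Return a stable SHA-1 fingerprint for a URL (illustrative only)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


class Master:
    """In-memory stand-in for the Redis database on the Master."""

    def __init__(self):
        self.seen = set()      # fingerprints of every request accepted so far
        self.queue = deque()   # requests waiting to be crawled
        self.items = []        # scraped data, stored centrally

    def submit(self, url: str) -> bool:
        """Queue a request unless its fingerprint was already seen."""
        fp = fingerprint(url)
        if fp in self.seen:
            return False
        self.seen.add(fp)
        self.queue.append(url)
        return True

    def next_task(self):
        """Hand the next pending request to a Slave, or None if idle."""
        return self.queue.popleft() if self.queue else None


def slave_step(master: Master, discover) -> bool:
    """One crawl cycle on a Slave: take a task, store data, submit new links."""
    url = master.next_task()
    if url is None:
        return False
    master.items.append({"url": url})
    for link in discover(url):
        master.submit(link)
    return True


# Hypothetical link graph standing in for real pages.
LINKS = {"/a": ["/b", "/c"], "/b": ["/c", "/a"], "/c": []}

master = Master()
master.submit("/a")
while slave_step(master, lambda u: LINKS.get(u, [])):
    pass

crawled = [item["url"] for item in master.items]
print(crawled)
```

Each page is crawled once even though `/a` and `/c` are linked from several places, because the fingerprint check rejects repeats before they reach the queue.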

Scrapy-Redis uses this strategy by default. It is very simple to implement, because Scrapy-Redis already takes care of task scheduling and related work; we only need to inherit from RedisSpider and specify redis_key.
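
The scheduling work that Scrapy-Redis takes over is typically switched on in the project's settings.py. The fragment below is a minimal sketch using the setting names documented by Scrapy-Redis; the REDIS_HOST value is an assumption (the Master's test IP used in these notes), and the port is the Redis default.

```python
# settings.py (fragment)

# Use the Scrapy-Redis scheduler so requests are queued in Redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate request fingerprints through a set stored in Redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and the fingerprint set when the spider closes.
SCHEDULER_PERSIST = True

# Store scraped items in Redis on the Master.
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,
}

# Address of the Master's Redis server (assumed values).
REDIS_HOST = "192.168.199.108"
REDIS_PORT = 6379
```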

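The drawback is that the tasks Scrapy-Redis schedules are Request objects, which carry a lot of information (not only the url, but also the callback function, headers, and so on). This can slow the crawler down and take up a large amount of storage in Redis, so keeping it efficient requires a certain level of hardware.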
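
To get a rough feel for the size difference, the sketch below compares a bare URL with a pickled dictionary holding the kind of fields a Request carries. The field values are made up, and the dictionary only approximates what Scrapy-Redis really serializes, so treat the numbers as indicative.

```python
import pickle

url = "http://example.com/page/1"

# Roughly the kind of fields a serialized Request carries (values are made up).
request = {
    "url": url,
    "callback": "parse_detail",
    "method": "GET",
    "headers": {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "Accept": "text/html,application/xhtml+xml",
        "Referer": "http://example.com/list",
    },
    "cookies": {},
    "meta": {"depth": 2},
    "priority": 0,
    "dont_filter": False,
}

url_bytes = len(url.encode("utf-8"))
request_bytes = len(pickle.dumps(request, protocol=pickle.HIGHEST_PROTOCOL))

print(f"bare URL:        {url_bytes} bytes")
print(f"pickled request: {request_bytes} bytes")
```

Multiplied over millions of queued requests, that per-request overhead is what drives the memory use on the Master.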

I. Install Redis

Install Redis: http://redis.io/download

After installation, copy redis.conf from the Redis installation directory to any directory; /etc/redis/redis.conf is recommended (on Windows it can stay where it is).

II. Modify the configuration file redis.conf

Open your redis.conf configuration file, for example:

  • Non-Windows systems: sudo vi /etc/redis/redis.conf
  • Windows: C:\Intel\Redis\conf\redis.conf
  1. In redis.conf on the Master, comment out bind 127.0.0.1 so that Slaves can connect remotely to the Master's Redis database.

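    • daemonize no means Redis does not run as a daemon by default, so running redis-server /etc/redis/redis.conf shows the Redis startup banner in the terminal;

      • daemonize yes runs Redis in the background by default, so there is no need to open a new terminal window to run other commands. Choose according to personal preference and actual needs.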
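
The same two edits can be applied programmatically. The helper below is a hypothetical sketch (the name `prepare_master_conf` is not part of Redis) that works on the text of a redis.conf and leaves everything else untouched. On Redis 3.2 and later, protected mode may still refuse remote clients when no password is set; that setting is outside this sketch.

```python
import re


def prepare_master_conf(conf_text: str, daemonize: bool = False) -> str:
    """Comment out `bind 127.0.0.1` and set the `daemonize` directive."""
    # Prefix an active `bind 127.0.0.1 ...` line with "# ".
    conf_text = re.sub(
        r"^([ \t]*)(bind[ \t]+127\.0\.0\.1\b.*)$",
        r"\1# \2",
        conf_text,
        flags=re.MULTILINE,
    )
    # Rewrite an existing daemonize line with the requested value.
    value = "yes" if daemonize else "no"
    conf_text = re.sub(
        r"^[ \t]*daemonize[ \t]+\S+.*$",
        f"daemonize {value}",
        conf_text,
        flags=re.MULTILINE,
    )
    return conf_text


sample = "bind 127.0.0.1\nport 6379\ndaemonize no\n"
print(prepare_master_conf(sample, daemonize=True))
```

A line that is already commented out is not matched, so running the helper twice gives the same result as running it once.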

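III. Test the remote connection from a Slave to the Master

In this test, the Master (Windows 10) has the IP address 192.168.199.108.

  1. On the Master, start redis-server with the specified configuration file, for example:

    • Non-Windows systems: sudo redis-server /etc/redis/redis.conf
    • Windows: in Command Prompt (Administrator), run redis-server C:\Intel\Redis\conf\redis.conf; reading the default configuration is enough.
  2. On the Master, start a local redis-cli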

  3. On a Slave, run redis-cli -h 192.168.199.108; the -h option connects to the Redis database on the specified host

Note: a Slave does not need to run redis-server; only the Master does. As long as a Slave can read the Master's Redis database, the connection works and the distributed setup can go ahead.
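
The connectivity check can also be scripted from a Slave without installing any client library. The sketch below sends Redis's PING command over a raw socket and looks for the +PONG reply; `redis_ping` is a hypothetical helper name, 6379 is the default Redis port, and a server that requires a password will answer with an error instead, so the function returns False in that case.

```python
import socket


def redis_ping(host: str, port: int = 6379, timeout: float = 3.0) -> bool:
    """Return True if a Redis server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")   # inline command form
            reply = conn.recv(64)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
    return reply.startswith(b"+PONG")


# Example call from a Slave, using the Master address from these notes:
#   redis_ping("192.168.199.108")
```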

IV. Redis desktop management tools

Redis Desktop Manager is recommended; it supports Windows, Mac OS X, Linux and other platforms:

Download: https://redisdesktop.com/download

Origin blog.csdn.net/qq_35456045/article/details/104111462