Python3 Crawler Installation in Practice - 11. Crawler frameworks: ScrapySplash, ScrapyRedis

ScrapySplash installation

ScrapySplash is a JavaScript rendering tool for Scrapy. This section describes how to install it.
The installation consists of two parts. One is the Splash service itself, installed via Docker; once it is running, we can load JavaScript-rendered pages through its interface. The other is the ScrapySplash Python library; after it is installed, the Splash service can be used inside Scrapy.

1. Links

2. Install Splash

ScrapySplash uses Splash's HTTP API for page rendering, so we need to install Splash to provide the rendering service. Splash is installed via Docker, so make sure Docker has been installed correctly beforehand.
The installation command is as follows:

docker run -p 8050:8050 scrapinghub/splash

After the installation completes and the service starts, you will see output similar to the following:

2017-07-03 08:53:28+0000 [-] Log opened.
2017-07-03 08:53:28.447291 [-] Splash version: 3.0
2017-07-03 08:53:28.452698 [-] Qt 5.9.1, PyQt 5.9, WebKit 602.1, sip 4.19.3, Twisted 16.1.1, Lua 5.2
2017-07-03 08:53:28.453120 [-] Python 3.5.2 (default, Nov 17 2016, 17:05:23) [GCC 5.4.0 20160609]
2017-07-03 08:53:28.453676 [-] Open files limit: 1048576
2017-07-03 08:53:28.454258 [-] Can't bump open files limit
2017-07-03 08:53:28.571306 [-] Xvfb is started: ['Xvfb', ':1599197258', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
2017-07-03 08:53:29.041973 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2017-07-03 08:53:29.315445 [-] verbosity=1
2017-07-03 08:53:29.315629 [-] slots=50
2017-07-03 08:53:29.315712 [-] argument_cache_max_entries=500
2017-07-03 08:53:29.316564 [-] Web UI: enabled, Lua: enabled (sandbox: enabled)
2017-07-03 08:53:29.317614 [-] Site starting on 8050
2017-07-03 08:53:29.317801 [-] Starting factory <twisted.web.server.Site object at 0x7ffaa4a98cf8>

This shows that Splash is now running on port 8050.
Open http://localhost:8050 in a browser and you will see the Splash home page, as shown in Figure 1-81:


Figure 1-81 Running page
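
As a quick check that the rendering service works, we can request Splash's render.html endpoint, which returns the page HTML after its JavaScript has executed. Below is a minimal sketch using the requests library, assuming Splash is listening on localhost:8050; the target URL is only an example:

import requests

# Ask Splash to render a page and return the resulting HTML.
# 'wait' gives the page's JavaScript some time to run before the snapshot is taken.
splash_render = 'http://localhost:8050/render.html'
resp = requests.get(splash_render, params={'url': 'https://www.baidu.com', 'wait': 2})
print(resp.status_code)
print(resp.text[:200])  # first 200 characters of the rendered HTML
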
Of course, Splash can also be installed on a remote server. To make sure it keeps running in the background (daemon mode), run the following command on the server:

docker run -d -p 8050:8050 scrapinghub/splash

The extra -d parameter makes the Docker container run in daemon (detached) mode, so the Splash service will not be terminated when the connection to the remote server is interrupted.

3. ScrapySplash installation

After Splash has been installed successfully, we can install its Python library. The installation command is as follows:

pip3 install scrapy-splash

Once the command finishes, the library is installed successfully; its detailed usage will be introduced later.
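
For reference, hooking the library into a Scrapy project is mostly a matter of project settings. The following is a minimal sketch based on the scrapy-splash documentation; the Splash address is a placeholder for your own service:

# settings.py (sketch): enable scrapy-splash in a Scrapy project
SPLASH_URL = 'http://localhost:8050'  # placeholder: address of your Splash service

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

# In a spider, pages are then rendered through Splash with SplashRequest, e.g.:
#     from scrapy_splash import SplashRequest
#     yield SplashRequest(url, self.parse, args={'wait': 2})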

ScrapyRedis installation

ScrapyRedis is a distributed extension module for Scrapy. With it we can conveniently build a distributed Scrapy crawler. This section describes how to install ScrapyRedis.

1. Links

2. Pip installation

Installation via pip is recommended. The command is as follows:

pip3 install scrapy-redis

3. Test the installation

After the installation completes, you can verify it at the Python command line:

$ python3
>>> import scrapy_redis

If no error is reported, the library has been installed successfully.
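
To give an idea of how the library is used, a distributed crawl mainly requires pointing Scrapy's scheduler and request dupefilter at a shared Redis server in the project settings. The following is a minimal sketch based on the scrapy-redis documentation; the Redis address is a placeholder and assumes a reachable Redis server:

# settings.py (sketch): minimal scrapy-redis configuration
SCHEDULER = 'scrapy_redis.scheduler.Scheduler'              # store the request queue in Redis
DUPEFILTER_CLASS = 'scrapy_redis.dupefilter.RFPDupeFilter'  # de-duplicate requests across all nodes
SCHEDULER_PERSIST = True                                    # keep the queue when the spider finishes
REDIS_URL = 'redis://localhost:6379'                        # placeholder: the shared Redis server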
