pyspider is a powerful web crawler framework written by the Chinese developer binux. It comes with a full-featured WebUI, script editor, task monitor, project manager, and result processor. It supports multiple database backends and multiple message queues, and it can also crawl pages rendered by JavaScript, which makes it very convenient to use. This section describes how to install it.
1. Related Links
- Official documentation: http://docs.pyspider.org/
- PyPI: https://pypi.python.org/pypi/pyspider
- GitHub: https://github.com/binux/pyspider
- Official tutorial: http://docs.pyspider.org/en/latest/tutorial
- Online example: http://demo.pyspider.org
2. Preparations
pyspider supports JavaScript rendering, and this feature depends on PhantomJS, so PhantomJS must be installed first (see Section 1.2.5 for the installation procedure).
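Before moving on, it can help to confirm that PhantomJS is actually reachable from the command line. The following is a minimal sketch, assuming the executable is named phantomjs and is expected to be on the PATH; the helper name find_executable is hypothetical and not part of pyspider.

```python
import shutil


def find_executable(name="phantomjs"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    # shutil.which mimics the shell's lookup of a command on PATH.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path:
        print("PhantomJS found at", path)
    else:
        print("PhantomJS not found on PATH; JavaScript rendering will not work")
```

If the script reports that PhantomJS is missing, revisit the PhantomJS installation and make sure its directory has been added to the PATH.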
3. pip install
Installing with pip is recommended. The command is as follows:
pip3 install pyspider
Once the command finishes, the installation is complete.
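One quick way to double-check the result is to ask Python whether the package can be located. This is only a sketch: it assumes the importable module is named pyspider (the same as the PyPI package), and the helper is_installed is a hypothetical name introduced for illustration.

```python
import importlib.util


def is_installed(module_name="pyspider"):
    """Return True if a top-level module with this name can be located."""
    # find_spec returns None when the module is not found; it only looks the
    # module up and does not import it.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed():
        print("pyspider is available to this Python interpreter")
    else:
        print("pyspider was not found; check that pip3 matches this interpreter")
```

A negative result often means that pip3 installed the package for a different Python interpreter than the one running the check.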
4. Common errors
The following error message may appear on Windows:
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-vXo1W3/pycurl
This is a PyCurl installation error, which means the PyCurl library has to be installed separately. Go to http://www.lfd.uci.edu/~gohlke/pythonlibs/#pycurl, find the build that matches your Python version, and download the corresponding wheel file. For example, for 64-bit Windows with Python 3.6, download pycurl-7.43.0-cp36-cp36m-win_amd64.whl and then install it with pip. The command is as follows:
pip3 install pycurl-7.43.0-cp36-cp36m-win_amd64.whl
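To pick the matching wheel, you need the interpreter's version tag (such as cp36) and whether it is a 32-bit or 64-bit build. The sketch below prints both for the running interpreter; the helper name wheel_hints is hypothetical. On Windows, a 64-bit build corresponds to the win_amd64 wheels and a 32-bit build to the win32 wheels.

```python
import struct
import sys


def wheel_hints():
    """Return (python_tag, bits) for the interpreter running this script."""
    # CPython wheels are tagged "cp" followed by major and minor version.
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # The size of a pointer tells whether this is a 32-bit or 64-bit build.
    bits = struct.calcsize("P") * 8
    return python_tag, bits


if __name__ == "__main__":
    tag, bits = wheel_hints()
    print("Python tag:", tag)
    print("Interpreter build: {}-bit".format(bits))
```

Run it with the same interpreter that pip3 installs into, since the bitness of the interpreter, not of the operating system, determines which wheel fits.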
If you run into PyCurl errors on Linux, refer to this article: https://imlonghao.com/19.html
5. Verify the installation
After the installation is complete, you can start pyspider directly from the command line:
pyspider all
The console then prints output similar to that shown in Figure 1.
Figure 1 Console
The pyspider web service is now running on local port 5000. Open http://localhost:5000/ in a browser to reach the pyspider WebUI management page, as shown in Figure 2. Seeing this page confirms that pyspider has been installed successfully.
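The same check can be scripted instead of done in a browser. The following is a rough sketch that sends one HTTP request to the address given above and reports whether a page came back; the helper name webui_is_up and the 3-second timeout are assumptions chosen for illustration.

```python
import http.client
import urllib.error
import urllib.request


def webui_is_up(url="http://localhost:5000/", timeout=3.0):
    """Return True if an HTTP request to `url` gets a successful response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an HTTP error status all count as "not up".
        return False


if __name__ == "__main__":
    if webui_is_up():
        print("The pyspider WebUI is responding on port 5000")
    else:
        print("No response on port 5000; is 'pyspider all' still running?")
```

Run this in a second terminal while pyspider all is still running in the first one.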