For a simple crawler, the urllib and urllib2 libraries together with regular expressions are enough, but there is a more powerful tool: the crawler framework Scrapy. Its installation can be painful, so I have organized the process as follows.
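To make the contrast concrete, here is a toy sketch of the urllib/regex approach. Rather than fetching a real page (which urllib2.urlopen would do), it runs a regular expression over a hard-coded HTML snippet, so the snippet and the pattern are purely illustrative:

```python
import re

# Toy version of the "urllib + regex" crawling approach: in a real
# crawler the HTML would come from urllib2.urlopen(url).read();
# here we use a hard-coded snippet so the example is self-contained.
html = '<a href="/page1">One</a> <a href="/page2">Two</a>'

# Extract every href attribute value with a regular expression.
links = re.findall(r'href="([^"]+)"', html)
print(links)  # ['/page1', '/page2']
```

This works for tiny jobs, but hand-rolled regex parsing becomes fragile quickly, which is exactly what Scrapy is meant to replace.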
Windows platform:
My system is Win7. First of all, you must have Python; I am using version 2.7.7. The steps for Python 3 are similar, though some package sources differ.
The official documentation at http://doc.scrapy.org/en/latest/intro/install.html is the most authoritative reference; what follows is my personal experience.
1. Install Python
I won't say much about the installation itself; Python 2.7.7 is already installed on my machine. After installing, remember to configure the environment variables by adding the following to the Path variable:
D:\python2.7.7;D:\python2.7.7\Scripts
After the configuration is complete, enter python --version on the command line. If there is no error, the installation is successful.
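Besides python --version, you can check from inside Python which interpreter the command line actually picked up, which is useful when several Python versions are installed (the D:\python2.7.7 path is the example path from this guide):

```python
import sys

# Print the interpreter version and its full path; the path should
# point into the directory you added to Path (D:\python2.7.7 in this guide).
print(sys.version.split()[0])
print(sys.executable)
```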
2. Install pywin32
Under Windows, pywin32 must be installed. Download it from: http://sourceforge.net/projects/pywin32/
Download the version matching your Python version, double-click to install it, and verify after installation:
At the Python interactive prompt, enter
import win32com
If there is no error, the installation is successful
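The same check can be scripted so a missing package is reported instead of crashing the interpreter; note this import only succeeds on Windows with pywin32 installed:

```python
# Try importing win32com (part of pywin32). On a machine without
# pywin32 (e.g. Linux, or before installation) the import fails,
# and we report that instead of raising an ImportError.
try:
    import win32com  # noqa: F401
    print("pywin32 is installed")
except ImportError:
    print("pywin32 is not installed")
```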
3. Install pip
pip is the tool used to install the other required packages. First, download get-pip.py
After downloading, change to the directory containing the file and execute the following command
python get-pip.py
After executing the command, pip will be installed, and it will also install setuptools for you.
After the installation is complete, execute on the command line
pip --version
If a version number is printed, the installation is successful. If you instead see "not recognized as an internal or external command", check whether the environment variable is configured; there should be two paths (the Python directory and its Scripts subdirectory).
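You can also confirm pip is usable from inside Python itself, which is equivalent to running pip --version on the command line:

```python
# Importing pip and printing its version is a quick sanity check
# that the package was installed into this interpreter.
import pip
print(pip.__version__)
```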
4. Install pyOpenSSL
Under Windows, pyOpenSSL is not pre-installed; under Linux it usually is.
Installation address: https://launchpad.net/pyopenssl
5. Install lxml
A brief introduction to lxml: it is a Python library for processing XML quickly and flexibly.
Execute the following command directly
pip install lxml
That completes the installation. If pip complains that a Microsoft Visual C++ library is missing, install the Visual C++ compiler package for your Python version first.
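Once lxml is installed, a quick smoke test is to parse a small XML document and read a value back out (the snippet below is illustrative, not from any real feed):

```python
from lxml import etree

# Parse a small XML document and extract an element's text; if this
# runs without error, lxml is installed correctly.
root = etree.fromstring('<root><title>Scrapy</title></root>')
print(root.findtext('title'))  # Scrapy
```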
6. Install Scrapy
Finally, the exciting moment: with all the groundwork above done, we can enjoy the fruits of victory!
Execute the following command
pip install Scrapy
pip will download the other dependent packages automatically; we don't need to install them manually. Wait a while, and you're done!
7. Verify the installation
At the command line, enter scrapy
If the command prints Scrapy's usage information, the installation is successful. If it fails, please check the above steps for omissions.
Linux (Ubuntu) platform:
Installation under Linux is very simple and only requires a few commands.
1. Install Python
sudo apt-get install python2.7 python2.7-dev
2. Install pip
First download get-pip.py
After downloading, change to the directory containing the file and execute the following command
sudo python get-pip.py
3. Install Scrapy directly
Since lxml and OpenSSL come pre-installed under Linux, Scrapy can be installed with a single command
sudo pip install Scrapy
If you want to install lxml separately, you can enter
sudo pip install lxml
If a prompt like the following appears when you run scrapy, the installation was successful
Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory
If you have any questions, please leave a message. I wish you all a smooth installation!