Sesame HTTP: Installing and configuring the Scrapy framework for advanced Python crawlers

A basic crawler can be written with the urllib and urllib2 libraries and regular expressions, but there is a more powerful tool: the crawler framework Scrapy. Its installation process is rather painstaking, so I have organized it as follows.

Windows platform:

My system is Win7. First of all, you must have Python. I use version 2.7.7. Python3 is similar, but some source files are different.

Official documentation: http://doc.scrapy.org/en/latest/intro/install.html is the most authoritative source. What follows is my own experience of the process.

1. Install Python

I won't say much about the installation process; Python 2.7.7 is already installed on my computer. After installation, remember to configure the environment variables: add the following paths to the Path variable

D:\python2.7.7;D:\python2.7.7\Scripts

After the configuration is complete, enter python --version on the command line. If there is no error, the installation is successful

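If you would rather check the environment variables from Python itself, here is a minimal sketch. It assumes the usual Windows layout, where the Scripts folder sits directly under the Python installation directory. The helper name missing_path_entries is my own and not part of any library.

```python
from __future__ import print_function

import os
import sys


def _norm(path):
    # Normalise case and trailing separators so "D:\\Python\\" matches "d:\\python".
    return os.path.normcase(os.path.normpath(path))


def missing_path_entries(required_dirs, path_value=None):
    """Return the directories from required_dirs that are absent from PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    present = set(_norm(p) for p in path_value.split(os.pathsep) if p)
    return [d for d in required_dirs if _norm(d) not in present]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    install_dir = os.path.dirname(sys.executable)
    scripts_dir = os.path.join(install_dir, "Scripts")  # assumed Windows layout
    for directory in missing_path_entries([install_dir, scripts_dir]):
        print("Not on PATH:", directory)
```

If the script prints no "Not on PATH" lines, both directories are already configured.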

2. Install pywin32

Under Windows, pywin32 must be installed. Download address: http://sourceforge.net/projects/pywin32/

Download the pywin32 installer that matches your Python version and architecture, double-click to install it, and verify after installation:


At the Python prompt, enter

import win32com

If there is no error, the installation is successful

3. Install pip

pip is a tool used to install other necessary packages. First, download get-pip.py

After downloading, change to the directory where the file is located and execute the following command

python get-pip.py

After the command finishes, pip is installed, and setuptools is installed along with it

After the installation is complete, run the following on the command line

pip --version

If pip prints its version number, the installation is successful. If the prompt says that pip is not recognized as an internal or external command, check that both paths from step 1 are configured in the Path environment variable.
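The same check can be made from inside Python, which helps when you are not sure whether the problem is pip itself or only the Path variable. This is a rough sketch: it asks the interpreter that is currently running to start pip as a module, so it does not depend on the Scripts folder being on the Path. The function name pip_version is my own choice.

```python
from __future__ import print_function

import subprocess
import sys


def pip_version():
    """Return pip's version banner, or None if pip cannot be run."""
    try:
        proc = subprocess.Popen(
            [sys.executable, "-m", "pip", "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        out, _ = proc.communicate()
    except OSError:
        # The interpreter itself could not be started.
        return None
    if proc.returncode != 0:
        # pip is not installed for this interpreter.
        return None
    return out.decode("utf-8", "replace").strip()


if __name__ == "__main__":
    banner = pip_version()
    print(banner if banner else "pip is not available for this interpreter")
```

If this script reports a version but the plain pip command is still not recognized, the cause is most likely the Path configuration rather than the pip installation.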

4. Install pyOpenSSL

Under Windows, pyOpenSSL is not pre-installed, whereas under Linux it is already installed.

Installation address: https://launchpad.net/pyopenssl

5. Install lxml

lxml is a Python library, built on the C libraries libxml2 and libxslt, that can process XML and HTML quickly and flexibly

Execute the following command directly

pip install lxml

That completes the installation. If it reports that the Microsoft Visual C++ library is not installed, download and install the Visual C++ package it asks for, then run the command again.
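To confirm that lxml actually works after installation, you can try a small parsing example. The following is only a sketch: the sample HTML and the function name extract_links are made up for illustration, and it requires lxml to be installed.

```python
from __future__ import print_function

from lxml import html

# Made-up sample page used only for this illustration.
SAMPLE = """
<html><body>
  <a href="http://example.com/a">First</a>
  <a href="http://example.com/b">Second</a>
</body></html>
"""


def extract_links(page_source):
    """Return (text, href) pairs for every link in an HTML string."""
    tree = html.fromstring(page_source)
    # The XPath expression selects every <a> element that has an href attribute.
    return [(a.text_content(), a.get("href")) for a in tree.xpath("//a[@href]")]


if __name__ == "__main__":
    for text, href in extract_links(SAMPLE):
        print(text, href)
```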

6. Install Scrapy

Finally, the exciting moment! With all the groundwork above done, we can enjoy the fruits of victory.

Execute the following command

pip install Scrapy


pip will also download the other packages that Scrapy depends on, so we don't need to install them manually. Wait a while, and you're done!

7. Verify the installation

Enter scrapy on the command line

If the list of available Scrapy commands is displayed, the installation is successful. If it fails, please check for any omissions in the above steps.
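If the check fails, a short script can show which of the earlier steps was missed. The sketch below tries to import each package under its import name (win32com for pywin32, OpenSSL for pyOpenSSL, plus lxml and scrapy) and lists the ones that fail. The names REQUIRED_MODULES and find_missing are my own and not part of Scrapy.

```python
from __future__ import print_function

import importlib
import sys

# Import names of the packages installed in the steps above.
REQUIRED_MODULES = ["OpenSSL", "lxml", "scrapy"]
if sys.platform == "win32":
    REQUIRED_MODULES.insert(0, "win32com")  # pywin32 is only needed on Windows


def find_missing(module_names):
    """Return the names from module_names that fail to import."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules import cleanly.")
```

Any module listed as missing points back to the step that installs it.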

Linux (Ubuntu) platform:

Installation under Linux is very simple; you only need to execute a few commands

1. Install Python

sudo apt-get install python2.7 python2.7-dev

2. Install pip

First download get-pip.py

After downloading, change to the directory where the file is located and execute the following command

sudo python get-pip.py

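3. Install Scrapy directly

lxml and OpenSSL are pre-installed under Linux, so there is nothing extra to set up.

If you want to verify lxml, you can enter

sudo pip install lxml

If pip reports that the requirement is already satisfied, lxml is already installed.

Then install Scrapy itself

sudo pip install Scrapy

After the installation finishes, enter scrapy. If the following prompt appears, the installation has been successful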



Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory


If you have any questions, please leave a message! I wish you guys a smooth installation!
