scrapy module

 

scrapy installation environment

- scrapy, pyspider
- What is a framework, and how should we learn one?
- A framework is a project template that integrates a variety of functions and is highly versatile (it can be applied to many different needs).
- We only need to learn how to use the functions that the framework encapsulates.


- Functions that scrapy integrates:
- High-performance data parsing, persistent storage, high-performance data downloading, and so on.

Linux installation:

  pip3 install scrapy


Windows installation:
a. pip3 install wheel

b. Download Twisted from http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted

c. Enter the download directory and run pip3 install Twisted-17.1.0-cp35-cp35m-win_amd64.whl # Installing a .whl file requires the wheel tool, which is why wheel is installed in step a. The .whl file is the one downloaded from the URL in step b.

d. pip3 install pywin32

e. pip3 install scrapy
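
After either installation, a quick check can confirm that the packages are importable. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is made up for illustration, and the package list simply mirrors the steps above (pywin32 is left out because it has no single import name).

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # These are import names, not pip names: Twisted is imported as "twisted".
    required = ["wheel", "twisted", "scrapy"]
    missing = missing_packages(required)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All packages found")
```

If anything is reported as not installed, rerun the matching pip3 step.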

 

 

Generating a spider file with a specified name

 

 

 

 

 This gives us the spider file we created. It defines a class whose name is the file name plus Spider, and the class inherits from scrapy.Spider.
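
Spider files are normally generated with the `scrapy genspider` command. To show roughly what such a file contains without needing scrapy installed, the sketch below renders a skeleton as text. The template string and the helper names `class_name_for` and `render_spider` are assumptions for illustration; the real template shipped with scrapy may differ between versions.

```python
from string import Template

# Hypothetical template approximating a generated spider file; the real
# template shipped with scrapy may differ between versions.
SPIDER_TEMPLATE = Template('''import scrapy


class ${classname}(scrapy.Spider):
    name = "${name}"
    allowed_domains = ["${domain}"]
    start_urls = ["https://${domain}/"]

    def parse(self, response):
        pass
''')


def class_name_for(spider_name):
    """Build the class name: capitalised file name parts plus 'Spider'."""
    parts = spider_name.split("_")
    return "".join(part.capitalize() for part in parts) + "Spider"


def render_spider(spider_name, domain):
    """Return the text of a spider file for the given name and domain."""
    return SPIDER_TEMPLATE.substitute(
        classname=class_name_for(spider_name), name=spider_name, domain=domain
    )


if __name__ == "__main__":
    print(render_spider("first", "www.example.com"))
```

Running it prints a file whose class is named `FirstSpider`, matching the "file name plus Spider" rule described above.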

 

 

 start_urls can hold multiple URLs. allowed_domains restricts the crawl so that only the listed domains are accessed. Because a crawl follows a lot of links, this setting determines which of them are excluded. The class also has a parse method.
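
The effect of the domain restriction can be sketched in plain Python. This is not scrapy's own code, only an approximation of the idea: a link is followed when its host equals an allowed domain or is a subdomain of one. The function name `is_allowed` is hypothetical.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    if not allowed_domains:
        # In this sketch an empty list means no restriction at all.
        return True
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


if __name__ == "__main__":
    allowed = ["example.com"]
    for link in ["https://blog.example.com/post/1", "https://other.org/page"]:
        print(link, "->", "follow" if is_allowed(link, allowed) else "skip")
```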

 

 

 - Running the project: scrapy crawl spiderName

The two URLs are requested separately, and each one produces its own response. For every request, the parse method is called with that request's response.
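
The one-call-per-response pattern can be imitated in plain Python. The sketch below does not use scrapy and performs no network access. `FakeResponse`, `DemoSpider`, `fake_download` and `run` are all hypothetical stand-ins, and the real engine is asynchronous, so responses may not arrive in the order the URLs are listed.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a downloaded page."""
    url: str
    text: str


class DemoSpider:
    """Plain-Python imitation of a spider; it does not use scrapy."""
    start_urls = ["https://www.example.com/a", "https://www.example.com/b"]

    def __init__(self):
        self.seen = []

    def parse(self, response):
        # Called once per response, so it runs twice for two start URLs.
        self.seen.append(response.url)


def fake_download(url):
    """Pretend to fetch a URL; no network access happens here."""
    return FakeResponse(url=url, text=f"<html>page at {url}</html>")


def run(spider):
    """Issue one request per start URL and hand each response to parse."""
    for url in spider.start_urls:
        spider.parse(fake_download(url))
    return spider.seen


if __name__ == "__main__":
    print(run(DemoSpider()))
```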

 

 

 

 

 

Before crawling, scrapy first requests the site's robots.txt file to check whether we have permission to crawl.
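
In a scrapy project this check is controlled by the `ROBOTSTXT_OBEY` setting in settings.py. The permission check itself can be sketched with the standard library's `urllib.robotparser`. The robots.txt content, the user agent name and the helper `can_crawl` below are made up for illustration, and scrapy's own implementation may behave differently in detail.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ["https://www.example.com/index.html",
                "https://www.example.com/private/data"]:
        print(url, "->", can_crawl(ROBOTS_TXT, "mybot", url))
```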

 


Origin www.cnblogs.com/machangwei-8/p/11502604.html