Setting up the Scrapy environment
- Frameworks: Scrapy, pyspider
- What is a framework, and how do you learn one?
- A framework is a project template that integrates a variety of functions and is highly versatile (it can be applied to many different requirements).
- To learn a framework, we only need to learn how to use the functions it encapsulates.
- Functions that Scrapy integrates:
- High-performance data parsing, persistent storage, high-performance data downloading, and so on.
- Linux installation:
pip3 install scrapy
- Windows installation:
a. pip3 install wheel
b. Download Twisted from http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted
c. Enter the download directory and run pip3 install Twisted-17.1.0-cp35-cp35m-win_amd64.whl # installing a .whl file requires the wheel tool, which is why wheel is installed in step a; the .whl file itself comes from the URL in step b
d. pip3 install pywin32
e. pip3 install scrapy
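After the installation steps, it can help to confirm which packages actually ended up in the environment. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name installed_version is made up for this example, and the package names are the ones from the steps above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # pywin32 is only expected on Windows; on Linux it will show as missing.
    for package in ("wheel", "twisted", "pywin32", "scrapy"):
        version = installed_version(package)
        print(f"{package}: {version if version else 'not installed'}")
```

If Scrapy shows as not installed, the pip3 commands above were probably run against a different Python interpreter than the one running this script.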
Generating a spider file with a specified name
This gives us the spider file we created. It defines a class whose name is the file name plus Spider, and the class inherits from the Spider class of the scrapy module.
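The naming rule can be sketched in a few lines of plain Python. This assumes the convention is "capitalize each underscore-separated part of the file name and append Spider"; the function spider_class_name is a hypothetical helper for illustration, not part of Scrapy.

```python
def spider_class_name(file_name):
    """Build a class name from a spider file name, e.g. 'first' -> 'FirstSpider'."""
    # Drop a trailing ".py" if the caller passed the full file name.
    module = file_name[:-3] if file_name.endswith(".py") else file_name
    # Capitalize each underscore-separated part, then append "Spider".
    return "".join(part.capitalize() for part in module.split("_")) + "Spider"


print(spider_class_name("first"))  # FirstSpider
```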
start_urls can hold multiple URLs. allowed_domains restricts requests to the listed domains only. Because a crawler follows a large number of links, the allowed_domains restriction is usually not used. The class also contains a parse method.
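The effect of the domain restriction can be modelled roughly as follows. This is a simplified sketch, not Scrapy's actual implementation: it assumes a URL passes when its host equals an allowed domain or is a subdomain of one, and that an empty list means no restriction. The function is_allowed and the example.com URLs are placeholders.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    if not allowed_domains:
        # No restriction configured: every URL may be requested.
        return True
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain.lower() or host.endswith("." + domain.lower())
        for domain in allowed_domains
    )


print(is_allowed("https://www.example.com/page", ["example.com"]))  # True
print(is_allowed("https://other.org/page", ["example.com"]))        # False
```

This also shows why the restriction is often dropped: any link that leads off the listed domains would be filtered out.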
- Running the project: scrapy crawl spiderName
With two URLs in start_urls, each URL is requested separately and produces its own response. The parse method is called once per request, and that request's response is passed in as its argument.
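The one-call-per-response behaviour can be illustrated with a plain-Python stand-in. This sketch deliberately does not import Scrapy or touch the network: FakeResponse, fake_fetch, and run are hypothetical helpers, and the URLs are placeholders. In a real project the class would inherit from the scrapy module's Spider class, and Scrapy would do the downloading asynchronously, so responses need not arrive in list order.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a downloaded page."""
    url: str
    text: str


class FirstSpider:
    # Same attribute and method names as in a generated spider file.
    name = "first"
    start_urls = ["https://www.example.com/a", "https://www.example.com/b"]

    def __init__(self):
        self.seen = []

    def parse(self, response):
        # Called once per response; a real spider would extract data here.
        self.seen.append(response.url)


def fake_fetch(url):
    # Fabricates a page instead of using the network.
    return FakeResponse(url=url, text=f"<html>{url}</html>")


def run(spider, fetch):
    """Request every start URL and hand each response to spider.parse."""
    for url in spider.start_urls:
        spider.parse(fetch(url))


spider = FirstSpider()
run(spider, fake_fetch)
print(len(spider.seen))  # 2: one parse call per start URL
```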
Before crawling, Scrapy first requests the site's robots.txt file to check whether we have permission to crawl.
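The permission check can be tried out with Python's standard urllib.robotparser module, assuming the file in question is the site's robots.txt. This is only a sketch of the idea, not how Scrapy performs the check internally; the rules in ROBOTS_TXT and the example.com URLs are invented for the example and are parsed from a string so that no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: everything is allowed except /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, user_agent="*"):
    """Return True if the parsed rules let `user_agent` fetch `url`."""
    return parser.can_fetch(user_agent, url)


print(may_crawl("https://www.example.com/index.html"))    # True
print(may_crawl("https://www.example.com/private/data"))  # False
```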