Installing the Scrapy framework
1. First, upgrade pip from the terminal: python -m pip install --upgrade pip
2. Install wheel (network install recommended): pip install wheel
3. Install lxml (installing from a downloaded package is recommended): pip install lxml
4. Install Twisted (installing from a downloaded package is recommended): pip install Twisted
5. Install Scrapy (network install recommended): pip install Scrapy
Testing whether Scrapy installed successfully

Scrapy framework (global) commands
scrapy -h    view help
Available commands:
  bench         Run quick benchmark test (a hardware benchmark: it measures how many pages per minute the current machine can crawl)
  fetch         Fetch a URL using the Scrapy downloader (e.g. scrapy fetch http://www.iqiyi.com/ fetches a page's HTML source)
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project (cd into the directory where you want the project, then run scrapy startproject <project name>)
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy
Creating a project

scrapy startproject adc    create a project named adc

The directory structure is as follows:
├── firstCrawler
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       └── __init__.py
└── scrapy.cfg
scrapy.cfg: the project configuration file
items.py: defines the Item classes, i.e. the attributes or fields of the data to be scraped
pipelines.py: processes the items extracted by the spiders; typical uses are cleaning, validation, and persistence (such as storing them in a database)
settings.py: the settings file for the project
spiders/: the directory that holds the custom spiders
middlewares.py: spider middleware, i.e. specific hooks between the engine and the spiders that process spider input (responses) and output (items and requests); it provides a convenient mechanism for extending Scrapy by plugging in custom code
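For illustration, a few commonly adjusted values in settings.py might look like this. This is a sketch: BOT_NAME, SPIDER_MODULES, and NEWSPIDER_MODULE are generated by startproject, while the other values shown here are optional tweaks, not defaults:

```python
# settings.py (sketch): a few commonly adjusted Scrapy settings
BOT_NAME = "adc"                     # project name, set by startproject
SPIDER_MODULES = ["adc.spiders"]     # where Scrapy looks for spiders
NEWSPIDER_MODULE = "adc.spiders"     # where genspider puts new spiders

ROBOTSTXT_OBEY = True                # respect robots.txt rules
DOWNLOAD_DELAY = 1                   # seconds to wait between requests to one site
LOG_LEVEL = "INFO"                   # reduce log noise (the default is DEBUG)
```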
Project commands

Project commands must be run from inside the project directory (cd into the project first).
scrapy -h    view project command help
Available commands:
bench Run quick benchmark test
check Check spider contracts
crawl Run a spider
edit Edit spider
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
list List available spiders
parse Parse URL (using its spider) and print the results
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version (scrapy version shows the Scrapy version info)
view Open URL in browser, as seen by Scrapy (e.g. scrapy view http://www.zhimaruanjian.com/ downloads a page and opens it in the browser)
Creating a spider file

A spider file is created from one of Scrapy's predefined spider templates.
scrapy genspider -l    list the available spider templates
Available templates:    (explanation)
  basic      create a basic spider file
  crawl      create a spider file for automatic crawling
  csvfeed    create a spider that crawls CSV data
  xmlfeed    create a spider that crawls XML data
Creating a spider from the basic template (the other templates work the same way):
scrapy genspider -t <template> <spider name> <domain>    create a spider file from the given template
For example: scrapy genspider -t basic pach baidu.com
scrapy check <spider name>    check whether a spider file passes its contract checks
For example: scrapy check pach
scrapy crawl <spider name>    run the named spider, with log output [important]
scrapy crawl <spider name> --nolog    run the named spider without log output [important]