Web crawlers: common Scrapy command-line operations (reposted)

Original link: https://www.cnblogs.com/shuimohei/p/10495900.html

 
1. Create a project
    scrapy startproject myproject
    cd myproject
 
2. Create a spider
    scrapy genspider myspider www.baidu.com
    scrapy genspider -t crawl myspider www.baidu.com ---- uses the crawl template, which creates a CrawlSpider with rules to configure
 
3. Run a spider
    scrapy crawl myspider
 
4. Check spider contracts
    scrapy check ---- runs the contract checks declared in the spiders' callback docstrings
 
5. List spiders
    scrapy list ---- prints the names of all spiders in the project
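
6. Fetch a page
    scrapy fetch http://www.baidu.com ---- downloads the page with the Scrapy downloader and prints it to standard output
    scrapy fetch --nolog http://www.baidu.com ---- suppresses log output
    scrapy fetch --nolog --headers http://www.baidu.com ---- prints the response headers instead of the body
    scrapy fetch --nolog --no-redirect http://www.baidu.com ---- does not follow HTTP redirects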

7. Open a page the way Scrapy sees it: download it, save the source to a local file and open it in the browser (useful for debugging)
    scrapy view http://www.baidu.com
 
8. Interactive shell
    scrapy shell http://www.baidu.com
    request ---- the Request object that was sent
    response ---- the Response object returned for the page
    response.text ---- the response body as text
    response.headers ---- the response headers
    view(response) ---- opens the response in the browser. If the content you need is displayed, it is in the static HTML and can be scraped directly; if it is missing, it is probably loaded by Ajax
    response.xpath("//title/text()") ---- parses the page with an XPath expression

9. Parse a page with a spider callback
    scrapy parse http://www.baidu.com -c parse ---- fetches the page, passes the response to the spider method named by -c (here parse) and prints the items and requests it returns

10. Read a setting
    scrapy settings --get MONGO_URL ---- prints the value of the MONGO_URL setting (a custom, project-defined setting, not a built-in one)
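Because `MONGO_URL` is not a built-in Scrapy setting, it only exists if the project defines it. The sketch below shows where it might come from and how a component could read it. The pipeline class and the URL value are hypothetical.

```python
# In settings.py (placeholder value):
#   MONGO_URL = "mongodb://localhost:27017"
#   ITEM_PIPELINES = {"myproject.pipelines.MongoPipeline": 300}


class MongoPipeline:
    """Hypothetical item pipeline that takes MONGO_URL from the settings."""

    def __init__(self, mongo_url):
        self.mongo_url = mongo_url

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this when building the pipeline; crawler.settings
        # holds the project settings, including MONGO_URL.
        return cls(mongo_url=crawler.settings.get("MONGO_URL"))

    def process_item(self, item, spider):
        # A real pipeline would write the item to MongoDB here (e.g. with pymongo).
        return item
```

With this approach, `settings.py` is the only place that holds the connection string, and `scrapy settings --get MONGO_URL` shows the value the pipeline will use.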

11. Run a standalone spider file
    scrapy runspider myspider.py ---- runs the spider in myspider.py directly, no project needed (the argument is a file name, not a spider name)

12. Print the version
    scrapy version
    scrapy version -v ---- also prints the versions of Python, Twisted, lxml and other dependencies, plus platform information

13. Benchmark
    scrapy bench ---- runs a quick benchmark crawl against a local test server to gauge crawling speed on this machine

 


Source: www.cnblogs.com/lymlike/p/11598508.html