Original link: https://www.cnblogs.com/shuimohei/p/10495900.html
1. Create a project
   scrapy startproject myproject
   cd myproject

2. Create a spider
   scrapy genspider myspider www.baidu.com
   scrapy genspider -t crawl myspider www.baidu.com  (uses the crawl template, which comes with rules already configured; see the CrawlSpider sketch after this list)

3. Run a spider
   scrapy crawl myspider

4. Check for errors
   scrapy check  (checks the project's spiders for errors)

5. List spiders
   scrapy list  (prints the names of the spiders in the project)

6. Fetch a page
   scrapy fetch http://www.baidu.com
   scrapy fetch --nolog http://www.baidu.com  (suppresses log output)
   scrapy fetch --nolog --headers http://www.baidu.com  (also prints the response headers)
   scrapy fetch --nolog --no-redirect http://www.baidu.com  (does not follow redirects)

7. View a page
   scrapy view http://www.baidu.com  (saves the page source to a file and opens it in the browser; a debugging tool)

8. Interactive shell
   scrapy shell http://www.baidu.com  (requests the page and drops into an interactive shell with the result bound to response; see the sample session after this list)
   response.text -- the body of the response
   response.headers -- the response headers
   view(response) -- opens the returned result in the browser; if it displays correctly, the page is static and can be crawled directly; if not, the content is loaded via Ajax
   response.xpath("") -- parses the page with XPath

9. Parse a page
   scrapy parse http://www.baidu.com -c parse  (the -c option names the spider callback to invoke on the fetched page, here the parse method; see the standalone spider sketch after this list)

10. Read configuration
    scrapy settings --get MONGO_URL  (prints the value of a setting; the pipeline sketch after this list shows how such a setting is typically consumed)

11. Run a spider file
    scrapy runspider myspider.py  (runs the spider file directly; the argument is the file name)

12. Show the version
    scrapy version
    scrapy version -v  (also prints the versions of dependent libraries)

13. Benchmark
    scrapy bench  (tests crawl speed to gauge the performance of the current environment)
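For step 2, the -t crawl template generates a CrawlSpider skeleton roughly like the sketch below (the exact generated code varies by Scrapy version; the allow pattern and the body of parse_item are illustrative placeholders, not from the original post):

```python
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class MySpider(CrawlSpider):
    name = "myspider"
    allowed_domains = ["www.baidu.com"]
    start_urls = ["http://www.baidu.com/"]

    # Each Rule tells the spider which links to follow and which
    # callback to run on the pages those links lead to.
    rules = (
        Rule(LinkExtractor(allow=r"Items/"), callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        item = {}
        # Placeholder extraction; adjust the XPath to the target site.
        item["title"] = response.xpath("//title/text()").get()
        return item
```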
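For step 8, a sample shell session might look like the following (a sketch; the output values are illustrative, not captured from a real run):

```python
# Started with: scrapy shell http://www.baidu.com
# Scrapy fetches the URL and exposes the result as `response`.

>>> response.status            # HTTP status code of the fetched page
200
>>> response.text[:50]         # beginning of the page source (a str)
>>> response.headers           # response headers, a dict-like object
>>> view(response)             # open the downloaded page in the browser;
...                            # if it renders fully, the page is static
>>> response.xpath("//title/text()").get()   # extract data with XPath
```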
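For steps 9 and 11, the file below is a minimal, self-contained spider sketch. Saved as myspider.py, it can be run without a project via scrapy runspider myspider.py, and its parse method is the kind of callback that scrapy parse -c parse invokes (the XPath is a placeholder, not from the original post):

```python
import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = ["http://www.baidu.com"]

    def parse(self, response):
        # Default callback: called once per downloaded response.
        # `scrapy parse <url> -c parse` runs exactly this method and
        # prints the items and requests it yields.
        yield {"title": response.xpath("//title/text()").get()}
```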
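For step 10, note that MONGO_URL is not a built-in Scrapy setting; the post presumably defines it in the project's settings.py. As a hedged sketch of how such a setting is typically consumed, the pipeline below (the class name and the pymongo usage are illustrative, not part of Scrapy) reads it through crawler.settings, the same store that scrapy settings --get queries:

```python
import pymongo


class MongoPipeline:
    def __init__(self, mongo_url, mongo_db):
        self.mongo_url = mongo_url
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # crawler.settings holds the merged project settings;
        # MONGO_URL and MONGO_DB are assumed custom keys here.
        return cls(
            mongo_url=crawler.settings.get("MONGO_URL"),
            mongo_db=crawler.settings.get("MONGO_DB", "items"),
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_url)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Store each scraped item in a collection named after the spider.
        self.db[spider.name].insert_one(dict(item))
        return item
```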