Basic usage of the Scrapy command line

1. Create a new project:

scrapy startproject myproject
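
This generates a project skeleton with roughly the following layout (the exact files may vary slightly between Scrapy versions):

myproject/
    scrapy.cfg            # deploy/configuration file
    myproject/            # the project's Python module
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider and downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # spider files go here
            __init__.py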

2. Create a new spider file in the new project:

scrapy genspider mydomain mydomain.com

Here mydomain is the spider name (also used as the file name), and mydomain.com is the domain of the website to be crawled.
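
The generated file, myproject/spiders/mydomain.py, contains a spider skeleton roughly like the following (the exact template differs slightly between Scrapy versions):

import scrapy


class MydomainSpider(scrapy.Spider):
    name = 'mydomain'                      # the name used with "scrapy crawl"
    allowed_domains = ['mydomain.com']     # restrict crawling to this domain
    start_urls = ['http://mydomain.com/']  # initial URL(s) to request

    def parse(self, response):
        # extract data from the response and yield items or follow-up requests here
        pass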

3. Global commands (these can be used both inside and outside a project):

startproject
genspider
settings
runspider
shell
fetch
view
version

4. Commands used only in the project (local commands):

crawl
check
list
edit
parse
bench

5. Run a spider (by its name) inside the project:

scrapy crawl <spider>

5.1 Run the spider without printing the log:

scrapy crawl <spider> --nolog
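
For example, assuming the mydomain spider from step 2, the -o option also exports the scraped items to a file:

scrapy crawl mydomain -o items.json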

6. Check the spiders (runs the contract checks declared in the callbacks' docstrings and reports errors):

scrapy check
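
A minimal sketch of such a contract, written in a callback's docstring (the @url and @returns lines are the contract syntax):

def parse(self, response):
    """Contracts run by "scrapy check": fetch @url, expect 0-10 items and no requests.

    @url http://www.example.com/
    @returns items 0 10
    @returns requests 0 0
    """
    pass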

7. List all available spiders in the project:

scrapy list
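
For example, in the project created above:

$ scrapy list
mydomain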

8. Edit the spider file:

scrapy edit <spider>

This opens the spider in the editor defined by the EDITOR environment variable (typically vim), which is not very convenient in practice; editing in an IDE is usually a better fit.

9. Download a web page and print the returned body to the terminal, similar to fetching it with requests or urllib:

scrapy fetch <url>
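
For example, with --nolog so that only the page body is printed:

$ scrapy fetch --nolog http://www.example.com/
(the raw HTML of the page is printed to stdout)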

10. Download the web page and open it in the browser, so you can visually inspect the page as Scrapy sees it:

scrapy view <url>

11. Open the Scrapy interactive shell (similar to IPython), useful for testing requests and selectors:

scrapy shell [url]
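
A minimal sketch of a shell session, testing a CSS selector against example.com (on older Scrapy versions use extract_first() instead of get()):

$ scrapy shell http://www.example.com/
>>> response.status
200
>>> response.css('title::text').get()
'Example Domain'
>>> exit()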

12. Fetch the given URL and parse it with the spider that handles it, printing the extracted items and requests:

scrapy parse <url> [options]
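
For example, assuming the mydomain spider from step 2, --spider selects the spider and -c the callback used to parse the fetched page:

scrapy parse --spider=mydomain -c parse http://mydomain.com/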

13. Get the value of a Scrapy setting:

scrapy settings [options]

For example:

$ scrapy settings --get BOT_NAME
scrapybot
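
When run inside a project this prints the project's value; outside a project it prints the Scrapy default. Another example:

$ scrapy settings --get DOWNLOAD_DELAY
0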

14. Run a self-contained spider from a Python file, without creating a project:

scrapy runspider <spider_file.py>
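
Unlike scrapy crawl, this does not require a project; the whole spider lives in one file. A minimal sketch (quotes.toscrape.com is a public practice site, and the selectors assume its markup):

# standalone_spider.py
import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # yield one item per quote block on the page
        for quote in response.css('div.quote'):
            yield {'text': quote.css('span.text::text').get()}

Run it with:

scrapy runspider standalone_spider.py -o quotes.json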

15. Display the Scrapy version:

scrapy version [-v]

Adding -v also prints the versions of Scrapy's dependencies (lxml, Twisted, Python, the platform, etc.).

16. Run a quick benchmark to test how fast Scrapy can crawl on the current machine:

scrapy bench

Origin blog.csdn.net/u012757419/article/details/103787224