Using the Scrapy command-line tool to create a crawler project

1. Create a Scrapy crawler project with scrapy startproject <project_name>:

>scrapy startproject books_scrape
New Scrapy project 'books_scrape', using template directory 's:\\users\\jiangshan\\anaconda3\\lib\\site-packages\\scrapy\\templates\\project', created in:
D:\Workspace\ScrapyTest\books_scrape

You can start your first spider with:
cd books_scrape
scrapy genspider example example.com

2. Change into the project directory:

>cd books_scrape

3. View the directory structure:

>tree /F
Folder PATH listing for volume DATA1
Volume serial number is 3A2E-EB05
D :.
│ scrapy.cfg

└─books_scrape
│ items.py
│ middlewares.py
│ pipelines.py
│ settings.py
│ __init__.py

├─spiders
│ │ __init__.py
│ │
│ └─__pycache__
└─__pycache__

4. The scrapy genspider <SPIDER_NAME> <DOMAIN> command generates a Spider source file and Spider class from a built-in template. Its two arguments are the name of the Spider and the domain of the website to be crawled. A sketch of the generated file follows the command below.

> scrapy genspider books  books.toscrape.com
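
The generated spiders/books.py is a minimal skeleton; with the default template it typically looks like this (the exact output may vary slightly between Scrapy versions):

import scrapy

class BooksSpider(scrapy.Spider):
    name = 'books'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        pass

The name attribute ('books') is what scrapy crawl uses to locate the spider, and allowed_domains restricts which hosts the crawler may visit.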

5. View the directory structure again and note the newly generated spiders/books.py (this listing also shows PyCharm's .idea folder and run.py from steps 6 and 7, since it was captured after those steps):

>tree /F

D:.
│ scrapy.cfg

└─books_scrape
│ items.py
│ middlewares.py
│ pipelines.py
│ run.py
│ settings.py
│ __init__.py

├─.idea
│ books_scrape.iml
│ deployment.xml
│ misc.xml
│ modules.xml
│ remote-mappings.xml
│ workspace.xml

├─spiders
│ │ books.py
│ │ __init__.py
│ │
│ └─__pycache__
│ __init__.cpython-37.pyc

└─__pycache__
settings.cpython-37.pyc
__init__.cpython-37.pyc

6. Open PyCharm and open the books_scrape project created above; the scrapy.cfg configuration file marks the project root and serves as the reference point.
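
For orientation, the scrapy.cfg generated by startproject typically contains something like the following (exact contents can vary slightly between Scrapy versions):

[settings]
default = books_scrape.settings

[deploy]
#url = http://localhost:6800/
project = books_scrape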

7. In the same directory as the spiders folder (alongside settings.py), create a new run.py file and write:

from scrapy import cmdline

# Equivalent to typing "scrapy crawl books" on the command line
cmdline.execute('scrapy crawl books'.split())
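
Running run.py from PyCharm starts the 'books' spider. To make the crawl produce output, the template's empty parse() can be filled in. A minimal sketch for books.toscrape.com, assuming its standard listing markup (the selectors are illustrative, not part of the original post):

import scrapy

class BooksSpider(scrapy.Spider):
    name = 'books'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        # Each book on a listing page sits in an <article class="product_pod">
        for book in response.css('article.product_pod'):
            yield {
                'title': book.css('h3 a::attr(title)').get(),
                'price': book.css('p.price_color::text').get(),
            }
        # Follow the "next" pagination link, if present, and parse it the same way
        next_page = response.css('ul.pager li.next a::attr(href)').get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

Scrapy prints each yielded item to the console; adding -o books.json to the crawl command saves them to a file instead.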

 


Origin www.cnblogs.com/jeshy/p/11105766.html