Scrapy project creation

I: Install the framework
Upgrade pip first: pip install --upgrade pip
Then install the Scrapy framework with pip: pip install scrapy

II: Crawler steps:
① Create a new project (on the command line, change into the directory where you want the project files to be placed)
scrapy startproject <project name>
Example: scrapy startproject douban

② Create a spider with a name of your choice
1> If you are working in a shell:
first change into the project directory, then run: scrapy genspider <spider name> <domain to crawl> (the spider name cannot be the same as the project name)
Example: scrapy genspider douban_spider movie.douban.com

2> If you are working in the PyCharm terminal:
first change into the project's spiders package, then run: scrapy genspider <spider name> <domain to crawl> (the spider name cannot be the same as the project name)
Example: scrapy genspider douban_spider movie.douban.com

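3> Edit settings.py

 1. Change ROBOTSTXT_OBEY = True to False (Scrapy then no longer checks the site's robots.txt before crawling)

 2. Enable the item pipeline by uncommenting: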

  ITEM_PIPELINES = {
  'douban.pipelines.DoubanPipeline': 300,
  }

 3. Uncomment the default request headers and add a User-Agent:

  DEFAULT_REQUEST_HEADERS = {
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en',
  'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
  }

 

③ Define the targets (write items.py): specify the data fields you want to crawl
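
As a minimal sketch of this step, items.py could declare the fields to collect. The template generated by scrapy startproject subclasses scrapy.Item with scrapy.Field() attributes; the version below instead uses a standard-library dataclass, which Scrapy 2.2 and later also accept as an item type. The class name DoubanItem and the fields title and rating are assumptions chosen for illustration, not taken from the original project.

```python
# items.py (sketch): declare the fields the crawler should collect.
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoubanItem:
    # Hypothetical fields; replace them with whatever you want to crawl.
    title: Optional[str] = None
    rating: Optional[str] = None


if __name__ == "__main__":
    # Fields that were not scraped simply stay None.
    item = DoubanItem(title="Example title")
    print(item)
```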

④ Write the spider (spiders/xxspider.py): write the spider that starts crawling the pages

⑤ Store the content (pipelines.py): write the pipeline that stores the crawled content
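
A pipeline for this step could be as simple as the sketch below, which writes each item as one line of a JSON Lines file. The class name matches the ITEM_PIPELINES entry above; the output file name douban_items.jsonl is an assumption, and the methods use the classic (item, spider) signatures.

```python
# pipelines.py (sketch): store every crawled item as one JSON line.
import json
from dataclasses import asdict, is_dataclass


class DoubanPipeline:
    def open_spider(self, spider):
        # Called once when the spider starts; the file name is an assumption.
        self.file = open("douban_items.jsonl", "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        # Accept dataclass items as well as dicts and scrapy.Item objects.
        row = asdict(item) if is_dataclass(item) else dict(item)
        self.file.write(json.dumps(row, ensure_ascii=False) + "\n")
        return item  # pass the item on to any later pipeline
```

The pipeline only runs if it is enabled in ITEM_PIPELINES, as in the settings step above.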

⑥ Start the crawler
scrapy crawl <spider name>
Example: scrapy crawl douban_spider

 

Origin www.cnblogs.com/lnd-blog/p/11692501.html