I: Install the framework
Upgrade pip: pip install --upgrade pip
Install the Scrapy framework with pip: pip install scrapy
II: Crawler steps:
① Create a new project (on the command line, first change to the directory where you want the project files to be placed)
scrapy startproject project_name
Example: scrapy startproject douban
② Create a spider with a custom name
1> If you are working in a shell,
first enter the project directory, then run: scrapy genspider spider_name domain_to_crawl (the spider name cannot be the same as the project name)
Example: scrapy genspider douban_spider movie.douban.com
2> If you are working in the PyCharm terminal,
first change to the project's spiders package, then run: scrapy genspider spider_name domain_to_crawl (the spider name cannot be the same as the project name)
Example: scrapy genspider douban_spider movie.douban.com
3> Configure the settings (settings.py)
1. Change ROBOTSTXT_OBEY = True to False
2. Enable (uncomment) the pipeline:
ITEM_PIPELINES = {
'douban.pipelines.DoubanPipeline': 300,
}
3. Enable (uncomment) DEFAULT_REQUEST_HEADERS in the settings and change it to:
DEFAULT_REQUEST_HEADERS = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en',
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
}
③ Define the targets (write items.py): specify the data fields you want to crawl
④ Write the spider (spiders/xxspider.py): write the spider that crawls the pages
⑤ Store the content (pipelines.py): design the pipeline that stores the crawled content
⑥ Run the crawler
scrapy crawl spider_name
Example: scrapy crawl douban_spider
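As a rough illustration of step ⑤, here is a minimal sketch of what pipelines.py might contain. The class name DoubanPipeline matches the ITEM_PIPELINES entry above. The output file name douban_movies.jsonl and the idea of writing one JSON object per line are assumptions made for this example, not something the original steps prescribe. It uses the open_spider / process_item / close_spider hooks with the classic (item, spider) signature.

```python
import json


class DoubanPipeline:
    """Write each crawled item to a JSON Lines file (one JSON object per line)."""

    def __init__(self, path="douban_movies.jsonl"):  # hypothetical output file name
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes: close the output file.
        if self.file is not None:
            self.file.close()
            self.file = None

    def process_item(self, item, spider):
        # dict(item) works for plain dicts as well as scrapy.Item instances.
        # ensure_ascii=False keeps Chinese titles readable in the file.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        # Return the item so that any later pipeline still receives it.
        return item
```

With the pipeline enabled in settings.py, running scrapy crawl douban_spider would pass every item the spider yields through process_item, so the file fills up as the crawl proceeds.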