08. Full-site crawling with CrawlSpider

Full-site data crawling based on CrawlSpider

  1. CrawlSpider is a subclass of Spider.

  2. Usage process:

    Create a CrawlSpider-based spider file: scrapy genspider -t crawl spidername

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class MySpider(CrawlSpider):
    name = 'pra_crawlspider'
    start_urls = ['http://pic.netbian.com/']
    rules = [
        # Instantiate a Rule (rule parser) object: follow the pagination
        # links found under div.page and hand each response to parse_item.
        Rule(LinkExtractor(restrict_xpaths='//div[@class="page"]'),
             callback='parse_item', follow=True),
    ]

    def parse_item(self, response):
        imgs = response.xpath('//div[@class="slist"]//img')
        for img in imgs:
            print(img.xpath('./@src').extract_first())
            print(img.xpath('./@alt').extract_first())

  After the spider starts, the URLs in start_urls are requested first, and each response object is handed to the CrawlSpider's built-in parsing logic. The LinkExtractor in each Rule then extracts new URLs from that response according to the rule, Scrapy keeps sending requests for those URLs, and every resulting response object is passed to the callback function named in the Rule (here, parse_item) for processing.
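Conceptually, the link-extraction step above can be sketched in plain Python: pull href values out of the page HTML and resolve them against the page URL to get the next batch of requests. This is a simplified illustration of the idea, not Scrapy's actual LinkExtractor implementation (which also handles XPath/CSS restriction, deduplication, and allow/deny filters):

```python
import re
from urllib.parse import urljoin


def extract_links(html, base_url):
    """Return absolute URLs for every href found in the HTML snippet."""
    hrefs = re.findall(r'href=["\']([^"\']+)["\']', html)
    return [urljoin(base_url, h) for h in hrefs]


# A pagination block like the one matched by restrict_xpaths='//div[@class="page"]'
page = '<div class="page"><a href="/index_2.html">2</a><a href="/index_3.html">3</a></div>'
print(extract_links(page, 'http://pic.netbian.com/'))
# → ['http://pic.netbian.com/index_2.html', 'http://pic.netbian.com/index_3.html']
```

Each extracted URL would then be requested in turn, and because follow=True, the same rule is applied again to those responses, which is how the spider walks the whole site.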

Origin www.cnblogs.com/zhangjian0092/p/11704687.html