Two ways to write a Python Scrapy spider

"""
    Scrapy offers two ways to supply the initial URLs:
    one is the start_urls class attribute, which requires a parse() method to be defined;
    the other is to define a start_requests() method directly.
"""
import scrapy


class simpleUrl(scrapy.Spider):
    name = "simpleUrl"
    start_urls = [  # first approach: no start_requests method needs to be defined
        'http://lab.scrapyd.cn/page/1/',
        'http://lab.scrapyd.cn/page/2/',
    ]

    # Second approach: build the initial requests yourself
    # def start_requests(self):
    #     urls = [  # the pages to crawl are listed inside this method
    #         'http://lab.scrapyd.cn/page/1/',
    #         'http://lab.scrapyd.cn/page/2/',
    #     ]
    #     for url in urls:
    #         yield scrapy.Request(url=url, callback=self.parse)

    # With the start_urls shorthand, the callback method must be named parse

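    def parse(self, response):
        # The second-to-last URL segment is the page number, e.g. "1"
        page = response.url.split("/")[-2]
        filename = 'mingyan-%s.html' % page
        # Save the raw response body (bytes) to a local HTML file
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('save file: %s' % filename)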
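
The file name produced in parse() depends entirely on how the URL is split, so it can help to check that step on its own, without running a crawl. The following is a minimal sketch under that assumption; page_filename is a hypothetical helper introduced only for illustration and is not part of Scrapy or of the spider above.

```python
def page_filename(url):
    """Hypothetical helper: mirrors the file-name logic used in parse()."""
    # Splitting on "/" leaves an empty string after the trailing slash,
    # so index -2 is the page number segment.
    page = url.split("/")[-2]
    return 'mingyan-%s.html' % page


if __name__ == "__main__":
    for url in ('http://lab.scrapyd.cn/page/1/', 'http://lab.scrapyd.cn/page/2/'):
        print(url, '->', page_filename(url))
```

Note that the [-2] index relies on the trailing slash in the URL. Without it, the second-to-last segment is "page" rather than the page number, so every page would be written to the same file.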


Origin www.cnblogs.com/stillstep/p/11099809.html