Scrapy: overriding the start_requests() method

I wanted to crawl a site's news in batches. The pagination URL is simple: just adjust the p parameter in the link, so the URL list can be written like this:

urls = ['https://www.baidu.com/p=%s' % i for i in range(1, 11)]

However, the default parse() flow, where response.follow() submits the next link after each page is parsed and the next callback then runs, doesn't fit this case: all the URLs are already known before any page is fetched.

So I overrode the start_requests() method:

import scrapy
from scrapy import Request
from article.items import ArticleItem


class XinwenSpider(scrapy.Spider):
    name = 'xinwen'
    allowed_domains = ['www.hbskzy.cn']

    def start_requests(self):
        # Build the paginated listing URLs up front and route each
        # response to next_parse() instead of the default parse().
        urls = ['http://www.hbswkj.com/index_list.jsp?a1032t=44&a1032p=%s&a1032c=20&urltype=tree.TreeTempUrl&wbtreeid=1021' % i for i in range(2, 5)]
        for url in urls:
            yield Request(url=url, callback=self.next_parse)

    def next_parse(self, response):
        # Extraction logic goes here: build and yield ArticleItem
        # objects from each listing page.
        pass
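
With start_requests() defined, a start_urls list is no longer needed; the spider is launched with the standard Scrapy command line from the project directory:

scrapy crawl xinwen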

response.follow() works on links taken from a parsed response, so it suits the case where the next page's URL is only known after the current page has been parsed. In that situation response.follow() is the right choice; applied to a scenario like this one, where every URL can be generated in advance, it is extremely error-prone.
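
For contrast, here is a minimal sketch of the response.follow() pattern; the start URL and CSS selectors are illustrative assumptions, not taken from the site above:

import scrapy


class NextPageSpider(scrapy.Spider):
    name = 'nextpage_demo'                       # hypothetical spider
    start_urls = ['https://example.com/news']    # illustrative URL

    def parse(self, response):
        # Extract items from the current listing page (selector assumed).
        for title in response.css('li.news a::text').getall():
            yield {'title': title}
        # The next page's URL is only discovered by parsing this page,
        # so response.follow() is the right tool: it resolves the
        # (possibly relative) href and schedules it with this same callback.
        next_page = response.css('a.next::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)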


Original post: blog.csdn.net/qq_17802895/article/details/108545617