Scrapy: handling 302 redirects

 

When a crawler sends many requests, some sites apply anti-crawler countermeasures: the request is redirected to a 404 page, a notice page, or a verification-code (CAPTCHA) page to block the crawler. The solution is as follows:

from scrapy import Request

# Method of your Spider subclass:
def start_requests(self):
        for url in self.start_urls:
            yield Request(url, meta={
                'dont_redirect': True,
                'handle_httpstatus_list': [302]
            }, callback=self.parse)

# 'dont_redirect': True disables redirection, so the 302 is not followed.
# The handle_httpstatus_list key in request.meta specifies, per request, which non-2xx response codes are allowed through to the callback.
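Why set both keys? Based on my reading of recent Scrapy versions (check the source of the version you have installed), RedirectMiddleware hands a 3xx response on unchanged, instead of following it, when any of the conditions below holds. HttpErrorMiddleware then drops non-2xx responses unless their code is explicitly allowed, so with 'dont_redirect' alone the 302 would no longer be followed but would also never reach parse. The function below is a simplified, standalone model of that redirect check; its name and parameters are hypothetical and not part of Scrapy's API.

```python
def redirect_middleware_passes_through(status, meta, spider_allowed=()):
    """Simplified model of RedirectMiddleware's bypass check.

    status         -- HTTP status code of the response
    meta           -- dict standing in for request.meta
    spider_allowed -- stands in for the spider's handle_httpstatus_list attribute

    Returns True when the response is passed on unchanged instead of
    being followed to its Location.
    """
    return bool(
        meta.get("dont_redirect", False)
        or status in spider_allowed
        or status in meta.get("handle_httpstatus_list", [])
        or meta.get("handle_httpstatus_all", False)
    )


if __name__ == "__main__":
    # The meta used in start_requests above: the 302 is not followed.
    print(redirect_middleware_passes_through(
        302, {"dont_redirect": True, "handle_httpstatus_list": [302]}))
    # No special keys: the 302 would be followed.
    print(redirect_middleware_passes_through(302, {}))
```

Under this model, listing 302 in handle_httpstatus_list would by itself already stop the redirect from being followed; 'dont_redirect' makes the intent explicit and also covers other 3xx codes.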

 

In addition:

According to the HTTP standard, successful responses are those whose status codes are in the 200-299 range.

If you still want to process responses with codes outside this range, you can specify which response codes the spider is able to handle with the handle_httpstatus_list spider attribute or the HTTPERROR_ALLOWED_CODES setting.

For example, if you want your spider to handle 404 responses, you can do this:

from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):
    handle_httpstatus_list = [404]
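The spider attribute and the setting feed the same filter. The sketch below is a simplified, standalone model of the check HttpErrorMiddleware performs, based on my reading of recent Scrapy source (verify against your installed version); the function and parameter names are hypothetical, not Scrapy API. One consequence worth noting: under this model a spider's handle_httpstatus_list attribute replaces HTTPERROR_ALLOWED_CODES rather than adding to it, and a handle_httpstatus_list key in request.meta replaces both.

```python
def passes_httperror_filter(status, meta, spider_allowed=None,
                            allowed_codes_setting=(), allow_all_setting=False):
    """Simplified model of HttpErrorMiddleware's decision.

    status                -- HTTP status code of the response
    meta                  -- dict standing in for request.meta
    spider_allowed        -- spider's handle_httpstatus_list attribute (None if unset)
    allowed_codes_setting -- stands in for the HTTPERROR_ALLOWED_CODES setting
    allow_all_setting     -- stands in for the HTTPERROR_ALLOW_ALL setting

    Returns True when the response reaches the spider callback.
    """
    if 200 <= status < 300:  # successful responses always pass
        return True
    if meta.get("handle_httpstatus_all", False):
        return True
    if "handle_httpstatus_list" in meta:
        # Per-request list takes precedence over everything else.
        allowed = meta["handle_httpstatus_list"]
    elif allow_all_setting:
        return True
    elif spider_allowed is not None:
        # Spider attribute overrides the project-wide setting.
        allowed = spider_allowed
    else:
        allowed = allowed_codes_setting
    return status in allowed


if __name__ == "__main__":
    # MySpider above: handle_httpstatus_list = [404]
    print(passes_httperror_filter(404, {}, spider_allowed=[404]))
```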


Source: www.cnblogs.com/lanston1/p/11120444.html