When multiple requests reptiles, some sites will appear counter-measures reptiles: the request is redirected to a link 404 link tips or verification code to prevent reptiles were links below for the solution:
start_requests DEF (Self): for I in self.start_urls: the yield the Request (I, Meta = { 'dont_redirect': True, 'handle_httpstatus_list': [302] }, the callback = self.parse) # 'dont_redirect': True prohibition redirection # request.meta handle_httpstatus_list in each request key can be used to specify the allowed response code.
In addition:
According to the HTTP standard , the return value success resonse value between 200-300.
If you want to process response outside this range can be produced by a spider handle_httpstatus_list
properties or HTTPERROR_ALLOWED_CODES
to specify the spider can handle setting response returned value.
For example, if you want the processing response 404 may return value to do so:
class MySpider(CrawlSpider):
handle_httpstatus_list = [404]