Problem solved: raise ValueError('Missing scheme in request url: %s' % self._url) ValueError: Missing schem

While crawling pictures with Scrapy today, I ran into this error:

raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: //images2015.cnblogs.com/news_topic/20161020185521154-1185360701.png

Clearly, while the pipeline is downloading the picture, Scrapy does not recognize the target URL, because the URL has no scheme.
The page links to the image with a URL that starts with //, a "protocol-relative" URL. The leading http: or https: is omitted so that the browser loads the file (for example from a CDN) using whichever protocol the hosting page itself uses.
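To see why such a URL is rejected, here is a minimal sketch using only the standard library (not Scrapy itself). It parses the URL from the error message and shows that the scheme component is empty while the host is still recognized:

```python
from urllib.parse import urlparse

# The URL from the error message: it starts with "//" and has no scheme.
url = "//images2015.cnblogs.com/news_topic/20161020185521154-1185360701.png"

parts = urlparse(url)
print(repr(parts.scheme))  # the scheme component is an empty string
print(parts.netloc)        # the host part is still parsed
```

A browser fills in the missing scheme from the page it is displaying, but a request made outside a browser has no such context, so the scheme has to be added explicitly.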

When extracting file links while crawling, protocol-relative URLs should be detected with a regular expression and converted into full URLs with an explicit scheme.

Before the fix:

front_image = response.meta.get("front_image_url", "")
article_item["front_image_url"] = [front_image]  # the image pipeline must be given a list

After the fix:

import re

front_image = response.meta.get("front_image_url", "")
# A URL starting with "//" is protocol-relative: prepend a scheme before downloading
if re.match(r"//", front_image):
    front_image = "https:" + front_image
article_item["front_image_url"] = [front_image]  # the image pipeline must be given a list
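As an alternative to hard-coding "https:", the image URL can be resolved against the URL of the page it was found on, so that it inherits that page's scheme. The sketch below uses the standard library's urljoin; the helper name absolutize and the page URL are hypothetical and only serve the illustration:

```python
from urllib.parse import urljoin


def absolutize(page_url, image_url):
    """Resolve image_url against the URL of the page it was found on."""
    return urljoin(page_url, image_url)


# Hypothetical page URL, used only for illustration
page_url = "https://news.cnblogs.com/"
image_url = "//images2015.cnblogs.com/news_topic/20161020185521154-1185360701.png"

resolved = absolutize(page_url, image_url)
print(resolved)
```

Inside a Scrapy callback, response.urljoin(front_image) performs the same resolution relative to the response's own URL. Note that joining an empty string returns the page URL itself, so an empty front_image should probably be checked for before joining.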

Origin blog.csdn.net/zhaohaibo_/article/details/104460090