Image fetch failure

Today I found this error in the logs:

2013-06-06 12:25:13,332 [ERROR]  upload.service.UploadFileService -  image  open error ,url = http://img.xitisi.com/Commodity/BOBOTou_2204/RiXiFaXingNvShengJiaFa_HuaBuWu2011XinKuan_QiLiuHaiBoboBoBoTouXiuLianDuanFaZongSe20120210034904.jpg ,cannot identify image fil

I checked the response headers for the image:

Accept-Ranges bytes
Content-Encoding gzip
Content-Length 452449
Content-Type image/jpeg
Date Thu, 06 Jun 2013 05:03:08 GMT
Etag "8041952b9a50cd1:1a9a"
Last-Modified Fri, 22 Jun 2012 17:12:15 GMT
Server Microsoft-IIS/6.0
Vary Accept-Encoding
X-Powered-By ASP.NET

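It turns out the response body is gzip-compressed (note the `Content-Encoding gzip` header), so Image cannot identify the data as an image. The body has to be decompressed first.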
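
To see why identification fails, it helps to look at the leading bytes: a gzip stream starts with `1f 8b`, while a JPEG file starts with `ff d8`, so an image library looking for a known image signature finds none. The following is a minimal sketch, not the code from the service above; `sniff_format` and `maybe_gunzip` are hypothetical helper names, and the "JPEG" here is only a placeholder byte string with the real JPEG signature in front.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip stream
JPEG_MAGIC = b"\xff\xd8"   # first two bytes of every JPEG file


def sniff_format(data):
    """Guess the container format from the leading bytes."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return "unknown"


def maybe_gunzip(data):
    """Return the decompressed bytes if data is gzip, else data unchanged."""
    if sniff_format(data) == "gzip":
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    return data


# Stand-in for a JPEG body: real signature followed by placeholder payload.
fake_jpeg = JPEG_MAGIC + b"placeholder image payload"

# Compress it the way a server sending Content-Encoding: gzip would.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(fake_jpeg)
wire_bytes = buf.getvalue()

print(sniff_format(wire_bytes))                 # gzip
print(sniff_format(maybe_gunzip(wire_bytes)))   # jpeg
```

Checking the bytes rather than only the `Content-Encoding` header is a design choice for this sketch: it also covers the case where the header is missing or wrong.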

Solutions:

1. Decompress the body with Python's gzip module

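    import gzip
    from StringIO import StringIO  # Python 2 module

    def _read_content(self, response):
        content_type = response.headers.get('Content-Type')
        content_encoding = response.headers.get('Content-Encoding')
        if response.code == 200 and content_type and 'image' in content_type:
            data = StringIO(response.read())
            if content_encoding == 'gzip':
                # Decompress the body, then wrap the raw image bytes in a
                # file-like object again so that Image can read them.
                data = StringIO(gzip.GzipFile(fileobj=data).read())
            return data
        else:
            # The URL is not passed in, so take it from the response itself.
            logger.error("can't open image, content type=%s, url=%s"
                         % (content_type, response.geturl()))
            return None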

2. Tell the server in the request headers that gzip is not accepted

    self.headers = {}
    self.headers['User-Agent'] = """Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB6"""
    self.headers['Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    # 'identity' asks the server to send the body without any content encoding.
    self.headers['Accept-Encoding'] = 'identity'
    self.headers['Accept-Language'] = "zh,en-us;q=0.7,en;q=0.3"
    self.headers['Accept-Charset'] = "ISO-8859-1,utf-8;q=0.7,*;q=0.7"
    self.headers['Connection'] = "keep-alive"
    self.headers['Keep-Alive'] = "115"
    self.headers['Cache-Control'] = "no-cache"

    import urllib2

    def open(self, url):
        try:
            request = urllib2.Request(url, headers=self.headers)
            response = self.opener.open(request, timeout=self.timeout)
            return self._read_content(response)
        except Exception as e:
            # Log the failing URL, then the exception with its traceback.
            logger.error(url)
            logger.exception(e)
            return None
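
The part of option 2 that matters is the single `Accept-Encoding: identity` header. The sketch below builds such a request without sending it and reads the header back, which is a quick way to confirm what will go on the wire. `build_image_request` and the example.com URL are hypothetical, used only for illustration; the import falls back so the snippet runs on Python 2 (`urllib2`, as in this post) as well as Python 3.

```python
try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request  # Python 2, as used in this post


def build_image_request(url):
    """Build a request that asks the server for an uncompressed body."""
    headers = {
        "Accept": "image/*,*/*;q=0.8",
        "Accept-Encoding": "identity",  # no gzip, no deflate
    }
    return Request(url, headers=headers)


# Placeholder URL; nothing is actually fetched here.
req = build_image_request("http://example.com/sample.jpg")

# urllib stores header names via str.capitalize(), hence the lowercase "e".
print(req.get_header("Accept-encoding"))  # identity
```

A misconfigured server could still send a gzip body despite this header, so keeping the `Content-Encoding` check from option 1 as a fallback seems prudent.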


Reposted from san-yun.iteye.com/blog/1883162