Fixing encoding errors when deploying or running a Scrapy crawler

When running a Scrapy crawl, sometimes everything works with no encoding error at all, and sometimes it fails straight away with something like: UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 235: illegal multibyte sequence
The cause is an encoding mismatch between GBK and UTF-8: judging from the traceback below, a file saved as UTF-8 is being read with GBK, the codec Python's open() defaults to on a Chinese-locale Windows install.
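To make the mechanism concrete, here is a minimal reproduction sketch (not from the original post; the file path and contents are made up): a file written as UTF-8 cannot be read back with the GBK codec once it contains non-ASCII text.

    # -*- coding: utf-8 -*-
    # Minimal reproduction: a file saved as UTF-8, read back with the GBK codec.
    import os
    import tempfile

    path = os.path.join(tempfile.gettempdir(), "demo.cfg")

    # Write a small config file as UTF-8, with a Chinese comment in it.
    with open(path, "w", encoding="utf-8") as fp:
        fp.write("# 配置项\n[settings]\ndefault = demo.settings\n")

    # Read it back with GBK -- the codec open() defaults to on a Chinese-locale
    # Windows install -- and the same UnicodeDecodeError appears.
    try:
        with open(path, encoding="gbk") as fp:
            fp.read()
    except UnicodeDecodeError as exc:
        print(exc)  # 'gbk' codec can't decode byte ...: illegal multibyte sequence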
In an actual run, the failure looks like this:

Traceback (most recent call last):
  File "D:\python\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\python\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\www\Scripts\scrapy.exe\__main__.py", line 9, in <module>
  File "d:\www\lib\site-packages\scrapy\cmdline.py", line 114, in execute
    settings = get_project_settings()
  File "d:\www\lib\site-packages\scrapy\utils\project.py", line 63, in get_project_settings
    init_env(project)
  File "d:\www\lib\site-packages\scrapy\utils\conf.py", line 87, in init_env
    cfg = get_config()
  File "d:\www\lib\site-packages\scrapy\utils\conf.py", line 101, in get_config
    cfg.read(sources)
  File "D:\python\lib\configparser.py", line 696, in read
    self._read(fp, filename)
  File "D:\python\lib\configparser.py", line 1014, in _read
    for lineno, line in enumerate(fp, start=1):
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 235: illegal multibyte sequence

Looking at the traceback, the frame self._read(fp, filename) points into ConfigParser.read() in D:\python\lib\configparser.py, which is where the decoding goes wrong:

        for filename in filenames:
            try:
                # GBK is the troublemaker here: the original source has
                # encoding=encoding; changing it to "utf-8" made the error go away.
                with open(filename, encoding=encoding) as fp:
                    self._read(fp, filename)
            except OSError:
                continue
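Editing the standard library does work, but it changes the behaviour of every program on the machine and is overwritten the next time Python is reinstalled. A gentler alternative, as a sketch under two assumptions: the file tripping configparser is the project's scrapy.cfg (that is what get_config() in the traceback reads), and that file is currently saved as UTF-8. Re-saving it in the system default encoding lets the unmodified configparser read it:

    # -*- coding: utf-8 -*-
    # Sketch: re-save scrapy.cfg in the system default encoding (cp936/GBK on a
    # Chinese-locale Windows install) instead of patching configparser.py.
    import locale

    cfg_path = "scrapy.cfg"  # assumed: run from the project root

    with open(cfg_path, "rb") as fp:
        # assumed: the file is currently UTF-8 (utf-8-sig also strips a BOM if present)
        text = fp.read().decode("utf-8-sig")

    # locale.getpreferredencoding() is what open() falls back to when no encoding
    # argument is given, i.e. the codec configparser used above.
    with open(cfg_path, "w", encoding=locale.getpreferredencoding()) as fp:
        fp.write(text)

Simply deleting the non-ASCII (Chinese) comments from scrapy.cfg has the same effect, since a pure-ASCII file reads fine under either codec.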

Note: if you do modify the standard-library source, the editor will pop up a prompt with three options (it is a library file); just confirm and the file can be edited.
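If you would rather not touch the source at all and are on Python 3.7 or newer, another option is UTF-8 mode (PEP 540), which forces open() to default to UTF-8 instead of the locale's GBK. A sketch of launching the crawl that way (the spider name "myspider" is a placeholder):

    # -*- coding: utf-8 -*-
    # Sketch: run the crawl with UTF-8 mode enabled via "-X utf8"; setting the
    # PYTHONUTF8=1 environment variable before "scrapy crawl" does the same thing.
    import subprocess
    import sys

    subprocess.run([sys.executable, "-X", "utf8", "-m", "scrapy", "crawl", "myspider"])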

Reposted from blog.csdn.net/Python_DJ/article/details/102755073