Convert UTF-8 with BOM to UTF-8 without BOM in Python

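On Windows, many programs (older versions of Notepad, for example) save UTF-8 files with a BOM (byte order mark: the bytes EF BB BF at the very start of the file). On other systems, UTF-8 files are written without a BOM by default.

This can break decoding in some cases. For example, when Python's built-in json module reads a UTF-8 file that was saved on Windows with a BOM, it raises the following error (this is the Python 2 message):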

ValueError: No JSON object could be decoded
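For comparison, the following Python 3 sketch reproduces the problem in memory, with no file involved. In Python 3 the exception is json.JSONDecodeError, a subclass of ValueError, and its message differs from the one above. The JSON content is made up for illustration.

```python
import codecs
import json

# Bytes as a BOM-writing editor would save them: BOM followed by the JSON text
raw = codecs.BOM_UTF8 + b'{"name": "test"}'

# Plain "utf-8" decoding keeps the BOM as the character U+FEFF
text = raw.decode("utf-8")
print(text[0] == "\ufeff")  # True

try:
    json.loads(text)
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    print("failed:", exc)

# "utf-8-sig" strips a leading BOM while decoding, so parsing succeeds
data = json.loads(raw.decode("utf-8-sig"))
print(data)  # {'name': 'test'}
```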

Method 1: decode with the utf-8-sig codec

with open("data", "rb") as f:  # binary mode: read() returns bytes, BOM included
    s = f.read()
u = s.decode("utf-8-sig")  # a unicode string without the BOM
s = u.encode("utf-8")  # encode the unicode string back to UTF-8 (no BOM)
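In Python 3 the same decoding can happen while the file is being read, by passing encoding="utf-8-sig" to open(). The codec removes a leading BOM if there is one and otherwise behaves like plain UTF-8. A minimal sketch, assuming a JSON file named data as in the snippet above; the sample file is created in a temporary directory so the example runs on its own.

```python
import codecs
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Sample file named "data", written with a BOM
    path = os.path.join(tmpdir, "data")
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + b'{"ok": true}')

    # "utf-8-sig" drops the leading BOM while decoding, so json.load succeeds
    with open(path, "r", encoding="utf-8-sig") as f:
        obj = json.load(f)

    # The same codec reads a BOM-less file exactly like plain "utf-8"
    with open(path, "wb") as f:
        f.write(b'{"ok": true}')
    with open(path, "r", encoding="utf-8-sig") as f:
        obj_no_bom = json.load(f)

print(obj)  # {'ok': True}
print(obj == obj_no_bom)  # True
```

Because utf-8-sig accepts input with or without a BOM, it is a reasonable default when the origin of a UTF-8 file is unknown.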

Method 2: check for codecs.BOM_UTF8 and slice it off

import codecs

with open("data", "rb") as f:  # binary mode: compare raw bytes with codecs.BOM_UTF8
    s = f.read()
if s.startswith(codecs.BOM_UTF8):
    s = s[len(codecs.BOM_UTF8):]  # drop the 3-byte BOM
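Both methods strip the BOM only from the bytes held in memory; the file on disk is unchanged. To convert the file itself, the stripped bytes can be written back. The sketch below follows Method 2; strip_bom_file is a hypothetical helper name, and it rewrites the file in place, so keep a copy if the original matters.

```python
import codecs
import os
import tempfile


def strip_bom_file(path):
    """Remove a leading UTF-8 BOM from the file at path (hypothetical helper).

    Returns True if a BOM was found and removed, False otherwise.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # no BOM present: leave the file untouched
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])  # write everything after the BOM
    return True


# Usage: create a BOM-prefixed file, convert it twice, then read the bytes back
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data")
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + "héllo".encode("utf-8"))
    first = strip_bom_file(path)   # True: the BOM was removed
    second = strip_bom_file(path)  # False: nothing left to remove
    with open(path, "rb") as f:
        content = f.read()

print(first, second, content.decode("utf-8"))
```

Returning a flag makes the helper safe to run repeatedly over a batch of files: files that never had a BOM are not rewritten.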

Original: https://blog.csdn.net/founderznd/article/details/52197078

Reference: https://stackoverflow.com/questions/8898294/convert-utf-8-with-bom-to-utf-8-with-no-bom-in-python


Reposted from blog.csdn.net/zhongbeida_xue/article/details/81206079