UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 513: invalid start byte

df = pd.read_table(file_name, sep=',', encoding='utf-8', error_bad_lines=False, skiprows=[0,2,3], nrows=20000)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 513: invalid start byte
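To see which bytes are actually at fault, it can help to decode the raw bytes yourself and catch the error. UnicodeDecodeError carries the failing range in its start and end attributes. A minimal sketch (the helper name locate_bad_byte and the sample bytes are hypothetical); in practice you would open file_name in binary mode ('rb') and pass the bytes you read:

```python
def locate_bad_byte(data: bytes, encoding: str = "utf-8", context: int = 8):
    """Return (offset, nearby_bytes) for the first undecodable byte, or None.

    UnicodeDecodeError exposes the failing range as exc.start / exc.end,
    so we slice a few bytes of context on either side for inspection.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        return exc.start, data[lo:exc.end + context]
    return None  # the whole buffer decoded cleanly


# Hypothetical sample: 0x9c is a UTF-8 continuation byte, so it cannot start a character.
print(locate_bad_byte(b"abc\x9cdef"))  # (3, b'abc\x9cdef')
```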

df = pd.read_table(file_name, sep=',', encoding='gbk', error_bad_lines=False, skiprows=[0,2,3], nrows=20000)

UnicodeDecodeError: 'gbk' codec can't decode byte 0x9c in position 513: illegal multibyte sequence
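Each attempt here tests one guess at a time. A small helper (hypothetical name) can try several candidate encodings against the same raw bytes in one pass and report which ones decode without raising. Note that "decodes without error" does not mean "correct": single-byte encodings such as ISO-8859-1 accept any input.

```python
def encodings_that_decode(data: bytes, candidates):
    """Return the candidate encodings that decode `data` without raising."""
    ok = []
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue  # this guess is definitely wrong for these bytes
        ok.append(enc)
    return ok


sample = b"\x9c\n"  # hypothetical: byte 0x9c followed by a newline
print(encodings_that_decode(sample, ["utf-8", "gbk", "gb18030", "ISO-8859-1"]))
# ['ISO-8859-1']
```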

df = pd.read_table(file_name, sep=',', encoding='gb18030', error_bad_lines=False, skiprows=[0,2,3], nrows=20000)

UnicodeDecodeError: 'gb18030' codec can't decode byte 0x9c in position 513: illegal multibyte sequence
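When no single encoding fits the whole file, one alternative to further guessing (assuming pandas 1.3 or later) is to let read_csv substitute undecodable bytes through its encoding_errors parameter. In the same versions, on_bad_lines='skip' replaces error_bad_lines=False, which was deprecated in pandas 1.3 and removed in 2.0; read_csv with sep=',' behaves like the read_table calls here. A minimal sketch, with a hypothetical in-memory sample standing in for file_name:

```python
import io

import pandas as pd

# Hypothetical sample: the second row contains a stray 0x9c byte.
raw = b"a,b\n1,x\x9cy\n2,z\n"

df = pd.read_csv(
    io.BytesIO(raw),            # in practice: file_name
    sep=",",
    encoding="utf-8",
    encoding_errors="replace",  # undecodable bytes become U+FFFD instead of raising
    on_bad_lines="skip",        # pandas >= 1.3 spelling of error_bad_lines=False
)
print(df)
```

This trades the exception for silently altered cells, so it suits a first look at the data better than a final import.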

df = pd.read_table(file_name, sep=',', encoding='ISO-8859-1', error_bad_lines=False, skiprows=[0,2,3], nrows=20000)

pandas.errors.ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
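The decode error disappears with ISO-8859-1 for a structural reason: ISO-8859-1 (Latin-1) maps every one of the 256 byte values to a character, so decoding can never fail, and the failure moves on to the parser instead. It also means a Latin-1 read that gets past decoding says nothing about whether the text is right. A quick check (hypothetical helper name):

```python
def decodes_every_byte(encoding: str) -> bool:
    """Return True if all 256 single-byte values decode without error."""
    try:
        bytes(range(256)).decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


for enc in ["ISO-8859-1", "windows-1252", "utf-8"]:
    print(enc, decodes_every_byte(enc))
# ISO-8859-1 True
# windows-1252 False
# utf-8 False
```

windows-1252 fails because a few bytes (0x81, for example) are undefined in it.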

df = pd.read_table(file_name, sep=',', encoding='ISO-8859-1', error_bad_lines=False, skiprows=[0,2,3], nrows=20000,lineterminator="\n")

df = pd.read_table(file_name, sep=',', encoding='ISO-8859-1', error_bad_lines=False, skiprows=[0,2,3], nrows=20000,lineterminator="\r")

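The two lineterminator attempts are guesses about the file's line endings. Counting them in the raw bytes first shows whether the file uses \r\n, bare \r, bare \n, or a mix; with the C parser, lineterminator must be a single character, so \r\n itself cannot be passed. A minimal sketch (hypothetical helper name and sample):

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare CR and bare LF line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs not followed by LF
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by CR
    }


print(count_line_endings(b"a,b\r\n1,2\r3,4\n5,6\r\n"))
# {'CRLF': 2, 'CR': 1, 'LF': 1}
```

A file that mixes bare CR and bare LF could explain why neither single-character lineterminator gives a clean result.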
'windows-1251' and 'windows-1252':
a = line.decode(encoding_type)  # encoding_type holds the encoding detected for this line
b = a.encode('utf-8')
b
Out[19]: b'ia\xd0\x9c\xd0\xb0\xd0\x98\xd1\x860\x1e\x14\n'
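The windows-1251 step decodes a line with its detected encoding and re-encodes the text as UTF-8. As a self-contained sketch (hypothetical helper name), the input below is a byte string chosen so that its windows-1251 decoding gives the Out[19] value: single Cyrillic bytes such as 0xCC ('М') become two-byte UTF-8 sequences such as \xd0\x9c. The control bytes \x1e and \x14 in the result may hint that parts of the file are not plain text.

```python
def transcode_line(raw: bytes, source_encoding: str) -> bytes:
    """Decode one raw line with its detected encoding, then re-encode it as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


line = b"ia\xcc\xe0\xc8\xf60\x1e\x14\n"  # hypothetical windows-1251 input
print(transcode_line(line, "windows-1251"))
# b'ia\xd0\x9c\xd0\xb0\xd0\x98\xd1\x860\x1e\x14\n'
```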

line.strip().decode('gbk')

'windows-1253':
line.decode('utf-8', errors='ignore')
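errors='ignore' makes the decode succeed by silently dropping every byte that is not valid UTF-8, so characters can vanish without notice. errors='replace' leaves a U+FFFD marker in their place instead, which keeps the damage visible. A minimal comparison on a hypothetical line:

```python
raw = b"ab\x9ccd"  # hypothetical line with one invalid UTF-8 byte

dropped = raw.decode("utf-8", errors="ignore")
marked = raw.decode("utf-8", errors="replace")

print(dropped)                 # abcd
print(marked == "ab\ufffdcd")  # True
```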

'TIS-620':
line = line.decode('utf-8', errors='ignore')
None
'ISO-8859-1', 'ISO-8859-8', 'ISO-8859-5', 'ISO-8859-9'

'ISO-8859-9': this format cannot be read with line = line.decode('utf-8', errors='ignore').
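In ISO-8859-9 the Turkish letters are single bytes from 0xC0 to 0xFF ('İ' is 0xDD, 'ç' is 0xE7), and those are never UTF-8 continuation bytes. Inside ordinary words they therefore form invalid UTF-8, and errors='ignore' deletes them, leaving only the ASCII. Decoding with the encoding detected for the line keeps the text. A minimal per-line sketch, assuming the detected encoding (or None) comes from whatever detector produced the labels above; decode_line and its fallback order are hypothetical:

```python
def decode_line(raw: bytes, detected=None) -> str:
    """Decode one raw line, preferring the encoding detected for it.

    Hypothetical fallback order: detected encoding -> UTF-8 -> GBK ->
    UTF-8 with U+FFFD replacement, so nothing is dropped silently.
    """
    candidates = ([detected] if detected else []) + ["utf-8", "gbk"]
    for enc in candidates:
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):  # wrong guess or unknown codec name
            continue
    return raw.decode("utf-8", errors="replace")


raw = "İstanbul çay".encode("iso-8859-9")
print(raw.decode("utf-8", errors="ignore"))  # stanbul ay
print(decode_line(raw, "ISO-8859-9"))        # İstanbul çay
```

Without a detected encoding, the GBK step could mis-decode such bytes, because GBK accepts many byte pairs; the order of the candidates matters.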

Reposted from blog.csdn.net/weixin_46713695/article/details/129732050