Encoding errors when reading Chinese text files in Python
The following code reads a Chinese .txt file in Python:
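```python
# coding: utf-8
def readFile():
    lines = []  # avoid naming this "list", which shadows the built-in
    # Text mode with no encoding argument: the file is decoded with the
    # platform's default encoding (GBK on Chinese-locale Windows)
    with open('emotion_dict//neg//neg_all_dict.txt', 'r') as fp:
        for line in fp:
            lines.append(line)
    print(lines)

readFile()
```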
Sometimes, however, it fails with this error:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 10: illegal multibyte sequence
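The error can be reproduced without the dictionary file by decoding UTF-8 bytes with the gbk codec. In UTF-8, every byte after the first in a multi-byte character lies in 0x80 to 0xBF; for example, 一 (U+4E00) is encoded as e4 b8 80. GBK, by contrast, only accepts 0x81 to 0xFE as the first byte of a two-byte character, so a stray 0x80 in that position is rejected. The sketch below is an illustration, not part of the original post; the helper name decodes_as is hypothetical.

```python
import locale

def decodes_as(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if `data` is valid text in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# The encoding open() typically falls back to in text mode when no
# encoding argument is given (platform- and locale-dependent)
print("default encoding:", locale.getpreferredencoding(False))

# "一一" in UTF-8: e4 b8 80 e4 b8 80 (0x80 is a UTF-8 continuation byte)
data = "一一".encode("utf-8")
print(data)                       # b'\xe4\xb8\x80\xe4\xb8\x80'
print(decodes_as(data, "utf-8"))  # True
print(decodes_as(data, "gbk"))    # False: 0x80 cannot start a GBK character
```

Running it on a machine whose default encoding is not GBK still shows the mismatch, because the codec is named explicitly.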
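This happens when the file is saved as UTF-8 but opened in text mode without an encoding argument: Python then decodes it with the platform's default encoding, which is GBK on Chinese-locale Windows, and UTF-8 bytes are often not valid GBK. A small change to the code is enough to read the Chinese text: open the .txt file in binary mode ('rb') and decode each line from UTF-8 explicitly. The code is as follows: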
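```python
# coding: utf-8
def readFile():
    lines = []
    # Binary mode: each line comes back as raw bytes, with no decoding yet
    with open('emotion_dict//neg//neg_all_dict.txt', 'rb') as fp:
        for line in fp:
            line = line.strip()          # drop surrounding whitespace and the line ending
            line = line.decode('utf-8')  # decode the UTF-8 bytes into a str
            lines.append(line)
    print(lines)

readFile()
```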
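In Python 3 a shorter alternative, not the one used above, is to stay in text mode and pass the encoding to open() directly, so Python decodes each line itself. The sketch below assumes the file is UTF-8; the helper name read_lines and the temporary demo file are hypothetical stand-ins for emotion_dict//neg//neg_all_dict.txt. It also shows one detail worth knowing: if the file was saved with a UTF-8 byte order mark (BOM), as some Windows editors do, decoding with 'utf-8' (in text mode or via line.decode('utf-8') in the 'rb' version) leaves '\ufeff' at the start of the first line, while 'utf-8-sig' removes it.

```python
import os
import tempfile

def read_lines(path, encoding="utf-8"):
    """Hypothetical helper: return the file's lines with surrounding whitespace removed.

    Passing encoding explicitly makes the result independent of the OS default;
    "utf-8-sig" additionally strips a leading BOM if one is present.
    """
    with open(path, "r", encoding=encoding) as fp:
        return [line.strip() for line in fp]

# Demo with a throwaway UTF-8 file that starts with a BOM
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbf" + "难过\n生气\n".encode("utf-8"))
    path = tmp.name

print(read_lines(path))                        # ['\ufeff难过', '生气']
print(read_lines(path, encoding="utf-8-sig"))  # ['难过', '生气']
os.remove(path)
```

Using 'utf-8-sig' costs nothing when the file has no BOM, since it then decodes exactly like 'utf-8'.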