Encoding errors when reading Chinese text in Python




Reading a Chinese .txt file in Python:

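#coding:utf-8
# (This line sets the encoding of the .py source file itself, not of the files it reads.)

def readFile():
    # No encoding argument: Python 3 decodes the file with the platform's
    # default encoding (GBK on Simplified Chinese Windows), not necessarily UTF-8.
    fp = open('emotion_dict//neg//neg_all_dict.txt', 'r')
    lines = []
    for line in fp:
        lines.append(line)
    fp.close()
    print(lines)

readFile()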
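The version above passes no encoding to open(), so in Python 3 the file is decoded with the platform's preferred encoding. A minimal check of which encoding that is on your machine (on Simplified Chinese Windows it is typically cp936, which Python treats as an alias of gbk; on most Linux and macOS setups it is UTF-8):

```python
import codecs
import locale

# Text-mode open() without an encoding= argument falls back to this value.
default_encoding = locale.getpreferredencoding(False)

# Normalize aliases: for example 'cp936' and 'gbk' both resolve to the codec named 'gbk'.
canonical_name = codecs.lookup(default_encoding).name

print(default_encoding, canonical_name)
```

If this reports something other than UTF-8 while the .txt file was saved as UTF-8, the first version can fail with a UnicodeDecodeError.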


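However, it sometimes fails with the following error. This happens when the file was saved as UTF-8 but Python decoded it with the system default encoding, here GBK: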

UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 10: illegal multibyte sequence
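The error can be reproduced without any file. This minimal sketch, using a hypothetical sample string, encodes Chinese text as UTF-8 (the way a UTF-8 .txt file stores it) and then decodes those bytes as GBK, which is effectively what the first version does on a system whose default encoding is GBK:

```python
# Hypothetical sample: Chinese text stored as UTF-8 bytes, as in a UTF-8 .txt file.
utf8_bytes = "一二三".encode("utf-8")

# Decoding those bytes as GBK mimics open() on a GBK-default system.
error = None
try:
    utf8_bytes.decode("gbk")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # a 'gbk' codec error, like the message above

# Decoding with the encoding the bytes were actually written in succeeds.
print(utf8_bytes.decode("utf-8"))  # 一二三
```

The UTF-8 encoding of 一 is the three bytes E4 B8 80, and a lone 0x80 is not valid GBK, which is why a byte like 0x80 shows up in the error message.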

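In that case a small change to the code is enough to read the Chinese text: open the .txt file in binary mode ('rb') and decode each line as UTF-8 yourself. The code is as follows: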

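#coding:utf-8
# (This line sets the encoding of the .py source file itself, not of the files it reads.)

def readFile():
    lines = []
    # 'rb' returns raw bytes, so no implicit decoding with the platform default happens.
    with open('emotion_dict//neg//neg_all_dict.txt', 'rb') as fp:
        for line in fp:
            line = line.strip()          # drop surrounding whitespace and the newline
            line = line.decode('utf-8')  # decode the bytes explicitly as UTF-8
            lines.append(line)
    print(lines)

readFile()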
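Decoding by hand works, but Python 3 can also do it for you: passing encoding='utf-8' to open() in text mode decodes each line as it is read. The following is a sketch of that alternative, not the original author's code; read_lines and the temporary demo file are hypothetical stand-ins for readFile and emotion_dict//neg//neg_all_dict.txt:

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a text file with an explicit encoding and return its stripped lines."""
    # Passing encoding= stops open() from falling back to the locale default (e.g. GBK).
    with open(path, "r", encoding=encoding) as fp:
        return [line.strip() for line in fp]


# Hypothetical demo file standing in for the sentiment dictionary.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "neg_demo.txt")
    with open(path, "w", encoding="utf-8") as out:
        out.write("悲伤\n愤怒\n")
    words = read_lines(path)

print(words)  # ['悲伤', '愤怒']
```

If the file was saved by an editor that adds a UTF-8 byte order mark, encoding='utf-8-sig' strips it; with plain 'utf-8' the first line would start with '\ufeff'.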


Reposted from blog.csdn.net/qiang12qiang12/article/details/53493334