UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte

While working on deep learning recently, I needed a TF-IDF term vector space. Running the following code under Python 3.6.5:

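    from sklearn.feature_extraction.text import TfidfVectorizer

    # stpwrdlst is the stop word list loaded earlier
    vectorizer = TfidfVectorizer(
        stop_words=stpwrdlst, sublinear_tf=True, max_df=0.5)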

It failed with: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte

Fix: ignore decoding errors by changing the code to:

    vectorizer = TfidfVectorizer(
        stop_words=stpwrdlst, sublinear_tf=True, max_df=0.5, decode_error='ignore')

That is, add: decode_error='ignore'.
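
To see what this option actually does, here is a minimal stdlib-only sketch. It assumes the input was GBK-encoded Chinese text (a leading byte of 0xba is typical of GBK, but the data behind the error above is not shown, so this is a guess), and the sample string is made up for illustration. As far as I know, scikit-learn passes `decode_error` to `bytes.decode` as its `errors` argument, so plain `bytes.decode` should show the same behavior:

```python
# Hypothetical sample: Chinese text encoded as GBK rather than UTF-8.
raw = "汉字 text".encode("gbk")

# Strict UTF-8 decoding (the default) fails on the first byte, 0xba.
try:
    raw.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# errors="ignore" silently drops every byte that is not valid UTF-8,
# so the Chinese characters are lost and only the ASCII part survives.
ignored = raw.decode("utf-8", errors="ignore")

# Decoding with the real encoding keeps all of the text.
decoded = raw.decode("gbk")

print(strict_error)
print(repr(ignored))
print(repr(decoded))
```

If the data really is GBK, passing `encoding='gbk'` to `TfidfVectorizer` instead of `decode_error='ignore'` may be the better fix, since ignoring errors discards the characters that cannot be decoded.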


Reposted from blog.csdn.net/Lee0_King/article/details/81436216