Python 3: fixing encoding errors when reading a Chinese txt file

This post shares a fix for the encoding errors that can appear when reading a Chinese txt file in Python 3. Hopefully it is a useful reference.
Problem description

I was trying to build a word cloud with Python and ran into encoding problems.
After following several blog posts found online and changing the code back and forth, the result was still a "UnicodeDecodeError: 'utf-8' codec can't decode byte ..." error.
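The error itself can be reproduced without any file. The minimal sketch below assumes, purely for illustration, that the text was saved in GBK (the post does not say which encoding the file really used); `try_decode` is a hypothetical helper name.

```python
# Chinese text encoded with a legacy encoding (GBK is an assumption for this demo).
raw_bytes = "中文词云".encode("gbk")

def try_decode(data, encoding):
    """Return the decoded text, or a short error description if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{type(exc).__name__}: {exc.reason}"

print(try_decode(raw_bytes, "utf-8"))  # fails: these bytes are not valid UTF-8
print(try_decode(raw_bytes, "gbk"))    # succeeds with the matching encoding
```

The same bytes decode cleanly once the encoding matches the one used when the file was saved, which is why the fix is about the file and the `encoding` argument rather than the word cloud code.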

I fiddled with this for a whole day (imagine my many frustrated expressions here). In the end I wrote the simplest possible file read, and even that raised an error. That made me suspect the encoding of the txt file itself. The file was a new plain-text file created on a Mac, and I could not immediately find where to check its encoding there, so I copied it to a Windows machine and looked. The encoding turned out to be ASCII, not my beloved UTF-8. Mac, you betrayed my trust! ε(┬┬﹏┬┬)3
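Copying the file to another machine is not the only way to check. As a rough sketch, Python itself can try a few candidate encodings on the raw bytes. The candidate list and the name `guess_encoding` are assumptions made for this example, and the first encoding that happens to decode is only a guess, not proof of how the file was saved.

```python
import os
import tempfile

# Hypothetical candidate list; adjust it to the encodings you expect to meet.
CANDIDATES = ("utf-8", "gb18030", "big5")

def guess_encoding(path, candidates=CANDIDATES):
    """Return the first candidate that decodes the whole file, or None."""
    with open(path, "rb") as f:
        data = f.read()
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None

# Usage: write a small GBK-encoded file, then see which candidate fits.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w", encoding="gbk") as f:
        f.write("中文词云")
    detected = guess_encoding(sample)
    print(detected)
```

UTF-8 is tried first because it is strict: text saved in another encoding rarely decodes as valid UTF-8 by accident.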

Solution

First, change the encoding of the txt file itself to UTF-8.

In addition, when opening the file, pass a third parameter, encoding='utf8' (written here without the hyphen; 'utf-8' is accepted as well).

# Pass the encoding explicitly instead of relying on the platform default
with open('./test3.txt', 'r', encoding='utf8') as fin:
    for line in fin:
        line = line.rstrip('\n')  # drop the trailing newline
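The first fix, re-saving the file as UTF-8, can also be done from Python rather than from a text editor. The sketch below assumes the current encoding of the file is already known (GBK is used only as an example); `convert_to_utf8` and the file names are hypothetical.

```python
import os
import tempfile

def convert_to_utf8(src_path, dst_path, src_encoding):
    """Re-save a text file as UTF-8, given its current encoding."""
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)

# Usage: create a GBK file, convert it, then read it back with encoding='utf8'.
with tempfile.TemporaryDirectory() as tmp:
    legacy = os.path.join(tmp, "legacy.txt")
    fixed = os.path.join(tmp, "fixed.txt")
    with open(legacy, "w", encoding="gbk") as f:
        f.write("第一行\n第二行\n")
    convert_to_utf8(legacy, fixed, "gbk")
    with open(fixed, "r", encoding="utf8") as fin:
        lines = [line.rstrip("\n") for line in fin]
    print(lines)
```

Writing to a separate destination file keeps the original intact in case the assumed source encoding turns out to be wrong.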

Below is the source code for the first word cloud that displayed successfully (adapted from code found online; the comments are fairly detailed).

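import jieba
import jieba.analyse
import numpy as np
from matplotlib import pyplot as plt
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# 1. Read the data
with open("./test.txt", "r", encoding="utf8") as f:
    text = f.read()

# 2. Extract the top 50 keywords with the TextRank algorithm
keywords = jieba.analyse.textrank(text, topK=50, withWeight=False, allowPOS=('ns', 'n', 'vn', 'v'))
file = ",".join(keywords)

# Specify a Chinese font; otherwise Chinese characters render as empty boxes
font = r'./HYQiHei-25J.ttf'
print(file)
# Background image used as the mask; any image will do.
# scipy.misc.imread has been removed from recent SciPy releases, so load the image with Pillow instead.
image = np.array(Image.open('cake.jpg'))
wc = WordCloud(
    font_path=font,
    background_color='white',  # background color
    mask=image,                # background image (mask)
    stopwords=STOPWORDS,       # stop words
    max_words=100,             # maximum number of words
    max_font_size=100,         # maximum font size
    width=800,
    height=1000,
)

# Generate the word cloud
image_colors = ImageColorGenerator(image)  # colors sampled from the image (created but not applied here)
wc.generate(file)

# Display the word cloud with matplotlib
plt.imshow(wc)
plt.axis('off')  # hide the axes
plt.show()
# Save the image
wc.to_file('news.png')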


Origin blog.csdn.net/haoxun06/article/details/104365915