First, install third-party libraries
jieba and wordcloud are both excellent third-party libraries, so they are not bundled with Python and have to be installed manually.
Open cmd and enter the following two commands in turn to install the jieba library and the wordcloud library:
pip install jieba
pip install wordcloud
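To confirm that both installs succeeded, a quick check along the following lines may help. This is only a minimal sketch: the helper name is_installed is made up for illustration, and it relies solely on the standard-library importlib module.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two libraries used in this article
for name in ("jieba", "wordcloud"):
    if is_installed(name):
        print(name + ": installed")
    else:
        print(name + ": missing (run: pip install " + name + ")")
```

If either library is reported as missing, pip may have installed it into a different Python environment than the one running the script.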
Second, analyze an article with the jieba library
Here I take an article by the writer Qian, "Toward the Path of Greatest Resistance", and count how often each word appears in it.
The code is as follows:
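import jieba

# Read the whole article as UTF-8 text
with open("C:\\text.txt", "r", encoding="utf-8") as f:
    txt = f.read()

# Segment the text into a list of words
words = jieba.lcut(txt)

# Count every word that is longer than one character
counts = {}
for word in words:
    if len(word) == 1:
        continue
    counts[word] = counts.get(word, 0) + 1

# Sort by frequency, highest first, and print at most the top 15
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:15]:
    print("{0:<10}{1:>5}".format(word, count))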
The output is as follows:
These are the 15 most frequent words in the article, as shown in the figure.
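The counting and sorting steps can also be written with collections.Counter from the standard library. The following is a minimal sketch, not the method used above: the function name top_words and the sample token list are made up for illustration, and the sample stands in for the list that jieba.lcut(txt) would return.

```python
from collections import Counter


def top_words(tokens, n=15):
    """Count tokens longer than one character and return the n most common."""
    counts = Counter(tok for tok in tokens if len(tok) > 1)
    return counts.most_common(n)


# Hypothetical sample tokens, standing in for the output of jieba.lcut(txt)
sample = ["道路", "的", "阻力", "道路", "，", "选择", "阻力", "道路"]
for word, count in top_words(sample, n=3):
    print("{0:<10}{1:>5}".format(word, count))
```

most_common(n) simply returns fewer entries when the text contains fewer than n distinct words, so no extra length check is needed.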
Third, use the wordcloud library
The jieba library can compute the word frequencies of a text and print them in whatever format we want. The wordcloud library can go one step further: it arranges those words into a word cloud and outputs it as an image.
Here I use the same article as above and generate a word cloud from it with the wordcloud library.
The code is as follows:
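#GovRptWordCloudv1.py
import jieba
import wordcloud

# Read the whole article as UTF-8 text
f = open("C:\\text.txt", "r", encoding="utf-8")
t = f.read()
f.close()

# Segment the text and join the words with spaces,
# because wordcloud splits its input on whitespace
ls = jieba.lcut(t)
txt = " ".join(ls)

# font_path must point to a font that contains Chinese glyphs
# (msyh.ttc is Microsoft YaHei on Windows)
w = wordcloud.WordCloud(
    width=1000, height=700,
    background_color="white",
    font_path="msyh.ttc"
)
w.generate(txt)
w.to_file("grwordcloud.png")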
The resulting word cloud is shown below:
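The two sections can also be combined by passing word counts straight to the word cloud, which makes it possible to drop unwanted words first. The following is only a sketch under some assumptions: the helper names build_frequencies and render_cloud, the sample tokens, and the stopword list are all made up for illustration, while generate_from_frequencies is the wordcloud method that accepts a dictionary mapping words to counts.

```python
from collections import Counter


def build_frequencies(tokens, stopwords=frozenset()):
    """Map each multi-character token not in `stopwords` to its count."""
    return dict(Counter(
        tok for tok in tokens
        if len(tok) > 1 and tok not in stopwords
    ))


def render_cloud(frequencies, out_path, font_path="msyh.ttc"):
    """Draw a word cloud from a {word: count} dict and save it as an image."""
    import wordcloud  # imported here so build_frequencies works without it
    w = wordcloud.WordCloud(width=1000, height=700,
                            background_color="white", font_path=font_path)
    w.generate_from_frequencies(frequencies)
    w.to_file(out_path)


# Hypothetical tokens and stopword list, for illustration only
tokens = ["我们", "道路", "阻力", "我们", "道路", "的"]
freqs = build_frequencies(tokens, stopwords={"我们"})
print(freqs)

# With jieba and wordcloud installed, the full pipeline would look like:
# render_cloud(build_frequencies(jieba.lcut(t)), "grwordcloud.png")
```

Working from counts rather than from the space-joined text keeps the word cloud consistent with the frequency table printed in the previous section.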