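Quick Start: Word Clouds in Python

This project generates word-cloud visualizations for four cases: English text, Chinese text, Chinese text with stop words removed (using the Harbin Institute of Technology stop-word list), and word clouds in a custom shape.

Tools: Python 3.7 + Jupyter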

1. English word cloud

Result: [image: English word cloud]

Code:

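import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read the English demo text (forward slashes avoid backslash-escape problems in paths)
with open('text/en-demo.txt', encoding='utf-8') as f:
    mytext = f.read()
wordcloud = WordCloud().generate(mytext)

# Display the word cloud without axes
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()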

2. Chinese word cloud, stop words not removed

Result: [image: Chinese word cloud with stop words]

Code:

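import matplotlib.pyplot as plt
from wordcloud import WordCloud
import jieba   # Chinese word segmentation

with open('text/ch-demo.txt', encoding='utf-8') as f:
    mytext = f.read()
# Join the segmented words with spaces so WordCloud can split them
mytext = " ".join(jieba.cut(mytext))
# A font with Chinese glyphs is required; the default font renders them as boxes
wordcloud = WordCloud(font_path="text/simsun.ttf").generate(mytext)

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()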
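
The space-joining step matters because WordCloud splits its input with a regular expression, and unsegmented Chinese has no spaces to split on. The sketch below illustrates the effect using only the standard library. It assumes a token pattern of `\w[\w']+`, which resembles WordCloud's default but may differ between versions, and the segmented string is hand-written for illustration rather than real jieba output.

```python
import re

# Assumed pattern: similar to WordCloud's default tokenizer regex
# (it may differ between wordcloud versions).
TOKEN_PATTERN = r"\w[\w']+"

def tokenize(text):
    """Split text into candidate words using the assumed pattern."""
    return re.findall(TOKEN_PATTERN, text)

raw = "我爱自然语言处理"            # unsegmented sentence
segmented = "我 爱 自然语言 处理"   # hand-written segmentation, for illustration only

print(tokenize(raw))        # the whole sentence comes back as a single token
print(tokenize(segmented))  # multi-character words come back separately
```

With this particular pattern, single-character tokens such as 我 are dropped, because the pattern requires at least two characters per match.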

3. Chinese word cloud, stop words removed

Result: [image: Chinese word cloud without stop words]

Code:

from wordcloud import WordCloud
import jieba

# Read the text
with open('text/ch-demo.txt', encoding='utf-8') as f:
    mytext = f.read()

# Segment the text (stop words are still present at this point)
mytext = " ".join(jieba.cut(mytext))

# Load the stop words; ch_stopwords.txt is the Harbin Institute of Technology stop-word list
with open('text/ch_stopwords.txt', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}

w = WordCloud(width=500,
              height=400,
              background_color='black',
              font_path='msyh.ttc',
              stopwords=stopwords).generate(mytext)

w.to_file('output/ch_output.png')
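
Stop-word removal amounts to dropping every token that appears in the stop-word list before the words are counted. The following is a minimal sketch of that idea using only the standard library. The helper names `load_stopwords` and `count_words` and the sample data are hypothetical, and this is not necessarily how WordCloud implements it internally.

```python
from collections import Counter

def load_stopwords(lines):
    """Build a stop-word set from an iterable of lines (one word per line)."""
    return {line.strip() for line in lines if line.strip()}

def count_words(segmented_text, stopwords):
    """Count space-separated tokens, skipping any that are stop words."""
    tokens = segmented_text.split()
    return Counter(t for t in tokens if t not in stopwords)

# Hypothetical sample data standing in for ch_stopwords.txt and ch-demo.txt.
stopwords = load_stopwords(["的\n", "了\n", "\n", "是\n"])
freqs = count_words("词云 是 文本 的 可视化 词云 的 形状", stopwords)
print(freqs.most_common(3))
```

A frequency mapping like `freqs` can also be passed to `WordCloud.generate_from_frequencies` instead of calling `generate` on the raw text.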

4. Word cloud in an arbitrary shape

Result: [image: word cloud in a custom shape]

Code:

from wordcloud import WordCloud
import numpy as np
import jieba
import re
from PIL import Image

# Load the shape image as an array to use as the mask
mk = np.array(Image.open('text/horse.png'))

# Read the text
with open('text/ch-demo.txt', encoding='utf-8') as f:
    mytext = f.read()

# Remove punctuation
punctuation = ',。?:、'
def remove_punctuation(text):
    text = re.sub(r'[{}]+'.format(punctuation), '', text)
    return text.strip().lower()

# Segment the text (stop words are still present), then strip punctuation and newlines
mytext = " ".join(jieba.cut(mytext))
mytext = remove_punctuation(mytext)
mytext = mytext.replace('\n', '')

# Load the stop words (Harbin Institute of Technology stop-word list)
with open('text/ch_stopwords.txt', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}

# When a mask is given, width and height are ignored and the mask's shape is used
w = WordCloud(width=500,
              height=400,
              background_color='black',
              font_path='msyh.ttc',
              stopwords=stopwords,
              mask=mk).generate(mytext)

w.to_file('output/shape_output.png')
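
The `mask` argument takes an image array in which pure white pixels (255) are treated as masked out, while all other pixels are available for words. The sketch below builds a circular mask by hand rather than loading `horse.png`. It is deliberately dependency-free, and `make_circle_mask` is a hypothetical helper; converting the rows with `np.array(rows, dtype=np.uint8)` should give an array suitable for `mask`.

```python
def make_circle_mask(size):
    """Return size x size rows: 0 inside a centered circle, 255 (white) outside."""
    center = (size - 1) / 2
    radius = size / 2
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = (x - center) ** 2 + (y - center) ** 2 <= radius ** 2
            row.append(0 if inside else 255)  # 255 = masked out, 0 = drawable
        rows.append(row)
    return rows

mask_rows = make_circle_mask(200)
drawable = sum(v == 0 for row in mask_rows for v in row)
print(f"{drawable} of {200 * 200} pixels are available for words")
```

A shape image works the same way: the subject should be non-white on a white background so that words fill the subject.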

Reposted from blog.csdn.net/huaf_liu/article/details/112204024