A word cloud is an attractive way to visualize text. As the saying goes, a picture is worth a thousand words. I have used word clouds a lot in previous projects, and one could even be a good way to introduce myself, like this:
Personally, I think it is more appealing than a dry textual introduction.
Today, though, I am not going to talk about using a word cloud as a personal introduction. Instead, this is a summary of the word cloud techniques I have used in my work, covering three cases:
1. A simple form, such as the rectangular word cloud above
2. A word cloud shaped by a background picture
3. A word cloud with custom font colors, for scenarios where you do not want the default colors shown above
Next, I demonstrate and implement these three kinds of word cloud visualization. The test data used throughout is as follows:
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
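The functions below all take a word-frequency dictionary as input. As a minimal sketch of how such a dictionary could be built from the test text, here is one way using `collections.Counter`. The helper name `count_words`, the punctuation stripping, and the lowercasing are my own assumptions; the complete code at the end of this post simply splits on whitespace.

```python
from collections import Counter
import string


def count_words(text):
    """Return a {word: count} dict, stripping punctuation and folding case."""
    words = []
    for token in text.split():
        word = token.strip(string.punctuation).lower()
        if word:  # skip tokens that were pure punctuation, such as "--"
            words.append(word)
    return dict(Counter(words))


sample = "Beautiful is better than ugly. Explicit is better than implicit."
freq = count_words(sample)
print(freq["better"])
```

Stripping punctuation keeps "ugly." and "ugly" from being counted as two different words, which would otherwise show up as separate entries in the cloud.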
1. The simple rectangular word cloud is implemented as follows:
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    def simpleWC1(sep=' ', back='black', freDictpath='data_fre.json', savepath='res.png'):
        '''
        Word Cloud Visualization Demo
        '''
        if isinstance(freDictpath, dict):
            # A ready-made {word: count} dictionary was passed in
            fre_dict = freDictpath
        else:
            # Otherwise read "word<sep>count" lines from the file
            with open(freDictpath, encoding='utf-8') as f:
                data = f.readlines()
            data_list = [one.strip().split(sep) for one in data if one.strip()]
            fre_dict = {}
            for one_list in data_list:
                fre_dict[one_list[0]] = int(one_list[1])
        wc = WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                       background_color=back,        # background color
                       max_words=1300,               # maximum number of words shown
                       max_font_size=120,            # maximum font size
                       margin=3,                     # margin around each word
                       width=1800,                   # image width
                       height=800,                   # image height
                       random_state=42)
        wc.generate_from_frequencies(fre_dict)  # build the cloud from the frequency dictionary
        plt.figure()
        plt.imshow(wc)
        plt.axis("off")
        wc.to_file(savepath)
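When `freDictpath` is a file path rather than a dictionary, `simpleWC1` expects one word and its count per line, separated by `sep`. Despite the default file name `data_fre.json`, the parsing code reads plain text lines, not JSON. The following is a small sketch of that parsing step on a temporary file; the helper name `load_frequencies` and the sample contents are hypothetical.

```python
import os
import tempfile


def load_frequencies(path, sep=' '):
    """Parse lines of the form 'word<sep>count' into a {word: count} dict."""
    freq = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            word, count = line.split(sep)[:2]
            freq[word] = int(count)
    return freq


# Write a tiny sample file, read it back, then clean up.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("better 8\nthan 8\n\nidea 3\n")
    sample_path = tmp.name

freq = load_frequencies(sample_path)
os.remove(sample_path)
print(freq)
```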
The resulting image is as follows:
2. The word cloud based on a background picture is implemented as follows:
First, here is the background picture:
This is a fairly classic image. Let's look at the implementation:
    import numpy as np
    from PIL import Image
    from wordcloud import WordCloud

    def simpleWC2(sep=' ', back='black', backPic='a.png', freDictpath='data_fre.json', savepath='res.png'):
        '''
        Word Cloud Visualization Demo [Use background image]
        '''
        if isinstance(freDictpath, dict):
            # A ready-made {word: count} dictionary was passed in
            fre_dict = freDictpath
        else:
            # Otherwise read "word<sep>count" lines from the file
            with open(freDictpath, encoding='utf-8') as f:
                data = f.readlines()
            data_list = [one.strip().split(sep) for one in data if one.strip()]
            fre_dict = {}
            for one_list in data_list:
                fre_dict[one_list[0]] = int(one_list[1])
        # scipy.misc.imread has been removed from SciPy, so load the picture with Pillow
        back_coloring = np.array(Image.open(backPic))
        wc = WordCloud(font_path='simhei.ttf',   # font file (SimHei)
                       background_color=back,
                       max_words=1300,
                       mask=back_coloring,       # background picture used as the mask
                       max_font_size=120,        # maximum font size
                       margin=3,
                       width=1800, height=800,   # ignored when a mask is given
                       random_state=42)
        wc.generate_from_frequencies(fre_dict)  # build the cloud from the frequency dictionary
        wc.to_file(savepath)
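`WordCloud` treats pure white (255) pixels in `mask` as off-limits and places words everywhere else. If no suitable background picture is at hand, a mask can also be built directly with NumPy. The sketch below assumes a circular shape; the helper name `circle_mask` and the radius factor are hypothetical choices for illustration.

```python
import numpy as np


def circle_mask(size=400):
    """Return a (size, size) uint8 array: 0 inside a circle, 255 outside."""
    y, x = np.ogrid[:size, :size]
    center = (size - 1) / 2.0
    radius = size * 0.45
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    # 255 marks pixels where no words may be drawn
    return (255 * outside).astype(np.uint8)


mask = circle_mask(400)
print(mask.shape, mask.dtype)
```

The array can then be passed as `mask=circle_mask()` in place of `back_coloring`.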
The resulting image is as follows:
3. The word cloud with custom font colors is implemented as follows:
    import matplotlib.pyplot as plt
    from matplotlib import colors
    from wordcloud import WordCloud

    # Custom color list
    color_list = ['#CD853F', '#DC143C', '#00FF7F', '#FF6347', '#8B008B', '#00FFFF',
                  '#0000FF', '#8B0000', '#FF8C00', '#1E90FF', '#00FF00', '#FFD700',
                  '#008080', '#008B8B', '#8A2BE2', '#228B22', '#FA8072', '#808080']

    def simpleWC3(sep=' ', back='black', freDictpath='data_fre.json', savepath='res.png'):
        '''
        Word Cloud Visualization Demo [custom font color]
        '''
        # Build a colormap object from the custom color list
        colormap = colors.ListedColormap(color_list)
        if isinstance(freDictpath, dict):
            # A ready-made {word: count} dictionary was passed in
            fre_dict = freDictpath
        else:
            # Otherwise read "word<sep>count" lines from the file
            with open(freDictpath, encoding='utf-8') as f:
                data = f.readlines()
            data_list = [one.strip().split(sep) for one in data if one.strip()]
            fre_dict = {}
            for one_list in data_list:
                fre_dict[one_list[0]] = int(one_list[1])
        wc = WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                       background_color=back,        # background color
                       max_words=1300,               # maximum number of words shown
                       max_font_size=120,            # maximum font size
                       colormap=colormap,            # custom colormap object
                       margin=2, width=1800, height=800, random_state=42,
                       prefer_horizontal=0.5)        # ratio of horizontal to vertical placement attempts
        wc.generate_from_frequencies(fre_dict)
        plt.figure()
        plt.imshow(wc)
        plt.axis("off")
        wc.to_file(savepath)
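Besides `colormap`, `WordCloud` also accepts a `color_func` callable, which gives per-word control over color. The sketch below assumes we want each word to keep the same color on every run; the function name `pick_color`, the shortened color list, and the checksum scheme are my own illustration, not part of the original code.

```python
import zlib

# A shortened version of the custom color list, for illustration only
color_list = ['#CD853F', '#DC143C', '#00FF7F', '#FF6347', '#8B008B', '#00FFFF']


def pick_color(word, font_size=None, position=None, orientation=None,
               random_state=None, **kwargs):
    """Map each word to a fixed entry of color_list via a stable checksum."""
    index = zlib.crc32(word.encode('utf-8')) % len(color_list)
    return color_list[index]


print(pick_color('Python'))
```

It would be passed as `WordCloud(color_func=pick_color, ...)`. When `color_func` is given, the `colormap` argument is ignored.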
The resulting image is as follows:
These three methods are the word cloud visualizations I use most often in my work. The complete code is posted below and can be run directly:
    #!/usr/bin/env python
    # encoding: utf-8
    '''
    __Author__: Yishui Hancheng
    Function: Visualization module of word cloud
    '''
    import os
    import sys
    import json
    import numpy as np
    from PIL import Image
    from matplotlib import colors
    import matplotlib.pyplot as plt
    from matplotlib.font_manager import FontProperties
    from wordcloud import WordCloud, ImageColorGenerator, STOPWORDS

    # Custom color list
    color_list = ['#CD853F', '#DC143C', '#00FF7F', '#FF6347', '#8B008B', '#00FFFF',
                  '#0000FF', '#8B0000', '#FF8C00', '#1E90FF', '#00FF00', '#FFD700',
                  '#008080', '#008B8B', '#8A2BE2', '#228B22', '#FA8072', '#808080']


    def simpleWC1(sep=' ', back='black', freDictpath='data_fre.json', savepath='res.png'):
        '''
        Word Cloud Visualization Demo
        '''
        if isinstance(freDictpath, dict):
            # A ready-made {word: count} dictionary was passed in
            fre_dict = freDictpath
        else:
            # Otherwise read "word<sep>count" lines from the file
            with open(freDictpath, encoding='utf-8') as f:
                data = f.readlines()
            data_list = [one.strip().split(sep) for one in data if one.strip()]
            fre_dict = {}
            for one_list in data_list:
                fre_dict[one_list[0]] = int(one_list[1])
        wc = WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                       background_color=back,        # background color
                       max_words=1300,               # maximum number of words shown
                       max_font_size=120,            # maximum font size
                       margin=3,                     # margin around each word
                       width=1800,                   # image width
                       height=800,                   # image height
                       random_state=42)
        wc.generate_from_frequencies(fre_dict)  # build the cloud from the frequency dictionary
        plt.figure()
        plt.imshow(wc)
        plt.axis("off")
        wc.to_file(savepath)


    def simpleWC2(sep=' ', back='black', backPic='a.png', freDictpath='data_fre.json', savepath='res.png'):
        '''
        Word Cloud Visualization Demo [Use background image]
        '''
        if isinstance(freDictpath, dict):
            fre_dict = freDictpath
        else:
            with open(freDictpath, encoding='utf-8') as f:
                data = f.readlines()
            data_list = [one.strip().split(sep) for one in data if one.strip()]
            fre_dict = {}
            for one_list in data_list:
                fre_dict[one_list[0]] = int(one_list[1])
        # scipy.misc.imread has been removed from SciPy, so load the picture with Pillow
        back_coloring = np.array(Image.open(backPic))
        wc = WordCloud(font_path='simhei.ttf',   # font file (SimHei)
                       background_color=back,
                       max_words=1300,
                       mask=back_coloring,       # background picture used as the mask
                       max_font_size=120,        # maximum font size
                       margin=3,
                       width=1800, height=800,   # ignored when a mask is given
                       random_state=42)
        wc.generate_from_frequencies(fre_dict)  # build the cloud from the frequency dictionary
        wc.to_file(savepath)


    def simpleWC3(sep=' ', back='black', freDictpath='data_fre.json', savepath='res.png'):
        '''
        Word Cloud Visualization Demo [custom font color]
        '''
        # Build a colormap object from the custom color list
        colormap = colors.ListedColormap(color_list)
        if isinstance(freDictpath, dict):
            fre_dict = freDictpath
        else:
            with open(freDictpath, encoding='utf-8') as f:
                data = f.readlines()
            data_list = [one.strip().split(sep) for one in data if one.strip()]
            fre_dict = {}
            for one_list in data_list:
                fre_dict[one_list[0]] = int(one_list[1])
        wc = WordCloud(font_path='font/simhei.ttf',  # font file (SimHei)
                       background_color=back,        # background color
                       max_words=1300,               # maximum number of words shown
                       max_font_size=120,            # maximum font size
                       colormap=colormap,            # custom colormap object
                       margin=2, width=1800, height=800, random_state=42,
                       prefer_horizontal=0.5)        # ratio of horizontal to vertical placement attempts
        wc.generate_from_frequencies(fre_dict)
        plt.figure()
        plt.imshow(wc)
        plt.axis("off")
        wc.to_file(savepath)


    if __name__ == '__main__':
        text = """
        The Zen of Python, by Tim Peters
        Beautiful is better than ugly.
        Explicit is better than implicit.
        Simple is better than complex.
        Complex is better than complicated.
        Flat is better than nested.
        Sparse is better than dense.
        Readability counts.
        Special cases aren't special enough to break the rules.
        Although practicality beats purity.
        Errors should never pass silently.
        Unless explicitly silenced.
        In the face of ambiguity, refuse the temptation to guess.
        There should be one-- and preferably only one --obvious way to do it.
        Although that way may not be obvious at first unless you're Dutch.
        Now is better than never.
        Although never is often better than *right* now.
        If the implementation is hard to explain, it's a bad idea.
        If the implementation is easy to explain, it may be a good idea.
        Namespaces are one honking great idea -- let's do more of those!
        """
        # Count how often each whitespace-separated token occurs
        word_list = text.split()
        fre_dict = {}
        for one in word_list:
            if one in fre_dict:
                fre_dict[one] += 1
            else:
                fre_dict[one] = 1
        simpleWC1(sep=' ', back='black', freDictpath=fre_dict, savepath='simpleWC1.png')
        simpleWC2(sep=' ', back='black', backPic='backPic/A.png', freDictpath=fre_dict, savepath='simpleWC2.png')
        simpleWC3(sep=' ', back='black', freDictpath=fre_dict, savepath='simpleWC3.png')
To get the source code, join the group: 850591259