Visualizing word clouds with Python: it's easy, take a look!

Word clouds are an attractive way to present text visually; as the saying goes, a picture is worth a thousand words. I have used them a lot in previous projects, and a word cloud can even be a good way to introduce yourself.

Personally, I find that more appealing than a dry, text-only introduction.

Today I am not going to cover using a word cloud as a personal introduction. Instead, this is a summary of the word cloud techniques I use in my work, covering three cases:

1. A simple rectangular word cloud

2. A word cloud shaped by a background image

3. A word cloud with custom font colors, for cases where the default colors are not what you want

The sections below demonstrate and implement each of the three. The test data is the following text:

        The Zen of Python, by Tim Peters
        Beautiful is better than ugly.
        Explicit is better than implicit.
        Simple is better than complex.
        Complex is better than complicated.
        Flat is better than nested.
        Sparse is better than dense.
        Readability counts.
        Special cases aren't special enough to break the rules.
        Although practicality beats purity.
        Errors should never pass silently.
        Unless explicitly silenced.
        In the face of ambiguity, refuse the temptation to guess.
        There should be one-- and preferably only one --obvious way to do it.
        Although that way may not be obvious at first unless you're Dutch.
        Now is better than never.
        Although never is often better than *right* now.
        If the implementation is hard to explain, it's a bad idea.
        If the implementation is easy to explain, it may be a good idea.
        Namespaces are one honking great idea -- let's do more of those!
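
The functions below take word frequencies rather than raw text, so the test text has to be counted first. Here is a minimal sketch of that step, assuming you also want to lowercase the words and strip punctuation. The name count_words and the regular expression are illustrative choices, not part of the original code, and the full listing at the end simply splits on whitespace instead.

```python
import re
from collections import Counter


def count_words(text):
    """Return a {word: count} dict for the given text (hypothetical helper)."""
    # Lowercase, then keep runs of letters and apostrophes,
    # so that "ugly." is counted as "ugly"
    words = re.findall(r"[a-z']+", text.lower())
    return dict(Counter(words))


# A short excerpt of the test data
sample = """Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex."""

freq = count_words(sample)
print(freq["better"], freq["ugly"])
```

The resulting dict can be passed straight to the functions below through their freDictpath argument.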

1. The simple rectangular word cloud is implemented as follows:

def simpleWC1(sep=' ',back='black',freDictpath='data_fre.json',savepath='res.png'):
    '''
    Word Cloud Visualization Demo
    freDictpath: a {word: count} dict, or the path of a text file holding
    one "word<sep>count" pair per line
    '''
    if isinstance(freDictpath,dict):
        fre_dict=freDictpath
    else:
        with open(freDictpath,encoding='utf-8') as f:
            data_list=[one.strip().split(sep) for one in f if one.strip()]
        fre_dict={}
        for one_list in data_list:
            fre_dict[one_list[0]]=int(one_list[1])
    wc=WordCloud(font_path='font/simhei.ttf', # font file (SimHei)
                 background_color=back, # background color
                 max_words=1300, # maximum number of words displayed
                 max_font_size=120, # maximum font size
                 margin=3, # word cloud margin
                 width=1800, # image width
                 height=800, # image height
                 random_state=42) # fixed seed for a reproducible layout
    wc.generate_from_frequencies(fre_dict) # build the cloud from the frequency dict
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath) # save the image

The result is a rectangular word cloud image, saved to savepath.
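
When freDictpath is a file path rather than a dict, the function reads one word and its count per line, separated by sep. Despite the default name data_fre.json, the file is parsed as plain text, not JSON. Below is a small sketch of writing and reading such a file. save_frequencies and load_frequencies are hypothetical helper names, the counts are sample values, and UTF-8 is an assumption.

```python
import os
import tempfile


def save_frequencies(freq, path, sep=' '):
    """Write one "word<sep>count" pair per line (hypothetical helper)."""
    with open(path, 'w', encoding='utf-8') as f:
        for word, count in freq.items():
            f.write(f"{word}{sep}{count}\n")


def load_frequencies(path, sep=' '):
    """Read the file back into a {word: count} dict, skipping malformed or blank lines."""
    freq = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.strip().split(sep)
            if len(parts) >= 2:
                freq[parts[0]] = int(parts[1])
    return freq


# Round trip through a temporary file with sample counts
path = os.path.join(tempfile.mkdtemp(), 'data_fre.txt')
save_frequencies({'better': 8, 'than': 8, 'idea': 3}, path)
loaded = load_frequencies(path)
print(loaded)
```

A file written this way can be given to simpleWC1 as freDictpath, with the same sep.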

2. The word cloud based on a background image is implemented as follows:

It needs a background image to act as the mask; the one used here is a fairly classic picture. Let's look at the implementation:

def simpleWC2(sep=' ',back='black',backPic='a.png',freDictpath='data_fre.json',savepath='res.png'):
    '''
    Word Cloud Visualization Demo [Use background image]
    freDictpath: a {word: count} dict, or the path of a text file holding
    one "word<sep>count" pair per line
    '''
    if isinstance(freDictpath,dict):
        fre_dict=freDictpath
    else:
        with open(freDictpath,encoding='utf-8') as f:
            data_list=[one.strip().split(sep) for one in f if one.strip()]
        fre_dict={}
        for one_list in data_list:
            fre_dict[one_list[0]]=int(one_list[1])
    # Load the background image as an array
    # (scipy.misc.imread has been removed from recent SciPy releases)
    back_coloring=np.array(Image.open(backPic))
    wc=WordCloud(font_path='simhei.ttf', # font file (SimHei)
                 background_color=back,max_words=1300,
                 mask=back_coloring, # background image used as the mask
                 max_font_size=120, # maximum font size
                 margin=3,random_state=42,
                 width=1800,height=800) # ignored when a mask is given
    wc.generate_from_frequencies(fre_dict) # build the cloud from the frequency dict
    wc.to_file(savepath) # save the image

The result is a word cloud that takes the shape of the background image.
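
In the wordcloud library, pure white pixels of the mask are treated as masked out, and words are placed only on the remaining pixels, so the background picture should sit on a white background. If no suitable picture is at hand, the mask can also be built as an array. The sketch below uses a circular mask. circle_mask and render_with_mask are hypothetical helpers, and the sizes are arbitrary.

```python
import numpy as np


def circle_mask(size=300, radius=130):
    """Return a size x size array: 0 inside the circle (drawable), 255 outside (masked out)."""
    y, x = np.ogrid[:size, :size]
    center = size // 2
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return (255 * outside).astype(np.uint8)


def render_with_mask(fre_dict, mask, savepath='mask_demo.png'):
    """Draw fre_dict inside the non-white area of mask (needs the wordcloud package)."""
    from wordcloud import WordCloud  # imported here so circle_mask works without it
    wc = WordCloud(background_color='white', mask=mask, random_state=42)
    wc.generate_from_frequencies(fre_dict)
    wc.to_file(savepath)


mask = circle_mask()
# Shape, a pixel at the center (drawable) and a corner pixel (masked out)
print(mask.shape, mask[150, 150], mask[0, 0])
```

Calling render_with_mask(fre_dict, mask) with a frequency dict should then produce a round word cloud.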

3. The word cloud with custom font colors is implemented as follows:

#Custom color list
color_list=['#CD853F','#DC143C','#00FF7F','#FF6347','#8B008B','#00FFFF','#0000FF','#8B0000','#FF8C00',
            '#1E90FF','#00FF00','#FFD700','#008080','#008B8B','#8A2BE2','#228B22','#FA8072','#808080']


def simpleWC3(sep=' ',back='black',freDictpath='data_fre.json',savepath='res.png'):
    '''
    Word Cloud Visualization Demo [custom font color]
    freDictpath: a {word: count} dict, or the path of a text file holding
    one "word<sep>count" pair per line
    '''
    # Build a colormap object from the custom color list
    colormap=colors.ListedColormap(color_list)
    if isinstance(freDictpath,dict):
        fre_dict=freDictpath
    else:
        with open(freDictpath,encoding='utf-8') as f:
            data_list=[one.strip().split(sep) for one in f if one.strip()]
        fre_dict={}
        for one_list in data_list:
            fre_dict[one_list[0]]=int(one_list[1])
    wc=WordCloud(font_path='font/simhei.ttf', # font file (SimHei)
                 background_color=back, # background color
                 max_words=1300, # maximum number of words displayed
                 max_font_size=120, # maximum font size
                 colormap=colormap, # custom colormap object
                 margin=2,width=1800,height=800,random_state=42,
                 prefer_horizontal=0.5) # ratio of horizontal to vertical placement attempts
    wc.generate_from_frequencies(fre_dict) # build the cloud from the frequency dict
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath) # save the image

The result is a word cloud whose words are drawn only in the colors from color_list.
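
A colormap picks a color at random for each word. If you need control over which word gets which color, wordcloud also accepts a color_func callable in place of colormap. This is a different technique from the one used above, sketched here on the assumption that the color should depend on the font size. size_color, the threshold of 60, and the split into two palettes are all illustrative.

```python
import random

# Two small palettes taken from the custom color list
warm_colors = ['#DC143C', '#FF6347', '#FF8C00', '#FFD700']
cool_colors = ['#1E90FF', '#008B8B', '#8A2BE2', '#228B22']


def size_color(word, font_size, position, orientation, random_state=None, **kwargs):
    """Return a hex color: warm for large words, cool for small ones."""
    # wordcloud passes its own random.Random instance; fall back to the module otherwise
    rng = random_state if random_state is not None else random
    palette = warm_colors if font_size >= 60 else cool_colors
    return rng.choice(palette)


# Calling the function directly, the way wordcloud would for a large word
print(size_color('better', 100, (0, 0), None, random_state=random.Random(42)))
```

To use it, pass color_func=size_color to WordCloud instead of colormap, or call wc.recolor(color_func=size_color) on an existing cloud.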

These three are the word cloud methods I use most often in my work. The complete code is posted below and can be run directly, provided the font file and background image exist at the paths used in the code:

#!/usr/bin/env python
# encoding: utf-8

'''
__Author__: Yishui Hancheng
Function: Visualization module of word cloud
'''

import numpy as np
from PIL import Image
from matplotlib import colors
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Custom color list
color_list=['#CD853F','#DC143C','#00FF7F','#FF6347','#8B008B','#00FFFF','#0000FF','#8B0000','#FF8C00',
            '#1E90FF','#00FF00','#FFD700','#008080','#008B8B','#8A2BE2','#228B22','#FA8072','#808080']


def simpleWC1(sep=' ',back='black',freDictpath='data_fre.json',savepath='res.png'):
    '''
    Word Cloud Visualization Demo
    freDictpath: a {word: count} dict, or the path of a text file holding
    one "word<sep>count" pair per line
    '''
    if isinstance(freDictpath,dict):
        fre_dict=freDictpath
    else:
        with open(freDictpath,encoding='utf-8') as f:
            data_list=[one.strip().split(sep) for one in f if one.strip()]
        fre_dict={}
        for one_list in data_list:
            fre_dict[one_list[0]]=int(one_list[1])
    wc=WordCloud(font_path='font/simhei.ttf', # font file (SimHei)
                 background_color=back, # background color
                 max_words=1300, # maximum number of words displayed
                 max_font_size=120, # maximum font size
                 margin=3, # word cloud margin
                 width=1800, # image width
                 height=800, # image height
                 random_state=42) # fixed seed for a reproducible layout
    wc.generate_from_frequencies(fre_dict) # build the cloud from the frequency dict
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath) # save the image


def simpleWC2(sep=' ',back='black',backPic='a.png',freDictpath='data_fre.json',savepath='res.png'):
    '''
    Word Cloud Visualization Demo [Use background image]
    freDictpath: a {word: count} dict, or the path of a text file holding
    one "word<sep>count" pair per line
    '''
    if isinstance(freDictpath,dict):
        fre_dict=freDictpath
    else:
        with open(freDictpath,encoding='utf-8') as f:
            data_list=[one.strip().split(sep) for one in f if one.strip()]
        fre_dict={}
        for one_list in data_list:
            fre_dict[one_list[0]]=int(one_list[1])
    # Load the background image as an array
    # (scipy.misc.imread has been removed from recent SciPy releases)
    back_coloring=np.array(Image.open(backPic))
    wc=WordCloud(font_path='simhei.ttf', # font file (SimHei)
                 background_color=back,max_words=1300,
                 mask=back_coloring, # background image used as the mask
                 max_font_size=120, # maximum font size
                 margin=3,random_state=42,
                 width=1800,height=800) # ignored when a mask is given
    wc.generate_from_frequencies(fre_dict) # build the cloud from the frequency dict
    wc.to_file(savepath) # save the image


def simpleWC3(sep=' ',back='black',freDictpath='data_fre.json',savepath='res.png'):
    '''
    Word Cloud Visualization Demo [custom font color]
    freDictpath: a {word: count} dict, or the path of a text file holding
    one "word<sep>count" pair per line
    '''
    # Build a colormap object from the custom color list
    colormap=colors.ListedColormap(color_list)
    if isinstance(freDictpath,dict):
        fre_dict=freDictpath
    else:
        with open(freDictpath,encoding='utf-8') as f:
            data_list=[one.strip().split(sep) for one in f if one.strip()]
        fre_dict={}
        for one_list in data_list:
            fre_dict[one_list[0]]=int(one_list[1])
    wc=WordCloud(font_path='font/simhei.ttf', # font file (SimHei)
                 background_color=back, # background color
                 max_words=1300, # maximum number of words displayed
                 max_font_size=120, # maximum font size
                 colormap=colormap, # custom colormap object
                 margin=2,width=1800,height=800,random_state=42,
                 prefer_horizontal=0.5) # ratio of horizontal to vertical placement attempts
    wc.generate_from_frequencies(fre_dict) # build the cloud from the frequency dict
    plt.figure()
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath) # save the image


if __name__ == '__main__':
    text="""
        The Zen of Python, by Tim Peters
        Beautiful is better than ugly.
        Explicit is better than implicit.
        Simple is better than complex.
        Complex is better than complicated.
        Flat is better than nested.
        Sparse is better than dense.
        Readability counts.
        Special cases aren't special enough to break the rules.
        Although practicality beats purity.
        Errors should never pass silently.
        Unless explicitly silenced.
        In the face of ambiguity, refuse the temptation to guess.
        There should be one-- and preferably only one --obvious way to do it.
        Although that way may not be obvious at first unless you're Dutch.
        Now is better than never.
        Although never is often better than *right* now.
        If the implementation is hard to explain, it's a bad idea.
        If the implementation is easy to explain, it may be a good idea.
        Namespaces are one honking great idea -- let's do more of those!
        """
    # Count how often each whitespace-separated token occurs
    word_list=text.split()
    fre_dict={}
    for one in word_list:
        fre_dict[one]=fre_dict.get(one,0)+1
    simpleWC1(sep=' ',back='black',freDictpath=fre_dict,savepath='simpleWC1.png')
    simpleWC2(sep=' ',back='black',backPic='backPic/A.png',freDictpath=fre_dict,savepath='simpleWC2.png')
    simpleWC3(sep=' ',back='black',freDictpath=fre_dict,savepath='simpleWC3.png')


Origin www.cnblogs.com/Py1233/p/12672993.html