Word cloud drawing: three recommended Python packages plus an online website!

A word cloud is an important form of text visualization: it highlights the key sentences and vocabulary in large bodies of text.

This article first introduces several Python libraries for making word clouds, namely WordCloud, StyleCloud, and Pyecharts, plus an online word cloud website. Finally, it briefly compares them through code and the resulting visuals.

WordCloud, StyleCloud, and Pyecharts share one characteristic: a few lines of code are enough to draw an attractive word cloud, but there are many parameters that can be set.

WordCloud

WordCloud is the most frequently used Python library for making word cloud images. It is easy to get started with and easy to use, and the shape of the word cloud can be customized with a mask. StyleCloud, introduced later, is built on top of it.

WordCloud wraps all of its functionality in the WordCloud class; to adjust the style of the word cloud, you only need to change some of its parameters.

Take a simple circular word cloud as an example.

First use collections.Counter to build a word frequency dictionary, then pass it to the generate_from_frequencies() method of a WordCloud() object.

For the shape of the word cloud, the following code uses numpy to generate a circular binary array and passes it as the mask parameter:

import random
import re
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np
from wordcloud import WordCloud


word_list = []
with open("danmu.txt", encoding='utf-8') as f:
    words = f.read()
    for word in words.split('\n'):
        if re.findall('[\u4e00-\u9fa5]+', str(word), re.S):  # keep only lines that contain Chinese characters
            word_list.append(word)


def SquareWord(word_list):
    counter = Counter(word_list)  # count word frequencies
    start = random.randint(0, 15)  # random integer from 0 to 15 (inclusive)
    result_dict = dict(counter.most_common()[start:])  # drop the `start` most frequent words

    x, y = np.ogrid[:300, :300]  # open grid of 300 x 300 coordinates
    mask = (x - 150)**2 + (y - 150)**2 > 130**2  # True outside a circle centered at (150, 150), radius 130
    mask = 255 * mask.astype(int)  # 255 outside the circle (ignored), 0 inside (drawable)

    wc = WordCloud(background_color='black',
                   mask=mask,
                   mode='RGB',
                   font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # font path; required for Chinese text
                   ).generate_from_frequencies(result_dict)

    plt.axis("off")
    plt.imshow(wc, interpolation="bilinear")
    plt.show()


SquareWord(word_list)  # draw the word cloud

The effect is as follows:

[Figure: circular word cloud]
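The slicing step in SquareWord is easy to misread: most_common() returns (word, count) pairs sorted by descending count, so [start:] drops the `start` most frequent words rather than keeping them. The sketch below makes that explicit with a fixed start instead of random.randint; the function name frequencies_skipping_top is hypothetical.

```python
from collections import Counter


def frequencies_skipping_top(words, start):
    """Build a {word: count} dict, skipping the `start` most frequent words.

    Mirrors dict(Counter(words).most_common()[start:]) from SquareWord,
    but with a fixed `start` so the result is reproducible.
    """
    counter = Counter(words)
    # most_common() with no argument returns every (word, count) pair,
    # ordered from the highest count to the lowest.
    ranked = counter.most_common()
    # Slicing from `start` discards the first `start` (most frequent) pairs.
    return dict(ranked[start:])


words = ["好", "好", "好", "哈哈", "哈哈", "666"]
print(frequencies_skipping_top(words, 0))  # {'好': 3, '哈哈': 2, '666': 1}
print(frequencies_skipping_top(words, 1))  # {'哈哈': 2, '666': 1}
```

Because the original picks `start` at random, each run can hide a different number of top words; fixing it makes the output repeatable.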

The most prominent advantage of WordCloud over the other two Python libraries is that **you can customize the mask**: pass a numpy array through the mask parameter to set the shape of the word cloud.

Note, however, that words are only drawn in areas whose value != 255; areas whose value == 255 (pure white) are ignored. If an image you want to use as a mask does not meet this condition, preprocess it first by filling its background with white pixels.
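A minimal preprocessing sketch, assuming the source image is loaded as an RGB or RGBA numpy array (for example via np.array(Image.open(path))) and that its background is either transparent or near-white. The helper name to_wordcloud_mask and the threshold of 240 are illustrative choices, not part of WordCloud's API.

```python
import numpy as np


def to_wordcloud_mask(image, threshold=240):
    """Return a 2-D uint8 mask: 255 where WordCloud should NOT draw, 0 where it may.

    `image` is an (H, W, 3) RGB or (H, W, 4) RGBA array. Pixels that are
    transparent (alpha == 0) or whose RGB channels are all >= `threshold`
    are treated as background and set to pure white (255).
    """
    rgb = image[..., :3]
    # Near-white pixels (every channel above the threshold) are background.
    background = np.all(rgb >= threshold, axis=-1)
    if image.shape[-1] == 4:
        # Fully transparent pixels also count as background.
        background |= image[..., 3] == 0
    # Background -> 255 (ignored by WordCloud), shape -> 0 (drawable).
    return np.where(background, 255, 0).astype(np.uint8)
```

The result can then be passed as `mask=to_wordcloud_mask(np.array(Image.open("shape.png")))`, where "shape.png" is a placeholder path. Thresholding instead of testing for exactly 255 also catches the slightly off-white pixels that JPEG compression tends to leave behind.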


Drawing a word cloud with a custom mask

import random
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud


def AliceWord(word_list):
    counter = Counter(word_list)  # count word frequencies
    start = random.randint(0, 15)  # random integer from 0 to 15 (inclusive)
    result_dict = dict(counter.most_common()[start:])  # drop the `start` most frequent words

    # Read an image and use it as the mask (replaces the circular mask above)
    alic_coloring = np.array(Image.open("D:/Data/WordArt/Alice_mask.png"))
    wc = WordCloud(background_color="white",  # background color
                   mode="RGB",
                   mask=alic_coloring,  # if None, a plain 400 x 200 canvas is used
                   min_font_size=4,  # smallest font size to use
                   relative_scaling=0.8,  # how strongly word frequency affects font size
                   font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # font path; required for Chinese text
                   ).generate_from_frequencies(result_dict)

    wc.to_file("D:/Data/WordArt/wordclound.jpg")  # save the generated word cloud
    plt.axis("off")
    plt.imshow(wc, interpolation="bilinear")
    plt.show()


AliceWord(word_list)  # word_list is built as in the previous example

The result:

[Figure: word cloud in the shape of the Alice mask]

Finally, here are some of the most important parameters of WordCloud:

  • background_color (type -> str): color name or color code; sets the background color of the word cloud;
  • font_path (type -> str): path to a custom font. If the text contains Chinese, this parameter must point to a font that supports Chinese; otherwise the characters render as garbled boxes;
  • mask (type -> ndarray): customizes the shape of the word cloud; pure white areas (value 255) are ignored when drawing;
  • mode (type -> str): defaults to 'RGB'; when set to 'RGBA' together with background_color=None, the background is transparent;
  • relative_scaling (type -> float): how strongly a word's frequency affects its displayed size, from 0 to 1; the larger the value, the stronger the correlation. The default is 0.5 (recent versions default to 'auto', which means 0.5 unless repeat=True);
  • prefer_horizontal (type -> float): the proportion of attempts to place a word horizontally rather than vertically (default 0.9); the smaller the value, the more vertical words appear in the word cloud;
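To get a feel for relative_scaling, here is a simplified sketch of the rule WordCloud appears to use as it walks words from most to least frequent: each word's nominal size is the previous word's size multiplied by (r * f_i / f_{i-1} + 1 - r). Treat this as an assumption based on my reading of the library, not its documented API. The real implementation also shrinks fonts when a word does not fit, which is ignored here, and the function name nominal_font_sizes is hypothetical.

```python
def nominal_font_sizes(frequencies, max_font_size=100, relative_scaling=0.5):
    """Simplified sketch of how relative_scaling links frequency to font size.

    `frequencies` must be sorted from most to least frequent. Ignores the
    extra shrinking WordCloud performs when a word does not fit the canvas.
    """
    rs = relative_scaling
    sizes = []
    font_size = max_font_size
    last_freq = frequencies[0]
    for freq in frequencies:
        # rs = 1: size proportional to frequency; rs = 0: size unchanged (rank only).
        font_size = int(round((rs * (freq / last_freq) + (1 - rs)) * font_size))
        sizes.append(font_size)
        last_freq = freq
    return sizes


print(nominal_font_sizes([10, 5, 5], relative_scaling=1.0))  # [100, 50, 50]
print(nominal_font_sizes([10, 5, 5], relative_scaling=0.0))  # [100, 100, 100]
print(nominal_font_sizes([10, 5, 5], relative_scaling=0.5))  # [100, 75, 75]
```

With relative_scaling=1, a word half as frequent gets half the size; with 0, frequency differences no longer change the size at all.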

In addition to the above parameters, you can also set the colors (color_func, colormap), stopwords (stopwords), whether words may repeat (repeat), and other options.
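One caveat: as far as I know, WordCloud applies its stopwords setting only when it processes raw text itself (generate() / process_text()). generate_from_frequencies(), which the examples above use, takes the frequencies exactly as given. So when you count words yourself, filter stopwords first. A minimal sketch follows. The stopword set is a made-up example, and count_without_stopwords is a hypothetical name.

```python
from collections import Counter


def count_without_stopwords(words, stopwords):
    """Count words after dropping stopwords and blank entries."""
    return Counter(w for w in words if w.strip() and w not in stopwords)


stop = {"的", "了", "啊"}
freqs = count_without_stopwords(["好看", "的", "好看", "啊", ""], stop)
print(dict(freqs))  # {'好看': 2}
```

The resulting Counter can be passed straight to generate_from_frequencies(), since it is a dict subclass.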


For details, please refer to the official documentation:

https://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html#wordcloud.WordCloud

StyleCloud

StyleCloud is built on WordCloud and adds several new features:


  • Supports color gradients;
  • Word colors can be set through predefined color palettes;
  • Supports icons as masks. This is the best new feature: icons are taken from the Font Awesome website, which offers a wide variety of them;
  • Besides plain text, input can also be supplied as csv or txt files;
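For the csv input, my understanding from the StyleCloud README is that a two-column CSV with a header row is read as word/weight pairs, while a txt file is read as raw text. Assuming that, the sketch below writes word frequencies in that shape so they can be passed via file_path. The helper name write_frequency_csv and the column names are hypothetical.

```python
import csv
from collections import Counter


def write_frequency_csv(words, path):
    """Write a two-column CSV (header + word,count rows), most frequent first."""
    counter = Counter(words)
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])  # header row; the names are arbitrary
        for word, count in counter.most_common():
            writer.writerow([word, count])


# Usage (assumed): write_frequency_csv(word_list, "danmu_freq.csv"), then
# stylecloud.gen_stylecloud(file_path="danmu_freq.csv", ...)
```

Supplying counts directly keeps StyleCloud from re-tokenizing the text, which matters for Chinese input that is not space-separated.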

The main program needs just one function call:

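import stylecloud


def Style_WordArt():
    # Draw a word cloud with StyleCloud
    stylecloud.gen_stylecloud(
        file_path="danmu.txt",  # input text file
        background_color='white',  # background color
        palette="colorbrewer.qualitative.Dark2_7",  # palette that sets the text colors
        icon_name='fas fa-cat',  # Font Awesome icon used as the mask
        font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # path to a Chinese font
        random_state=40,  # random seed for word placement and colors
        invert_mask=False,  # whether to invert the final mask
        output_name="D:/Data/WordArt/styleclound.jpg",  # output image path
    )


Style_WordArt()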

The effect is as follows:

[Figure: StyleCloud word cloud in the shape of a cat icon]

To change the mask, you only need to change the icon_name parameter. The Font Awesome website, https://fontawesome.com/icons?d=gallery&m=free , offers thousands of icons to choose from.


icon_name can be set to the class name of the target icon, as shown below:

[Figure: an icon's class name on the Font Awesome website]

When icon_name = 'fas fa-dog':

[Figure: word cloud in the shape of a dog icon]

When icon_name = 'fab fa-amazon':

[Figure: word cloud in the shape of the Amazon logo]

To set the word cloud's color palette, just modify the palette parameter. For available palettes, refer to the Palettable website: https://jiffyclub.github.io/palettable/ , which offers a variety of palette templates to choose from.

[Figure: palette modules listed on the Palettable website]

Each of these modules contains many sub-modules, and those sub-modules hold the actual palettes you set.


You can choose any palette template, but omit the leading `palettable.` prefix. For example, to use palettable.colorbrewer.qualitative.Dark2_3 as the palette, simply set palette='colorbrewer.qualitative.Dark2_3'.
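The prefix rule fits in a tiny helper. normalize_palette is a hypothetical name, not part of StyleCloud; it strips a leading "palettable." so that either spelling can be passed to palette.

```python
def normalize_palette(name):
    """Drop a leading 'palettable.' so the string fits StyleCloud's palette argument."""
    prefix = "palettable."
    return name[len(prefix):] if name.startswith(prefix) else name


print(normalize_palette("palettable.colorbrewer.qualitative.Dark2_3"))
# colorbrewer.qualitative.Dark2_3
```

Slicing by the prefix length is used instead of str.removeprefix so the helper also runs on Python versions older than 3.9.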

Different color palettes give the word cloud a different style!

palette='colorbrewer.qualitative.Paired_10'

[Figure: word cloud with the Paired_10 palette]

palette='lightbartlein.diverging.BlueDarkOrange12_11'

[Figure: word cloud with the BlueDarkOrange12_11 palette]

For StyleCloud's other parameters, please refer to the official documentation: https://github.com/minimaxir/stylecloud

Pyecharts

Pyecharts is built on Apache ECharts and is mainly used for data visualization; the word cloud is just one of its many chart types. Compared with the first two packages, Pyecharts produces less polished word clouds.

However, Pyecharts saves the word cloud as a standalone HTML file, and the result is somewhat interactive (for example, a tooltip appears when you hover over a word).

[Figure: interactive Pyecharts word cloud]

The code:

import random
import re
from collections import Counter

import pyecharts.options as opts
from pyecharts.charts import WordCloud


word_list = []
with open("danmu.txt", encoding='utf-8') as f:
    words = f.read()
    for word in words.split('\n'):
        if re.findall('[\u4e00-\u9fa5]+', str(word), re.S):  # keep only lines that contain Chinese characters
            word_list.append(word)


def Pyecharts_wordArt(word_list):
    counter = Counter(word_list)  # count word frequencies
    start = random.randint(0, 15)  # random integer from 0 to 15 (inclusive)
    result_list = counter.most_common()[start:]  # (word, count) pairs, minus the `start` most frequent
    print(result_list[:5])  # preview the first five pairs

    chart = WordCloud().add(series_name="Pyecharts", data_pair=result_list, word_size_range=[6, 66]).set_global_opts(
        title_opts=opts.TitleOpts(
            title="Pyecharts", title_textstyle_opts=opts.TextStyleOpts(font_size=23)),
        tooltip_opts=opts.TooltipOpts(is_show=True),  # show a tooltip on hover
    )
    chart.render("Pyecharts_Wordclound.html")  # write a standalone HTML file


Pyecharts_wordArt(word_list)

Note that Pyecharts expects its input as a list in which each element pairs a word with its frequency, in the following format:

[Figure: sample of the (word, frequency) list printed by the code above]
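If your frequencies are already in a dict, as in the WordCloud examples, a conversion like the following produces the list of (word, count) pairs that data_pair takes in the code above. to_data_pair and its optional top_n cut-off are illustrative assumptions.

```python
def to_data_pair(freq_dict, top_n=None):
    """Convert {word: count} into [(word, count), ...] sorted by count, descending."""
    pairs = sorted(freq_dict.items(), key=lambda item: item[1], reverse=True)
    # Optionally keep only the top_n most frequent words.
    return pairs if top_n is None else pairs[:top_n]


print(to_data_pair({"哈哈": 2, "好": 3, "666": 1}, top_n=2))
# [('好', 3), ('哈哈', 2)]
```

Capping the number of words is optional, but it keeps the generated HTML small when the text has a long tail of rare words.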

Summary

Beyond these three Python packages, there is also an online word cloud website, WordArt.com. Its visual results are better than those of the three packages above, and adjusting the style is convenient, simple, and intuitive. If you only need to make a few word clouds, this website is recommended.


To compare these tools, I rank them from the following perspectives:

Visualization

WordArt > StyleCloud > WordCloud > Pyecharts

Interactivity

WordArt > Pyecharts > StyleCloud = WordCloud

Automation efficiency

Pyecharts = StyleCloud = WordCloud > WordArt

Ease of use

WordArt > StyleCloud > Pyecharts > WordCloud

As for which tool to use, choose based on your own situation and use case; whichever tool you pick, it is worth getting a basic understanding of it in advance.

Okay, that's all for this article. Thank you all for reading, and see you in the next issue!


Origin blog.csdn.net/weixin_42512684/article/details/113806725