A word cloud is an important form of text visualization: it highlights the key words and phrases in a large body of text.
This article first introduces three Python libraries for making word clouds, namely WordCloud, StyleCloud, and Pyecharts, then an online word cloud website, and finally compares them briefly through code practice and visualization results.
WordCloud, StyleCloud, and Pyecharts share one characteristic: a few lines of code are enough to draw an attractive word cloud, but there are many parameters to set.
WordCloud
WordCloud is the most frequently used Python library for making word cloud images. It is easy to get started with and easy to operate, and the shape of the word cloud can be customized through a mask. StyleCloud, introduced later, is built on top of it.
WordCloud encapsulates all of its methods in the WordCloud class; to adjust the style of the word cloud, you only need to change a few parameters.
Take a simple circular word cloud as an example.
First use collections.Counter to build a word frequency dictionary, then pass it to the generate_from_frequencies() method of WordCloud().
For the shape of the word cloud, the code in this section uses numpy to generate a circular binary array and passes it as the mask parameter.
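Before the full listing, the word frequency step on its own can be sketched as follows. This is a minimal illustration only: the sample lines are hypothetical stand-ins for the contents of danmu.txt, and skip is fixed here instead of being drawn at random.

```python
from collections import Counter

# Hypothetical sample lines standing in for the contents of danmu.txt
lines = ["哈哈哈", "好看", "哈哈哈", "加油", "好看", "哈哈哈"]

counter = Counter(lines)           # word -> number of occurrences
ranked = counter.most_common()     # (word, count) pairs, most frequent first
skip = 1                           # the article draws this at random from 0 to 15
frequencies = dict(ranked[skip:])  # drop the `skip` most frequent words

print(ranked)
print(frequencies)
```

Slicing with [skip:] removes the top entries rather than keeping them, so the resulting dictionary leaves out the most dominant words and the remaining ones get more room in the cloud.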
import re
import random
from collections import Counter

import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read the text file and keep only the lines that contain Chinese characters
word_list = []
with open("danmu.txt", encoding='utf-8') as f:
    words = f.read()
    for word in words.split('\n'):
        if re.findall('[\u4e00-\u9fa5]+', str(word), re.S):  # regex matching Chinese characters
            word_list.append(word)

def SquareWord(word_list):
    counter = Counter(word_list)  # count word frequencies
    start = random.randint(0, 15)  # random integer from 0 to 15 (inclusive)
    result_dict = dict(counter.most_common()[start:])  # skip the `start` most frequent words
    x, y = np.ogrid[:300, :300]  # open coordinate grids covering 0..299 on each axis
    mask = (x-150)**2 + (y-150)**2 > 130**2  # True outside the circle centered at (150, 150), radius 130
    mask = 255*mask.astype(int)  # 255 outside the circle (ignored), 0 inside (drawable)
    wc = WordCloud(background_color='black',
                   mask=mask,
                   mode='RGB',
                   font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # font path, needed to render Chinese
                   ).generate_from_frequencies(result_dict)
    plt.axis("off")
    plt.imshow(wc, interpolation="bilinear")
    plt.show()

SquareWord(word_list)  # main entry point: draw the word cloud
The effect is as follows:
WordCloud's most prominent advantage over the other two Python libraries is that **you can customize the mask**: pass a numpy array through the mask parameter to set the shape of the word cloud.
Note, however, that words are drawn only where the mask value != 255; regions where the value == 255 are ignored. If the image you want to use as a mask does not satisfy this condition, preprocess it first so that its background is filled with pure white pixels.
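The preprocessing mentioned above can be sketched roughly as follows. This is one possible approach, not the article's own code: the helper name to_wordcloud_mask, the threshold of 128, and the tiny synthetic image are all assumptions made for illustration, and the sketch assumes a dark shape on a light background.

```python
import numpy as np

def to_wordcloud_mask(gray, threshold=128):
    """Binarize a grayscale array: 255 marks background, 0 marks the drawable shape."""
    gray = np.asarray(gray)
    # Pixels at or above the threshold count as background and become pure white (255)
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)

# Hypothetical 5x5 "image": off-white background (240) with a dark 3x3 shape (30)
gray = np.full((5, 5), 240, dtype=np.uint8)
gray[1:4, 1:4] = 30

mask = to_wordcloud_mask(gray)
print(mask)
```

Without this step the off-white background value 240 would count as drawable, because only the exact value 255 is ignored. For a real picture, the grayscale array could be obtained with something like np.array(Image.open(path).convert("L")) from Pillow before calling the helper.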
Custom mask word cloud drawing
from PIL import Image  # needed to load the mask picture

def AliceWord(word_list):
    counter = Counter(word_list)  # count word frequencies
    start = random.randint(0, 15)  # random integer from 0 to 15 (inclusive)
    result_dict = dict(counter.most_common()[start:])  # skip the `start` most frequent words
    # The numpy-generated circular mask from the previous example is no longer used:
    # x, y = np.ogrid[:300, :300]
    # mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2
    # mask = 255 * mask.astype(int)
    # Read a picture and use it as the mask
    alic_coloring = np.array(Image.open("D:/Data/WordArt/Alice_mask.png"))
    wc = WordCloud(background_color="white",  # background color
                   mode="RGB",
                   mask=alic_coloring,  # when None, the default 400 x 200 (width x height) canvas is used
                   min_font_size=4,  # smallest font size used for any word
                   relative_scaling=0.8,  # how strongly word frequency affects font size
                   font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # font path, needed to render Chinese
                   ).generate_from_frequencies(result_dict)
    wc.to_file("D:/Data/WordArt/wordclound.jpg")  # save the generated word cloud
    plt.axis("off")
    plt.imshow(wc, interpolation="bilinear")
    plt.show()

AliceWord(word_list)  # draw the word cloud with the picture mask
Visualization
Finally, here are some of the most important parameter settings in WordCloud:
- background_color (type -> str): a color name or color code that sets the background color of the word cloud;
- font_path (type -> str): path to a custom font. If the text contains Chinese, this parameter must point to a font with Chinese glyphs, otherwise the characters come out garbled;
- mask (type -> ndarray): customizes the shape of the word cloud; pure white areas (value 255) are ignored when drawing;
- mode (type -> str): when set to 'RGBA' and background_color is None, the background is transparent; the default is 'RGB';
- relative_scaling (type -> float): how strongly a word's frequency determines its display size, in the range 0 to 1; the larger the value, the stronger the correlation. The default, 'auto', resolves to 0.5 unless repeat is enabled;
- prefer_horizontal (type -> float): the ratio of attempts to fit a word horizontally rather than vertically. The smaller the value, the more vertical text appears in the word cloud;
In addition to the above parameters, you can also set the colors, the stop words, whether repeated words may appear, and more.
For details, please refer to the official documentation:
https://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html#wordcloud.WordCloud
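As a rough sketch of those extra settings: WordCloud accepts a color_func callable and a repeat flag, and its documentation notes that the stopwords parameter is ignored when generate_from_frequencies() is used, so stop words have to be filtered out of the frequency dictionary beforehand. The helper names, the hue range, and the sample data below are assumptions made for illustration.

```python
import random

def drop_stopwords(frequencies, stopwords):
    """Return a copy of the frequency dict without the stop words."""
    return {w: n for w, n in frequencies.items() if w not in stopwords}

def warm_color_func(word, font_size, position, orientation, random_state=None, **kwargs):
    """Pick a random red-to-orange hue for each word, as an HSL color string."""
    rng = random_state if random_state is not None else random.Random()
    return "hsl({}, 80%, 50%)".format(rng.randint(0, 40))

def build_cloud(frequencies, font_path=None):
    """Build the cloud; wordcloud is imported here so the helpers above work without it."""
    from wordcloud import WordCloud
    return WordCloud(
        color_func=warm_color_func,  # called once per word to choose its color
        repeat=False,                # do not repeat words to fill empty space
        font_path=font_path,         # set this to a Chinese font for Chinese text
    ).generate_from_frequencies(frequencies)

# Hypothetical frequencies; "的" plays the role of a stop word
freqs = drop_stopwords({"的": 50, "好看": 8, "加油": 5}, stopwords={"的"})
print(freqs)
print(warm_color_func("好看", 20, (0, 0), None, random.Random(0)))
```

Calling build_cloud(freqs, font_path=...) with a suitable font would then return a WordCloud object that can be shown or saved in the same way as in the earlier examples.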
StyleCloud
StyleCloud is built on WordCloud and adds several new features:
- 1. It supports color gradients;
- 2. The word cloud colors can be set through ready-made color palettes;
- 3. It supports icons as masks. This is the best of the new features: the icons come from Font Awesome, which offers a wide variety of them;
- 4. Besides a plain text string, it also accepts input from csv and txt files;
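The csv input mentioned in the last item could be prepared along these lines. This is a sketch under an assumption: that StyleCloud reads a two-column csv as word/weight pairs and treats the first row as a header, which is worth checking against the README of the installed version. The helper name, sample words, and output file name are hypothetical.

```python
import csv
import os
import tempfile
from collections import Counter

def write_weights_csv(words, path):
    """Write a header row plus one (word, weight) row per distinct word."""
    counts = Counter(words)
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "weight"])     # header row
        writer.writerows(counts.most_common())  # most frequent first
    return len(counts)

# Hypothetical sample words; the output file name is also made up
words = ["哈哈哈", "好看", "哈哈哈", "加油", "好看", "哈哈哈"]
csv_path = os.path.join(tempfile.mkdtemp(), "danmu_weights.csv")
distinct = write_weights_csv(words, csv_path)

with open(csv_path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

The resulting path would then be passed as file_path in place of the txt file.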
The main program only needs a single function call:
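import stylecloud

def Style_WordArt():
    # Draw a word cloud with StyleCloud
    stylecloud.gen_stylecloud(
        file_path="danmu.txt",  # text source for the word cloud
        background_color='white',  # background color
        palette="colorbrewer.qualitative.Dark2_7",  # palette that controls the text colors
        icon_name='fas fa-cat',  # Font Awesome icon used as the shape
        font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # path to a Chinese font
        random_state=40,  # seed controlling the randomness of the text colors
        invert_mask=False,  # whether to invert the final mask
        output_name="D:/Data/WordArt/styleclound.jpg",  # where to save the image
    )

Style_WordArt()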
The effect is as follows:
To change the mask, you only need to change the icon_name parameter. Refer to the Font Awesome site, https://fontawesome.com/icons?d=gallery&m=free , which offers thousands of icons to choose from.
Set icon_name to the class name of the target icon, as in the following examples.
When icon_name = 'fas fa-dog':
When icon_name = 'fab fa-amazon':
To change the colors of the word cloud, just modify the palette parameter. For the available palettes, refer to the Palettable website: https://jiffyclub.github.io/palettable/ , which offers a variety of palette templates to choose from.
Each module listed there contains several sub-modules, and the palettes you finally set live inside those sub-modules.
When you pick a palette, drop the leading palettable. prefix from its name. For example, to use palettable.colorbrewer.qualitative.Dark2_3, simply set palette = 'colorbrewer.qualitative.Dark2_3'.
Different palettes give the final image a different style:
palette = 'colorbrewer.qualitative.Paired_10'
palette = 'lightbartlein.diverging.BlueDarkOrange12_11'
For the other StyleCloud parameters, please refer to the official documentation: https://github.com/minimaxir/stylecloud
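The naming rule for palettes can be illustrated with a small sketch. Both helper names are hypothetical. The second helper assumes the palettable package is installed and relies on its palette objects exposing a hex_colors attribute; it is only defined here, not called.

```python
import importlib

def stylecloud_palette_name(full_name):
    """Strip the leading 'palettable.' so the name fits the palette parameter."""
    prefix = "palettable."
    return full_name[len(prefix):] if full_name.startswith(prefix) else full_name

def palette_hex_colors(palette):
    """Look up a palette's hex colors; requires the palettable package."""
    module_path, _, palette_name = palette.rpartition(".")
    module = importlib.import_module("palettable." + module_path)
    return getattr(module, palette_name).hex_colors

print(stylecloud_palette_name("palettable.colorbrewer.qualitative.Dark2_3"))
```

Previewing the colors with palette_hex_colors('colorbrewer.qualitative.Dark2_3') before rendering is one way to choose a palette without generating a full image each time.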
Pyecharts
Pyecharts is built on Apache ECharts and is mainly used for general data visualization; the word cloud is only one of its many chart types. Compared with the first two word cloud packages, its visual result is weaker.
However, Pyecharts saves the word cloud as a single html file, so the final result has some interactivity.
Code:
import re
import random
from collections import Counter

from pyecharts.charts import WordCloud
import pyecharts.options as opts

# Read the text file and keep only the lines that contain Chinese characters
word_list = []
with open("danmu.txt", encoding='utf-8') as f:
    words = f.read()
    for word in words.split('\n'):
        if re.findall('[\u4e00-\u9fa5]+', str(word), re.S):  # regex matching Chinese characters
            word_list.append(word)

def Pyecharts_wordArt(word_list):
    counter = Counter(word_list)  # count word frequencies
    start = random.randint(0, 15)  # random integer from 0 to 15 (inclusive)
    result_dict = list(counter.most_common()[start:])  # (word, count) pairs, skipping the `start` most frequent
    print(result_dict[5:])  # print the pairs from the sixth onward to inspect the format
    Charts = WordCloud().add(series_name="Pyecharts", data_pair=result_dict, word_size_range=[6, 66]).set_global_opts(
        title_opts=opts.TitleOpts(
            title="Pyecharts", title_textstyle_opts=opts.TextStyleOpts(font_size=23)),
        tooltip_opts=opts.TooltipOpts(is_show=True),
    )
    Charts.render("Pyecharts_Wordclound.html")  # write a standalone interactive html file

Pyecharts_wordArt(word_list)
Note that Pyecharts expects its input as a list in which each item pairs a word with its frequency, such as the list of (word, count) tuples returned by counter.most_common().
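A minimal illustration of that input format, using hypothetical sample words rather than the real danmu.txt data:

```python
from collections import Counter

# Hypothetical sample words standing in for the filtered lines of danmu.txt
words = ["加油", "好看", "加油", "哈哈哈", "加油", "好看"]

# most_common() already returns the list-of-pairs shape used for data_pair
data_pair = Counter(words).most_common()
print(data_pair)  # [('加油', 3), ('好看', 2), ('哈哈哈', 1)]
```

Because most_common() returns a list already, it can be passed to data_pair as it is.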
Summary
Besides these three libraries, there is also an online word cloud site, WordArt.com. Its final visual result is better than that of the three libraries above, and adjusting the style is convenient, simple, and intuitive. If you only need to produce a few word clouds, this website is the recommended choice.
To compare these tools, I rank them from the following perspectives:
Visualization
WordArt > StyleCloud > WordCloud > Pyecharts
Interactive effect
WordArt > Pyecharts > StyleCloud = WordCloud
Automation efficiency
Pyecharts = StyleCloud = WordCloud > WordArt
Ease of use
WordArt > StyleCloud > Pyecharts > WordCloud
As for which tool to pick in the end, choose according to your own situation and usage scenario. Whichever tool you use, it is worth getting a basic understanding of it in advance.
That is all for this article. Thank you for reading, and see you in the next issue!