Word Cloud - Python

What Is a Word Cloud

A word cloud is a graphic that visually highlights the words that appear most frequently in a text, and it is a common technique in text mining. Many data analysis tools can produce this kind of graphic, such as Matlab, SPSS, SAS, R and Python, and there are also many online pages that generate word clouds, for example wordclouds.com.

Word Cloud Example

Create Word Cloud via Python

In Python, the wordcloud module can be used to generate a word cloud.

1) Install the wordcloud and matplotlib modules and their dependencies.
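
Both packages are published on PyPI, so running pip install wordcloud matplotlib is the usual way to get them. As a quick check before running the script, the following sketch reports which of the two modules Python can currently find. The helper name missing_modules is made up for this example.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["wordcloud", "matplotlib"]
missing = missing_modules(required)
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required modules are available.")
```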

2) Prepare a text.

I took a paragraph on the history of word clouds from Wikipedia, and it is the text used in the example below. Copy the text into Notepad and save it as a *.txt file.

3) Run the Python script.

"""
Python Example
===============
Generating a wordcloud from the txt file using Python.
"""

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read the whole text from the txt file.
# Assumption: the file was saved as UTF-8; change the encoding if it was not.
fp = "C:/Users/yuki/Desktop/WordCloudHistory.txt"
with open(fp, encoding="utf-8") as f:
    text = f.read()

# Generate a word cloud image.
wordcloud = WordCloud(
    font_path="C:/Windows/Fonts/BROADW.TTF",  # font file used to draw the words
    width=600,                 # width of the canvas
    height=400,                # height of the canvas
    max_font_size=60,          # font size of the largest word
    font_step=1,               # step used when shrinking the font size
    background_color="white",
    random_state=1,            # fixed seed so the layout is reproducible
    margin=2,
    colormap="tab20",          # matplotlib colormap
).generate(text)

# Display the generated image the matplotlib way.
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

4) The word cloud is generated.
Word Cloud Python

Notes

While using the wordcloud module, I found that certain words had the same frequency (or weight), yet they were drawn at different font sizes in the generated graphic.
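
One way to confirm that several words really do share a frequency is to count them independently of the module. The sketch below uses only the standard library. The sample sentence and the helper name count_words are made up for illustration, and wordcloud's own text processing (which also filters stopwords, among other things) may produce somewhat different counts.

```python
import re
from collections import Counter


def count_words(text):
    """Count lowercase words made of letters and apostrophes."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


sample = "Tag clouds show tags. Word clouds show words. Clouds show weight."
counts = count_words(sample)

# Group words by frequency to see which ones are tied.
by_frequency = {}
for word, n in counts.items():
    by_frequency.setdefault(n, []).append(word)

for n in sorted(by_frequency, reverse=True):
    print(n, sorted(by_frequency[n]))
```

In this sample, "clouds" and "show" are tied at three occurrences each, which is the kind of tie discussed in this note.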

After some searching on Google, I found the answer:

wordcloud documentation:


The algorithm might give more weight to the ranking of the words than their actual frequencies, depending on the max_font_size and the scaling heuristic.

GitHub issues:


The scaling is relative to the size of the figure and the frequency of the words. The frequencies are normalized against the max frequency, so the absolute values are irrelevant.
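
The normalization described in this answer can be illustrated with a few lines of plain Python. This is a sketch of the idea only, not the module's source code; the function name normalize and the sample counts are assumptions.

```python
def normalize(frequencies):
    """Divide every frequency by the largest one, so the top word gets 1.0."""
    max_frequency = max(frequencies.values())
    return {word: count / max_frequency for word, count in frequencies.items()}


small = {"cloud": 50, "word": 25, "tag": 10}
large = {"cloud": 500, "word": 250, "tag": 100}  # same proportions, 10x the counts

print(normalize(small))
print(normalize(large))
# Both dictionaries normalize to the same weights, so a layout that only
# looks at relative values would treat them identically.
```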

Presumably, in order to fill the canvas with words as fully as possible, the wordcloud algorithm automatically adjusts font sizes according to max_font_size and the scaling of the word weights. As a result, the size of a word in the generated graphic does not correspond one-to-one to the absolute value of its frequency (or weight).
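
To make this behavior more concrete, here is a toy model of how such a layout might hand out font sizes. It is a simplified sketch under stated assumptions, not the module's actual code: the scaling formula, the relative_scaling value of 0.5, and the fits function (a crude stand-in for real collision detection on the canvas) are all chosen for illustration.

```python
def assign_font_sizes(frequencies, max_font_size, font_step, fits,
                      relative_scaling=0.5):
    """Toy model: place words from most to least frequent, shrinking to fit."""
    ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    if not ordered:
        return {}
    max_frequency = ordered[0][1]
    sizes = {}
    font_size = max_font_size
    last_weight = 1.0
    for word, count in ordered:
        weight = count / max_frequency
        # Start from the previous word's size, scaled by the weight ratio.
        ratio = weight / last_weight
        font_size = int(round(
            (relative_scaling * ratio + (1 - relative_scaling)) * font_size))
        # Shrink step by step until the (hypothetical) canvas test passes.
        while font_size > 0 and not fits(word, font_size):
            font_size -= font_step
        if font_size <= 0:
            break  # no room left for this word or any later one
        sizes[word] = font_size
        last_weight = weight
    return sizes


def fits(word, font_size):
    # Hypothetical rule: a word fits if its rough width is at most 300 units.
    return len(word) * font_size <= 300


frequencies = {"cloud": 10, "tag": 5, "visualization": 5}
sizes = assign_font_sizes(frequencies, max_font_size=60, font_step=1, fits=fits)
print(sizes)  # {'cloud': 60, 'tag': 45, 'visualization': 23}
```

In this model, "tag" and "visualization" have the same frequency, yet the longer word has to shrink before it fits, so the two end up with different sizes.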

My own view: although graphics drawn this way look better, it still feels a bit odd. After all, sizing words according to their frequency should be the very essence of a word cloud.

Sample Code

download here

Origin www.cnblogs.com/yukiwu/p/10967037.html