Drawing a basic word cloud in Python

How to make a word cloud with Python.

Word cloud introduction:

A word cloud is a way of presenting text visually.

The more often a word appears in the text, the larger its font and the more it stands out in the cloud,

so the main content of the text can be grasped quickly from the word cloud.

Here is what a finished word cloud looks like:

 

 

The third-party library used to draw word clouds is wordcloud. It must be installed (for example with "pip install wordcloud") before it can be imported.

Before drawing, you need to create a word cloud object. The format is as follows:

variable name = wordcloud.WordCloud()

The following parameters can be passed in the brackets:

parameter          effect
width              The width of the image; the default is 400 pixels
height             The height of the image; the default is 200 pixels
font_path          Path to the font file, with the suffix ".ttf", for example "simhei.ttf"
stopwords          Stop words, which will be introduced later
background_color   Background color of the word cloud; the default is black
colormap           Color scheme used for the words (a matplotlib colormap name)

After that, call "variable name.generate()" to load the text for the word cloud. Put the text to be drawn, as a string, in the brackets.

For the setting of colormap and background_color parameters, please see the legend below:

 

Finally, call "variable name.to_file()" to save the generated word cloud image, with the image file name in the brackets.

The suffix can be ".png" or ".jpeg".

Here is how the word cloud works:

1. Process the text and split it into individual words.

2. Count the number of occurrences of each word in the text.

3. Configure the font size and color according to the count of each word.

4. Save the picture.
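
The first three steps above can be sketched with the standard library alone. This is only an illustration of the idea, not the wordcloud library's actual algorithm (which also handles layout and placement); the function names count_words and font_sizes, the linear scaling rule, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def count_words(text, stopwords=frozenset()):
    """Steps 1 and 2: split the text into lowercase words and count them."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


def font_sizes(counts, min_size=10, max_size=100):
    """Step 3: scale each count linearly so the most frequent word gets max_size."""
    top = max(counts.values())
    return {w: max(min_size, round(max_size * n / top)) for w, n in counts.items()}


sample = "the cloud shows the words the text uses most and the cloud is simple"
counts = count_words(sample, stopwords={"the", "and", "is"})
sizes = font_sizes(counts)

print(counts.most_common(1))          # the most frequent remaining word
print(sizes["cloud"], sizes["text"])  # larger size for the more frequent word
```

A word that appears twice as often gets twice the font size under this simple rule; a real renderer would then place the words on a canvas and save the image (step 4).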

 

Let me show you a case:

 

import wordcloud

# Adjacent string literals inside parentheses are joined into one string
text = (
    'Word Cloud, or Tag Cloud is a visual representation of text data. Word Cloud could '
    'display a list of words. The size and color of each word in a Word Cloud indicates its '
    'frequency or importance in the text. In another word, significant textual data points can '
    'be highlighted using a Word Cloud. It is easy to generate a Word Cloud with Python. You '
    'simply need a library called wordcloud. After this class, you will be able to generate a '
    'beautiful Word Cloud.'
)

print('Drawing the word cloud...')

# Create the word cloud object with a 1000 x 600 pixel canvas
w = wordcloud.WordCloud(
    width=1000,
    height=600
)
# Load the text, then save the rendered image
w.generate(text)
w.to_file('词云1.png')

print('Word cloud image generated')

Take a look at the effect:

 

Introduction to stop words:

When processing text, wordcloud automatically filters out words that carry little of the text's main information (the built-in list covers English words only).

So if a word that appears many times is missing from the word cloud, the reason is that it is treated as a stop word rather than as part of the text's main information.

How to get stop words in wordcloud:

print(wordcloud.STOPWORDS)

The result is:

{'how', 'however', "what's", 'could', 'in', 'itself', 'hers', 'its', "where's", 'has',
 'if', "she's", 'once', 'she', 'having', "i'd", 'same', 'by', 'up', "can't", 'http', 
"there's", "haven't", 'and', "shouldn't", 'you', 'while', "he'll", 'out', 'what', "that's",
 'both', 'myself', "she'd", "they'll", 'cannot', "weren't", 'it', "they'd", "we're", 
'ours', 'as', "isn't", 'herself', 'then', 'whom', "when's", 'through', 'yourself', 'also',
 'yourselves', "we've", 'get', "hasn't", 'on', 'should', "here's", 'we', 'were', 'ought', 
'between', 'again', 'her', 'i', 'these', 'otherwise', 'his', 'after', 'than', 'are', 'the',
 'your', 'does', 'where', 'just', 'doing', 'being', 'too', 'do', 'com', 'few', 'that', 
'against', 'because', 'k', 'r', 'is', 'himself', 'under', 'here', 'any', 'themselves', 
'which', 'be', "hadn't", 'an', 'only', 'nor', 'ourselves', 'him', "he's", 'those', 'very', 
"wouldn't", 'other', "mustn't", 'their', 'with', 'into', "aren't", "i'm", 'of', "they've", 
'them', 'but', 'this', "we'll", "they're", 'would', 'ever', 'from', 'yours', "how's", 
'else', "you'll", 'to', "she'll", 'until', 'had', 'he', 'most', 'further', "you've", 
'like', "don't", 'more', 'some', 'there', "you're", "he'd", 'such', 'been', 'am', 'during', 
'about', 'all', 'why', 'when', 'shall', 'down', "it's", "let's", 'no', 'was', "why's", 
'can', 'who', 'each', 'at', 'before', "couldn't", 'have', "shan't", "who's", 'since', 
'theirs', 'below', 'they', "we'd", "you'd", "i'll", 'a', "doesn't", 'our', "wasn't", 'me', 
"didn't", "won't", 'my', 'hence', 'not', 'over', 'off', 'or', 'above', 'for', 'own', 'www',
 'did', 'therefore', 'so', "i've"}

As you can see, this prints all of the built-in English stop words.

To filter out additional words, create a stop word variable, put all of the stop words in it,

and pass that variable as the stopwords parameter of WordCloud.


Draw a Chinese word cloud:

A Chinese word cloud can be drawn with the help of the third-party Chinese word segmentation library "jieba". First, import it:

import jieba

Consider the following case:

import wordcloud
import jieba

text = '故今日之责任,不在他人,而全在我少年。少年智则国智,少年富则国富;少年强则国强,少年独立则国独立;少年自由则国自由;少年进步则国进步;少年胜于欧洲则国胜于欧洲;少年雄于地球,则国雄于地球。'

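# Segment the text into a list of words with jieba's lcut()
lst = jieba.lcut(text)
# Join the list lst into one space-separated string with join()
m = ' '.join(lst)

win = 'simhei.ttf'
w = wordcloud.WordCloud(
    width=1000,
    height=700,
    # Choose a Chinese font that is available on your own system
    font_path=win,
    background_color='black',
    colormap='hsv'
)
# Load the text made up of the segmented words
w.generate(m)
w.to_file('词云2.png')

print('Word cloud image generated')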

Effect:

Here is another case:

import wordcloud
import jieba

text = '''好文章摘抄200字(一)
《奋斗的意义》
人的心理常常容易受到伤害的原因之一,就是要求事事都合理公平。所以才会有不少人产生"社会上都凭关系背
景,我奋斗又有什么用"的观点。其实,把事事都公平作为人类的理想而为之奋斗是应当的,但若把公平当成现
实的,则很幼稚。因为在现实世界里,不存在绝对的公平。不少年轻人遇到不公平的事,往往爱发牢骚、抱怨,
甚至有的人还将"不公平"作为自己消极无为、逃避现实的托词而不努力,结果丧失了许多转变命运的机会。
好文章摘抄200字(二)
《生命。健康》
最好的医生是自己,最好的的药物是时间,最好的的心情是宁静,最好的的保健是笑容,最好的运动是步行。
欢乐是长寿的妙药,勤奋是健康的灵丹,运动是健康的投资,长寿是健康的回报,相逢莫问留春术,谈泊宁静比
药好。
金钱难买健康,健康大于金钱,金钱难买幸福,幸福必有健康,生命的幸福不在名利在健康,身体的强壮不在金
钱在运动。
好文章摘抄200字(三)
《什么是心理健康》
健康是人类生存极为重要的内容,它对于人类的发展,社会的变革,文化的更新,生活方式的改变,有着决定性
的作用。那么,一个人怎样才算健康呢? 1948年世界卫生组织明确规定:健康不仅是身体没有疾病,而且应当
重视心理健康,只有身心健康、体魄健全,才是完整的健康。可见心理健康是人的健康不可分割的重要部分。
好文章摘抄200字(四)
《睡眠与记忆》
"睡眠是神经科学中一个比较神秘的现象,人为什么要睡眠一直是个谜。生理心理学家曾经持续剥夺动物的睡
眠,结果三周后动物就死亡了,在对死亡动物大脑解剖中也没有看到明显的损伤。人类睡眠是一个主动的过程,
研究发现睡眠和大脑的信息整理有关系。如果普通人连续几天不睡,记忆损伤确实会比较严重。"
好文章摘抄200字(五)
《希望》
希望与幻想不同。希望是很有可能实现的未来,幻想是不大可能实现的希望。在我们的生活中,常常破灭的不是
希望而是幻想。我们常常为实现不了的愿望而痛苦,是因为我们把幻想当成了希望。
因此当人在对一件事情的希望破灭之后,便会把希望转移到另一件事情上。转移的过程,往往是一个痛苦又无可
奈何的过程。因为转移是在无奈的情况下发生的,在情形有了某种改变之后,人往往会在心中重又燃起对以前的
希望之火。
好文章摘抄200字(六)
《希望是什么》
希望是什么?希望是失败者对成功的一种渴求;希望是死对生的一种企盼;希望是寒冬对春的一种向往。
希望是什么?希望是人生的钟摆,须臾停止不得;希望是太阳升起的地方,光芒四射。
如果低下头表示失望,那么昂起头便是希望。希望的路,千条万条;希望的河,处处可入海洋。
希望是什么?是优美动听的歌;是奇丽无比的小诗;是令人神往的意境;是朝露、晚虹、是阳光……
希望是你,你就是希望!
好文章摘抄300字(七)
《希望》
这以前,我的心也曾充满过血腥的歌声:血和铁,火焰和毒,恢复和报仇。而忽而这些都空虚了,但有时故意地
填以没奈何的自欺的希望。希望,希望,用这希望的盾,抗拒那空虚中的暗夜的袭来,虽然盾后面也依然是空虚
中的暗夜。在最悲伤的时刻,不能忘记信念。最幸福的时刻,不能忘记人生的坎坷。
好文章摘抄200字(八)
隐秘的河湾
历史,虽有庄严的面容,却很难抵拒假装学问的臆想、冒称严谨的偷换、貌似公平的掩饰、形同证据的伪造。它
因人们的轻信而成为舆论,因时间的易逝而难以辩驳,因文痞的无耻而延续谬误,因学者的怯懦而知错不纠。结
果,它所失落的,往往倒是社会进程中的一些最关键的隐秘。
尤其是历史转折时期的隐秘,更其复杂。这是一个最容易被人们忘记的时期,因为不管用转折前还是转折后的坐
标都无法读解它,而无法读解就无法记录。
历史的转折处大多并不美丽,就像河道的弯口上常常汇聚着太多的垃圾和泡沫。美丽的转折一定是修饰的结果,
而修饰往往是历史的改写。'''

text = jieba.lcut(text)
text = [i for i in text if len(i) > 1]  # Drop single-character tokens, which are not words
text = ' '.join(text)  # Join the words with spaces

w = wordcloud.WordCloud(
    width=1000,
    height=800,
    font_path='simhei.ttf',
    background_color='white',
    colormap='rainbow'
)

w.generate(text)
# Save under a fixed file name, following the earlier examples
w.to_file('词云3.png')

Take a look at the effect:

 


That's all for now, thanks for watching!

Origin blog.csdn.net/hu20100913/article/details/128648229