The jieba library
Main functions:
Example: character frequency count for Romance of the Three Kingdoms (rough version)
# -*- coding: utf-8 -*-
import jieba

# High-frequency words that are not character names
excludes = {
    "来到", "人马", "领兵", "将军", "却说", "荆州", "二人", "不可", "不能", "如此", "如何", "天下",
    "商议", "于是", "今日", "不敢", "引兵", "次日", "军马", "军士", "主公", "大喜", "东吴", "魏兵",
    "陛下", "都督",
}

# The novel is stored as a GBK-encoded text file
with open("/Users/lilhoe/Downloads/jieba和wordcloud库使用的文档/三国演义.txt", "r", encoding="gbk") as f:
    txt = f.read()

words = jieba.lcut(txt)  # segment the text and return a list of words
counts = {}
for word in words:
    if len(word) == 1:
        continue  # skip single characters and punctuation
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1

# Drop the excluded words; pop() avoids a KeyError if a word never occurred
for word in excludes:
    counts.pop(word, None)

# Print the 20 most frequent words
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:20]:
    print("{0:<10}{1:>5}".format(word, count))
Output:
Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/3h/nm48hx4502s0fyvfc59cbgwh0000gn/T/jieba.cache
Loading model cost 0.770 seconds.
Prefix dict has been built successfully.
曹操 1429
孔明 1373
刘备 1223
关羽 779
张飞 348
吕布 300
左右 291
孙权 264
赵云 255
司马懿 221
周瑜 217
一人 216
不知 216
汉中 212
众将 206
只见 202
后主 200
袁绍 190
蜀兵 190
夏侯 185
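The counting loop in the script above merges several aliases into one canonical name through a chain of elif branches. As a rough alternative, the sketch below keeps the aliases in a dictionary and counts with collections.Counter. It assumes the token list has already been produced by jieba.lcut; the names ALIASES and count_names are illustrative and do not appear in the original script.

```python
from collections import Counter

# Illustrative alias table with the same mappings as the elif chain above.
ALIASES = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}


def count_names(words, aliases=ALIASES, excludes=frozenset()):
    """Count multi-character tokens, merging aliases and skipping excluded words."""
    counts = Counter()
    for word in words:
        if len(word) == 1:
            continue  # single characters and punctuation are ignored
        name = aliases.get(word, word)  # fall back to the token itself
        if name not in excludes:
            counts[name] += 1
    return counts


# A hand-written token list standing in for the output of jieba.lcut.
sample = ["玄德", "曰", "孔明", "诸葛亮", "却说", "刘备", "丞相"]
counts = count_names(sample, excludes={"却说"})
```

With this layout, adding another alias means adding one dictionary entry instead of another branch, and counts.most_common(20) can replace the manual sort.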
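The wordcloud library
Common parameters:
Common methods:
Examples: 1. Generate a character word cloud for Romance of the Three Kingdoms (including word-frequency counting; rectangular cloud)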
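import jieba
import wordcloud

# Read the GBK-encoded novel as text rather than as raw bytes
with open("/Users/lilhoe/Downloads/jieba和wordcloud库使用的文档/三国演义.txt", "r", encoding="gbk") as f:
    t = f.read()

ls = jieba.lcut(t)
txt = " ".join(ls)  # join the tokens with spaces so WordCloud can split them

# font_path must point to a font that contains Chinese glyphs
w = wordcloud.WordCloud(font_path="/System/Library/Fonts/STHeiti Light.ttc",
                        width=1000, height=700, background_color="white")
w.generate(txt)  # count word frequencies and lay out the cloud
w.to_file("gr.png")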
Output:
2. Generate a Romance of the Three Kingdoms word cloud (heart-shaped, built from the jieba output file)
jieba output file:
曹操 1429
孔明 1373
刘备 1223
关羽 779
荆州 420
张飞 348
吕布 300
孙权 264
赵云 255
司马懿 221
都督 218
周瑜 217
汉中 212
袁绍 190
马超 185
夏侯 185
魏延 177
太守 172
黄忠 168
大军 164
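The code that wrote this file is not shown. One way it might have been produced is sketched below, assuming a dictionary of word counts like the one built in the jieba example and the "word count" line format seen in the listing above. The function name write_top_counts and the temporary path are hypothetical; GBK is used only because the script that reads the file opens it with encoding="gbk".

```python
import os
import tempfile


def write_top_counts(counts, path, n=20, encoding="gbk"):
    """Write the n most frequent (word, count) pairs as 'word count' lines."""
    items = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    with open(path, "w", encoding=encoding) as out:
        for word, count in items[:n]:
            out.write(f"{word} {count}\n")


# Small demonstration using three counts taken from the listing above.
demo = {"刘备": 1223, "曹操": 1429, "孔明": 1373}
demo_path = os.path.join(tempfile.mkdtemp(), "cipin.txt")
write_top_counts(demo, demo_path, n=2)

with open(demo_path, "r", encoding="gbk") as f:
    written = f.read()
```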
import wordcloud
from PIL import Image
import numpy as np

# Read the "word count" lines into a frequency dictionary
freqs = {}
with open("/Users/lilhoe/Downloads/jieba和wordcloud库使用的文档/cipin.txt", "r", encoding="gbk") as f:
    for line in f:
        word, count = line.split()
        freqs[word] = int(count)  # int() is safer than eval() for a numeric field

img = Image.open('/Users/lilhoe/Downloads/jieba和wordcloud库使用的文档/xin.png')  # open the mask image
img_array = np.array(img)
wcloud = wordcloud.WordCloud(font_path="/System/Library/Fonts/STHeiti Light.ttc", background_color="white",
                             mask=img_array).fit_words(freqs)
wcloud.to_file("三国演义2.png")
print("Word cloud image generated!")
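The loop that reads the frequency file assumes every line holds exactly two fields, so a blank or malformed line would raise an exception. The sketch below is a more tolerant reader under that same "word count" format; parse_counts is a hypothetical helper, and io.StringIO stands in for the real file.

```python
import io


def parse_counts(lines):
    """Parse 'word count' lines into a dict, skipping blank or malformed lines."""
    freqs = {}
    for line in lines:
        parts = line.split()
        # Keep only lines of the form "<word> <non-negative integer>".
        if len(parts) != 2 or not parts[1].isdecimal():
            continue
        freqs[parts[0]] = int(parts[1])
    return freqs


# In-memory stand-in for cipin.txt, with one blank and one malformed line.
sample_file = io.StringIO("曹操 1429\n\n孔明 1373\nbad line here\n")
freqs = parse_counts(sample_file)
```

The resulting dictionary can be passed to fit_words in the same way as the one built by the script.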
Heart-shaped mask image:
Output: