Using the Python third-party libraries jieba and wordcloud: computing word frequencies in a text and generating word clouds

The jieba library

Main functions:

[Figure: table of the main jieba functions]
Example: character frequency count for Romance of the Three Kingdoms (rough version)

# -*- coding: utf-8 -*-
import jieba

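# Frequent words that are not character names and should be dropped from the ranking
excludes = {
    "来到", "人马", "领兵", "将军", "却说", "荆州", "二人", "不可", "不能", "如此", "如何", "天下",
    "商议", "于是", "今日", "不敢", "引兵", "次日", "军马", "军士", "主公", "大喜", "东吴", "魏兵",
    "陛下", "都督",
}

# Read the novel; this copy of the text is GBK-encoded
with open("/Users/lilhoe/Downloads/jieba和wordcloud库使用的文档/三国演义.txt", "r", encoding="gbk") as f:
    txt = f.read()

# Segment the text into a list of words (precise mode)
words = jieba.lcut(txt)
counts = {}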

# Count words, skipping single characters and merging the aliases of each character
for word in words:
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword,0) + 1

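# Drop the excluded words; pop() avoids a KeyError when a word never occurred
for word in excludes:
    counts.pop(word, None)

# Sort by frequency, highest first, and print the top 20
items = sorted(counts.items(), key=lambda x: x[1], reverse=True)

for word, count in items[:20]:
    print("{0:<10}{1:>5}".format(word, count))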

Output:

Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/3h/nm48hx4502s0fyvfc59cbgwh0000gn/T/jieba.cache
Loading model cost 0.770 seconds.
Prefix dict has been built successfully.
曹操         1429
孔明         1373
刘备         1223
关羽          779
张飞          348
吕布          300
左右          291
孙权          264
赵云          255
司马懿         221
周瑜          217
一人          216
不知          216
汉中          212
众将          206
只见          202
后主          200
袁绍          190
蜀兵          190
夏侯          185
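
The chain of elif branches in the script grows with every alias. One possible alternative, sketched here, keeps the aliases in a lookup table and counts with collections.Counter. The function name count_names and the short hand-made token list are illustrative assumptions; in the real script the tokens would come from jieba.lcut(txt).

```python
from collections import Counter

# Alias table taken from the elif chain in the script above
ALIASES = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}

def count_names(words, excludes=frozenset()):
    """Count multi-character words, merging aliases and dropping excluded words."""
    counts = Counter()
    for word in words:
        if len(word) == 1:
            continue  # single characters are never names
        name = ALIASES.get(word, word)
        if name not in excludes:
            counts[name] += 1
    return counts

# Hand-made token list standing in for jieba.lcut(txt)
sample = ["玄德", "曰", "孟德", "丞相", "玄德曰", "来到", "荆州", "曹操"]
top = count_names(sample, excludes={"来到", "荆州"}).most_common(2)
print(top)
```

The ranking above still contains words that are not names, such as 左右, 一人 and 不知. Adding them to excludes and running the count again removes them, which is how the rough version is refined step by step.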

The wordcloud library

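Common parameters:
[Figure: table of common WordCloud parameters]
Common methods: generate(text) builds the cloud from space-separated text, fit_words(frequencies) builds it from a word-to-frequency dict, and to_file(filename) saves the image.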

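Examples: 1. Generate a character word cloud for Romance of the Three Kingdoms (includes the word-frequency count; rectangular cloud)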

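import jieba
import wordcloud

# Read the novel as text; the same file is opened with GBK encoding in the jieba example
with open("/Users/lilhoe/Downloads/jieba和wordcloud库使用的文档/三国演义.txt", "r", encoding="gbk") as f:
    t = f.read()

# WordCloud splits text on whitespace, so join the segmented words with spaces
ls = jieba.lcut(t)
txt = " ".join(ls)

# font_path must point to a font with Chinese glyphs, otherwise the words render as empty boxes
w = wordcloud.WordCloud(font_path="/System/Library/Fonts/STHeiti Light.ttc", width=1000, height=700, background_color="white")
w.generate(txt)  # counts the word frequencies and lays out the cloud
w.to_file("gr.png")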

Output:
[Figure: the resulting rectangular word cloud, gr.png]
2. Generate a word cloud for Romance of the Three Kingdoms (a heart-shaped cloud built from the jieba output file)
jieba output file:

曹操	1429
孔明	1373
刘备	1223
关羽	779
荆州	420
张飞	348
吕布	300
孙权	264
赵云	255
司马懿	221
都督	218
周瑜	217
汉中	212
袁绍	190
马超	185
夏侯	185
魏延	177
太守	172
黄忠	168
大军	164
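
The article does not show how this file was written. The sketch below is one way to do it, assuming one word, a tab and the count per line, as in the listing. The helper names save_counts and load_counts and the file name cipin_demo.txt are hypothetical, and the demo uses UTF-8 in a temporary directory, whereas the article's cipin.txt is read as GBK.

```python
import os
import tempfile

def save_counts(items, path, encoding="utf-8"):
    """Write (word, count) pairs, one 'word<TAB>count' line each."""
    with open(path, "w", encoding=encoding) as f:
        for word, count in items:
            f.write("{}\t{}\n".format(word, count))

def load_counts(path, encoding="utf-8"):
    """Read the file back into a {word: count} dict, skipping malformed lines."""
    counts = {}
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2 and parts[1].isdigit():
                counts[parts[0]] = int(parts[1])
    return counts

# First three rows of the listing above, used as sample data
items = [("曹操", 1429), ("孔明", 1373), ("刘备", 1223)]
path = os.path.join(tempfile.mkdtemp(), "cipin_demo.txt")  # hypothetical file name
save_counts(items, path)
loaded = load_counts(path)
print(loaded)
```

The dict returned by load_counts has the shape that fit_words expects, a mapping from word to frequency.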

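import wordcloud
from PIL import Image
import numpy as np

# Load the word/count pairs from the jieba output file (one "word<TAB>count" per line)
b = {}
with open("/Users/lilhoe/Downloads/jieba和wordcloud库使用的文档/cipin.txt", "r", encoding="gbk") as f:
    for txt in f:
        a = txt.split()
        if len(a) == 2:  # skip blank or malformed lines
            b[a[0]] = int(a[1])  # int() instead of eval(): never execute file contents

# The mask image defines the shape: pure white areas of the image are left empty
img = Image.open('/Users/lilhoe/Downloads/jieba和wordcloud库使用的文档/xin.png')
img_array = np.array(img)
wcloud = wordcloud.WordCloud(font_path="/System/Library/Fonts/STHeiti Light.ttc", background_color="white",
                      mask=img_array).fit_words(b)
wcloud.to_file("三国演义2.png")
print("Word cloud image generated!")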

Heart-shaped background:
[Figure: the heart-shaped mask image, xin.png]
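
wordcloud leaves the pure white pixels of a mask empty and draws words everywhere else, so an off-white or light-gray background would be filled with words too. The sketch below forces light pixels to pure white first. The helper name binarize_mask and the threshold of 200 are assumptions for illustration, and the 3x3 array stands in for a real grayscale image.

```python
import numpy as np

def binarize_mask(gray, threshold=200):
    """Return a uint8 mask: 255 (left empty) for light pixels, 0 (drawable) elsewhere."""
    gray = np.asarray(gray)
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)

# Tiny grayscale "image": one dark pixel in the center on an off-white background
sample = np.array([[250, 240, 250],
                   [245,  10, 245],
                   [250, 240, 250]], dtype=np.uint8)
mask = binarize_mask(sample)
print(mask)
```

With Pillow, a grayscale array for a real image can be obtained with np.array(Image.open(path).convert("L")), and the binarized result passed to WordCloud as mask.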
Output: the heart-shaped word cloud, saved as 三国演义2.png.
