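For word segmentation of Journey to the West (西游记), how do I find the 20 most frequent words? Different names for the same character need to be merged into one entry. For example, 孙猴子 and 孙悟空 should be counted as one.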
First download the txt file of Journey to the West (《西游记》), save it, and note its path.
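The merging requirement can be tried out on a small token list before running jieba on the full text. The sketch below is one possible way to do it, assuming the aliases are kept in a lookup table: `ALIASES` and `count_merged` are hypothetical names introduced here, the counting uses `collections.Counter` from the standard library, and the sample tokens are hand-written stand-ins for jieba output.

```python
from collections import Counter

# Hypothetical alias table: each variant maps to one canonical name.
ALIASES = {
    "孙猴子": "孙悟空",
    "行者": "孙悟空",
    "大圣": "孙悟空",
    "师父": "唐三藏",
    "三藏": "唐三藏",
    "唐僧": "唐三藏",
}


def count_merged(tokens, aliases=ALIASES):
    """Count tokens of two or more characters, folding aliases together."""
    counts = Counter()
    for token in tokens:
        if len(token) == 1:  # skip single-character tokens
            continue
        # Unknown tokens are counted under their own name.
        counts[aliases.get(token, token)] += 1
    return counts


# Hand-written tokens standing in for the output of jieba.lcut (assumption).
sample = ["行者", "道", "师父", "孙悟空", "大圣", "唐僧", "孙猴子", "妖怪"]
merged = count_merged(sample)
for name, n in merged.most_common(3):
    print(name, n)
```

With a table like this, adding another alias is a one-line change rather than a new `or` branch.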
# -*- coding: utf-8 -*-
import jieba

# Read the whole novel; the file is opened as GB18030
with open("D:\\西游记\\西游记.txt", "r", encoding='gb18030') as f:
    txt = f.read()
words = jieba.lcut(txt)  # segment the text into a list of words
counts = {}  # word -> number of occurrences
for word in words:
    if len(word) == 1:  # skip single-character tokens
        continue
    if word in ('行者', '大圣', '孙猴子'):  # aliases of 孙悟空
        counts['孙悟空'] = counts.get('孙悟空', 0) + 1
    elif word in ('师父', '三藏', '唐僧'):  # aliases of 唐三藏
        counts['唐三藏'] = counts.get('唐三藏', 0) + 1
    else:
        counts[word] = counts.get(word, 0) + 1  # every other word counts under its own name
items = list(counts.items())  # convert the key-value pairs to a list
items.sort(key=lambda x: x[1], reverse=True)  # sort by count, descending
for word, count in items[:20]:  # top 20, or fewer if there are fewer distinct words
    print("{0:<5}{1:>5}".format(word, count))
Result: