Counting the occurrences of high-frequency words in the text of Hamlet

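Copyright notice: this is an original article by the blogger, though some of its content comes from the internet. Feel free to repost it; likes and comments are welcome! https://blog.csdn.net/csdn_kou/article/details/83962302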

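Counting the occurrences of high-frequency words in the text of Hamlet

Counting character appearances in Romance of the Three Kingdoms

Open-source code
Explanatory video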

kou@ubuntu:~/python$ cat ClaHamlet.py 
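#!/usr/bin/env python3
# coding=utf-8

# e10.1CalHamlet.py: print the 10 most frequent words in hamlet.txt
def getText():
    # Read the whole file; "with" closes it afterwards (assumes UTF-8/ASCII text)
    with open("hamlet.txt", "r", encoding="utf-8") as f:
        txt = f.read()
    txt = txt.lower()  # fold case so "The" and "the" count as the same word
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")  # replace special characters in the text with spaces
    return txt

hamletTxt = getText()
words = hamletTxt.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1  # unseen words start from 0
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # sort by count, highest first
for word, count in items[:10]:  # slicing also works if there are fewer than 10 words
    print("{0:<10}{1:>5}".format(word, count))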
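For comparison, the same counting step can be written with the standard library's `collections.Counter`, whose `most_common(n)` method takes the place of the manual dictionary and sort. This is a minimal sketch, not the author's code: the function names `clean_text` and `top_words` are hypothetical, the punctuation set is the one from `getText()` with an ASCII backtick, and the functions take a string instead of reading `hamlet.txt` so they can be tried without the file.

```python
from collections import Counter

# Punctuation set from getText(), using an ASCII backtick (`)
PUNCTUATION = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~'


def clean_text(txt):
    """Lower-case the text and turn punctuation into spaces, like getText()."""
    txt = txt.lower()
    # str.maketrans maps every punctuation character to a space in one pass
    table = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))
    return txt.translate(table)


def top_words(txt, n=10):
    """Return the n most frequent words as (word, count) pairs, highest first."""
    counts = Counter(clean_text(txt).split())
    # Ties keep the order in which words were first seen, like the stable list.sort()
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "To be, or not to be: that is the question."
    for word, count in top_words(sample, 3):
        print("{0:<10}{1:>5}".format(word, count))
```

To run it on the play, read `hamlet.txt` as `getText()` does and pass the resulting string to `top_words`. Like the original, it does not treat the apostrophe as punctuation, so a token such as `hamlet's` is counted separately from `hamlet`.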


Reposted from blog.csdn.net/csdn_kou/article/details/83962302