[Word Frequency Statistics in Text Files] Count the most frequent words in the text file of "Hamlet", excluding some articles, pronouns, and conjunctions, and print the ten words with the highest counts

Statistics is a basic problem in many fields, such as computer science, management science, sociology, and mathematics. The related problems, methods, and techniques form a discipline of their own, namely "statistics".

The problem description is as follows:

Use a Python program to count the words that appear most often in "Hamlet". Set up an exclusion list so that some articles, pronouns, conjunctions, and similar words are left out of the count.

Part of the Hamlet text file is as follows:

If necessary, you can contact the blogger to obtain the complete Hamlet text file.

The program code is as follows:

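# Exclusion list: articles, pronouns, conjunctions, etc. that should not be counted
excludes = {"the", "and", "to", "that", "his", "this", "but", "of", "you",
            "a", "an", "i", "we", "it", "my", "me", "in", "your", "he"}


def getText():
    # Read the whole file; assumes hamlet.txt is in the working directory and UTF-8 encoded
    with open("hamlet.txt", "r", encoding="utf-8") as f:
        txt = f.read()
    txt = txt.lower()  # make the count case-insensitive
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")  # replace special characters in the text with spaces
    return txt


hamletTxt = getText()
words = hamletTxt.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
for word in excludes:
    counts.pop(word, None)  # pop avoids a KeyError if an excluded word never appears
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
print("The most frequent words in Hamlet are:")
for word, count in items[:10]:  # slicing is safe even with fewer than ten distinct words
    print("{0:<10}{1:>5}".format(word, count))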
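
The same counting idea can also be written with the standard library's collections.Counter. The following is only a minimal sketch, not the blogger's original method: the function name top_words, the regular expression used to split words, and the short sample string are all assumptions made for illustration. It works on a string in memory, so it can be tried without hamlet.txt.

```python
import re
from collections import Counter

# Same exclusion list as in the program above.
EXCLUDES = {"the", "and", "to", "that", "his", "this", "but", "of", "you",
            "a", "an", "i", "we", "it", "my", "me", "in", "your", "he"}


def top_words(text, excludes=EXCLUDES, n=10):
    """Return the n most frequent words in text, skipping excluded words.

    top_words is a hypothetical helper name, not part of the original program.
    """
    # Assumption: a word is a run of letters, optionally with one inner apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(word for word in words if word not in excludes)
    return counts.most_common(n)


# Short sample string used only for demonstration.
sample = "To be, or not to be, that is the question: to be is to be."
for word, count in top_words(sample, n=2):
    print("{0:<10}{1:>5}".format(word, count))
```

Here the excluded words are filtered out before counting, so there is nothing to delete from the dictionary afterwards, and most_common(n) replaces the manual sort. To run it on the full play, pass the contents of hamlet.txt as the text argument.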

The result of running the program is as follows:


Origin blog.csdn.net/qq_59049513/article/details/122582729