Download Hamlet.txt and implement word frequency statistics for the text


Download the full text of Hamlet.txt: https://python123.io/resources/pye/hamlet.txt
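
The download step can also be scripted. The following is a minimal sketch, assuming the URL above is reachable and that the file should be saved as Hamlet.txt in the working directory; the helper name `download_text` is hypothetical and not part of the original code.

```python
import urllib.request
from pathlib import Path

# URL taken from the article; everything else here is an illustrative assumption
HAMLET_URL = "https://python123.io/resources/pye/hamlet.txt"

def download_text(url, dest="Hamlet.txt"):
    """Save the resource at `url` to `dest` unless `dest` already exists.

    Returns the destination as a pathlib.Path.
    """
    path = Path(dest)
    if not path.exists():
        # urlretrieve copies the remote resource to a local file
        urllib.request.urlretrieve(url, str(path))
    return path

# Usage (needs network access):
# download_text(HAMLET_URL)
```

Skipping the download when the file already exists avoids fetching the text again on every run.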


① The word frequency statistics code is as follows:

# CalHamlet_1.py
def getText():
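    with open("Hamlet.txt", 'r') as f:   # the with-block closes the file after reading
        txt = f.read()
    txt = txt.lower()    # convert all English letters in the text to lowercase
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, ' ')  # replace special characters in the text with spaces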
    return txt
hamletTxt = getText()
words = hamletTxt.split()
counts = {}
for word in words:
    counts[word] = counts.get(word,0) + 1
items = list(counts.items())
items.sort(key = lambda x:x[1], reverse = True)
for i in range(10):
    word, count = items[i]
    print('{0:<10}{1:>5}'.format(word, count))

Output:

D:\anaconda\new_launch\python.exe D:/pycharm/program/untitled/test.py
the        1138
and         965
to          754
of          669
you         550
i           542
a           542
my          514
hamlet      462
in          436

Process finished with exit code 0
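
The counting and sorting steps in the listing above can also be expressed with `collections.Counter` from the standard library. This is a sketch of an alternative, not the method used in the article; the function name `count_words` and the sample sentence are assumptions made for illustration. It uses the same punctuation string as the original code so that words are split in the same way.

```python
from collections import Counter

# Same special characters as in CalHamlet_1.py (the apostrophe is not included)
PUNCT = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~'

def count_words(text):
    """Lowercase `text`, replace special characters with spaces, count the words."""
    text = text.lower()
    for ch in PUNCT:
        text = text.replace(ch, ' ')
    return Counter(text.split())

# Illustrative sample; for the real text, pass the contents of Hamlet.txt instead
sample = "To be, or not to be: that is the question."
for word, count in count_words(sample).most_common(3):
    print('{0:<10}{1:>5}'.format(word, count))
```

`most_common(n)` returns the n highest counts in descending order, which replaces the explicit `list(...)`, `sort`, and index loop.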

② The word frequency statistics code with an exclusion set is as follows:
(it excludes common articles, pronouns, conjunctions, and other grammatical words)

# CalHamlet_2.py
excludes = {"the","and","of","you","a","i","my","in"}
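# Build the exclusion set to filter out common articles, pronouns, conjunctions, and other grammatical words
def getText():
    with open("Hamlet.txt", 'r') as f:   # the with-block closes the file after reading
        txt = f.read()
    txt = txt.lower()    # convert all English letters in the text to lowercase
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, ' ')  # replace special characters in the text with spaces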
    return txt
hamletTxt = getText()
words = hamletTxt.split()
counts = {}
for word in words:
    counts[word] = counts.get(word,0) + 1
for word in excludes:
    counts.pop(word, None)   # remove excluded words; no error if a word never appears
items = list(counts.items())
items.sort(key = lambda x:x[1], reverse = True)
for i in range(10):
    word, count = items[i]
    print('{0:<10}{1:>5}'.format(word, count))

Output:

D:\anaconda\new_launch\python.exe D:/pycharm/program/untitled/test.py
to          754
hamlet      462
it          416
that        391
is          340
not         314
lord        309
his         296
this        295
but         269

Process finished with exit code 0
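
The exclusion step can also be done without modifying the `counts` dictionary, by filtering at ranking time. The sketch below assumes a hypothetical helper `top_words` and a small made-up `counts` dictionary for illustration; it is one possible variant, not the article's code.

```python
excludes = {"the", "and", "of", "you", "a", "i", "my", "in"}

def top_words(counts, excludes, n=10):
    """Return the n most frequent (word, count) pairs whose word is not excluded."""
    kept = [(w, c) for w, c in counts.items() if w not in excludes]
    kept.sort(key=lambda x: x[1], reverse=True)   # highest count first
    return kept[:n]

# Made-up counts for illustration only
counts = {"the": 5, "hamlet": 3, "lord": 2, "and": 4}
for word, count in top_words(counts, excludes, n=2):
    print('{0:<10}{1:>5}'.format(word, count))
```

Because the filter only tests membership, words in `excludes` that never occur in the text cause no error, and the full counts remain available for other uses.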


References:

[1] Song Tian, Li Xin, Huang Tianyu. Python Language Programming Foundation [M]. 2nd ed. Beijing: Higher Education Press, 2019: 171-174.
