Python English word frequency statistics

Code

Everyone writes about Chinese word frequency statistics. I have been working with Python for several years, so here is a version for English. The code is attached directly below.

text = """
British newspapers are much smaller than they used to be and their readers are often in a hurry , so newspapermen write as few words as possible . 
They tell their readers at once what happened , where , when and how it happened and what was the result : how many people were killed , what change was done and so on . 
Readers want the fact set out as fully and accurately as possible . 
Readers are also interested in the people who have seen the accident . 
So a newspaperman always likes to get some information from someone who was there , which can be given in the person’s own words . 
Because he can use only a few words , the newspaperman must choose those words carefully , every one must be effective . 
Instead of “ he called out in a loud voice ” , he writes ” he shouted ” ; instead of “the loose stones rolled noisily down the side of the mountain ” , he will write ” they thundered down the mountainside ” . 
Because many of the readers are not very clever, and most of them are in a hurry.
"""

def getTxt(txt):  # preprocess the text
    txt = txt.lower()  # convert every word to lowercase
    for ch in ",.!、@#$%^:;'”“’":  # replace every non-word symbol with a space
        txt = txt.replace(ch, ' ')
    return txt

txtArr = getTxt(text).split()
counts = {}
for word in txtArr:
    counts[word] = counts.get(word, 0) + 1
countsList = list(counts.items())
countsList.sort(key=lambda x:x[1], reverse=True)
for word, count in countsList[:20]:  # top 20; slicing also works when there are fewer than 20 distinct words
    print('{0:<10}{1:>10}'.format(word,count))
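
For comparison, the same kind of count can be sketched with collections.Counter from the standard library. This is an alternative, not what the code above uses. The helper name top_words, the sample sentence, and the regular expression (a word is taken to be a run of ASCII letters) are assumptions made for this illustration.

```python
import re
from collections import Counter

def top_words(txt, n=20):
    """Return the n most frequent lowercase words in txt as (word, count) pairs."""
    # Assumption: a "word" is a run of ASCII letters; everything else separates words.
    words = re.findall(r"[a-z]+", txt.lower())
    return Counter(words).most_common(n)

# Hypothetical sample sentence, used only to keep the example short.
sample = "The readers are in a hurry, so the newspaperman writes as few words as the readers need."

for word, count in top_words(sample, 3):
    print('{0:<10}{1:>10}'.format(word, count))
```

With this pattern an apostrophe splits a word in two, which is also what replacing the apostrophe with a space does in getTxt.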

Code Description

  • I searched Baidu for an English reading passage to use as the text for the word frequency statistics.
  • str.lower() converts all letters to lowercase and returns the result; the original str is unchanged.
  • str.replace('a', 'b') replaces every occurrence of a in str with b and returns the result; the original str is unchanged.
  • str.split() with no arguments splits str on any whitespace, including spaces, newlines (\n), and tabs (\t), and returns the pieces as a list.
  • dic.get("a", val) returns the value stored under the key a in the dictionary dic; if the dictionary has no key a, it returns val.
  • list.sort(key=None, reverse=False)
    key - a function of one parameter. sort calls it on each element of the list and compares the returned values instead of the elements themselves.
    reverse - the sort order: reverse=True sorts descending, reverse=False sorts ascending (default).
    The code uses a lambda expression. lambda is the keyword, the parameters come before the colon, and the expression after the colon is the result. Here the parameter is x and the result is x[1]. sort passes each element of the list to the key function. For example, with the list [('a', 5), ('b', 3)], sort passes ('a', 5) and ('b', 3) in turn to the lambda, so the parameter x receives each of these two values.
    countsList.sort(key=lambda x:x[1], reverse=True)
    # equivalent to
    def takeSecond(elem):
        return elem[1]
    countsList.sort(key=takeSecond, reverse=True)
    
  • In Python 3, print is a function: Python 2 allowed print a, but Python 3 requires print(a).
  • In Python 3 you can call help(print). (Note: in Python 2, help(print) does not work, because print is not a function there.)
  • print('{0:<10}{1:>10}'.format(word,count)): in the first pair of braces, 0 means the placeholder is filled with the first argument, word; the < after the colon left-aligns the column, and 10 is the column width. In the second pair of braces, 1 means the placeholder is filled with the second argument, count; the > after the colon right-aligns the column, and 10 is again the column width. I have not figured out what unit the width is in; if anyone knows, please tell me.
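
The building blocks described in the list above can be tried out on a tiny input. This is a minimal sketch: the string s and the variable names are made up for the illustration. On the question of units, the width in a format spec is a minimum number of characters, which the square brackets in the output make visible.

```python
s = "Hello, World!\nHello\tPython"        # hypothetical sample input

lowered = s.lower()                        # returns a new string; s is unchanged
cleaned = lowered.replace(",", " ").replace("!", " ")
words = cleaned.split()                    # splits on spaces, newlines and tabs

counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1       # get() returns 0 for a word not seen yet

pairs = list(counts.items())
pairs.sort(key=lambda x: x[1], reverse=True)   # sort by count, largest first

for word, count in pairs:
    # Each column is padded to at least 10 characters:
    # word is left-aligned, count is right-aligned.
    print('[{0:<10}{1:>10}]'.format(word, count))
```

Because list.sort is stable, words with the same count keep the order in which they first appeared.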

Result

The screenshot does not show the full output, but it is enough to get the idea.

  • Next, I plan to try word clouds, including Chinese word clouds; they look like fun.

Origin blog.csdn.net/weixin_44385465/article/details/104278874