Chapter 7-1 Word frequency statistics (30 points)
Write a program that, for a passage of English text, counts the number of distinct words and outputs the top 10% of words by frequency.
A "word" is a run of consecutive word characters no longer than 80 characters; if a word is longer than 15 characters, only its first 15 characters are kept. The legal word characters are uppercase and lowercase letters, digits, and underscores; every other character is treated as a word separator.
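As a rough illustration of the word definition above, the following sketch extracts words with a regular expression. The names `extract_words`, `WORD_RE`, and `MAX_LEN` are illustrative and not part of the problem. The sketch assumes that "letters" means ASCII letters only, and it lowercases each word because the task treats words as case-insensitive.

```python
import re

MAX_LEN = 15  # only the first 15 characters of a word are kept

# Assumption: "letters" means ASCII letters, so the character class is spelled out.
WORD_RE = re.compile(r"[A-Za-z0-9_]+")


def extract_words(text):
    """Return the words in text, truncated to MAX_LEN characters and lowercased."""
    return [w[:MAX_LEN].lower() for w in WORD_RE.findall(text)]


if __name__ == "__main__":
    # Both long words collapse to the same 15-character prefix.
    print(extract_words('Longlonglonglongword, longlonglonglonee; this_8 "PAT"'))
```

Every character outside the class acts as a separator, so punctuation and quotes never end up inside a word.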
Input format:
The input is a non-empty text that ends with the symbol #. The input is guaranteed to contain at least 10 different words.
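One possible way to honor the terminator is to read the whole stream and keep only what precedes the first #. This is a minimal sketch, not the only approach; `read_until_hash` is a hypothetical helper name, and the demo uses an in-memory stream in place of standard input.

```python
import io


def read_until_hash(stream):
    """Return everything in stream before the first '#'.

    If no '#' is present, the whole content is returned.
    """
    text = stream.read()
    return text.split("#", 1)[0]


if __name__ == "__main__":
    demo = io.StringIO("This is a test.\nlast words...#\nthis line should be ignored.\n")
    print(repr(read_until_hash(demo)))
```

In an actual submission, `sys.stdin` could be passed in place of the `io.StringIO` object.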
Output format:
Output the number of distinct words in the text on the first line. Note that words are not case-sensitive: for example, "PAT" and "pat" are considered the same word.
Then output the top 10% of words by frequency, in descending order of frequency, one per line in the format frequency:word. If frequencies are tied, output the words in ascending lexicographic order.
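The ranking rule can be sketched with `collections.Counter` and a two-part sort key. The helper name `top_tenth` is hypothetical, the input is assumed to be already lowercased and truncated, and rounding 10% down is an assumption inferred from the sample (23 distinct words give 2 output lines).

```python
from collections import Counter


def top_tenth(words):
    """Return (distinct_count, [(word, freq), ...]) for the top 10% of distinct words."""
    counts = Counter(words)
    # Descending frequency first, then ascending lexicographic order to break ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    keep = len(ranked) // 10  # assumption: 10% is rounded down
    return len(ranked), ranked[:keep]


if __name__ == "__main__":
    # 21 distinct words: "a" and "b" are tied at 3, "c" has 2, the rest have 1.
    words = ["b"] * 3 + ["a"] * 3 + ["c"] * 2 + list("defghijklmnopqrstu")
    total, top = top_tenth(words)
    print(total)
    for word, freq in top:
        print(f"{freq}:{word}")
```

Negating the frequency in the key lets a single ascending sort produce descending counts while keeping ties in alphabetical order.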
Sample input:
This is a test.
The word "this" is the word with the highest frequency.
Longlonglonglongword should be cut off, so is considered as the same as longlonglonglonee. But this_8 is different than this, and this, and this...#
this line should be ignored.
Sample output:
(Note: although the word "the" also appears 4 times, only the top 10% of words are output, that is, the first 2 of the 23 words. In alphabetical order "the" ranks third, so it is not output.)
23
5:this
4:is
Answer:
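import sys

# Read characters up to (not including) the first '#'; also stop at end of input.
chars = []
done = False
while not done:
    s = sys.stdin.readline()
    if s == "":  # end of input reached without seeing '#'
        break
    for ch in s:
        if ch == "#":
            done = True
            break  # ignore everything after '#', including the rest of this line
        chars.append(ch)
str1 = "".join(chars)

# Collect every separator (anything that is not a letter, digit, or underscore)
# and replace it with a space.
# Note: str.isalnum() also accepts non-ASCII letters and digits.
t = set()
for ch in str1:
    if not ch.isalnum() and ch != "_":
        t.add(ch)
for ch in t:
    str1 = str1.replace(ch, " ")

# Keep only the first 15 characters of each word and lowercase it.
list2 = [w[:15].lower() for w in str1.split()]

# Count the occurrences of each word.
dic = {}
for w in list2:
    dic[w] = dic.get(w, 0) + 1

# Sort by descending frequency, then by ascending lexicographic order.
ans = sorted(dic.items(), key=lambda x: (-x[1], x[0]))
print(len(ans))
m = len(ans) // 10  # top 10% of the distinct words, rounded down
for i in range(m):
    print(str(ans[i][1]) + ":" + ans[i][0])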