Plum rain 20180912-3 word frequency statistics

For the requirements of this assignment, see https://edu.cnblogs.com/campus/nenu/2019fall/homework/6583

Code: https://meixiaoyu.coding.net/p/cipintongji/git

Word frequency statistics SPEC 20180918

Function 1: small file input. To demonstrate that the program really runs and that the results are real (not to torment Five), he is asked to type the commands at the console himself.

Key points and difficulties:

1. Generate the wf.exe file from the wf.py file.

2. Implement reading the file.

Code segment:

# filename and processLine are defined elsewhere in the program (not shown)
def main():
    infile = open(filename, 'r')
    count = 100
    words = []
    data = []

    # read the file and count the words line by line
    wordCounts = {}
    for line in infile:
        processLine(line.lower(), wordCounts)
    pairs = list(wordCounts.items())

    # record the number of distinct words
    print("total : %d\n" % len(pairs))

    # swap to [count, word] so that sorting orders by count
    items = [[x, y] for (y, x) in pairs]
    items.sort()

    # walk the sorted list backwards: highest count first
    for i in range(len(items) - 1, -1, -1):
        print(items[i][1] + "\t" + str(items[i][0]))
        data.append(items[i][0])
        words.append(items[i][1])
    infile.close()

# Sort by word frequency
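The segment above calls `processLine`, whose body is not shown. As a minimal sketch, assuming a word is a maximal run of ASCII letters (an assumption for illustration, not necessarily the rule the author used), a helper with the same call shape could look like the following. The names `process_line` and `count_words` are hypothetical.

```python
import re

# Assumption: a "word" is a maximal run of ASCII letters.
WORD_PATTERN = re.compile(r"[a-z]+")


def process_line(line, word_counts):
    """Add the words found in one line of text to word_counts (hypothetical helper)."""
    for word in WORD_PATTERN.findall(line.lower()):
        word_counts[word] = word_counts.get(word, 0) + 1


def count_words(lines):
    """Return a dict mapping each word to its count over an iterable of lines."""
    word_counts = {}
    for line in lines:
        process_line(line, word_counts)
    return word_counts


if __name__ == "__main__":
    sample = ["The cat and the hat.", "A cat!"]
    counts = count_words(sample)
    print("total : %d" % len(counts))
    # Highest count first; ties are broken alphabetically.
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print(word + "\t" + str(n))
```

Because `count_words` accepts any iterable of lines, an open file object can be passed to it directly.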

Run result (screenshot):

 

 

Function 2: support entering the file name of an English work on the command line. Five types it in himself.

Key points and difficulties: the file is larger, and the distinct words must be sorted by how often they appear.

Code segment:

# Fragment: the enclosing function header is not shown in the original post.
        tf = {}
        for word in word_list2:
            word = word.lower()
            # remove any whitespace inside the token
            word = ''.join(word.split())
            if word in tf:
                tf[word] += 1
            else:
                tf[word] = 1
        return tf

    def get_counts(words):
        tf = {}
        for word in words:
            word = word.lower()
            # remove any whitespace inside the token
            word = ''.join(word.split())
            if word in tf:
                tf[word] += 1
            else:
                tf[word] = 1
        return tf

# Count word frequencies

# sort the (word, count) pairs by count, highest first
frequencyDic = sorted(frequencyDic.items(), key=lambda x: x[1], reverse=True)

print('total', len(frequencyDic), 'words', '\n')

if len(frequencyDic) <= 10:
    # fewer than ten distinct words: print all of them
    for x in range(0, len(frequencyDic)):
        the_word = frequencyDic[x][0]
        the_num = frequencyDic[x][1]
        print('%-10s %-10s' % (the_word, the_num))
else:
    # otherwise print only the first ten
    for x in range(0, 10):
        the_word = frequencyDic[x][0]
        the_num = frequencyDic[x][1]
        print('%-10s %-10s' % (the_word, the_num))
print('----')

# Sort by frequency and print the ten most frequent words
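The same top-ten selection can also be sketched with `collections.Counter` from the standard library. Its `most_common(n)` method returns at most `n` pairs, so the fewer-than-ten case needs no separate branch. The function name `top_words` and the sample text are illustrative and do not come from the original program.

```python
from collections import Counter


def top_words(words, limit=10):
    """Return up to `limit` (word, count) pairs, most frequent first."""
    counts = Counter(word.lower() for word in words)
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = "the cat saw the dog and the cat ran".split()
    ranked = top_words(sample)
    print("total", len(set(w.lower() for w in sample)), "words", "\n")
    for word, count in ranked:
        print("%-10s %-10s" % (word, count))
```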

Run result (screenshot):

 

Function 3: support entering, on the command line, the name of a directory that stores English works, and compute the statistics for all of them in batch.
> dir folder
gone_with_the_wand
runbinson
janelove
> wf folder
gone_with_the_wand
total 1234567 words
the 5023
a 4783
love 4572
fire 4322
run 3822
cheat 3023
girls 2783
girl 2572
slave 1322
buy 822
----
runbinson
total 1234567 words

Key points and difficulties: reading the files in batch.

Code:

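import os

# folderName and folderList are defined elsewhere in the program (not shown)
textFolder = folderName
fileNameList = []
for folder in folderList:
    if textFolder == folder:
        path1 = os.listdir(folder)
        for i in path1:
            # keep only .txt files and store their names without the extension
            if os.path.splitext(i)[1] == '.txt':
                fileNameList.append(os.path.splitext(i)[0])

# Read the files in the folder in batch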
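As a minimal sketch of how the batch step could be wrapped into reusable functions: the code below lists the `.txt` files in a folder and counts the words in each one. The function names, the UTF-8 encoding, and the rule that a word is a run of ASCII letters are all assumptions made for illustration, not details taken from the original program.

```python
import os
import re
from collections import Counter


def list_text_files(folder):
    """Return the base names (without extension) of the .txt files in folder, sorted."""
    names = []
    for entry in os.listdir(folder):
        base, ext = os.path.splitext(entry)
        if ext == ".txt" and os.path.isfile(os.path.join(folder, entry)):
            names.append(base)
    return sorted(names)


def count_folder(folder):
    """Return {base name: Counter of words} for every .txt file in folder."""
    results = {}
    for base in list_text_files(folder):
        path = os.path.join(folder, base + ".txt")
        # Assumption: the works are stored as UTF-8 text.
        with open(path, "r", encoding="utf-8") as infile:
            text = infile.read().lower()
        # Assumption: a word is a maximal run of ASCII letters.
        results[base] = Counter(re.findall(r"[a-z]+", text))
    return results
```

Each `Counter` in the result can then be printed with `most_common(10)` followed by a `----` separator to match the output format shown in the example session.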

Run result (screenshot):

 

Function 4: read a single English work from the console. This is not meant to embarrass Five, but to show off to your girlfriend's friend that you can provide a tool better suited to embedding in scripts (or, as she puts it, merely a more flexible interface).

I do not yet fully understand how to capture redirected input. After consulting related material I wrote some simple code, but this function is not yet implemented.

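import sys

# countFileWords is defined elsewhere in the program (not shown)
def main(argv):
    if sys.argv[1] == '-h':
        print('test.py -i -s filename.txt')
        sys.exit()
    elif sys.argv[1] == "-s":
        if len(sys.argv) == 3:
            # a file name was given: count the words in that file
            countFileWords(sys.argv[2])
        else:
            # no file name: read the redirected text from standard input
            # (the text is read but not yet processed)
            redirect_words = sys.stdin.read()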
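One common way to finish this step, offered here as an assumption rather than the author's eventual solution, is to fall back to standard input whenever no file name follows `-s`. When the shell redirects a file into the program, `sys.stdin.read()` returns that file's text. The names `read_input` and `count_text` are hypothetical, and the word rule (runs of ASCII letters) is likewise an assumption.

```python
import re
import sys
from collections import Counter


def count_text(text):
    """Count words in a string. Assumption: a word is a run of ASCII letters."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def read_input(argv, stdin=None):
    """Return text from the file named after -s, or from stdin if none is given."""
    if stdin is None:
        stdin = sys.stdin
    if len(argv) >= 3 and argv[1] == "-s":
        with open(argv[2], "r", encoding="utf-8") as infile:
            return infile.read()
    # No file name on the command line: consume whatever was redirected or piped in.
    return stdin.read()


def main(argv):
    counts = count_text(read_input(argv))
    print("total", len(counts), "words")
    for word, n in counts.most_common(10):
        print("%-10s %-10s" % (word, n))
```

With this shape, calling `main(sys.argv)` would cover both an invocation that names a file after `-s` and one where the text arrives through a shell redirect such as `< some_file.txt`.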

PSP table

 

Origin www.cnblogs.com/MAY6/p/11538294.html