Xu Lijun 20190912-3 word frequency statistics

For the requirements of this assignment, see https://edu.cnblogs.com/campus/nenu/2019fall/homework/6583

My source code repository: https://xulijun.coding.net/p/countwords/d/countwords/git

Word frequency statistics SPEC 20180918

1. Function 1 

Small file input. The program counts the total number of words in the file and the number of times each word occurs. Repeated words are not counted again in the total: a word that occurs twice counts as 1.

1.1 Key difficulties:

(1) The assignment requires command-line parameters to be entered at the console. I did not know how to pass parameters to a Python script run from the console, so I looked up reference material and learned that Python command-line parameters are obtained through sys.argv[].

(2) Before counting word frequencies, the special characters in the text have to be removed. I use a regular expression to filter the document and replace every special character with a space, then loop over the words to count each one: if the word is already in the dictionary, its count is increased by 1; otherwise it is added with a count of 1.

The code is as follows:

import re

def getFrequency(testText):
    testText = re.sub('[^a-zA-Z0-9\n]', ' ', testText)   # replace special (non-alphanumeric) characters with spaces
    frequency = {}   # define the word frequency dictionary
    for word in testText.split():   # loop over the words and count each one
        if word in frequency:
            frequency[word] += 1
        else:
            frequency[word] = 1
    frequency = sorted(frequency.items(), key=lambda x: x[1], reverse=True)   # sort by dictionary value (the frequency of each word)
    return frequency
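
For comparison, the same counting can be sketched with collections.Counter from the standard library. This is only an illustrative variant, not the code in the repository: the function name count_words and the sample sentence are assumptions made for the example. Like the function above, it is case-sensitive.

```python
import re
from collections import Counter


def count_words(text):
    """Return (number of distinct words, list of (word, count) pairs)."""
    # Replace every character that is not a letter or digit with a space.
    cleaned = re.sub(r'[^a-zA-Z0-9]', ' ', text)
    counts = Counter(cleaned.split())
    # most_common() orders the pairs by count, highest first.
    return len(counts), counts.most_common()


total, ranked = count_words("the cat, the dog; the end")
print("total:", total)
for word, count in ranked:
    print(word, count)
```

Here the total is the number of distinct words, which matches the rule that a repeated word is counted only once in the total.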

 

1.2 Results

2. Function 2

Support entering the file name of an English work on the command line; the name is typed in by hand ("Old Five" in the assignment text is asked to enter it personally).

2.1 Key difficulties:

Function 2 requires the file name to be entered without the .txt suffix. After looking up reference material, I check whether the parameter sys.argv[1] is equal to '-s' to decide whether to run Function 1 or Function 2.

The code is as follows:

def main(argv):
    if sys.argv[1] == '-s':   # check whether the command-line parameter is equal to '-s'
        doCount(sys.argv[2], 1)
    elif os.path.isdir(sys.argv[1]):
        fileFindAndCount(sys.argv[1])   # read the files in the folder
    else:
        doCount(sys.argv[1], 2)
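
The dispatch on sys.argv can also be sketched as a small function that is easy to try out on its own. Everything named here is an assumption for illustration: choose_mode, the mode labels it returns, and the choice to append '.txt' when a work is named without its suffix are not taken from the repository.

```python
import os
import sys


def choose_mode(args):
    """Decide what to do from the arguments that follow the script name.

    Returns a (mode, target) pair; the mode labels are made up for this sketch.
    """
    if len(args) >= 2 and args[0] == '-s':
        return 'single-file', args[1]
    if len(args) == 1 and os.path.isdir(args[0]):
        return 'directory', args[0]
    if len(args) == 1:
        # Assumption: the name is typed without its suffix, so add '.txt'.
        return 'named-work', args[0] + '.txt'
    return 'usage', None


if __name__ == '__main__':
    # sys.argv[0] is the script name, so the real arguments start at index 1.
    print(choose_mode(sys.argv[1:]))
```

Checking the number of arguments first avoids an IndexError when the script is started with no parameters at all.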

 

 

2.2 Results

3. Function 3

Support entering, on the command line, the name of a directory in which English works are stored, and count the files in it as a batch.

3.1 Key difficulties:

To determine whether the path entered is a folder, the os.path.isdir() method can be used; the files in the folder are then traversed with the os.listdir() method.

The code is as follows:

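# excerpt from main()
elif os.path.isdir(sys.argv[1]):
    fileFindAndCount(sys.argv[1])

def fileFindAndCount(path1):
    files = os.listdir(path1)
    for file in files:
        filePath = os.path.join(path1, file)   # os.listdir() returns bare names, so join them with the folder
        if os.path.isfile(filePath):
            doSomeFileCount(filePath)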
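
One detail of the traversal is worth trying in isolation: os.listdir() returns bare file names, so a check such as os.path.isfile() only works reliably once the folder has been joined back on (unless the script happens to run inside that folder). The sketch below shows this in a throwaway directory; list_regular_files and the file names are hypothetical.

```python
import os
import tempfile


def list_regular_files(folder):
    """Return the full paths of the regular files directly inside folder."""
    paths = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)  # listdir yields bare names
        if os.path.isfile(full_path):
            paths.append(full_path)
    return paths


# Small demonstration in a temporary directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as folder:
    for name in ('a.txt', 'b.txt'):
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write('hello world')
    os.mkdir(os.path.join(folder, 'subdir'))  # skipped: not a regular file
    found = [os.path.basename(p) for p in list_regular_files(folder)]
    print(found)
```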

 

 

3.2 Results

4. Function 4

Read a single English work from the console. This is not meant to embarrass Old Five, but to let you show off to your girlfriend's best friend by demonstrating that you can provide a program better suited to being embedded in scripts (or, as she put it, merely a more flexible interface). If you cannot understand the requirement, ask senior students, or search Bing for "linux redirection". Although this feature also exists on Windows, adding "linux" to the search keywords helps you find it quickly.

4.1 Key difficulties:

I searched Baidu and asked senior students about redirection and understood the basics, but I did not manage to get this function working. I will keep working to complete it.
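
For reference, one possible way to approach this function is to read the text from a file-like object, since redirected input arrives on sys.stdin. This is only a sketch under assumptions, not the finished feature: count_stream is a hypothetical name, and io.StringIO stands in for the redirected input so that the example can be run without a file.

```python
import io
import re


def count_stream(stream):
    """Count words read from any file-like object, such as sys.stdin."""
    text = stream.read()
    frequency = {}
    for word in re.sub(r'[^a-zA-Z0-9]', ' ', text).split():
        frequency[word] = frequency.get(word, 0) + 1
    # Sort by count, highest first; ties keep their first-seen order.
    return sorted(frequency.items(), key=lambda item: item[1], reverse=True)


# Stand-in for redirected input. In a real script, pass sys.stdin instead,
# e.g. when started as:  python wf.py < book.txt  (wf.py is a hypothetical name)
demo = io.StringIO("to be or not to be")
result = count_stream(demo)
print(result)
```

With input redirected from a file on the command line, the shell opens the file and the script never sees its name, which is why reading sys.stdin is enough.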

PSP table

Summary: at the start, the first function took a long time because I had to look up reference material. Because of my limited ability and time, the fourth function was not completed successfully; even though the assignment has ended, I intend to keep working until it is written.

 


Origin www.cnblogs.com/xulijun811/p/11536904.html