Song Xiaoli 20190912-3 word frequency statistics

For the assignment requirements, see [ https://edu.cnblogs.com/campus/nenu/2019fall/homework/6583 ]

The assignment is written in Python, and the code is at [ https://sxl357.coding.net/p/wy/git ]

Function 1: small file input

Function 2: supports passing the file name of an English work on the command line

Function 3: supports passing, on the command line, the name of a directory that stores English works

Function 4: reads a single English work from the console

1. Function 1

Key and difficult points:

(1) How to enter command line parameters?

Import the sys module; sys.argv[] reads the command-line parameters.
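A minimal sketch of reading command-line parameters with sys.argv. The helper name get_args and the script name wf.py are hypothetical, introduced only for illustration. sys.argv[0] holds the script name, so the user-supplied parameters start at index 1.

```python
import sys


def get_args(argv):
    """Return the user-supplied parameters, skipping the script name at argv[0]."""
    return argv[1:]


if __name__ == '__main__':
    # When run as `python wf.py -s test.txt` (wf.py is a hypothetical name),
    # sys.argv is ['wf.py', '-s', 'test.txt'].
    print(get_args(sys.argv))
```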

(2) How to open files?

The built-in open() function opens a file and returns a file object; no import is needed for it (the os module is only needed later, for directories). The basic form is open(file, mode), where file is the file path and mode is the mode in which the file is opened.

Because the file path must have the suffix .txt, check whether the incoming path contains .txt, and append it if it does not.

By default, open() decodes the file with a platform-dependent encoding, which may fail on Chinese text and raise an error. Adding encoding='utf-8' to the open() call makes the file decode as UTF-8, so Chinese characters no longer cause errors.
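The suffix check and the explicit encoding can be tried out as below. This is only a sketch: the helper name read_text and the sample file are hypothetical, and it tests the suffix with str.endswith() instead of the `in` test, which is a slightly stricter variant.

```python
import os
import tempfile


def read_text(name):
    """Append .txt when it is missing, then read the file as UTF-8."""
    path = name if name.endswith('.txt') else name + '.txt'
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()


# Demonstration with a temporary file that contains Chinese text.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.txt')
    with open(sample, 'w', encoding='utf-8') as f:
        f.write('hello 世界')
    # The name is passed without its suffix; read_text adds it back.
    text = read_text(os.path.join(tmp, 'sample'))

print(text)
```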

(3) How to organize all the words in a file into a list?

Use the findall() function of the re module. findall(pattern, string) returns all matches of the pattern in the string as a list. In r'[a-z0-9^-]+', the r prefix marks a raw string, and [a-z0-9^-]+ is a regular expression that matches runs of lowercase letters, digits, '^' and the hyphen '-' (inside a character class, '^' negates only when it comes first, so here it is a literal caret). The text is read with read(); because the pattern only matches lowercase letters, lower() is used to convert all uppercase letters in the text to lowercase first.
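A small example of this pattern on a made-up sentence (the sample text is an assumption for illustration). Note that the apostrophe is not in the character class, so "It's" splits into two pieces, while the hyphenated word stays whole.

```python
from re import findall

text = "Hello, World! It's a well-known fact: 2019 was a year."

# Lowercase first, because the pattern only accepts a-z, 0-9, '^' and '-'.
words = findall(r'[a-z0-9^-]+', text.lower())
print(words)
# ['hello', 'world', 'it', 's', 'a', 'well-known', 'fact', '2019', 'was', 'a', 'year']
```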

(4) How to count the number of times each word appears?

Use Counter() from the collections module, which tracks how many times each value occurs. The counts are stored as key-value pairs, as in a dictionary: the element is the key and its count is the value. The object it returns is a dictionary subclass.
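A short illustration of Counter on a hand-made word list (the list itself is an assumption for the example):

```python
from collections import Counter

lists = ['the', 'cat', 'and', 'the', 'hat', 'and', 'the', 'bat']
words = Counter(lists)

print(words['the'])  # 3
print(words['and'])  # 2
print(words['dog'])  # 0, because a missing key counts as zero
```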

(5) How to count the total number of distinct words?

Traverse the dictionary produced by Counter(): set num to 0 initially, and add 1 to num for each key visited.
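The traversal can be checked on a tiny input, as sketched below with a made-up word list. Since every key is a distinct word, the loop arrives at the same number as len() applied to the Counter, which is a shorter alternative.

```python
from collections import Counter

words = Counter(['a', 'b', 'a', 'c', 'b', 'a'])

# Count the keys one by one, as described in the text.
num = 0
for key, value in words.items():
    num += 1

print(num)         # 3
print(len(words))  # 3, the same result without a loop
```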

Important Code:

 

from collections import Counter
import sys
from re import findall

def statistic(name):
    # Check whether the command-line parameter contains .txt; if not, append it, then use the result as the path to open
    d = '.txt'
    if d in name:
        path = name
    else:
        path = name + d
    # Read the whole file as lowercase text; the with block closes the file afterwards
    with open(path, 'r', encoding='utf-8') as f:
        lists = findall(r'[a-z0-9^-]+', f.read().lower())
    words = Counter(lists)
    # Traverse the dictionary and count the number of keys
    num = 0
    for key, value in words.items():
        num += 1

 

 

Screenshot of the results:

2. Function 2

Key and difficult points:

(1) How to distinguish Function 1 from Function 2?

Use sys.argv[1] == '-s' to tell Function 1 and Function 2 apart: if it holds, it is Function 1; if it does not hold and the parameter passed in is not a folder, it is Function 2.

(2) How to decide when to output 'words'?

Function 1 and Function 2 both call statistic(); inside that function, sys.argv[1] == '-s' determines whether 'words' is output.

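(3) Why use import sys rather than from sys import argv?

With from sys import argv, the parameters are typically unpacked as a, b, c, d ... = argv, and the number of parameters passed in must then equal the number of variables, otherwise an error is raised.

With import sys and sys.argv[], this does not happen: even if more parameters are passed in than are read, it does not matter, because only the indexed positions are read. (The error comes from the unpacking, not from the import form itself; argv can be indexed either way.)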
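The difference between indexing and unpacking can be reproduced without a real command line. In this sketch the list argv is a stand-in for sys.argv, and the script name wf.py is hypothetical.

```python
# Stand-in for sys.argv when the script is run as `python wf.py -s test.txt`.
argv = ['wf.py', '-s', 'test.txt']

# Indexing reads only the position it asks for; extra parameters are ignored.
flag = argv[1]

# Unpacking requires the number of variables to match the number of parameters.
try:
    script, only_one = argv
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

print(flag)
print(unpack_error)
```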

(4) How to find the 10 words that occur most often?

Use most_common(n), which returns a list of the n elements with the highest counts.
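A short example of most_common() on a made-up sentence, printed with the same '%-8s%5d' format string used in this post's code. It asks for the top 2 rather than the top 10 only to keep the output small.

```python
from collections import Counter

words = Counter('the cat and the hat and the bat'.split())

# Each entry is a (word, count) tuple, ordered from most to least frequent.
top2 = words.most_common(2)
for word, count in top2:
    print('%-8s%5d' % (word, count))
```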

Important Code:

    # Function 1 does not print 'words'; Function 2 does
    if sys.argv[1]=='-s':
        print('total'+' '+str(num))
    else:
        print('total'+' '+str(num)+' words')
    maxwords=words.most_common(10)
    for i in maxwords:
        print('%-8s%5d'%(i[0],i[1]))
    # Function 1
    if sys.argv[1]=='-s':
        statistic(sys.argv[2])
    # Function 2
    else:
        statistic(sys.argv[1])

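Screenshot of the results:

3. Function 3

Key and difficult points:

(1) How to handle files and directories?

Import the os module: os.path.isdir() checks whether a path is a folder; os.listdir() returns the entries of a folder as a list, listing all the files in the directory (there may be a better way); os.path.isfile() checks whether a path is a file.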
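A minimal sketch of these three calls on a temporary folder. The folder and the file names a.txt and b.txt are created only for this example. os.listdir() returns bare names in no guaranteed order, so the sketch sorts them and joins each one with the folder path before testing it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create two small sample files (hypothetical names).
    for name in ('a.txt', 'b.txt'):
        with open(os.path.join(folder, name), 'w', encoding='utf-8') as f:
            f.write('sample text')

    is_dir = os.path.isdir(folder)
    names = sorted(os.listdir(folder))
    all_files = all(os.path.isfile(os.path.join(folder, n)) for n in names)

print(is_dir, names, all_files)
```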

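(2) How to distinguish it from the first two functions?

When argv[1]=='-s' does not hold, use os.path.isdir(argv[1]) to check whether a file or a folder was passed in, which decides between Function 2 and Function 3: if it holds, it is Function 3; otherwise it is Function 2.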
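The rules for telling the three functions apart can be gathered into one small helper. This is a sketch only: the name choose_function, the script name wf.py, and the sample arguments are hypothetical, and the argument list is passed in explicitly so the logic can be tried without a real command line.

```python
import os
import tempfile


def choose_function(argv):
    """Return 1, 2 or 3 according to the rules described above."""
    if argv[1] == '-s':
        return 1
    if os.path.isdir(argv[1]):
        return 3
    return 2


with tempfile.TemporaryDirectory() as folder:
    results = [
        choose_function(['wf.py', '-s', 'test.txt']),             # -s flag
        choose_function(['wf.py', folder]),                        # a folder
        choose_function(['wf.py', os.path.join(folder, 'book')]),  # not a folder
    ]

print(results)  # [1, 3, 2]
```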

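(3) How to output only the file name, without the .txt suffix?

os.path.splitext(file)[0] separates the file name from its suffix, so that only the file name is displayed.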
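For example, with a made-up file name, os.path.splitext() returns the name and the suffix as a pair:

```python
import os

stem, suffix = os.path.splitext('gone_with_the_wind.txt')
print(stem)    # gone_with_the_wind
print(suffix)  # .txt
```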

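Important code:

# A folder was passed in
def liststatistic(path):
    files=os.listdir(path)
    for file in files:
        filename=os.path.splitext(file)[0]
        print(filename)
        # os.listdir() returns bare names, so join each one with the folder path
        statistic(os.path.join(path,file))
        print('----')
    # Function 3
    elif os.path.isdir(sys.argv[1]):
        liststatistic(sys.argv[1])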

Screenshot of the results:

4. Function 4

 

Not yet implemented.
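One possible direction, offered purely as a sketch and not as the author's implementation: reuse the same regular expression and Counter on a text stream. The function name statistic_stream is hypothetical, and io.StringIO stands in for the console here; a real run would pass sys.stdin instead.

```python
import io
from collections import Counter
from re import findall


def statistic_stream(stream, n=10):
    """Count words read from a text stream; return (distinct total, top n)."""
    words = Counter(findall(r'[a-z0-9^-]+', stream.read().lower()))
    return len(words), words.most_common(n)


# io.StringIO simulates console input for the demonstration.
total, top = statistic_stream(io.StringIO('The cat and the hat'))
print(total)
print(top[0])
```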

5. PSP

6. Summary

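How the work went

(1) No idea where to start

When I first received the assignment I was completely lost. For me it was like trying to run before I could walk, but it had to be done. The first question was the choice of language: C, Java, or Python. I had only just started with Java and could not do object-oriented programming in C++, whereas Python has many modules and seemed a better fit, so I chose Python. Then where to start? After talking with classmates and senior students, I concluded that when you know nothing, imitation is the best way to learn.

(2) Imitation

Senior students from previous years had already done this exercise, and their blogs can all be found on cnblogs. I read several posts written in Python, picked one, found its code, and got it running first. Then I studied it: I printed the code out and went through it line by line, looking up anything I did not understand on Baidu or discussing it with classmates. Once I understood it thoroughly, I typed it out again and again until I could write it entirely by myself and it had become my own. When running it, converting the .py file to an .exe file took a lot of time; here is a useful link: [http://blog.ijunyu.top/2018/09/20/py2exe/#more]

(3) Improvement

Having understood the code thoroughly, I felt it could be more concise and that some functions could be implemented in other ways, so I began to improve it. All kinds of problems came up, and I solved them one by one, which produced this not-quite-perfect piece of work.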

 


Origin www.cnblogs.com/simpleli66/p/11538293.html