Han Hao 20190912-3 Word Frequency Statistics

For the assignment requirements, see: https://edu.cnblogs.com/campus/nenu/2019fall/homework/6583

Word frequency statistics SPEC 20180918

1. Code and version control

Code address: https://e.coding.net/hanhao/count_words.git

Language: Python

Client: git (changes are uploaded with git push)

2. Key points / difficulties and screenshots of the results

Feature 1: Small file input. To show that the program really runs and that the results are genuine (and not a way of tormenting Lao Wu), have him type the command at the console himself.

Main difficulties:

(1) How to package the .py files into an .exe file

Tutorial: https://www.cnblogs.com/hanhao970620/p/11537088.html

(2) Detecting and removing redundant (non-word) characters from the English text. The solution is to replace punctuation and similar characters with spaces.

Key code:

Remove redundant characters:

    def deal_Redundantwords(string):
        string = string.replace('\n', ' ').replace(',', ' ')
        s1 = list(string)
        num = len(s1)
        s1.append(' ')  # sentinel so that s1[i + 1] never goes out of range
        for i in range(num):
            if s1[i] in ')(-;#$%&*\'".?!':
                # isalnum() checks whether a character is a letter or digit;
                # keep the symbol only when it sits between two alphanumerics
                if not (s1[i - 1].isalnum() and s1[i + 1].isalnum()):
                    s1[i] = ' '
        for i in range(num):
            # keep ':' only when it is followed by '/' (as in "http://")
            if s1[i] == ':' and s1[i + 1] != '/':
                s1[i] = ' '
        # join() concatenates the character list back into one string
        s = ''.join(s1)
        # print(s)
        return s
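For comparison, a similar cleanup can be sketched with the standard re module instead of a character-by-character scan. This is a swapped-in technique, not the author's code, and it is simplified: the pattern is an assumption that keeps runs of ASCII letters and digits, joined only by an apostrophe, hyphen or period sitting between two alphanumerics (roughly mirroring the isalnum() neighbour check above). It does not reproduce the special case for ':' before '/'. The name extract_words is hypothetical.

```python
import re

# Assumption: a "word" is a run of ASCII letters/digits, optionally joined by
# an apostrophe, period or hyphen that sits between two alphanumerics
# (e.g. "don't", "well-known", "3.14"). Everything else acts as a separator.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:['.-][A-Za-z0-9]+)*")


def extract_words(text):
    """Return the lower-cased words found in text, in order of appearance."""
    return [w.lower() for w in WORD_RE.findall(text)]


if __name__ == "__main__":
    print(extract_words("Don't panic: it's a well-known (and true) fact, 3.14!"))
    # → ["don't", 'panic', "it's", 'a', 'well-known', 'and', 'true', 'fact', '3.14']
```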

Implementing the feature:

    list1 = text.replace('\n', ' ').lower().split()  # keep the original word list
    list2 = list(set(list1))  # deduplicated words
    if flag == 0:
        print("total " + str(len(list2)))  # small text: vocabulary size (Feature 1 does not print "words")
    else:
        print("total " + str(len(list2)) + " words")  # vocabulary size
    print("\n")
    dir_a = {}  # word -> frequency
    for str1 in list1:
        if str1 != '':
            if str1 in dir_a:
                dir_a[str1] = dir_a[str1] + 1
            else:
                dir_a[str1] = 1
    dir_b = sorted(dir_a.items(), key=lambda x: x[1], reverse=True)  # sort by frequency
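The counting and sorting steps above can also be expressed with collections.Counter from the standard library. This is a sketch of an equivalent approach, not the author's code; count_words is a hypothetical name. Since str.split() with no arguments already discards empty strings, the `str1 != ''` check is not strictly needed.

```python
from collections import Counter


def count_words(text):
    """Return (vocabulary size, [(word, count), ...] sorted by descending count).

    Mirrors the dictionary approach: lower-case the text, split it on
    whitespace, count each word, then sort by frequency.
    """
    words = text.lower().split()   # split() with no argument drops empty strings
    counts = Counter(words)
    # most_common() with no argument returns every word, highest count first;
    # words with equal counts keep the order in which they were first seen.
    return len(counts), counts.most_common()


if __name__ == "__main__":
    total, ranked = count_words("the cat and the hat and the bat")
    print("total", total)
    for word, n in ranked:
        print(f"{word:<10}{n}")
```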

Screenshot of the results:

What I am proud of, breakthroughs, and difficulties:
Understanding of functions: I knew nothing about isalnum() and join() beforehand, which made checking characters and joining them back into strings very difficult.
Defining the redundancy-removal function: I put the removal of redundant characters into a separate function, so all four features of the project can call it repeatedly, which greatly simplifies the program.

Feature 2: Support entering the file name of an English work on the command line; Lao Wu types it in himself.

Key points / difficulties: reading the file with the open() function (open(filename, mode, buffering, encoding)), then splitting the text into words on whitespace and counting word frequencies in a dictionary.

Key code:

    try:
        with open(filename, 'r', encoding='UTF-8') as f_obj:
            content = f_obj.read()
            countNumber(content, flag)
    except FileNotFoundError:
        msg = "sorry,the file " + filename + " does not exist."
        print(msg)

Screenshot of the results:

What I am proud of, breakthroughs, and difficulties:

Large files: filtering a large file is noticeably harder than the small-file case of Feature 1, but in Python the open() function and the read() method do most of the work.
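Because read() loads the whole file into memory at once, one option for very large files is to iterate over the file object line by line and update the counts incrementally. A minimal sketch, assuming UTF-8 input and whitespace-separated words as in the code above; count_file_streaming is a hypothetical name, not part of the author's program.

```python
from collections import Counter


def count_file_streaming(filename, encoding="UTF-8"):
    """Count lower-cased, whitespace-separated words without reading the whole file at once.

    Returns a Counter, or None if the file does not exist.
    """
    counts = Counter()
    try:
        with open(filename, "r", encoding=encoding) as f_obj:
            for line in f_obj:  # the file object yields one line at a time
                counts.update(line.lower().split())
    except FileNotFoundError:
        print("sorry, the file " + filename + " does not exist.")
        return None
    return counts
```

For example, `count_file_streaming("book.txt").most_common(10)` would give the ten most frequent words of a (hypothetical) book.txt.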

Feature 3: Support entering, on the command line, the name of a directory that stores English works, and compute the statistics for them in batch.

Main difficulties:

(1) I had never read files from a folder in batch before, so this took a great deal of time.
(2) Extracting the .txt files in a folder, taking their file names, and storing them in a list.

Key code:

Finding the folders:

    import os

    path = os.listdir(os.getcwd())  # get all entries in the current directory
    folderList = []
    for p in path:
        if os.path.isdir(p):  # keep only the folders
            folderList.append(p)

Reading the files:

    for folder in folderList:
        if textFolder == folder:  # textFolder: presumably the folder name given on the command line
            path1 = os.listdir(folder)  # list all the files in the folder
            for i in path1:
                if os.path.splitext(i)[1] == '.txt':
                    fileNameList.append(os.path.splitext(i)[0])  # store the name without ".txt"
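The two steps above (find the folder, then collect the names of its .txt files) can be sketched more compactly with pathlib. This is an alternative, not the author's code; list_txt_names is a hypothetical name. Like the original, a relative folder name is resolved against the current working directory and the ".txt" comparison is case-sensitive; unlike os.listdir(), the result is sorted so the order is predictable.

```python
from pathlib import Path


def list_txt_names(folder_name):
    """Return the base names (without ".txt") of the .txt files in folder_name.

    Returns an empty list if folder_name is not an existing directory.
    """
    folder = Path(folder_name)  # relative paths resolve against the current working directory
    if not folder.is_dir():
        return []
    # Path.suffix is the final extension (e.g. ".txt"); Path.stem is the name without it.
    return sorted(p.stem for p in folder.iterdir() if p.is_file() and p.suffix == ".txt")
```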

Screenshot of the results:

 

What I am proud of, breakthroughs, and difficulties:

Calling library functions: understanding and calling the os.path functions and splitext() solved the problems of getting file paths and separating file names from their extensions. Before this project I did not know these functions.
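A few calls illustrate how os.path.splitext() behaves, which matters when file names contain several dots or none. The file names below are made up for illustration.

```python
import os.path

# os.path.splitext splits off only the last extension and returns (root, ext).
print(os.path.splitext("hamlet.txt"))         # ('hamlet', '.txt')
print(os.path.splitext("war.and.peace.txt"))  # ('war.and.peace', '.txt')
print(os.path.splitext("README"))             # ('README', '')
print(os.path.splitext(".bashrc"))            # ('.bashrc', '')  leading dots are not an extension
```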

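Feature 4: Read a single English work from the console. This is not to embarrass Lao Wu, but to show your girlfriend that you can deliver a program better suited to embedding in scripts (or, as she puts it, merely a more flexible interface).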

Key points / difficulties:

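(1) Understanding and practicing redirection (in the process I re-examined the four concepts the teacher raised: command-line arguments, redirection, standard input, and the console).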

For details, see: https://www.cnblogs.com/hanhao970620/p/11536985.html

(2) Capturing the input correctly whether the user supplies a file name or the text content itself.

Key code:

    elif sys.argv[1] == "-s":
        if len(sys.argv) == 3:
            # print(sys.argv)
            flag = 0
            countFileWords(sys.argv[2], flag)  # "-s filename": read the named file
        else:
            # print(sys.argv)
            # redirected input ("-s < file"): standard input supplies the text itself
            redirect_words = sys.stdin.read()
            flag = 0
            countNumber(redirect_words, flag)
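The branch above decides between a file-name argument and redirected standard input. A self-contained sketch of the same decision, assuming the program is invoked as `<program> -s [filename]`; read_input is a hypothetical helper, and argv and stdin are passed in as parameters so the logic can be tried without a real console.

```python
import io
import sys


def read_input(argv, stdin=None):
    """Return the text to analyse for the '-s' option.

    argv  -- argument list in the style of sys.argv (argv[0] is the program name)
    stdin -- stream to read when no file name is given (defaults to sys.stdin)
    """
    if len(argv) < 2 or argv[1] != "-s":
        raise ValueError("expected '-s' as the first argument")
    if len(argv) == 3:
        # "-s filename": read the named file
        with open(argv[2], "r", encoding="UTF-8") as f_obj:
            return f_obj.read()
    # "-s" alone: the text arrives through redirection, e.g. "< file.txt"
    stream = stdin if stdin is not None else sys.stdin
    return stream.read()


if __name__ == "__main__":
    demo = read_input(["prog", "-s"], io.StringIO("text from a redirected file\n"))
    print(demo.split())
```

Passing the stream explicitly is a design choice for testability; in the real program the default sys.stdin is what redirection fills.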

Screenshot of the results:

What I am proud of, breakthroughs, and difficulties:

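Practicing redirection: '<' redirects a file into standard input, and the program can then read that content with input() or, as in the code above, with sys.stdin.read().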
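One detail worth checking: input() returns only a single line (without its trailing newline), whereas sys.stdin.read() returns everything up to end of input, which is why it suits a whole redirected file. The sketch below demonstrates this by temporarily replacing sys.stdin with an io.StringIO object, a common testing trick; read_both_ways and the sample text are made up for illustration.

```python
import io
import sys


def read_both_ways(sample):
    """Feed sample through sys.stdin and compare input() with sys.stdin.read()."""
    original = sys.stdin
    try:
        sys.stdin = io.StringIO(sample)
        first_line = input()            # reads one line and strips the trailing newline
        sys.stdin = io.StringIO(sample)
        everything = sys.stdin.read()   # reads until the end of the input
    finally:
        sys.stdin = original            # always restore the real standard input
    return first_line, everything
```

With the sample "line one\nline two\n", the first value is "line one" and the second is the full two-line text.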

3. PSP table

Summary:

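When I first saw the assignment, I was overwhelmed for a long time because I had no idea where to start. Wanting to learn and try Python, I decided to use it for a project for the first time, and throughout the process I was basically learning as I went. This project not only helped me understand myself better in some ways but also improved my skills a great deal. Amid constant anxiety and frustration, I managed to take a small step into Python. I want to thank the many people who helped me finish this project: my teacher, my roommates, my undergraduate classmates, and others. While working on it, I learned and applied a series of Python functions such as isalnum(), replace(), and os.listdir(), and came to understand related concepts such as redirection and command-line arguments.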

 


Original post: www.cnblogs.com/hanhao970620/p/11537479.html