Reading a text file with Python and counting word frequencies

The 360 browser crashed just as I was writing this post, but I got the content back thanks to the auto-save feature of cnblogs!

------------ restored content begins ------------

I have been learning Python recently and wrote a small program that reads a text document from a given path, counts how many times each word appears in it, and prints the result.

import os
import random as r


# Create the folder (if needed) and write the content to a file in it
def createFile(fileName, content, filePath='D:/PythonExercise/'):
    # exist_ok=True keeps this from failing when the folder already exists
    os.makedirs(filePath, exist_ok=True)
    fullPath = os.path.join(filePath, fileName)
    # The with statement closes the file, even if write() fails
    with open(fullPath, 'w') as f:
        f.write(content)


# Write the following sentence to the file
createFile('test.txt', "Life is short, so let's just enjoy Python!")


# Read a file and count how often each word occurs
def getWordsFrequency(fullFilePath='D:/PythonExercise/test.txt'):
    with open(fullFilePath, 'r') as f:
        # read() returns the whole file (readline() would stop after the first line).
        # split() without arguments splits on whitespace, which suits English text.
        tmp = f.read().split()
        # Chinese has no spaces between words, so the text comes back as one str;
        # use list() instead to turn it into a list of single characters:
        # tmp = list(f.read())
    print(tmp)
    # Punctuation set
    punctuation = '~@#$%^&*()_+-[]{};:!.?,/"'
    # Splitting on whitespace alone leaves punctuation glued to words, such as
    # "if," or "not!". Replace every punctuation mark with a space and split again.
    # A new list is built instead of calling remove()/extend() on tmp inside the
    # loop: changing a list while iterating over it skips elements, which lets
    # tokens such as 'right?' or a stray '' slip through.
    words = []
    for i in tmp:
        for p in punctuation:
            i = i.replace(p, ' ')
        # split() drops the empty strings, and lower() normalizes the case
        words.extend(i.lower().split())
    tmp = words
    # print('tmp after lowercasing', tmp)
    # Deduplicate the word list and convert it to a tuple for later use
    keys = tuple(set(tmp))
    print(keys)
    # Create a list as long as keys with every count initialized to 0
    freq = [0] * len(keys)
    # Count each word of keys in tmp; freq and keys share the same indexes,
    # which makes it easy to combine them into a dictionary
    for index, word in enumerate(keys):
        freq[index] = tmp.count(word)
    # print(freq)
    # Create a dictionary whose keys are the words; every value starts as None
    freqDict = dict.fromkeys(keys)
    # print(freqDict)
    # Assign each count in freq to the matching key
    for index, word in enumerate(keys):
        freqDict[word] = freq[index]
    print(freqDict)
    return freqDict


# Running the function prints the word frequencies as a dictionary
getWordsFrequency()

# Print 10 words picked at random from the words read above
wordSet = list(getWordsFrequency().keys())
# print(wordSet)
# sample() picks distinct elements, so the random selection has no duplicates;
# min() keeps it from raising ValueError when there are fewer than 10 words
randomWords = r.sample(wordSet, min(10, len(wordSet)))
# The following three lines also pick 10 words, but values may repeat
# randomWords = []
# for i in range(10):
#     randomWords.append(r.choice(wordSet))
print(randomWords)
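
The counting step above can also be written with the standard library's collections.Counter. The following is only a sketch of that alternative, not the method used in this post: the function name count_words, the sample sentence, and the regular expression (which treats runs of ASCII letters, digits and straight apostrophes as words) are assumptions made for illustration.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # Assumption: a "word" is a run of ASCII letters, digits or straight apostrophes.
    # Everything else (spaces, punctuation) acts as a separator.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


# Hypothetical sample text, taken from the Zen of Python
sample = "Simple is better than complex. Complex is better than complicated."
counts = count_words(sample)

# most_common(n) lists the n most frequent words with their counts
print(counts.most_common(3))
```

A Counter is a dict subclass, so counts["is"] works like the dictionary returned by getWordsFrequency. Note that this pattern does not match the curly apostrophe, so a word such as Earth’s would be split in two.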

Program output

(1) I picked a passage from a random trending news story on the international version of bing.com and saved it to test.txt. The results are as follows:

{'orbit': 1, 'hanging': 2, 'another': 1, 'pretty': 2, 'planets': 2, 'planet': 2, 'of': 2, 'system': 2, 'a': 4, 'rings': 1, 'two': 1, 'there’s': 1, 'life': 1, 'claim': 1, 'features': 1, 'moons': 3, 'both': 1, 'conditions': 1, 'means': 1, 'survey': 1, 'moon': 3, 'chance': 1, 'possible': 1, 'with': 1, 'our': 2, 'body': 1, 'have': 3, 'cnet': 1, 'is': 1, 'uranus': 1, 'red': 1, 'jupiter': 1, 'could': 2, 'earth’s': 2, 'reports': 1, 'several': 1, 'main': 1, 'be': 1, 'are': 1, 'which': 2, 'that’s': 1, 'fame': 1, 'sky': 1, 'earth': 2, 'gravity': 1, 'while': 1, 'place': 1, 'being': 1, 'call': 1, 'spot': 1, 'famous': 2, 'eyes': 1, 'it’s': 1, 'an': 1, 'for': 4, 'that': 3, 'right?': 1, 'solar': 2, 'distinctive': 1, 'its': 5, 'no': 1, 'orbiting': 1, 'has': 4, 'special': 1, 'mercury': 1, 'astronomers': 1, 'mini': 1, 'wondrous': 1, 'human': 1, 'now': 1, 'catalina': 1, 'asteroid': 1, 'single': 1, 'rock': 1, 'this': 1, 'cool': 1, 'and': 3, 'would': 1, 'all': 1, 'on': 1, 'Tuscon': 1, 'out': 2, 'at': 1, 'to': 2, 'az': 1, 'but,': 1, 'saturn': 1, 'use': 1, 'in': 5, 'it': 2, 'many': 1, 'the': 3, 'make': 1, 'home': 1, 'like': 1, 'perfect': 1, 'only': 1}

['to', 'system', 'reports', 'it', 'would', 'no', 'are', 'pretty', 'all', 'make']
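
A few keys in the dictionary above, such as 'right?' and 'but,', still carry punctuation, and 'Tuscon' was not lowercased. This is the typical symptom of removing items from a list while looping over it: every removal shifts the remaining items one position to the left, so the loop skips the element that follows. A minimal sketch of the effect, with made-up tokens and variable names:

```python
tokens = ["cool,", "right?", "home"]

# Buggy pattern: removing from the list being iterated shifts later items left,
# so the loop's internal index jumps over the element right after each removal.
buggy = list(tokens)
for tok in buggy:
    if tok.endswith((",", "?")):
        buggy.remove(tok)
        buggy.append(tok.rstrip(",?"))

# Safer pattern: build a new list and leave the input untouched.
fixed = [tok.rstrip(",?") for tok in tokens]

print(buggy)  # ['right?', 'home', 'cool']  ('right?' was skipped)
print(fixed)  # ['cool', 'right', 'home']
```

Building a new list, or looping over a copy, avoids the skipped elements.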

(2) I picked a story from Sina News and saved its second paragraph to test.txt. The passage is Chinese, so the counts are per character. The results are as follows:

{ 'Heard': 1, 'on': 1, 'home': 2, 'table': 1, 'tube': 1, 'green': 4 'Service': 1, 'will': 1 ' Phytophthora ': 1,' Department ': 1' was': 3 'non': 1, 'a': 1, 'hospital': 1, 'with': 1, 'France': 1, 'day' : 1, 'administered': 2 'grid': 1, 'work': 1, 'most': 1, 'control': 2 'take': 1, ',': 3, 'measures': 1 'cloth': 1, 'strict': 2, 'new': 1, 'Paul': 1 'Li': 1, 'as': 1'. ': 2,' call ': 1' to ': 1,' play ': 1,' wild ': 3' bureau ': 2,' and ': 2' movable ': 3' guard ': 1, 'WANG': 1 'shown': 1, 'I'm': 2 'real': 1 'sub': 1, 'field': 1, '2': 1, 'trade': 1, 'status': 1, 'Shao': 1, 'length': 1, 'dimension': 1, 'love': 2,
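
Chinese text has no spaces between words, which is why the listing offers a list(...) variant that splits the text into single characters. A rough sketch of per-character counting follows. The function name count_characters and the sample string are hypothetical, and using str.isalnum() to drop punctuation is an assumption, not what the program in this post does.

```python
from collections import Counter


def count_characters(text):
    """Count every letter, digit or CJK character; skip spaces and punctuation."""
    # isalnum() is True for Chinese characters and False for marks such as '，'
    return Counter(ch for ch in text if ch.isalnum())


# Hypothetical sample string
sample = "好好学习，天天向上"
counts = count_characters(sample)
print(counts)
```

Counting single characters is only a rough stand-in for word frequency in Chinese, because most words are longer than one character.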

(3) I saved the text of The Zen of Python to test.txt. The results are as follows:

{"aren't": 1, 'implicit': 1, 'right': 1, 'practicality': 1, 'nested': 1, 'although': 3, 'beautiful': 1, 'break': 1, 'errors': 1, 'of': 3, 'refuse': 1, 'a': 2, 'dense': 1, 'more': 1, 'easy': 1, "you're": 1, 'sparse': 1, 'peters': 1, 'do': 2, 'may': 2, 'explicit': 1, 'implementation': 2, 'often': 1, 'great': 1, 'pass': 1, 'those': 1, 'purity': 1, 'is': 10, 'ambiguity': 1, 'face': 1, 'be': 3, 'by': 1, 'are': 1, 'silently': 1, 'cases': 1, 'bad': 1, 'idea': 3, 'if': 2, "it's": 1, 'not': 1, 'counts': 1, 'zen': 1, 'readability': 1, 'that': 1, 'honking': 1, 'temptation': 1, 'than': 8, 'ugly': 1, 'Dutch': 1, "let's": 1, 'guess': 1, 'namespaces': 1, 'special': 2, 'better': 8, 'now': 1, 'good': 1, 'complicated': 1, 'now.': 1, 'simple': 1, 'complex': 2, 'there': 1, 'python': 1, 'first': 1, 'way': 2, 'and': 1, 'beats': 1, 'hard': 1, 'explicitly': 1, 'silenced': 1, 'at': 1, 'to': 5, 'obvious': 2, 'never': 3, 'tim': 1, 'in': 1, 'one': 3, 'explain': 2, 'unless': 2, 'enough': 1, 'preferably': 1, 'should': 2, 'it': 2, 'the': 6, 'flat': 1, 'rules': 1, 'only': 1}
<class 'list'>
['of', 'honking', 'preferably', 'by', "you're", 'complicated', 'sparse', 'and', 'pass', 'enough']
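
The random list above comes from random.sample, which never returns the same element twice, while the commented-out loop with random.choice can. A small sketch of the difference; the word list here is made up for illustration:

```python
import random

words = ["of", "honking", "preferably", "by", "sparse"]

# sample() draws without replacement, so the result never repeats an element.
# k must not exceed the population size, otherwise ValueError is raised.
k = min(10, len(words))
unique_pick = random.sample(words, k)

# choice() draws independently each time, so duplicates are possible.
repeated_pick = [random.choice(words) for _ in range(10)]

print(unique_pick)
print(repeated_pick)
```

With only five words available, sample() can return at most five of them, whereas the choice() loop always returns ten and has to repeat some.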


------------ restored content ends ------------

 

In case the links stop working, the three passages used above are pasted below as 1, 2 and 3.

1.

Several planets in our solar system are famous for distinctive features. Saturn has its wondrous rings and Jupiter has its famous red spot, while Uranus has its many moons and planets like Mercury have no moons at all. Earth’s main claim to fame is it being the only planet in the solar system with the perfect conditions for human life and a single moon, both of which make it a pretty special place for use to call home. But, there’s a chance that Earth could have another moon hanging out in its orbit for now. CNET reports that Catalina Sky Survey astronomers in Tuscon, AZ has its eyes on an asteroid hanging out in Earth’s gravity. It’s possible that this body of rock could be a mini-moon orbiting our planet, which means Earth would have two moons. That’s pretty cool, right?

2.

On the 27th, the State Council's Joint Prevention and Control Mechanism held a press conference on the work of banning and resolutely cracking down on illegal wildlife markets and trade. Wang Weisheng, deputy director of the Wildlife Protection Department of the National Forestry and Grassland Administration, said that since the outbreak, the administration has implemented a series of the strictest wildlife control measures.

3.

The Zen of Python, by Tim Peters  Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Now is better than never. Although never is often better than *right* now. If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea. Namespaces are one honking great idea -- let's do more of those!


Origin www.cnblogs.com/flyingtester/p/12375408.html