[Advanced Python] Counting Words with Python

Problem Description:

Implement a Python function count_words() that takes a string s and a number n and returns the n most frequently occurring words in s. The return value is a list of tuples containing the n most frequent words and their counts, i.e. [(<word1>, <count1>), (<word2>, <count2>), ...], in descending order of count.

You can assume that all input is lowercase and contains no punctuation or other characters (only letters and single spaces). Words with the same count are listed in alphabetical order.

For example:

print(count_words("betty bought a bit of butter but the butter was bitter", 3))

output:

[('butter', 2), ('a', 1), ('betty', 1)]

Problem-solving ideas:

1. Split the string s on whitespace to get the list of words, split_s, e.g.: ['betty', 'bought', 'a', 'bit', 'of', 'butter', 'but', 'the', 'butter', 'was', 'bitter']

2. Build map_list by converting split_s into a list of (word, 1) tuples, e.g.: [('betty', 1), ('bought', 1), ('a', 1), ('bit', 1), ('of', 1), ('butter', 1), ('but', 1), ('the', 1), ('butter', 1), ('was', 1), ('bitter', 1)]

3. Merge the elements of map_list: whenever two tuples share the same first element (the word), add their second elements (the counts) together.

Note: a defaultdict is used for this step. The resulting data looks like this: {'betty': 1, 'bought': 1, 'a': 1, 'bit': 1, 'of': 1, 'butter': 2, 'but': 1, 'the': 1, 'was': 1, 'bitter': 1}

4. Sort alphabetically by key, which gives: [('a', 1), ('betty', 1), ('bit', 1), ('bitter', 1), ('bought', 1), ('but', 1), ('butter', 2), ('of', 1), ('the', 1), ('was', 1)]

5. Sort a second time, by value in descending order. Python's sort is stable, so words with equal counts keep their alphabetical order: [('butter', 2), ('a', 1), ('betty', 1), ('bit', 1), ('bitter', 1), ('bought', 1), ('but', 1), ('of', 1), ('the', 1), ('was', 1)]

6. Use a slice to take the first n entries, i.e. the most frequent words.
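The two sorting passes in steps 4 and 5 can also be folded into a single sort with a composite key. The sketch below is an alternative, not the approach used in the solutions that follow; the helper name sort_by_count_then_word is made up for illustration. It negates the count so that higher counts come first while the word itself breaks ties alphabetically.

```python
def sort_by_count_then_word(counts):
    """Sort (word, count) pairs by descending count, then alphabetically."""
    # Negating the count makes an ascending sort put larger counts first;
    # the word is the second key, so ties fall back to alphabetical order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


counts = {'betty': 1, 'bought': 1, 'a': 1, 'bit': 1, 'of': 1,
          'butter': 2, 'but': 1, 'the': 1, 'was': 1, 'bitter': 1}

# The two-pass version from steps 4 and 5, for comparison.
by_word = sorted(counts.items(), key=lambda pair: pair[0])
two_pass = sorted(by_word, key=lambda pair: pair[1], reverse=True)

one_pass = sort_by_count_then_word(counts)
print(one_pass == two_pass)  # True
print(one_pass[:3])          # [('butter', 2), ('a', 1), ('betty', 1)]
```

Both versions give the same list here. The composite key only works because the counts are numbers and can be negated.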

Summary: without defaultdict, the sorted result is also correct on Python 3, but not on Python 2. A defaultdict has no inherent sorted order, so to get a well-defined list its items must be sorted explicitly.
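To illustrate why the explicit sort matters, here is a small sketch assuming Python 3.7 or later, where dictionaries (including defaultdict) iterate in insertion order. For one of the test inputs that order happens to match the required ranking, which can hide a missing sort; for the other it does not. The helper name tally is hypothetical.

```python
from collections import defaultdict


def tally(s):
    """Count the words in s with a defaultdict (no sorting)."""
    counts = defaultdict(int)
    for word in s.split():
        counts[word] += 1
    return counts


def rank(counts):
    """Sort by descending count, then alphabetically."""
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


cats = tally("cat bat mat cat bat cat")
betty = tally("betty bought a bit of butter but the butter was bitter")

# Insertion order happens to equal the ranked order for this input...
print(list(cats.items()) == rank(cats))    # True
# ...but not for this one, so the sort cannot be skipped.
print(list(betty.items())[:3])  # [('betty', 1), ('bought', 1), ('a', 1)]
print(rank(betty)[:3])          # [('butter', 2), ('a', 1), ('betty', 1)]
```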

You can also try writing your own version without the help of the collections module (which is part of the standard library, not a third-party package).
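As a minimal sketch of such a hand-written version, the function below uses only a plain dict and the built-in sorted(). The name count_words_plain is chosen here for illustration and does not appear in the original solutions.

```python
def count_words_plain(s, n):
    """Return the n most frequent words in s using only built-ins."""
    counts = {}
    for word in s.split():
        # dict.get supplies 0 the first time a word is seen.
        counts[word] = counts.get(word, 0) + 1
    # Descending by count, then alphabetical for equal counts.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


print(count_words_plain("cat bat mat cat bat cat", 3))
# [('cat', 3), ('bat', 2), ('mat', 1)]
print(count_words_plain("betty bought a bit of butter but the butter was bitter", 3))
# [('butter', 2), ('a', 1), ('betty', 1)]
```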

Solution 1 (using defaultdict):

"""Count words."""
from collections import defaultdict


def count_words(s, n):
    """Return the n most frequently occurring words in s."""
    # Step 1: split on whitespace to get the list of words.
    split_s = s.split()
    # Step 2: pair every word with an initial count of 1.
    map_list = [(k, 1) for k in split_s]
    # Step 3: merge the pairs, summing the counts of identical words.
    output = defaultdict(int)
    for word, count in map_list:
        output[word] += count
    # Step 4: sort alphabetically by word.
    top_n = sorted(output.items(), key=lambda pair: pair[0])
    # Step 5: stable sort by count, descending; ties stay alphabetical.
    top_n = sorted(top_n, key=lambda pair: pair[1], reverse=True)
    # Step 6: keep only the first n entries.
    return top_n[:n]


def test_run():
    """Test count_words() with some inputs."""
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 4))


if __name__ == '__main__':
    test_run()

Solution 2 (using Counter):

"""Count words."""
from collections import Counter


def count_words(s, n):
    """Return the n most frequently occurring words in s."""
    # Counter tallies every word in a single pass.
    counts = Counter(s.split())
    # Sort alphabetically by word first.
    top_n = sorted(counts.items(), key=lambda pair: pair[0])
    # Then stable sort by count, descending; ties stay alphabetical.
    top_n = sorted(top_n, key=lambda pair: pair[1], reverse=True)
    return top_n[:n]


def test_run():
    """Test count_words() with some inputs."""
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 4))


if __name__ == '__main__':
    test_run()
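Counter also provides a most_common(n) method. On its own it does not meet the alphabetical tie-breaking rule, because elements with equal counts are returned in the order they were first encountered (Python 3.7 and later). One possible workaround, sketched below under that assumption, is to feed the words to Counter in sorted order so that first-encountered order coincides with alphabetical order. The function name count_words_most_common is made up for this example.

```python
from collections import Counter


def count_words_most_common(s, n):
    """Return the n most frequent words, ties broken alphabetically."""
    # Sorting the words first means ties are first seen in
    # alphabetical order, which most_common() then preserves.
    return Counter(sorted(s.split())).most_common(n)


text = "betty bought a bit of butter but the butter was bitter"

# Without pre-sorting, ties follow the order of appearance in the text.
print(Counter(text.split()).most_common(3))
# [('butter', 2), ('betty', 1), ('bought', 1)]

# With pre-sorting, ties come out alphabetically.
print(count_words_most_common(text, 3))
# [('butter', 2), ('a', 1), ('betty', 1)]
```

This relies on the documented tie ordering of most_common(); the explicit two-pass sort in Solution 2 does not depend on it.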
