Counter() and most_common()

most_common() is a method of the Counter class in the collections module, so before using it we first have to import Counter from collections.

Counter() returns a dictionary-like counter object that maps each element to the number of times it occurs.
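A minimal sketch of that dictionary-like behavior, using an arbitrary sample string chosen only for illustration. Counter is a subclass of dict, so normal key lookup works, and a missing key reads as 0 instead of raising KeyError.

```python
from collections import Counter

# Count the characters of an arbitrary sample string
c = Counter("hello world")

print(c["l"])               # 3: 'l' occurs three times
print(c["z"])               # 0: a missing key counts as zero, no KeyError
print(isinstance(c, dict))  # True: Counter is a dict subclass

# Counts can also be updated incrementally
c.update("lll")
print(c["l"])               # 6
```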

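The most_common(n) method of the Counter class:
It takes one optional argument n, the number of most frequent elements to return; if n is omitted, all elements are returned.
It returns a list of tuples ordered from the highest count to the lowest. Item 0 of each tuple is the counted element and item 1 is its count, e.g. [('c', 3), ('b', 2), ('a', 1)]. When several elements have the same count, they keep the order in which they were first encountered (Python 3.7+); they are not sorted alphabetically.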
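To make these ordering rules concrete, here is a short sketch on sample strings picked for illustration; the tie behavior shown assumes Python 3.7 or later.

```python
from collections import Counter

c = Counter("abracadabra")   # a: 5, b: 2, r: 2, c: 1, d: 1

# With n: only the n most common (element, count) pairs, highest count first
top3 = c.most_common(3)
print(top3)                  # [('a', 5), ('b', 2), ('r', 2)]

# Without n: every element, highest count first
everything = c.most_common()
print(everything)            # [('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]

# Equal counts keep first-seen order, not alphabetical order
print(Counter("cab").most_common())   # [('c', 1), ('a', 1), ('b', 1)]
```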

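Source code of most_common(), adapted to Python 3 (the Python 2.7 source iterated with self.iteritems(), which no longer exists in Python 3; it also relies on the _itemgetter import shown here):

import heapq as _heapq
from operator import itemgetter as _itemgetter

def most_common(self, n=None):
    # No n: sort every (element, count) pair by count, highest first
    if n is None:
        return sorted(self.items(), key=_itemgetter(1), reverse=True)
    # n given: let heapq.nlargest pick the n pairs with the highest counts
    return _heapq.nlargest(n, self.items(), key=_itemgetter(1))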

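As the source shows, when most_common() is called without n, it sorts all (element, count) pairs with sorted(), using the second item of each tuple (the count) as the key, and returns all of them in descending order. When n is given, it calls nlargest() from the heapq module to return the n pairs with the highest counts. (_heapq is simply heapq, renamed by the import at the top.)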
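The same logic works on any mapping of counts, not only on a Counter. A minimal sketch, where my_most_common is a hypothetical name used only for illustration and the word list is made up:

```python
import heapq
from collections import Counter
from operator import itemgetter


def my_most_common(counts, n=None):
    """Hypothetical re-implementation of most_common() for any dict of counts."""
    pairs = counts.items()
    if n is None:
        # Sort every (element, count) pair by its count (index 1), largest first
        return sorted(pairs, key=itemgetter(1), reverse=True)
    # Only the n largest counts are needed, so let heapq.nlargest select them
    return heapq.nlargest(n, pairs, key=itemgetter(1))


words = "the cat and the dog and the bird".split()
counts = Counter(words)

print(my_most_common(counts, 2))                        # [('the', 3), ('and', 2)]
print(my_most_common(counts) == counts.most_common())   # True
print(my_most_common(dict(counts), 2))                  # works on a plain dict too
```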

heapq.nlargest(n, iterable, key=key) is equivalent to the following expression:

sorted(iterable, key=key, reverse=True)[:n]
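A quick check of that equivalence on made-up data: both expressions return the same pairs, including the order of tied counts. According to the heapq documentation, nlargest() performs best when n is small relative to the input; for larger n, sorted() is usually the more efficient choice.

```python
import heapq
from operator import itemgetter

pairs = [("x", 4), ("y", 9), ("z", 1), ("w", 9), ("v", 5)]
n = 3

via_heap = heapq.nlargest(n, pairs, key=itemgetter(1))
via_sort = sorted(pairs, key=itemgetter(1), reverse=True)[:n]

print(via_heap)               # [('y', 9), ('w', 9), ('v', 5)]
print(via_heap == via_sort)   # True
```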

Example usage of most_common():

>>> from collections import Counter
>>> a=[1,2,4,5,6,4,6,8,9,1,1,2,5,9]
>>> b=Counter(a)
>>> print(b)
Counter({1: 3, 2: 2, 4: 2, 5: 2, 6: 2, 9: 2, 8: 1})
>>> type(b)
<class 'collections.Counter'>
>>> top5 = b.most_common(5)   # top 5 only: 9 also has a count of 2, but only 5 pairs are returned
>>> top5
[(1, 3), (2, 2), (4, 2), (5, 2), (6, 2)]

Note that most_common() returns a list of (element, count) tuples, not a dictionary.
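Because the result is a list, it is indexed by position; when lookups by element are needed, the pairs can be passed to dict(). A small sketch reusing the list from the example above:

```python
from collections import Counter

b = Counter([1, 2, 4, 5, 6, 4, 6, 8, 9, 1, 1, 2, 5, 9])
top = b.most_common(2)

print(type(top))    # <class 'list'>
print(top[0])       # (1, 3): indexed by position, not by key

# Wrap the pairs in dict() when key-based lookup is needed
top_dict = dict(top)
print(top_dict[2])  # 2
```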

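The drawback of most_common(n) is that it returns exactly n results: it does not check whether the (n+1)-th and later elements have the same count as the n-th, and simply cuts the list off after n entries.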

Solution:

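from itertools import takewhile

# Return the n most frequent words as a list; if the (n+1)-th and later words
# have the same count as the n-th, return them as well (tied entries are kept)
def get_count(dct, n):
    if n <= 0:
        return []
    data = dct.most_common()      # all (word, count) pairs, highest count first
    if len(data) <= n:
        return data
    val = data[n - 1][1]          # count of the n-th entry
    # takewhile() yields items while the predicate is true and stops at the first
    # pair whose count drops below val, so entries tied with the n-th are included
    return list(takewhile(lambda x: x[1] >= val, data))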
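An alternative sketch with the same goal as get_count: most_common_with_ties is a hypothetical helper that calls most_common(n) (heap-based) only to find the n-th count, then keeps every element whose count reaches it. Only the kept entries are sorted, rather than the full list of counts.

```python
from collections import Counter


def most_common_with_ties(counter, n):
    """Hypothetical helper: top n entries plus any entries tied with the n-th count."""
    if n <= 0:
        return []
    top_n = counter.most_common(n)          # heap-based selection, no full sort
    if len(top_n) < n:                      # fewer than n distinct elements
        return top_n
    cutoff = top_n[-1][1]                   # count of the n-th entry
    # Keep every element whose count reaches the cutoff; sorted() is stable,
    # so equal counts stay in first-seen order, as in most_common()
    kept = [pair for pair in counter.items() if pair[1] >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


b = Counter([1, 2, 4, 5, 6, 4, 6, 8, 9, 1, 1, 2, 5, 9])
print(b.most_common(5))             # [(1, 3), (2, 2), (4, 2), (5, 2), (6, 2)]
print(most_common_with_ties(b, 5))  # [(1, 3), (2, 2), (4, 2), (5, 2), (6, 2), (9, 2)]
```

On the list from the earlier example, the tie-inclusive version also returns (9, 2), which most_common(5) drops.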
