Common NumPy utility functions: np.bincount / np.average


<a href="http://blog.csdn.net/lanchunhui/article/details/50072453" target="_blank">Common NumPy APIs (Part 1)</a>
<a href="http://blog.csdn.net/lanchunhui/article/details/50429205" target="_blank">Common NumPy APIs (Part 2)</a>

When a function offers a random_state keyword parameter, it is there so that results are reproducible (repeatable) from one run to the next.
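As a minimal sketch of what reproducibility means here, the snippet below seeds two independent generators with the same value and checks that they produce identical draws. It uses np.random.RandomState directly; the seed 42 and the variable names are illustrative assumptions, not taken from the original post.

```python
import numpy as np

# Two generators seeded identically (42 is an arbitrary, illustrative seed)
rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)

draws_a = rng_a.rand(5)
draws_b = rng_b.rand(5)

# Same seed, same sequence: the experiment can be repeated exactly
same = np.array_equal(draws_a, draws_b)
print(same)
```

Functions that accept random_state typically use the value in this way, so passing the same value on every run should give the same result.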

1. np.bincount(): counting occurrences

The signature is:

numpy.bincount(x, weights=None, minlength=0)

It is especially handy for computing the distribution of a dataset's label column (y_train), that is, the class distribution:

>>> np.bincount(y_train.astype(np.int32))

>>> np.bincount(np.array([0, 1, 1, 3, 2, 1, 7]))
array([1, 3, 1, 1, 0, 0, 0, 1], dtype=int32)
# how many times each of the values 0 through 7 occurs
# (the dtype of the result is platform-dependent)
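The class-distribution use case can be sketched as follows. The label array is made up for illustration, and turning counts into proportions is an extra step not shown in the original post; minlength is the third parameter from the signature above.

```python
import numpy as np

# Hypothetical label column with three classes (0, 1, 2)
y_train = np.array([0, 2, 1, 1, 0, 1, 1, 2])

counts = np.bincount(y_train)          # occurrences per class
proportions = counts / counts.sum()    # class distribution as fractions

# minlength pads the result so classes absent from the data still get a slot
padded = np.bincount(y_train, minlength=5)

print(counts)
print(proportions)
print(padded)
```

Padding with minlength is useful when different splits of the data must produce count vectors of the same length.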

If weights is specified the input array is weighted by it, i.e. if a value n is found at position i, out[n] += weight[i] instead of out[n] += 1.

>>> w = np.array([0.3, 0.5, 0.2, 0.7, 1., -0.6])  # weights
>>> x = np.array([0, 1, 1, 3, 2, 2])
>>> np.bincount(x, w)
array([ 0.3,  0.7,  0.4,  0.7])
# 0: 0.3
# 1: 0.5 + 0.2
# 2: 1 + (-0.6)
# 3: 0.7
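One common application of the weights argument is a per-group sum, from which a per-group mean follows by dividing by the plain counts. This is a sketch on made-up data; the names group and value are hypothetical.

```python
import numpy as np

# Hypothetical data: a group id and a measurement for each sample
group = np.array([0, 1, 1, 3, 2, 2])
value = np.array([1.0, 2.0, 4.0, 5.0, 3.0, 7.0])

sums = np.bincount(group, weights=value)   # per-group sum of `value`
counts = np.bincount(group)                # per-group sample count
means = sums / counts                      # per-group mean (no empty groups here)

print(sums)
print(means)
```

If some group id never occurs, its count is 0 and the division would need guarding.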

np.bincount() always counts from zero, so the input sequence must not contain negative numbers:

>>> np.bincount([3, 4, 4, 3, 3, 5])
array([0, 0, 0, 3, 2, 1], dtype=int32)
# the entries are, in order: the count of 0,
# the count of 1, the count of 2, and so on
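To illustrate the restriction on negative values, the sketch below shows the error and two possible workarounds. The -1/+1 label coding is an assumed example, and neither workaround comes from the original post.

```python
import numpy as np

labels = np.array([-1, 1, 1, -1, 1])   # e.g. binary labels coded as -1 / +1

try:
    np.bincount(labels)
    raised = False
except ValueError:
    # bincount rejects negative inputs
    raised = True

# Option 1: shift the values so that the smallest one maps to 0
shifted_counts = np.bincount(labels - labels.min())

# Option 2: let np.unique report the distinct values and their counts
values, counts = np.unique(labels, return_counts=True)

print(raised)
print(shifted_counts)
print(values, counts)
```

With the shift, index i of the result corresponds to the original value i + labels.min(); np.unique avoids that bookkeeping and also skips values that never occur.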

2. np.average()

np.average(X, axis=0, weights=w) == w.dot(X)

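The left-hand side is a weighted average. np.average divides by sum(w), so the two sides are equal only when sum(w) == 1. Put differently, the left-hand side carries the meaning (a weighted average), while the inner product on the right is the operation that computes it.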

import numpy as np

X = np.array([[.9, .1],
              [.8, .2],
              [.4, .6]])
w = np.array([.2, .2, .6])
print(w.dot(X))
print(np.average(X, axis=0, weights=w))
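To see why the condition sum(w) == 1 matters, here is a small sketch that reuses the same X with weights that keep the same ratios but sum to 5. The unnormalized weight vector is an assumption made for illustration.

```python
import numpy as np

X = np.array([[.9, .1],
              [.8, .2],
              [.4, .6]])
w = np.array([1., 1., 3.])   # same ratios as [.2, .2, .6], but sum(w) == 5

avg = np.average(X, axis=0, weights=w)   # divides by sum(w) internally
dot = w.dot(X)                           # raw weighted sum, no normalization

print(avg)
print(dot)
print(np.allclose(avg, dot / w.sum()))
```

The average is unchanged by rescaling the weights, whereas the dot product scales with them, so the two agree only after dividing by w.sum().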

In some cases **only np.average() works** and a plain matrix multiplication does not. For example:

P = np.asarray([c.predict_proba(X) for c in clfs])
# P is now a three-dimensional array of shape
# (# of clfs) * (# of samples) * (# of classes)
np.average(P, axis=0, weights=w)
# the result has shape (# of samples) * (# of classes)
# and every row still sums to 1
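The three-dimensional case can be tried out without any trained classifiers. In the sketch below, P is filled with made-up probabilities that stand in for the predict_proba outputs. The np.tensordot comparison is an addition for illustration, not something the original post uses.

```python
import numpy as np

# Made-up probabilities standing in for [c.predict_proba(X) for c in clfs]:
# 3 classifiers, 4 samples, 2 classes; each row sums to 1
p_class0 = np.array([[.9, .8, .4, .3],
                     [.7, .6, .5, .2],
                     [.6, .9, .1, .4]])
P = np.stack([p_class0, 1 - p_class0], axis=-1)   # shape (3, 4, 2)
w = np.array([.2, .2, .6])

avg = np.average(P, axis=0, weights=w)            # shape (4, 2)

try:
    w.dot(P)              # contracts w with the second-to-last axis of P (length 4)
    dot_failed = False
except ValueError:
    dot_failed = True     # shapes (3,) and (3, 4, 2) are not aligned

# tensordot names the contracted axis explicitly; it matches np.average here
# only because these weights already sum to 1
td = np.tensordot(w, P, axes=([0], [0]))

print(avg.shape, dot_failed)
print(np.allclose(avg.sum(axis=1), 1.0), np.allclose(avg, td))
```

Note that if the number of samples happened to equal the number of classifiers, w.dot(P) would run without error but contract over the wrong axis, which is another reason to prefer np.average with an explicit axis.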

There are also cases where only np.average can be used and dot (matrix multiplication) cannot:

def predict_proba(self, X):
    probas = np.asarray([clf.predict_proba(X) for clf in self.classifiers_])
    # return self.weights.dot(probas)
    # self.weights may never have been assigned, and None
    # certainly does not support dot
    #
    # np.average, on the other hand, simply computes the ordinary
    # mean when its weights argument is None
    return np.average(probas, axis=0, weights=self.weights)
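The fallback behaviour for weights=None can be checked on a small stand-alone array. The numbers below are made up and only stand in for the stacked predict_proba outputs.

```python
import numpy as np

# Made-up stacked probabilities: 2 classifiers, 2 samples, 2 classes
probas = np.array([[[.9, .1], [.4, .6]],
                   [[.7, .3], [.2, .8]]])

# weights=None: plain mean over the classifier axis
unweighted = np.average(probas, axis=0, weights=None)

# explicit weights: weighted mean over the same axis
weighted = np.average(probas, axis=0, weights=[.25, .75])

print(unweighted)
print(weighted)
```

Because one call covers both situations, the method does not need a separate branch for the case where no weights were supplied.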
