Python extension library: pandas

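pd.qcut(x, q, retbins=False)

Bins the values in x into q quantile-based bins, so that each bin holds roughly the same number of observations (equal-frequency binning). q is either the number of quantiles or an array of quantile points. retbins controls whether the bin edges are returned as well: with retbins=True the call returns a tuple (result, bins), where bins is an ndarray of the cut points. The result is a Categorical (or a Series of category dtype when x is a Series). It gives the interval that each value of x falls into, followed by the ordered categories, i.e. the interval of each bin. Calling .value_counts() on the result shows the count for each bin, and .describe() on a Categorical shows the counts and freqs of each interval. Note that if x is a pd.Series, the result is also a Series, so .describe() gives the Series summary (count, unique, top, freq) and does not list the intervals. To get the per-interval table, pass pd.Series.values instead.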
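
The retbins behaviour described above can be sketched as follows. This is a minimal example rather than a definitive recipe; the variable names data, binned and edges are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

data = [1, 1, 2, 3, 4, 4, 5, 6, 7]

# With retbins=True, qcut returns a tuple: (binned values, bin edges)
binned, edges = pd.qcut(data, 3, retbins=True)

# For q bins there are q + 1 edges: the minimum, the inner quantiles, the maximum
print(len(edges))
print(np.round(edges, 3))

# Count of observations per bin, in bin order
print(binned.value_counts().tolist())
```

The edges can be reused, for example by passing them to pd.cut to apply the same bins to another data set.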

>>> import pandas as pd
>>> a = pd.qcut([1, 1, 2, 3, 4, 4, 5, 6, 7], 3)
>>> a
[(0.999, 2.667], (0.999, 2.667], (0.999, 2.667], (2.667, 4.333], (2.667, 4.333], (2.667, 4.333], (4.333, 7.0], (4.333, 7.0], (4.333, 7.0]]
Categories (3, interval[float64]): [(0.999, 2.667] < (2.667, 4.333] < (4.333, 7.0]]
>>> a.value_counts()
(0.999, 2.667]    3
(2.667, 4.333]    3
(4.333, 7.0]      3
dtype: int64
>>> a.describe()
                counts     freqs
categories                      
(0.999, 2.667]       3  0.333333
(2.667, 4.333]       3  0.333333
(4.333, 7.0]         3  0.333333
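
The difference between passing a Series and passing its underlying array can be checked with a short sketch like the one below. The names s, as_series and as_categorical are illustrative, and the exact printed layout may vary between pandas versions.

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3, 4, 4, 5, 6, 7])

# Series in, Series out: the result is a Series of category dtype
as_series = pd.qcut(s, 3)
print(type(as_series).__name__)
# Series-style summary (count, unique, top, freq), no per-interval rows
print(as_series.describe())

# Array in, Categorical out
as_categorical = pd.qcut(s.values, 3)
print(type(as_categorical).__name__)
# One row per interval
print(as_categorical.value_counts())

# Per-interval counts and relative frequencies are also available
# from the Series result, without converting to an array first
print(as_series.value_counts(sort=False))
print(as_series.value_counts(normalize=True, sort=False))
```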

Reposted from blog.csdn.net/yuanjackson/article/details/84026889