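pandas grouped statistics: how to use groupby, melt, pivot_table, and crosstab

groupby: split rows into groups

melt: reshape a wide table into a long table

pivot_table: reshape a long table into a wide table (a pivot table)

crosstab: cross tabulation / contingency table, mainly used for frequency counts per group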
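
To make the wide/long distinction concrete, here is a minimal sketch of a round trip between the two layouts. The table, its column names (`id`, `height`, `weight`), and the values are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical wide table: one row per id, one column per measurement.
wide = pd.DataFrame({'id': [1, 2], 'height': [170, 180], 'weight': [60, 75]})

# Wide -> long: every (id, measurement) pair becomes its own row.
long_df = pd.melt(wide, id_vars='id', var_name='measure', value_name='value')

# Long -> wide: the measurement names move back into columns.
# aggfunc='first' is enough here because each (id, measure) pair is unique.
back = long_df.pivot_table(index='id', columns='measure', values='value', aggfunc='first')

print(long_df)
print(back)
```

The long table has four rows (two ids times two measurements), and pivoting it recovers one row per id again.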

import numpy as np
import pandas as pd

df = pd.DataFrame({'key1':['a','a','b','b','a'],'key2':['one','two','one','two','one'],
        'data1':np.random.randn(5),'data2':np.random.randn(5)})
#[Out]#       data1     data2 key1 key2
#[Out]# 0  0.439801  1.582861    a  one
#[Out]# 1 -1.388267 -0.603653    a  two
#[Out]# 2 -0.514400 -0.826736    b  one
#[Out]# 3 -1.487224 -0.192404    b  two
#[Out]# 4  2.169966  0.074715    a  one

# groupby usage
group1 = df.groupby('key1')
group2 = df.groupby(['key1','key2'])
[x for x in group1]
group1.size()
group1.sum()
group2.count()
group1[['data1','data2']].agg(['mean','sum'])  # apply every aggregation to each selected column
group2[['data1','data2']].apply(lambda x: pd.Series([x.shape[0], x['data1'].mean(), x['data2'].sum()],
                           index=['counts', 'data1_mean', 'data2_sum']))  # apply different functions to specific columns
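
As an alternative to returning a Series from `apply`, the same per-column summary can probably be written with named aggregation (available in pandas 0.25 and later). This is only a sketch: the fixed numbers below replace the random data so the result is reproducible, and the output names `counts`, `data1_mean`, and `data2_sum` are illustrative.

```python
import pandas as pd

# Same keys as the example above, with fixed values instead of random ones.
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'data2': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Each keyword is an output column: (source column, aggregation).
summary = df.groupby(['key1', 'key2']).agg(
    counts=('data1', 'size'),
    data1_mean=('data1', 'mean'),
    data2_sum=('data2', 'sum'),
)
print(summary)
```

Named aggregation avoids calling a Python lambda once per group, which usually makes it the faster choice on large tables.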

# melt usage
pd.melt(df, id_vars=['key1', 'key2'], value_vars=['data1', 'data2'], var_name='var', value_name='value')  # col_level picks a level when the columns are a MultiIndex
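
A small sketch of what the melt call produces, assuming fixed values in place of the random data. It also shows that leaving out `value_vars` melts every column not listed in `id_vars`, so both calls give the same result for this table.

```python
import pandas as pd

df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'data2': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Explicit list of columns to unpivot.
explicit = pd.melt(df, id_vars=['key1', 'key2'], value_vars=['data1', 'data2'],
                   var_name='var', value_name='value')

# Without value_vars, every non-id column is unpivoted.
implicit = pd.melt(df, id_vars=['key1', 'key2'], var_name='var', value_name='value')

# 5 rows x 2 value columns -> 10 rows; the data1 rows come first, then the data2 rows.
print(explicit)
```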

# crosstab usage
pd.crosstab(df.key1, df.key2, margins=True)
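
Since crosstab is mostly a frequency counter, the following sketch compares it with the equivalent groupby expression and shows the `normalize` option for row shares. Only the two key columns are needed, and they are copied from the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
})

# Frequency of each (key1, key2) combination.
counts = pd.crosstab(df['key1'], df['key2'])

# The same counts via groupby: count rows per pair, then spread key2 into columns.
same = df.groupby(['key1', 'key2']).size().unstack(fill_value=0)

# margins=True appends an 'All' row and column holding the totals.
with_totals = pd.crosstab(df['key1'], df['key2'], margins=True)

# normalize='index' turns each row into proportions that sum to 1.
shares = pd.crosstab(df['key1'], df['key2'], normalize='index')

print(counts)
print(with_totals)
print(shares)
```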

# pivot_table usage
# pd.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, 
#         margins=False, dropna=True, margins_name='All')  # aggfunc also accepts a dict, e.g. {'d':np.sum, 'e':np.max}
pd.pivot_table(df, index='key1', columns='key2')
df.pivot_table(['data1'], index='key1', columns='key2', fill_value=0)
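
The signature comment above mentions a dict for `aggfunc` and the `margins` flag. Here is a minimal sketch of both, using fixed values instead of random ones so the numbers can be checked by hand; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'data2': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# A dict applies a different aggregation to each value column.
# The result has two column levels: the value column, then key2.
per_column = pd.pivot_table(df, index='key1', columns='key2',
                            values=['data1', 'data2'],
                            aggfunc={'data1': 'mean', 'data2': 'sum'},
                            fill_value=0)

# margins=True adds an 'All' row and column computed with the same aggfunc.
totals = pd.pivot_table(df, values='data1', index='key1', columns='key2',
                        aggfunc='sum', margins=True)

print(per_column)
print(totals)
```

String names such as `'mean'` and `'sum'` are used here in place of `np.sum` and `np.max`; both forms are accepted by `aggfunc`.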



Reposted from www.cnblogs.com/iupoint/p/11050887.html