[Python] pandas data processing: aggregation and queries

http://shzhangji.com/cnblogs/2017/07/23/learn-pandas-from-a-sql-perspective/
First, build a sample DataFrame:

import pandas as pd
import numpy as np
a = [1,2,3,4,5]
b = [11,2,13,4,15]
c = [1,11,3,4,25]
d = np.array([a,b,c])
df = pd.DataFrame(d,columns=['a','b','c','d','e'])
print(df)

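Understanding axis

The key to axis is direction, not rows versus columns.
With axis=1 the operation runs horizontally, from left to right: a mean is taken across each row, a concatenation places frames side by side, and a drop changes the frame horizontally, which shows up as fewer columns.
In short, think of axis=1 as horizontal and axis=0 as vertical, rather than as columns and rows.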
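To make the direction idea concrete, here is a minimal sketch. The frame `demo` and its column names are made up for this illustration only.

```python
import pandas as pd

# Hypothetical 2x3 frame used only for this illustration.
demo = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]})

# axis=0 runs vertically, down the rows: one result per column.
col_means = demo.mean(axis=0)

# axis=1 runs horizontally, across the columns: one result per row.
row_means = demo.mean(axis=1)

# drop with axis=1 changes the frame horizontally: a column disappears.
narrower = demo.drop("y", axis=1)

# concat with axis=1 places the frames side by side.
wider = pd.concat([demo, demo], axis=1)

print(col_means, row_means, narrower.shape, wider.shape, sep="\n")
```

The same axis value therefore gives a per-row result for mean but removes a column for drop, which is why thinking in directions is less confusing than thinking in rows and columns.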

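Combining two pandas objects

pandas.concat(objs, axis=0, join='outer', ignore_index=False)
objs: the objects to combine
axis: the direction of concatenation; the default 0 stacks the objects vertically (rows are appended), 1 places them side by side (columns are appended)
ignore_index: whether to discard the existing index and renumber from 0
The join_axes parameter shown in older signatures was removed in pandas 1.0; use join, or reindex the result, instead.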
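The parameters above can be sketched as follows. The frames `top` and `bottom` are hypothetical and exist only to show how join and ignore_index change the result.

```python
import pandas as pd

# Two hypothetical frames that share only column "b".
top = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
bottom = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# Default join='outer': union of the columns, missing cells become NaN.
outer = pd.concat([top, bottom], axis=0)

# join='inner': keep only the columns present in every frame.
inner = pd.concat([top, bottom], axis=0, join="inner")

# ignore_index=True: drop the old row labels and renumber from 0.
renumbered = pd.concat([top, bottom], axis=0, ignore_index=True)

print(outer, inner, renumbered, sep="\n\n")
```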

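append

Stacks two pandas objects that have the same columns. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pd.concat covers the same case.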
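As a rough sketch of the same operation on current pandas versions, the frames `first` and `second` below are hypothetical and share the same columns.

```python
import pandas as pd

# Two hypothetical frames with identical columns.
first = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
second = pd.DataFrame({"a": [5], "b": [6]})

# Plays the role of the older first.append(second, ignore_index=True).
stacked = pd.concat([first, second], ignore_index=True)
print(stacked)
```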

concat

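'''Concatenate by rows (axis=0), then by columns (axis=1)'''
df2 = df[['c','d']]
# ignore_index is left at its default (False) so the original row and column labels are kept
df3 = pd.concat([df,df2], axis = 0, sort=False)
print(df3)
df3 = pd.concat([df,df2], axis = 1, sort=False)
print(df3)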

Result:

axis = 0
      a     b   c  d     e
0   1.0   2.0   3  4   5.0
1  11.0   2.0  13  4  15.0
2   1.0  11.0   3  4  25.0
0   NaN   NaN   3  4   NaN
1   NaN   NaN  13  4   NaN
2   NaN   NaN   3  4   NaN
axis = 1
    a   b   c  d   e   c  d
0   1   2   3  4   5   3  4
1  11   2  13  4  15  13  4
2   1  11   3  4  25   3  4

merge

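'''Left join on a key column'''
# df is rebuilt as the 5-row frame of strings that the results below were printed from
a = ["1","2","3","4","5"]
b = ["1","12","13","1","15"]
c = ["1","12","3","1","25"]
d = ["1","5","3","34","23"]
e = ["2","6","35"]
df = pd.DataFrame(np.array([a,b,c,d]).T, columns = ['a','b','c','d'])
df2 = pd.DataFrame(np.array([c[:3], e]).T, columns = ['c','E'])
df_merge = pd.merge(df, df2, on = 'c', how = 'left')
print("merge\n", df_merge)
'''how = 'left' keeps every row of the left frame'''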

Result:

   a   b   c   d    E
0  1   1   1   1    2
1  2  12  12   5    6
2  3  13   3   3   35
3  4   1   1  34    2
4  5  15  25  23  NaN
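To see what "the left side wins" means, a minimal sketch can compare a left join with an inner join. The frames `left` and `right` are hypothetical; indicator=True is a standard pd.merge option that records where each row came from.

```python
import pandas as pd

# Hypothetical frames: the key "25" exists only on the left side.
left = pd.DataFrame({"c": ["1", "12", "25"], "a": ["1", "2", "5"]})
right = pd.DataFrame({"c": ["1", "12"], "E": ["2", "6"]})

# Left join: every left row survives, unmatched rows get NaN in E.
left_join = pd.merge(left, right, on="c", how="left", indicator=True)

# Inner join: only keys found on both sides survive.
inner_join = pd.merge(left, right, on="c", how="inner")

print(left_join, inner_join, sep="\n\n")
```

The _merge column reads left_only for the row whose key has no match, which is the same row that shows NaN in E.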

Filtering

where

# condition_1 is built from df and applied to df_merge; both share the same 0..4 index
df['c'] = df['c'].apply(int)
condition_1 = df['c'] > 5
# the merged column is named 'E'
condition_2 = df_merge['E'].isnull()
condition_3 = df_merge['E'].notnull()
print("where\n", df_merge[condition_1 & condition_2])
print("where\n", df_merge[condition_1 & ~condition_2])
print("where\n", df_merge[condition_1 & condition_3])

Result:

   a   b   c   d    E
4  5  15  25  23  NaN

   a   b   c  d  E
1  2  12  12  5  6

   a   b   c  d  E
1  2  12  12  5  6
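The boolean operators used above can be sketched on a smaller, hypothetical frame. `demo` and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical frame: one row has a missing E value.
demo = pd.DataFrame({"c": [1, 12, 25], "E": ["2", "6", None]})

big = demo["c"] > 5
missing = demo["E"].isnull()

both = demo[big & missing]           # c > 5 AND E is missing
either = demo[big | missing]         # c > 5 OR E is missing
big_present = demo[big & ~missing]   # c > 5 AND E is present

# Written inline, each comparison needs its own parentheses,
# because & binds more tightly than > in Python.
inline = demo[(demo["c"] > 5) & (demo["E"].notnull())]

print(both, either, big_present, sep="\n\n")
```

This also shows why the second and third prints above give the same row: ~isnull() and notnull() are the same mask.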

Aggregation

Official documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html
Aggregation has two parts: the grouping fields and the aggregation function.
---------------------------------------------------------Multiple fields---------------------------------------------------------

# function names as strings avoid the FutureWarning that newer pandas raises for np.sum / np.max
temp = df_merge.groupby(['b', 'c']).agg({
    'a': 'sum',
    'd': 'max'
})
print("group\n", temp)

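Result:

        a   d
b  c         
1  1   14  34
12 12   2   5
13 3    3   3
15 25   5  23

Note that a is 14 for the group (1, 1), not 5: the columns still hold strings at this point, so sum concatenates '1' and '4'. Convert the columns to integers first to get a numeric sum.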
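The difference between summing strings and summing numbers can be sketched like this. The frame `demo` is hypothetical and is forced to object dtype so that it behaves like the string data in this post.

```python
import pandas as pd

# Hypothetical frame of strings, kept as object dtype on purpose.
demo = pd.DataFrame(
    {"b": ["1", "12", "1"], "a": ["1", "2", "4"], "d": ["1", "5", "34"]},
    dtype=object,
)

# On strings, sum concatenates: "1" + "4" gives "14".
as_text = demo.groupby("b").agg({"a": "sum", "d": "max"})

# After converting to integers, sum adds: 1 + 4 gives 5.
numeric = demo.astype({"a": int, "d": int})
as_numbers = numeric.groupby("b").agg({"a": "sum", "d": "max"})

print(as_text, as_numbers, sep="\n\n")
```

Converting right after loading or merging the data is usually the safer habit, since printed strings and printed integers look identical.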

---------------------------------------------------------Single field---------------------------------------------------------

print(df_merge.groupby('b')['a'].agg(['min', 'max']))

Result:

b   min max
1    1   4
12   2   2
13   3   3
15   5   5
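When the output columns should carry their own names, named aggregation is one option. The sketch below assumes a hypothetical integer frame `demo`; the names a_min and a_max are chosen only for illustration.

```python
import pandas as pd

# Hypothetical integer frame with the same column names as above.
demo = pd.DataFrame({"b": [1, 12, 1], "a": [1, 2, 4]})

# Each keyword is an output column: (source column, aggregation function).
summary = demo.groupby("b").agg(
    a_min=("a", "min"),
    a_max=("a", "max"),
)
print(summary)
```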

---------------------------------------------------------Iterating over groups---------------------------------------------------------

grouped = df_merge.groupby('b')
# each iteration yields the group key and the sub-frame for that key
for index_b, value in grouped:
    print("b =", index_b, "\n", value)
# b still holds strings here, so the key is "13" rather than 13
print("direct get\n", grouped.get_group("13"))

Result:

b = 1 
   a  b  c   d  E
0  1  1  1   1  2
3  4  1  1  34  2
b = 12 
   a   b   c  d  E
1  2  12  12  5  6
b = 13 
   a   b  c  d   E
2  3  13  3  3  35
b = 15 
   a   b   c   d    E
4  5  15  25  23  NaN
direct get
   a   b  c  d   E
2  3  13  3  3  35

sort

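There are two kinds of sort. What is sort_index for? It orders rows by their index labels, while sort_values orders them by column values.

1. sort_values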

# first pass: the columns still hold strings, so the order is lexicographic ("3" > "25")
temp_df = df_merge.sort_values(by=['c', 'b'], ascending=False)
print(temp_df)
print(df_merge['b'].dtype)
# second pass: convert to integers so the order is numeric
df_merge['c'] = df_merge['c'].astype(int)
df_merge['b'] = df_merge['b'].astype(int)
temp_df = df_merge.sort_values(by=['c', 'b'], ascending=False)
print(temp_df)
print(df_merge['b'].dtype)

Result:

   a   b   c   d    E
2  3  13   3   3   35
4  5  15  25  23  NaN
1  2  12  12   5    6
0  1   1   1   1    2
3  4   1   1  34    2
object

   a   b   c   d    E
4  5  15  25  23  NaN
1  2  12  12   5    6
2  3  13   3   3   35
0  1   1   1   1    2
3  4   1   1  34    2
int64

2. sort_index

temp_df = df_merge.set_index('b').sort_index()
print(temp_df)
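To show what sort_index does compared with sort_values, here is a minimal sketch on a hypothetical frame `demo`.

```python
import pandas as pd

# Hypothetical frame with unordered values in column b.
demo = pd.DataFrame({"b": [13, 1, 12], "a": [3, 1, 2]})

# sort_index orders rows by their index labels (here: the values of b).
by_label = demo.set_index("b").sort_index()
by_label_desc = demo.set_index("b").sort_index(ascending=False)

# sort_values orders rows by a column and keeps the original labels.
by_value = demo.sort_values(by="a")

print(by_label, by_label_desc, by_value, sep="\n\n")
```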

join
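As a minimal sketch, DataFrame.join combines two frames on their index labels rather than on a column. The frames `left` and `right` and their labels are hypothetical.

```python
import pandas as pd

# Hypothetical frames aligned by index labels, not by a key column.
left = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
right = pd.DataFrame({"E": [10, 20]}, index=["x", "y"])

# join defaults to how='left': every label of the left frame is kept.
joined = left.join(right)

# how='inner' keeps only the labels present in both frames.
inner = left.join(right, how="inner")

print(joined, inner, sep="\n\n")
```

To join on a column instead, either call set_index on that column first or use pd.merge with on.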


rank
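As a minimal sketch, rank assigns each value its position in sorted order, and the method argument decides how ties are handled. The series `scores` is hypothetical.

```python
import pandas as pd

# Hypothetical scores with one tie (two values of 30).
scores = pd.Series([10, 30, 20, 30])

# Default method='average': tied values share the mean of their ranks.
average_rank = scores.rank()

# method='dense': tied values share a rank and no rank is skipped.
dense_rank = scores.rank(method="dense")

# Descending order with method='min': the largest value gets rank 1.
desc_rank = scores.rank(ascending=False, method="min")

print(average_rank, dense_rank, desc_rank, sep="\n\n")
```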

Reposted from blog.csdn.net/ACBattle/article/details/102992966