Wednesday, the hardest day of the week
Right in the middle, and so hot today
It is only May, and already 36 degrees
Words drifting across the sky
A handy moment to study pandas
Grouping a DataFrame with Index Levels and Columns
In other words, grouping by a mix of index levels and columns
Let's start with an example (without one, I would not know how to explain it)
import pandas as pd
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays=arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [3, 1, 4, 5, 9, 2, 6, 1],
                   'B': [1, 1, 1, 1, 2, 2, 3, 3]}, index=index)
print(df)
Where there is an example, there is output to show, right?
              A  B
first second
bar   one     3  1
      two     1  1
baz   one     4  1
      two     5  1
foo   one     9  2
      two     2  2
qux   one     6  3
      two     1  3
Next comes the big move: combining the two
I want to group by the second level of the index and by column B
Code first, the result comes afterwards
grouped = df.groupby([pd.Grouper(level=1),'B']).sum()
print(grouped)
Note that groupby receives two keys here: one is pd.Grouper(level=1),
which refers to the second level of the index, and the other is column B
The main thing is to understand how the groups are computed ~
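To make the computation concrete, here is a minimal sketch that rebuilds the same df and checks two of the grouped sums by hand. Nothing here goes beyond the code above; only the hand-checked rows are my own choice.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [3, 1, 4, 5, 9, 2, 6, 1],
                   'B': [1, 1, 1, 1, 2, 2, 3, 3]}, index=index)

# Group by index level 1 ('second') together with column B, then sum column A
grouped = df.groupby([pd.Grouper(level=1), 'B']).sum()
print(grouped)

# Rows sharing the same ('second', B) pair are added together, e.g.
# ('one', 1) collects bar/one (A=3) and baz/one (A=4)
print(grouped.loc[('one', 1), 'A'])  # 7
# ('two', 1) collects bar/two (A=1) and baz/two (A=5)
print(grouped.loc[('two', 1), 'A'])  # 6
```

The result is indexed by the pair (second, B), and column A holds the sum for each pair.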
Of course, you can also group by the name of the index level
df.groupby([pd.Grouper(level='second'), 'A']).sum()
This works the same way as above, with level='second' naming the same index level as level=1
We can even abbreviate it directly as
df.groupby(['second', 'A']).sum()
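A quick way to convince yourself that the three spellings agree is to compare their results. This is only a sketch on the same df; the variable names by_position, by_name and by_string are mine.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [3, 1, 4, 5, 9, 2, 6, 1],
                   'B': [1, 1, 1, 1, 2, 2, 3, 3]}, index=index)

# Index level given by position
by_position = df.groupby([pd.Grouper(level=1), 'A']).sum()
# Index level given by name
by_name = df.groupby([pd.Grouper(level='second'), 'A']).sum()
# Plain strings: pandas resolves 'second' to the index level and 'A' to the column
by_string = df.groupby(['second', 'A']).sum()

print(by_position.equals(by_name))
print(by_name.equals(by_string))
```

The plain-string form only works when a name is unambiguous, that is, when it is not used for both an index level and a column.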
After grouping, you can select individual groups, and you can iterate over them
We have actually covered this part before
Here is a chance to go over it again
df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})
print(df)
grouped = df.groupby('A')
for name, group in grouped:
    print(name)
    print(group)
The group names are bar and foo: familiar, standard stuff
To iterate, just use a for ... in loop
bar
     A    B  C  D
0  bar  one  3  1
1  bar  two  1  1
foo
     A      B  C  D
2  foo    one  4  1
3  foo    two  5  1
4  foo    one  9  2
5  foo    two  2  2
6  foo  three  6  3
With multiple grouping keys, e.g. groupby(['A','B']),
each group name naturally becomes a tuple
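A small sketch of what that looks like when iterating, using the same df as above. The list names is only there to collect the group names for inspection.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

names = []
for name, group in df.groupby(['A', 'B']):
    # name is a tuple such as ('bar', 'one'); group holds the matching rows
    names.append(name)
    print(name, len(group))
```

Groups come out sorted by key by default, so ('foo', 'three') appears before ('foo', 'two').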
Besides iterating, you can also select a single group; the previous post covered this too!
bars = grouped.get_group('bar')  # select by the group's name
print(bars)
And what about the multi-key grouping?
df.groupby(['A', 'B']).get_group(('bar', 'one'))
Well, I write these posts mostly for myself anyway
Now the difficulty goes up: here come the aggregate functions
First, a look at the built-in aggregate functions
sum(), mean(), max(), min(), count(), size(), describe()
There are actually more than these; I just did not list them all
We have already used them many times
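As a refresher, here is a minimal sketch that calls a few of them on the df from above. The variable names sums, means and sizes are mine.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

grouped = df.groupby('A')

sums = grouped['C'].sum()    # total of C per group
means = grouped['C'].mean()  # average of C per group
sizes = grouped.size()       # number of rows per group (count() would skip NaN values)

print(sums)
print(means)
print(sizes)
print(grouped['C'].describe())  # count, mean, std, min, quartiles, max per group
```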
Now let's take it up a level:
you can define a custom function and pass it to the agg method
We will keep working with the data from just now
     A      B  C  D
0  bar    one  3  1
1  bar    two  1  1
2  foo    one  4  1
3  foo    two  5  1
4  foo    one  9  2
5  foo    two  2  2
6  foo  three  6  3
Grouping by A and B: A has two values and B has three, and the grouping forms five groups
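Why five and not 2 x 3 = 6? Only combinations that actually occur in the data become groups, and ('bar', 'three') never appears. A short sketch to check this on the same df:

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

grouped = df.groupby(['A', 'B'])

print(grouped.ngroups)  # 5
print(grouped.size())   # rows per (A, B) combination that occurs in df
```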
Now watch closely, here comes the operation
grouped = df.groupby(['A','B'])
print(grouped.agg('mean'))
A change of approach: average a single column
grouped = df.groupby(['A','B'])
print(grouped['C'].agg('mean'))
Taking the idea further: apply several aggregate functions at once
print(grouped['C'].agg(['mean','sum']))
Very powerful, and now you have learned it
Keep going, don't be afraid: apply several aggregations and rename the output columns at the same time
print(grouped['C'].agg([('A','mean'),('B','max')]))
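An alternative spelling of the same idea is named aggregation with keyword arguments, available in pandas 0.25 and later. This is a sketch of that variant, not the form used in this post; the output names c_mean and c_max are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

grouped = df.groupby(['A', 'B'])

# Each keyword becomes an output column name; its value is the aggregation to apply
renamed = grouped['C'].agg(c_mean='mean', c_max='max')
print(renamed)
```

Using names that differ from the grouping keys A and B also avoids confusing the output columns with the index levels.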
Use different aggregate functions for different columns
print(grouped.agg({'C':['sum','mean'],'D':['min','max']}))
All of that is agg's work, and I could keep going ~
With groupby, the group keys can stay as ordinary columns instead of becoming the index; note the added parameter as_index=False
grouped = df.groupby(['A','B'],as_index=False)
print(grouped.agg({'C':['sum','mean'],'D':['min','max']}))
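To see what as_index=False changes, it helps to compare it with the default on a simpler aggregation. A minimal sketch on the same df; the names flat and indexed are mine.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

# Group keys stay as ordinary columns, with a plain 0..n-1 index
flat = df.groupby(['A', 'B'], as_index=False)['C'].sum()
# Default behaviour: group keys become a MultiIndex
indexed = df.groupby(['A', 'B'])['C'].sum()

print(flat)
print(indexed)
```

For a case like this, flat should hold the same data as indexed.reset_index().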
The last operation: the functions passed to agg can be custom ones
That is how it usually goes, and of course this post is no exception
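grouped = df.groupby('A')
def max_min(group):
    # Range (max - min) of one column within a group
    return group.max() - group.min()
# Select the numeric columns: subtraction is not defined for the strings in B
print(grouped[['C', 'D']].agg(max_min))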
agg(custom function)
The custom function here can also be a lambda ~
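A sketch of the lambda version, computing the same max minus min range per group. It is restricted to the numeric columns C and D, since subtraction does not work on the strings in B.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

# The lambda receives one column of one group at a time (a Series)
result = df.groupby('A')[['C', 'D']].agg(lambda s: s.max() - s.min())
print(result)
```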
Confused? Being confused is fine, just hold on to your phone
Tap here, right here
Reproduced from: https://juejin.im/post/5d088203f265da1bd260edc4