Getting through Wednesday, stuck in the middle of the week, I still have to study pandas so that you can learn it too, No. 9

Wednesday, the hardest day of the week

Right in the middle, and so hot today

It is only May and already 36 degrees

A few words drift across the sky:

The best place to be is indoors, studying pandas

Grouping DataFrame with Index Levels and Columns

That is, grouping by a mix of index levels and columns

On to an example (without an example to start from, I would not know how to explain it)

import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

index = pd.MultiIndex.from_arrays(arrays=arrays,names=['first','second'])

df = pd.DataFrame({'A':[3,1,4,5,9,2,6,1],
                   'B':[1,1,1,1,2,2,3,3]},index=index)


print(df)


With an example there is something to show, right? Here is the output:

              A  B
first second      
bar   one     3  1
      two     1  1
baz   one     4  1
      two     5  1
foo   one     9  2
      two     2  2
qux   one     6  3
      two     1  3

Next comes the key step

I want to group by the second index level and by column B

Code first; the result comes afterwards

grouped = df.groupby([pd.Grouper(level=1),'B']).sum()
print(grouped)

Note that groupby receives two keys here: pd.Grouper(level=1), which is the second index level, and the column B

My hand shakes too much to draw well, so take the hand-drawn sketch in the spirit it was meant

The main point is to help you understand how the groups are computed
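Since the sketch is hard to convey in text, here is a rough code version of the same idea. It rebuilds the frame from above, adds up A by hand for every (second, B) pair, and compares that with what groupby returns. The names flat and manual are only for this illustration.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [3, 1, 4, 5, 9, 2, 6, 1],
                   'B': [1, 1, 1, 1, 2, 2, 3, 3]}, index=index)

# Turn the index levels into ordinary columns so every grouping key is visible.
flat = df.reset_index()

# Rows that share the same (second, B) pair fall into the same group;
# here we sum column A for each pair by hand.
manual = {}
for second, b, a in zip(flat['second'], flat['B'], flat['A']):
    manual[(second, b)] = manual.get((second, b), 0) + a

# The same computation done by pandas.
grouped = df.groupby([pd.Grouper(level=1), 'B']).sum()

print(manual)
print(grouped)
```

For example, ('one', 1) collects the rows bar/one (A=3) and baz/one (A=4), so its sum is 7.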

Of course, you can also group by the name of the index level

df.groupby([pd.Grouper(level='second'), 'B']).sum()

The result is the same as the one above

We can even abbreviate it directly as

df.groupby(['second', 'B']).sum()
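A small check that the three spellings really group the same way. This is only a sketch: it rebuilds the frame from above, groups by the second index level and column B in all three forms, and compares the results with DataFrame.equals. The variable names are mine.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [3, 1, 4, 5, 9, 2, 6, 1],
                   'B': [1, 1, 1, 1, 2, 2, 3, 3]}, index=index)

# Index level given by position.
by_position = df.groupby([pd.Grouper(level=1), 'B']).sum()
# Index level given by name.
by_name = df.groupby([pd.Grouper(level='second'), 'B']).sum()
# Index level name passed as a plain key, next to the column name.
by_key = df.groupby(['second', 'B']).sum()

same = by_position.equals(by_name) and by_name.equals(by_key)
print(same)
```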

After grouping, you can select individual groups, and you can iterate over them

We have actually covered this part before

Here is a chance to revisit it

df = pd.DataFrame({'A':['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B':['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C':[3,1,4,5,9,2,6],
                   'D':[1,1,1,1,2,2,3]})


print(df)

grouped = df.groupby('A')

for name,group in grouped:
    print(name)
    print(group)

The group names are bar and foo, as expected; nothing unusual here

To iterate, use a for ... in loop

bar
     A    B  C  D
0  bar  one  3  1
1  bar  two  1  1
foo
     A      B  C  D
2  foo    one  4  1
3  foo    two  5  1
4  foo    one  9  2
5  foo    two  2  2
6  foo  three  6  3

With multiple grouping keys, e.g. groupby(['A','B'])

each group name is naturally a tuple
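To see the tuple names for yourself, a minimal sketch with the same data collects the names produced while iterating over a two-key groupby:

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

# With two keys, each name is an (A value, B value) tuple.
names = []
for name, group in df.groupby(['A', 'B']):
    names.append(name)
    print(name, len(group))
```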

Besides iterating, you can select a single group; the previous post covered this too

bars = grouped.get_group('bar') # select by group name
print(bars)

And with multiple keys? Pass the tuple

df.groupby(['A', 'B']).get_group(('bar', 'one'))
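One way to picture get_group is as a filter on the key columns. The sketch below, using the same data, compares the rows returned by get_group with the rows picked by a boolean mask; the mask is my own comparison, not something from the earlier post.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

# Single key: the group name is a plain value.
bars = df.groupby('A').get_group('bar')

# Two keys: the group name is a tuple.
bar_one = df.groupby(['A', 'B']).get_group(('bar', 'one'))

# The same rows selected with a boolean mask.
masked = df[(df['A'] == 'bar') & (df['B'] == 'one')]

print(bars)
print(bar_one)
```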

Well, I write these posts mainly for myself, which is why they run a bit long

Now for something more difficult: aggregate functions

First, a look at the built-in aggregate functions

sum(), mean(), max(), min(), count(), size(), describe()
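As a quick reminder of what these return, here is a sketch that applies them to column C grouped by A, using the same data as above. The summary frame is just a convenient way to print the results side by side.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

c_by_a = df.groupby('A')['C']

# Each built-in returns one value per group.
summary = pd.DataFrame({'sum': c_by_a.sum(),
                        'mean': c_by_a.mean(),
                        'max': c_by_a.max(),
                        'min': c_by_a.min(),
                        'count': c_by_a.count(),   # non-null values per group
                        'size': c_by_a.size()})    # rows per group
print(summary)

# describe() bundles several statistics per group into one table.
print(c_by_a.describe())
```

count() and size() agree here because the data has no missing values.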

There are actually more than these; I just did not list them all

We have used them many times already

Next, something a bit more advanced:

you can define your own function and pass it to the agg method

We keep working with the data from just now

     A      B  C  D
0  bar    one  3  1
1  bar    two  1  1
2  foo    one  4  1
3  foo    two  5  1
4  foo    one  9  2
5  foo    two  2  2
6  foo  three  6  3

Grouping by A and B: A has two distinct values and B has three, and five groups are formed (not six, because bar never appears together with three)
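The group count is easy to confirm. A minimal sketch on the same data, using ngroups for the number of groups and size() for the rows in each:

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

grouped = df.groupby(['A', 'B'])

# Only the (A, B) combinations that occur in the data become groups.
n_groups = grouped.ngroups
sizes = grouped.size()

print(n_groups)
print(sizes)
```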

Watch closely, here is the operation

grouped = df.groupby(['A','B'])
print(grouped.agg('mean'))

A change of approach: average a single column

grouped = df.groupby(['A','B'])
print(grouped['C'].agg('mean'))

Continuing in the same direction: apply several aggregate functions at once

print(grouped['C'].agg(['mean','sum']))

Very powerful, and now you know it

Keep going, don't be afraid: apply several aggregations and rename the result columns at the same time

print(grouped['C'].agg([('A','mean'),('B','max')]))
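Another way to get renamed columns is keyword-style named aggregation, which as far as I know is available from pandas 0.25 on. The sketch below assumes that version or newer; the column names avg and largest are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

grouped = df.groupby(['A', 'B'])

# Each keyword becomes a result column; its value names the aggregation.
renamed = grouped['C'].agg(avg='mean', largest='max')
print(renamed)
```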

Use different aggregate functions for different columns

print(grouped.agg({'C':['sum','mean'],'D':['min','max']}))
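A dict of lists like this produces two-level column labels such as ('C', 'sum'). If flat names are easier to work with, one possible approach is to join the two levels; the underscore naming below is my own choice, not something pandas requires.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

result = df.groupby(['A', 'B']).agg({'C': ['sum', 'mean'], 'D': ['min', 'max']})

# Columns are (column, function) pairs at this point.
pairs = list(result.columns)

# Join each pair into a single flat label, e.g. ('C', 'sum') -> 'C_sum'.
result.columns = ['_'.join(pair) for pair in pairs]
print(result)
```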

All of this is agg's doing, and I can keep going

With groupby, you can keep the group keys out of the index by adding one parameter: as_index=False


grouped = df.groupby(['A','B'],as_index=False)

print(grouped.agg({'C':['sum','mean'],'D':['min','max']}))
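To make the effect of as_index easier to see, here is a simpler sketch with a plain sum() on the same data: with the default the keys form the index, and with as_index=False they stay as ordinary columns.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

# Default: A and B become a MultiIndex on the result.
indexed = df.groupby(['A', 'B']).sum()

# as_index=False: A and B remain regular columns, rows are numbered from 0.
flat = df.groupby(['A', 'B'], as_index=False).sum()

print(indexed)
print(flat)
```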

The last operation: the functions passed to agg can be ones you write yourself

That is how it usually goes, and of course I am no exception

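# select the numeric columns; subtracting the strings in B would raise a TypeError
grouped = df.groupby('A')[['C', 'D']]

def max_min(group):
    # range (max minus min) of each column within a group
    return group.max() - group.min()

print(grouped.agg(max_min))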

agg(custom function)

The custom function here can also be a lambda
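A minimal lambda version of the max-minus-min aggregation, restricted to the numeric columns C and D because the strings in B cannot be subtracted:

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'foo', 'foo', 'foo', 'foo', 'foo'],
                   'B': ['one', 'two', 'one', 'two', 'one', 'two', 'three'],
                   'C': [3, 1, 4, 5, 9, 2, 6],
                   'D': [1, 1, 1, 1, 2, 2, 3]})

# The lambda receives one column of one group at a time.
spread = df.groupby('A')[['C', 'D']].agg(lambda s: s.max() - s.min())
print(spread)
```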

Confused? Being confused is all right

Reproduced from: https://juejin.im/post/5d088203f265da1bd260edc4


Origin blog.csdn.net/weixin_33862514/article/details/93183403