pandas-10 pd.pivot_table()透视表功能

pandas-10 pd.pivot_table()透视表功能

和excel一样,pandas也有一个透视表的功能,具体demo如下:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame


#显示所有列
pd.set_option('display.max_columns', None)
#显示所有行
pd.set_option('display.max_rows', None)
#设置value的显示长度为100,默认为50
pd.set_option('max_colwidth',100)


df = pd.read_excel('./sales-funnel.xlsx')
print(df.head())
'''
   Account                          Name            Rep       Manager  \
0   714466               Trantow-Barrows   Craig Booker  Debra Henley   
1   714466               Trantow-Barrows   Craig Booker  Debra Henley   
2   714466               Trantow-Barrows   Craig Booker  Debra Henley   
3   737550  Fritsch, Russel and Anderson   Craig Booker  Debra Henley   
4   146832                  Kiehn-Spinka  Daniel Hilton  Debra Henley   

       Product  Quantity  Price     Status  
0          CPU         1  30000  presented  
1     Software         1  10000  presented  
2  Maintenance         2   5000    pending  
3          CPU         1  35000   declined  
4          CPU         2  65000        won
'''

print(pd.pivot_table(df, index=['Name']))
'''
                              Account   Price  Quantity
Name                                                   
Barton LLC                     740150   35000  1.000000
Fritsch, Russel and Anderson   737550   35000  1.000000
Herman LLC                     141962   65000  2.000000
Jerde-Hilpert                  412290    5000  2.000000
Kassulke, Ondricka and Metz    307599    7000  3.000000
Keeling LLC                    688981  100000  5.000000
Kiehn-Spinka                   146832   65000  2.000000
Koepp Ltd                      729833   35000  2.000000
Kulas Inc                      218895   25000  1.500000
Purdy-Kunde                    163416   30000  1.000000
Stokes LLC                     239344    7500  1.000000
Trantow-Barrows                714466   15000  1.333333
对名字进行了去重,将每个人的销售记录取进行统计,上例是求了均值。
这是由aggfunc参数来决定的。
'''

print(pd.pivot_table(df, index=['Name'], aggfunc='sum'))
'''
                              Account   Price  Quantity
Name                                                   
Barton LLC                     740150   35000         1
Fritsch, Russel and Anderson   737550   35000         1
Herman LLC                     141962   65000         2
Jerde-Hilpert                  412290    5000         2
Kassulke, Ondricka and Metz    307599    7000         3
Keeling LLC                    688981  100000         5
Kiehn-Spinka                   146832   65000         2
Koepp Ltd                     1459666   70000         4
Kulas Inc                      437790   50000         3
Purdy-Kunde                    163416   30000         1
Stokes LLC                     478688   15000         2
Trantow-Barrows               2143398   45000         4
'''

print(pd.pivot_table(df, index=['Name', 'Rep', 'Manager']))
'''
                                                          Account    ...     Quantity
Name                         Rep           Manager                   ...             
Barton LLC                   John Smith    Debra Henley    740150    ...     1.000000
Fritsch, Russel and Anderson Craig Booker  Debra Henley    737550    ...     1.000000
Herman LLC                   Cedric Moss   Fred Anderson   141962    ...     2.000000
Jerde-Hilpert                John Smith    Debra Henley    412290    ...     2.000000
Kassulke, Ondricka and Metz  Wendy Yule    Fred Anderson   307599    ...     3.000000
Keeling LLC                  Wendy Yule    Fred Anderson   688981    ...     5.000000
Kiehn-Spinka                 Daniel Hilton Debra Henley    146832    ...     2.000000
Koepp Ltd                    Wendy Yule    Fred Anderson   729833    ...     2.000000
Kulas Inc                    Daniel Hilton Debra Henley    218895    ...     1.500000
Purdy-Kunde                  Cedric Moss   Fred Anderson   163416    ...     1.000000
Stokes LLC                   Cedric Moss   Fred Anderson   239344    ...     1.000000
Trantow-Barrows              Craig Booker  Debra Henley    714466    ...     1.333333
'''

print(pd.pivot_table(df, index=['Manager', 'Rep']))
# manager 和 rep 之间 存在 一对多的 关系
'''
                              Account         Price  Quantity
Manager       Rep                                            
Debra Henley  Craig Booker   720237.0  20000.000000  1.250000
              Daniel Hilton  194874.0  38333.333333  1.666667
              John Smith     576220.0  20000.000000  1.500000
Fred Anderson Cedric Moss    196016.5  27500.000000  1.250000
              Wendy Yule     614061.5  44250.000000  3.000000
'''

print(pd.pivot_table(df, index=['Manager', 'Rep'], values=['Price', 'Quantity']))
'''
                                    Price  Quantity
Manager       Rep                                  
Debra Henley  Craig Booker   20000.000000  1.250000
              Daniel Hilton  38333.333333  1.666667
              John Smith     20000.000000  1.500000
Fred Anderson Cedric Moss    27500.000000  1.250000
              Wendy Yule     44250.000000  3.000000
'''

print(pd.pivot_table(df, index=['Manager', 'Rep'], values=['Price', 'Quantity'], columns=['Product']))
'''
                               Price               ...    Quantity         
Product                          CPU Maintenance   ...     Monitor Software
Manager       Rep                                  ...                     
Debra Henley  Craig Booker   32500.0      5000.0   ...         NaN      1.0
              Daniel Hilton  52500.0         NaN   ...         NaN      1.0
              John Smith     35000.0      5000.0   ...         NaN      NaN
Fred Anderson Cedric Moss    47500.0      5000.0   ...         NaN      1.0
              Wendy Yule     82500.0      7000.0   ...         2.0      NaN
              
由以上输出可以看出,当column指定为product之后,price和quantity进行了细分,将每个product的详情列出。
另外还可以设置一个fill_value的参数,可以将nan填充为某个值。
'''

'''
总结:
    使用透视表之前,需要对原始数据有一个大概的了解,这样生成的透视表才能够有意义。
'''

猜你喜欢

转载自www.cnblogs.com/wenqiangit/p/11252770.html