Pandas pivot tables, sorting, and querying the results, explained in one pass

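# df and pathstart are defined in the complete code at the end of this post
b=pd.pivot_table(df,index=['dashboard_title','username'],values='pv',aggfunc=['sum','max'],sort=True)
# Flatten to plain column names: cishu = total count (sum), zuida = maximum (max)
b.columns=['cishu','zuida']
# toushi4: sorted by total, descending
b.sort_values('cishu',ascending=False).to_excel(pathstart+'toushi4.xlsx')
# toushi5: the pivot table itself, unsorted (sort_values returns a new object)
b.to_excel(pathstart+'toushi5.xlsx')
# toushi6: index moved into columns, sorted by dashboard ascending and total descending
b.reset_index().sort_values(['dashboard_title','cishu'],ascending=[True,False]).to_excel(pathstart+'toushi6.xlsx')
# toushi7: as toushi6, with the row numbers reset after sorting
b.reset_index().sort_values(['dashboard_title','cishu'],ascending=[True,False]).reset_index().to_excel(pathstart+'toushi7.xlsx')
# toushi8: as toushi6, with the two-level index restored
b.reset_index().sort_values(['dashboard_title','cishu'],ascending=[True,False]).set_index(['dashboard_title','username']).to_excel(pathstart+'toushi8.xlsx')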
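
The calls above can be tried end to end on a small frame. The following is a minimal sketch rather than the author's data: the rows are made up for illustration, only the column names dashboard_title, username, and pv and the output names cishu and zuida come from the code above, and the Excel export is left out so that nothing is written to disk.

```python
import pandas as pd

# Made-up sample rows (an assumption for illustration); only the column names
# come from the article.
df = pd.DataFrame({
    "dashboard_title": ["Sales", "Sales", "Sales", "Ops", "Ops"],
    "username": ["ann", "ann", "bob", "ann", "cid"],
    "pv": [3, 5, 9, 1, 7],
})

# Aggregate pv per (dashboard_title, username); a list aggfunc yields
# two-level columns ('sum', 'pv') and ('max', 'pv').
b = pd.pivot_table(df, index=["dashboard_title", "username"],
                   values="pv", aggfunc=["sum", "max"])

# Flatten the two-level columns into plain names so they are easy to sort by.
b.columns = ["cishu", "zuida"]

# Sort by a value column; the (dashboard_title, username) index is kept.
by_total = b.sort_values("cishu", ascending=False)

# Sort by an index level plus a value column: move the index into columns first.
flat = b.reset_index().sort_values(["dashboard_title", "cishu"],
                                   ascending=[True, False])

# Renumber the sorted rows from 0 (drop=True discards the old row numbers).
renumbered = flat.reset_index(drop=True)

# Restore the grouped layout for viewing.
grouped = flat.set_index(["dashboard_title", "username"])

print(by_total)
print(grouped)
```

Using drop=True here is a choice made for the sketch; the article's own call keeps the old row numbers as an extra column.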
  1. Renaming the pivot table's columns first makes sorting easier, as above: b.columns=[...]. With a list aggfunc the columns are a two-level MultiIndex such as ('sum', 'pv'), and assigning a flat list of names replaces it;
  2. The pivot table is still a DataFrame, so it can be sorted directly with sort_values();
  3. A straightforward way to sort by an index level is to call reset_index() first: the index becomes a numeric index (0, 1, 2, 3...), the original index levels become ordinary columns, and sort_values() can then take those column names;
  4. Appending another reset_index() to step 3 renumbers the sorted rows from 0; the previous row numbers are kept in a column named index unless drop=True is passed;
  5. Appending set_index() to step 3 restores the layout grouped by index, which is convenient for viewing (but not for filtering in Excel, because repeated index values are written as merged cells, leaving blanks).
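  6. The output files are as follows. toushi4.xlsx: all rows sorted by cishu in descending order;
  7. toushi5.xlsx: the pivot table as produced, ordered by its index;
  8. toushi6.xlsx: sorted by dashboard_title ascending, then cishu descending, with the pre-sort row numbers as the index;
  9. toushi7.xlsx: the same order as toushi6.xlsx, renumbered from 0, with the old row numbers in an index column;
  10. toushi8.xlsx: the same order again, with dashboard_title and username restored as the index.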
  11. To filter the results of the pivot table, use the query() method, which takes the condition as a string expression;
  12. Join multiple conditions with &;
  13. df.query("not (Quantity == 95)") returns all rows where Quantity is not 95;
  14. The only requirement for querying on datetime values with query() is that the column has dtype datetime64[ns], which pd.to_datetime() provides:

    df["OrderDate"] = pd.to_datetime(df["OrderDate"], format="%Y-%m-%d")

  15. Select all records from August:

    df.query("OrderDate.dt.month == 8")

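  16. The in operator can be used inside query() to match any value from a list;

  17. Use str.contains() for LIKE-style matching, to find all rows containing a specific substring, as in df['dashboard_title'].str.contains(...) in the complete code below;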

  18. There are three ways to rename columns: first, rename specific columns with df.rename(columns={...}); second, replace every name at once with df.columns=['','','',...]; third, modify part of the column names with a string replacement:

df.columns = df.columns.str.replace('Animals', 'Reptiles')
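
The query() patterns from items 11 to 17 can be tried together on a small frame. This is a minimal sketch under stated assumptions: the rows are made up, Quantity and OrderDate are the column names used in the notes, and Customer is a hypothetical column added to have something to match text against. engine="python" is passed for the expressions that use the .dt and .str accessors as a cautious choice, so the sketch does not depend on how the installed pandas version handles them with its default engine.

```python
import pandas as pd

# Made-up order data (an assumption for illustration). Quantity and OrderDate
# appear in the notes; Customer is a hypothetical column.
df = pd.DataFrame({
    "Customer": ["Acme Corp", "Beta LLC", "Gamma Corp", "Delta Inc"],
    "Quantity": [95, 40, 120, 95],
    "OrderDate": ["2021-08-03", "2021-08-17", "2021-09-01", "2021-07-30"],
})
# query() on dates needs a datetime64[ns] column.
df["OrderDate"] = pd.to_datetime(df["OrderDate"], format="%Y-%m-%d")

# Single condition.
large = df.query("Quantity > 90")

# Multiple conditions joined with &.
large_august = df.query("(Quantity > 90) & (OrderDate.dt.month == 8)",
                        engine="python")

# Negation: every row whose Quantity is not 95.
not_95 = df.query("not (Quantity == 95)")

# Datetime accessor: all orders placed in August.
august = df.query("OrderDate.dt.month == 8", engine="python")

# Membership test with in.
picked = df.query("Quantity in [40, 120]")

# LIKE-style substring match with str.contains().
corps = df.query("Customer.str.contains('Corp')", engine="python")

for name, result in [("large", large), ("large_august", large_august),
                     ("not_95", not_95), ("august", august),
                     ("picked", picked), ("corps", corps)]:
    print(name, result["Customer"].tolist())
```

The same expressions apply to a pivot table once reset_index() has turned its index levels into ordinary columns.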

The complete code for the analysis case:

import pandas as pd
import numpy as np
pathstart='/Users/kangyongqing/Documents/kangyq/202303/分析模版/Superset看板浏览量分析/'
path1=pathstart+'20230306_100429.csv'
df=pd.read_csv(path1)
print(df.columns)

table=df.pivot_table(index=['dashboard_title','username'],values='pv',aggfunc=('sum','max','count'))
print(table.shape)


table.reset_index().sort_values(['dashboard_title','sum'],ascending=[1,0]).set_index(['dashboard_title','username']).rename(columns={'count':'月活跃天数','max':'单天最大活跃次数','sum':'月度累计活跃次数'}).to_excel(pathstart+'公司看板分析.xlsx')
table1=table.reset_index()
table1[table1['dashboard_title'].str.contains('教学部')].sort_values(['dashboard_title','sum'],ascending=[1,0]).set_index(['dashboard_title','username']).rename(columns={'count':'月活跃天数','max':'单天最大活跃次数','sum':'月度累计活跃次数'}).to_excel(pathstart+'教学部看板分析.xlsx')

df.pivot_table(index='username',values='pv',aggfunc=('sum','max','count')).sort_values('sum',ascending=False).rename(columns={'count':'月活跃天数&看板数','max':'单天最大活跃次数','sum':'月度累计活跃次数'}).to_excel(pathstart+'公司最热用户分析.xlsx')
df.pivot_table(index='dashboard_title',values='pv',aggfunc=('sum','max','count')).sort_values('sum',ascending=False).rename(columns={'count':'月活跃天数&人次','max':'单天单人最大活跃次数','sum':'月度累计活跃次数'}).to_excel(pathstart+'公司最热看板分析.xlsx')

df[df['dashboard_title'].str.contains('教学部')].pivot_table(index='username',values='pv',aggfunc=('sum','max','count')).sort_values('sum',ascending=False).rename(columns={'count':'月活跃天数&看板数','max':'单天单人最大活跃次数','sum':'月度累计活跃次数'}).to_excel(pathstart+'教学部最热用户分析.xlsx')
df[df['dashboard_title'].str.contains('教学部')].pivot_table(index='dashboard_title',values='pv',aggfunc=('sum','max','count')).sort_values('sum',ascending=False).rename(columns={'count':'月活跃天数&人次','max':'单天单人最大活跃次数','sum':'月度累计活跃次数'}).to_excel(pathstart+'教学部最热看板分析.xlsx')
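
The last four statements above repeat the same pivot, sort, and rename chain with small variations. One possible way to reduce the repetition is a small helper. The sketch below is an assumption about how that might look, not the author's code: the function name summarize_pv, the English labels, the sample rows, and the output path in the final comment are all hypothetical.

```python
import pandas as pd

def summarize_pv(df, index, labels, title_contains=None):
    """Pivot pv by `index`, sort by the total descending, and relabel columns.

    `title_contains` optionally keeps only rows whose dashboard_title
    contains the given plain substring before pivoting.
    """
    if title_contains is not None:
        df = df[df["dashboard_title"].str.contains(title_contains, regex=False)]
    table = df.pivot_table(index=index, values="pv",
                           aggfunc=["sum", "max", "count"])
    # A list aggfunc gives two-level columns; flatten them to plain names.
    table.columns = ["sum", "max", "count"]
    return table.sort_values("sum", ascending=False).rename(columns=labels)


# Made-up rows for illustration; the real data comes from the CSV export.
df = pd.DataFrame({
    "dashboard_title": ["Teaching A", "Teaching A", "Sales B", "Teaching C"],
    "username": ["ann", "bob", "ann", "ann"],
    "pv": [4, 6, 3, 2],
})
# Hypothetical English labels standing in for the article's column labels.
labels = {"sum": "total_views", "max": "max_daily_views", "count": "active_days"}

hot_users = summarize_pv(df, "username", labels)
teaching_boards = summarize_pv(df, "dashboard_title", labels,
                               title_contains="Teaching")
print(hot_users)
print(teaching_boards)
# hot_users.to_excel("hot_users.xlsx")  # hypothetical output path
```

Returning the DataFrame instead of writing the file inside the helper keeps the export step separate, so each result can be inspected before it is saved.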

Origin blog.csdn.net/Darin2017/article/details/129323470