b=pd.pivot_table(df,index=['dashboard_title','username'],values='pv',aggfunc=['sum','max'],sort=True)
b.columns=['cishu','zuida']  # flatten the column labels: cishu = sum of pv, zuida = max of pv
b.sort_values('cishu',ascending=False).to_excel(pathstart+'toushi4.xlsx')  # sorted by cishu, descending
b.to_excel(pathstart+'toushi5.xlsx')  # unsorted pivot table
b.reset_index().sort_values(['dashboard_title','cishu'],ascending=[True,False]).to_excel(pathstart+'toushi6.xlsx')
b.reset_index().sort_values(['dashboard_title','cishu'],ascending=[True,False]).reset_index().to_excel(pathstart+'toushi7.xlsx')
b.reset_index().sort_values(['dashboard_title','cishu'],ascending=[True,False]).set_index(['dashboard_title','username']).to_excel(pathstart+'toushi8.xlsx')
- Rename the pivot table's columns first (b.columns=[...], as above). This makes sorting easier, because aggfunc=['sum','max'] produces two-level column labels;
- The pivot table is still a DataFrame, so it can be sorted directly with sort_values();
- To sort by an index level, first call reset_index(). This replaces the index with a numeric one (0, 1, 2, 3...) and turns the original index levels into ordinary columns, whose names can then be passed to sort_values();
- Appending another reset_index() to step 3 renumbers the sorted rows with a fresh numeric index;
- Appending set_index() to step 3 restores the layout grouped by index, which is convenient for viewing (but not for filtering in Excel, because repeated index values are shown as blank cells after grouping).
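The sorting steps in the notes above can be tried on a small frame without the real export. This is only a sketch: the toy data is made up, and `drop=True` is my own addition (it discards the old row numbers, whereas the plain `reset_index()` used above keeps them as an extra `index` column).

```python
import pandas as pd

# Hypothetical toy data standing in for the real page-view export
df = pd.DataFrame({
    "dashboard_title": ["A", "A", "A", "B", "B"],
    "username": ["u1", "u2", "u2", "u1", "u3"],
    "pv": [3, 5, 2, 4, 1],
})

b = pd.pivot_table(df, index=["dashboard_title", "username"],
                   values="pv", aggfunc=["sum", "max"])
b.columns = ["cishu", "zuida"]  # flatten the two-level column labels

# Index levels become ordinary columns, so they can be used as sort keys
flat = b.reset_index().sort_values(["dashboard_title", "cishu"],
                                   ascending=[True, False])

# Fresh 0..n-1 row numbers after sorting (drop=True is an assumption, see above)
renumbered = flat.reset_index(drop=True)

# Back to the grouped (dashboard_title, username) layout, keeping the sort order
regrouped = flat.set_index(["dashboard_title", "username"])
print(regrouped)
```

Within each dashboard the users now appear from most to least active, which is the ordering that the later exports rely on.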
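- The results are written to toushi4.xlsx through toushi8.xlsx: toushi4 is sorted by cishu in descending order, toushi5 is the unsorted pivot table, toushi6 is sorted by dashboard_title and then by cishu in descending order, toushi7 additionally gets a fresh numeric index, and toushi8 restores the (dashboard_title, username) index.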
- To filter the rows of the pivot table result, use the query() method:
- Combine multiple conditions with &:
- df.query("not (Quantity == 95)") returns all rows whose Quantity is not 95
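The query() notes above can be sketched as follows. The data and the `UnitPrice` column are hypothetical, chosen only to match the `Quantity` example; the `@` prefix for local variables is a standard query() feature that the notes do not mention.

```python
import pandas as pd

# Hypothetical order data; column names are illustrative only
df = pd.DataFrame({
    "Quantity": [95, 10, 95, 40],
    "UnitPrice": [1.5, 20.0, 30.0, 5.0],
})

# Single condition
single = df.query("Quantity == 95")

# Two conditions joined with &; the parentheses make the grouping explicit
both = df.query("(Quantity == 95) & (UnitPrice > 10)")

# Negation: every row whose Quantity is not 95
negated = df.query("not (Quantity == 95)")

# A local Python variable can be referenced with the @ prefix
min_qty = 20
from_var = df.query("Quantity >= @min_qty")
```

The same calls work on a pivot table once its index levels have been turned into columns with reset_index().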
- To query on datetime values with query(), the only requirement is that the column has dtype datetime64[ns]. Convert it first if necessary:
df["OrderDate"] = pd.to_datetime(df["OrderDate"], format="%Y-%m-%d")
- Get all records for the month of August:
df.query("OrderDate.dt.month == 8")
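A self-contained version of the datetime query, assuming a made-up `OrderDate` column. Passing `engine="python"` is a cautious choice of mine: it avoids depending on how the optional numexpr engine handles the `.dt` accessor, and is not something the notes require.

```python
import pandas as pd

# Hypothetical orders; the dates are stored as strings at first
df = pd.DataFrame({
    "OrderDate": ["2021-07-30", "2021-08-02", "2021-08-15", "2021-09-01"],
    "Quantity": [5, 95, 12, 40],
})

# query() can only use .dt once the column is datetime64[ns]
df["OrderDate"] = pd.to_datetime(df["OrderDate"], format="%Y-%m-%d")

# All records for the month of August
august = df.query("OrderDate.dt.month == 8", engine="python")

# August records after the 10th, combining two .dt attributes
late_august = df.query("OrderDate.dt.month == 8 and OrderDate.dt.day > 10",
                       engine="python")
```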
- Use in to match against a list of values:
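A short sketch of membership tests inside query(), using a hypothetical `Status` column. The list can be written inline or passed in as a local variable with `@`.

```python
import pandas as pd

# Hypothetical data; the Status values are made up for illustration
df = pd.DataFrame({
    "Status": ["Delivered", "Pending", "Returned", "Delivered"],
    "Quantity": [95, 10, 95, 40],
})

# Inline list
inline = df.query("Status in ['Delivered', 'Returned']")

# List taken from a local variable
wanted = ["Delivered", "Returned"]
from_var = df.query("Status in @wanted")

# The complement
others = df.query("Status not in @wanted")

# Equivalent boolean-mask form without query()
masked = df[df["Status"].isin(wanted)]
```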
- Use str.contains() for SQL LIKE-style matching, to find all rows that contain a specific string:
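A minimal sketch of substring matching, with invented dashboard titles. `na=False` treats missing titles as non-matches, and `engine="python"` in the query() variant is a cautious choice of mine rather than something the notes prescribe.

```python
import pandas as pd

# Hypothetical dashboard titles, including one missing value
df = pd.DataFrame({
    "dashboard_title": ["Teaching - courses", "Sales - overview",
                        "Teaching - staff", None],
    "pv": [3, 5, 2, 1],
})

# Boolean mask: True where the title contains the substring
mask = df["dashboard_title"].str.contains("Teaching", na=False)
teaching = df[mask]

# The same filter written as a query() expression
via_query = df.query("dashboard_title.str.contains('Teaching', na=False)",
                     engine="python")
```

The mask form is the one used in the complete code for filtering the teaching department dashboards.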
- There are three ways to rename columns: 1. rename specific columns with rename(columns={...}); 2. replace all column names at once with df.columns=['','','',...]; 3. modify part of each column name with a string replacement:
df.columns = df.columns.str.replace('Animals', 'Reptiles')
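The three renaming approaches side by side, as a sketch on a made-up frame. The column names are hypothetical, picked so that the `'Animals'` replacement has something to act on.

```python
import pandas as pd

# Hypothetical frame with two columns sharing the prefix "Animals"
df = pd.DataFrame({
    "Animals_name": ["gecko"],
    "Animals_count": [3],
    "zone": ["A"],
})

# 1. Rename only the specified columns; the others are untouched
one = df.rename(columns={"zone": "area"})

# 2. Replace every column name at once (the list length must match)
two = df.copy()
two.columns = ["name", "count", "area"]

# 3. Change part of each column name with a string replacement
three = df.copy()
three.columns = three.columns.str.replace("Animals", "Reptiles")
```

rename() returns a new DataFrame, while assigning to `.columns` changes the frame in place, which is why the sketch works on copies for the second and third approach.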
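Complete code for the analysis case. The Chinese column labels mean: 月活跃天数 = active days in the month, 单天最大活跃次数 = maximum activity count in a single day, 月度累计活跃次数 = cumulative activity count for the month; 教学部 is the teaching department and 看板 means dashboard: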
import pandas as pd
import numpy as np
pathstart='/Users/kangyongqing/Documents/kangyq/202303/分析模版/Superset看板浏览量分析/'
path1=pathstart+'20230306_100429.csv'
df=pd.read_csv(path1)
print(df.columns)
table=df.pivot_table(index=['dashboard_title','username'],values='pv',aggfunc=('sum','max','count'))  # the lines below rely on the result columns being named 'sum', 'max', 'count'
print(table.shape)
table.reset_index().sort_values(['dashboard_title','sum'],ascending=[True,False]).set_index(['dashboard_title','username']).rename(columns={'count':'月活跃天数','max':'单天最大活跃次数','sum':'月度累计活跃次数'}).to_excel(pathstart+'公司看板分析.xlsx')
table1=table.reset_index()  # index levels become ordinary columns so they can be filtered
table1[table1['dashboard_title'].str.contains('教学部')].sort_values(['dashboard_title','sum'],ascending=[True,False]).set_index(['dashboard_title','username']).rename(columns={'count':'月活跃天数','max':'单天最大活跃次数','sum':'月度累计活跃次数'}).to_excel(pathstart+'教学部看板分析.xlsx')
df.pivot_table(index='username',values='pv',aggfunc=('sum','max','count')).sort_values('sum',ascending=False).rename(columns={'count':'月活跃天数&看板数','max':'单天最大活跃次数','sum':'月度累计活跃次数'}).to_excel(pathstart+'公司最热用户分析.xlsx')
df.pivot_table(index='dashboard_title',values='pv',aggfunc=('sum','max','count')).sort_values('sum',ascending=False).rename(columns={'count':'月活跃天数&人次','max':'单天单人最大活跃次数','sum':'月度累计活跃次数'}).to_excel(pathstart+'公司最热看板分析.xlsx')
df[df['dashboard_title'].str.contains('教学部')].pivot_table(index='username',values='pv',aggfunc=('sum','max','count')).sort_values('sum',ascending=False).rename(columns={'count':'月活跃天数&看板数','max':'单天单人最大活跃次数','sum':'月度累计活跃次数'}).to_excel(pathstart+'教学部最热用户分析.xlsx')
df[df['dashboard_title'].str.contains('教学部')].pivot_table(index='dashboard_title',values='pv',aggfunc=('sum','max','count')).sort_values('sum',ascending=False).rename(columns={'count':'月活跃天数&人次','max':'单天单人最大活跃次数','sum':'月度累计活跃次数'}).to_excel(pathstart+'教学部最热看板分析.xlsx')