Python data analysis tips: How to implement a pivot table in Pandas
A pivot table is a very useful tool in data analysis that helps us quickly understand the structure, relationships and trends in data. In Pandas, we can use the pivot_table() function to build one. For example, suppose we have a sales dataset that contains products, sale dates, and sales amounts. Let's start by creating a simple pivot table.
In the example below, we have a DataFrame with three columns: Product, Date and Sales. We want to create a pivot table that shows the total sales for each product on each date. We specify the rows, columns, and values of the pivot table, and aggregate the sales using the sum function. After running the code, we can quickly see the sales of each product on each date.
The pivot_table() call here uses four parameters:
index: the column to use as row labels in the pivot table (in this case, Product)
columns: the column to use as column labels in the pivot table (in this case, Date)
values: the column whose values are aggregated in the pivot table (in this case, Sales)
aggfunc: the aggregate function used in the pivot table (in this case, sum)
# Pivot table
import pandas as pd
df = pd.DataFrame({
'Product': ['A', 'B', 'C', 'A', 'B', 'C'],
'Date': ['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-02', '2019-01-02', '2019-01-02'],
'Sales': [100, 200, 300, 150, 250, 350]
})
print(df)
pivot_table = df.pivot_table(index='Product', columns='Date', values='Sales', aggfunc='sum')
print(pivot_table)
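The choice of aggfunc only becomes visible when several rows share the same product and date. The following is a minimal sketch under that assumption; the data is hypothetical (not part of the sales dataset above) and product A is given two records on 2019-01-01 so that sum and mean produce different cells.

```python
import pandas as pd

# Hypothetical data: product A has two sales records on 2019-01-01.
df = pd.DataFrame({
    'Product': ['A', 'A', 'A', 'B', 'B'],
    'Date': ['2019-01-01', '2019-01-01', '2019-01-02', '2019-01-01', '2019-01-02'],
    'Sales': [100, 50, 150, 200, 250],
})

# Same index, columns and values; only the aggregate function differs.
total = df.pivot_table(index='Product', columns='Date', values='Sales', aggfunc='sum')
average = df.pivot_table(index='Product', columns='Date', values='Sales', aggfunc='mean')

print(total)    # cell (A, 2019-01-01) holds 100 + 50
print(average)  # cell (A, 2019-01-01) holds the mean of 100 and 50
```

Cells backed by a single row are the same in both tables, so the difference shows up only where rows are actually combined.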
Python data analysis: implementing a pivot table with groupby
In addition to using the pivot_table() function, we can also use the groupby() and unstack() functions to implement pivot tables.
In this example, we first use the groupby() function to group the sales data by product and date and calculate the sum of the sales. Next, we use the unstack() function to rearrange the data with dates as columns and products as rows. Finally, we can get a similar pivot table to better analyze and understand the sales data.
Specifically, we can explain this code step by step:
sales_data.groupby(['Product', 'Date']): first, use the groupby() function to group sales_data by the Product and Date columns.
['Sales'].sum(): sum the Sales column within each group to get the total sales of each product on each date.
.unstack(): use the unstack() function to rearrange the data, with dates as columns and products as rows, to get a result similar to a pivot table.
# Implement a pivot table with the groupby function
import pandas as pd
sales_data = pd.DataFrame({
'Product': ['A', 'B', 'C', 'A', 'B', 'C'],
'Date': ['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-02', '2019-01-02', '2019-01-02'],
'Sales': [100, 200, 300, 150, 250, 350]
})
print(sales_data)
pivot_table = sales_data.groupby(['Product', 'Date'])['Sales'].sum().unstack()
print(pivot_table)
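To make the three steps easier to follow, the chained call can be split so each intermediate result can be inspected. This is a sketch for illustration only; the names grouped, totals, via_groupby and via_pivot are introduced here and are not part of the original code. It also compares the result with pivot_table() on the same data, which should agree for this dataset.

```python
import pandas as pd

sales_data = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Date': ['2019-01-01', '2019-01-01', '2019-01-01',
             '2019-01-02', '2019-01-02', '2019-01-02'],
    'Sales': [100, 200, 300, 150, 250, 350],
})

# Step 1: group the rows by product and date.
grouped = sales_data.groupby(['Product', 'Date'])

# Step 2: sum Sales in each group; the result is a Series
# indexed by (Product, Date) pairs.
totals = grouped['Sales'].sum()
print(totals)

# Step 3: move the Date level of the index into the columns.
via_groupby = totals.unstack()
print(via_groupby)

# The same table built with pivot_table(), for comparison.
via_pivot = sales_data.pivot_table(index='Product', columns='Date',
                                   values='Sales', aggfunc='sum')

# Compare cell values and row/column labels of the two tables.
same_values = bool((via_groupby.to_numpy() == via_pivot.to_numpy()).all())
same_labels = (list(via_groupby.index) == list(via_pivot.index)
               and list(via_groupby.columns) == list(via_pivot.columns))
print(same_values and same_labels)
```

Splitting the chain this way is mainly useful for debugging: if the final table looks wrong, printing totals shows whether the problem lies in the grouping or in the reshaping.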