Reshaping and pivoting only change the layout of a data table; they do not change the data itself:
- stack (column index -> row index) and unstack (row index -> column index) are used for reshaping, and they are inverse operations of each other
- pivot and melt are used for pivoting, and they are also inverse operations of each other
Reshape
Basic demo
Now let’s create some data
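import pandas as pd

symbol = ['JD', 'AAPL']
# Column labels: 行业 = industry, 价格 = price, 交易量 = trading volume
data = {
    '行业': ['电商', '科技'],  # e-commerce, technology
    '价格': [25.95, 172.97],
    '交易量': [27113291, 18913154]}
df = pd.DataFrame(data, index=symbol)
df.columns.name = '特征'  # "feature"
df.index.name = '代号'    # "symbol"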
Use stack() and unstack()
q = df.stack()    # or pass the level name explicitly: df.stack('特征')
q = df.unstack()  # or pass the level name explicitly: df.unstack('代号')
[Screenshot: output of df.stack() and df.unstack()]
Look carefully at the output above. The stack method moves the column index of df into the row index, which becomes a multi-level index (MultiIndex). The unstack method moves the row index of df into the column index. Because this df has only one row level and one column level, both calls leave a single axis behind, so each result is a Series with a MultiIndex.
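The behaviour described above can be checked without the screenshot. The following is a minimal sketch that rebuilds the same df and prints the type and index level names of each result; the variable names stacked, unstacked and restored are only illustrative.

```python
import pandas as pd

symbol = ['JD', 'AAPL']
data = {'行业': ['电商', '科技'],
        '价格': [25.95, 172.97],
        '交易量': [27113291, 18913154]}
df = pd.DataFrame(data, index=symbol)
df.columns.name = '特征'
df.index.name = '代号'

stacked = df.stack()      # column labels become the inner row level
unstacked = df.unstack()  # row labels become the inner level, column labels the outer one

print(type(stacked).__name__, list(stacked.index.names))
print(type(unstacked).__name__, list(unstacked.index.names))

# Unstacking the level that stack() added gives back a table shaped like df
restored = stacked.unstack('特征')
print(restored)
```

Both results hold the same six values; only the order of the two index levels differs.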
Multi-level DataFrame
data = [
    ['电商', 101550, 176.92],
    ['电商', 175336, 25.95],
    ['金融', 60348, 41.79],
    ['金融', 36600, 196.00]]
# Note: the `labels` argument was renamed to `codes` (pandas 0.24) and later removed
midx = pd.MultiIndex(levels=[['中国', '美国'], ['BABA', 'JD', 'GS', 'MS']],
                     codes=[[0, 0, 1, 1], [0, 1, 2, 3]],
                     names=['地区', '代号'])
mcol = pd.Index(['行业', '雇员', '价格'], name='特征')
df = pd.DataFrame(data, index=midx, columns=mcol)
In this DataFrame, the first-level row index is the region 地区 [中国 (China), 美国 (the United States)], the second-level row index is the ticker symbol 代号 [BABA, JD, GS, MS], and the column index is the feature 特征 [行业 (industry), 雇员 (employees), 价格 (price)].
Use unstack on a multi-level index (row -> column)
df.unstack(0)
df.unstack(1)
In both results the unstacked row level is appended to the column index, so the column index becomes a multi-level index while the remaining row level stays as the row index.
stack and unstack can also be chained:
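To make the two results concrete, here is a small sketch that rebuilds the same multi-level df and prints the index names and shapes. It builds the index with pd.MultiIndex.from_arrays, which is just another way of constructing the same index, and the names by_region and by_symbol are only illustrative.

```python
import pandas as pd

data = [['电商', 101550, 176.92],
        ['电商', 175336, 25.95],
        ['金融', 60348, 41.79],
        ['金融', 36600, 196.00]]
midx = pd.MultiIndex.from_arrays(
    [['中国', '中国', '美国', '美国'], ['BABA', 'JD', 'GS', 'MS']],
    names=['地区', '代号'])
mcol = pd.Index(['行业', '雇员', '价格'], name='特征')
df = pd.DataFrame(data, index=midx, columns=mcol)

by_region = df.unstack(0)  # move row level 0 ('地区') into the columns
by_symbol = df.unstack(1)  # move row level 1 ('代号') into the columns

print(by_region.index.name, list(by_region.columns.names), by_region.shape)
print(by_symbol.index.name, list(by_symbol.columns.names), by_symbol.shape)
```

Combinations that do not exist in the original data, such as GS in 中国, are filled with NaN.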
df.unstack(0).stack(1)
After this call the row index has the levels (代号, 地区) and the column index is 特征 again.
Many other combinations are possible; if you are interested, try more of them yourself!
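The chained call can be tried out with the sketch below, which rebuilds the same df. The final dropna step is my own addition and is not part of the original example: depending on the pandas version, stack may keep region/symbol pairs that have no data, and dropping the all-NaN rows leaves the four original records either way.

```python
import pandas as pd

data = [['电商', 101550, 176.92],
        ['电商', 175336, 25.95],
        ['金融', 60348, 41.79],
        ['金融', 36600, 196.00]]
midx = pd.MultiIndex.from_arrays(
    [['中国', '中国', '美国', '美国'], ['BABA', 'JD', 'GS', 'MS']],
    names=['地区', '代号'])
df = pd.DataFrame(data, index=midx,
                  columns=pd.Index(['行业', '雇员', '价格'], name='特征'))

# unstack(0) moves '地区' into the columns; stack(1) moves it back into the rows,
# this time as the inner row level
result = df.unstack(0).stack(1)
print(list(result.index.names), result.columns.name)

# Assumption: drop rows that contain no data at all (see the note above)
compact = result.dropna(how='all')
print(compact)
```

The net effect is the original table with its two row levels swapped.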
Pivot
Pivoting rearranges a table so that it can be viewed from a different angle, much as perspective is a technique for depicting the spatial relationship of objects on a plane or curved surface. A source table usually contains only rows and columns, so the same values are often repeated within a column, which makes it hard for the table to convey useful information. In that case you can pivot the source table to adjust its layout and get a clearer display.
There are two main pivoting methods in Pandas:
- Use the pivot function to turn "a long table" into "multiple wide tables"
- Use melt to turn "multiple wide tables" into "one long table"
pivot_table
The pivot() method only reshapes: it turns column values into the row index and the column index, without aggregating anything. pivot_table() is considerably more flexible, so it is introduced first. It is available both as a top-level function (pd.pivot_table) and as a DataFrame method (df.pivot_table).
Let's look at its signature first:
pd.pivot_table(data, values=None,
index=None, columns=None,
aggfunc='mean', fill_value=None,
margins=False,dropna=True,
margins_name='All')
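Parameter description:
parameter | description |
---|---|
data | The DataFrame to pivot |
values | Column(s) to aggregate |
index | Column(s) whose values are used as the row index (group keys) of the result |
columns | Column(s) whose values are used as the column index (group keys) of the result |
aggfunc | Aggregation function, list of functions, or dict mapping a column to its function(s); defaults to 'mean' |
fill_value | Value used to replace missing values in the result |
margins | If True, add a row and a column of subtotals and grand totals |
dropna | If True, omit columns whose entries are all NaN |
margins_name | Name of the totals row and column when margins is True |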
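Example (df is assumed here to contain the columns Manager, Status, Product, Quantity and Price):
import pandas as pd
table = pd.pivot_table(df, index=["Manager", "Status"], columns=["Product"],
                       values=["Quantity", "Price"],
                       # count the Quantity entries; sum and average the Price entries
                       aggfunc={"Quantity": len, "Price": ["sum", "mean"]},
                       fill_value='w')  # fill missing cells with the placeholder 'w'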
pivot
pivot takes the following parameters:
parameter | description |
---|---|
index | Column whose values become the row index of the reshaped table |
columns | Column whose values become the column names of the reshaped table |
values | Column(s) whose values fill the cells of the reshaped table |
First create some data:
import pandas as pd
import numpy as np
df = pd.DataFrame(
{
'a':np.arange(5),
'b':np.arange(10,20,2),
'c':np.asarray(['A','B','C','D','E'])
})
df.pivot(index='c', columns='a',values='b')
That is the basic usage. The reshaped result can be stored in a variable and used as a new table.
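As a rough sketch of storing the result, and of the difference between pivot and pivot_table mentioned earlier, the example below keeps the reshaped table in a variable and then feeds pivot a pair of duplicate rows. The dup frame is hypothetical and only serves to trigger the error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': np.arange(5),
    'b': np.arange(10, 20, 2),
    'c': np.asarray(['A', 'B', 'C', 'D', 'E']),
})

wide = df.pivot(index='c', columns='a', values='b')  # keep the reshaped table
flat = wide.reset_index()                            # 'c' becomes an ordinary column again

# pivot() needs every (index, columns) pair to be unique;
# a repeated pair raises ValueError
dup = pd.DataFrame({'c': ['A', 'A'], 'a': [0, 0], 'b': [10, 30]})
try:
    dup.pivot(index='c', columns='a', values='b')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the duplicates instead (mean by default)
agg = dup.pivot_table(index='c', columns='a', values='b')
print(pivot_failed, agg.loc['A', 0])
```

Cells of wide that have no matching (c, a) pair in df are NaN.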
melt
melt can be understood as the inverse of pivot. Its parameters are as follows:
parameter | description |
---|---|
id_vars | Columns that are not converted; they are kept as identifier columns after the conversion |
value_vars | Existing columns to convert (unpivot) |
var_name | Name of the new column that holds the names of the 'value_vars' columns |
value_name | Name of the new column that holds the data of the 'value_vars' columns |
col_level | If the columns are a MultiIndex, use this level to melt |
# Continuing with the data above
# var_name "列名" means "column name"; value_name "数据" means "data"
df.melt(id_vars="c",value_vars=["b","a"],var_name="列名",value_name="数据")
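To see that melt and pivot really undo each other, the following sketch melts the same data into a long table and pivots it back. The variable names long_df and back are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': np.arange(5),
    'b': np.arange(10, 20, 2),
    'c': np.asarray(['A', 'B', 'C', 'D', 'E']),
})

# Wide -> long: one row per (c, original column) pair
long_df = df.melt(id_vars="c", value_vars=["b", "a"],
                  var_name="列名", value_name="数据")
print(long_df)

# Long -> wide: the former column names become columns again
back = long_df.pivot(index="c", columns="列名", values="数据")
print(back)
```

The long table has ten rows, five for each melted column, and pivoting it restores the columns a and b indexed by c.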