Pandas reshaping and pivoting

Reshaping and pivoting change only the layout of a data table, not its contents:

  • stack (column index -> row index) and unstack (row index -> column index) reshape the table; they are inverses of each other
  • pivot and melt pivot the table; they are also inverses of each other

Reshape

Basic demo

Now let’s create some data

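import pandas as pd

symbol = ['JD', 'AAPL']
# Column labels: 行业 = industry, 价格 = price, 交易量 = trading volume
data = {
    '行业': ['电商', '科技'],  # e-commerce, technology
    '价格': [25.95, 172.97],
    '交易量': [27113291, 18913154],
}
df = pd.DataFrame(data, index=symbol)
df.columns.name = '特征'  # "feature"
df.index.name = '代号'    # "symbol"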

Using stack() and unstack()

q = df.stack()    # or name the level explicitly: df.stack('特征')
q = df.unstack()  # or name the level explicitly: df.unstack('代号')


Compare the two results carefully. stack moves the column index of df into the row index, and unstack moves the row index of df into the column index. Because df has only one level on each axis, both calls return a Series with a two-level index (MultiIndex): (代号, 特征) for stack and (特征, 代号) for unstack.
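The index movement described above can be checked directly. The following is a minimal sketch, assuming English labels ("symbol", "feature", "price", "volume") in place of the originals and only numeric columns; it is an illustration, not the article's exact table.

```python
import pandas as pd

# English labels stand in for the originals (an assumption made for readability)
df = pd.DataFrame(
    {"price": [25.95, 172.97], "volume": [27113291, 18913154]},
    index=pd.Index(["JD", "AAPL"], name="symbol"),
)
df.columns.name = "feature"

stacked = df.stack()      # column labels move into the row index
unstacked = df.unstack()  # row labels move to the column side; no row level remains, so a Series comes back

print(list(stacked.index.names))
print(list(unstacked.index.names))
print(stacked[("JD", "price")], unstacked[("price", "AAPL")])
```

The same cell is reachable in both results; only the order of the index levels differs.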

Multi-level DataFrame

data = [
    ['电商', 101550, 176.92],
    ['电商', 175336, 25.95],
    ['金融', 60348, 41.79],
    ['金融', 36600, 196.00],
]
# `codes` replaces the `labels` argument, which was removed in pandas 1.0
midx = pd.MultiIndex(levels=[['中国', '美国'], ['BABA', 'JD', 'GS', 'MS']],
                     codes=[[0, 0, 1, 1], [0, 1, 2, 3]],
                     names=['地区', '代号'])  # region, symbol
mcol = pd.Index(['行业', '雇员', '价格'], name='特征')  # industry, employees, price
df = pd.DataFrame(data, index=midx, columns=mcol)

The first-level row index is the region (地区): [中国 (China), 美国 (United States)]. The second-level row index is the symbol (代号): [BABA, JD, GS, MS]. The column index is the feature (特征): [行业 (industry), 雇员 (employees), 价格 (price)].

Using unstack on a multi-level index (row -> column)

df.unstack(0)  # move row level 0 (地区) into the columns
df.unstack(1)  # move row level 1 (代号) into the columns

In both results the column index becomes a multi-level index: the original 特征 level plus the row level that was unstacked. The remaining row level stays as the row index.
The two operations can also be chained:

df.unstack(0).stack(1)

After chaining, the row index is a MultiIndex of (代号, 地区) and the column index is the single 特征 level again.
Many other combinations are possible, and they are worth trying out.
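To make the level movements concrete, here is a small sketch that rebuilds a table of the same shape and inspects the resulting indexes. The English labels and the use of MultiIndex.from_tuples are assumptions made for readability; the numeric values are taken from the table above.

```python
import pandas as pd

# English labels stand in for the originals (an assumption made for readability)
midx = pd.MultiIndex.from_tuples(
    [("China", "BABA"), ("China", "JD"), ("US", "GS"), ("US", "MS")],
    names=["region", "symbol"],
)
df = pd.DataFrame(
    [[101550, 176.92], [175336, 25.95], [60348, 41.79], [36600, 196.00]],
    index=midx,
    columns=pd.Index(["employees", "price"], name="feature"),
)

by_region = df.unstack(0)            # region moves from the rows to the columns
by_symbol = df.unstack("symbol")     # a level can be given by name as well as by position
round_trip = df.unstack(0).stack(1)  # region moves to the columns, then back to the rows

print(list(by_region.columns.names), by_region.shape)
print(list(by_symbol.columns.names), by_symbol.shape)
print(list(round_trip.index.names))
```

Cells with no matching (region, symbol) pair are filled with NaN, which is why the unstacked tables are wider than the original.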

Pivoting

The word for "pivot" in the original literally means "perspective": a technique for depicting the spatial relationships of objects on a flat or curved surface. A source table usually has only rows and columns, so the same values tend to repeat within each column, which makes it hard to read anything useful from the table. Pivoting rearranges the layout of the source table so that the information is displayed more clearly.
pandas has two main functions for pivoting:

  • Use the pivot function to turn "a long table" into "multiple wide tables"
  • Use melt to turn "multiple wide tables" into "one long table"
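The long-to-wide and wide-to-long directions can be sketched as a round trip. The table below is hypothetical (dates, symbols and prices are invented for illustration), and the variable names are assumptions.

```python
import pandas as pd

# Hypothetical long table: one row per (date, symbol) observation
long_df = pd.DataFrame({
    "date": ["2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03"],
    "symbol": ["JD", "AAPL", "JD", "AAPL"],
    "price": [35.0, 75.0, 36.0, 74.0],
})

# Long -> wide: one row per date, one column per symbol
wide_df = long_df.pivot(index="date", columns="symbol", values="price")

# Wide -> long: the symbol columns collapse back into (symbol, price) pairs
back = wide_df.reset_index().melt(id_vars="date", var_name="symbol", value_name="price")

print(wide_df)
print(back)
```

The melted table holds the same rows as the starting table, although not necessarily in the same order.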

pivot_table

pivot() only reshapes: it turns column values into row and column indexes. pivot_table() is considerably more flexible and is introduced first. It is available both as a top-level function (pd.pivot_table) and as a DataFrame method (df.pivot_table).
The example uses a source table with the columns Manager, Status, Product, Quantity and Price.

pd.pivot_table(data, values=None,
               index=None, columns=None,
               aggfunc='mean', fill_value=None,
               margins=False, dropna=True,
               margins_name='All')

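Parameter description:

Parameter Description
data The source DataFrame
values Column or columns to aggregate
index Column or columns whose values become the row index
columns Column or columns whose values become the column index
aggfunc Aggregation function, list of functions, or dict mapping a column to its function(s) (default 'mean')
fill_value Value used to replace missing cells in the result
margins If True, add a row and a column of subtotals
dropna If True, leave out columns whose entries are all NaN
margins_name Label of the subtotal row and column when margins=True (default 'All')

Example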

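import pandas as pd
# df is the source table with Manager, Status, Product, Quantity and Price columns
table = pd.pivot_table(
    df, index=["Manager", "Status"], columns=["Product"],
    values=["Quantity", "Price"],
    # Quantity: number of rows; Price: sum and mean
    aggfunc={"Quantity": len, "Price": ["sum", "mean"]},
    fill_value='w',  # written into cells that have no data
)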

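Since the source table is not reproduced here, the following sketch uses a small hypothetical sales table to show pivot_table aggregating. The names, products and figures are invented for illustration; only the column names follow the example above.

```python
import pandas as pd

# Hypothetical sales records; not the article's source table
sales = pd.DataFrame({
    "Manager":  ["Ann", "Ann", "Ann", "Bob", "Bob"],
    "Product":  ["CPU", "CPU", "Monitor", "CPU", "Monitor"],
    "Quantity": [1, 2, 1, 3, 2],
    "Price":    [100.0, 200.0, 50.0, 300.0, 80.0],
})

# Total price per manager and product; margins=True adds the "All" totals
table = pd.pivot_table(
    sales, index="Manager", columns="Product",
    values="Price", aggfunc="sum", margins=True,
)

# A dict applies a different aggregation to each value column
by_manager = pd.pivot_table(
    sales, index="Manager", values=["Quantity", "Price"],
    aggfunc={"Quantity": "sum", "Price": "mean"},
)

print(table)
print(by_manager)
```

Ann's two CPU rows collapse into a single cell, which is the aggregation step that plain pivot does not perform.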

pivot

pivot takes three main parameters:

Parameter Description
index Column whose values become the row index of the reshaped table
columns Column whose values become the column labels of the reshaped table
values Column or columns whose values fill the cells of the reshaped table

First create some data:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'a': np.arange(5),
    'b': np.arange(10, 20, 2),
    'c': np.asarray(['A', 'B', 'C', 'D', 'E']),
})
# Rows labelled by c, one column per value of a, cells filled from b
df.pivot(index='c', columns='a', values='b')

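That is the basic usage. Each (c, a) pair occurs exactly once, so the result is a 5 x 5 table with the b values on the diagonal and NaN everywhere else. The reshaped result can be assigned to a variable and used as a new table.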
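Because pivot only reshapes, it has no way to handle two rows that land in the same cell. The sketch below uses a hypothetical table with one duplicated (c, a) pair to show pivot raising an error and pivot_table aggregating instead.

```python
import pandas as pd

# Hypothetical data in which the pair (c='A', a=0) appears twice
dup = pd.DataFrame({"a": [0, 0, 1], "b": [10, 12, 14], "c": ["A", "A", "B"]})

try:
    dup.pivot(index="c", columns="a", values="b")
    pivot_failed = False
except ValueError:
    # pivot cannot decide which of the two b values belongs in the cell
    pivot_failed = True

# pivot_table resolves the clash by aggregating (the mean, by default)
agg = dup.pivot_table(index="c", columns="a", values="b")

print(pivot_failed)
print(agg)
```

When duplicates are expected, pivot_table with an explicit aggfunc is usually the safer choice.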

melt

melt can be understood as the inverse of pivot. Its parameters are as follows:

Parameter Description
id_vars Columns that are not melted; they are kept as identifier columns
value_vars Existing columns to melt
var_name Name of the new column that holds the names of the melted columns
value_name Name of the new column that holds the values of the melted columns
col_level If the columns are a MultiIndex, the level to melt

# Continuing with the data above
# 列名 = "column name", 数据 = "data"
df.melt(id_vars="c", value_vars=["b", "a"], var_name="列名", value_name="数据")
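The shape of the melted result can be verified with a short sketch. It rebuilds the same a, b, c data; the English names "column" and "value" are assumptions that replace the original var_name and value_name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": np.arange(5),
    "b": np.arange(10, 20, 2),
    "c": list("ABCDE"),
})

# English names stand in for the original var_name/value_name (an assumption)
long_df = df.melt(id_vars="c", value_vars=["b", "a"],
                  var_name="column", value_name="value")

print(long_df.shape)
print(long_df)
```

Each of the 5 original rows yields one output row per melted column, and the rows follow the order given in value_vars: all of b first, then all of a.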


Origin blog.csdn.net/qq_44091773/article/details/106113566