Pandas Study Notes 4: Reshaping

Task04: Reshaping (2 days)

Pandas Basics

Reshaping

import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# These two lines let a cell display multiple variables without print
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
df = pd.read_csv('data/table.csv')
df.head()
School Class ID Gender Address Height Weight Math Physics
0 S_1 C_1 1101 M street_1 173 63 34.0 A+
1 S_1 C_1 1102 F street_2 192 73 32.5 B+
2 S_1 C_1 1103 M street_2 186 82 87.2 B+
3 S_1 C_1 1104 F street_2 167 81 80.4 B-
4 S_1 C_1 1105 F street_4 159 64 84.8 B+

I. Pivot Tables

1. pivot

Normally the data in a DataFrame is stored in a stacked (long) state; for example, the Gender column above stacks two categories into a single column. The pivot function can spread such a column out into new columns:

df.pivot(index='ID',columns='Gender',values='Height').head()
Gender F M
ID
1101 NaN 173.0
1102 192.0 NaN
1103 NaN 186.0
1104 167.0 NaN
1105 159.0 NaN

However, pivot has strong limitations: besides its small feature set, it does not allow duplicate (index, column) pairs among the values being reshaped. For example, the following statement raises an error:

#df.pivot(index='School',columns='Gender',values='Height').head()

Hence, the more powerful pivot_table function is usually preferred.
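To make the limitation concrete, here is a minimal sketch on hypothetical toy data (names and values assumed): pivot raises a ValueError on a duplicate (index, column) pair, while pivot_table aggregates the duplicates instead.

```python
import pandas as pd

# Hypothetical toy data: two rows share the pair (School='S_1', Gender='M')
toy = pd.DataFrame({'School': ['S_1', 'S_1', 'S_2'],
                    'Gender': ['M', 'M', 'F'],
                    'Height': [170, 180, 160]})

# pivot refuses duplicate index/column pairs
raised = False
try:
    toy.pivot(index='School', columns='Gender', values='Height')
except ValueError as err:
    raised = True
    print('pivot failed:', err)

# pivot_table aggregates the duplicates instead (mean by default)
table = pd.pivot_table(toy, index='School', columns='Gender', values='Height')
print(table)  # S_1/M becomes (170 + 180) / 2 = 175.0
```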

2. pivot_table

First, reproduce the operation above:

pd.pivot_table(df,index='ID',columns='Gender',values='Height').head()
Gender F M
ID
1101 NaN 173.0
1102 192.0 NaN
1103 NaN 186.0
1104 167.0 NaN
1105 159.0 NaN

Since it does more, pivot_table is naturally slower than the plain pivot function:

%timeit df.pivot(index='ID',columns='Gender',values='Height')
%timeit pd.pivot_table(df,index='ID',columns='Gender',values='Height')
1.52 ms ± 3.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
6.51 ms ± 29.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

pivot_table offers many options; the commonly used parameters are introduced below:

① aggfunc: aggregates within each group; all kinds of functions can be passed, and the default is 'mean'

res = pd.pivot_table(df,
               index='School',
               columns='Gender',
               values='Height',
               aggfunc=['mean', 'sum'])
res.head()
mean sum
Gender F M F M
School
S_1 173.125000 178.714286 1385 1251
S_2 173.727273 172.000000 1911 1548
res.columns
MultiIndex([('mean', 'F'),
            ('mean', 'M'),
            ( 'sum', 'F'),
            ( 'sum', 'M')],
           names=[None, 'Gender'])
# Flatten the MultiIndex columns into a single level
res.columns = ['_'.join([i[1], i[0]]) for i in res.columns]
res
F_mean M_mean F_sum M_sum
School
S_1 173.125000 178.714286 1385 1251
S_2 173.727273 172.000000 1911 1548

② margins: add marginal (row/column) summaries

pd.pivot_table(df,
               index='School',
               columns='Gender',
               values='Height',
               aggfunc=['mean', 'sum'],
               margins=True).head()
# margins_name sets the label; default is 'All'
mean sum
Gender F M All F M All
School
S_1 173.125000 178.714286 175.733333 1385 1251 2636
S_2 173.727273 172.000000 172.950000 1911 1548 3459
All 173.473684 174.937500 174.142857 3296 2799 6095
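The margins_name parameter mentioned in the comment can be sketched on hypothetical toy data (values assumed), replacing the default 'All' label:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'School': ['S_1', 'S_1', 'S_2'],
                    'Gender': ['M', 'F', 'F'],
                    'Height': [170.0, 160.0, 165.0]})

t = pd.pivot_table(toy, index='School', columns='Gender', values='Height',
                   aggfunc='mean', margins=True, margins_name='Total')
print(t)  # the margin row/column is now labeled 'Total' instead of 'All'
```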

③ index, columns, and values can all be multi-level

pd.pivot_table(df,
               index=['School', 'Class'],
               columns=['Gender', 'Address'],
               values=['Height', 'Weight'])
Height Weight
Gender F M F M
Address street_1 street_2 street_4 street_5 street_6 street_7 street_1 street_2 street_4 street_5 street_6 street_7 street_1 street_2 street_4 street_5 street_6 street_7 street_1 street_2 street_4 street_5 street_6 street_7
School Class
S_1 C_1 NaN 179.5 159.0 NaN NaN NaN 173.0 186.0 NaN NaN NaN NaN NaN 77.0 64.0 NaN NaN NaN 63.0 82.0 NaN NaN NaN NaN
C_2 NaN NaN 176.0 162.0 167.0 NaN NaN NaN NaN 188.0 160.0 NaN NaN NaN 94.0 63.0 63.0 NaN NaN NaN NaN 68.0 53.0 NaN
C_3 175.0 NaN NaN 187.0 NaN NaN NaN 195.0 161.0 NaN NaN 188.0 57.0 NaN NaN 69.0 NaN NaN NaN 70.0 68.0 NaN NaN 82.0
S_2 C_1 NaN NaN NaN 159.0 161.0 NaN NaN NaN 163.5 NaN NaN 174.0 NaN NaN NaN 97.0 61.0 NaN NaN NaN 71.0 NaN NaN 84.0
C_2 NaN NaN NaN NaN NaN 188.5 175.0 NaN 155.0 193.0 NaN NaN NaN NaN NaN NaN NaN 76.5 74.0 NaN 91.0 100.0 NaN NaN
C_3 NaN NaN 157.0 NaN 164.0 190.0 NaN NaN 187.0 171.0 NaN NaN NaN NaN 78.0 NaN 81.0 99.0 NaN NaN 73.0 88.0 NaN NaN
C_4 NaN 176.0 NaN NaN 175.5 NaN NaN NaN NaN NaN NaN 166.0 NaN 73.0 NaN NaN 57.0 NaN NaN NaN NaN NaN NaN 82.0

3. crosstab (Cross Tabulation)

A cross tabulation is a special kind of pivot table, typically used for grouped frequency counts. For example, to count frequencies grouped by street and gender:

pd.crosstab(index=df['Address'],columns=df['Gender'])
# Shows the frequency of each gender on each street
Gender F M
Address
street_1 1 2
street_2 4 2
street_4 3 5
street_5 3 3
street_6 5 1
street_7 3 3

crosstab is also quite powerful (though it does not yet support multi-level grouping); some important parameters follow:

① values and aggfunc: aggregate selected data within each group; these two parameters must appear together

pd.crosstab(index=df['Address'],
            columns=df['Gender'],
            values=np.random.randint(1, 20, df.shape[0]),
            aggfunc='min')
# The default behavior is equivalent to:
#pd.crosstab(index=df['Address'],columns=df['Gender'],values=1,aggfunc='count')
Gender F M
Address
street_1 16 2
street_2 2 6
street_4 11 1
street_5 4 15
street_6 6 7
street_7 4 7

② Besides the margins parameter, crosstab introduces normalize, which accepts 'all', 'index', or 'columns'

# With the normalize parameter
pd.crosstab(index=df['Address'],
            columns=df['Gender'],
            normalize='all',
            margins=True)
Gender F M All
Address
street_1 0.028571 0.057143 0.085714
street_2 0.114286 0.057143 0.171429
street_4 0.085714 0.142857 0.228571
street_5 0.085714 0.085714 0.171429
street_6 0.142857 0.028571 0.171429
street_7 0.085714 0.085714 0.171429
All 0.542857 0.457143 1.000000
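The difference between the normalize options can be sketched on hypothetical toy data (values assumed): 'index' makes each row sum to 1, 'columns' makes each column sum to 1, and 'all' normalizes over the whole table.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'Address': ['s1', 's1', 's2', 's2'],
                    'Gender':  ['F',  'M',  'M',  'M']})

by_index = pd.crosstab(toy['Address'], toy['Gender'], normalize='index')
by_cols = pd.crosstab(toy['Address'], toy['Gender'], normalize='columns')

print(by_index)  # each row sums to 1
print(by_cols)   # each column sums to 1
```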

II. Other Reshaping Methods

1. melt

The melt function can be regarded as the inverse of pivot: it compresses unstacked data into stacked form, turning a "wide" DataFrame into a "long" one.

df_m = df[['ID','Gender','Math']]
df_m.head()
ID Gender Math
0 1101 M 34.0
1 1102 F 32.5
2 1103 M 87.2
3 1104 F 80.4
4 1105 F 84.8
df.pivot(index='ID',columns='Gender',values='Math').head()
Gender F M
ID
1101 NaN 34.0
1102 32.5 NaN
1103 NaN 87.2
1104 80.4 NaN
1105 84.8 NaN

In melt, id_vars gives the columns to keep as identifiers, while value_vars gives the group of columns to stack.

pivoted = df.pivot(index='ID', columns='Gender', values='Math')
result = pivoted.reset_index().melt(id_vars=['ID'],value_vars=['F','M'],value_name='Math')\
                     .dropna().set_index('ID').sort_index() # dropna removes the NaN rows
# To verify this matches df before pivoting, expand the chained methods step by step and inspect the intermediate results
result.head()
Gender Math
ID
1101 M 34.0
1102 F 32.5
1103 M 87.2
1104 F 80.4
1105 F 84.8
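The round trip can be verified end to end; here is a self-contained sketch on hypothetical toy data (IDs and scores assumed) that pivots, melts back, and compares with equals:

```python
import pandas as pd

# Hypothetical toy data mimicking df[['ID', 'Gender', 'Math']]
df_m = pd.DataFrame({'ID': [1101, 1102, 1103],
                     'Gender': ['M', 'F', 'M'],
                     'Math': [34.0, 32.5, 87.2]})

pivoted = df_m.pivot(index='ID', columns='Gender', values='Math')
result = (pivoted.reset_index()
                 .melt(id_vars=['ID'], value_vars=['F', 'M'], value_name='Math')
                 .dropna()                       # drop the NaN cells pivot created
                 .sort_values('ID')
                 .reset_index(drop=True))[['ID', 'Gender', 'Math']]

check = result.equals(df_m)  # same rows, columns, and dtypes
print(check)
```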

2. Stacking and Unstacking

(1) stack: the most basic reshaping function, with just two parameters: level and dropna

df_s = pd.pivot_table(df,
                      index=['Class', 'ID'],
                      columns='Gender',
                      values=['Height', 'Weight'])
# df_s
df_s.groupby('Class').head()
Height Weight
Gender F M F M
Class ID
C_1 1101 NaN 173.0 NaN 63.0
1102 192.0 NaN 73.0 NaN
1103 NaN 186.0 NaN 82.0
1104 167.0 NaN 81.0 NaN
1105 159.0 NaN 64.0 NaN
C_2 1201 NaN 188.0 NaN 68.0
1202 176.0 NaN 94.0 NaN
1203 NaN 160.0 NaN 53.0
1204 162.0 NaN 63.0 NaN
1205 167.0 NaN 63.0 NaN
C_3 1301 NaN 161.0 NaN 68.0
1302 175.0 NaN 57.0 NaN
1303 NaN 188.0 NaN 82.0
1304 NaN 195.0 NaN 70.0
1305 187.0 NaN 69.0 NaN
C_4 2401 192.0 NaN 62.0 NaN
2402 NaN 166.0 NaN 82.0
2403 158.0 NaN 60.0 NaN
2404 160.0 NaN 84.0 NaN
2405 193.0 NaN 54.0 NaN
df_stacked = df_s.stack()
df_stacked.groupby('Class').head(2)
Height Weight
Class ID Gender
C_1 1101 M 173.0 63.0
1102 F 192.0 73.0
C_2 1201 M 188.0 68.0
1202 F 176.0 94.0
C_3 1301 M 161.0 68.0
1302 F 175.0 57.0
C_4 2401 F 192.0 62.0
2402 M 166.0 82.0

The stack function can be viewed as moving a horizontal (column) index onto the vertical axis, so it works much like melt; the level parameter specifies which column level to move (or which levels, passed as a list)

df_stacked = df_s.stack(0)
df_stacked.groupby('Class').head(2)
Gender F M
Class ID
C_1 1101 Height NaN 173.0
Weight NaN 63.0
C_2 1201 Height NaN 188.0
Weight NaN 68.0
C_3 1301 Height NaN 161.0
Weight NaN 68.0
C_4 2401 Height 192.0 NaN
Weight 62.0 NaN

(2) unstack: the inverse of stack, functionally similar to pivot_table

df_stacked.head()
Gender F M
Class ID
C_1 1101 Height NaN 173.0
Weight NaN 63.0
1102 Height 192.0 NaN
Weight 73.0 NaN
1103 Height NaN 186.0
result = df_stacked.unstack().swaplevel(1,0,axis=1).sort_index(axis=1)
# result.equals(df_s)
# unstack likewise accepts a level parameter
result
Height Weight
Gender F M F M
Class ID
C_1 1101 NaN 173.0 NaN 63.0
1102 192.0 NaN 73.0 NaN
1103 NaN 186.0 NaN 82.0
1104 167.0 NaN 81.0 NaN
1105 159.0 NaN 64.0 NaN
2101 NaN 174.0 NaN 84.0
2102 161.0 NaN 61.0 NaN
2103 NaN 157.0 NaN 61.0
2104 159.0 NaN 97.0 NaN
2105 NaN 170.0 NaN 81.0
C_2 1201 NaN 188.0 NaN 68.0
1202 176.0 NaN 94.0 NaN
1203 NaN 160.0 NaN 53.0
1204 162.0 NaN 63.0 NaN
1205 167.0 NaN 63.0 NaN
2201 NaN 193.0 NaN 100.0
2202 194.0 NaN 77.0 NaN
2203 NaN 155.0 NaN 91.0
2204 NaN 175.0 NaN 74.0
2205 183.0 NaN 76.0 NaN
C_3 1301 NaN 161.0 NaN 68.0
1302 175.0 NaN 57.0 NaN
1303 NaN 188.0 NaN 82.0
1304 NaN 195.0 NaN 70.0
1305 187.0 NaN 69.0 NaN
2301 157.0 NaN 78.0 NaN
2302 NaN 171.0 NaN 88.0
2303 190.0 NaN 99.0 NaN
2304 164.0 NaN 81.0 NaN
2305 NaN 187.0 NaN 73.0
C_4 2401 192.0 NaN 62.0 NaN
2402 NaN 166.0 NaN 82.0
2403 158.0 NaN 60.0 NaN
2404 160.0 NaN 84.0 NaN
2405 193.0 NaN 54.0 NaN
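A minimal round-trip sketch on hypothetical toy data (values assumed): when the table has no missing entries, unstack right after stack recovers the original exactly; with all-NaN rows, stack's default dropna can make the trip lossy.

```python
import pandas as pd

# Toy wide table with no missing entries (values assumed)
df = pd.DataFrame({'F': [160.0, 158.0], 'M': [170.0, 180.0]},
                  index=pd.Index([1, 2], name='ID'))
df.columns.name = 'Gender'

stacked = df.stack()           # Series indexed by (ID, Gender)
restored = stacked.unstack()   # back to the wide form
print(restored.equals(df))     # True here: nothing was dropped along the way
```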

III. Dummy Variables and Factorization

1. Dummy Variables

This section mainly introduces the get_dummies function, whose main purpose is one-hot encoding:

pd.get_dummies() encodes a string (or other categorical) column as a set of numeric indicator vectors, i.e. the so-called one-hot encoding

df_d = df[['Class','Gender','Weight']]
df_d.head()
Class Gender Weight
0 C_1 M 63
1 C_1 F 73
2 C_1 M 82
3 C_1 F 81
4 C_1 F 64

Now we want to convert the first two columns of the table above into dummy variables and attach the third column, Weight:

pd.get_dummies(df_d[['Class', 'Gender']]).join(df_d['Weight']).head()
# The optional prefix parameter adds a prefix, and prefix_sep sets the separator
Class_C_1 Class_C_2 Class_C_3 Class_C_4 Gender_F Gender_M Weight
0 1 0 0 0 0 1 63
1 1 0 0 0 1 0 73
2 1 0 0 0 0 1 82
3 1 0 0 0 1 0 81
4 1 0 0 0 1 0 64
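The prefix and prefix_sep parameters mentioned in the comment can be sketched on hypothetical toy data (column names and values assumed); prefix also accepts a per-column dict:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'Class': ['C_1', 'C_2'], 'Gender': ['M', 'F']})

# prefix may be a string, a list, or a per-column dict; prefix_sep is the separator
dummies = pd.get_dummies(toy, prefix={'Class': 'cls', 'Gender': 'sex'},
                         prefix_sep='=')
print(list(dummies.columns))  # ['cls=C_1', 'cls=C_2', 'sex=F', 'sex=M']
```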

2. The factorize Method

This method performs natural-number (integer) encoding; missing values are coded as -1, and the sort parameter controls whether the unique values are sorted before codes are assigned

factorize literally means to decompose into factors. When a feature can take dozens or even hundreds of values, get_dummies becomes unwieldy: a feature with the 26 possible values ["a", "b", …, "z"] would expand into 26 columns, increasing the sparsity of the data with little benefit (depending on the situation). In that case pd.factorize() may be the better choice: it maps the values to the integers 0-25, and one string column remains a single column after the transformation, so the dataset does not become bloated.

codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'], sort=True)
display(codes)
display(uniques)
# The result is a tuple: (integer codes, unique original values)
array([ 1, -1,  0,  2,  1], dtype=int64)
array(['a', 'b', 'c'], dtype=object)
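The codes and uniques together let you reconstruct the original values; since -1 marks missing data (and would index the last element if used naively), guard it explicitly:

```python
import pandas as pd

codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'], sort=True)

# uniques[codes] would wrap -1 around to the last element, so guard missing codes
restored = [uniques[c] if c >= 0 else None for c in codes]
print(restored)  # ['b', None, 'a', 'c', 'b']
```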

IV. Questions and Exercises

1. Questions

[Question 1] Many reshaping functions were mentioned above, such as melt/crosstab/pivot/pivot_table/stack/unstack; summarize the characteristic uses of each.

[Question 2] How do the reshaping functions relate to multi-level indexes? Which reshaping functions change the number of index levels, and how exactly?

[Question 3] Give an example of a dummy-variable method other than the one covered above.

[Question 4] Does calling unstack immediately after stack always reproduce the original table exactly?

[Question 5] Pivot tables involved three functions; use each of them to accomplish the same (self-chosen) task and compare which is fastest.

[Question 6] Since melt already provides stack-like functionality, why design a separate stack function?

2. Exercises

[Exercise 1] Continue with the drug dataset from the previous chapter:

df_ = pd.read_csv('data/Drugs.csv')
df_.head()

df = pd.read_csv('data/Drugs.csv',
                 index_col=['State', 'COUNTY', 'SubstanceName'])
df.head()
YYYY State COUNTY SubstanceName DrugReports
0 2010 VA ACCOMACK Propoxyphene 1
1 2010 OH ADAMS Morphine 9
2 2010 PA ADAMS Methadone 2
3 2010 VA ALEXANDRIA CITY Heroin 5
4 2010 PA ALLEGHENY Hydromorphone 5
YYYY DrugReports
State COUNTY SubstanceName
VA ACCOMACK Propoxyphene 2010 1
OH ADAMS Morphine 2010 9
PA ADAMS Methadone 2010 2
VA ALEXANDRIA CITY Heroin 2010 5
PA ALLEGHENY Hydromorphone 2010 5

(a) Transform the table into the form below, where each row shows how a substance changed in a region from 2010 to 2017, with the first three columns sorted:

(image: picture/drug_pic.png, target table layout)

# Sort first
df = df.sort_index()
df.head()
YYYY DrugReports
State COUNTY SubstanceName
KY ADAIR Buprenorphine 2011 3
Buprenorphine 2012 5
Buprenorphine 2013 4
Buprenorphine 2014 27
Buprenorphine 2015 5
df_pivot = pd.pivot_table(df,
               index=['State', 'COUNTY', 'SubstanceName'],
               columns='YYYY',
               values='DrugReports').reset_index()
df_pivot.head()
YYYY State COUNTY SubstanceName 2010 2011 2012 2013 2014 2015 2016 2017
0 KY ADAIR Buprenorphine NaN 3.0 5.0 4.0 27.0 5.0 7.0 10.0
1 KY ADAIR Codeine NaN NaN 1.0 NaN NaN NaN NaN 1.0
2 KY ADAIR Fentanyl NaN NaN 1.0 NaN NaN NaN NaN NaN
3 KY ADAIR Heroin NaN NaN 1.0 2.0 NaN 1.0 NaN 2.0
4 KY ADAIR Hydrocodone 6.0 9.0 10.0 10.0 9.0 7.0 11.0 3.0
df_pivot = df_pivot.fillna('-')
df_pivot.head()
YYYY State COUNTY SubstanceName 2010 2011 2012 2013 2014 2015 2016 2017
0 KY ADAIR Buprenorphine - 3 5 4 27 5 7 10
1 KY ADAIR Codeine - - 1 - - - - 1
2 KY ADAIR Fentanyl - - 1 - - - - -
3 KY ADAIR Heroin - - 1 2 - 1 - 2
4 KY ADAIR Hydrocodone 6 9 10 10 9 7 11 3
df_pivot.rename_axis(columns={'YYYY': ''}).head()
State COUNTY SubstanceName 2010 2011 2012 2013 2014 2015 2016 2017
0 KY ADAIR Buprenorphine - 3 5 4 27 5 7 10
1 KY ADAIR Codeine - - 1 - - - - 1
2 KY ADAIR Fentanyl - - 1 - - - - -
3 KY ADAIR Heroin - - 1 2 - 1 - 2
4 KY ADAIR Hydrocodone 6 9 10 10 9 7 11 3

(b) Restore the result of (a) to the original table, and use the equals function to verify that the initial table matches the new result (returning True)

df_melt = df_pivot.reset_index().melt(
    id_vars=['State', 'COUNTY', 'SubstanceName'],
    value_vars=df_pivot.columns[3:],
    value_name='DrugReports')
df_melt.head()
State COUNTY SubstanceName YYYY DrugReports
0 KY ADAIR Buprenorphine 2010 -
1 KY ADAIR Codeine 2010 -
2 KY ADAIR Fentanyl 2010 -
3 KY ADAIR Heroin 2010 -
4 KY ADAIR Hydrocodone 2010 6
df_.head()
df_melt = df_melt.replace('-', np.nan).dropna().set_index(np.arange(len(df)))
df_melt.head()
YYYY State COUNTY SubstanceName DrugReports
0 2010 VA ACCOMACK Propoxyphene 1
1 2010 OH ADAMS Morphine 9
2 2010 PA ADAMS Methadone 2
3 2010 VA ALEXANDRIA CITY Heroin 5
4 2010 PA ALLEGHENY Hydromorphone 5
State COUNTY SubstanceName YYYY DrugReports
0 KY ADAIR Hydrocodone 2010 6.0
1 KY ADAIR Methadone 2010 1.0
2 KY ALLEN Hydrocodone 2010 10.0
3 KY ALLEN Methadone 2010 4.0
4 KY ALLEN Oxycodone 2010 15.0
df_melt = df_melt[list(df_melt.columns)].astype({'DrugReports': 'int'})
df_melt.head()
State COUNTY SubstanceName YYYY DrugReports
0 KY ADAIR Hydrocodone 2010 6
1 KY ADAIR Methadone 2010 1
2 KY ALLEN Hydrocodone 2010 10
3 KY ALLEN Methadone 2010 4
4 KY ALLEN Oxycodone 2010 15
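The final equals check the exercise asks for can be illustrated end to end; this self-contained sketch uses hypothetical toy data (states and counts assumed) and the same pivot_table/melt pipeline, aligning dtypes and row order before the comparison:

```python
import pandas as pd

# Hypothetical toy data mimicking the drug table
orig = pd.DataFrame({'State': ['KY', 'KY', 'OH'],
                     'SubstanceName': ['Heroin', 'Heroin', 'Codeine'],
                     'YYYY': [2010, 2011, 2010],
                     'DrugReports': [5, 7, 2]})

# (a): widen by year
wide = pd.pivot_table(orig, index=['State', 'SubstanceName'],
                      columns='YYYY', values='DrugReports').reset_index()

# (b): melt back, drop the NaNs pivoting created, restore dtypes and order
narrow = (wide.melt(id_vars=['State', 'SubstanceName'],
                    value_vars=[2010, 2011],
                    var_name='YYYY', value_name='DrugReports')
              .dropna()
              .astype({'YYYY': 'int64', 'DrugReports': 'int64'})
              .sort_values(['State', 'SubstanceName', 'YYYY'])
              .reset_index(drop=True))

check = narrow.equals(orig.sort_values(['State', 'SubstanceName', 'YYYY'])
                          .reset_index(drop=True))
print(check)
```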

[Exercise 2] Given a dataset on earthquakes in a certain region, solve the following:

df = pd.read_csv('data/Earthquake.csv')
df.head()
日期 时间 维度 经度 方向 距离 深度 烈度
0 2003.05.20 12:17:44 AM 39.04 40.38 west 0.1 10.0 0.0
1 2007.08.01 12:03:08 AM 40.79 30.09 west 0.1 5.2 4.0
2 1978.05.07 12:41:37 AM 38.58 27.61 south_west 0.1 0.0 0.0
3 1997.03.22 12:31:45 AM 39.47 36.44 south_west 0.1 10.0 0.0
4 2000.04.02 12:57:38 AM 40.80 30.24 south_west 0.1 7.0 0.0

(a) Transform the table into the form below, expanding the 方向 (direction) column and stacking the three attributes 距离 (distance), 深度 (depth), and 烈度 (magnitude):

(image: picture/earthquake_pic.png, target table layout)

df_ = pd.pivot_table(df,
               index=['日期', '时间', '维度', '经度'],
               columns='方向',
               values=['距离', '深度', '烈度'])
df_.head()
深度 烈度 距离
方向 east north north_east north_west south south_east south_west west east north north_east north_west south south_east south_west west east north north_east north_west south south_east south_west west
日期 时间 维度 经度
1912.08.09 12:29:00 AM 40.6 27.2 NaN NaN NaN NaN NaN 16.0 NaN NaN NaN NaN NaN NaN NaN 6.7 NaN NaN NaN NaN NaN NaN NaN 4.3 NaN NaN
1912.08.10 12:23:00 AM 40.6 27.1 NaN NaN NaN NaN NaN NaN 15.0 NaN NaN NaN NaN NaN NaN NaN 6.0 NaN NaN NaN NaN NaN NaN NaN 2.0 NaN
12:30:00 AM 40.6 27.1 NaN NaN NaN NaN NaN NaN 15.0 NaN NaN NaN NaN NaN NaN NaN 5.2 NaN NaN NaN NaN NaN NaN NaN 2.0 NaN
1912.08.11 12:19:04 AM 40.6 27.2 NaN NaN NaN NaN NaN 30.0 NaN NaN NaN NaN NaN NaN NaN 4.9 NaN NaN NaN NaN NaN NaN NaN 4.3 NaN NaN
12:20:00 AM 40.6 27.1 NaN NaN NaN NaN NaN NaN 15.0 NaN NaN NaN NaN NaN NaN NaN 4.5 NaN NaN NaN NaN NaN NaN NaN 2.0 NaN
df_ = df_.stack(level=0).fillna('-')
df_.head()
方向 east north north_east north_west south south_east south_west west
日期 时间 维度 经度
1912.08.09 12:29:00 AM 40.6 27.2 深度 - - - - - 16 - -
烈度 - - - - - 6.7 - -
距离 - - - - - 4.3 - -
1912.08.10 12:23:00 AM 40.6 27.1 深度 - - - - - - 15 -
烈度 - - - - - - 6 -
df_.index.names
FrozenList(['日期', '时间', '维度', '经度', None])
df_.index.names = ['日期', '时间', '维度', '经度', '地震参数']
df_.head()
方向 east north north_east north_west south south_east south_west west
日期 时间 维度 经度 地震参数
1912.08.09 12:29:00 AM 40.6 27.2 深度 - - - - - 16 - -
烈度 - - - - - 6.7 - -
距离 - - - - - 4.3 - -
1912.08.10 12:23:00 AM 40.6 27.1 深度 - - - - - - 15 -
烈度 - - - - - - 6 -

(b) Restore the result of (a) to the original table, and use the equals function to verify that the initial table matches the new result (returning True)

df_unstack = df_.unstack().stack(0).reset_index().replace('-', np.nan).dropna().sort_values(by=['日期', '时间', '维度', '经度'])\
                .reset_index().rename_axis(columns={'地震参数':''}).drop(columns='index')
df_unstack.head()
日期 时间 维度 经度 方向 深度 烈度 距离
0 1912.08.09 12:29:00 AM 40.6 27.2 south_east 16.0 6.7 4.3
1 1912.08.10 12:23:00 AM 40.6 27.1 south_west 15.0 6.0 2.0
2 1912.08.10 12:30:00 AM 40.6 27.1 south_west 15.0 5.2 2.0
3 1912.08.11 12:19:04 AM 40.6 27.2 south_east 30.0 4.9 4.3
4 1912.08.11 12:20:00 AM 40.6 27.1 south_west 15.0 4.5 2.0
cols = list(df_unstack.columns)
a, b, c = cols.index('深度'), cols.index('烈度'), cols.index('距离')
cols[a], cols[b], cols[c] = cols[c], cols[a], cols[b]
df_unstack = df_unstack[cols]
df_unstack.head()
日期 时间 维度 经度 方向 距离 深度 烈度
0 1912.08.09 12:29:00 AM 40.6 27.2 south_east 4.3 16.0 6.7
1 1912.08.10 12:23:00 AM 40.6 27.1 south_west 2.0 15.0 6.0
2 1912.08.10 12:30:00 AM 40.6 27.1 south_west 2.0 15.0 5.2
3 1912.08.11 12:19:04 AM 40.6 27.2 south_east 4.3 30.0 4.9
4 1912.08.11 12:20:00 AM 40.6 27.1 south_west 2.0 15.0 4.5

References

  1. Tutorial repository link
  2. Python for Data Analysis

Reprinted from blog.csdn.net/qq_42662568/article/details/105819507