How to change the order of DataFrame columns?

Translated from: How to change the order of DataFrame columns?

I have the following DataFrame (df):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))

I add more column(s) by assignment:

df['mean'] = df.mean(1)

How can I move the column mean to the front, i.e. set it as the first column while leaving the order of the other columns untouched?


Reply #1

Reference: https://stackoom.com/question/tAVR/如何更改DataFrame列的顺序


Reply #2

One easy way is to reassign the dataframe using a list of its columns, rearranged as needed.

This is what you have now:

In [6]: df
Out[6]:
          0         1         2         3         4      mean
0  0.445598  0.173835  0.343415  0.682252  0.582616  0.445543
1  0.881592  0.696942  0.702232  0.696724  0.373551  0.670208
2  0.662527  0.955193  0.131016  0.609548  0.804694  0.632596
3  0.260919  0.783467  0.593433  0.033426  0.512019  0.436653
4  0.131842  0.799367  0.182828  0.683330  0.019485  0.363371
5  0.498784  0.873495  0.383811  0.699289  0.480447  0.587165
6  0.388771  0.395757  0.745237  0.628406  0.784473  0.588529
7  0.147986  0.459451  0.310961  0.706435  0.100914  0.345149
8  0.394947  0.863494  0.585030  0.565944  0.356561  0.553195
9  0.689260  0.865243  0.136481  0.386582  0.730399  0.561593

In [7]: cols = df.columns.tolist()

In [8]: cols
Out[8]: [0, 1, 2, 3, 4, 'mean']

Rearrange cols in any way you want. This is how I moved the last element to the first position:

In [12]: cols = cols[-1:] + cols[:-1]

In [13]: cols
Out[13]: ['mean', 0, 1, 2, 3, 4]

Then reorder the dataframe like this:

In [16]: df = df[cols]  #    OR    df = df.loc[:, cols]

In [17]: df
Out[17]:
       mean         0         1         2         3         4
0  0.445543  0.445598  0.173835  0.343415  0.682252  0.582616
1  0.670208  0.881592  0.696942  0.702232  0.696724  0.373551
2  0.632596  0.662527  0.955193  0.131016  0.609548  0.804694
3  0.436653  0.260919  0.783467  0.593433  0.033426  0.512019
4  0.363371  0.131842  0.799367  0.182828  0.683330  0.019485
5  0.587165  0.498784  0.873495  0.383811  0.699289  0.480447
6  0.588529  0.388771  0.395757  0.745237  0.628406  0.784473
7  0.345149  0.147986  0.459451  0.310961  0.706435  0.100914
8  0.553195  0.394947  0.863494  0.585030  0.565944  0.356561
9  0.561593  0.689260  0.865243  0.136481  0.386582  0.730399
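
The steps above can be collected into one short script. This is a minimal sketch that assumes Python 3 and a reasonably recent pandas; the seeded random generator is an addition for repeatability and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is repeatable (an assumption for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 5)))
df['mean'] = df.mean(axis=1)

# Take the column labels as a plain list and rotate the last one to the front.
cols = df.columns.tolist()
cols = cols[-1:] + cols[:-1]

# Selecting with the reordered list returns a frame in that column order.
df = df[cols]

print(df.columns.tolist())
```

Because only the list of labels is rearranged, the same pattern works for any other ordering you can express as a list operation.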

Reply #3

How about:

df.insert(0, 'mean', df.mean(1))

http://pandas.pydata.org/pandas-docs/stable/dsintro.html#column-selection-addition-deletion
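
One caveat worth illustrating: insert adds a new column at a given position, so by default it raises an error if a column with that name is already present. The sketch below shows both situations. Combining it with pop to move an existing column is a suggestion added here, not something the original answer states, and the names df2 and the use of axis=1 are illustrative choices.

```python
import numpy as np
import pandas as pd

# Case 1: the column does not exist yet, so insert it directly at position 0.
df = pd.DataFrame(np.random.rand(10, 5))
df.insert(0, 'mean', df.mean(axis=1))

# Case 2: the column already exists at the end, as in the question.
df2 = pd.DataFrame(np.random.rand(10, 5))
df2['mean'] = df2.mean(axis=1)

# pop() removes the column and returns it; insert() then places it in front.
df2.insert(0, 'mean', df2.pop('mean'))

print(df.columns.tolist())
print(df2.columns.tolist())
```

Note that insert modifies the frame in place and returns None, so its result should not be assigned back to df.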


Reply #4

This question has been answered before, but reindex_axis is now deprecated, so I would suggest using:

df.reindex(sorted(df.columns), axis=1)
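
Note that sorted(df.columns) orders the labels rather than moving one column to the front, and under Python 3 it cannot compare the integer labels from the question with the string 'mean'. A minimal sketch of using reindex with an explicit label order instead, assuming the frame from the question (the name new_order is made up for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))
df['mean'] = df.mean(axis=1)

# Build the desired label order explicitly: 'mean' first, the rest unchanged.
new_order = ['mean'] + [c for c in df.columns if c != 'mean']

# reindex returns a new frame whose columns follow the given label order.
df = df.reindex(new_order, axis=1)

print(df.columns.tolist())
```

Any label in the list that is not an existing column would come back as a column of NaN, so it is worth deriving the list from df.columns as done here.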

Reply #5

You could also do something like this:

df = df[['mean', '0', '1', '2', '3']]

You can get the list of columns with:

cols = list(df.columns.values)

This will produce:

['0', '1', '2', '3', 'mean']

...which is then easy to rearrange manually before dropping it into the first statement above.


Reply #6

This function saves you from having to list every variable in your dataset just to reorder a few of them.

def order(frame, var):
    # Accept either a single column name or a list of names.
    if isinstance(var, str):
        var = [var]
    # The remaining columns keep their original relative order.
    varlist = [w for w in frame.columns if w not in var]
    return frame[var + varlist]

It takes two arguments: the first is the dataset, the second is the column or list of columns that you want to bring to the front.

So in my case I have a data set called frame with variables A1, A2, B1, B2, Total and Date. If I want to bring Total to the front then all I have to do is:

frame = order(frame,['Total'])

If I want to bring Total and Date to the front then I do:

frame = order(frame,['Total','Date'])

EDIT:

Another useful way to use this: if you have an unfamiliar table and you are looking for variables with a particular term in their names, like VAR1, VAR2, ..., you can run something like:

frame = order(frame,[v for v in frame.columns if "VAR" in v])
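
To see the helper at work, here is a small self-contained sketch. The sample values are made up for illustration; only the column names come from the description above, and the variable names front_total, front_both and front_b are hypothetical.

```python
import pandas as pd

def order(frame, var):
    # Accept either a single column name or a list of names.
    if isinstance(var, str):
        var = [var]
    rest = [w for w in frame.columns if w not in var]
    return frame[var + rest]

# Hypothetical data; only the column names follow the text.
frame = pd.DataFrame({
    'A1': [1, 2], 'A2': [3, 4], 'B1': [5, 6], 'B2': [7, 8],
    'Total': [16, 20], 'Date': ['2020-01-01', '2020-01-02'],
})

front_total = order(frame, 'Total')
front_both = order(frame, ['Total', 'Date'])
# Bring every column whose name contains "B" to the front.
front_b = order(frame, [c for c in frame.columns if 'B' in c])

print(list(front_total.columns))
print(list(front_both.columns))
print(list(front_b.columns))
```

The substring test assumes every column label is a string; with integer labels such as those in the question it would need a str(c) conversion first.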

Reposted from blog.csdn.net/w36680130/article/details/105467232