Pandas Dataframe操作技巧

转载自勤奋的小青蛙

1:原有列基础生成新列

常见使用情景:两列相减的值为新的一列,或者多列操作生成新的一列

技巧:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

import pandas as pd

# make a simple dataframe

df = pd.DataFrame({'a':[1,2], 'b':[3,4]})

df

#    a  b

# 0  1  3

# 1  2  4

# this just creates an unattached column:

df.apply(lambda row: row.a + row.b, axis=1)

# 0    4

# 1    6

# do same but attach it to the dataframe

df['c'] = df.apply(lambda row: row.a + row.b, axis=1)

df

#    a  b  c

# 0  1  3  4

# 1  2  4  6

详见:https://stackoverflow.com/questions/26886653/pandas-create-new-column-based-on-values-from-other-columns

2:如何遍历Pandas Dataframe每一行

1

2

3

4

5

6

7

8

9

10

11

   c1   c2

0  10  100

1  11  110

2  12  120

In [18]: for index, row in df.iterrows():

   ....:     print row['c1'], row['c2']

   ....:    

10 100

11 110

12 120

详见:https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas

3:Pandas Dataframe通过matplotlib画图如何鼠标滚轮缩放

主要缩放函数:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

import matplotlib.pyplot as plt

def zoom_factory(ax,base_scale = 2.):

    def zoom_fun(event):

        # get the current x and y limits

        cur_xlim = ax.get_xlim()

        cur_ylim = ax.get_ylim()

        cur_xrange = (cur_xlim[1] - cur_xlim[0])*.5

        cur_yrange = (cur_ylim[1] - cur_ylim[0])*.5

        xdata = event.xdata # get event x location

        ydata = event.ydata # get event y location

        if event.button == 'up':

            # deal with zoom in

            scale_factor = 1/base_scale

        elif event.button == 'down':

            # deal with zoom out

            scale_factor = base_scale

        else:

            # deal with something that should never happen

            scale_factor = 1

            print event.button

        # set new limits

        ax.set_xlim([xdata - cur_xrange*scale_factor,

                     xdata + cur_xrange*scale_factor])

        ax.set_ylim([ydata - cur_yrange*scale_factor,

                     ydata + cur_yrange*scale_factor])

        plt.draw() # force re-draw

    fig = ax.get_figure() # get the figure of interest

    # attach the call back

    fig.canvas.mpl_connect('scroll_event',zoom_fun)

    #return the function

    return zoom_fun

如何使用(直接对matplotlib的轴对象ax进行操作即可):

ax.plot(range(10))

scale = 1.5

f = zoom_factory(ax,base_scale = scale)

可选参数base_scale允许您将比例因子设置为您想要的值。

一定要确保缩放函数有个返回值对象f。所以如果你不保存f,该缩放返回值可能被垃圾回收。

演示:

mouse_zoom

详见:https://stackoverflow.com/questions/11551049/matplotlib-plot-zooming-with-scroll-wheel

4:pandas更改索引,更改index为某列

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

In [1]: import pandas as pd

In [2]: df = pd.read_csv('hello.csv')

In [3]: df

Out[3]:

    name  gender

0  Lucas    Male

1   Lucy  Female

2   Lily  Female

3    Jim    Male

In [4]: df.set_index('name')

Out[4]:

       gender

name

Lucas    Male

Lucy   Female

Lily   Female

Jim      Male

5:ndarray切片区域选取

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

In [1]: from numpy import *

In [2]: a = arange(36).reshape((6,6))

In [3]: a

Out[3]:

array([[ 0,  1,  2,  3,  4,  5],

       [ 6,  7,  8,  9, 10, 11],

       [12, 13, 14, 15, 16, 17],

       [18, 19, 20, 21, 22, 23],

       [24, 25, 26, 27, 28, 29],

       [30, 31, 32, 33, 34, 35]])

In [4]: a[1,2]

Out[4]: 8

In [5]: a[1,:]

Out[5]: array([ 6,  7,  8,  9, 10, 11])

In [6]: a[:,2]

Out[6]: array([ 2,  8, 14, 20, 26, 32])

In [7]: a[0:2, 0:2]

Out[7]:

array([[0, 1],

       [6, 7]])

语法:中括号中第一个选取就是行,第二个选取的就是列,中间用逗号隔开。

ndarray_iloc

6:新增一列是某列的累加

1

2

3

4

5

6

7

8

9

10

In [1]: import pandas as pd

In [2]: num_list = [1, 2, 3, 4]

In [3]: df = pd.DataFrame(data=num_list)

In [4]: df[1] = df.cumsum()

In [5]: df

Out[6]:

   0   1

0  1   1

1  2   3

2  3   6

猜你喜欢

转载自blog.csdn.net/xirongxu_dlut/article/details/81812034