Data Analysis Chapter 3 Introduction to pandas (3)

1. Arithmetic operations and data alignment

import numpy as np
from pandas import DataFrame, Series
frame1 = DataFrame(np.arange(9).reshape((3, 3)), index=['yz', 'nj', 'bj'], columns=list('bcd'))
frame2 = DataFrame(np.arange(12).reshape((4, 3)), index=['sz', 'yz', 'nj', 'wx'], columns=list('bde'))
print(frame1 + frame2)  # rows and columns are aligned by label; unmatched cells become NaN
      b   c     d   e
bj  NaN NaN   NaN NaN
nj  9.0 NaN  12.0 NaN
sz  NaN NaN   NaN NaN
wx  NaN NaN   NaN NaN
yz  3.0 NaN   6.0 NaN
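
The same label alignment applies to Series. As a minimal sketch (the names s1 and s2 and their values are made up for illustration), adding two Series with partly overlapping indexes gives the union of the labels, with NaN wherever a label appears in only one operand:

```python
import numpy as np
from pandas import Series

# Illustrative data, not taken from the example above.
s1 = Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Values are paired by index label, not by position.
total = s1 + s2
print(total)
```

Only 'b' and 'c' exist in both Series, so only those two labels get a numeric result.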

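To substitute a fill value for entries that are missing from one of the two operands, instead of getting NaN, use the add method with fill_value: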

frame1 = DataFrame(np.arange(12).reshape((3, 4)), columns=list('abcd'))
frame2 = DataFrame(np.arange(20).reshape((4, 5)), columns=list('abcde'))
frame3 = frame1.add(frame2, fill_value=0)
print(frame3)
      a     b     c     d     e
0   0.0   2.0   4.0   6.0   4.0
1   9.0  11.0  13.0  15.0   9.0
2  18.0  20.0  22.0  24.0  14.0
3  15.0  16.0  17.0  18.0  19.0
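
One detail worth checking is that fill_value only stands in for a value missing from one operand. The sketch below reuses the label layout of the first example in this section (the names left, right and filled are illustrative) and suggests that a cell absent from both frames still comes out as NaN:

```python
import numpy as np
from pandas import DataFrame

left = DataFrame(np.arange(9).reshape((3, 3)),
                 index=['yz', 'nj', 'bj'], columns=list('bcd'))
right = DataFrame(np.arange(12).reshape((4, 3)),
                  index=['sz', 'yz', 'nj', 'wx'], columns=list('bde'))

# A cell missing from exactly one frame is treated as 0 before adding.
filled = left.add(right, fill_value=0)
print(filled)

# ('bj', 'e') and ('sz', 'c') exist in neither frame, so they stay NaN.
print(filled.loc['bj', 'e'], filled.loc['sz', 'c'])
```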

A fill value can also be specified when reindexing:

frame1 = DataFrame(np.arange(12).reshape((3, 4)), columns=list('abcd'))
frame2 = DataFrame(np.arange(20).reshape((4, 5)), columns=list('abcde'))
frame3 = frame1.reindex(columns=frame2.columns, fill_value=0)
print(frame3)
   a  b   c   d  e
0  0  1   2   3  0
1  4  5   6   7  0
2  8  9  10  11  0
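
To connect the two approaches, here is a small sketch (the names padded, via_reindex and via_add are illustrative) that pads frame1 on both axes with reindex and then uses the plain + operator. Under these assumptions it should give the same values as add with fill_value=0:

```python
import numpy as np
from pandas import DataFrame

frame1 = DataFrame(np.arange(12).reshape((3, 4)), columns=list('abcd'))
frame2 = DataFrame(np.arange(20).reshape((4, 5)), columns=list('abcde'))

# Pad frame1 with zeros to frame2's row and column labels, then add normally.
padded = frame1.reindex(index=frame2.index, columns=frame2.columns, fill_value=0)
via_reindex = padded + frame2

# The flexible method does the alignment and filling in one step.
via_add = frame1.add(frame2, fill_value=0)

# Compare the two results cell by cell.
print((via_reindex == via_add).all().all())
```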

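Operations between a DataFrame and a Series (broadcasting). By default the Series index is matched against the DataFrame columns, and the operation is broadcast down the rows: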

frame1 = DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'))
obj1 = Series(np.arange(3), index=list('bde'))
print(frame1)
print(obj1)
print(frame1 - obj1)
   b   d   e
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
b    0
d    1
e    2
dtype: int32
   b  d  e
0  0  0  0
1  3  3  3
2  6  6  6
3  9  9  9
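
Broadcasting still aligns on labels first. As a rough sketch, assuming a hypothetical Series named partial that lacks 'd' and carries an extra label 'f', the result takes the union of the labels and leaves NaN in every column that is not present on both sides:

```python
import numpy as np
from pandas import DataFrame, Series

frame = DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'))

# Hypothetical Series: no entry for 'd', and an extra label 'f'.
partial = Series([10, 20, 30], index=list('bef'))

# Columns 'b' and 'e' match and are broadcast down the rows;
# 'd' and 'f' exist on one side only and become NaN.
result = frame + partial
print(result)
```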

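To broadcast across the columns instead, matching the Series index against the row labels, use an arithmetic method such as sub and pass axis=0. The default for DataFrame arithmetic methods is axis=1 (match on the column labels):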

frame1 = DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'))
obj2 = frame1['b']
print(frame1)
print(obj2)
frame2 = frame1.sub(obj2, axis=0)
print(frame2)
   b   d   e
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
0    0
1    3
2    6
3    9
Name: b, dtype: int32
   b  d  e
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2

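If axis=0 is not passed, the Series index (0 to 3) is matched against the column labels (b, d, e). No labels are shared, so the result has the union of both label sets as its columns and every value is NaN (fill_value cannot be set for operations between a DataFrame and a Series):

     b   d   e   0   1   2   3
0  NaN NaN NaN NaN NaN NaN NaN
1  NaN NaN NaN NaN NaN NaN NaN
2  NaN NaN NaN NaN NaN NaN NaN
3  NaN NaN NaN NaN NaN NaN NaN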
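
A common use of axis=0 is subtracting a per-row statistic from every column. The following is only a sketch with made-up data (the names row_means and centered are illustrative); it centers each row on its own mean:

```python
import numpy as np
from pandas import DataFrame

frame = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'))

# One mean per row; the resulting Series is indexed by the row labels 0..3.
row_means = frame.mean(axis=1)

# axis=0 matches the Series index against the row labels,
# so each row has its own mean subtracted from all three columns.
centered = frame.sub(row_means, axis=0)
print(centered)
```

Each row of the sample data is three consecutive numbers, so every centered row is the same.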

2. Function application and mapping

frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['yz', 'nj', 'sh', 'bj'])
print(frame)
frame1 = np.abs(frame)
print(frame1)
           b         d         e
yz -0.542548  0.770910 -0.774451
nj -1.281281 -0.103413  0.786930
sh  0.668737 -0.546546  0.487472
bj -1.598520  1.005432 -0.035987
           b         d         e
yz  0.542548  0.770910  0.774451
nj  1.281281  0.103413  0.786930
sh  0.668737  0.546546  0.487472
bj  1.598520  1.005432  0.035987

The apply method applies a function to each column or each row of a DataFrame:

f = lambda x: x.max() - x.min()
obj1 = frame.apply(f)  # default axis=0: f is called once per column (max minus min of each column)
obj2 = frame.apply(f, axis=1)  # axis=1: f is called once per row (max minus min of each row)
print(obj1)
print(obj2)
b    2.267257
d    1.551978
e    1.561382
dtype: float64
yz    1.545361
nj    2.068211
sh    1.215283
bj    2.603953
dtype: float64
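
The function passed to apply does not have to return a scalar, and the "mapping" in this section's title can be illustrated with Series.map, which works element by element. The sketch below uses fixed values instead of random ones so the result is reproducible; the helper min_max and the names summary and formatted are made up for illustration:

```python
import numpy as np
from pandas import DataFrame, Series

frame = DataFrame(np.arange(12.).reshape((4, 3)),
                  columns=list('bde'), index=['yz', 'nj', 'sh', 'bj'])

def min_max(col):
    # Return two labelled values per column instead of a single scalar.
    return Series([col.min(), col.max()], index=['min', 'max'])

# apply collects the returned Series into a DataFrame with rows 'min' and 'max'.
summary = frame.apply(min_max)
print(summary)

# Element-wise mapping on one column: format each value as a string.
formatted = frame['e'].map(lambda x: f'{x:.2f}')
print(formatted)
```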

3. Sorting and ranking

Sorting a Series by its index:

obj = Series(np.arange(4), index=['d', 'a', 'b', 'c'])
obj1 = obj.sort_index()
print(obj1)
a    1
b    2
c    3
d    0

A DataFrame can be sorted by its labels on either axis:

frame = DataFrame(np.arange(8).reshape((2, 4)), columns=list('dabc'), index=['three', 'one'])
frame1 = frame.sort_index()  # sort by row labels
frame2 = frame.sort_index(axis=1)  # sort by column labels
print(frame1)
print(frame2)
       d  a  b  c
one    4  5  6  7
three  0  1  2  3
       a  b  c  d
three  1  2  3  0
one    5  6  7  4

Sorting is in ascending order by default; pass ascending=False to sort in descending order:

frame3 = frame.sort_index(axis=1, ascending=False)
print(frame3)
       d  c  b  a
three  0  3  2  1
one    4  7  6  5
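
Note that sort_index returns a new object and, by default, leaves the original untouched, so calls can be chained to sort both axes. A minimal sketch using the same sample frame (the names both and desc are illustrative):

```python
import numpy as np
from pandas import DataFrame

frame = DataFrame(np.arange(8).reshape((2, 4)),
                  columns=list('dabc'), index=['three', 'one'])

# Sort the rows first, then the columns of the intermediate result.
both = frame.sort_index().sort_index(axis=1)
print(both)

# Descending column order; frame itself keeps its original layout.
desc = frame.sort_index(axis=1, ascending=False)
print(list(desc.columns), list(frame.columns))
```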

To sort a Series by its values (rather than its index), use sort_values:

obj = Series([0, 2, -5, 3])
obj1 = obj.sort_values()
print(obj1)
2   -5
0    0
1    2
3    3
dtype: int64

If the Series contains NaN values, they are placed at the end of the sorted result by default.
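
A short sketch of this behavior with made-up values follows. It also uses the na_position parameter of sort_values, which moves the missing values to the front when set to 'first':

```python
import numpy as np
from pandas import Series

obj = Series([4, np.nan, 7, np.nan, -3, 2])

# Default: non-missing values in ascending order, NaN entries last.
nan_last = obj.sort_values()
print(nan_last)

# na_position='first' puts the NaN entries before the sorted values.
nan_first = obj.sort_values(na_position='first')
print(nan_first)
```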

In a DataFrame you can sort by the values in one or more columns:
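
As a minimal sketch with made-up data (the frame and the names by_b and by_a_then_b are illustrative), sort_values takes the column name, or a list of names, through its by argument. With a list, rows are ordered by the first column and ties are broken by the next:

```python
from pandas import DataFrame

frame = DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})

# Sort the rows by the values in column 'b'.
by_b = frame.sort_values(by='b')
print(by_b)

# Sort by 'a' first, then by 'b' within each group of equal 'a' values.
by_a_then_b = frame.sort_values(by=['a', 'b'])
print(by_a_then_b)
```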