DataFrame indexing and basic operations

First, DataFrame indexing

1. Selecting columns

import pandas as pd
import numpy as np
from pandas import Series, DataFrame

df = DataFrame(np.random.rand(12).reshape((3, 4)),
               index=['one', 'two', 'three'],
               columns=list('abcd'))
print(df)
print(type(df['a']))   # a single column label returns a Series
print(df[['a', 'c']])  # a list of column labels returns a DataFrame

Note: df[] selects columns by label. It can also select rows by integer position, but only through a slice such as df[:2]. A single integer such as df[0] is looked up as a column label, not as a row position.
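To make the note concrete, here is a minimal sketch that uses fixed values instead of random ones. The frame `demo` and its numbers are made up for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Illustrative frame with fixed values (not the random df used above)
demo = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['one', 'two', 'three'],
                    columns=list('abcd'))

col = demo['a']        # a label selects a column and returns a Series
first_two = demo[:2]   # a slice selects rows by position and returns a DataFrame

print(type(col).__name__)      # Series
print(list(first_two.index))   # ['one', 'two']

# A bare integer is looked up as a column label, so it fails here
try:
    demo[0]
except KeyError:
    print('demo[0] raises KeyError: 0 is not a column label')
```

If a single row is needed, the row-oriented indexers covered in the next sections are the clearer choice.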

2. Selecting rows

df1 = DataFrame(np.random.rand(12).reshape((3, 4)),
                index=[3, 2, 1],
                columns=list('abcd'))
print(df1)
df.loc['one']    # a single row; returns a Series
# df1.loc[0]  # integers passed to .loc are labels, not positions; 0 is not a label of df1, so this raises KeyError
df.loc[['one', 'three']]   # multiple rows; returns a DataFrame (a missing label such as 'four' raises KeyError in current pandas)
df1.loc[2:1]
# df.loc['two':'three']  # label slices are closed intervals: both ends are included
# df.loc[label] works on the row index; keep in mind the difference between a specified index and the default integer index

Note: df.loc[] selects rows by index label. In df the labels are strings.

3. Selecting rows by position

df.iloc[] selects rows by integer position, from 0 to length-1.
It works like indexing a list: positions follow the row order of the DataFrame and start at 0.

print(df)
df.iloc[0]         # a single row
# df.iloc[3]       # IndexError: the position must not go beyond the index range
df.iloc[[0, 1]]    # multiple rows; returns a DataFrame

df.iloc[[1, 0]]    # rows come back in the order requested
df.iloc[2::]       # slice by position
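The difference between .loc and .iloc is easiest to see when the index labels are integers that do not match the positions. The following sketch assumes a made-up frame `demo` with labels 3, 2, 1 and fixed values.

```python
import numpy as np
import pandas as pd

# Illustrative frame whose integer labels differ from the row positions
demo = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=[3, 2, 1],
                    columns=list('abcd'))

by_label = demo.loc[1]       # row whose label is 1 (the last row)
by_position = demo.iloc[1]   # row at position 1 (the middle row)

print(by_label.tolist())     # [8, 9, 10, 11]
print(by_position.tolist())  # [4, 5, 6, 7]

# Label slices include both endpoints; position slices exclude the stop
print(list(demo.loc[2:1].index))    # [2, 1]
print(list(demo.iloc[0:1].index))   # [3]
```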

4. Boolean indexing

The principle is the same as for a Series.

# df = df / 10000
print(df)
b1 = df < 20    # a Boolean DataFrame with the same shape as df
df[b1]          # returns a DataFrame of the same shape: original values where True, NaN where False

b2 = df['a'] > 50   # a condition on a single column; returns a Boolean Series
# print(b2, type(b2))
df[b2]          # a condition on one column keeps the rows where it is True

# a condition on several columns
b3 = df[['a', 'b']] > 50   # returns a Boolean DataFrame
print(b3)
print(df[b3])   # same shape as df: original values where True; False positions and all other positions are NaN
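Because the frame above holds random values, the effect of each mask is hard to predict. The sketch below repeats the three cases on a made-up frame `demo` with fixed values, so the results can be checked by hand.

```python
import numpy as np
import pandas as pd

# Illustrative frame: rows are 0..30, 40..70 and 80..110 in steps of 10
demo = pd.DataFrame(np.arange(0, 120, 10).reshape((3, 4)),
                    index=['one', 'two', 'three'],
                    columns=list('abcd'))

masked = demo[demo < 20]                 # DataFrame mask: same shape, NaN where False
rows = demo[demo['a'] > 50]              # Series mask: keeps whole rows
partial = demo[demo[['a', 'b']] > 50]    # columns c and d are not in the mask

print(masked.shape)                  # (3, 4)
print(int(masked.notna().sum().sum()))   # 2  (only 0 and 10 pass the test)
print(list(rows.index))              # ['three']
print(partial['c'].isna().all())     # True
```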

5. Combined indexing: selecting rows and columns together

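print(df)
df['a'].loc[['one', 'three']]    # column 'a', then the rows labeled 'one' and 'three'
df[['b', 'c', 'd']].iloc[::2]    # three columns, then every second row by position
df[df['a'] < 50].iloc[0]         # rows where a < 50, then the first of them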
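The chained selections above can usually also be written as one .loc or .iloc call that takes a row selector and a column selector. This is an alternative spelling rather than the approach used in the original examples, and the frame `demo` is made up for illustration.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.arange(0, 120, 10).reshape((3, 4)),
                    index=['one', 'two', 'three'],
                    columns=list('abcd'))

# Chained: pick the column first, then the rows
chained = demo['a'].loc[['one', 'three']]
# Single call: rows and column in one .loc
single = demo.loc[['one', 'three'], 'a']
print(chained.equals(single))          # True

# Chained: pick the columns first, then every second row
chained_pos = demo[['b', 'c', 'd']].iloc[::2]
# Single call: row slice and column slice by position
single_pos = demo.iloc[::2, 1:]
print(chained_pos.equals(single_pos))  # True
```

For reading data both spellings return the same result. For assignment, a single .loc call is generally the safer form, because chained selections may operate on a temporary copy.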

Second, basic DataFrame operations

1. Viewing and transposing

df = DataFrame(np.random.rand(16).reshape((4, 4)))
df.head(2)   # the first 2 rows
df.tail()    # the last 5 rows by default
# .T transposes the DataFrame
print(df)
print(df.T)

2. Adding and modifying

df.columns = list('abcd')
df['e'] = 10    # add a column; a scalar is repeated for every row
df

# add a row, using df from above
df.loc[4] = 5
print(df)

# modify a column, using df from above
df['e'] = 0
print(df)

# modify several columns, using df from above
df[['d', 'e']] = 88
print(df)
# in every case the change is made by assigning directly to the indexed selection
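The same steps on a made-up frame with fixed values make the result easy to verify. The name `demo` is an assumption for this sketch and does not appear in the original.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.arange(16).reshape((4, 4)), columns=list('abcd'))

demo['e'] = 10           # new column, scalar repeated for every row
demo.loc[4] = 5          # new row labeled 4, scalar repeated for every column
demo[['d', 'e']] = 88    # overwrite two existing columns

print(demo.shape)                                  # (5, 5)
print((demo.loc[4, ['a', 'b', 'c']] == 5).all())   # True
print((demo[['d', 'e']] == 88).all().all())        # True
```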

 

 

3. Deleting

# delete a column with del, using df from above
del df['e']
print(df)

# second way to delete: drop
df.drop(0)    # by default, drop returns a new DataFrame with the row removed

# drop(): axis=0 deletes a row, axis=1 deletes a column
df.drop('a', axis=1)

# drop returns a new object by default; pass inplace=True to modify the original data
df.drop(0, inplace=True)
df
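A short sketch on a made-up frame `demo` shows the difference between the default behavior of drop and inplace=True.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.arange(16).reshape((4, 4)), columns=list('abcd'))

without_row = demo.drop(0)             # axis=0 (default): drop the row labeled 0
without_col = demo.drop('a', axis=1)   # axis=1: drop the column labeled 'a'
print(demo.shape)          # (4, 4), the original is unchanged
print(without_row.shape)   # (3, 4)
print(without_col.shape)   # (4, 3)

result = demo.drop(0, inplace=True)    # modifies demo itself
print(result)              # None
print(list(demo.index))    # [1, 2, 3]
```

Since the in-place call returns None, assigning its result back to the variable would lose the DataFrame.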

4. Alignment

df = DataFrame(np.arange(16).reshape((4, 4)), columns=list('abcd'))
df1 = DataFrame(np.arange(9).reshape((3, 3)), columns=list('cba'))
print(df)
print(df1)
# arithmetic aligns automatically on the row and column labels
df + df1
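The sketch below rebuilds the two frames under the assumed names `left` and `right` and checks what alignment does: values are matched by label, not by position, and any label present in only one frame yields NaN.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame(np.arange(16).reshape((4, 4)), columns=list('abcd'))
right = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list('cba'))

total = left + right

print(list(total.columns))         # ['a', 'b', 'c', 'd']
print(total['d'].isna().all())     # True: column 'd' exists only in left
print(total.loc[3].isna().all())   # True: row 3 exists only in left
print(total.loc[0, 'a'])           # 2.0  (left a=0 plus right a=2)
```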

 

 

 

 

 

5. Sorting

df = DataFrame(np.random.randint(16, size=[4, 4]), columns=list('abcd'))
df
# sort by values
df.sort_values('b', ascending=False)   # sort the rows by the values in column 'b'; ascending=True is the default, False sorts in descending order

df.sort_values(['b', 'c'])   # sort by several columns: ties in 'b' are broken by 'c'

# default axis=0: the rows are sorted by the values of a column, so the first argument is a column label
# axis=1: the columns are sorted by the values of a row, so the first argument is a row label
df.sort_values(2, axis=1)
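With random integers the sorted order changes on every run, so here is a sketch on a small made-up frame `demo` whose values were chosen to show each case, including a tie in column 'a'.

```python
import pandas as pd

demo = pd.DataFrame({'a': [3, 1, 2, 1],
                     'b': [9, 7, 8, 6],
                     'c': [0, 5, 4, 2]})

# Rows ordered by column 'b', largest first
by_b_desc = demo.sort_values('b', ascending=False)
print(list(by_b_desc.index))      # [0, 2, 1, 3]

# Rows ordered by 'a'; the tie between rows 1 and 3 is broken by 'b'
by_a_then_b = demo.sort_values(['a', 'b'])
print(list(by_a_then_b.index))    # [3, 1, 2, 0]

# axis=1: columns ordered by the values in the row labeled 2
by_row = demo.sort_values(2, axis=1)
print(list(by_row.columns))       # ['a', 'c', 'b']
```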

 

 

# sort by index
df.index = [5, 2, 4, 1]
df.columns = list('adce')
print(df)

df.sort_index(ascending=False)   # sorts the row index by default; ascending=False gives descending order
# axis=1 sorts by the column index
df.sort_index(axis=1)
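A minimal check of both calls, again on a made-up frame `demo` with fixed values. Like sort_values, sort_index returns a new object and leaves the original order untouched.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=[5, 2, 4, 1],
                    columns=list('adce'))

rows_desc = demo.sort_index(ascending=False)   # row labels, descending
print(list(rows_desc.index))       # [5, 4, 2, 1]

cols_sorted = demo.sort_index(axis=1)          # column labels, ascending
print(list(cols_sorted.columns))   # ['a', 'c', 'd', 'e']

print(list(demo.index))            # [5, 2, 4, 1], the original is unchanged
```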

 

 

 

 

 

 

 

 


Origin www.cnblogs.com/gt92/p/11803711.html