First, DataFrame indexing
1. Selecting columns
    import pandas as pd
    import numpy as np
    from pandas import Series, DataFrame

    df = DataFrame(np.random.rand(12).reshape((3, 4)),
                   index=['one', 'two', 'three'],
                   columns=list('abcd'))
    print(df)
    type(df['a'])     # a single column label returns a Series
    df[['a', 'c']]    # a list of column labels returns a DataFrame
Note: df[] selects columns by default. Integers can select rows only through slicing, for example df[:2]; a single integer such as df[0] is treated as a column label, not a row position.
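The return types described in the note can be checked directly. The following is a minimal sketch on a small fixed frame; the name `demo` and its values are made up for illustration and are not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 frame with fixed values so the results are reproducible
demo = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['one', 'two', 'three'],
                    columns=list('abcd'))

single = demo['a']           # one label -> Series
several = demo[['a', 'c']]   # list of labels -> DataFrame
first_two = demo[:2]         # a slice inside [] selects rows by position

print(type(single).__name__)
print(type(several).__name__)
print(first_two.index.tolist())
```

Only the slice form reaches rows; everything else passed to df[] is looked up among the column labels.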
2. Selecting rows
    df1 = DataFrame(np.random.rand(12).reshape((3, 4)),
                    index=[3, 2, 1],
                    columns=list('abcd'))
    print(df1)

    df.loc['one']               # a single row; returns a Series
    # df1.loc[0]                # integer labels work only if they exist in the index; 0 is not in df1's index
    df.loc[['one', 'three']]    # multiple rows; returns a DataFrame
                                # (a label missing from the index, such as 'four', raises KeyError in current pandas)
    df1.loc[2:1]                # label slice on the integer index
    # df.loc['two':'three']     # label slices are closed intervals: both ends are included
    # df.loc[label] works on the row index; keep in mind whether the index was specified or is the default one
Note: df.loc[] selects rows by index label; in this example the labels are strings.
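Two behaviors of label-based selection are easy to verify: a label slice includes both endpoints, and asking for a label that is not in the index fails. The sketch below assumes pandas 1.0 or later and uses a hypothetical frame named `demo`; `reindex` is shown only as one possible way to request labels that may be absent.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string row labels
demo = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['one', 'two', 'three'],
                    columns=list('abcd'))

row = demo.loc['one']               # single label -> Series
closed = demo.loc['two':'three']    # label slice keeps both endpoints

# A list containing a label that is not in the index raises KeyError
try:
    demo.loc[['one', 'four']]
    missing_raises = False
except KeyError:
    missing_raises = True

# reindex returns the requested labels and fills absent rows with NaN
padded = demo.reindex(['one', 'four'])

print(closed.index.tolist())
print(missing_raises)
print(padded)
```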
3. Another way to index: by position
df.iloc[] selects rows by integer position (from 0 to length-1).
It behaves like list indexing: positions are counted from 0 in the order the rows appear in the DataFrame, regardless of the index labels.
    print(df)
    df.iloc[0]         # a single row; returns a Series
    # df.iloc[3]       # position 3 is out of range for a 3-row frame and raises IndexError
    df.iloc[[0, 1]]    # multiple rows; returns a DataFrame

    df.iloc[[1, 0]]    # rows are returned in the order requested
    df.iloc[2::]       # positional slice
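The difference between position and label is clearest when the index itself holds integers. The sketch below uses a hypothetical frame `demo` whose index is [3, 2, 1], mirroring the df1 example, so that position 0 and label 1 point at different rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose integer labels do not match row positions
demo = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=[3, 2, 1],
                    columns=list('abcd'))

by_position = demo.iloc[0]      # first row in order; its label is 3
by_label = demo.loc[1]          # row whose label is 1; it is the last row
reordered = demo.iloc[[1, 0]]   # rows come back in the requested order

# A position past the end raises IndexError
try:
    demo.iloc[3]
    out_of_range = False
except IndexError:
    out_of_range = True

print(by_position.name, by_label.name)
print(reordered.index.tolist())
print(out_of_range)
```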
4. Boolean indexing
This works on the same principle as Boolean indexing on a Series.
    # Note: np.random.rand yields values in [0, 1); the thresholds below (20, 50)
    # assume df has been rescaled to roughly 0-100, e.g. df = df * 100
    print(df)
    b1 = df < 20        # Boolean DataFrame with the same shape as df
    df[b1]              # DataFrame of the same shape: True keeps the original value, False becomes NaN

    b2 = df['a'] > 50   # a condition on a single column gives a Boolean Series
    # print(b2, type(b2))
    df[b2]              # keeps only the rows where the condition is True

    # A condition on several columns
    b3 = df[['a', 'b']] > 50   # Boolean DataFrame
    print(b3)
    print(df[b3])       # same shape as df: True positions keep their data, every other position is NaN
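Because the example above depends on random data, the two masking behaviors are easier to see on fixed values. This is a minimal sketch with a hypothetical frame `demo`; combining two row conditions with `&` is an addition that does not appear in the original code.

```python
import pandas as pd

# Hypothetical fixed data on a 0-100 scale
demo = pd.DataFrame({'a': [10, 60, 80], 'b': [55, 5, 70]},
                    index=['one', 'two', 'three'])

mask_frame = demo > 50       # Boolean DataFrame, same shape as demo
masked = demo[mask_frame]    # same shape; False positions become NaN

mask_rows = demo['a'] > 50   # Boolean Series
kept = demo[mask_rows]       # only the rows where the Series is True

# Row conditions can be combined with & (each side needs parentheses)
both = demo[(demo['a'] > 50) & (demo['b'] > 50)]

print(masked)
print(kept)
print(both)
```

A Boolean DataFrame masks individual cells, while a Boolean Series filters whole rows.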
5. Combined indexing: selecting rows and columns together
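    print(df)
    df['a'].loc[['one', 'three']]     # column a, then rows 'one' and 'three'
    df[['b', 'c', 'd']].iloc[::2]     # three columns, then every second row
    df[df['a'] < 50].iloc[0]          # first row that satisfies the condition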
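The chained selections above can also be written as a single .loc or .iloc call that takes a row selector and a column selector. This is an alternative form, not the one used in the original examples; the frame `demo` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values
demo = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['one', 'two', 'three'],
                    columns=list('abcd'))

# Chained form: pick the column first, then the rows
chained = demo['a'].loc[['one', 'three']]
# Single-step form: rows and column in one .loc call
single_step = demo.loc[['one', 'three'], 'a']

# Chained positional form and its single-step equivalent
every_other = demo[['b', 'c', 'd']].iloc[::2]
positional = demo.iloc[::2, 1:]

# Boolean filter followed by a positional pick
first_match = demo[demo['a'] < 5].iloc[0]

print(chained.equals(single_step))
print(every_other.equals(positional))
print(first_match.name)
```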
Second, basic DataFrame operations
1. Viewing and transposing
    df = DataFrame(np.random.rand(16).reshape((4, 4)))
    df.head(2)     # first 2 rows
    df.tail()      # last rows (up to 5 by default)
    # .T transposes rows and columns
    print(df)
    print(df.T)
2. Adding and modifying
    df.columns = list('abcd')
    df['e'] = 10    # add a new column; the scalar is repeated for every row
    df
    # Add a row (using df from above)
    df.loc[4] = 5
    print(df)
    # Modify a column (using df from above)
    df['e'] = 0
    print(df)
    # Modify several columns at once (using df from above)
    df[['d', 'e']] = 88
    print(df)
    # Values are assigned directly through the index
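The add and modify steps above can be traced on a small fixed frame. This is a minimal sketch; `demo` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 frame
demo = pd.DataFrame(np.arange(4).reshape((2, 2)), columns=list('ab'))

demo['e'] = 10           # new column; the scalar fills every row
demo.loc[4] = 5          # new row labeled 4; the scalar fills every column
demo[['b', 'e']] = 88    # overwrite two existing columns at once

print(demo)
```

Assigning to a label that does not exist yet creates it, while assigning to an existing label overwrites the data in place.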
3. Deleting
    # Delete a column with del (using df from above)
    del df['e']
    print(df)
    # The second way to delete: drop
    df.drop(0)    # by default, drop returns a new DataFrame with the row removed
    # drop(): axis=0 deletes rows, axis=1 deletes columns
    df.drop('a', axis=1)
    # drop returns a new object by default; set inplace=True to modify the original data
    df.drop(0, inplace=True)
    df
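Whether drop changes the original frame is a common source of confusion, so it helps to compare shapes before and after. A minimal sketch, with `demo` as a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical 4x4 frame
demo = pd.DataFrame(np.arange(16).reshape((4, 4)), columns=list('abcd'))

without_row = demo.drop(0)            # new frame without the row labeled 0
without_col = demo.drop('a', axis=1)  # new frame without column a
shape_before = demo.shape             # the original is still 4x4 at this point

demo.drop(0, inplace=True)            # now the original itself loses row 0
shape_after = demo.shape

print(shape_before, shape_after)
```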
4. Alignment
    df = DataFrame(np.arange(16).reshape((4, 4)), columns=list('abcd'))
    df1 = DataFrame(np.arange(9).reshape((3, 3)), columns=list('cba'))
    print(df)
    print(df1)
    # Operands are aligned automatically on both row and column labels
    df + df1
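With fixed data, the effect of alignment can be read off the result: values are matched by label, not by position, and any label present in only one operand yields NaN. The sketch below reuses the same shapes under the hypothetical names `left` and `right`; the `add` method with `fill_value` is an addition not shown in the original.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame(np.arange(16).reshape((4, 4)), columns=list('abcd'))
right = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list('cba'))

total = left + right
# Column d and row 3 exist only in left, so they are NaN in the sum

# add() with fill_value treats a value missing from one operand as 0
filled = left.add(right, fill_value=0)

print(total)
print(filled)
```

Note that right's columns are ordered c, b, a, so the value added to left's column a comes from right's last column.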
5. Sorting
    df = DataFrame(np.random.randint(16, size=[4, 4]), columns=list('abcd'))
    df
    # Sort by values
    df.sort_values('b', ascending=False)   # sort the rows by the values in column b; the default is ascending=True, False sorts descending

    df.sort_values(['b', 'c'])             # sort by several columns: b first, then c to break ties
    # With the default axis=0, the values of a column are used to sort the rows, so the first argument is a column label
    # With axis=1, the values of a row are used to sort the columns, so the first argument is a row label
    df.sort_values(2, axis=1)
    # Sort by index
    df.index = [5, 2, 4, 1]
    df.columns = list('adce')
    print(df)

    df.sort_index(ascending=False)   # sorts the row index; ascending by default, ascending=False sorts descending
    # axis=1 sorts by the column index
    df.sort_index(axis=1)
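The sorting calls above use random data, so their results change on every run. The sketch below repeats them on a small fixed frame; `demo` and its values are hypothetical and chosen so that column b contains a tie.

```python
import pandas as pd

# Hypothetical frame: unsorted index, unsorted columns, a tie in column b
demo = pd.DataFrame({'c': [9, 8, 7, 6], 'b': [3, 1, 3, 2]},
                    index=[5, 2, 4, 1])

by_b_desc = demo.sort_values('b', ascending=False)   # rows ordered by b, largest first
by_b_then_c = demo.sort_values(['b', 'c'])           # ties in b are broken by c
by_row = demo.sort_values(2, axis=1)                 # columns ordered by the values in row 2

by_index = demo.sort_index()                         # row labels ascending
by_index_desc = demo.sort_index(ascending=False)     # row labels descending
by_columns = demo.sort_index(axis=1)                 # column labels ascending

print(by_b_then_c)
print(by_index_desc)
print(by_columns)
```

All of these calls return a new DataFrame and leave `demo` unchanged.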