Pandas: data transformation, grouping, and merging
Merging data in pandas
- concat() method
- merge() method
concat
    import pandas as pd

    # df and ds are two existing DataFrames
    frames = [df, ds]
    # axis=0 stacks vertically (rows); axis=1 places side by side (columns)
    pd.concat(frames, axis=0)

    # Union of the labels on the other axis
    pd.concat(frames, axis=0, join="outer")
    # Intersection of the labels on the other axis
    pd.concat(frames, axis=0, join="inner")

    d1 = pd.DataFrame([22, 33], index=["height", "weight"], columns=["Joe Smith"])
    d2 = pd.DataFrame([33, 44], index=["height", "weight"], columns=["John Doe"])
    frames = [d1, d2]

    pd.concat(frames, axis=1, join="inner")

    df1 = pd.DataFrame({'Age': [22, 26],
                        'origin': ['Beijing', 'Hebei']},
                       index=['Zhang', 'Lee'])
    df2 = pd.DataFrame({'height': [175, 180],
                        'weight': [70, 85]},
                       index=['Zhang', 'Lee'])
    df3 = pd.DataFrame({'height': [175, 183],
                        'weight': [70, 87]},
                       index=['Zhang', 'Qianmou'])

    # Only 'Zhang' appears in both indexes, so only that row survives
    pd.concat([df1, df3], axis=1, join="inner")
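To make the difference between the two join modes concrete, here is a minimal sketch that concatenates two small frames along axis=1 and compares the resulting row labels. The frames and the variable names `outer` and `inner` are illustrative assumptions, not part of the original listing.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df1 = pd.DataFrame({"age": [22, 26]}, index=["Zhang", "Lee"])
df3 = pd.DataFrame({"height": [175, 183]}, index=["Zhang", "Qianmou"])

# join="outer" keeps the union of the row labels; missing cells become NaN.
outer = pd.concat([df1, df3], axis=1, join="outer")

# join="inner" keeps only the row labels present in both frames.
inner = pd.concat([df1, df3], axis=1, join="inner")

print(outer)
print(inner)
```

With an outer join, rows that exist in only one frame are kept and padded with NaN; with an inner join they are dropped.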
merge
    import pandas as pd

    left = pd.DataFrame({'name': ['Zhang', 'Lee', 'Duanmou'],
                         'Age': [20, 26, 24]})
    right = pd.DataFrame({'name': ['Zhang', 'Lee', 'Qianmou'],
                          'origin': ['Beijing', 'Hebei', 'Jiangsu']})

    # Merge on the shared "name" column
    pd.merge(left, right, on="name", how="outer")

    # Merge on the index
    pd.merge(left, right, left_index=True, right_index=True, how="outer")
    # The join instance method performs the same index-based merge more simply;
    # it defaults to a left join. Suffixes are required here because both
    # frames contain a "name" column.
    left.join(right, how="outer", lsuffix="_left", rsuffix="_right")
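As a rough illustration of how the `how` argument changes the result, the sketch below merges two small frames on a shared key column and compares the row counts. The sample data and the result names (`inner`, `outer`, `left_join`) are assumptions made for this example.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
left = pd.DataFrame({"name": ["Zhang", "Lee", "Duanmou"], "age": [20, 26, 24]})
right = pd.DataFrame({"name": ["Zhang", "Lee", "Qianmou"],
                      "origin": ["Beijing", "Hebei", "Jiangsu"]})

# inner: only keys found in both frames
inner = pd.merge(left, right, on="name", how="inner")

# outer: every key from either frame, with NaN where a side has no match
outer = pd.merge(left, right, on="name", how="outer")

# left: every key from the left frame, matched against the right where possible
left_join = pd.merge(left, right, on="name", how="left")

print(len(inner), len(outer), len(left_join))
```

Two names are shared, so the inner merge has two rows; the outer merge adds the one unmatched name from each side.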
The combine_first instance method patches missing data: missing values in the calling object are filled with the corresponding values from the object passed as the argument.
    df2.combine_first(df1)  # fill the missing values in df2 with values from df1
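A small sketch of the patching behaviour, assuming two frames with the same labels where one contains gaps. The data and the name `patched` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; df2 has gaps that df1 can fill.
df1 = pd.DataFrame({"height": [175.0, 180.0], "weight": [70.0, 85.0]},
                   index=["Zhang", "Lee"])
df2 = pd.DataFrame({"height": [np.nan, 181.0], "weight": [72.0, np.nan]},
                   index=["Zhang", "Lee"])

# Values already present in df2 win; only its NaN cells are taken from df1.
patched = df2.combine_first(df1)

print(patched)
```

Existing values in the calling frame are never overwritten; only its missing cells are filled.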
Sorting
- sort_index: sort by the index
- sort_values: sort by the values of a column
- Random shuffling
    import numpy as np

    # ascending controls the sort direction; the default, True, sorts in ascending order
    df.sort_values('Score', ascending=False)
    # Generate a random permutation of row positions (assumes df has 3 rows)
    sampler = np.random.permutation(3)
    sampler
    df.take(sampler)
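The three operations listed above can be sketched together on one small frame. The frame, its `Score` column values, and the result names are assumptions for illustration; the permutation length is taken from `len(df)` rather than hard-coded.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a deliberately unsorted index.
df = pd.DataFrame({"Score": [88, 95, 70]}, index=["b", "c", "a"])

# Sort rows by index label.
by_index = df.sort_index()

# Sort rows by the values of one column, highest first.
by_score = df.sort_values("Score", ascending=False)

# Shuffle: draw a random permutation of row positions, then take rows in that order.
sampler = np.random.permutation(len(df))
shuffled = df.take(sampler)

print(by_index)
print(by_score)
print(shuffled)
```

The shuffled frame contains the same rows as the original, in a random order that changes from run to run unless a seed is set.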
The GroupBy technique
- The groupby() method can group along either axis, using each group's name as its key. There are three common forms (note that the axis argument is deprecated since pandas 2.1):
- df.groupby(key)
- df.groupby(key, axis=1)
- df.groupby([key1, key2])
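The forms above can be sketched as follows. The sample frames and key names are hypothetical. Because the axis argument of groupby is deprecated in recent pandas releases, the column-wise case here swaps in a transpose followed by an ordinary row-wise groupby, which is an assumption about how to get the same effect rather than the original author's method.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df = pd.DataFrame({
    "city": ["Beijing", "Beijing", "Hebei", "Hebei"],
    "gender": ["F", "M", "F", "M"],
    "score": [90, 80, 70, 60],
})

# Single key: one group per distinct city.
by_city = df.groupby("city")["score"].mean()

# Several keys: one group per (city, gender) pair, giving a MultiIndex result.
by_city_gender = df.groupby(["city", "gender"])["score"].mean()

# Grouping columns instead of rows: transpose, group the (former) column
# labels through a mapping, then transpose back.
scores = pd.DataFrame({"math_1": [1, 2], "math_2": [3, 4], "art_1": [5, 6]})
col_groups = {"math_1": "math", "math_2": "math", "art_1": "art"}
by_subject = scores.T.groupby(col_groups).sum().T

print(by_city)
print(by_city_gender)
print(by_subject)
```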
Basic grouping operations
- Group size and group ordering
- Iterating over groups
- Selecting columns within a specified group or groups
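The operations listed above might look like this in practice. This is a minimal sketch on hypothetical data; the column names and result variables are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df = pd.DataFrame({
    "city": ["Beijing", "Hebei", "Beijing", "Hebei", "Beijing"],
    "score": [90, 70, 80, 60, 85],
    "age": [22, 26, 24, 30, 28],
})
grouped = df.groupby("city")

# Group size: number of rows in each group, then ordered largest first.
sizes = grouped.size()
sizes_sorted = sizes.sort_values(ascending=False)

# Iteration: each step yields the group name and that group's sub-DataFrame.
names = []
for name, group in grouped:
    names.append(name)
    print(name, len(group))

# Column selection: aggregate one column per group...
mean_score = grouped["score"].mean()

# ...or pull selected columns out of one specific group.
beijing = grouped.get_group("Beijing")[["score", "age"]]
```

By default groupby sorts the group keys, so iteration visits the groups in key order.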