Data processing often requires combining several data sets. Because the data is usually read with Pandas, the combining is generally done on objects of type DataFrame.
1. merge
merge joins the rows of two data sets on one or more join keys, similar to a SQL JOIN. A typical use case is two tables that share the same primary key but contain different fields, which we want to combine into one table. In this typical case (the key is unique in both tables) the number of rows in the result does not increase, and the number of columns equals the sum of the column counts of the two source tables minus the number of join keys.
df.merge(right, how='inner', on=None, left_on=None,
right_on=None, left_index=False, right_index=False,
sort=False, suffixes=('_x', '_y'), copy=True)
Parameter descriptions:
right: the object to merge with, either a DataFrame or a named Series
how: the join type, as in SQL; one of left, right, inner, outer; the default is 'inner'
on: the column or index level names to join on; they must exist in both objects. If on is None and the merge is not on the indexes, the columns common to both objects are used.
left_on: the column or index level names in the left DataFrame to use as join keys
right_on: the column or index level names in the right DataFrame to use as join keys
left_index: if True, use the index of the left DataFrame as the join key
right_index: if True, use the index of the right DataFrame as the join key
sort: if True, sort the join keys lexicographically in the result; the default is False, in which case the order of the keys depends on the join type
suffixes: the suffixes appended to overlapping column names from the left and right sides; the default is ('_x', '_y')
copy: the default is True (copy the data); if set to False, copying is avoided where possible
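The parameters above can be illustrated with a small sketch. The two tables and their column names (emp_id, id, score, and so on) are made up for illustration only; the calls themselves use just the parameters listed above.

```python
import pandas as pd

# Hypothetical sample data: the key column has a different name on each side,
# and both tables contain a non-key column called "score".
left = pd.DataFrame({"emp_id": [1, 2, 3],
                     "name": ["Ann", "Bob", "Cy"],
                     "score": [90, 80, 70]})
right = pd.DataFrame({"id": [2, 3, 4],
                      "dept": ["HR", "IT", "Ops"],
                      "score": [8, 7, 6]})

# Default how='inner': only keys present on both sides (2 and 3) are kept.
# The overlapping "score" column receives the default suffixes _x and _y.
inner = left.merge(right, left_on="emp_id", right_on="id")

# how='outer' keeps every key from both sides (1, 2, 3, 4) and fills the
# missing fields with NaN; suffixes replaces the default _x / _y.
outer = left.merge(right, how="outer", left_on="emp_id", right_on="id",
                   suffixes=("_left", "_right"))

print(inner)
print(outer)
```

Because left_on and right_on name different columns, both key columns (emp_id and id) appear in the result.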
a. By default, the columns with the same name in both DataFrames are used as the join key
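A minimal sketch of this default behaviour, assuming two made-up tables that share a single column, user_id, which is unique in both:

```python
import pandas as pd

# Hypothetical tables with the same primary key but different fields.
users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
ages = pd.DataFrame({"user_id": [1, 2, 3], "age": [31, 25, 40]})

# No "on" given: merge joins on the column the two tables share (user_id).
merged = users.merge(ages)

# Rows do not increase; columns = 2 + 2 - 1 (one shared join key).
print(merged.shape)
print(list(merged.columns))
```

The result has the same number of rows as each input, and the key column appears only once.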
2. join
join combines columns and is mainly used for merging on the index. It provides a convenient way to combine the columns of two DataFrames whose column labels differ into a single DataFrame.
df.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
Parameter descriptions:
other: the object to join with: another DataFrame, a named Series, or a list of DataFrames
on: the column or index level name(s) in the calling DataFrame to match against the index of other; if omitted, the join is index on index
how: the join type, as in SQL; one of left, right, inner, outer; the default is 'left'
lsuffix: the suffix appended to overlapping column names from the left DataFrame
rsuffix: the suffix appended to overlapping column names from the right DataFrame
sort: if True, sort the result lexicographically by the join key; the default is False, in which case the order depends on the join type (for the default left join, the order of the left DataFrame is kept)
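The following sketch shows join on the index, join through a column with on, and the suffix parameters. All frames and labels (a, b, cust, amt) are invented for the example.

```python
import pandas as pd

# Hypothetical frames with different columns and partly overlapping indexes.
left = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
right = pd.DataFrame({"b": [10, 30]}, index=["x", "z"])

# Default how='left': every row of left is kept; "y" has no match, so b is NaN.
joined = left.join(right)

# how='inner': only index labels present in both frames remain.
inner = left.join(right, how="inner")

# on="cust": match the caller's "cust" column against the index of right.
orders = pd.DataFrame({"cust": ["x", "z", "x"], "amt": [5, 6, 7]})
by_col = orders.join(right, on="cust")

# Overlapping column names must be disambiguated with lsuffix / rsuffix.
other = pd.DataFrame({"a": [7, 8, 9]}, index=["x", "y", "z"])
suffixed = left.join(other, lsuffix="_l", rsuffix="_r")

print(joined)
print(by_col)
print(suffixed)
```

Without lsuffix or rsuffix, the last call would raise an error because both frames contain a column named a.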
3. concat
concat stacks multiple objects together along an axis.
It is similar to UNION ALL in a database. You can specify the axis to concatenate along and the join type (only outer or inner). Like UNION ALL, and unlike a plain UNION, concat does not remove duplicates; to deduplicate the result, use the drop_duplicates method.
pandas.concat(objs, axis=0, join='outer',
ignore_index=False, keys=None, levels=None,
names=None, verify_integrity=False,copy=True)
Parameter descriptions:
objs: the collection of objects to concatenate, typically a list or a dictionary
axis: 0 concatenates along the index (rows are stacked), 1 concatenates along the columns; the default is 0
join: how the labels on the other axis are handled; the default is 'outer', and 'inner' is also available
ignore_index: the default is False; if True, the original index values along the concatenation axis are discarded and the result is labeled 0 to N-1. This is useful when the index of the combined data carries no meaning and should be rebuilt.
keys: values used to build a hierarchical index, with the keys as the outermost level
levels: the specific levels (unique values) to use when building the hierarchical index; otherwise they are inferred from keys
names: the names of the levels in the resulting hierarchical index
verify_integrity: check whether the new concatenated axis contains duplicate labels; this can be very resource-intensive
copy: whether to copy the data
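A short sketch of the most commonly used concat parameters, together with drop_duplicates for deduplication. The two frames q1 and q2 and the level names source and row are assumptions made for the example.

```python
import pandas as pd

# Hypothetical frames with the same columns; the row (2, "b") occurs in both.
q1 = pd.DataFrame({"id": [1, 2], "val": ["a", "b"]})
q2 = pd.DataFrame({"id": [2, 3], "val": ["b", "c"]})

# axis=0 (default): rows are stacked, duplicates and original index are kept.
stacked = pd.concat([q1, q2])

# ignore_index=True rebuilds the index as 0..N-1.
reindexed = pd.concat([q1, q2], ignore_index=True)

# concat does not deduplicate; drop_duplicates removes the repeated row.
deduped = reindexed.drop_duplicates().reset_index(drop=True)

# keys adds an outer index level identifying the source; names labels the levels.
keyed = pd.concat([q1, q2], keys=["q1", "q2"], names=["source", "row"])

# axis=1 places the frames side by side, aligned on the index.
side_by_side = pd.concat([q1, q2], axis=1)

# verify_integrity=True raises ValueError when index labels overlap.
try:
    pd.concat([q1, q2], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True

print(stacked)
print(deduped)
print(keyed)
```

With keys set, the rows that came from one input can be selected again with keyed.loc["q2"].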