pandas advanced processing: merging
If your data is spread across multiple tables, you sometimes need to combine them before analysis.
1 pd.concat for concatenating data
- pd.concat([data1, data2], axis=1)
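- Concatenates along rows or columns: axis=0 (the default) stacks the frames vertically and aligns them on column labels; axis=1 places them side by side and aligns them on the row index
For example, we can join the one-hot encoded columns from the previous step to the original data:
# Align on the row index and place the frames side by side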
pd.concat([data, dummies], axis=1)
[Here is the data from the previous blog post.]
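Since the data from the previous post is not reproduced here, the following is a minimal sketch of the same idea on a made-up table. The `price` and `group` columns are hypothetical stand-ins, not the article's actual data.

```python
import pandas as pd

# Hypothetical toy data standing in for the article's `data` table.
data = pd.DataFrame({"price": [10.5, 11.2, 9.8],
                     "group": ["low", "high", "low"]})

# One-hot encode the `group` column; the result keeps data's row index.
dummies = pd.get_dummies(data["group"], prefix="group")

# axis=1: place the frames side by side, aligning on the row index.
wide = pd.concat([data, dummies], axis=1)

# axis=0: stack the frames vertically, aligning on column labels.
tall = pd.concat([data, data], axis=0)

print(wide.shape)  # (3, 4)
print(tall.shape)  # (6, 2)
```

Note that stacking with axis=0 keeps the original row labels, so `tall` has duplicate index values; passing `ignore_index=True` to `pd.concat` renumbers the rows if that matters.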
2 pd.merge
- pd.merge(left, right, how='inner', on=None)
- You can merge on key columns common to both DataFrames, or on keys specified separately for each side (left_on, right_on)
- left: a DataFrame
- right: another DataFrame
- on: the common key column(s) to join on
- how: the type of join to perform
Merge method | SQL Join Name | Description |
---|---|---|
left | LEFT OUTER JOIN | Use keys from left frame only |
right | RIGHT OUTER JOIN | Use keys from right frame only |
outer | FULL OUTER JOIN | Use union of keys from both frames |
inner | INNER JOIN | Use intersection of keys from both frames |
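To make the table concrete, here is a small sketch that records which keys survive each join type. The two frames and the `id`, `x`, and `y` column names are invented for illustration.

```python
import pandas as pd

# Hypothetical tables sharing one key column, `id`.
left = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "y": ["p", "q", "r"]})

# Collect the keys present in the result of each join type.
keys = {
    how: sorted(pd.merge(left, right, how=how, on="id")["id"].tolist())
    for how in ["left", "right", "outer", "inner"]
}

print(keys)
# {'left': [1, 2, 3], 'right': [2, 3, 4], 'outer': [1, 2, 3, 4], 'inner': [2, 3]}
```

Rows whose key exists on only one side are kept by left, right, or outer joins, with NaN filling the columns that come from the other frame.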
2.1 pd.merge examples
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
# Inner join is the default
result = pd.merge(left, right, on=['key1', 'key2'])
- Left join
result = pd.merge(left, right, how='left', on=['key1', 'key2'])
- Right join
result = pd.merge(left, right, how='right', on=['key1', 'key2'])
- Outer join
result = pd.merge(left, right, how='outer', on=['key1', 'key2'])
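The four calls above return results of different sizes. The sketch below rebuilds the same `left` and `right` frames so it runs on its own, counts the rows each join produces, and uses the `indicator=True` option of `pd.merge` to show where each row of the outer join came from. The `row_counts` and `outer` names are introduced here for illustration only.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Number of rows produced by each join type on the (key1, key2) pair.
row_counts = {
    how: len(pd.merge(left, right, how=how, on=['key1', 'key2']))
    for how in ['inner', 'left', 'right', 'outer']
}
print(row_counts)  # {'inner': 3, 'left': 5, 'right': 4, 'outer': 6}

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'.
outer = pd.merge(left, right, how='outer', on=['key1', 'key2'], indicator=True)
print(outer[['key1', 'key2', '_merge']])
```

The inner join has three rows rather than two because the pair (K1, K0) appears once in `left` and twice in `right`, so it matches twice.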
3 Summary
- pd.concat([data1, data2], axis=)
- pd.merge(left, right, how=, on=)
  - how: the type of join to perform
  - on: the key column(s) to join on