Master pandas data merging in one article
In real-world data processing, a common requirement is to join several tables and then process and analyze the combined data, much like a join query in SQL.
pandas provides several methods for this. The most prominent and widely used is merge. This article explains the following four methods and their parameters in detail through practical examples:
- merge
- append
- join
- concat
The end of the article explains how to get its source code.
First, import the two libraries used in almost every data analysis, following the usual convention:
import pandas as pd
import numpy as np
— 01 —
merge
Official parameters
The parameters of the merge function, as given in the official documentation, are listed below (only the ones discussed in this article). The cases that follow explain the most important ones.
# Function form; the equivalent method form is left.merge(right, ...)
pd.merge(left, right,
         how='inner',  # {'left', 'right', 'outer', 'inner'}, default 'inner'
         on=None,
         left_on=None, right_on=None,
         sort=False,
         suffixes=('_x', '_y'))  # further parameters omitted
Sample data
Note the differences among the four DataFrames.
Use default parameters
The two ways of writing the call, the function pd.merge and the DataFrame method merge, give the same result.
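The article's tables appeared as images and are not reproduced here. The following is a minimal sketch using hypothetical frames `left` and `right` (names and values are assumptions). It checks that the function form and the method form return the same result with default parameters:

```python
import pandas as pd

# Hypothetical sample data; not the article's original tables
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Form 1: module-level function
r1 = pd.merge(left, right)
# Form 2: DataFrame method, with `left` as the caller
r2 = left.merge(right)

# Defaults: inner join on the column the two frames share ("key")
print(r1)
print(r1.equals(r2))  # True
```

Only the keys "b" and "c" appear in both frames, so only those two rows survive the default inner join.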
Parameter how
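The how parameter accepts four values:
- inner (default): keep only the keys present in both DataFrames
- outer: keep the union of the keys from both DataFrames
- right: keep all keys from the right DataFrame
- left: keep all keys from the left DataFrame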
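Here is a small sketch comparing the four join types on the same hypothetical frames (the data is an assumption, not the article's). It shows which keys survive each one:

```python
import pandas as pd

# Hypothetical sample data
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Run the same merge with each value of `how`
results = {
    how: pd.merge(left, right, on="key", how=how)
    for how in ["inner", "outer", "left", "right"]
}
for how, df in results.items():
    print(how, df["key"].tolist())
```

Rows that have no partner on the other side are filled with NaN. For example, the left join keeps key "a" with a missing y value.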
Parameter on
The name of the column (or columns) used as the join key. It must exist in both DataFrames, similar to a field shared by the two tables in a SQL join.
If on is not specified and no other key parameters are given, the columns the two DataFrames have in common are used as the join keys.
The on parameter is a single field
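As a minimal sketch with hypothetical data, the two frames below share both "name" and "city", but only "name" is used as the key. The other shared column is therefore kept twice, with suffixes:

```python
import pandas as pd

# Hypothetical data: both frames contain "name" and "city"
left = pd.DataFrame({"name": ["Tom", "Ann"], "city": ["NY", "LA"], "age": [30, 25]})
right = pd.DataFrame({"name": ["Tom", "Ann"], "city": ["SF", "LA"], "score": [90, 85]})

# Join on the single field "name"
result = pd.merge(left, right, on="name")
print(result.columns.tolist())
```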
Another example: the on parameter is a list of several fields
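With a list of keys, a row matches only when every listed field is equal. Here is a sketch on the same hypothetical frames as a single-key example (the data is an assumption):

```python
import pandas as pd

# Hypothetical data: the two frames agree on "city" only for Ann
left = pd.DataFrame({"name": ["Tom", "Ann"], "city": ["NY", "LA"], "age": [30, 25]})
right = pd.DataFrame({"name": ["Tom", "Ann"], "city": ["SF", "LA"], "score": [90, 85]})

# Both "name" and "city" must match
result = pd.merge(left, right, on=["name", "city"])
print(result)
```

Tom's city differs between the frames, so only Ann's row remains. Because "city" is now a key, it is no longer duplicated with suffixes.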
Parameter left_on/right_on
Use left_on and right_on when the key columns have different names in the two DataFrames.
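Here is a minimal sketch, assuming hypothetical `orders` and `customers` tables whose keys are named differently:

```python
import pandas as pd

# Hypothetical tables: the key is "customer_id" on one side and "id" on the other
orders = pd.DataFrame({"customer_id": [1, 2, 2], "amount": [10, 20, 30]})
customers = pd.DataFrame({"id": [1, 2], "name": ["Tom", "Ann"]})

result = pd.merge(orders, customers, left_on="customer_id", right_on="id")
print(result)
```

Both key columns are kept in the result. If only one is needed, drop the other afterwards, for example with `result.drop(columns="id")`.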
Parameter suffixes
When a non-key column has the same name in both tables but holds different values, both copies are kept and told apart by suffixes. The defaults are _x and _y, and you can specify your own.
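The sketch below uses hypothetical frames that both contain a "value" column. It compares the default suffixes with custom ones (the names "_left"/"_right" are arbitrary choices):

```python
import pandas as pd

# Hypothetical data: "value" exists in both frames with different contents
left = pd.DataFrame({"key": ["a", "b"], "value": [1, 2]})
right = pd.DataFrame({"key": ["a", "b"], "value": [10, 20]})

default = pd.merge(left, right, on="key")
custom = pd.merge(left, right, on="key", suffixes=("_left", "_right"))
print(default.columns.tolist())
print(custom.columns.tolist())
```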
Parameter sort
Sort the join keys lexicographically in the result. With sort=False (the default), the key order depends on the join type.
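Here is a small sketch with hypothetical, deliberately unordered keys. For an inner join, pandas documents that sort=False preserves the order of the left keys, while sort=True sorts them:

```python
import pandas as pd

# Hypothetical data with unordered, unique keys
left = pd.DataFrame({"key": ["c", "a", "b"], "x": [3, 1, 2]})
right = pd.DataFrame({"key": ["b", "c", "a"], "y": [20, 30, 10]})

unsorted = pd.merge(left, right, on="key")            # inner join keeps the left key order
sorted_ = pd.merge(left, right, on="key", sort=True)  # keys sorted lexicographically
print(unsorted["key"].tolist(), sorted_["key"].tolist())
```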
— 02 —
concat
Official parameters
The concat function concatenates two or more DataFrames (or Series).
- The axis parameter specifies whether to concatenate along the rows (axis=0, the default) or the columns (axis=1)
- ignore_index=True discards the original indexes and renumbers the result 0, 1, ..., n-1
Generate data
Specify the concatenation axis
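Here is a minimal sketch with two hypothetical frames (the data is an assumption). It shows how axis changes the shape of the result:

```python
import pandas as pd

# Hypothetical data
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

rows = pd.concat([df1, df2])          # axis=0 (default): stack vertically
cols = pd.concat([df1, df2], axis=1)  # axis=1: place side by side, aligned on the index
print(rows.shape, cols.shape)
print(rows.index.tolist())            # the original indexes are kept, so labels repeat
```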
Change index
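Row-wise concatenation keeps each frame's original labels, so duplicates such as 0, 1, 0, 1 appear. Here is a sketch with hypothetical data showing that ignore_index=True renumbers the rows instead:

```python
import pandas as pd

# Hypothetical data
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# Discard the original indexes and number the rows 0..n-1
result = pd.concat([df1, df2], ignore_index=True)
print(result.index.tolist())
```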
Parameters join and sort (sorting the column labels)
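When the frames have different columns, join decides which columns are kept. With join="outer", the default, the result keeps all columns and fills gaps with NaN. With join="inner", it keeps only the shared columns. sort=True orders the resulting column labels. Here is a sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data: only column "a" is shared
df1 = pd.DataFrame({"b": [1, 2], "a": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "c": [7, 8]})

outer = pd.concat([df1, df2], ignore_index=True)                # union of columns
inner = pd.concat([df1, df2], join="inner", ignore_index=True)  # shared columns only
outer_sorted = pd.concat([df1, df2], sort=True, ignore_index=True)
print(outer.columns.tolist(), inner.columns.tolist(), outer_sorted.columns.tolist())
```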
— 03 —
append
Official parameters
Basic use
data3.append(data4)  # equivalent to pd.concat([data3, data4]); ignore the pandas version warning (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0)
Change the index: renumber the rows 0, 1, 2, ...
data3.append(data4, ignore_index=True)  # set the parameter
sort=True: sort the column labels
data3.append(data4, sort=True)  # sort the column labels (the default is sort=False)
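DataFrame.append no longer exists in pandas 2.0 and later, so the three append calls above fail on current versions. Here is a sketch of the pd.concat equivalents. The names data3 and data4 follow the article, but their contents here are hypothetical:

```python
import pandas as pd

# Hypothetical contents for the article's data3 and data4
data3 = pd.DataFrame({"b": [1, 2], "a": [3, 4]})
data4 = pd.DataFrame({"a": [5, 6], "c": [7, 8]})

basic = pd.concat([data3, data4])                         # like data3.append(data4)
renumbered = pd.concat([data3, data4], ignore_index=True)  # like append(..., ignore_index=True)
sorted_cols = pd.concat([data3, data4], sort=True)         # like append(..., sort=True)
print(basic.index.tolist(), renumbered.index.tolist(), sorted_cols.columns.tolist())
```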
— 04 —
join
Official parameters
Join on the shared index
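DataFrame.join aligns rows by index and uses how="left" by default. Here is a minimal sketch with hypothetical frames:

```python
import pandas as pd

# Hypothetical data indexed by letter labels
left = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"y": [20, 30, 40]}, index=["b", "c", "d"])

# Default how="left": every row of `left` is kept; unmatched rows get NaN
result = left.join(right)
print(result)
```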
When both DataFrames contain a column with the same name, specify suffixes with lsuffix/rsuffix
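Unlike merge, join has no default suffixes, so overlapping column names raise a ValueError unless lsuffix/rsuffix are given. Here is a sketch with hypothetical data (the suffix strings are arbitrary):

```python
import pandas as pd

# Hypothetical data: both frames have a "value" column
left = pd.DataFrame({"value": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"value": [10, 20]}, index=["a", "b"])

try:
    left.join(right)  # overlapping column without suffixes
except ValueError as err:
    print("ValueError:", err)

result = left.join(right, lsuffix="_left", rsuffix="_right")
print(result.columns.tolist())
```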
Turn the shared column into the index
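One way to do this is to move the shared column into the index of both frames with set_index and then join on the index. Here is a sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data sharing the column "key"
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# "key" becomes the index on both sides, so join aligns on it
result = left.set_index("key").join(right.set_index("key"))
print(result)
```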
Keep the shared column only once
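One option described in the pandas documentation is to index only the right frame and pass the column name through join's on parameter. The key then appears once as a regular column, and the left frame's index is preserved. We assume this matches the intent here. Here is a sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data sharing the column "key"
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Match left's "key" column against right's index
result = left.join(right.set_index("key"), on="key")
print(result)
```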
For practice, reply "20200917" to the WeChat official account "Python Data Way" to get this article's source code file.
---------End---------