Wes McKinney, the author of pandas, gives an authoritative and concise entry-level introduction to pandas in [PYTHON FOR DATA ANALYSIS], but in practice I found that the book covers only the tip of the iceberg. For row updates and table merging in pandas, the commonly used methods are concat, join, and merge. Many novices, however, find it hard to tell when and why to use each of the three. Today I will summarize the chapters on merging and reshaping data from the pandas official website.
- The code blocks in this post are mainly taken from the tutorial on the pandas official website.
1 concat
concat is a top-level pandas function that performs a simple concatenation of data along a chosen axis.
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
keys=None, levels=None, names=None, verify_integrity=False)
Parameter description
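objs: a sequence (for example a list) of Series, DataFrame, or Panel objects
axis: the axis to concatenate along; 0 stacks the tables by rows, 1 places them side by side by columns
join: how to combine the labels on the other axis, 'inner' (intersection) or 'outer' (union)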
Some other parameters are not commonly used, and they will be explained when they are used.
1.1 Connecting tables with the same fields end to end
# First put the tables in a list, then pass the list to concat
In [4]: frames = [df1, df2, df3]
In [5]: result = pd.concat(frames)
To add a key level while concatenating, so that you can tell which table each row came from, pass the keys parameter
In [6]: result = pd.concat(frames, keys=['x', 'y', 'z'])
The result has a hierarchical index whose outer level is the key ('x', 'y', or 'z') of the table each row came from.
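As a minimal sketch of section 1.1, the snippet below stacks three tables and then repeats the call with keys. The small df1, df2, and df3 here are illustrative stand-ins with made-up values, not the tables from the official tutorial.

```python
import pandas as pd

# Illustrative stand-ins for df1, df2, df3: same columns, different row labels
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])
df3 = pd.DataFrame({"A": ["A4", "A5"], "B": ["B4", "B5"]}, index=[4, 5])

frames = [df1, df2, df3]

# Rows are stacked end to end; the original row labels are kept
stacked = pd.concat(frames)

# keys adds an outer index level that records the source table
keyed = pd.concat(frames, keys=["x", "y", "z"])

print(stacked)
print(keyed.loc["y"])  # only the rows that came from df2
```

Selecting with keyed.loc["y"] is one convenient way to get a source table back out of the combined result.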
1.2 Horizontal table splicing (row alignment)
1.2.1 axis
When axis=1, concat aligns the tables on their row index and places the columns of the two tables, which have different column names, side by side
In [9]: result = pd.concat([df1, df4], axis=1)
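The following sketch shows what row alignment means with axis=1. The df1 and df4 used here are small made-up tables whose row labels only partly overlap; they are assumptions for illustration, not the tutorial's data.

```python
import pandas as pd

# Illustrative tables: row labels 1 and 2 are shared, 0 and 3 are not
df1 = pd.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]}, index=[0, 1, 2])
df4 = pd.DataFrame({"D": ["D1", "D2", "D3"], "F": ["F1", "F2", "F3"]}, index=[1, 2, 3])

# Columns are placed side by side; rows are matched by index label
side_by_side = pd.concat([df1, df4], axis=1)

# Labels missing from one table are filled with NaN in that table's columns
print(side_by_side)
```

Because the default join is 'outer', every row label from either table appears in the result.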
1.2.2 join
The join parameter controls how the row labels of the two tables are combined: 'inner' keeps only the labels present in both tables (the intersection), while 'outer' keeps every label from either table (the union).
In [10]: result = pd.concat([df1, df4], axis=1, join='inner')
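To make the difference between the two join modes concrete, here is a short sketch that runs both on the same pair of tables. The tables are illustrative assumptions with partly overlapping row labels.

```python
import pandas as pd

# Illustrative tables: only row labels 1 and 2 appear in both
df1 = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=[0, 1, 2])
df4 = pd.DataFrame({"D": ["D1", "D2", "D3"]}, index=[1, 2, 3])

outer = pd.concat([df1, df4], axis=1, join="outer")  # union of row labels
inner = pd.concat([df1, df4], axis=1, join="inner")  # intersection of row labels

print(len(outer), len(inner))
print(inner)
```

The inner result contains no NaN here, since every kept row exists in both tables.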
1.2.3 join_axes
If join_axes is passed, it specifies which index the data is aligned on.
For example, passing df1.index keeps exactly the rows of df1, and df4 is aligned to those rows before being spliced on. Note that join_axes was deprecated in pandas 0.25 and removed in pandas 1.0, so the call below only works on older versions; on current versions, reindex to df1.index instead.
In [11]: result = pd.concat([df1, df4], axis=1, join_axes=[df1.index])
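Since join_axes is no longer accepted by pd.concat from pandas 1.0 onward, a rough equivalent on current versions is to reindex to df1.index. The sketch below shows two ways of doing that; the tables are illustrative assumptions.

```python
import pandas as pd

# Illustrative tables with partly overlapping row labels
df1 = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=[0, 1, 2])
df4 = pd.DataFrame({"D": ["D1", "D2", "D3"]}, index=[1, 2, 3])

# Option 1: align df4 to df1's rows first, then concatenate
result = pd.concat([df1, df4.reindex(df1.index)], axis=1)

# Option 2: concatenate with the default outer join, then keep df1's rows
alt = pd.concat([df1, df4], axis=1).reindex(df1.index)

print(result)
```

Both options keep exactly the rows of df1; the row with label 3, which exists only in df4, is dropped.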
1.3 append
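append is a method of Series and DataFrame. By default it concatenates along axis=0, stacking rows while aligning columns. Note that append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, use pd.concat instead.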
In [12]: result = df1.append(df2)
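On pandas 2.0 and later, where DataFrame.append is no longer available, the same result can be sketched with pd.concat. The tables below are illustrative assumptions and deliberately share only column B, to show how the columns are aligned.

```python
import pandas as pd

# Illustrative tables that share only column "B"
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"B": ["B2", "B3"], "C": ["C2", "C3"]}, index=[2, 3])

# Stand-in for df1.append(df2): rows are stacked, columns matched by name
result = pd.concat([df1, df2])

print(result)
```

Columns that exist in only one of the tables are filled with NaN for the rows of the other table.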
1.4 Concat ignoring index
If the indexes of the two tables carry no real meaning, set ignore_index=True. The tables are aligned on their column fields and concatenated, and the result gets a fresh index 0, 1, ..., n-1.
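A small sketch of the effect of ignore_index, using two made-up tables whose default indexes collide:

```python
import pandas as pd

# Illustrative tables: both use the default row labels 0 and 1
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

kept = pd.concat([df1, df2])                      # row labels repeat: 0, 1, 0, 1
fresh = pd.concat([df1, df2], ignore_index=True)  # row labels renumbered: 0, 1, 2, 3

print(list(kept.index))
print(list(fresh.index))
```

Renumbering avoids duplicate row labels, which would otherwise make label-based lookups such as kept.loc[0] return more than one row.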
1.5 Add a key to distinguish data groups while merging
The keys parameter mentioned earlier adds a key to the merged table so that rows from different source tables can be told apart
1.5.1 Directly with the keys parameter
In [27]: result = pd.concat(frames, keys=['x', 'y', 'z'])
1.5.2 Passing in a dictionary to add grouping keys
In [28]: pieces = {'x': df1, 'y': df2, 'z': df3}
In [29]: result = pd.concat(pieces)
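The two variants in section 1.5 can be compared side by side. In this sketch the tables are illustrative assumptions; the dictionary keys end up as the outer index level, just as the keys list does.

```python
import pandas as pd

# Illustrative source tables
df1 = pd.DataFrame({"A": ["A0", "A1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"]}, index=[2, 3])

# Variant 1: a list of tables plus an explicit keys list
by_keys = pd.concat([df1, df2], keys=["x", "y"])

# Variant 2: a dictionary, whose keys are used as the grouping keys
pieces = {"x": df1, "y": df2}
by_dict = pd.concat(pieces)

print(by_dict)
```

The dictionary form keeps each key next to its table, which can be easier to read when there are many pieces.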
1.6 Add a new row to the dataframe
The append method can also take a Series or a dictionary and insert it as a new row of the DataFrame.
In [34]: s2 = pd.Series(['X0', 'X1', 'X2', 'X3'], index=['A', 'B', 'C', 'D'])
In [35]: result = df1.append(s2, ignore_index=True)
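On pandas 2.0 and later, where DataFrame.append is no longer available, one possible way to add a Series as a row is to turn it into a one-row DataFrame and concatenate. This is a sketch under that assumption, with made-up data, not the tutorial's own code.

```python
import pandas as pd

# Illustrative table and the new row to add
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
s2 = pd.Series(["X0", "X1"], index=["A", "B"])

# to_frame().T turns the Series into a one-row DataFrame
# whose columns are the Series labels "A" and "B"
new_row = s2.to_frame().T

# Stand-in for df1.append(s2, ignore_index=True)
result = pd.concat([df1, new_row], ignore_index=True)

print(result)
```

The labels of the Series must match the column names of the table for the values to land in the right columns.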
1.7 Merge tables with different column fields
If two tables have different column fields but you still want to combine them, with the missing values shown as NaN, you can use append with ignore_index=True.
In [36]: dicts = [{'A': 1, 'B': 2, 'C': 3, 'X': 4},
....: {'A': 5, 'B': 6, 'C': 7, 'Y': 8}]
....:
In [37]: result = df1.append(dicts, ignore_index=True)
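On pandas 2.0 and later, where DataFrame.append is no longer available, the list of dictionaries can first be turned into a DataFrame and then concatenated. The sketch below assumes a small made-up df1 with columns A to D.

```python
import pandas as pd

# Illustrative table with columns A, B, C, D
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"],
                    "C": ["C0", "C1"], "D": ["D0", "D1"]})

dicts = [{"A": 1, "B": 2, "C": 3, "X": 4},
         {"A": 5, "B": 6, "C": 7, "Y": 8}]

# Each dictionary becomes one row; keys missing from a dictionary become NaN
rows = pd.DataFrame(dicts)

# Stand-in for df1.append(dicts, ignore_index=True)
result = pd.concat([df1, rows], ignore_index=True)

print(result)
```

The result has the union of all column names (A, B, C, D, X, Y), with NaN wherever a row has no value for a column.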