pandas data merging and reshaping (concat)

Wes McKinney, the author of pandas, gives an authoritative and concise entry-level introduction to every part of the library in [PYTHON FOR DATA ANALYSIS], but in day-to-day use I found that the book covers only the tip of the iceberg. For row updates and table merging, the commonly used methods are concat, join, and merge, and many newcomers find it hard to tell when each one applies and what it is for. In this post I summarize the chapter on merging and reshaping data from the official pandas documentation.

  • Most of the code blocks in this post come from the official pandas tutorial.

1 concat

concat is a top-level pandas function that performs a simple concatenation of data along a chosen axis.
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
       keys=None, levels=None, names=None, verify_integrity=False)

Parameter description
objs: a list (or other sequence) of Series or DataFrame objects (older pandas versions also accepted Panel)
axis: the axis to concatenate along; 0 stacks rows, 1 places columns side by side
join: how the other axis is combined, 'inner' (intersection) or 'outer' (union, the default)

The remaining parameters are used less often and are explained where they come up.

1.1 Stacking tables with the same columns end to end


# First collect the tables in a list, then pass the list to concat
In [4]: frames = [df1, df2, df3]

In [5]: result = pd.concat(frames)
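
The post does not show how df1, df2 and df3 are built, so here is a minimal runnable sketch of the same step. The three small frames are hypothetical stand-ins, not the tutorial's own data.

```python
import pandas as pd

# Hypothetical stand-ins for df1, df2, df3: same columns, different row labels.
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])
df3 = pd.DataFrame({"A": ["A4", "A5"], "B": ["B4", "B5"]}, index=[4, 5])

frames = [df1, df2, df3]
result = pd.concat(frames)  # axis=0 by default, so the rows are stacked

print(result.shape)         # (6, 2)
print(list(result.index))   # [0, 1, 2, 3, 4, 5]
```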

To record which table each row came from, pass the keys parameter when concatenating.

In [6]: result = pd.concat(frames, keys=['x', 'y', 'z'])

The result carries a two-level row index: the outer level holds the key ('x', 'y' or 'z') and the inner level holds each table's original row labels.
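
The keyed result can be inspected as in the sketch below, which assumes single-column stand-ins for df1, df2 and df3 (hypothetical data, chosen only to keep the output short).

```python
import pandas as pd

# Hypothetical stand-ins for the three source tables.
df1 = pd.DataFrame({"A": ["A0", "A1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"]}, index=[2, 3])
df3 = pd.DataFrame({"A": ["A4", "A5"]}, index=[4, 5])

result = pd.concat([df1, df2, df3], keys=["x", "y", "z"])

# The row index now has two levels: (key, original row label).
print(result.index.nlevels)     # 2

# Selecting on the outer level returns the rows that came from df2.
from_y = result.loc["y"]
print(list(from_y.index))       # [2, 3]
print(list(from_y["A"]))        # ['A2', 'A3']
```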

1.2 Horizontal concatenation (row alignment)

1.2.1 axis

With axis=1, concat aligns the tables on their row index and places their columns side by side, so two tables with different column names can be combined.

In [9]: result = pd.concat([df1, df4], axis=1)
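
A minimal sketch of the row alignment, assuming hypothetical stand-ins for df1 and df4 whose row labels only partly overlap:

```python
import pandas as pd

# Hypothetical stand-ins: df1 has rows 0-2, df4 has rows 1, 2 and 5.
df1 = pd.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]}, index=[0, 1, 2])
df4 = pd.DataFrame({"F": ["F1", "F2", "F5"]}, index=[1, 2, 5])

result = pd.concat([df1, df4], axis=1)

# The default join='outer' keeps every row label from either table.
print(list(result.index))        # [0, 1, 2, 5]
print(list(result.columns))      # ['A', 'B', 'F']

# Cells with no source value are NaN: F in row 0, A and B in row 5.
print(int(result.isna().sum().sum()))   # 3
```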


1.2.2 join

The join parameter controls how the row indexes are combined: 'inner' keeps only the row labels present in both tables (the intersection), while 'outer' keeps every row label from either table (the union).

In [10]: result = pd.concat([df1, df4], axis=1, join='inner')
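
A minimal sketch contrasting the two join modes, again with hypothetical stand-ins for df1 and df4:

```python
import pandas as pd

# Hypothetical stand-ins: only row labels 1 and 2 appear in both tables.
df1 = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=[0, 1, 2])
df4 = pd.DataFrame({"F": ["F1", "F2", "F5"]}, index=[1, 2, 5])

inner = pd.concat([df1, df4], axis=1, join="inner")
outer = pd.concat([df1, df4], axis=1, join="outer")

print(list(inner.index))   # [1, 2]        intersection of the row labels
print(list(outer.index))   # [0, 1, 2, 5]  union of the row labels

# The inner result has no gaps, because every kept row exists in both tables.
print(bool(inner.isna().any().any()))   # False
```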


1.2.3 join_axes

The join_axes parameter specifies which index the result is aligned on.
For example, passing df1.index keeps exactly the rows of df1 and attaches the matching rows of df4 to them. join_axes was deprecated in pandas 0.25 and removed in 1.0, so the call below only runs on older versions.

In [11]: result = pd.concat([df1, df4], axis=1, join_axes=[df1.index])
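
On pandas 1.0 and later, reindexing the concatenated result is one way to get the same alignment. A minimal sketch, assuming hypothetical stand-ins for df1 and df4:

```python
import pandas as pd

# Hypothetical stand-ins: df4 shares rows 1 and 2 with df1 and adds row 5.
df1 = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=[0, 1, 2])
df4 = pd.DataFrame({"F": ["F1", "F2", "F5"]}, index=[1, 2, 5])

# Concatenate with the default outer join, then keep only df1's row labels.
result = pd.concat([df1, df4], axis=1).reindex(df1.index)

print(list(result.index))            # [0, 1, 2]
print(result["F"].isna().tolist())   # [True, False, False]
```

Row 5 of df4 is dropped because it is not in df1.index, and row 0 keeps a NaN in column F because df4 has no such row.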


1.3 append

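append is a method of Series and DataFrame; by default it concatenates along axis=0, stacking rows and aligning on column names. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the call below only runs on older versions.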
In [12]: result = df1.append(df2)
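
On pandas 2.0 and later, pd.concat covers the same case. A minimal sketch, assuming two hypothetical frames that both use the default 0, 1 row labels:

```python
import pandas as pd

# Hypothetical stand-ins with identical columns and default row labels.
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

# Stacks the rows of df2 under df1, as df1.append(df2) did on older pandas.
result = pd.concat([df1, df2])

# The original row labels are kept, so they repeat here.
print(list(result.index))    # [0, 1, 0, 1]
print(list(result["A"]))     # ['A0', 'A1', 'A2', 'A3']
```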


1.4 Concat ignoring index

If the row indexes of the two tables carry no real meaning, set ignore_index=True. The tables are still aligned on their column names and stacked, and the result is given a fresh index running from 0 to n-1.
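
A minimal sketch of ignore_index, assuming two hypothetical frames that share only column B:

```python
import pandas as pd

# Hypothetical stand-ins: both frames use the default 0, 1 row labels.
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df4 = pd.DataFrame({"B": ["B2", "B3"], "D": ["D2", "D3"]})

result = pd.concat([df1, df4], ignore_index=True)

# The old row labels are discarded and replaced by 0..n-1.
print(list(result.index))            # [0, 1, 2, 3]
# Columns are the union of both tables; missing cells are NaN.
print(list(result.columns))          # ['A', 'B', 'D']
print(result["A"].isna().tolist())   # [False, False, True, True]
```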

1.5 Add a key to distinguish data groups while merging

The keys parameter introduced in section 1.1 labels the rows of the merged table with the source table they came from.

1.5.1 Using the keys parameter directly

In [27]: result = pd.concat(frames, keys=['x', 'y', 'z'])


1.5.2 Passing in a dictionary to add grouping keys

In [28]: pieces = {'x': df1, 'y': df2, 'z': df3}

In [29]: result = pd.concat(pieces)
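
A minimal sketch of the dictionary form, assuming single-column hypothetical stand-ins for df1, df2 and df3. It also shows that, when a dict is passed, keys can select and order a subset of its entries.

```python
import pandas as pd

# Hypothetical stand-ins for the three source tables.
df1 = pd.DataFrame({"A": ["A0", "A1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"]}, index=[2, 3])
df3 = pd.DataFrame({"A": ["A4", "A5"]}, index=[4, 5])

pieces = {"x": df1, "y": df2, "z": df3}

# The dict keys become the outer level of the row index.
result = pd.concat(pieces)
print(list(result.index.get_level_values(0).unique()))   # ['x', 'y', 'z']

# Passing keys alongside a dict picks entries from it, in the given order.
subset = pd.concat(pieces, keys=["z", "y"])
print(list(subset.index.get_level_values(0).unique()))   # ['z', 'y']
```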


1.6 Add a new row to the dataframe

The append method can also insert a Series or a dict as a new row of a DataFrame.

In [34]: s2 = pd.Series(['X0', 'X1', 'X2', 'X3'], index=['A', 'B', 'C', 'D'])

In [35]: result = df1.append(s2, ignore_index=True)
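
On pandas 2.0 and later, where DataFrame.append is no longer available, one way to add the row is to turn the Series into a one-row DataFrame and concatenate it. A minimal sketch, assuming a hypothetical two-row df1:

```python
import pandas as pd

# Hypothetical stand-in for df1 with columns A-D.
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"],
                    "C": ["C0", "C1"], "D": ["D0", "D1"]})
s2 = pd.Series(["X0", "X1", "X2", "X3"], index=["A", "B", "C", "D"])

# to_frame() gives one column; .T transposes it into one row whose
# column names are the Series labels A, B, C, D.
new_row = s2.to_frame().T

result = pd.concat([df1, new_row], ignore_index=True)

print(result.shape)               # (3, 4)
print(result.iloc[-1].tolist())   # ['X0', 'X1', 'X2', 'X3']
```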

1.7 Merging tables with different column fields

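If two tables have different column fields but you still want to combine them, the columns are unioned and the cells with no source value are filled with NaN. Passing ignore_index=True renumbers the rows of the combined table.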


In [36]: dicts = [{'A': 1, 'B': 2, 'C': 3, 'X': 4},
   ....:          {'A': 5, 'B': 6, 'C': 7, 'Y': 8}]
   ....: 

In [37]: result = df1.append(dicts, ignore_index=True)
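
On pandas 2.0 and later, the same result can be sketched by building a DataFrame from the list of dicts and concatenating it. The two-row df1 below is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical stand-in for df1 with columns A-D.
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"],
                    "C": ["C0", "C1"], "D": ["D0", "D1"]})

dicts = [{"A": 1, "B": 2, "C": 3, "X": 4},
         {"A": 5, "B": 6, "C": 7, "Y": 8}]

# Each dict becomes one row; its keys become column names.
new_rows = pd.DataFrame(dicts)

result = pd.concat([df1, new_rows], ignore_index=True)

# Columns are the union of both tables.
print(list(result.columns))          # ['A', 'B', 'C', 'D', 'X', 'Y']
# Y is only set by the second dict, so the other three rows hold NaN.
print(result["Y"].isna().tolist())   # [True, True, True, False]
```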

 

