Pandas merge and join

Tables can be joined on keys (the merge function) or concatenated along an axis (the concat function).

Single-key join

The call syntax is pandas.merge(left, right, how=..., on=key). The on parameter selects the key column(s) to join on.
The how parameter selects the join type, with the same meaning as the corresponding SQL join (the default is 'inner'). The main values are as follows:

how      SQL equivalent      Description
left     LEFT OUTER JOIN     Keep all rows of the left object
right    RIGHT OUTER JOIN    Keep all rows of the right object
outer    FULL OUTER JOIN     Keep all rows of both objects
inner    INNER JOIN          Keep only rows whose key appears in both objects

import pandas as pd

# Adjusted closing prices for four consecutive days starting 2019-01-01
df_price = pd.DataFrame({
    'Date': pd.date_range('2019-1-1', periods=4),
    'AdjClose': [24.42, 25.00, 25.25, 25.64],
})
# Trading volume for five consecutive days starting 2019-01-02
df_volume = pd.DataFrame({
    'Date': pd.date_range('2019-1-2', periods=5),
    'Volume': [56081400, 99455500, 83028700, 100234000, 73829000],
})

Left join, right join

# Left join
pd.merge(df_price, df_volume, how='left', on='Date')
# Right join
pd.merge(df_price, df_volume, how='right', on='Date')
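
Since the screenshots are not available here, the following minimal sketch rebuilds the two DataFrames from above and checks what the left and right joins return. The variable names left and right are only chosen for this illustration.

```python
import pandas as pd

df_price = pd.DataFrame({
    'Date': pd.date_range('2019-1-1', periods=4),
    'AdjClose': [24.42, 25.00, 25.25, 25.64],
})
df_volume = pd.DataFrame({
    'Date': pd.date_range('2019-1-2', periods=5),
    'Volume': [56081400, 99455500, 83028700, 100234000, 73829000],
})

left = pd.merge(df_price, df_volume, how='left', on='Date')
right = pd.merge(df_price, df_volume, how='right', on='Date')

# The left join keeps the 4 dates of df_price; 2019-01-01 has no volume.
print(left.shape)                      # (4, 3)
print(left['Volume'].isna().sum())     # 1

# The right join keeps the 5 dates of df_volume; the last two have no price.
print(right.shape)                     # (5, 3)
print(right['AdjClose'].isna().sum())  # 2
```

Rows without a match on the other side are kept, and the missing columns are filled with NaN.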

Inner join, outer join

# Inner join
pd.merge(df_price, df_volume, how='inner', on='Date')
# Outer join
pd.merge(df_price, df_volume, how='outer', on='Date')
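
To see which side each row of an outer join came from, merge accepts indicator=True, which adds a _merge column. This is a small sketch on the same data; the indicator option does not appear in the original post and is added here only to make the result easier to inspect.

```python
import pandas as pd

df_price = pd.DataFrame({
    'Date': pd.date_range('2019-1-1', periods=4),
    'AdjClose': [24.42, 25.00, 25.25, 25.64],
})
df_volume = pd.DataFrame({
    'Date': pd.date_range('2019-1-2', periods=5),
    'Volume': [56081400, 99455500, 83028700, 100234000, 73829000],
})

inner = pd.merge(df_price, df_volume, how='inner', on='Date')
outer = pd.merge(df_price, df_volume, how='outer', on='Date', indicator=True)

# Only 2019-01-02 .. 2019-01-04 appear in both DataFrames.
print(inner.shape)  # (3, 3)

# The outer join covers 2019-01-01 .. 2019-01-06, plus the _merge column.
print(outer.shape)  # (6, 4)

# Count rows by origin: 'both', 'left_only', 'right_only'.
counts = outer['_merge'].value_counts().to_dict()
print(counts['both'], counts['left_only'], counts['right_only'])  # 3 1 2
```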


Multi-key join

To demonstrate a multi-key join, a new column named words is added to both df_price and df_volume.

# Join on a combination of keys
pd.merge(df_price, df_volume, how="outer", on=['Date', 'words'])

To make the effect visible, I use an outer join. Wherever one DataFrame has no row for a given key combination, its columns are filled with NaN, and in this example the number of rows in the result is the sum of the rows of the two DataFrames.
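
The contents of the words column are not shown in the post, so the sketch below fills it with made-up letters (an assumption for illustration only). With these hypothetical values no (Date, words) pair occurs in both DataFrames, which reproduces the behaviour described above.

```python
import pandas as pd

df_price = pd.DataFrame({
    'Date': pd.date_range('2019-1-1', periods=4),
    'words': ['a', 'b', 'c', 'd'],           # hypothetical values
    'AdjClose': [24.42, 25.00, 25.25, 25.64],
})
df_volume = pd.DataFrame({
    'Date': pd.date_range('2019-1-2', periods=5),
    'words': ['a', 'b', 'c', 'd', 'e'],      # hypothetical values
    'Volume': [56081400, 99455500, 83028700, 100234000, 73829000],
})

# A row matches only if BOTH Date and words are equal.
merged = pd.merge(df_price, df_volume, how='outer', on=['Date', 'words'])
print(merged.shape)  # (9, 4): 4 + 5 rows, no key pair matched

# Joining on Date alone matches three dates; the overlapping words
# column is then kept twice, as words_x and words_y.
by_date = pd.merge(df_price, df_volume, how='outer', on='Date')
print(by_date.shape)  # (6, 5)
```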

Concatenation (concat)

NumPy arrays can be joined with np.concatenate; in the same way, Series and DataFrames can be joined with pd.concat. concat takes an axis parameter, which defaults to axis=0 (concatenate row-wise). Its join parameter accepts 'outer' (the default) or 'inner', with the same meaning as in the table above; 'left' and 'right' are not accepted.
Concatenating Series
Code demo:
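
As a rough illustration of the join parameter, the sketch below concatenates two small DataFrames whose columns only partly overlap. The names x, y, and the values are made up for this example.

```python
import numpy as np
import pandas as pd

# NumPy arrays are joined with np.concatenate.
joined = np.concatenate([np.array([1, 2]), np.array([3, 4])])
print(joined)  # [1 2 3 4]

x = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
y = pd.DataFrame({'b': [5, 6], 'c': [7, 8]})

# join='outer' (the default) keeps the union of the columns, filling NaN.
outer = pd.concat([x, y])
print(list(outer.columns))  # ['a', 'b', 'c']

# join='inner' keeps only the columns present in every input.
inner = pd.concat([x, y], join='inner')
print(list(inner.columns))  # ['b']
```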

import pandas as pd
# Create three Series
s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd','e'])
s3 = pd.Series([5, 6], index=['f', 'g'])

Basic use

# Concatenate row-wise (the default)
pd.concat([s1, s2, s3])
# Concatenate column-wise
pd.concat([s1, s2, s3], axis=1)
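
In place of the missing screenshot, this sketch checks the shape of both results. The names rows and cols are only used for this illustration.

```python
import pandas as pd

s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = pd.Series([5, 6], index=['f', 'g'])

# axis=0 stacks the values into one longer Series.
rows = pd.concat([s1, s2, s3])
print(rows.tolist())     # [0, 1, 2, 3, 4, 5, 6]
print(list(rows.index))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# axis=1 puts each Series in its own column, aligned on the index labels.
# The three indexes do not overlap, so each row has one value and two NaN.
cols = pd.concat([s1, s2, s3], axis=1)
print(cols.shape)                # (7, 3)
print(cols.notna().sum().sum())  # 7
```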

Pass three keys to create a Series with a hierarchical (multi-level) index

pd.concat([s1,s2,s3],keys=['A','B','C'])
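
A short sketch of what the keys buy you: the outer index level can be used to pull each original Series back out. The names combined and wide are chosen here for illustration.

```python
import pandas as pd

s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = pd.Series([5, 6], index=['f', 'g'])

combined = pd.concat([s1, s2, s3], keys=['A', 'B', 'C'])
print(combined.index.nlevels)       # 2

# Select everything that came from s2 via its key.
print(combined.loc['B'].tolist())   # [2, 3, 4]

# Select a single value with an (outer, inner) label pair.
print(combined.loc[('B', 'd')])     # 3

# With axis=1 the keys become the column names instead.
wide = pd.concat([s1, s2, s3], axis=1, keys=['A', 'B', 'C'])
print(list(wide.columns))           # ['A', 'B', 'C']
```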

Concatenating DataFrames
This works basically the same way as concatenating Series.

import pandas as pd
import numpy as np
df1 = pd.DataFrame( np.arange(12).reshape(3,4), columns=['a','b','c','d'])
df2 = pd.DataFrame( np.arange(6).reshape(2,3), columns=['b','d','a'])


pd.concat([df1,df2],axis=0)
# Discard the original index
pd.concat([df1,df2],axis=0,ignore_index=True)

When concatenating row-wise, index labels that occur in both DataFrames are repeated in the result. If the index is not important, pass ignore_index=True to build a fresh index.
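
The difference can be checked directly on df1 and df2. In this sketch the names stacked and reset are only used for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['b', 'd', 'a'])

stacked = pd.concat([df1, df2], axis=0)
print(stacked.index.tolist())  # [0, 1, 2, 0, 1]
print(stacked.loc[0].shape)    # (2, 4): label 0 now matches two rows

reset = pd.concat([df1, df2], axis=0, ignore_index=True)
print(reset.index.tolist())    # [0, 1, 2, 3, 4]

# Columns are aligned by name, not by position; df2 has no column 'c',
# so its two rows get NaN there.
print(reset['c'].isna().sum())  # 2
print(reset.loc[3, 'a'])        # 2
```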
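Concatenating with append
A handy shortcut for concatenation is the append method on Series and DataFrame instances. It predates concat() and concatenates along axis=0, that is, along the index. Note that append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use pd.concat instead.

# Requires pandas < 2.0, where append still exists
df1.append(df2)
df1.append([df2, df1])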
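
For pandas 2.0 and later, where append is no longer available, the two calls can be written with pd.concat. This is a sketch of the equivalent calls, with the names two and three chosen for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['b', 'd', 'a'])

# Counterpart of df1.append(df2)
two = pd.concat([df1, df2])
print(two.shape)  # (5, 4)

# Counterpart of df1.append([df2, df1]): df1 comes first, then the list.
three = pd.concat([df1, df2, df1])
print(three.shape)           # (8, 4)
print(three.index.tolist())  # [0, 1, 2, 0, 1, 0, 1, 2]
```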



Origin blog.csdn.net/qq_44091773/article/details/106106319