Merging pandas DataFrames (merge)

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/baishengxu/article/details/81556735

1. merge: the equivalent of a database join

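  • Parameter how='outer': outer join; joins on the union of the keys from both sides
import pandas as pd
data1 = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
left = pd.DataFrame(data1, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd'])
# Column D of right contains one integer so it can be merged with the integer column D
# of left; recent pandas versions refuse to merge an integer key with an all-string key
data2 = [[1, 'a', 'a', 'a'], ['b', 'b', 'b', 'b'], ['c', 'c', 'c', 'c'], ['d', 'd', 'd', 'd']]
right = pd.DataFrame(data2, columns=['D', 'F', 'G', 'H'], index=['e', 'f', 'g', 'h'])
print(left, '\n', right)
# D is the only column name the two frames share, so it is used as the key
outer_data = pd.merge(left, right, how='outer')
print(outer_data)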

  • Parameter how='inner': inner join; joins on the intersection of the keys
import pandas as pd
data1 = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
left = pd.DataFrame(data1, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd'])
data2 = [[1, 'a', 'a', 'a'], ['b', 'b', 'b', 'b'], ['c', 'c', 'c', 'c'], ['d', 'd', 'd', 'd']]
right = pd.DataFrame(data2, columns=['D', 'F', 'G', 'H'], index=['e', 'f', 'g', 'h'])
print(left, '\n', right)
# Only the key D == 1 appears in both frames
inner_data = pd.merge(left, right, how='inner')
print(inner_data)

  • Parameter how='left': left join; keeps every key from the left frame
import pandas as pd
data1 = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
left = pd.DataFrame(data1, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd'])
data2 = [[1, 'a', 'a', 'a'], ['b', 'b', 'b', 'b'], ['c', 'c', 'c', 'c'], ['d', 'd', 'd', 'd']]
right = pd.DataFrame(data2, columns=['D', 'F', 'G', 'H'], index=['e', 'f', 'g', 'h'])
print(left, '\n', right)
# Left rows without a match in right get NaN in the columns F, G, H
left_data = pd.merge(left, right, how='left')
print(left_data)

  • Parameter how='right': right join; keeps every key from the right frame
import pandas as pd
data1 = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
left = pd.DataFrame(data1, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd'])
data2 = [[1, 'a', 'a', 'a'], ['b', 'b', 'b', 'b'], ['c', 'c', 'c', 'c'], ['d', 'd', 'd', 'd']]
right = pd.DataFrame(data2, columns=['D', 'F', 'G', 'H'], index=['e', 'f', 'g', 'h'])
print(left, '\n', right)
# Right rows without a match in left get NaN in the columns A, B, C
right_data = pd.merge(left, right, how='right')
print(right_data)
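
To see the four how values side by side, here is a minimal sketch. The two small frames and the name row_counts are made up for illustration (they are not the data used above); both frames share an integer key column D.

```python
import pandas as pd

# Hypothetical data: left has keys 1..4, right has keys 3..5.
left = pd.DataFrame({'A': [1, 2, 3, 4], 'D': [1, 2, 3, 4]})
right = pd.DataFrame({'D': [3, 4, 5], 'F': ['c', 'd', 'e']})

# Count the rows each join type produces when joining on D.
row_counts = {
    how: len(pd.merge(left, right, on='D', how=how))
    for how in ['inner', 'left', 'right', 'outer']
}
print(row_counts)
```

The inner join keeps only the keys found on both sides, left and right keep all keys of one side, and outer keeps every key that appears anywhere.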

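  • With no extra arguments, merge defaults to an inner join and uses the column names shared by both frames as the key
import pandas as pd
data1 = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
left = pd.DataFrame(data1, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd'])
data2 = [[1, 'a', 'a', 'a'], ['b', 'b', 'b', 'b'], ['c', 'c', 'c', 'c'], ['d', 'd', 'd', 'd']]
right = pd.DataFrame(data2, columns=['D', 'F', 'G', 'H'], index=['e', 'f', 'g', 'h'])
print(left, '\n', right)
# No how or on given: inner join on the shared column D
default_data = pd.merge(left, right)
print(default_data)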
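
As a quick check of that default, the sketch below compares a bare merge call with one that spells out the key and join type. The frames and the names implicit and explicit are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Hypothetical data: D is the only column name the two frames share.
left = pd.DataFrame({'A': [1, 2, 3], 'D': [1, 2, 3]})
right = pd.DataFrame({'D': [2, 3, 4], 'F': ['b', 'c', 'd']})

# No arguments: pandas picks the shared column names as the key and joins inner.
implicit = pd.merge(left, right)

# The same join with everything written out.
explicit = pd.merge(left, right, on='D', how='inner')

print(implicit.equals(explicit))
```

Writing on= explicitly is usually the safer habit, since an unexpected shared column name would otherwise silently become part of the key.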

  • If the two frames have no column name in common, use left_on='left column name', right_on='right column name' to specify the join key
import pandas as pd
data1 = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
left = pd.DataFrame(data1, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd'])
data2 = [[1, 'a', 'a', 'a'], ['b', 'b', 'b', 'b'], ['c', 'c', 'c', 'c'], ['d', 'd', 'd', 'd']]
right = pd.DataFrame(data2, columns=['E', 'F', 'G', 'H'], index=['e', 'f', 'g', 'h'])
print(left, '\n\n', right)
# Match column A of left against column E of right
merged_data = pd.merge(left, right, left_on='A', right_on='E')
print('\n', merged_data)
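
One thing worth knowing about left_on/right_on is that both key columns end up in the result. The sketch below shows this with made-up frames; the names merged and deduped are hypothetical, and dropping the second key column is just one possible cleanup.

```python
import pandas as pd

# Hypothetical data: the key is called A on the left and E on the right.
left = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
right = pd.DataFrame({'E': [2, 3, 4], 'F': ['b', 'c', 'd']})

# Inner join matching left.A against right.E; both A and E are kept.
merged = pd.merge(left, right, left_on='A', right_on='E')
print(list(merged.columns))

# For an inner join the two key columns are identical, so one can be dropped.
deduped = merged.drop(columns='E')
print(deduped)
```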

  • Parameter on='key' specifies the join key; to merge on several keys, pass a list: on=['key1', 'key2']
import pandas as pd
data1 = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
left = pd.DataFrame(data1, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd'])
data2 = [[1, 'a', 2, 'a'], [2, 2, 'b', 'b'], ['c', 'c', 'c', 'c'], ['d', 'd', 'd', 'd']]
right = pd.DataFrame(data2, columns=['A', 'B', 'G', 'H'], index=['e', 'f', 'g', 'h'])
print(left, '\n\n', right)
# A row matches only if both A and B are equal on the two sides
multi_key_data = pd.merge(left, right, on=['A', 'B'])
print('\n', multi_key_data)

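  • Handling duplicate column names: suffixes=('suffix1', 'suffix2') sets the suffixes appended to non-key columns that have the same name in both frames
import pandas as pd
data1 = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
left = pd.DataFrame(data1, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd'])
data2 = [[1, 'a', 2, 'a'], [2, 2, 'b', 'b'], ['c', 'c', 'c', 'c'], ['d', 'd', 'd', 'd']]
right = pd.DataFrame(data2, columns=['A', 'B', 'G', 'H'], index=['e', 'f', 'g', 'h'])
print(left, '\n\n', right)
# B exists in both frames but is not a key, so it becomes B_left and B_right
suffix_data = pd.merge(left, right, on=['A'], suffixes=('_left', '_right'))
print('\n', suffix_data)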
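
For comparison, when suffixes is not given pandas falls back to '_x' and '_y'. A minimal sketch with made-up frames (the names default_names and custom_names are hypothetical):

```python
import pandas as pd

# Hypothetical data: both frames have a non-key column called B.
left = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})
right = pd.DataFrame({'A': [1, 2], 'B': [100, 200]})

# Default suffixes are '_x' (left) and '_y' (right).
default_names = list(pd.merge(left, right, on='A').columns)

# Custom suffixes make it clearer where each column came from.
custom_names = list(
    pd.merge(left, right, on='A', suffixes=('_left', '_right')).columns
)

print(default_names)
print(custom_names)
```

The key column A is not renamed in either case, because it is merged into a single column.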

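  • After a merge on columns, the original indexes of the two DataFrames are discarded and the result gets a new default integer index
  • To use the DataFrame indexes as the join key, pass left_index=True, right_index=True. To join a column on the left with the index on the right, pass left_on='column name', right_index=True
  • merge parameter reference:

(Image source: screenshot from Python for Data Analysis by Wes McKinney)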
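
The index-related points in the bullets above can be sketched as follows. All frames and names here (left, right, right_by_index, by_column, by_index, mixed) are hypothetical and only meant to illustrate the three cases.

```python
import pandas as pd

# Hypothetical data with string index labels.
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'A': [1, 2, 3]},
                    index=['x', 'y', 'z'])
right = pd.DataFrame({'key': ['a', 'b', 'd'], 'H': [10, 20, 30]},
                     index=['p', 'q', 'r'])
right_by_index = right.set_index('key')  # index is now 'a', 'b', 'd'

# 1) Column-on-column merge: the old index labels are dropped.
by_column = pd.merge(left, right, on='key')
print(list(by_column.index))

# 2) Index-on-index merge: the matching index labels are the key.
by_index = pd.merge(left.set_index('key'), right_by_index,
                    left_index=True, right_index=True)
print(list(by_index.index))

# 3) Left column against right index.
mixed = pd.merge(left, right_by_index, left_on='key', right_index=True)
print(mixed)
```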

