Pandas: Combining Data with merge & concat

1 Merge

merge combines two DataFrames on key columns, much like a JOIN in SQL. For details, see the official documentation.

1.1 Parameters

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
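Parameter | Type | Detail
right | DataFrame | The DataFrame to merge with
how | {'left', 'right', 'outer', 'inner'}, default 'inner' | left: like LEFT JOIN in SQL; right: RIGHT JOIN; outer: FULL OUTER JOIN; inner: INNER JOIN
on | label or list | Column name(s) to join on; defaults to the column names shared by both DataFrames
left_on | label or list | Key column name(s) in the left table
right_on | label or list | Key column name(s) in the right table
left_index | boolean, default False | Use the left table's index as the join key; rarely needed
right_index | boolean, default False | Use the right table's index as the join key; rarely needed
sort | boolean, default False | Whether to sort the result by the join keys
suffixes | 2-length sequence (tuple, list, …) | Suffixes appended to overlapping column names after the merge
copy | boolean, default True | If False, avoid copying data where possible
indicator | boolean or string, default False | If True, adds a column named '_merge' to the result (if a string is given instead, that string is used as the column name). Its values are left_only, right_only, or both, i.e. which table the row came from
validate | string, default None | If given, checks that the merge is of the specified type. "one_to_one" or "1:1": keys must be unique in both DataFrames; "one_to_many" or "1:m": keys must be unique in the left table; "many_to_one" or "m:1": keys must be unique in the right table; "many_to_many" or "m:m": accepted, but nothing is checked. Added in version 0.21.0
Returns | DataFrame | The result is also a DataFrame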
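
To make the validate parameter concrete, here is a minimal sketch. The frames left and right and their columns are made up for illustration and are not part of the demo data below.

```python
import pandas as pd

# Hypothetical frames: keys are unique on the left, duplicated ("bar") on the right.
left = pd.DataFrame({"key": ["foo", "bar", "baz"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["foo", "bar", "bar"], "rval": [5, 6, 8]})

# "one_to_many" only requires unique keys in the left table, so this passes.
ok = left.merge(right, on="key", validate="one_to_many")
print(len(ok))  # 3 rows: foo once, bar twice

# "one_to_one" also requires unique keys in the right table, so this raises.
raised = False
try:
    left.merge(right, on="key", validate="one_to_one")
except pd.errors.MergeError as err:
    raised = True
    print("validation failed:", err)
```

Since the check runs before the join, it is a cheap way to catch unexpected duplicate keys that would otherwise silently multiply rows.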

1.2 Demo

Two small data sets are used to test what each parameter actually does.

import pandas as pd
A = pd.DataFrame({'leftkey': ['foo','bar','baz','foo'],"value":[1,2,3,4],"x":[4,3,2,1]})
B = pd.DataFrame({'rightkey': ['foo','bar','qux','bar'],"value":[5,6,7,8],"x":[4,3,2,1]})
print(A);print(B);
  leftkey  value  x  
0     foo      1  4  
1     bar      2  3  
2     baz      3  2  
3     foo      4  1  
  rightkey  value  x  
0      foo      5  4  
1      bar      6  3  
2      qux      7  2  
3      bar      8  1  

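1.2.1 Left join

## The on parameter requires a column with the same name in both tables
## After the merge, overlapping non-key columns get the suffixes _x (left) and _y (right)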
A.merge(right = B,how = "left",on = "x")
leftkey value_x x rightkey value_y
0 foo 1 4 foo 5
1 bar 2 3 bar 6
2 baz 3 2 qux 7
3 foo 4 1 bar 8
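
A side note on the on parameter: when on, left_on and right_on are all omitted, merge joins on every column name the two frames share. A minimal sketch with the same A and B (the name default_merge is only for illustration):

```python
import pandas as pd

A = pd.DataFrame({"leftkey": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 4], "x": [4, 3, 2, 1]})
B = pd.DataFrame({"rightkey": ["foo", "bar", "qux", "bar"], "value": [5, 6, 7, 8], "x": [4, 3, 2, 1]})

# No on/left_on/right_on: pandas joins on all shared column names, here "value" and "x".
default_merge = A.merge(B)

# No (value, x) pair occurs in both frames, so the default inner join is empty.
print(len(default_merge))              # 0
print(default_merge.columns.tolist())  # ['leftkey', 'value', 'x', 'rightkey']
```

Passing on explicitly avoids surprises when the frames happen to share more columns than intended.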
## Left join on differently named key columns; a right join works the same way, so it is not shown
A.merge(right = B,how = "left",left_on  = "leftkey",right_on  = "rightkey")
leftkey value_x x_x rightkey value_y x_y
0 foo 1 4 foo 5.0 4.0
1 bar 2 3 bar 6.0 3.0
2 bar 2 3 bar 8.0 1.0
3 baz 3 2 NaN NaN NaN
4 foo 4 1 foo 5.0 4.0
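
The _x and _y suffixes in the output above are only the defaults. As a small sketch, the suffixes parameter can give the overlapping columns more descriptive names (the suffix strings "_left" and "_right" are arbitrary choices for this example):

```python
import pandas as pd

A = pd.DataFrame({"leftkey": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 4], "x": [4, 3, 2, 1]})
B = pd.DataFrame({"rightkey": ["foo", "bar", "qux", "bar"], "value": [5, 6, 7, 8], "x": [4, 3, 2, 1]})

# Same left join as above, but with custom suffixes for the overlapping columns.
merged = A.merge(B, how="left", left_on="leftkey", right_on="rightkey",
                 suffixes=("_left", "_right"))

print(merged.columns.tolist())
# ['leftkey', 'value_left', 'x_left', 'rightkey', 'value_right', 'x_right']
```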

1.2.2 inner join

## Keep only the keys present in both tables (intersection)
A.merge(right = B,how = "inner",left_on  = "leftkey",right_on  = "rightkey")
leftkey value_x x_x rightkey value_y x_y
0 foo 1 4 foo 5 4
1 foo 4 1 foo 5 4
2 bar 2 3 bar 6 3
3 bar 2 3 bar 8 1

1.2.3 outer join

## Keep the keys present in either table (union)
A.merge(right = B,how = "outer",left_on  = "leftkey",right_on  = "rightkey")
leftkey value_x x_x rightkey value_y x_y
0 foo 1.0 4.0 foo 5.0 4.0
1 foo 4.0 1.0 foo 5.0 4.0
2 bar 2.0 3.0 bar 6.0 3.0
3 bar 2.0 3.0 bar 8.0 1.0
4 baz 3.0 2.0 NaN NaN NaN
5 NaN NaN NaN qux 7.0 2.0

1.2.4 Sorting after the join

## Sort the result by the join keys in lexicographic (alphabetical) order
A.merge(right = B,how = "left",left_on  = "leftkey",right_on  = "rightkey",sort = True)
leftkey value_x x_x rightkey value_y x_y
0 bar 2 3 bar 6.0 3.0
1 bar 2 3 bar 8.0 1.0
2 baz 3 2 NaN NaN NaN
3 foo 1 4 foo 5.0 4.0
4 foo 4 1 foo 5.0 4.0

1.2.5 Adding an indicator column

## indicator = True adds a column named _merge that marks whether each row exists only in the left table, only in the right table, or in both
A.merge(right = B,how = "outer",left_on  = "leftkey",right_on  = "rightkey",sort = True,indicator=True)
leftkey value_x x_x rightkey value_y x_y _merge
0 bar 2.0 3.0 bar 6.0 3.0 both
1 bar 2.0 3.0 bar 8.0 1.0 both
2 baz 3.0 2.0 NaN NaN NaN left_only
3 foo 1.0 4.0 foo 5.0 4.0 both
4 foo 4.0 1.0 foo 5.0 4.0 both
5 NaN NaN NaN qux 7.0 2.0 right_only
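
One common use of the indicator column is an "anti-join": keeping only the rows of the left table that have no match on the right. The sketch below is one way to do it with the same A and B; the names left_only and tagged, and the custom column name "source", are chosen for illustration.

```python
import pandas as pd

A = pd.DataFrame({"leftkey": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 4], "x": [4, 3, 2, 1]})
B = pd.DataFrame({"rightkey": ["foo", "bar", "qux", "bar"], "value": [5, 6, 7, 8], "x": [4, 3, 2, 1]})

merged = A.merge(B, how="outer", left_on="leftkey", right_on="rightkey", indicator=True)

# Keep rows that exist only in A: here just the "baz" row.
left_only = merged[merged["_merge"] == "left_only"]
print(left_only["leftkey"].tolist())  # ['baz']

# Passing a string instead of True names the indicator column.
tagged = A.merge(B, how="outer", left_on="leftkey", right_on="rightkey", indicator="source")
print("source" in tagged.columns)  # True
```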
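## merge_ordered: an ordered merge on the shared "value" column, done separately for each leftkey group, with forward fill
## A1 and B1 are not defined in this post; judging by the output, they are variants of A and B whose third columns are named x1 and y1
pd.merge_ordered(A1,B1,left_by='leftkey',fill_method='ffill',how="outer")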
leftkey value x1 rightkey y1
0 foo 1 1 NaN NaN
1 foo 4 4 NaN NaN
2 foo 5 4 foo 4.0
3 foo 6 4 bar 3.0
4 foo 7 4 qux 2.0
5 foo 8 4 bar 1.0
6 bar 2 3 NaN NaN
7 bar 5 3 foo 4.0
8 bar 6 3 bar 3.0
9 bar 7 3 qux 2.0
10 bar 8 3 bar 1.0
11 baz 3 2 NaN NaN
12 baz 5 2 foo 4.0
13 baz 6 2 bar 3.0
14 baz 7 2 qux 2.0
15 baz 8 2 bar 1.0

2 Concat

concat performs simple row-wise and column-wise concatenation. For details, see the official documentation.

2.1 Parameters

pandas.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)
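Parameter | Type | Detail
objs | DataFrame, Series, or Panel objects | The objects to concatenate
axis | {0/'index', 1/'columns'}, default 0 | Direction of concatenation: 0 stacks rows (rbind in R), 1 places columns side by side (cbind in R)
join | {'inner', 'outer'}, default 'outer' | How to handle the labels on the other axis; see the demo in 2.2.1
join_axes | list of Index objects | Specific indexes to use for the other axes instead of the inner/outer set logic (deprecated in pandas 0.25 and removed in 1.0)
ignore_index | boolean, default False | Whether to discard the existing index along the concatenation axis. The default False keeps the original labels; True relabels the result 0 to n-1
keys | sequence, default None | Labels for the outermost level of a hierarchical index, i.e. a tag showing which input each row came from
levels | list of sequences, default None | Specific levels (unique values) to use when building the MultiIndex; otherwise inferred from keys
names | list, default None | Names for the levels of the resulting hierarchical index
verify_integrity | boolean, default False | Check whether the new concatenated axis contains duplicates; expensive, so off by default
sort | boolean, default None | Whether to sort the non-concatenation axis when it is not already aligned (applies when join='outer')
copy | boolean, default True | If False, do not copy data unnecessarily; setting it to False did not seem to change anything
Returns | depends on the inputs | A DataFrame if DataFrames are concatenated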
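
The ignore_index and verify_integrity parameters are easiest to understand side by side. A minimal sketch, using two made-up single-column frames (first and second are hypothetical names):

```python
import pandas as pd

# Both frames carry the default index 0, 1.
first = pd.DataFrame({"v": [1, 2]})
second = pd.DataFrame({"v": [3, 4]})

# Default (ignore_index=False): the original labels are kept, so 0 and 1 repeat.
kept = pd.concat([first, second])
print(kept.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True: the result is relabelled 0..n-1.
relabelled = pd.concat([first, second], ignore_index=True)
print(relabelled.index.tolist())  # [0, 1, 2, 3]

# verify_integrity=True: duplicated labels on the new axis raise a ValueError.
raised = False
try:
    pd.concat([first, second], verify_integrity=True)
except ValueError as err:
    raised = True
    print("duplicate index:", err)
```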

2.2 Demo

The examples below test how the parameters above behave in practice.

import pandas as pd
A = pd.DataFrame({'leftkey': ['foo','bar','baz','foo'],"value":[1,2,3,4],"x1":[4,3,2,1]})
B = pd.DataFrame({'rightkey': ['foo','bar','qux','bar'],"value":[5,6,7,8],"x2":[4,3,2,1]})
print(A);print(B);
  leftkey  value  x1  
0     foo      1   4  
1     bar      2   3  
2     baz      3   2  
3     foo      4   1  
  rightkey  value  x2  
0      foo      5   4  
1      bar      6   3  
2      qux      7   2  
3      bar      8   1  

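2.2.1 Row-wise concatenation

Row-wise concatenation uses axis = 0. join defaults to "outer", so columns missing from one input are filled with NaN; with join = "inner", only the columns common to all inputs are kept.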

pd.concat([A,B],axis=0,join = "outer")
leftkey rightkey value x1 x2
0 foo NaN 1 4.0 NaN
1 bar NaN 2 3.0 NaN
2 baz NaN 3 2.0 NaN
3 foo NaN 4 1.0 NaN
0 NaN foo 5 NaN 4.0
1 NaN bar 6 NaN 3.0
2 NaN qux 7 NaN 2.0
3 NaN bar 8 NaN 1.0
pd.concat([A,B],axis=0,join = "inner")
value
0 1
1 2
2 3
3 4
0 5
1 6
2 7
3 8

2.2.2 Column-wise concatenation

Column-wise concatenation needs little explanation: just set the axis parameter.

pd.concat([A,B],axis = 1)
leftkey value x1 rightkey value x2
0 foo 1 4 foo 5 4
1 bar 2 3 bar 6 3
2 baz 3 2 qux 7 2
3 foo 4 1 bar 8 1
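
One detail worth spelling out: with axis = 1, rows are matched by index label, not by position. In the demo above both frames share the index 0..3, so the difference is invisible. The sketch below uses two made-up frames (left and right) with partly different indexes to show it.

```python
import pandas as pd

# Hypothetical frames whose indexes overlap only on labels 1 and 2.
left = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"b": [10, 20, 30]}, index=[1, 2, 3])

# join="outer" (default): union of the index labels, NaN where a frame has no row.
outer = pd.concat([left, right], axis=1)
print(outer.index.tolist())  # [0, 1, 2, 3]

# join="inner": only the labels present in both frames.
inner = pd.concat([left, right], axis=1, join="inner")
print(inner.index.tolist())  # [1, 2]
```

If positional pasting is what you want, resetting the index on both frames first (for example with reset_index(drop=True)) gives them matching labels.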

2.2.3 Hierarchical index

The keys parameter adds a higher-level index, somewhat like an array or list in R.

pd.concat([A, B], keys=['A', 'B'],axis = 1)
A B
leftkey value x1 rightkey value x2
0 foo 1 4 foo 5 4
1 bar 2 3 bar 6 3
2 baz 3 2 qux 7 2
3 foo 4 1 bar 8 1
pd.concat([A, B], keys=['A', 'B'],axis = 0)
leftkey rightkey value x1 x2
A 0 foo NaN 1 4.0 NaN
1 bar NaN 2 3.0 NaN
2 baz NaN 3 2.0 NaN
3 foo NaN 4 1.0 NaN
B 0 NaN foo 5 NaN 4.0
1 NaN bar 6 NaN 3.0
2 NaN qux 7 NaN 2.0
3 NaN bar 8 NaN 1.0
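
Once the keys are in place, they can be used to pull each original block back out, and the names parameter labels the index levels. A short sketch with the same A and B (the level names "source" and "row" are arbitrary choices for this example):

```python
import pandas as pd

A = pd.DataFrame({"leftkey": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 4], "x1": [4, 3, 2, 1]})
B = pd.DataFrame({"rightkey": ["foo", "bar", "qux", "bar"], "value": [5, 6, 7, 8], "x2": [4, 3, 2, 1]})

# keys tags each input; names labels the two levels of the resulting MultiIndex.
stacked = pd.concat([A, B], keys=["A", "B"], names=["source", "row"])
print(list(stacked.index.names))  # ['source', 'row']

# Select the rows that came from B via the outer level.
from_b = stacked.loc["B"]
print(from_b["value"].tolist())  # [5, 6, 7, 8]
```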

                        2018-06-11, Xincheng Science Park, Jianye District, Nanjing


Reposted from blog.csdn.net/wendaomudong_l2d4/article/details/80653739