Pandas Data Merging, Part 1: concat

Use pd.concat() to combine several DataFrame objects into one.

  • Import the required packages

    import pandas as pd
    import numpy as np
    
  • The axis parameter: specifies the direction of concatenation

    • Generate test data
    df1 = pd.DataFrame(np.ones((1,4))*0, columns=['a','b','c','d'])
    #      a    b    c    d
    # 0  0.0  0.0  0.0  0.0
    df2 = pd.DataFrame(np.ones((2,4))*1, columns=['a','b','c','d'])
    #      a    b    c    d
    # 0  1.0  1.0  1.0  1.0
    # 1  1.0  1.0  1.0  1.0
    df3 = pd.DataFrame(np.ones((3,4))*2, columns=['a','b','c','d'])
    #      a    b    c    d
    # 0  2.0  2.0  2.0  2.0
    # 1  2.0  2.0  2.0  2.0
    # 2  2.0  2.0  2.0  2.0
    
    • Vertical concatenation
    # axis=0 concatenates vertically, stacking the rows of each frame
    res1 = pd.concat([df1, df2, df3], axis=0)
    #      a    b    c    d
    # 0  0.0  0.0  0.0  0.0
    # 0  1.0  1.0  1.0  1.0
    # 1  1.0  1.0  1.0  1.0
    # 0  2.0  2.0  2.0  2.0
    # 1  2.0  2.0  2.0  2.0
    # 2  2.0  2.0  2.0  2.0
    
    • Horizontal concatenation
    # axis=1 places the frames side by side, aligned on the index; missing values are filled with NaN
    res2 = pd.concat([df1, df2, df3], axis=1)
    #      a    b    c    d    a    b    c    d    a    b    c    d
    # 0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
    # 1  NaN  NaN  NaN  NaN  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
    # 2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  2.0  2.0  2.0  2.0
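The effect of axis on the result's shape can be checked directly. The following is a minimal sketch that rebuilds the same three test frames (using np.zeros and np.ones as shorthand for the expressions above) and inspects the shapes and the number of NaN cells.

```python
import numpy as np
import pandas as pd

cols = ['a', 'b', 'c', 'd']
df1 = pd.DataFrame(np.zeros((1, 4)), columns=cols)
df2 = pd.DataFrame(np.ones((2, 4)), columns=cols)
df3 = pd.DataFrame(np.ones((3, 4)) * 2, columns=cols)

stacked = pd.concat([df1, df2, df3], axis=0)       # rows add up: 1 + 2 + 3
side_by_side = pd.concat([df1, df2, df3], axis=1)  # columns add up: 4 + 4 + 4

print(stacked.shape)        # (6, 4)
print(side_by_side.shape)   # (3, 12)

# Count the cells padded with NaN in the horizontal result
nan_count = int(side_by_side.isna().to_numpy().sum())
print(nan_count)            # 12
```

With axis=1 the rows are matched by index label, so the shorter frames are padded with NaN for the labels they lack.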
    
    • The ignore_index parameter
    # ignore_index=True discards the original index labels; the result gets a new index 0..n-1
    res3 = pd.concat([df1, df2, df3], axis=0, ignore_index=True)
    #      a    b    c    d
    # 0  0.0  0.0  0.0  0.0
    # 1  1.0  1.0  1.0  1.0
    # 2  1.0  1.0  1.0  1.0
    # 3  2.0  2.0  2.0  2.0
    # 4  2.0  2.0  2.0  2.0
    # 5  2.0  2.0  2.0  2.0
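One reason ignore_index matters is that the default vertical concatenation keeps duplicate index labels, which changes what label-based lookup returns. A small sketch under the same test data (the names kept and reset are illustrative only):

```python
import numpy as np
import pandas as pd

cols = ['a', 'b', 'c', 'd']
df1 = pd.DataFrame(np.zeros((1, 4)), columns=cols)
df2 = pd.DataFrame(np.ones((2, 4)), columns=cols)
df3 = pd.DataFrame(np.ones((3, 4)) * 2, columns=cols)

kept = pd.concat([df1, df2, df3], axis=0)                      # original labels kept
reset = pd.concat([df1, df2, df3], axis=0, ignore_index=True)  # fresh 0..5 labels

print(kept.index.tolist())    # [0, 0, 1, 0, 1, 2]
print(kept.index.is_unique)   # False
print(kept.loc[0].shape)      # (3, 4): label 0 matches three rows
print(reset.index.tolist())   # [0, 1, 2, 3, 4, 5]
print(reset.loc[0].shape)     # (4,): label 0 matches a single row
```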
    
  • The join parameter: choose between the intersection (inner) and the union (outer) of the labels on the other axis

    • Generate test data
    df1 = pd.DataFrame(np.ones((2,4))*0, columns=['a','b','c','d'], index=[1,2])
    #      a    b    c    d
    # 1  0.0  0.0  0.0  0.0
    # 2  0.0  0.0  0.0  0.0
    df2 = pd.DataFrame(np.ones((2,4))*1, columns=['b','c','d','e'], index=[3,4])
    #      b    c    d    e
    # 3  1.0  1.0  1.0  1.0
    # 4  1.0  1.0  1.0  1.0
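    # The columns of df1 and df2 only partly overlap
    
    • join='outer'
    # Outer join (join='outer'): keep the union of all columns; missing values are filled with NaN
    res1 = pd.concat([df1, df2], axis=0, join='outer', sort=False)  # older pandas (0.23 to 0.25) emits a FutureWarning if sort is omitted here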
    #      a    b    c    d    e
    # 1  0.0  0.0  0.0  0.0  NaN
    # 2  0.0  0.0  0.0  0.0  NaN
    # 3  NaN  1.0  1.0  1.0  1.0
    # 4  NaN  1.0  1.0  1.0  1.0
    
    • join='inner'
    # Inner join (join='inner'): keep only the columns shared by all frames
    res2 = pd.concat([df1, df2], axis=0, join='inner')
    #      b    c    d
    # 1  0.0  0.0  0.0
    # 2  0.0  0.0  0.0
    # 3  1.0  1.0  1.0
    # 4  1.0  1.0  1.0
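The difference between the two modes comes down to set operations on the column labels. The sketch below, again with the test frames rebuilt via np.zeros and np.ones, compares the resulting columns with the plain set union and intersection.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((2, 4)), columns=['a', 'b', 'c', 'd'], index=[1, 2])
df2 = pd.DataFrame(np.ones((2, 4)), columns=['b', 'c', 'd', 'e'], index=[3, 4])

outer = pd.concat([df1, df2], axis=0, join='outer', sort=False)
inner = pd.concat([df1, df2], axis=0, join='inner')

print(outer.columns.tolist())   # ['a', 'b', 'c', 'd', 'e']
print(inner.columns.tolist())   # ['b', 'c', 'd']

# outer keeps the union of the column labels, inner keeps the intersection
print(set(outer.columns) == set(df1.columns) | set(df2.columns))   # True
print(set(inner.columns) == set(df1.columns) & set(df2.columns))   # True

# Only the outer result needs NaN padding (column e for df1, column a for df2)
outer_nans = int(outer.isna().to_numpy().sum())
inner_nans = int(inner.isna().to_numpy().sum())
print(outer_nans, inner_nans)   # 4 0
```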
    
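  • The join_axes parameter: specifies the index that the rows are aligned on when concatenating (deprecated in pandas 0.25, removed in 1.0)

    • Generate test data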
    df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'], index=[1,2,3])
    #      a    b    c    d
    # 1  0.0  0.0  0.0  0.0
    # 2  0.0  0.0  0.0  0.0
    # 3  0.0  0.0  0.0  0.0
    df2 = pd.DataFrame(np.ones((3,4))*1, columns=['b','c','d','e'], index=[2,3,4])
    #      b    c    d    e
    # 2  1.0  1.0  1.0  1.0
    # 3  1.0  1.0  1.0  1.0
    # 4  1.0  1.0  1.0  1.0
    
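    • join_axes=[df1.index]: rows that df2 lacks are filled with NaN
    # Concatenate horizontally using df1.index, somewhat like a left outer join in a database
    # Note: join_axes no longer exists in pandas 1.0 and later, so this call only runs on older versions
    res1 = pd.concat([df1, df2], axis=1, join_axes=[df1.index])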
    #     a    b    c    d    b    c    d    e
    # 1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
    # 2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
    # 3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
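On pandas 1.0 and later, where join_axes is no longer available, the same result can be obtained by concatenating first and then reindexing on df1.index, which is the replacement the pandas documentation suggests. A minimal sketch with the same test data:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((3, 4)), columns=['a', 'b', 'c', 'd'], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)), columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])

# The outer concatenation yields rows 1..4; reindex keeps only df1's rows
res = pd.concat([df1, df2], axis=1).reindex(df1.index)

print(res.index.tolist())   # [1, 2, 3]
print(res.shape)            # (3, 8)

# Row 1 does not exist in df2, so its df2 columns are NaN
row1_df2_part_is_nan = bool(res.iloc[0, 4:].isna().all())
print(row1_df2_part_is_nan)  # True
```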
    
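  • append: appending in the vertical direction (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the calls below require an older version)

    • Generate test data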
    df1 = pd.DataFrame(np.ones((2,3))*0, columns=['a','b','c'])
    #      a    b    c
    # 0  0.0  0.0  0.0
    # 1  0.0  0.0  0.0
    df2 = pd.DataFrame(np.ones((2,3))*1, columns=['a','b','c'])
    #      a    b    c
    # 0  1.0  1.0  1.0
    # 1  1.0  1.0  1.0
    df3 = pd.DataFrame(np.ones((2,3))*1, columns=['a','b','c'])
    #      a    b    c
    # 0  1.0  1.0  1.0
    # 1  1.0  1.0  1.0
    s1 = pd.Series([1,2,3], index=['a','b','c'])
    # a    1
    # b    2
    # c    3
    # dtype: int64
    
    • Examples
    # Append df2 below df1
    # ignore_index=True drops the original index and generates a new one
    res1 = df1.append(df2, ignore_index=True)
    #      a    b    c
    # 0  0.0  0.0  0.0
    # 1  0.0  0.0  0.0
    # 2  1.0  1.0  1.0
    # 3  1.0  1.0  1.0
    
    # Append several DataFrames: place df2 and df3 below df1 and reset the index
    res2 = df1.append([df2, df3], ignore_index=True)
    #      a    b    c
    # 0  0.0  0.0  0.0
    # 1  0.0  0.0  0.0
    # 2  1.0  1.0  1.0
    # 3  1.0  1.0  1.0
    # 4  1.0  1.0  1.0
    # 5  1.0  1.0  1.0
    
    # Append a Series: add s1 to df1 as a new row and reset the index
    res3 = df1.append(s1, ignore_index=True)
    #      a    b    c
    # 0  0.0  0.0  0.0
    # 1  0.0  0.0  0.0
    # 2  1.0  2.0  3.0
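On pandas 2.0 and later, where DataFrame.append is gone, the three results above can be reproduced with pd.concat. The sketch below is one possible equivalent, not the only one; converting the Series with to_frame().T so that it becomes a one-row DataFrame is a choice made here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((2, 3)), columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.ones((2, 3)), columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.ones((2, 3)), columns=['a', 'b', 'c'])
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Equivalent of df1.append(df2, ignore_index=True)
res1 = pd.concat([df1, df2], ignore_index=True)

# Equivalent of df1.append([df2, df3], ignore_index=True)
res2 = pd.concat([df1, df2, df3], ignore_index=True)

# Equivalent of df1.append(s1, ignore_index=True):
# turn the Series into a one-row DataFrame whose columns are a, b, c
res3 = pd.concat([df1, s1.to_frame().T], ignore_index=True)

print(res1.shape)              # (4, 3)
print(res2.shape)              # (6, 3)
print(res3.iloc[2].tolist())   # [1.0, 2.0, 3.0]
```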
    
  • References

    The code is mainly taken from "Pandas 合并 concat", with minor changes.

Reposted from blog.csdn.net/BBJG_001/article/details/104550019