Combining pandas DataFrames (merge, join, concat)

pandas offers several ways to combine multiple DataFrames:

  1. merge combines rows whose values in a key column are equal
  2. join combines DataFrames side by side on their index, which amounts to adding the columns of one to the other
  3. concat stacks DataFrames along an axis, either appending rows (matching up equal column names) or appending columns
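
As a quick orientation before the detailed sections, here is a minimal sketch of the three operations side by side. The frames left and right are hypothetical toy data, not the article's df1 and df2.

```python
import pandas as pd

# Hypothetical toy frames used only for this overview.
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'b', 'd'], 'y': [10, 20, 40]})

# merge: match rows on equal values in the 'key' column (inner join by default).
merged = pd.merge(left, right, on='key')

# join: match rows on the index; 'key' exists in both frames,
# so the right-hand copy needs a suffix.
joined = left.join(right, rsuffix='_r')

# concat: stack along an axis; the default axis=0 appends rows.
stacked = pd.concat([left, right])

print(merged)
print(joined)
print(stacked)
```

merge keeps only the keys found in both frames, join lines rows up by index label, and concat simply stacks the inputs and fills columns missing from one input with NaN.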

1. merge (combine rows on matching key-column values)

how = the join type
on = the name of the column(s) to join on

Inner join (the default)

Only key values present in both DataFrames are kept

import pandas as pd

df1 = pd.DataFrame({'id':[1,2,3,4,5,6],'city':['wuhan','newyork','shanghai','paris','losangeles','london'],'country':['china','usa','china','france','usa','england'],'visits':[2,1,2,1,2,1]})
df2 = pd.DataFrame({'id':[1,2,3,4,5],'country':['china','france','usa','germany','japan'],'visits':[2,2,2,2,2]})
print(df1)
print(df2)

print('Inner join')
df3 = pd.merge(df1,df2,on='country')
print(df3)

Left join

All key values in df1 are kept; where df2 has no matching key value, the df2 columns are filled with NaN

print('Left join')
df3 = pd.merge(df1,df2,on='country',how='left')
print(df3)

Right join

All key values in df2 are kept; where df1 has no matching key value, the df1 columns are filled with NaN

print('Right join')
df3 = pd.merge(df1,df2,on='country',how='right')
print(df3)
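
To make the difference between the three join types easier to see, here is a small sketch on hypothetical toy frames (left and right are not the article's data) in which each key appears only once.

```python
import pandas as pd

# Hypothetical toy frames: 'a' exists only on the left, 'd' only on the right.
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

# Inner join: only keys found in both frames survive.
inner = pd.merge(left, right, on='key')

# Left join: every left key survives; missing right values become NaN.
left_j = pd.merge(left, right, on='key', how='left')

# Right join: every right key survives; missing left values become NaN.
right_j = pd.merge(left, right, on='key', how='right')

print(inner)
print(left_j)
print(right_j)
```

In the article's own data the keys repeat ('china' and 'usa' appear twice in df1), so each repeated key produces one output row per matching pair.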

Multiple columns as the join key

To join on several columns, pass on=['column 1', 'column 2', ...]

print('Multiple columns as the join key')
df4 = pd.merge(df1,df2,on=['country','visits'])
print(df4)
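
A minimal sketch of what a multi-column key means in practice, using hypothetical toy frames (the code column is made up for illustration): a row matches only when every key column agrees.

```python
import pandas as pd

# Hypothetical toy frames; 'code' is an invented column for illustration.
left = pd.DataFrame({
    'country': ['china', 'usa', 'usa'],
    'visits': [2, 1, 2],
    'city': ['wuhan', 'newyork', 'losangeles'],
})
right = pd.DataFrame({
    'country': ['china', 'usa'],
    'visits': [2, 2],
    'code': ['CN', 'US'],
})

# A row is kept only if BOTH country and visits are equal on the two sides.
both = pd.merge(left, right, on=['country', 'visits'])
print(both)
```

The ('usa', 1) row is dropped here even though 'usa' exists on the right, because its visits value does not match.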

Setting suffixes for overlapping column names

If the two DataFrames share other column names besides the key, a suffix can be set for the repeated names coming from df1 and df2:
suffixes=('left suffix', 'right suffix')
If not set, the default suffixes are _x and _y

print('Set column-name suffixes after merging')
df4 = pd.merge(df1,df2,on='country',suffixes=('_city','_country'))
print(df4)
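
For comparison, here is a small sketch that shows the default suffixes next to custom ones. The frames and the suffix strings '_left' and '_right' are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical toy frames that share the non-key column 'value'.
left = pd.DataFrame({'key': ['a', 'b'], 'value': [1, 2]})
right = pd.DataFrame({'key': ['a', 'b'], 'value': [10, 20]})

# Without suffixes, pandas appends _x (left) and _y (right).
default = pd.merge(left, right, on='key')

# With suffixes, the overlapping names get the given endings instead.
custom = pd.merge(left, right, on='key', suffixes=('_left', '_right'))

print(list(default.columns))
print(list(custom.columns))
```

Only overlapping non-key columns are renamed; the key column keeps its name.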

Merging on different column names

The key columns of the left and right tables need not have the same name; for example, country in df1 can be merged with countryname in df2

print('Merge on different column names')
df22 = df2.copy()  # copy df2 so its columns can be renamed without changing df2
df22.columns=['id','countryname','visits']
df5 = pd.merge(df1,df22,left_on=['country'],right_on=['countryname'],suffixes=('_city','_country'))
print(df5)

Dropping the extra column

The redundant key column (countryname) can simply be dropped

print('Drop column')
df5.drop(columns=['countryname'],inplace=True)
print(df5)
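
The two steps above, merging on differently named keys and then dropping the duplicate key, can be sketched on hypothetical toy frames as follows. This version returns a new DataFrame from drop rather than using inplace=True, which is only one possible style.

```python
import pandas as pd

# Hypothetical toy frames whose key columns have different names.
left = pd.DataFrame({'country': ['china', 'usa'], 'city': ['wuhan', 'newyork']})
right = pd.DataFrame({'countryname': ['china', 'usa'], 'code': ['CN', 'US']})

# Both key columns survive the merge because their names differ.
merged = pd.merge(left, right, left_on='country', right_on='countryname')
print(list(merged.columns))

# Drop the redundant copy of the key; drop returns a new DataFrame.
cleaned = merged.drop(columns=['countryname'])
print(list(cleaned.columns))
```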

2. join (combine two DataFrames with different column names on their index)

Left join (the default)

If how is not set, join performs a left join on the index: all rows of df1 are kept, and rows whose index has no match in df2 are filled with NaN

df1 = pd.DataFrame({'id':[1,2,3,4,5,6],'city':['wuhan','newyork','shanghai','paris','losangeles','london'],'country':['china','usa','china','france','usa','england'],'visits':[2,1,2,1,2,1]})
df2 = pd.DataFrame({'idy':[1,2,3,4,6],'cityy':['wuhan','newyork','shanghai','paris','losangeles'],'countryy':['china','usa','china','france','usa']})
print(df1)
print(df2)

print('Default how=left')
df3 = df1.join(df2)
print(df3)
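
One point that is easy to miss is that join lines rows up by index label, not by row position. A minimal sketch with hypothetical toy frames and an explicit index:

```python
import pandas as pd

# Hypothetical toy frames; right has no row labelled 1.
left = pd.DataFrame({'a': [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({'b': [10, 30]}, index=[0, 2])

# Default left join: every left index label is kept.
result = left.join(right)
print(result)
```

The value 30 lands on label 2 rather than on the second row, and label 1 gets NaN in column b.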

Right join

With how='right', all rows of df2 are kept; rows whose index has no match in df1 are filled with NaN

print('Right join')
df3 = df1.join(df2,how='right')
print(df3)

Inner join

With how='inner', only index labels present in both df1 and df2 are kept; here that is the rows of the shorter DataFrame

print('Inner join')
df3 = df1.join(df2,how='inner')
print(df3)

Outer join

With how='outer', every index label found in either df1 or df2 is kept; here that is the rows of the longer DataFrame

print('Outer join')
df3 = df1.join(df2,how='outer')
print(df3)
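
The four how values can be compared in one loop. This is a sketch on hypothetical toy frames whose indexes only partly overlap, so the result depends on the labels rather than on the row counts.

```python
import pandas as pd

# Hypothetical toy frames with partly overlapping index labels.
left = pd.DataFrame({'a': [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({'b': [20, 30, 40]}, index=[1, 2, 3])

# Collect the resulting index for each join type.
indexes = {}
for how in ['left', 'right', 'inner', 'outer']:
    indexes[how] = list(left.join(right, how=how).index)
    print(how, indexes[how])
```

Both frames have three rows, yet inner yields two rows and outer yields four, because the match is made on index labels.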

3. concat (concatenate DataFrames along a given axis)

Column-wise concatenation

With axis=1, all columns are placed side by side, so the number of columns = columns of df1 + columns of df2; join="inner" keeps only the index labels present in both

from pandas import concat
df1 = pd.DataFrame({'id':[1,2,3,4,5,6],'city':['wuhan','newyork','shanghai','paris','losangeles','london'],'country':['china','usa','china','france','usa','england'],'visits':[2,1,2,1,2,1]})
df2 = pd.DataFrame({'id':[1,2,3,4,6],'city':['wuhan','newyork','shanghai','paris','losangeles'],'country':['china','usa','china','france','usa'],'visits':[2,1,2,1,2]})
print(df1)
print(df2)

print('Column-wise concatenation')
df3 = concat([df1,df2],join="inner",axis=1)
print(df3)
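
A minimal sketch of column-wise concatenation on hypothetical toy frames, comparing join='inner' with the default join='outer':

```python
import pandas as pd

# Hypothetical toy frames; right is one row shorter than left.
left = pd.DataFrame({'id': [1, 2, 3], 'city': ['wuhan', 'newyork', 'shanghai']})
right = pd.DataFrame({'id': [1, 2], 'city': ['wuhan', 'newyork']})

# join='inner': keep only index labels present in both frames.
inner = pd.concat([left, right], axis=1, join='inner')

# Default join='outer': keep every index label and fill gaps with NaN.
outer = pd.concat([left, right], axis=1)

print(inner)
print(outer)
```

Note that concat does not rename repeated column names, so the result contains id and city twice.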

Row-wise concatenation

If axis is not given, it defaults to axis=0: columns with the same name are lined up and the rows of df2 are appended below those of df1, so the number of rows = rows of df1 + rows of df2

print('Row-wise concatenation')
df3 = concat([df1,df2])
print(df3)
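
What happens when the column names do not fully match can be sketched on hypothetical toy frames like this:

```python
import pandas as pd

# Hypothetical toy frames that share only the 'id' column.
top = pd.DataFrame({'id': [1, 2], 'city': ['wuhan', 'paris']})
bottom = pd.DataFrame({'id': [3], 'country': ['usa']})

# Default join='outer': union of the columns, gaps filled with NaN.
outer = pd.concat([top, bottom])

# join='inner': only the columns present in every input are kept.
inner = pd.concat([top, bottom], join='inner')

# ignore_index=True renumbers the rows instead of repeating 0, 1, 0.
renumbered = pd.concat([top, bottom], ignore_index=True)

print(outer)
print(inner)
print(renumbered)
```

By default the original index labels are carried over, so they can repeat in the result unless ignore_index=True is passed.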

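Row-wise concatenation with a specified index

When concatenating rows, keys can be passed to label the rows of each source DataFrame with an outer index level

print('Row-wise concatenation with a specified index')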
df3 = concat([df1,df2],keys=['a','b'])
print(df3)
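
A short sketch of what keys produces and how the labels can be used afterwards, on hypothetical toy frames:

```python
import pandas as pd

# Hypothetical toy frames.
first = pd.DataFrame({'id': [1, 2]})
second = pd.DataFrame({'id': [3]})

# keys adds an outer index level naming the source of each row.
labelled = pd.concat([first, second], keys=['a', 'b'])
print(labelled.index.tolist())

# The outer level can then be used to pull one source back out.
part_b = labelled.loc['b']
print(part_b)
```

The result has a two-level index whose first level is the key and whose second level is the original row label.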

Deduplication

Row-wise concatenation can produce identical rows; drop_duplicates removes the duplicate rows

print("Deduplication")
df3 = concat([df1,df2],ignore_index=True).drop_duplicates()
print(df3)
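
A minimal sketch of the same pattern on hypothetical toy frames, which makes it easier to see which row is removed:

```python
import pandas as pd

# Hypothetical toy frames; the (2, 'paris') row appears in both.
first = pd.DataFrame({'id': [1, 2], 'city': ['wuhan', 'paris']})
second = pd.DataFrame({'id': [2, 3], 'city': ['paris', 'london']})

# Stack the rows and renumber them 0..3.
combined = pd.concat([first, second], ignore_index=True)

# drop_duplicates compares row values and keeps the first occurrence.
unique = combined.drop_duplicates()
print(unique)

# Optionally renumber again so the index has no gaps.
tidy = unique.reset_index(drop=True)
print(tidy)
```

drop_duplicates compares column values only, not index labels, and leaves a gap in the index where the duplicate was removed.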

Origin blog.csdn.net/Yolo_C/article/details/104118101