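Working with multiple tables in pandas
1. Inner Merge: combining DataFrames
pd.merge() combines two DataFrames into one.
Besides the top-level pandas function, each DataFrame also has its own merge() method.
Filtering rows, similar to SELECT ... WHERE in SQL: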
results = all_data[all_data.revenue > all_data.target]
# 1: using the pd.merge() function
sales_vs_targets = pd.merge(sales, targets)
crushing_it = sales_vs_targets[sales_vs_targets.revenue > sales_vs_targets.target]
print(crushing_it)
# 2: using the DataFrame.merge() method (calls can be chained)
new_df = orders.merge(customers)
big_df = orders.merge(customers).merge(products)
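A minimal sketch of the calls above on made-up data. The column names (month, revenue, target) and the values are assumptions for illustration only, not the contents of the original tables.

```python
import pandas as pd

# Hypothetical data: both tables share the column "month".
sales = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "revenue": [300, 290, 310]})
targets = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "target": [250, 310, 300]})

# With no "on" argument, merge joins on every column name the two tables share.
sales_vs_targets = pd.merge(sales, targets)

# Keep only the months where revenue beat the target (the SELECT ... WHERE step).
crushing_it = sales_vs_targets[sales_vs_targets.revenue > sales_vs_targets.target]

# The DataFrame method produces the same table as the top-level function.
same_result = sales.merge(targets)

print(crushing_it)
```

Because merge() returns a new DataFrame, calls can be chained, as in orders.merge(customers).merge(products).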
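2. Merging on columns with the same name but different meanings
Two tables may share a column name, such as id, that means something different in each: orders.id identifies an order, while products.id identifies a product. There are two ways to handle this.
(1). Rename the column with rename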
orders_products = pd.merge(
orders,
products.rename(columns={'id': 'product_id'}))
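A small sketch of the rename approach. The orders and products tables below are hypothetical stand-ins with invented values.

```python
import pandas as pd

# Hypothetical tables: "id" means order id in one and product id in the other.
orders = pd.DataFrame({"id": [1, 2, 3], "product_id": [10, 11, 10], "quantity": [2, 1, 5]})
products = pd.DataFrame({"id": [10, 11], "description": ["thing-a-ma-jig", "whatcha-ma-call-it"]})

# After renaming, the only shared column name is "product_id",
# so the merge matches each order to the product it refers to.
orders_products = pd.merge(orders, products.rename(columns={"id": "product_id"}))

print(orders_products)
```

rename() returns a new DataFrame here, so the original products table keeps its id column.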
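(2). Use left_on and right_on
left_on and right_on name the columns in the left and right tables that actually hold equivalent values.
suffixes=['_orders', '_products'] replaces the default suffixes _x and _y that pandas appends to same-named columns, so id_x and id_y become id_orders and id_products.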
orders_products=pd.merge(
orders,
products,
left_on ='product_id',
right_on='id',
suffixes=['_orders', '_products'])
print(orders_products)
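The same merge, sketched on hypothetical data so the effect of suffixes is visible. The table contents are invented for illustration.

```python
import pandas as pd

# Hypothetical tables: both have an "id" column with a different meaning.
orders = pd.DataFrame({"id": [1, 2, 3], "product_id": [10, 11, 10], "quantity": [2, 1, 5]})
products = pd.DataFrame({"id": [10, 11], "description": ["thing-a-ma-jig", "whatcha-ma-call-it"]})

orders_products = pd.merge(
    orders,
    products,
    left_on="product_id",   # key column in orders
    right_on="id",          # key column in products
    suffixes=["_orders", "_products"],  # used instead of the default _x / _y
)

# Both "id" columns survive, each with its own suffix.
print(orders_products.columns.tolist())
```

Unlike the rename approach, both key columns (product_id and id_products) stay in the result and hold the same values.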
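Mismatched Merges: when a row in one table has no matching row in the other, an inner merge drops it. Unmatched rows do not appear in the result, so that data is lost.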
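A short sketch of how rows go missing in an inner merge. The store_a and store_b tables here are hypothetical; their columns and values are assumptions, not the contents of the CSV files used elsewhere in these notes.

```python
import pandas as pd

# Hypothetical inventories: "saw" exists only in store A, "wrench" only in store B.
store_a = pd.DataFrame({"item": ["hammer", "nail", "saw"], "store_a_inventory": [12, 200, 3]})
store_b = pd.DataFrame({"item": ["hammer", "nail", "wrench"], "store_b_inventory": [6, 150, 9]})

# The default merge is an inner merge: only items present in both tables remain.
inner = pd.merge(store_a, store_b)

# Rows of store_a that did not make it into the merged result.
lost_from_a = store_a[~store_a.item.isin(inner.item)]

print(inner)
print(lost_from_a)
```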
3. Outer Merge
An outer merge keeps every row from both tables, so no data is lost. Values missing from one side are filled with NaN.
store_a_b_outer = pd.merge(store_a,store_b,how='outer')
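A sketch of the outer merge on hypothetical inventory tables (invented columns and values), showing where the NaN values come from.

```python
import pandas as pd

# Hypothetical inventories: "saw" exists only in store A, "wrench" only in store B.
store_a = pd.DataFrame({"item": ["hammer", "nail", "saw"], "store_a_inventory": [12, 200, 3]})
store_b = pd.DataFrame({"item": ["hammer", "nail", "wrench"], "store_b_inventory": [6, 150, 9]})

# how='outer' keeps all items from both tables.
store_a_b_outer = pd.merge(store_a, store_b, how="outer")

# "saw" has no store_b_inventory and "wrench" has no store_a_inventory,
# so each of those cells is NaN.
print(store_a_b_outer)
```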
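3. Left and Right Merge
A left merge keeps every row of the left table; rows of the right table with no match may be dropped.
A right merge is the mirror image: it keeps every row of the right table, so swapping the two tables turns a right merge into a left merge.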
import codecademylib  # only available inside the Codecademy environment; omit when running locally
import pandas as pd
store_a = pd.read_csv('store_a.csv')
print(store_a)
store_b = pd.read_csv('store_b.csv')
print(store_b)
store_a_b_left = pd.merge(store_a, store_b, how='left')
store_b_a_left = pd.merge(store_b, store_a, how='left')
print(store_a_b_left)
print(store_b_a_left)
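A self-contained sketch of the same idea without the CSV files. The two tables below are hypothetical replacements for store_a.csv and store_b.csv, whose actual contents are not shown in these notes.

```python
import pandas as pd

# Hypothetical inventories: "saw" exists only in store A, "wrench" only in store B.
store_a = pd.DataFrame({"item": ["hammer", "nail", "saw"], "store_a_inventory": [12, 200, 3]})
store_b = pd.DataFrame({"item": ["hammer", "nail", "wrench"], "store_b_inventory": [6, 150, 9]})

# Left merge: every row of the left table is kept.
store_a_b_left = pd.merge(store_a, store_b, how="left")   # keeps "saw", drops "wrench"
store_b_a_left = pd.merge(store_b, store_a, how="left")   # keeps "wrench", drops "saw"

# Right merge of (A, B) keeps the same rows as the left merge of (B, A);
# only the column order differs.
store_a_b_right = pd.merge(store_a, store_b, how="right")

print(store_a_b_left)
print(store_b_a_left)
```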
4. Concatenating tables
If two tables have the same columns, they can be stacked into a single table with pd.concat().
import codecademylib  # only available inside the Codecademy environment; omit when running locally
import pandas as pd
bakery = pd.read_csv('bakery.csv')
print(bakery)
ice_cream = pd.read_csv('ice_cream.csv')
print(ice_cream)
# Concatenate the two menus to form a new menu
menu = pd.concat([bakery, ice_cream])
print(menu)
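A sketch of concatenation on hypothetical menus (the item and price columns and their values are assumptions). It also shows one detail worth knowing: by default pd.concat() keeps each table's original row index, so index labels repeat unless ignore_index=True is passed.

```python
import pandas as pd

# Hypothetical menus with identical columns.
bakery = pd.DataFrame({"item": ["cookie", "brownie"], "price": [2.50, 3.50]})
ice_cream = pd.DataFrame({"item": ["scoop", "sundae"], "price": [3.00, 5.00]})

# Default behaviour: rows are stacked and the original indexes (0, 1, 0, 1) are kept.
menu = pd.concat([bakery, ice_cream])

# ignore_index=True renumbers the rows 0..n-1 in the combined table.
menu_renumbered = pd.concat([bakery, ice_cream], ignore_index=True)

print(menu)
print(menu_renumbered)
```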