Combining Multiple Tables with Python

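Fields of table a in database A:

**Database A, table a**
Original name	Serial number	Submission time	WeChat OpenID	Diet plan rating	Exercise plan rating	Manager rating
Current name	id	inputTime	wxOpenId	eatProgram	motionProgram	mTEvaluation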

Note: the submission time in table a is stored as a timestamp and must be converted to a datetime format.
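A minimal sketch of that conversion with pandas. It assumes the timestamps are Unix epoch seconds, which the original does not state; for millisecond values, `unit='ms'` would be the equivalent. The sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows; in the real table, inputTime comes from wx_A.
df = pd.DataFrame({
    'id': [1, 2],
    'inputTime': [1577836800, 1577923200],  # assumed to be Unix seconds (UTC)
})

# Convert epoch seconds to pandas datetime values.
df['inputTime'] = pd.to_datetime(df['inputTime'], unit='s')

print(df)
```

Doing the conversion right after loading keeps every later step (filtering, merging, exporting) working with real datetime values rather than raw integers.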

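Fields of table b in database B:

**Database B, table b**
Original name	WeChat OpenID	WeChat UnionID	userID	Overall check-in rate	Diet check-in rate	Exercise check-in rate
Current name	wxOpenId	wxUnionID	userId	punchRate	eatPunchRate	motionRate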

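Tables a and b are joined on the OpenID. The combined table c has the following fields:

Original name	Submission date	WeChat OpenID	userID	Diet plan rating	Exercise plan rating	Manager rating	Overall check-in rate	Diet check-in rate	Exercise check-in rate
Current name	inputTime	wxOpenId	userId	eatProgram	motionProgram	mTEvaluation	punchRate	eatPunchRate	motionRate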
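The layout of table c can be sketched as a left join followed by a column selection. The two small DataFrames below are hypothetical stand-ins for tables a and b; only the column names come from the tables above.

```python
import pandas as pd

# Hypothetical rows standing in for table a.
a = pd.DataFrame({
    'id': [1, 2],
    'inputTime': ['2020-01-01', '2020-01-02'],
    'wxOpenId': ['o1', 'o2'],
    'eatProgram': [5, 4],
    'motionProgram': [4, 3],
    'mTEvaluation': [5, 5],
})

# Hypothetical rows standing in for table b (no row for 'o2').
b = pd.DataFrame({
    'wxOpenId': ['o1'],
    'wxUnionID': ['u1'],
    'userId': [101],
    'punchRate': [0.9],
    'eatPunchRate': [0.8],
    'motionRate': [0.7],
})

# Column order of table c as listed above.
c_columns = ['inputTime', 'wxOpenId', 'userId', 'eatProgram', 'motionProgram',
             'mTEvaluation', 'punchRate', 'eatPunchRate', 'motionRate']

# Left join keeps every row of a; b's columns are NaN where no match exists.
c = a.merge(b, on='wxOpenId', how='left')[c_columns]

print(c)
```

With a left join, a user who submitted a rating but has no check-in record still appears in c, with empty check-in columns.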

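Notes:
1. Standardize field names during data migration and use English names throughout; this lowers the chance of bugs caused by character encoding.
2. Normalize data types up front; when consolidating several data sources, agree on each column's type before merging.
3. Decide on the join key (unique identifier) and account for the one-to-many matches, null values, and anomalous values that the join can introduce.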
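Notes 2 and 3 can be sketched with pandas as follows. The sample data is hypothetical, and keeping the first row for a duplicated key is only one possible policy, assumed here for illustration.

```python
import pandas as pd

# Hypothetical data containing a null key, a duplicated key and a bad value.
a = pd.DataFrame({'wxOpenId': ['o1', 'o2', None],
                  'eatProgram': ['5', '4', '3']})
b = pd.DataFrame({'wxOpenId': ['o1', 'o1', 'o3'],
                  'punchRate': ['0.9', '0.8', 'n/a']})

# Note 2: settle data types first; unparseable values become NaN.
a['eatProgram'] = pd.to_numeric(a['eatProgram'], errors='coerce')
b['punchRate'] = pd.to_numeric(b['punchRate'], errors='coerce')

# Note 3: rows without a join key cannot be matched, so drop them.
a = a.dropna(subset=['wxOpenId'])

# Note 3: record duplicated keys in b, then keep one row per key
# (keeping the first row is an assumed policy).
dup_keys = b.loc[b['wxOpenId'].duplicated(keep=False), 'wxOpenId'].unique().tolist()
b = b.drop_duplicates(subset='wxOpenId', keep='first')

# validate raises MergeError if b still has duplicate keys;
# indicator adds a _merge column showing where each row came from.
merged = a.merge(b, on='wxOpenId', how='left',
                 validate='many_to_one', indicator=True)

# Keys from a that found no partner in b.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'wxOpenId'].tolist()

print(dup_keys)
print(unmatched)
print(merged)
```

Running these checks before the join makes row-count changes explainable: a larger result would point to duplicate keys, and `left_only` rows point to missing records in b.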

# -*- coding: utf-8 -*-

# Import the required modules
import pandas as pd
from sqlalchemy import create_engine

# Initialize the database connection through the pymysql driver
# MySQL user: ***, password: ****, port: 3306, database: ****
engineA = create_engine('mysql+pymysql://cyt123:[email protected]:3306/dateName')

# Query that selects the needed columns from every row of wx_A
wx_a_sql = '''
      select id,inputTime,wxOpenId,eatProgram,motionProgram,mTEvaluation from wx_A;
      '''

# read_sql_query takes two arguments: the SQL statement and the database connection
df = pd.read_sql_query(wx_a_sql, engineA)

# Print the wx_A query result
print('Read from MySQL table successfully! [wx_A]')
print(df)

# Query that selects the needed columns from every row of other_B
wx_other_sql = '''
      select wxOpenId,wxUnionID,userId,punchRate,eatPunchRate,motionRate from other_B;
      '''

# read_sql_query takes two arguments: the SQL statement and the database connection
dfb = pd.read_sql_query(wx_other_sql, engineA)

# Print the other_B query result
print('Read from MySQL table successfully! [other_B]')
print(dfb)

# Left join wx_A with other_B on wxOpenId and print the combined table
dfc = pd.merge(left=df, right=dfb, how='left', on='wxOpenId')
print('Merged successfully! [wx_A left join other_B on wxOpenId]')
print(dfc)
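The script above reads both tables through one connection. If tables a and b really live in separate databases, as the field descriptions suggest, each needs its own connection. The sketch below illustrates the pattern with two in-memory SQLite databases standing in for MySQL databases A and B (an assumption made so the example runs anywhere); with MySQL this would be two `create_engine` calls instead. The inserted rows are hypothetical.

```python
import sqlite3
import pandas as pd

# Two in-memory SQLite databases stand in for databases A and B (assumption).
conn_a = sqlite3.connect(':memory:')
conn_b = sqlite3.connect(':memory:')

# Hypothetical, reduced versions of wx_A and other_B.
conn_a.execute('create table wx_A (id integer, inputTime integer, wxOpenId text)')
conn_a.execute("insert into wx_A values (1, 1577836800, 'o1')")
conn_b.execute('create table other_B (wxOpenId text, userId integer, punchRate real)')
conn_b.execute("insert into other_B values ('o1', 101, 0.9)")

# Each table is read through the connection of the database that owns it.
df_a = pd.read_sql_query('select id, inputTime, wxOpenId from wx_A', conn_a)
df_b = pd.read_sql_query('select wxOpenId, userId, punchRate from other_B', conn_b)

# The join itself happens in pandas, so the two sources never need to talk.
df_c = pd.merge(df_a, df_b, on='wxOpenId', how='left')
print(df_c)

conn_a.close()
conn_b.close()
```

Joining in pandas rather than in SQL is what makes the cross-database case workable: each query stays local to its own server.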

Result
