Day-to-day data analysis involves a lot of data I/O. I usually pull data out of a database, analyze it in Python, and sometimes push the results back into the database so others can query them.
If every step is done by exporting and importing tables by hand, there is far too much manual work. Fortunately, Python has many packages that provide database connectivity, so Python can read from and write to the database directly.
Back on Python 2.7 I mostly used the MySQLdb package. It handles reads and writes fine, but with one limitation: it cannot upload an entire DataFrame to the database. Later I saw others using pymysql to upload a whole DataFrame at once, so I tried it myself and promptly fell into a pit...
Because I was on 2.7, it simply refused to work and kept throwing errors; only later did I discover that the same code runs fine on Python 3.6...
So I'm writing this note to share the approach, and to make sure I don't forget this hard-won lesson myself.
Enough talk; here is the code:
import pymysql
import pandas as pd
import pandas.io.sql as iosql
import time
from sqlalchemy import create_engine
strnowtime_md = str(time.strftime("%m%d"))  # today's month+day, e.g. "0815", used in the CSV path
# Connect to the database; set the charset so Chinese characters are stored correctly
db = pymysql.connect(host="your_host", port=your_port, user="your_user",
                     passwd="your_password", db="your_database")
db.set_charset('utf8')
cursor = db.cursor()  # get a cursor
df = pd.read_csv("C://deal_predict//feature_test_"+strnowtime_md+"//data_all_"+strnowtime_md+"_test.csv",encoding = "utf-8")
yconnect = create_engine('mysql+pymysql://user:password@host:port/dbname?charset=utf8')
iosql.to_sql(df, 'test_temp', yconnect, schema='analysis', index=False)  # upload the whole DataFrame as table test_temp
sql_deletefrom_lastupdate = """ DELETE FROM custom_purchase_probability_lastupdate """
sql_insert_to_lastupdate = """ INSERT INTO `custom_purchase_probability_lastupdate`
SELECT * FROM custom_purchase_probability """
sql_deletefrom_new = """DELETE FROM custom_purchase_probability"""
sql_insert_to_new = """ INSERT INTO `custom_purchase_probability`
(sell_custom_archives_id,openid,building_project_id,archives_time,archives_group,probability,agent_operator_id,classify_group)
SELECT sell_custom_archives_id,openid,building_project_id,archives_time,archives_group,probability,agent_operator_id,classify_group
FROM test_temp """
sql_drop_test_temp = """ DROP TABLE test_temp """
try:
    cursor.execute(sql_deletefrom_lastupdate)
    db.commit()
except:
    print("Error1")
try:
    cursor.execute(sql_insert_to_lastupdate)
    db.commit()
except:
    print("Error2")
try:
    cursor.execute(sql_deletefrom_new)
    db.commit()
except:
    print("Error3")
try:
    cursor.execute(sql_insert_to_new)
    db.commit()
except:
    print("Error4")
try:
    cursor.execute(sql_drop_test_temp)
    db.commit()
except:
    print("Error5")
In the code above, just swap in your own database connection parameters.
This accomplishes:
- reading data from a CSV file into a DataFrame;
- uploading the DataFrame to the database as a temporary table via iosql.to_sql(df, 'test_temp', yconnect, schema='analysis', index=False);
- then using the db.cursor() cursor to run the delete-data, insert-data, and drop-table statements.
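The upload step can be tried end-to-end without a MySQL server. Below is a minimal sketch that swaps the MySQL URL for an in-memory SQLite engine (the `schema='analysis'` argument is dropped because SQLite has no schemas); the column names and values here are invented for illustration.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite engine standing in for
# 'mysql+pymysql://user:password@host:port/dbname?charset=utf8'.
engine = create_engine("sqlite://")

# Toy frame; in the article the real columns come from the CSV file.
df = pd.DataFrame({"sell_custom_archives_id": [101, 102],
                   "probability": [0.72, 0.18]})

# Same upload call as in the article, minus schema= (unsupported in SQLite).
df.to_sql("test_temp", engine, index=False)

# Read the table back to confirm the round trip.
n_rows = int(pd.read_sql("SELECT COUNT(*) AS n FROM test_temp", engine)["n"][0])
print(n_rows)  # 2
```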
In principle, every MySQL operation can be driven from a Python script. The only difference is one extra step: the script stores each SQL statement as a string and then sends it to the database over the connection.
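The five copy-pasted try/except blocks above can also be collapsed into a loop over labelled statements, so a failure reports which step broke instead of "Error1".."Error5". A minimal sketch, using sqlite3 in place of pymysql (the cursor.execute/commit calls are identical) with invented toy tables:

```python
import sqlite3

# sqlite3 stands in for pymysql here purely for illustration.
db = sqlite3.connect(":memory:")
cursor = db.cursor()

# Toy tables mirroring the temp-table handoff in the article.
cursor.execute("CREATE TABLE test_temp (id INTEGER, probability REAL)")
cursor.execute("CREATE TABLE custom_purchase_probability (id INTEGER, probability REAL)")
cursor.executemany("INSERT INTO test_temp VALUES (?, ?)", [(1, 0.9), (2, 0.3)])

# Each statement carries a label used in its error message.
statements = [
    ("clear target", "DELETE FROM custom_purchase_probability"),
    ("copy from temp", "INSERT INTO custom_purchase_probability SELECT * FROM test_temp"),
    ("drop temp", "DROP TABLE test_temp"),
]
for label, sql in statements:
    try:
        cursor.execute(sql)
        db.commit()
    except Exception as exc:
        db.rollback()
        print(f"step '{label}' failed: {exc}")

cursor.execute("SELECT COUNT(*) FROM custom_purchase_probability")
n_copied = cursor.fetchone()[0]
print(n_copied)  # 2
```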
As for the to_sql call above, here is the parameter documentation from the pandas source:
def to_sql(frame, name, con, flavor=None, schema=None, if_exists='fail',
           index=True, index_label=None, chunksize=None, dtype=None):
    """
    Write records stored in a DataFrame to a SQL database.

    Parameters
    ----------
    frame : DataFrame
    name : string
        Name of SQL table.
    con : SQLAlchemy connectable(engine/connection) or database string URI
        or sqlite3 DBAPI2 connection
        Using SQLAlchemy makes it possible to use any DB supported by that
        library.
        If a DBAPI2 object, only sqlite3 is supported.
    flavor : 'sqlite', default None
        .. deprecated:: 0.19.0
           'sqlite' is the only supported option if SQLAlchemy is not
           used.
    schema : string, default None
        Name of SQL schema in database to write to (if database flavor
        supports this). If None, use default schema (default).
    if_exists : {'fail', 'replace', 'append'}, default 'fail'
        - fail: If table exists, do nothing.
        - replace: If table exists, drop it, recreate it, and insert data.
        - append: If table exists, insert data. Create if does not exist.
    index : boolean, default True
        Write DataFrame index as a column.
    index_label : string or sequence, default None
        Column label for index column(s). If None is given (default) and
        `index` is True, then the index names are used.
        A sequence should be given if the DataFrame uses MultiIndex.
    chunksize : int, default None
        If not None, then rows will be written in batches of this size at a
        time. If None, all rows will be written at once.
    dtype : single SQLtype or dict of column name to SQL type, default None
        Optional specifying the datatype for columns. The SQL type should
        be a SQLAlchemy type, or a string for sqlite3 fallback connection.
        If all columns are of the same type, one single value can be used.
    """