Day-to-day data analysis involves a lot of data I/O. I usually pull data out of a database, analyze it in Python, and sometimes push the results back into the database so others can query them.
If every step is done by exporting and importing tables by hand, there is far too much manual work. Fortunately, Python has many packages that provide database connectivity, so Python can read from and write to the database directly.
Back on Python 2.7 I mostly used the MySQLdb package. It handles reads and writes fine, but with one limitation: it cannot upload an entire DataFrame to the database. Later I saw others using pymysql to upload a whole DataFrame at once, so I tried it myself and promptly fell into a pit...
Because I was on 2.7, it simply refused to work and kept throwing errors; only later did I discover that the same code runs fine on Python 3.6...
So I'm writing this note to share the approach, and to make sure I don't forget this hard-won lesson myself.
Enough talk; here is the code:
import pymysql
import pandas as pd
import pandas.io.sql as iosql
import time
from sqlalchemy import create_engine
strnowtime_md = str(time.strftime("%m%d"))  # today's month+day, e.g. "0815", used in the CSV path
# Connect to the database; set the charset so Chinese characters are stored correctly
db = pymysql.connect(host="your_host", port=your_port, user="your_user",
                     passwd="your_password", db="your_database")
db.set_charset('utf8')
cursor = db.cursor()  # get a cursor
df = pd.read_csv("C://deal_predict//feature_test_"+strnowtime_md+"//data_all_"+strnowtime_md+"_test.csv",encoding = "utf-8")
yconnect = create_engine('mysql+pymysql://user:password@host:port/dbname?charset=utf8')
iosql.to_sql(df, 'test_temp', yconnect, schema='analysis', index=False)  # upload the whole DataFrame as table test_temp
sql_deletefrom_lastupdate = """ DELETE FROM custom_purchase_probability_lastupdate """
sql_insert_to_lastupdate = """ INSERT INTO `custom_purchase_probability_lastupdate`
SELECT * FROM custom_purchase_probability """
sql_deletefrom_new = """DELETE FROM custom_purchase_probability"""
sql_insert_to_new = """ INSERT INTO `custom_purchase_probability`
(sell_custom_archives_id,openid,building_project_id,archives_time,archives_group,probability,agent_operator_id,classify_group)
SELECT sell_custom_archives_id,openid,building_project_id,archives_time,archives_group,probability,agent_operator_id,classify_group
FROM test_temp """
sql_drop_test_temp = """ DROP TABLE test_temp """
try:
    cursor.execute(sql_deletefrom_lastupdate)
    db.commit()
except:
    print("Error1")
try:
    cursor.execute(sql_insert_to_lastupdate)
    db.commit()
except:
    print("Error2")
try:
    cursor.execute(sql_deletefrom_new)
    db.commit()
except:
    print("Error3")
try:
    cursor.execute(sql_insert_to_new)
    db.commit()
except:
    print("Error4")
try:
    cursor.execute(sql_drop_test_temp)
    db.commit()
except:
    print("Error5")
In the code above, just swap in your own database connection parameters.
This accomplishes:
- reading data from a CSV file into a DataFrame;
- uploading the DataFrame to the database as a temporary table via iosql.to_sql(df, 'test_temp', yconnect, schema='analysis', index=False);
- then using the db.cursor() cursor to run the delete-data, insert-data, and drop-table statements.
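The upload step can be tried end-to-end without a MySQL server. Below is a minimal sketch that swaps the MySQL URL for an in-memory SQLite engine (the `schema='analysis'` argument is dropped because SQLite has no schemas); the column names and values here are invented for illustration.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite engine standing in for
# 'mysql+pymysql://user:password@host:port/dbname?charset=utf8'.
engine = create_engine("sqlite://")

# Toy frame; in the article the real columns come from the CSV file.
df = pd.DataFrame({"sell_custom_archives_id": [101, 102],
                   "probability": [0.72, 0.18]})

# Same upload call as in the article, minus schema= (unsupported in SQLite).
df.to_sql("test_temp", engine, index=False)

# Read the table back to confirm the round trip.
n_rows = int(pd.read_sql("SELECT COUNT(*) AS n FROM test_temp", engine)["n"][0])
print(n_rows)  # 2
```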
In principle, every MySQL operation can be driven from a Python script. The only difference is one extra step: the script stores each SQL statement as a string and then sends it to the database over the connection.
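The five copy-pasted try/except blocks above can also be collapsed into a loop over labelled statements, so a failure reports which step broke instead of "Error1".."Error5". A minimal sketch, using sqlite3 in place of pymysql (the cursor.execute/commit calls are identical) with invented toy tables:

```python
import sqlite3

# sqlite3 stands in for pymysql here purely for illustration.
db = sqlite3.connect(":memory:")
cursor = db.cursor()

# Toy tables mirroring the temp-table handoff in the article.
cursor.execute("CREATE TABLE test_temp (id INTEGER, probability REAL)")
cursor.execute("CREATE TABLE custom_purchase_probability (id INTEGER, probability REAL)")
cursor.executemany("INSERT INTO test_temp VALUES (?, ?)", [(1, 0.9), (2, 0.3)])

# Each statement carries a label used in its error message.
statements = [
    ("clear target", "DELETE FROM custom_purchase_probability"),
    ("copy from temp", "INSERT INTO custom_purchase_probability SELECT * FROM test_temp"),
    ("drop temp", "DROP TABLE test_temp"),
]
for label, sql in statements:
    try:
        cursor.execute(sql)
        db.commit()
    except Exception as exc:
        db.rollback()
        print(f"step '{label}' failed: {exc}")

cursor.execute("SELECT COUNT(*) FROM custom_purchase_probability")
n_copied = cursor.fetchone()[0]
print(n_copied)  # 2
```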
As for the to_sql call above, here is the parameter documentation from the pandas source:
def to_sql(frame, name, con, flavor=None, schema=None, if_exists='fail',
           index=True, index_label=None, chunksize=None, dtype=None):
    """
    Write records stored in a DataFrame to a SQL database.

    Parameters
    ----------
    frame : DataFrame
    name : string
        Name of SQL table.
    con : SQLAlchemy connectable(engine/connection) or database string URI
        or sqlite3 DBAPI2 connection
        Using SQLAlchemy makes it possible to use any DB supported by that
        library.
        If a DBAPI2 object, only sqlite3 is supported.
    flavor : 'sqlite', default None
        .. deprecated:: 0.19.0
           'sqlite' is the only supported option if SQLAlchemy is not
           used.
    schema : string, default None
        Name of SQL schema in database to write to (if database flavor
        supports this). If None, use default schema (default).
    if_exists : {'fail', 'replace', 'append'}, default 'fail'
        - fail: If table exists, do nothing.
        - replace: If table exists, drop it, recreate it, and insert data.
        - append: If table exists, insert data. Create if does not exist.
    index : boolean, default True
        Write DataFrame index as a column.
    index_label : string or sequence, default None
        Column label for index column(s). If None is given (default) and
        `index` is True, then the index names are used.
        A sequence should be given if the DataFrame uses MultiIndex.
    chunksize : int, default None
        If not None, then rows will be written in batches of this size at a
        time. If None, all rows will be written at once.
    dtype : single SQLtype or dict of column name to SQL type, default None
        Optional specifying the datatype for columns. The SQL type should
        be a SQLAlchemy type, or a string for sqlite3 fallback connection.
        If all columns are of the same type, one single value can be used.
    """