Writing a DataFrame to MySQL with Python: append and update with deduplication (Python/MySQL/deduplicated update)

This article shows how to use Python to write data to a MySQL table with deduplication by primary key: rows whose key is new are appended, and rows whose key already exists are updated. The data source is a pandas DataFrame, so the approach suits scenarios such as web crawlers that need to deduplicate, append, and update data.

This article builds on MySQL write code found online and makes the following improvements:

  1. The column names to write do not need to be specified; they are taken from the DataFrame.
  2. Column names are wrapped in backticks (`) to avoid errors caused by some special column names.
  3. The whole append-and-update process is wrapped in a separate utility class, so one import and one call are enough to write the data.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Read a local Excel dataset into a DataFrame, then call insert_data from the
# DBUtils utility class to append the data to the database

import os
import pymysql
import pandas as pd
from DBUtils import DBUtils

# Load the dataset
file_path = os.path.join(os.getcwd(), "data.xlsx")
data = pd.read_excel(file_path)
data.fillna("", inplace=True)  # Replace NaN, otherwise the write raises an error; another placeholder value also works

# Connect to the database and define variables
db = pymysql.connect(host='192.168.1.1', user='root', password='1234', port=3306, db='database_name')
cursor = db.cursor()
table = "table_name"  # Target table name

# Write the data
DBUtils(db, cursor, data, table).insert_data()
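
The fillna("") call in the script above stores empty strings where the Excel file had empty cells. If SQL NULL is preferred, one possible alternative is to convert missing values to None before writing, since PyMySQL sends None as NULL. The sketch below is only an illustration: to_sql_rows is a hypothetical helper that is not part of the original code, the sample column names are made up, and it assumes every cell holds a scalar value.

```python
import pandas as pd


def to_sql_rows(df):
    """Convert a DataFrame to a list of row tuples, mapping missing values to None.

    Hypothetical helper for illustration; assumes every cell is a scalar.
    """
    return [
        tuple(None if pd.isna(value) else value for value in row)
        for row in df.itertuples(index=False, name=None)
    ]


# Made-up sample data with one missing string and one missing number
sample = pd.DataFrame({'id': [1, 2], 'name': ['a', None], 'score': [9.5, float('nan')]})
rows = to_sql_rows(sample)
print(rows)
```

Each tuple in rows could then be passed as the parameters of an execute call in place of a DataFrame row.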

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# DBUtils.py: utility module imported by the script above

class DBUtils:
    """
    Database utility class.

    :param db:     database connection, e.g. db = pymysql.connect(host='192.168.1.1', user='root', password='1234', port=3306, db='database_name')
    :param cursor: database cursor, e.g. cursor = db.cursor()
    :param data:   data to write, a DataFrame
    :param table:  name of the target table
    """

    def __init__(self, db, cursor, data, table):
        self.db = db
        self.cursor = cursor
        self.data = data
        self.table = table

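    # Deduplicate by primary key: append new rows, update existing ones
    def insert_data(self):
        columns = list(self.data.columns)
        # Wrap each column name in backticks so special names do not break the SQL
        keys = ', '.join('`{}`'.format(col) for col in columns)
        values = ', '.join(['%s'] * len(columns))
        # Rows whose unique primary key already exists in the table are updated instead of inserted
        sql = 'INSERT INTO {table}({keys}) VALUES ({values}) ON DUPLICATE KEY UPDATE '.format(
            table=self.table, keys=keys, values=values)
        update = ', '.join('`{}` = %s'.format(col) for col in columns)
        sql += update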

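        for i in range(len(self.data)):
            try:
                # The row values are passed twice: once for VALUES, once for the UPDATE clause
                self.cursor.execute(sql, tuple(self.data.iloc[i]) * 2)
                print('Writing row %d' % (i + 1))
                self.db.commit()
            except Exception as e:
                print('Failed to write row %d, reason: %s' % (i + 1, e))
                self.db.rollback()

        self.cursor.close()
        self.db.close()
        print('Finished writing all data!')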
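
To see what insert_data sends to MySQL, the statement-building step can be pulled out into a standalone function and printed. This is a minimal sketch for inspection only: build_upsert_sql is a hypothetical name that does not exist in the class, and the table and column names are sample values.

```python
def build_upsert_sql(table, columns):
    """Build the INSERT ... ON DUPLICATE KEY UPDATE statement for the given columns.

    Hypothetical helper that mirrors the string building in insert_data.
    """
    keys = ', '.join('`{}`'.format(col) for col in columns)
    values = ', '.join(['%s'] * len(columns))
    update = ', '.join('`{}` = %s'.format(col) for col in columns)
    return 'INSERT INTO {table}({keys}) VALUES ({values}) ON DUPLICATE KEY UPDATE {update}'.format(
        table=table, keys=keys, values=values, update=update)


# Sample (made-up) table and column names
sql = build_upsert_sql('table_name', ['id', 'name'])
print(sql)
# INSERT INTO table_name(`id`, `name`) VALUES (%s, %s) ON DUPLICATE KEY UPDATE `id` = %s, `name` = %s
```

Every column appears as a placeholder twice, once in VALUES and once in the UPDATE clause, which is why the row tuple is repeated when the statement is executed.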

Note: This method executes and commits one statement per row, so writing is relatively slow. It is not recommended for scenarios with a large amount of data.
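
For larger datasets, one possible variation is to send all rows through a single executemany call and commit once. The sketch below is not the original author's method. The names build_batch_upsert_sql and batch_upsert are hypothetical, and it assumes NaN values were already replaced as in the script above. It uses MySQL's VALUES(col) form in the UPDATE clause so that each row's parameters are passed only once; note that MySQL 8.0.20 and later mark this form as deprecated in favor of a row alias.

```python
def build_batch_upsert_sql(table, columns):
    """Build an upsert statement whose UPDATE clause reuses the inserted values.

    Hypothetical helper for illustration.
    """
    keys = ', '.join('`{}`'.format(col) for col in columns)
    placeholders = ', '.join(['%s'] * len(columns))
    # VALUES(`col`) refers to the value that the INSERT part tried to write
    update = ', '.join('`{0}` = VALUES(`{0}`)'.format(col) for col in columns)
    return 'INSERT INTO {table} ({keys}) VALUES ({placeholders}) ON DUPLICATE KEY UPDATE {update}'.format(
        table=table, keys=keys, placeholders=placeholders, update=update)


def batch_upsert(db, table, df):
    """Write every row of df with one executemany call and a single commit.

    db is assumed to be a pymysql connection and df a pandas DataFrame.
    """
    sql = build_batch_upsert_sql(table, list(df.columns))
    rows = [tuple(row) for row in df.itertuples(index=False, name=None)]
    with db.cursor() as cursor:
        try:
            cursor.executemany(sql, rows)
            db.commit()
        except Exception:
            # Undo the whole batch if any row fails, then re-raise for the caller
            db.rollback()
            raise


# Sample (made-up) table and column names
batch_sql = build_batch_upsert_sql('table_name', ['id', 'name'])
print(batch_sql)
```

The trade-off is that a single failing row rolls back the whole batch, whereas the row-by-row version skips only the failing row.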

Origin blog.csdn.net/lzykevin/article/details/121378418