Bulk-deleting tens of millions of rows from MySQL with Python

Scenario

Our online MySQL database stores the results of daily statistics jobs, and one table was receiving more than ten million new rows every day; nobody had expected the statistics to grow that fast. A check with operations showed the table already took up about 200 GB of disk. The business side confirmed that only the last three days of data need to be kept, so everything older had to be deleted. But how do you delete that much?
Because this is a production database that also serves a lot of other data, simply firing one huge DELETE at this table is out of the question: it would hold locks for a long time and other queries would suffer. Even deleting a full day in a single statement still causes noticeable stalls, so the only reasonable option is a Python script that deletes in small batches.
The plan is:

  1. Delete only one day's data at a time;
  2. Within each day, delete in batches of 50,000 rows;
  3. Once a day's data is fully deleted, move on to the next day.
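
To make the idea concrete, each batch boils down to two statements: a cheap existence check and a capped DELETE. The short sketch below shows the pattern the script generates (table and column names are taken from the full script that follows; the date value is only an example):

# Illustration of one batch for a single day (values are examples).
tb_name = 'statistic_ad_image_final_count'
day = '2019-08-17'

# 1) Cheap existence check: returns a row only while data for the day remains.
check_sql = "select 1 from %s where date = '%s' limit 1" % (tb_name, day)

# 2) Capped delete: removes at most 50,000 rows, so each statement stays short
#    and never holds locks for long.
delete_sql = "delete from %s where date = '%s' limit 50000" % (tb_name, day)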

Python code

# -*- coding: utf-8 -*-

import sys

# Internally packaged Python modules (company-specific helpers).
sys.path.append('/var/lib/hadoop-hdfs/scripts/python_module2')
import keguang.commons as commons
import keguang.timedef as timedef
import keguang.sql.mysqlclient as mysql


def run(starttime, endtime, regx):
    tb_name = 'statistic_ad_image_final_count'
    days = timedef.getDays(starttime, endtime, regx)
    # Delete the data day by day.
    for day in days:
        print('%s deletion started' % day)
        mclient = getConn()
        sql = '''
        select 1 from %s where date = '%s' limit 1
        ''' % (tb_name, day)
        print(sql)
        result = mclient.query(sql)
        # As long as rows for this day still exist, keep deleting in batches.
        while result:
            sql = 'delete from %s where date = "%s" limit 50000' % (tb_name, day)
            print(sql)
            mclient.execute(sql)
            sql = '''
            select 1 from %s where date = '%s' limit 1
            ''' % (tb_name, day)
            print(sql)
            result = mclient.query(sql)
        print('%s deletion finished' % day)
        mclient.close()


# Return a MySQL connection.
def getConn():
    return mysql.MysqlClient(host='0.0.0.0', user='test', passwd='test', db='statistic')


if __name__ == '__main__':
    regx = '%Y-%m-%d'
    yesday = timedef.getYes(regx, -1)  # yesterday's date; unused here since the range is fixed
    starttime = '2019-08-17'
    endtime = '2019-08-30'
    run(starttime, endtime, regx)

The loop checks whether any rows remain for the current day; if they do, it deletes another 50,000 and checks again; once the day is empty it moves on to the next day. The whole cleanup took about half an hour to finish.
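
The keguang.* modules above are internal wrappers, so for reference here is a minimal standalone sketch of the same batched delete using the pymysql driver (an assumption on my part; any DB-API driver works the same way). Host, credentials, table and column names are placeholders:

import pymysql

def delete_day_in_batches(day, batch_size=50000):
    # Placeholder connection settings; autocommit so each batch is its own transaction.
    conn = pymysql.connect(host='0.0.0.0', user='test', password='test',
                           database='statistic', autocommit=True)
    try:
        with conn.cursor() as cur:
            while True:
                # Stop once no rows remain for this day.
                cur.execute("SELECT 1 FROM statistic_ad_image_final_count "
                            "WHERE date = %s LIMIT 1", (day,))
                if cur.fetchone() is None:
                    break
                # Delete at most batch_size rows per statement so locks stay short.
                cur.execute("DELETE FROM statistic_ad_image_final_count "
                            "WHERE date = %s LIMIT %s", (day, batch_size))
    finally:
        conn.close()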


Origin www.cnblogs.com/data-magnifier/p/11516132.html