How does MySQL delete large tables gracefully?


As time passes and business volume grows, database space usage climbs steadily. When the available space approaches its limit, we often discover that the database contains one or two enormous tables. They have accumulated every record since the business launched, yet 90% of that data has no business value. How should we deal with such large tables?

Since the data is worthless, we usually either delete it outright or archive it first and then delete it. Deletion operations fall into two categories:

  • Delete all data in the table at once with truncate
  • Delete the rows that match a condition with delete

1. Truncate operation

Logically, truncate deletes all rows in a table, but it is not the same as delete from table_name where 1=1. To make deleting an entire table's data fast, truncate actually drops the table and then re-creates it. For this reason, truncate is a DDL operation that cannot be rolled back.

1.1 What does MySQL truncate do?

  • The truncate operation consists of two steps: drop and re-create
  • The first stage of the drop step clears the table's pages from the buffer pool: its data pages are removed from the flush list without being flushed to disk. The bottleneck here is that removing pages from the flush list requires holding the lock of the corresponding buffer pool instance while traversing the list. If the buffer pool instance is large and many of its pages must be removed, other transactions block waiting for the buffer pool instance lock, which hurts database performance
  • The second stage of the drop step deletes the .ibd file on disk. The larger the physical file, the more I/O the deletion consumes and the longer it takes
  • In the re-create step, as long as the dropped table's .frm file is intact, the table can be rebuilt with its original structure after the drop; its auto_increment value is reset by the rebuild
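A minimal sketch of the auto_increment reset described above (the table t1 and its columns are hypothetical):

```sql
-- Hypothetical table with an auto-increment primary key
CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY, val VARCHAR(20));
INSERT INTO t1 (val) VALUES ('a'), ('b'), ('c');   -- ids 1, 2, 3

TRUNCATE TABLE t1;                 -- drop + re-create; AUTO_INCREMENT is reset
INSERT INTO t1 (val) VALUES ('d'); -- id starts again from 1
```

A delete from t1 of all rows, by contrast, would leave the counter where it was, so the next insert would get id 4.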

1.2 How to optimize the resource consumption caused by truncate operation?

  • For the first stage of the drop step: when the innodb_buffer_pool_size allocated to the MySQL instance exceeds 1 GB, set innodb_buffer_pool_instances appropriately to increase concurrency; this indirectly shortens the time each buffer pool instance's lock is held while its flush list is scanned
  • For the second stage of the drop step: before dropping the table, create a hard link to its .ibd file. The drop then only removes one name for the file, so it completes quickly at the MySQL level with little performance impact on the database. Afterwards, manually remove the hard link at the operating-system level to actually free the space
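The hard-link trick relies on ordinary filesystem semantics: unlinking one name for a file is near-instant, and the blocks are only freed when the last link is removed. A sketch demonstrating this with a plain file (the paths are hypothetical; with a real table you would link the table's .ibd file before the drop):

```shell
set -e
workdir=$(mktemp -d)
echo "fake ibd payload" > "$workdir/t1.ibd"

# Second name for the same inode
ln "$workdir/t1.ibd" "$workdir/t1.ibd.hardlink"

# What DROP TABLE effectively does: unlink one name, which is near-instant
rm "$workdir/t1.ibd"

# The data is still on disk via the remaining link
content=$(cat "$workdir/t1.ibd.hardlink")

# Later, clean up at the OS level to actually free the space
rm "$workdir/t1.ibd.hardlink"
rmdir "$workdir"
```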

2. Delete operation

2.1 What operations does MySQL delete do?

  • Based on the where condition, MySQL performs an index or full-table scan on the table and checks each row against the condition, locking the rows it scans. This stage is the biggest resource-consumption risk: if the table is large and the delete cannot use an index to narrow the scan, this step causes heavy lock contention and CPU/IO consumption.
  • Locks on rows that do not match the where condition are released after the check, so InnoDB ultimately holds locks only on the rows to be deleted. This effectively reduces lock contention, but note that deleting a large amount of data in one statement produces a huge binlog transaction, which is unfriendly both to MySQL itself and to the replicas in a primary-replica architecture, and may cause replication lag.
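Before running a large delete, it is worth checking the execution plan; EXPLAIN supports DELETE statements since MySQL 5.6 (table and column names here are hypothetical):

```sql
-- type=range with key=idx_create_time means the scan is bounded by the index;
-- type=ALL would mean a full table scan, with row locks taken across the table.
EXPLAIN DELETE FROM t1
WHERE create_time >= '2021-01-01 00:00:00'
  AND create_time <  '2021-02-01 00:00:00';
```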

2.2 How to optimize the delete operation?

  • Be cautious with a delete that removes an entire table; consider using truncate instead
  • In delete … where …, make sure the where filter can use an index effectively, to reduce the amount of data scanned and avoid a full table scan
  • For large-scale deletion where the where condition has no index, add a range filter on an auto-increment primary key or an indexed time column, and delete in batches: a small amount of data per statement, executed over multiple rounds
  • For the classic scenario of keeping recent data and deleting history: create a table xxx_tmp with the same structure, use insert into xxx_tmp select … to copy the data to keep, then use rename to move the current business table xxx to xxx_bak and xxx_tmp to the business table name xxx, and finally drop the now-useless large table xxx_bak manually

2.3 Two common scenarios for delete

2.3.1 delete where condition has no valid index filtering

A common scenario: the business needs to delete rows from t1 where condition1=xxx, but the condition column cannot use an index effectively. In this case, our usual approach is:

  • Check which indexes on the current table can be used effectively, preferably the auto-increment primary key or an indexed time column
  • Add a range filter on that indexed column to the delete, removing a small amount of data per statement over multiple batches. The batch size must be evaluated against actual business conditions to avoid deleting a large batch of data at once.
-- Use the auto-increment primary key index
delete from t1 where condition1=xxx and id >=1 and id < 50000;
delete from t1 where condition1=xxx and id >=50000 and id < 100000;


-- Use the time index
delete from t1 where condition1=xxx and create_time >= '2021-01-01 00:00:00' and create_time < '2021-02-01 00:00:00';
delete from t1 where condition1=xxx and create_time >= '2021-02-01 00:00:00' and create_time < '2021-03-01 00:00:00';
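The batching above can also be scripted. A minimal sketch (table name, condition, and bounds are hypothetical) that generates the bounded DELETE statements; in practice each statement would be executed and committed separately through your MySQL driver:

```python
def batched_deletes(table, cond, id_col, start_id, end_id, batch_size):
    """Yield DELETE statements covering [start_id, end_id) in fixed-size ranges."""
    lo = start_id
    while lo < end_id:
        hi = min(lo + batch_size, end_id)
        yield (f"DELETE FROM {table} WHERE {cond} "
               f"AND {id_col} >= {lo} AND {id_col} < {hi}")
        lo = hi

# Example: generate statements like the two shown above
for stmt in batched_deletes("t1", "condition1=xxx", "id", 0, 100000, 50000):
    print(stmt)
```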

2.3.2 Keep recent data and delete historical data

A common scenario is to keep only the last 3 months of data in t1 and delete all older history. Our usual approach is:

  • Create a t1_tmp table to temporarily store the data that needs to be retained
create table t1_tmp like t1;
  • Using the indexed time column, copy the data to keep into t1_tmp in batches. Note: leave the most recent batch uncopied for now
-- Split into batches according to actual business volume; keep each batch reasonably small
insert into t1_tmp select * from t1 where create_time >= '2021-01-01 00:00:00' and create_time < '2021-02-01 00:00:00';
insert into t1_tmp select * from t1 where create_time >= '2021-02-01 00:00:00' and create_time < '2021-03-01 00:00:00';

-- Do not copy the current (last) batch yet
-- insert into t1_tmp select * from t1 where create_time >= '2021-03-01 00:00:00' and create_time < '2021-04-01 00:00:00';
  • Use rename to move the current business table t1 to t1_bak and t1_tmp to the business table name t1. If the table receives frequent DML, this step causes a brief window of failed business access
alter table t1 rename to t1_bak;
alter table t1_tmp rename to t1;
  • Copy the last batch of data into the current business table. The purpose of this step is to minimize data loss during the change window
insert into t1 select * from t1_bak where create_time >= '2021-03-01 00:00:00' and create_time < '2021-04-01 00:00:00';
  • One more thing to watch in the rename step: whether the table's primary key is auto-increment or a business-generated UUID. If it is auto-increment, also raise t1_tmp's auto_increment value beforehand, high enough to cover rows written during the change:
alter table t1_tmp auto_increment = {current auto_increment of t1} + {estimated growth during the change};
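The month-by-month copy above can also be generated programmatically. A sketch (table names, column, and dates are hypothetical) that produces the INSERT … SELECT statement for each month, holding back the last batch as the procedure requires:

```python
from datetime import date

def month_starts(first, last):
    """Yield the first day of each month from `first` up to and including `last`."""
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)

def copy_statements(src, dst, ts_col, first, last):
    """One INSERT...SELECT per month over [first, last)."""
    months = list(month_starts(first, last))
    stmts = []
    for lo, hi in zip(months, months[1:]):
        stmts.append(
            f"INSERT INTO {dst} SELECT * FROM {src} "
            f"WHERE {ts_col} >= '{lo} 00:00:00' AND {ts_col} < '{hi} 00:00:00'")
    return stmts

stmts = copy_statements("t1", "t1_tmp", "create_time",
                        date(2021, 1, 1), date(2021, 4, 1))
for s in stmts[:-1]:   # run all but the last batch before the rename
    print(s)
```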

3. Comparison of advantages and disadvantages of Truncate/Delete

  • Truncate — full-table delete. Advantages: no need to scan table data, so execution is fast; physical deletion directly and quickly frees the occupied space. Disadvantages: a DDL operation that cannot be rolled back; cannot delete by condition.
  • Delete — deletes rows matching a specified condition. Advantages: can filter and delete by condition. Disadvantages: efficiency depends on how the where condition is written; deleting from a large table generates a large binlog and is slow; the deletion tends to leave fragmented space rather than directly freeing it.


Origin blog.csdn.net/weixin_37692493/article/details/115283520