Optimizing MySQL deletion of a large amount of data from a large table

Suppose a table holds 30 million records, of which roughly 6 million have status=1, and all rows with status=1 must be deleted without stopping the business.

If you run delete from tab_name where status=1; directly, it will fail with a "lock wait timeout exceeded" error, because the statement touches too many rows.

1. The difference between drop, truncate and delete:

drop: a DDL statement. Removes the whole table: the data, the table structure, and the table's indexes, constraints and triggers. Cannot be rolled back.

truncate: a DDL statement. Removes only the table data; the table structure, indexes, constraints, etc. are preserved. Cannot be rolled back. It runs without a transaction, takes no table lock, and does not write large amounts of log to the log files; after TRUNCATE TABLE table_name the disk space is released immediately and the auto_increment counter is reset.

delete: a DML statement. Deletes rows from the table and can be rolled back. The transaction is recorded in the log, and row and table locks are taken; delete does not release disk space, but subsequent inserts reuse the space of the deleted rows.

Execution efficiency: drop > truncate > delete

2. Batched deletion with LIMIT:

DELETE FROM tab_name WHERE status=1 ORDER BY status LIMIT 10000;

Note: when you need ORDER BY, it must be used together with LIMIT; otherwise the optimizer discards the ORDER BY as meaningless.

Note: if the WHERE condition of the DELETE does not hit an index, you can first select the primary keys of the matching rows and then delete by primary key.
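As a sketch, the batched deletion can be driven by a small loop that reissues the LIMIT-ed DELETE until fewer rows than the batch size are affected. This assumes a DB-API-style `execute` callable that runs one statement, commits it, and returns the affected row count (like `cursor.rowcount` in real drivers); the table and column names are the ones from the example above.

```python
# Sketch: drive the LIMIT-ed DELETE in a loop until no matching rows remain.
# `execute` is assumed to run one SQL statement, commit it, and return the
# number of affected rows (like cursor.rowcount in DB-API drivers).

BATCH_SIZE = 10000
SQL = f"DELETE FROM tab_name WHERE status = 1 LIMIT {BATCH_SIZE}"

def delete_in_batches(execute, sql=SQL, batch_size=BATCH_SIZE):
    total = 0
    while True:
        deleted = execute(sql)      # each call is one short transaction
        total += deleted
        if deleted < batch_size:    # last, partial batch: nothing left
            return total
```

Committing after every batch keeps each transaction short, so locks are released quickly; in practice a short sleep between batches further reduces pressure on the server and on replication.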

1) Advantages of adding LIMIT:

  1. It limits the blast radius of a wrong SQL statement: even if you delete the wrong rows, with for example LIMIT 500 you lose at most 500 rows, which is not fatal, and the data can be recovered quickly from the binlog.
  2. It avoids long transactions: while the DELETE executes, MySQL places write locks and gap locks on all the rows involved, so every row touched by the DML statement is locked. A very large delete can therefore make the related business unusable.
  3. Without LIMIT, a large DELETE can easily saturate the CPU, and the deletion gets slower and slower.

The second point above assumes there is an index on the status column. As is well known, locking works through indexes: if status is not indexed, MySQL falls back to scanning the primary key index, and even if only one record has status = 1, the whole table is effectively locked.
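When the filter column is not indexed, the earlier note about deleting by primary key applies: select a chunk of primary keys first, then delete those ids, so the write locks land only on the rows actually removed. A minimal sketch, with hypothetical `query`/`execute` callables standing in for a real database driver:

```python
# Sketch: delete by primary key when `status` has no index.
# `query(sql)` returns result rows, `execute(sql)` returns the affected
# row count; both are placeholders for a real driver.

def delete_by_primary_key(query, execute, chunk=1000):
    total = 0
    while True:
        # Find a chunk of primary keys to remove; this SELECT scans,
        # but takes no write locks on unrelated rows.
        ids = [r[0] for r in query(
            f"SELECT id FROM tab_name WHERE status = 1 LIMIT {chunk}")]
        if not ids:
            return total
        id_list = ",".join(str(i) for i in ids)
        total += execute(f"DELETE FROM tab_name WHERE id IN ({id_list})")
```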

2) Using LIMIT 1 for single-row DELETE and UPDATE operations is definitely a good habit:

For a single-row UPDATE or DELETE, adding LIMIT 1 to the SQL lets the statement return as soon as the row is found; otherwise the scan continues, up to a full table scan, before returning. The efficiency gain speaks for itself.

3. Rename scheme:

A table has 160 million rows and an auto-increment id whose maximum value is 160 million, and the rows with id greater than 2.5 million need to be deleted. Is there a way to delete them quickly?

The MySQL manual gives a solution: http://dev.mysql.com/doc/refman/5.0/en/delete.html

When deleting many rows from a large table, you may exceed the lock table size of an InnoDB table. To avoid this problem, and to minimize the time the table stays locked, the approach is:

1) Select the rows that should not be deleted into an empty table with the same structure:

CREATE TABLE t_copy LIKE t;
INSERT INTO t_copy SELECT * FROM t WHERE ... ;

2) Use the atomic RENAME TABLE operation to swap the original table with the copy:

RENAME TABLE t TO t_old, t_copy TO t;

3) Delete the original table

DROP TABLE t_old;
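The three steps above can be scripted so they always run in order. A sketch, assuming an `execute` callable from a real driver; `keep_condition` is the filter selecting the rows that survive (the data that should not be deleted):

```python
# Sketch: the copy-and-rename deletion scheme, scripted.
# `execute` runs one SQL statement; `keep_condition` selects the rows
# to keep (i.e. the data that should NOT be deleted).

def copy_rename_delete(execute, table, keep_condition):
    statements = [
        f"CREATE TABLE {table}_copy LIKE {table}",
        f"INSERT INTO {table}_copy SELECT * FROM {table} WHERE {keep_condition}",
        # RENAME TABLE is atomic: readers never see a missing table.
        f"RENAME TABLE {table} TO {table}_old, {table}_copy TO {table}",
        f"DROP TABLE {table}_old",
    ]
    for sql in statements:
        execute(sql)
    return statements
```

Note one caveat of this scheme: rows written to the original table between the INSERT ... SELECT and the RENAME are not copied, so it is best run while writes to the table are paused or can be replayed.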

4. Drop unnecessary indexes before deleting, then rebuild them

In one MySQL deployment, a table receives a relatively large amount of data, about 3 million records per day. The table has three indexes; all of them are necessary and are used by other programs. Only the current day's data needs to be kept, so every morning, once the other programs have finished processing the table, the data from yesterday and earlier has to be removed. Deleting those several million records with DELETE turned out to be very slow: about 4 minutes per 10,000 records, which means more than eight hours to remove all the useless data. That was unacceptable.

The official MySQL manual explains that deletion time grows with the number of indexes (a DML operation must also update every index, so more indexes mean slower deletes). After dropping two of the indexes, a test showed the deletion became quite fast: a bit over a minute per million records. However, those two indexes are still needed for a once-a-day sort, so a compromise was used:

  1. Drop the two indexes before deleting the data; this takes a little over three minutes.
  2. Then delete the useless data; this takes less than two minutes.
  3. Re-create the two indexes when the deletion is finished. Because the table is relatively small at that point, about 300,000 to 400,000 records (it grows by roughly 100,000 records per hour), creating the indexes is also fast, about ten minutes. The whole process takes about 15 minutes, a big saving compared with the previous eight hours.
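The compromise above can be wrapped in a small routine that drops the two indexes, runs the delete, and rebuilds the indexes on the shrunken table. The index and column names below are hypothetical placeholders, since the article does not name them; `execute` again stands in for a real driver call:

```python
# Sketch: drop secondary indexes, bulk-delete, then rebuild the indexes.
# Index, table and column names are hypothetical placeholders.

def delete_with_index_rebuild(execute, table, indexes, delete_condition):
    """`indexes` maps index name -> indexed column list, e.g.
    {"idx_a": ["col_a"], "idx_b": ["col_b", "col_c"]}."""
    for name in indexes:                        # 1) drop the costly indexes
        execute(f"ALTER TABLE {table} DROP INDEX {name}")
    execute(f"DELETE FROM {table} WHERE {delete_condition}")  # 2) fast delete
    for name, cols in indexes.items():          # 3) rebuild on the small table
        execute(f"ALTER TABLE {table} ADD INDEX {name} ({', '.join(cols)})")
```

This only pays off when, as in the article, the dropped indexes are not needed during the deletion window; queries that rely on them will be slow until step 3 completes.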

Origin: blog.csdn.net/liuxiao723846/article/details/130360635