How MySQL cleans up data and frees up disk space

Our production environment has a table, courier_consume_fail_message, that stores messages whose consumption failed. When the table was designed, its data volume was estimated to stay below the 10,000-row level, so no index was created.

However, the table has since grown to the million-row level because a large number of consumption retries were generated, and queries against it have become slow.

Therefore, the table data needs to be cleaned up. In practice, after deleting data with the DELETE command, we found that query speed did not improve significantly and might even get worse. Why?

The reason is that the DELETE command only marks rows as "deleted"; it does not immediately release the storage space those rows occupy on disk. This leaves a large number of fragments in the data file, which hurts query performance. Therefore, in addition to deleting table records, the disk fragments also need to be cleaned up.
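One way to observe this unreleased space is to look at the table's metadata in information_schema. The query below is a minimal sketch that assumes the table lives in the currently selected schema; for InnoDB, the row count is an estimate, and DATA_FREE reports allocated but unused bytes in the tablespace the table belongs to.

```sql
-- Sketch: inspect allocated-but-unused space for the table.
-- Assumes the table is in the currently selected schema (DATABASE()).
SELECT TABLE_NAME,
       TABLE_ROWS,                                    -- estimated row count
       ROUND(DATA_LENGTH  / 1024 / 1024, 2) AS data_mb,
       ROUND(INDEX_LENGTH / 1024 / 1024, 2) AS index_mb,
       ROUND(DATA_FREE    / 1024 / 1024, 2) AS free_mb -- allocated but unused
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'courier_consume_fail_message';
```

If free_mb stays large after a DELETE, the space has been freed inside the data file but not returned to the file system.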

Before defragmenting the table, we focus on the following four indicators.

  • Indicator 1: The status of the table: SHOW TABLE STATUS LIKE 'courier_consume_fail_message';
  • Indicator 2: The actual number of rows in the table: SELECT count(*) FROM courier_consume_fail_message;
  • Indicator 3: The number of rows to be cleaned: SELECT count(*) FROM courier_consume_fail_message WHERE created_at < '2023-04-19 00:00:00';
  • Indicator 4: The execution plan of the table query: EXPLAIN SELECT * FROM courier_consume_fail_message WHERE service='courier-transfer-mq';

 -- Clean up disk fragments
 OPTIMIZE TABLE courier_consume_fail_message;

The following is a comparison of indicators before and after cleaning.

1. Before cleaning

Indicator 1, the status of the table:

[image: SHOW TABLE STATUS output]

Indicator 2, the actual number of rows in the table: 76986

Indicator 3, the number of rows to be cleaned: 76813

Indicator 4, the execution plan of the table query:

2. Clean up the data

The following are the statistics after executing DELETE FROM courier_consume_fail_message WHERE created_at < '2023-04-19 00:00:00';

Indicator 1, the status of the table:

[image: SHOW TABLE STATUS output]

Indicator 2, the actual number of rows in the table: 173

Indicator 3, the number of rows to be cleaned: 0

Indicator 4, the execution plan of the table query:

[image: EXPLAIN output]

From indicator 4, we can see that after deleting the table records, the estimated number of rows scanned by the query remains unchanged: 8651048.
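When the backlog is much larger than in this example, a single DELETE can hold locks for a long time. A common alternative, sketched here, is to delete in batches. The batch size of 5000 is an arbitrary assumption, not a value from this cleanup, and because created_at has no index, each batch still scans the table.

```sql
-- Sketch: delete expired rows in batches instead of one large statement.
-- The batch size (5000) is an assumption; tune it for your workload.
DELETE FROM courier_consume_fail_message
WHERE created_at < '2023-04-19 00:00:00'
LIMIT 5000;

-- Number of rows removed by the previous statement.
-- Repeat the DELETE until this returns 0.
SELECT ROW_COUNT() AS rows_deleted;
```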

3. Clean up fragments

The following are the statistics after executing OPTIMIZE TABLE courier_consume_fail_message;

Indicator 1, the status of the table:

[image: SHOW TABLE STATUS output]

Indicator 4, the execution plan of the table query:

[image: EXPLAIN output]

From indicator 4, we can see that after cleaning up the fragments, the number of rows scanned by the query drops to 100.
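For InnoDB tables, MySQL carries out OPTIMIZE TABLE as a table rebuild followed by a statistics update. Assuming courier_consume_fail_message uses InnoDB (the article does not state the storage engine), the same effect can be sketched with two explicit statements:

```sql
-- Assumption: the table uses the InnoDB storage engine.
-- Rebuild the table and its indexes, compacting the space left by deleted rows.
ALTER TABLE courier_consume_fail_message FORCE;

-- Refresh the statistics the optimizer uses for its row estimates.
ANALYZE TABLE courier_consume_fail_message;
```

The rebuild copies the remaining rows into a fresh structure, which is why the data length shrinks. Whether the freed space goes back to the file system depends on the table having its own tablespace file.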

Summary

It can be seen that after the cleanup, the table's row count and data length have both shrunk, and the number of rows scanned by the query statement has also dropped.

To improve the efficiency of the query SELECT * FROM courier_consume_fail_message WHERE service='courier-transfer-mq'; an index should still be created.

 ALTER TABLE ec_courier.courier_consume_fail_message ADD INDEX idx_service(service);
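After the index is created, it is worth confirming that it exists and that the query actually uses it. The following is a minimal check, assuming the index name idx_service from the statement above:

```sql
-- List the indexes on the table; idx_service should appear once created.
SHOW INDEX FROM ec_courier.courier_consume_fail_message;

-- Re-check the plan: the key column is expected to show idx_service
-- if the optimizer chooses the new index.
EXPLAIN SELECT * FROM ec_courier.courier_consume_fail_message
WHERE service = 'courier-transfer-mq';
```

If most rows share the same service value, the optimizer may still prefer a full table scan, so the EXPLAIN output should be checked rather than assumed.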


Origin juejin.im/post/7254474251725373495