Case study: the root cause of partitioned-table performance degradation in MySQL 5.7

Foreword: I hope this article makes users of MySQL 5.7.18 aware of a pitfall in the use of partitioned tables, so they can avoid falling into it on this version. By walking through the source code to the root cause of the partitioned-table performance degradation seen after upgrading to MySQL 5.7.18, it also shows readers interested in the MySQL source how locks are used in the partitioned-table implementation.

Problem Description

MySQL 5.7 contains many performance-related improvements, including improvements to temporary tables, faster connection establishment, and improvements related to replication. In most cases no configuration changes are needed: simply upgrading to 5.7 brings a noticeable performance gain.

In our test environment, we upgraded the database to 5.7.18 to verify whether that version met our expectations. After it had been running under observation for a while, developers reported that the database performed worse than it had on the previous version, 5.6.21. The main symptom was a larger number of lock wait timeouts. The developers also reported that the tables involved were all partitioned tables and that the updates were all made by primary key. This caught our attention, and we tried the following:

  • Database version 5.7.18 with the partitioned tables retained: performance degrades.
  • Database version 5.7.18 with the tables changed to non-partitioned tables: performance is normal.
  • Database rolled back to version 5.6.21 with the partitioned tables retained: performance is normal.

From these tests we tentatively concluded that the performance degradation is related to the upgrade to MySQL 5.7.
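To check whether an instance matches the combination tested above (5.7.18 plus partitioned tables), the server version and the partitioned tables can be read from information_schema. This is only a minimal sketch: the table name `mydb.orders` in the last statement is a hypothetical placeholder, not a table from the environment described here.

```sql
-- Server version (the degradation was observed on 5.7.18, not on 5.6.21).
SELECT VERSION();

-- List partitioned tables with their partition counts.
-- Non-partitioned tables appear in this view with PARTITION_NAME = NULL.
SELECT table_schema, table_name, COUNT(*) AS partition_count
FROM information_schema.partitions
WHERE partition_name IS NOT NULL
  AND table_schema NOT IN ('mysql', 'information_schema',
                           'performance_schema', 'sys')
GROUP BY table_schema, table_name
ORDER BY partition_count DESC;

-- Convert a table to a non-partitioned table, as in the second test.
-- `mydb.orders` is a hypothetical name; this statement rebuilds the table.
ALTER TABLE mydb.orders REMOVE PARTITIONING;
```

Because removing partitioning rebuilds the table, it is better tried on a copy first.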

Reproduce the problem

The test environment has many tables, and the call relationships between them are complicated. To analyze and locate the problem, we stripped it down layer by layer and constructed the following simple reproduction:

-- Create a partitioned test table t2
CREATE TABLE `t2` (
  `id` INT(11) NOT NULL,
  `dt` DATETIME NOT NULL,
  `data` VARCHAR(10) DEFAULT NULL,
  PRIMARY KEY (`id`, `dt`),
  KEY `idx_dt` (`dt`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (to_days(dt))
(PARTITION p20170218 VALUES LESS THAN (736744) ENGINE = InnoDB,
 PARTITION p20170219 VALUES LESS THAN (736745) ENGINE = InnoDB,
 PARTITION pMax VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
  
 
-- Insert test data
INSERT INTO t2 VALUES (1, NOW(), '1');
INSERT INTO t2 VALUES (2, NOW(), '2');
INSERT INTO t2 VALUES (3, NOW(), '3');
  
 
-- SESSION 1: update the record with id = 1 and leave the transaction uncommitted.
BEGIN;
UPDATE t2 SET data = '12' WHERE id = 1;

-- SESSION 2: update the record with id = 2.
BEGIN;
UPDATE t2 SET data = '21' WHERE id = 2;

In SESSION 2, the update blocks and keeps waiting. id is the leading column of the primary key, so logically an update to the record with id = 1 should not affect an update to the record with id = 2.
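Before looking at the locks themselves, it can help to see how the blocked statement would be executed. The sketch below assumes the t2 table from the reproduction. In MySQL 5.7, EXPLAIN accepts UPDATE statements and includes a partitions column in its output.

```sql
-- Show the access plan for the blocked statement without executing it.
-- The WHERE clause does not reference dt, so partition pruning cannot
-- narrow the scan to a single partition.
EXPLAIN UPDATE t2 SET data = '21' WHERE id = 2;

-- How many seconds a blocked statement waits before it fails with a
-- lock wait timeout error.
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
```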

Query the innodb_locks table in information_schema. This table records locks that InnoDB transactions have requested but not yet acquired, as well as locks held by transactions that are blocking other transactions. It contains two records:

[Figure: the two records in innodb_locks]

In innodb_locks at this point, the transaction with id 40021 holds a lock on the second record of page 3, which prevents the transaction with id 40022 from proceeding.
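The queries behind this observation can be sketched as follows, run from a third session while SESSION 2 is still waiting. Both tables exist in MySQL 5.7 (they are deprecated there and were removed in 8.0). The join to innodb_lock_waits is an addition for readability and is not part of the original investigation.

```sql
-- Locks that have been requested but not granted, plus the locks blocking them.
SELECT lock_id, lock_trx_id, lock_mode, lock_type,
       lock_table, lock_index, lock_space, lock_page, lock_rec, lock_data
FROM information_schema.innodb_locks;

-- Pair each waiting transaction with the transaction that blocks it
-- and show which record the blocking lock is on.
SELECT w.requesting_trx_id, w.blocking_trx_id,
       b.lock_index, b.lock_page, b.lock_rec, b.lock_data
FROM information_schema.innodb_lock_waits AS w
JOIN information_schema.innodb_locks AS b
  ON b.lock_id = w.blocking_lock_id;
```

The lock_page and lock_rec columns correspond to the page number and record position mentioned above.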

If we roll back the database to version 5.6.21, the above scenario cannot be reproduced.

Further analysis

From the information in innodb_locks, we know the problem is that InnoDB locked rows it should not have. This table uses the MEMORY storage engine, so we set a breakpoint on the MEMORY engine's insert interface and obtained the following stack. The frames marked with the red box are where the lock information is written into innodb_locks.

[Figure: call stack at the MEMORY engine insert breakpoint, with the innodb_locks fill path marked in a red box]

The function fill_innodb_locks_from_cache confirms that each row written to the table is obtained from the Cache object, as in the following code.

[Figure: code of fill_innodb_locks_from_cache]

Since the transaction lock information is stored in the Cache, we next need to find out how data gets into it. Searching the InnoDB code for uses of the cache object leads to the function add_lock_to_cache. Setting a breakpoint in this function shows that its contents match the data filled into innodb_locks, so the lock object used by this function is the lock object we are looking for.

[Figure: code of add_lock_to_cache]

Next we examined where the lock_t type is used. After filtering and debugging, we found that the function RecLock::lock_add appends the newly created row lock to the lock list of the transaction that owns it.

[Figure: code of RecLock::lock_add]

From RecLock::lock_add we can work back to why the row lock was created. Setting a breakpoint on this function and inspecting the call stack locates the function marked with the red box in the following stack:

[Figure: call stack at the RecLock::lock_add breakpoint, with the located function marked in a red box]

We then traced the following code in Partition_helper::handle_ordered_index_scan. According to this code, m_part_spec.end_part determines the maximum number of rows that are locked, which is the cause of the abnormal row locks.

[Figure: code of Partition_helper::handle_ordered_index_scan]

In the end, the problem comes down to how m_part_spec.end_part is set. Tracing the uses of end_part shows that its value is initialized in the get_partition_set function. The code shows that every update of a single record locks, during the index scan, as many rows as the table has partitions. This is the root cause.

[Figure: code of get_partition_set]

Verifying the conclusion

According to the analysis above, every update of a single record locks as many rows as the table has partitions. We now try to verify this.

Add the following two records:

INSERT INTO t2 VALUES (4, NOW(), '4');
INSERT INTO t2 VALUES (5, NOW(), '5');

-- SESSION 1: update the record with id = 1 and leave the transaction uncommitted.
BEGIN;
UPDATE t2 SET data = '12' WHERE id = 1;

-- SESSION 2: now update the record with id = 4.
BEGIN;
UPDATE t2 SET data = '44' WHERE id = 4;

The update to id = 4 proceeds normally and is not affected by the update to id = 1. This is because the position of the record with id = 4 exceeds the number of partitions in the test case (3), so it is not locked. In real applications, a partitioned table rarely has as few as the 3 partitions of this test case; it typically has tens or even hundreds. Locking this many rows aggravates lock conflicts between updates and leaves transactions in a lock wait state. As shown in the figure below, each transaction holds N row locks, so the chance that the locked records of different transactions overlap increases greatly, which reduces concurrency and efficiency.

[Figure: the N row locks held by each transaction]
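One way to observe the number of row locks per transaction without a debugger is the innodb_trx table in information_schema. The sketch below is meant to be run from a third session while SESSION 1 is still open. Note that trx_rows_locked is documented as an approximate value, so it should be read as an indication rather than an exact count.

```sql
-- Row-lock footprint of every open InnoDB transaction.
SELECT trx_id, trx_state, trx_tables_locked,
       trx_lock_structs, trx_rows_locked
FROM information_schema.innodb_trx;
```

Comparing trx_rows_locked for the same single-row UPDATE on 5.6.21 and 5.7.18, or before and after adding partitions to the table, is a simple way to check whether the lock count follows the partition count.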

Conclusion

Based on the analysis above, we are confident that this is a regression in MySQL 5.7. We submitted a bug report to the open source community. Oracle has confirmed that it is a problem and that the bug requires further analysis and investigation.


Origin blog.csdn.net/weixin_49163826/article/details/108717529