Troubleshooting a MySQL transaction deadlock | JD Cloud technical team

1. Background

In the pre-release environment, a message-driven flow eventually triggers a transaction that writes inventory. That transaction deadlocked in MySQL, and the inventory write failed with the following error:

com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: rpc error: code = Aborted desc = Deadlock found when trying to get lock; try restarting transaction (errno 1213) (sqlstate 40001) (CallerID: ): Sql: "/* uag::omni_stock_rw;xx.xx.xx.xx:xxxxx;xx.xx.xx.xx:xxxxx;xx.xx.xx.xx:xxxxx;enable */  insert into stock_info(tenant_id, sku_id, store_id, available_num, actual_good_num, order_num, created, modified, SAVE_VERSION, stock_id) values (:vtg1, :vtg2, :_store_id0, :vtg4, :vtg5, :vtg6, now(), now(), :vtg7, :__seq0) /* vtgate:: keyspace_id:e267ed155be60efe */", BindVars: {__seq0: "type:INT64 value:"29332459" "_store_id0: "type:INT64 value:"50650235" "vtg1: "type:INT64 value:"71" "vtg2: "type:INT64 value:"113817631" "vtg3: "type:INT64 value:"50650235" "vtg4: "type:FLOAT64 value:"1000.000" "vtg5: "type:FLOAT64 value:"1000.000" "vtg6: "type:INT64 value:"0" "vtg7: "type:INT64 value:"20937611645" "}

Preliminary investigation showed that two inventory write requests arrived at almost the same time, 1 second apart. The two transactions deadlocked with each other, and both failed.

The transaction itself is very simple. In pseudocode:

start transaction
// 1. Query the data
data = select for update(tenantId, storeId, skuId);
if (data == null) {
    // Insert the data
    insert(tenantId, storeId, skuId);
} else {
    // Update the data
    update(tenantId, storeId, skuId);
}
end transaction

The index structure of the database table is as follows:

| Index type | Indexed columns |
| --- | --- |
| PRIMARY KEY | (stock_id) |
| UNIQUE KEY | (sku_id, store_id) |

The storage engine is InnoDB, and the isolation level is RR (Repeatable Read).

2. Analysis

First, let's review how locks work in the InnoDB engine.

2.1 Locks in InnoDB

2.1.1 Row-level locks

InnoDB implements row-level locking with three kinds of locks:

| Name | Description |
| --- | --- |
| Record Lock | Locks a single index record. Available under both the RC and RR isolation levels. |
| Gap Lock | Locks the gap between index records, excluding the records themselves, so the locked range is open at both ends. Normally taken only under the RR isolation level. |
| Next-Key Lock | Locks the index record together with the gap before it, so the locked range is open on the left and closed on the right. Normally taken only under the RR isolation level. |

InnoDB also implements standard row-level locks, which fall into two types by lock mode:

| Name | Symbol | Description |
| --- | --- | --- |
| Shared lock | S | Allows the transaction holding it to read a row, and prevents other transactions from obtaining an exclusive lock on the same row. |
| Exclusive lock | X | Allows the transaction holding it to update or delete a row, and prevents other transactions from obtaining shared or exclusive locks on the same row. |

In short, when one transaction holds a shared lock on a row, other transactions can still acquire a shared lock on it, but must wait for the shared lock to be released before acquiring an exclusive lock. When one transaction holds an exclusive lock, other transactions must wait for it to be released before acquiring either a shared or an exclusive lock. This is summarized in the following table:

| Requested lock (rows) \ Held lock (columns) | Shared lock S | Exclusive lock X |
| --- | --- | --- |
| Shared lock S | Compatible | Not compatible |
| Exclusive lock X | Not compatible | Not compatible |

2.1.2 Example of locking under RR isolation level

Suppose there is a table named user. The following examples analyze, one by one, which locks different queries take. The table is defined as follows:

CREATE TABLE `user` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key ID',
  `user_id` bigint(20) DEFAULT NULL COMMENT 'user id',
  `mobile_num` bigint(20) NOT NULL COMMENT 'mobile number',
  PRIMARY KEY (`id`),
  UNIQUE KEY `IDX_USER_ID` (`user_id`),
  KEY `IDX_MOBILE_NUM` (`mobile_num`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='user information table';

The primary key id and user_id are unique indexes, and mobile_num is an ordinary (non-unique) index.

Assume that the existing data in the table looks like this:

| id | user_id | mobile_num |
| --- | --- | --- |
| 1 | 1 | 3 |
| 5 | 5 | 6 |
| 8 | 8 | 7 |
| 9 | 9 | 9 |

The examples below all use select ... for update, first against unique indexes and then against ordinary (non-unique) indexes.

1. Unique index equality query
select * from user
where id = 5 for update;
select * from user
where user_id = 5 for update;

How does InnoDB take locks while executing these two statements?

InnoDB's index data structure is a B+ tree, whose leaf nodes each contain a pointer to the next leaf node. A lookup follows the B+ tree search procedure, which is similar in principle to binary search. Locking then follows two principles:

1. Only the interval near the query target is locked, not every interval on the search path. In this example, searching for id=5 or user_id=5 finally locates the interval (1, 5] that satisfies the search condition.

2. Locking uses the Next-Key Lock as its basic unit, so a Next-Key Lock (open on the left, closed on the right) is first added to the interval identified in principle 1. Because the record with id=5 or user_id=5 exists, the Next-Key Lock then degenerates into a Record Lock, so only the index record with id=5 or user_id=5 is locked.

If the queried id does not exist, for example:

select * from user
where id = 6 for update;

By the same two principles, the interval near the query target is located first, which gives (5, 8]. Because the record with id=6 does not exist, the Next-Key Lock (5, 8] degenerates into a Gap Lock, that is, a Gap Lock on the index interval (5, 8).
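The two principles can be mimicked with a few lines of Python. This is a simplified sketch, not InnoDB's implementation: the function name unique_equality_lock and the tuple format of the result are made up for illustration, and the model only covers equality lookups on a unique index.

```python
from bisect import bisect_left


def unique_equality_lock(keys, target):
    """Model the lock taken by an equality lookup on a unique index.

    keys is the sorted list of existing index values (an assumption of
    this sketch). Returns ("record", value) on a hit, or
    ("gap", (lower, upper)) for the open interval locked on a miss.
    """
    i = bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        # Hit: the next-key lock degenerates into a record lock.
        return ("record", target)
    # Miss: the next-key lock degenerates into a gap lock between
    # the two neighbouring records (infinity stands for "no neighbour").
    lower = keys[i - 1] if i > 0 else float("-inf")
    upper = keys[i] if i < len(keys) else float("inf")
    return ("gap", (lower, upper))


ids = [1, 5, 8, 9]
print(unique_equality_lock(ids, 5))  # ('record', 5)
print(unique_equality_lock(ids, 6))  # ('gap', (5, 8))
```

With the sample data, id=5 yields a record lock and id=6 yields the gap (5, 8), matching the analysis above.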

2. Unique index range query
select * from user
where id >= 4 and id < 8 for update;

The scan first locates the first record that satisfies id >= 4, which is id=5, and takes the Next-Key Lock (1, 5]. Because id=5 falls inside the queried range, the record itself stays locked and the lock does not degenerate into a Gap Lock. The scan then continues until it reaches the first record that fails the condition, id=8, and takes a lock on the interval (5, 8]. Because 8 is outside the range, only the gap (5, 8) needs protecting; depending on the MySQL version, record 8 itself may also remain locked. The range query therefore locks at least the area (1, 8), including record 5.

3. Non-unique index equality query

Locking on a non-unique index differs slightly from the cases above. In addition to the Next-Key Lock on the interval containing the queried value, a Gap Lock must also be added to the next interval that does not satisfy the query condition, so two locks are needed.

select * from user
where mobile_num = 6 for update;

A Next-Key Lock must be added to the index interval (3, 6]. Because the index is non-unique, there may be multiple entries with value 6, so the lock does not degenerate into a Record Lock. In addition, a Gap Lock must be added to the next interval that does not satisfy the query condition, that is, the index interval (6, 7). Overall, the Next-Key Lock (3, 6] and the Gap Lock (6, 7) are added to the index.

If the non-unique index does not hit, as follows:

select * from user
where mobile_num = 8 for update;

Here a Next-Key Lock would be added to the index interval (7, 9], and because 8 does not exist, the lock degenerates into the Gap Lock (7, 9).
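The hit and miss cases for a non-unique index can be modelled the same way. Again this is only an illustrative sketch under simplifying assumptions: non_unique_equality_locks is a hypothetical helper, the index is represented as a sorted Python list, and a next-key lock (a, b] is written as the tuple ("next-key", (a, b)).

```python
from bisect import bisect_left, bisect_right


def non_unique_equality_locks(keys, target):
    """Model the locks taken by an equality lookup on a non-unique index.

    keys is the sorted list of index values and may contain duplicates.
    ("next-key", (a, b)) stands for the interval (a, b];
    ("gap", (a, b)) stands for the open interval (a, b).
    """
    first = bisect_left(keys, target)   # first entry >= target
    after = bisect_right(keys, target)  # first entry > target
    prev_key = keys[first - 1] if first > 0 else float("-inf")
    next_key = keys[after] if after < len(keys) else float("inf")
    if first == after:
        # Miss: only the surrounding gap is locked.
        return [("gap", (prev_key, next_key))]
    # Hit: next-key lock up to the value, plus a gap lock after it.
    return [("next-key", (prev_key, target)), ("gap", (target, next_key))]


mobile_nums = [3, 6, 7, 9]
print(non_unique_equality_locks(mobile_nums, 6))
# [('next-key', (3, 6)), ('gap', (6, 7))]
print(non_unique_equality_locks(mobile_nums, 8))
# [('gap', (7, 9))]
```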

4. Non-unique index range query
select * from user
where mobile_num >= 6 and mobile_num < 8
for update;

First, the scan matches mobile_num=6 and takes the Next-Key Lock (3, 6] on the index. Even though the value exists, the index is non-unique, so the lock does not degenerate into a Record Lock. The scan then moves on to mobile_num=7, which also satisfies the condition, and takes the Next-Key Lock (6, 7]. Finally it reaches the first entry that fails the condition, mobile_num=9, and locks the interval (7, 9]. Since no entry with value 8 exists and 9 is outside the range, what matters here is the gap (7, 9), although record 9 itself may also be locked. Overall, the index is locked from the Next-Key Lock (3, 6] through the gap (7, 9).

2.1.3 Intention Locks

To support multi-granularity locking, InnoDB introduces intention locks. An intention lock is a table-level lock that indicates a transaction is going to lock rows in that table. Intention locks also come in two types: the intention shared lock (IS) and the intention exclusive lock (IX).

| Name | Symbol | Description |
| --- | --- | --- |
| Intention shared lock | IS | Indicates that the transaction intends to set shared locks on individual rows of the table. |
| Intention exclusive lock | IX | Indicates that the transaction intends to set exclusive locks on individual rows of the table. |

For example, select ... lock in share mode sets the intention shared lock IS, and select ... for update sets the intention exclusive lock IX.

Intention locking follows two rules:

1. Before a transaction can acquire a shared lock S on a row, it must first acquire an IS lock, or a stronger lock, on the table.

2. Before a transaction can acquire an exclusive lock X on a row, it must first acquire an IX lock on the table.

The table-level lock compatibility matrix is as follows:

| Requested lock (rows) \ Held lock (columns) | X | IX | S | IS |
| --- | --- | --- | --- | --- |
| X | Conflict | Conflict | Conflict | Conflict |
| IX | Conflict | Compatible | Conflict | Compatible |
| S | Conflict | Conflict | Compatible | Compatible |
| IS | Conflict | Compatible | Compatible | Compatible |

A lock is granted to the requesting transaction if it is compatible with the existing locks, and not granted if it conflicts with them. In that case the transaction waits until the conflicting lock is released.

The purpose of an intention lock is to indicate that a transaction is locking, or is about to lock, a row in the table. Intention locks do not block anything except requests that lock the entire table (for example, LOCK TABLES ... WRITE).
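The table-level compatibility matrix is easy to encode as a lookup table. The sketch below is illustrative only: COMPATIBLE and can_grant are names invented here, and the model ignores everything except the lock modes held by other transactions on the same table.

```python
# COMPATIBLE[requested][held] mirrors the table-level compatibility
# matrix: True means the two modes can coexist on the same table.
COMPATIBLE = {
    "X":  {"X": False, "IX": False, "S": False, "IS": False},
    "IX": {"X": False, "IX": True,  "S": False, "IS": True},
    "S":  {"X": False, "IX": False, "S": True,  "IS": True},
    "IS": {"X": False, "IX": True,  "S": True,  "IS": True},
}


def can_grant(requested, held_by_others):
    """Return True if `requested` is compatible with every table-level
    lock mode currently held by other transactions."""
    return all(COMPATIBLE[requested][held] for held in held_by_others)


print(can_grant("IX", ["IX"]))       # True: intention locks coexist
print(can_grant("X", ["IS"]))        # False: a table X lock must wait
print(can_grant("S", ["IS", "S"]))   # True
```

Two transactions that each run select ... for update on the same table both request IX, and can_grant("IX", ["IX"]) confirms that neither blocks the other at the table level.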

2.1.4 Insert intention locks

Despite its name, the insert intention lock is not a table-level intention lock like IS or IX. It is a special kind of Gap Lock that an insert operation sets before the row is inserted.

If multiple transactions insert into the same gap but not at the same position within it, their insert intention locks do not block each other. An insert intention lock does, however, have to wait when another transaction already holds a Gap Lock or Next-Key Lock covering that gap.

2.2 Process Analysis

Back to the problem at hand: two transactions perform the same sequence of actions. Each first executes select ... for update to obtain an exclusive lock, then inserts if the result is empty and updates otherwise. In pseudocode:

start transaction
// 1. Query the data
data = select for update(tenantId, storeId, skuId);
if (data == null) {
    // Insert the data
    insert(tenantId, storeId, skuId);
} else {
    // Update the data
    update(tenantId, storeId, skuId);
}
end transaction

The actions performed by the two transactions are analyzed step by step in the following table:

| Time point | Transaction A | Transaction B | Underlying action |
| --- | --- | --- | --- |
| 1 | start transaction | start transaction | |
| 2 | Execute select ... for update | | Transaction A acquires IX on the table, then an X lock that degenerates into a Gap Lock. |
| 3 | | Execute select ... for update | Transaction B acquires IX, which does not conflict with transaction A's IX. Transaction B acquires a Gap Lock on the same gap; Gap Locks can coexist. |
| 4 | Execute insert | | Transaction A first requests an insert intention lock, which conflicts with transaction B's Gap Lock, and waits for transaction B to release it. |
| 5 | | Execute insert | Transaction B first requests an insert intention lock, which conflicts with transaction A's Gap Lock, and waits for transaction A to release it. |
| 6 | | | The deadlock detector detects the deadlock. |

Detailed analysis:

•At time point 1, transaction A and transaction B both start.

•At time point 2, transaction A executes select ... for update. It first acquires the intention exclusive lock IX on the table, then requests an exclusive lock X on the target interval. Because the queried value does not exist, the Next-Key Lock degenerates into a Gap Lock.

•At time point 3, transaction B executes select ... for update and first requests the intention exclusive lock IX. According to the table-level lock compatibility matrix in Section 2.1.3, intention locks are compatible with each other, so IX is granted. Because the queried value does not exist, transaction B then acquires an X Gap Lock on the same gap. Gap Locks can coexist, whether they are shared or exclusive. For details, see InnoDB's description of Gap Locks. The key passage is quoted here:

Gap locks can co-exist. A gap lock taken by one transaction does not prevent another transaction from taking a gap lock on the same gap. There is no difference between shared and exclusive gap locks. They do not conflict with each other, and they perform the same function.

•At time point 4, before transaction A performs the insert, it first requests an insert intention lock on the gap. Transaction B already holds an X Gap Lock on that gap, and an insert intention lock conflicts with Gap Locks held by other transactions, so transaction A must wait for transaction B to release its Gap Lock.

•At time point 5, before transaction B performs the insert, it also first requests an insert intention lock. Transaction A holds an X Gap Lock on the same gap, so transaction B must wait for transaction A to release it.

•At time point 6, transaction A and transaction B are each waiting for the other to release its Gap Lock. MySQL's deadlock detector detects the cycle and reports a deadlock error.
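The sequence at time points 2 to 6 can be replayed in a toy model. The sketch below is an illustration under heavy simplifying assumptions, not how InnoDB is implemented: GapLockTable and its methods are hypothetical names, the model tracks a single index gap, and deadlock detection is a plain depth-first search for a cycle in the wait-for graph.

```python
class GapLockTable:
    """Toy model of one index gap: gap locks coexist, inserts wait on them."""

    def __init__(self):
        self.gap_holders = set()  # transactions holding a gap lock
        self.waits_for = {}       # txn -> set of txns it is waiting for

    def lock_gap(self, txn):
        # Gap locks never conflict with each other, so this always succeeds.
        self.gap_holders.add(txn)

    def request_insert(self, txn):
        # An insert intention lock waits for gap locks held by OTHER txns.
        blockers = self.gap_holders - {txn}
        if blockers:
            self.waits_for[txn] = blockers
            return False  # the insert is blocked
        return True       # the insert can proceed

    def has_deadlock(self):
        # Depth-first search for a cycle in the wait-for graph.
        visiting, done = set(), set()

        def visit(txn):
            if txn in done:
                return False
            if txn in visiting:
                return True  # reached a txn already on the current path
            visiting.add(txn)
            found = any(visit(n) for n in self.waits_for.get(txn, ()))
            visiting.discard(txn)
            done.add(txn)
            return found

        return any(visit(txn) for txn in list(self.waits_for))


table = GapLockTable()
table.lock_gap("A")               # time point 2
table.lock_gap("B")               # time point 3
print(table.request_insert("A"))  # time point 4 -> False (waits for B)
print(table.has_deadlock())       # False: B is not waiting yet
print(table.request_insert("B"))  # time point 5 -> False (waits for A)
print(table.has_deadlock())       # time point 6 -> True
```

The cycle only appears once both transactions hold the gap lock and both request the insert, which is why the deadlock needs two concurrent requests for a row that does not exist yet.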

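Question: what happens if the row queried by select ... for update already exists? The process is as follows:

| Time point | Transaction A | Transaction B | Underlying action |
| --- | --- | --- | --- |
| 1 | start transaction | start transaction | |
| 2 | Execute select ... for update | | Transaction A acquires IX. Transaction A acquires an X row lock; because the row exists, the lock degenerates into a Record Lock. |
| 3 | | Execute select ... for update | Transaction B acquires IX, which does not conflict with transaction A's IX. Transaction B requests the Record Lock on the target row and must wait for transaction A to release it. |
| 4 | Execute update | | Transaction A already holds IX on the table and the X Record Lock on the row. Transaction B holds only IX, which is compatible, so the update executes. |
| 5 | commit | | Transaction A commits and releases its IX and X locks. |
| 6 | | Execute select ... for update | Transaction B now acquires the X Record Lock. |
| 7 | | Execute update | Transaction B holds the X Record Lock and executes the update. |
| 8 | | commit | Transaction B releases its IX and X locks. |

In other words, when the queried row exists, no deadlock occurs.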

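3. Solutions

1. Before the transaction starts, use CAS plus a distributed lock to control concurrent write requests. The distributed lock key can be set to store_skuId_version.

2. The transaction can be rewritten as: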

start transaction
// Under the RR level, read from the read view (snapshot read)
data = select from table(tenantId, storeId, skuId)
if (data == null) {
    // Concurrent writes may occur here
    insert
} else {
    data = select for update(tenantId, storeId, skuId)
    update
}
end transaction

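This removes the deadlock that occurs when the row to insert does not exist, but concurrent writes are still possible: the first transaction to obtain the lock inserts successfully, while the second transaction waits for the first to commit and then attempts its own insert, which fails because the row now exists, so it reports an error and rolls back.

3. Change the transaction isolation level to RC. Under RC there is no Next-Key Lock (note: this is not strictly accurate, since RC still takes Next-Key Locks in a small number of cases), so only Record Locks are taken, and transaction 2 has to wait for transaction 1 to commit when it executes select for update.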
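As an illustration of solution 1, the gating step in front of the transaction can be sketched as a compare-and-set on a lock key. Everything below is an assumption made for the example: InMemoryLockStore is an in-process stand-in for a real distributed lock service, lock_key is a hypothetical helper, and the key layout simply follows the store_skuId_version pattern mentioned above.

```python
import threading


class InMemoryLockStore:
    """In-process stand-in for a distributed lock service (hypothetical)."""

    def __init__(self):
        self._owners = {}               # lock key -> current owner
        self._mutex = threading.Lock()  # makes check-and-set atomic

    def try_acquire(self, key, owner):
        # Compare-and-set: succeed only if nobody owns the key yet.
        with self._mutex:
            if key in self._owners:
                return False
            self._owners[key] = owner
            return True

    def release(self, key, owner):
        # Only the current owner may release the key.
        with self._mutex:
            if self._owners.get(key) == owner:
                del self._owners[key]
                return True
            return False


def lock_key(store_id, sku_id, version):
    # Assumed key layout, following the store_skuId_version pattern.
    return f"{store_id}_{sku_id}_{version}"


store = InMemoryLockStore()
key = lock_key(50650235, 113817631, 1)
print(store.try_acquire(key, "request-1"))  # True: runs the transaction
print(store.try_acquire(key, "request-2"))  # False: retry or reject
print(store.release(key, "request-1"))      # True
```

Only the request that wins try_acquire would go on to start the database transaction, so two concurrent writers never both hold a Gap Lock on the same gap. A production setup would also need an expiry on the key so that a crashed holder does not block others forever.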

References

[1] InnoDB locking official documentation: https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html

[2] https://blog.csdn.net/qq_43684538/article/details/131450395

[3] https://www.jianshu.com/p/027afd6345d5

[4] https://www.cnblogs.com/micrari/p/8029710.html

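If you find any errors, corrections are welcome.

Author: Liu Zhe (刘哲), JD Retail

Source: JD Cloud Developer Community. Please credit the source when reposting.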

Origin blog.csdn.net/jdcdev_/article/details/133302538