Solving an online database deadlock is that simple

A few days ago, we ran into a database deadlock in production. It took a long time to investigate, and along the way I gained a much deeper understanding of the database's locking mechanism.


This article summarizes the entire deadlock investigation and analyzes the cause of the deadlock and its solution. I hope it gives you some ideas for investigating and resolving deadlocks of your own.

This article touches on the MySQL execution engine, transaction isolation levels, the InnoDB locking mechanism, indexes, database transactions and related topics. I hope readers can take away something they can apply to similar problems.

Symptoms

One evening, while a colleague was deploying a release, a large number of online alarms suddenly fired, many of them reporting database deadlocks. The alarm message was as follows:

{"errorCode":"SYSTEM_ERROR","errorMsg":"nested exception is org.apache.ibatis.exceptions.PersistenceException:
Error updating database. Cause: ERR-CODE: [TDDL-4614][ERR_EXECUTE_ON_MYSQL]
Deadlock found when trying to get lock;
The error occurred while setting parameters\n### SQL:
update fund_transfer_stream set gmt_modified=now(),state = ? where fund_transfer_order_no = ? and seller_id = ? and state = 'NEW'

From the alarm, we could already locate the database and table where the deadlock occurred. First, some background on the database involved in this case.

Background

The database is MySQL 5.7, the storage engine is InnoDB, and the transaction isolation level is READ-COMMITTED.

To check the database version:

select version();

To check the storage engine:

show create table fund_transfer_stream;

The storage engine appears in the table creation statement, e.g. ENGINE=InnoDB.

To check the transaction isolation level:

select @@tx_isolation;

To set the transaction isolation level (applies only to the current session):

set session transaction isolation level read committed;

PS: If the database is sharded into multiple physical databases, run the statements above against a single physical database, not against the logical database.

Structure and indexes of the table where the deadlock occurred (some irrelevant fields and indexes are omitted):

CREATE TABLE fund_transfer_stream (
  id bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  gmt_create datetime NOT NULL COMMENT 'creation time',
  gmt_modified datetime NOT NULL COMMENT 'modification time',
  pay_scene_name varchar(256) NOT NULL COMMENT 'payment scene name',
  pay_scene_version varchar(256) DEFAULT NULL COMMENT 'payment scene version',
  identifier varchar(256) NOT NULL COMMENT 'unique ID',
  seller_id varchar(64) NOT NULL COMMENT 'seller ID',
  state varchar(64) DEFAULT NULL COMMENT 'status',
  fund_transfer_order_no varchar(256) DEFAULT NULL COMMENT 'order number returned by the capital platform',
  PRIMARY KEY (id),
  UNIQUE KEY uk_scene_identifier (...),
  KEY idx_seller (seller_id),
  KEY idx_seller_transNo (seller_id, fund_transfer_order_no(20))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Funds flow';

The table has three indexes relevant to this case: one clustered index (the primary key index) and two non-clustered indexes (non-primary key indexes).

Clustered index:

PRIMARY KEY (id)

Non-clustered indexes:

KEY idx_seller (seller_id),

KEY idx_seller_transNo (seller_id,fund_transfer_order_no(20))

Of these two indexes, idx_seller_transNo already covers idx_seller, since seller_id is its leftmost column. For historical reasons, the table is sharded by seller_id, so idx_seller was created first and idx_seller_transNo was added later.

Deadlock log

When a deadlock occurs in the database, the deadlock log can be obtained with the following command:

show engine innodb status

As soon as the deadlock occurred, we checked the deadlock log. Its contents were as follows:

Transactions deadlock detected, dumping detailed information.
2019-03-19T21:44:23.516263+08:00 5877341 [Note] InnoDB:

*** (1) TRANSACTION:
TRANSACTION 173268495, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 304 lock struct(s), heap size 41168, 6 row lock(s), undo log entries 1
MySQL thread id 5877358, OS thread handle 47356539049728, query id 557970181 11.183.244.150 fin_instant_app updating

update fund_transfer_stream set gmt_modified = NOW(), state = 'PROCESSING' where ((state = 'NEW') AND (seller_id = '38921111') AND (fund_transfer_order_no = '99010015000805619031958363857'))
2019-03-19T21:44:23.516321+08:00 5877341 [Note] InnoDB:

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 173 page no 13726 n bits 248 index idx_seller_transNo of table xxx.fund_transfer_stream trx id 173268495 lock_mode X locks rec but not gap
Record lock, heap no 168 PHYSICAL RECORD: n_fields 3; compact format; info bits 0

2019-03-19T21:44:23.516565+08:00 5877341 [Note] InnoDB:

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 173 page no 12416 n bits 128 index PRIMARY of table xxx.fund_transfer_stream trx id 173268495 lock_mode X locks rec but not gap waiting
Record lock, heap no 56 PHYSICAL RECORD: n_fields 17; compact format; info bits 0
2019-03-19T21:44:23.517793+08:00 5877341 [Note] InnoDB:

*** (2) TRANSACTION:
TRANSACTION 173268500, ACTIVE 0 sec fetching rows, thread declared inside InnoDB 81
mysql tables in use 1, locked 1
302 lock struct(s), heap size 41168, 2 row lock(s), undo log entries 1
MySQL thread id 5877341, OS thread handle 47362313119488, query id 557970189 11.131.81.107 fin_instant_app updating

update fund_transfer_stream_0056 set gmt_modified = NOW(), state = 'PROCESSING' where ((state = 'NEW') AND (seller_id = '38921111') AND (fund_transfer_order_no = '99010015000805619031957477256'))
2019-03-19T21:44:23.517855+08:00 5877341 [Note] InnoDB:

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 173 page no 12416 n bits 128 index PRIMARY of table fin_instant_0003.fund_transfer_stream_0056 trx id 173268500 lock_mode X locks rec but not gap
Record lock, heap no 56 PHYSICAL RECORD: n_fields 17; compact format; info bits 0

2019-03-19T21:44:23.519053+08:00 5877341 [Note] InnoDB:

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 173 page no 13726 n bits 248 index idx_seller_transNo of table fin_instant_0003.fund_transfer_stream_0056 trx id 173268500 lock_mode X locks rec but not gap waiting
Record lock, heap no 168 PHYSICAL RECORD: n_fields 3; compact format; info bits 0

2019-03-19T21:44:23.519297+08:00 5877341 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)

A quick reading of the deadlock log yields the following information:

① The two SQL statements that caused the deadlock are:

update fund_transfer_stream_0056
set gmt_modified = NOW(), state = 'PROCESSING'
where ((state = 'NEW') AND (seller_id = '38921111') AND (fund_transfer_order_no = '99010015000805619031957477256'))

update fund_transfer_stream_0056
set gmt_modified = NOW(), state = 'PROCESSING'
where ((state = 'NEW') AND (seller_id = '38921111') AND (fund_transfer_order_no = '99010015000805619031958363857'))

② Transaction 1 holds a lock on the idx_seller_transNo index and is waiting for a lock on PRIMARY.

③ Transaction 2 holds a lock on PRIMARY and is waiting for a lock on idx_seller_transNo.

④ The deadlock is caused by the circular wait between transaction 1 and transaction 2.

⑤ The locks held by transaction 1 and transaction 2 are both of type: lock_mode X locks rec but not gap.

Both transactions hold X locks on records without gap locks, that is, locks on the record itself (Record Lock) with no gap lock added.
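Reading items ② to ④ off the log is mechanical enough to script. Below is a minimal Java sketch, assuming the log layout shown above: a section header such as `*** (1) HOLDS THE LOCK(S):`, then a `RECORD LOCKS ... index NAME of table` line, then a `heap no N` line. The class name DeadlockLogReader and the "1 holds" / "1 waits" keys are illustrative, not part of MySQL, and the sketch only handles two-transaction record-lock deadlocks.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DeadlockLogReader {
    // Section headers, e.g. "*** (1) HOLDS THE LOCK(S):"
    private static final Pattern SECTION = Pattern.compile(
            "\\*\\*\\* \\((\\d+)\\) (HOLDS THE LOCK\\(S\\)|WAITING FOR THIS LOCK TO BE GRANTED):");
    // e.g. "RECORD LOCKS space id 173 page no 13726 n bits 248 index idx_seller_transNo of table ..."
    private static final Pattern RECORD = Pattern.compile(
            "RECORD LOCKS space id \\d+ page no (\\d+) n bits \\d+ index (\\S+) of table");
    // e.g. "Record lock, heap no 168 PHYSICAL RECORD: ..."
    private static final Pattern HEAP = Pattern.compile("heap no (\\d+)");

    // Trimmed excerpt of the deadlock log in this article.
    static final String SAMPLE = String.join("\n",
            "*** (1) HOLDS THE LOCK(S):",
            "RECORD LOCKS space id 173 page no 13726 n bits 248 index idx_seller_transNo of table xxx.fund_transfer_stream",
            "Record lock, heap no 168 PHYSICAL RECORD: n_fields 3; compact format; info bits 0",
            "*** (1) WAITING FOR THIS LOCK TO BE GRANTED:",
            "RECORD LOCKS space id 173 page no 12416 n bits 128 index PRIMARY of table xxx.fund_transfer_stream",
            "Record lock, heap no 56 PHYSICAL RECORD: n_fields 17; compact format; info bits 0",
            "*** (2) HOLDS THE LOCK(S):",
            "RECORD LOCKS space id 173 page no 12416 n bits 128 index PRIMARY of table fin_instant_0003.fund_transfer_stream_0056",
            "Record lock, heap no 56 PHYSICAL RECORD: n_fields 17; compact format; info bits 0",
            "*** (2) WAITING FOR THIS LOCK TO BE GRANTED:",
            "RECORD LOCKS space id 173 page no 13726 n bits 248 index idx_seller_transNo of table fin_instant_0003.fund_transfer_stream_0056",
            "Record lock, heap no 168 PHYSICAL RECORD: n_fields 3; compact format; info bits 0");

    // Maps "1 holds", "1 waits", "2 holds", "2 waits" to "index page P heap H".
    static Map<String, String> parse(String log) {
        Map<String, String> locks = new LinkedHashMap<>();
        String section = null;
        String record = null;
        for (String line : log.split("\n")) {
            Matcher s = SECTION.matcher(line);
            if (s.find()) {
                section = s.group(1) + (s.group(2).startsWith("HOLDS") ? " holds" : " waits");
                record = null;
                continue;
            }
            Matcher r = RECORD.matcher(line);
            if (section != null && r.find()) {
                record = r.group(2) + " page " + r.group(1);
                continue;
            }
            Matcher h = HEAP.matcher(line);
            if (section != null && record != null && h.find()) {
                locks.put(section, record + " heap " + h.group(1));
                section = null; // keep only the first record of each section
                record = null;
            }
        }
        return locks;
    }

    // Circular wait: each transaction waits for a record the other one holds.
    static boolean isCircularWait(Map<String, String> locks) {
        String w1 = locks.get("1 waits");
        String w2 = locks.get("2 waits");
        return w1 != null && w1.equals(locks.get("2 holds"))
                && w2 != null && w2.equals(locks.get("1 holds"));
    }

    public static void main(String[] args) {
        Map<String, String> locks = parse(SAMPLE);
        locks.forEach((k, v) -> System.out.println(k + ": " + v));
        System.out.println("circular wait: " + isCircularWait(locks));
    }
}
```

Keying each lock by index, page and heap number is what makes "transaction 2 waits for exactly the record transaction 1 holds" checkable; the index name alone would not be enough.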

X lock: an exclusive lock, also known as a write lock. If transaction T holds an X lock on data object A, T can both read and modify A, and no other transaction can acquire any lock on A until T releases it. This ensures that other transactions cannot lock-read or modify A before T releases its lock (in InnoDB, plain SELECTs still read a consistent MVCC snapshot without taking locks).

The counterpart is the S lock: a shared lock, also known as a read lock. If transaction T holds an S lock on A, T can read A but cannot modify it. Other transactions can also acquire S locks on A, but cannot acquire an X lock on A until T releases its S lock.

This ensures that other transactions can read A, but cannot modify A until T releases its S lock.

Gap lock: locks a range (the gap between index records) but not the records themselves. Its purpose is to stop other transactions from inserting into that gap, so that two current reads in the same transaction do not see phantom rows.

Next-Key Lock: a record lock plus a gap lock, that is, it locks a range together with the record itself. Under REPEATABLE READ, InnoDB uses next-key locks by default when scanning index records, mainly to prevent phantom reads.
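To make the S/X rules concrete, here is a minimal Java sketch of a record-lock compatibility check. It deliberately models only S and X record locks; LockMode and LockCompatibility are illustrative names, not InnoDB internals, and intention, gap, next-key and insert-intention locks are left out.

```java
// Simplified model: only shared (S) and exclusive (X) record locks.
// Intention, gap, next-key and insert-intention locks are deliberately omitted.
enum LockMode { S, X }

class LockCompatibility {
    // Can `requested` be granted on a record on which another transaction holds `held`?
    static boolean compatible(LockMode held, LockMode requested) {
        // S is compatible only with S; X conflicts with everything.
        return held == LockMode.S && requested == LockMode.S;
    }

    public static void main(String[] args) {
        for (LockMode held : LockMode.values()) {
            for (LockMode requested : LockMode.values()) {
                System.out.printf("held %s, requested %s -> %s%n", held, requested,
                        compatible(held, requested) ? "granted" : "waits");
            }
        }
    }
}
```

In this article's deadlock, both transactions request X locks on records the other already holds in X mode, which is always the "waits" case.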

Troubleshooting

Based on what we know about the database so far and on the deadlock log, we can make some initial judgments.

First, this deadlock cannot involve gap locks or next-key locks. Our isolation level is RC (READ-COMMITTED), at which InnoDB does not use gap locks for ordinary searches and index scans, and the deadlock log confirms this: both locks are "locks rec but not gap".

Next, we had to go through the code and see how the transaction is structured. The core code and SQL are as follows:

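@Transactional(rollbackFor = Exception.class)
public int doProcessing(String sellerId, Long id, String fundTransferOrderNo) {
    // update 1: locates the row by primary key (id) and seller_id
    fundTreansferStreamDAO.updateFundStreamId(sellerId, id, fundTransferOrderNo);
    // update 2: locates the row by fund_transfer_order_no, seller_id and state = 'NEW'
    return fundTreansferStreamDAO.updateStatus(sellerId, fundTransferOrderNo, "PROCESSING");
}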

This code modifies two different fields of the same record, one after the other. The SQL for updateFundStreamId:

update fund_transfer_stream
set gmt_modified=now(),fund_transfer_order_no = #{fundTransferOrderNo}
where id = #{id} and seller_id = #{sellerId}

The SQL for updateStatus:

update fund_transfer_stream
set gmt_modified=now(),state = #{state}
where fund_transfer_order_no = #{fundTransferOrderNo} and seller_id = #{sellerId}
and state = 'NEW'

As you can see, we execute two UPDATE statements in the same transaction. Here are the execution plans of the two statements:

The PRIMARY index is used when updateFundStreamId is executed.

The idx_seller_transNo index is used when updateStatus is executed.

The execution plans show that updateStatus actually has two candidate indexes (idx_seller and idx_seller_transNo), and idx_seller_transNo is the one used at execution time. This is because the MySQL query optimizer is cost-based.

During query planning, the key step is to estimate the cost of executing the statement with each candidate index and pick the cheapest one to build the execution plan.

We only looked at the execution plans after the deadlock had occurred, and a post-mortem plan is not necessarily the same as the index actually used at the moment of the deadlock.

However, the deadlock log also lets us pin down which indexes the two statements used:

PRIMARY when updateFundStreamId executed, and idx_seller_transNo when updateStatus executed.

With this information, we can start investigating the cause of the deadlock and the mechanism behind it.

By analyzing the deadlock log together with our code and the table definition, we found that the main problem lies in the idx_seller_transNo index:

KEY idx_seller_transNo (seller_id,fund_transfer_order_no(20))

This index is a prefix index: to save index space and improve index efficiency, only the first 20 characters of the fund_transfer_order_no field are used as the index value.

fund_transfer_order_no is only part of an ordinary, non-unique index, and in a particular case the first 20 characters of two fund_transfer_order_no values belonging to the same seller can be identical.

As a result, two different records have the same index value (seller_id and fund_transfer_order_no(20) are both equal).

As in this article's example, the fund_transfer_order_no values of the two records involved in the deadlock share the same first 20 characters:

99010015000805619031958363857

99010015000805619031957477256
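To see the collision concretely, the following Java sketch truncates both order numbers the way a prefix key part such as fund_transfer_order_no(20) does (MySQL counts prefix lengths of VARCHAR columns in characters) and measures how far the two values actually agree. The class and method names are illustrative.

```java
class PrefixIndexDemo {
    // What a prefix key part such as fund_transfer_order_no(20) stores:
    // the first `len` characters of the value.
    static String prefixKey(String value, int len) {
        return value.length() <= len ? value : value.substring(0, len);
    }

    // Number of leading characters the two values have in common.
    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String a = "99010015000805619031958363857";
        String b = "99010015000805619031957477256";
        System.out.println(prefixKey(a, 20));                          // 99010015000805619031
        System.out.println(prefixKey(a, 20).equals(prefixKey(b, 20))); // true: same index value
        System.out.println(commonPrefixLength(a, b));                  // 22
        System.out.println(prefixKey(a, 23).equals(prefixKey(b, 23))); // false
    }
}
```

For these two values, any prefix of 23 characters or more would tell them apart, but nothing guarantees that for future order numbers, which is why lengthening the prefix alone is not a complete fix.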


So why does a shared 20-character prefix in fund_transfer_order_no cause a deadlock?

Locking principle

Let's use this case to look at how MySQL locking works and what happened behind this deadlock.

We reproduced the deadlock on a database with the following execution sequence:

(Figure: execution sequence of the two transactions)

In MySQL (InnoDB), row-level locks are not placed on records directly but on index records. Indexes are either primary key indexes or non-primary key indexes:

If a SQL statement uses the primary key index, MySQL locks the primary key index record.

If a statement uses a non-primary key index, MySQL first locks the non-primary key index record, and then locks the corresponding primary key index record.

The leaf nodes of the primary key index store entire rows. In InnoDB, the primary key index is also called the clustered index (Clustered Index).

The leaf nodes of a non-primary key index store primary key values. In InnoDB, a non-primary key index is also called a secondary index (Secondary Index).

The index structures involved in this example (B+ trees, simplified into tables) are shown in the figure:

(Figure: simplified structure of the PRIMARY and idx_seller_transNo indexes)
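The locking rule for secondary indexes can be sketched in Java. This toy model (SecondaryIndexLockOrder and its methods are hypothetical, and only the two relevant columns are stored) keeps the clustered index and idx_seller_transNo as sorted maps, and lists in order the index records an UPDATE ... WHERE seller_id = ? AND fund_transfer_order_no = ? would lock while scanning idx_seller_transNo. One detail worth stating: InnoDB appends the primary key to every secondary index record (the deadlock log's n_fields 3 on idx_seller_transNo is consistent with seller_id, the order-number prefix and id), so the two rows give two distinct secondary entries with the same (seller_id, prefix) value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SecondaryIndexLockOrder {
    static final int PREFIX_LEN = 20; // same as fund_transfer_order_no(20)

    // PRIMARY (clustered) index: id -> full fund_transfer_order_no; other columns omitted.
    final TreeMap<Long, String> primary = new TreeMap<>();
    // idx_seller_transNo: "seller|prefix|zero-padded id" -> id, sorted like the B+ tree.
    final TreeMap<String, Long> secondary = new TreeMap<>();

    static String prefix(String orderNo) {
        return orderNo.length() <= PREFIX_LEN ? orderNo : orderNo.substring(0, PREFIX_LEN);
    }

    void insert(long id, String sellerId, String orderNo) {
        primary.put(id, orderNo);
        secondary.put(sellerId + "|" + prefix(orderNo) + "|" + String.format("%020d", id), id);
    }

    // Index records locked, in order, by an UPDATE that scans idx_seller_transNo:
    // each matching secondary entry first, then its PRIMARY record.
    List<String> lockSequence(String sellerId, String orderNo) {
        String keyPrefix = sellerId + "|" + prefix(orderNo) + "|";
        List<String> locks = new ArrayList<>();
        for (Map.Entry<String, Long> e : secondary.tailMap(keyPrefix, true).entrySet()) {
            if (!e.getKey().startsWith(keyPrefix)) {
                break; // left the range of equal (seller_id, prefix) values
            }
            long id = e.getValue();
            locks.add("idx_seller_transNo id=" + id);
            // The full order number lives only in the clustered record, so the row
            // must be locked before the rest of the WHERE clause can be checked.
            locks.add("PRIMARY id=" + id);
        }
        return locks;
    }

    public static void main(String[] args) {
        SecondaryIndexLockOrder t = new SecondaryIndexLockOrder();
        t.insert(1, "38921111", "99010015000805619031958363857");
        t.insert(2, "38921111", "99010015000805619031957477256");
        System.out.println(t.lockSequence("38921111", "99010015000805619031958363857"));
    }
}
```

Whichever of the two order numbers a statement targets, it reaches both rows, because the index cannot tell them apart until the clustered record is read. Under READ COMMITTED, locks on rows that then fail the full WHERE check are released, but they still have to be acquired first.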

Whether a deadlock occurs does not depend on the number of SQL statements in a transaction. The key is that two (or more) sessions acquire locks in inconsistent orders.

Now let's look at the locking order of the two transactions in the example above:

(Figure: lock order of the two transactions)

The following figure breaks this down, showing the locks taken as each SQL statement executes:

(Figure: locks taken as each SQL statement executes)

Combining the two figures above, we find the cause of the deadlock:

Transaction 1 executes update1 and takes the lock on PRIMARY = 1; transaction 2 executes update1 and takes the lock on PRIMARY = 2.

Transaction 1 executes update2 and takes the lock on idx_seller_transNo = (3111095611, 99010015000805619031), then blocks trying to lock PRIMARY = 2.

Transaction 2 executes update2 and tries to take the lock on idx_seller_transNo = (3111095611, 99010015000805619031), which transaction 1 holds. This completes the circular wait, and the statement fails with a deadlock error.

(When a transaction updates rows using a non-primary key index in the WHERE condition, it first locks the matching non-primary key index records, then looks up the corresponding primary key records and locks them as well.)
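The steps above can be replayed with a toy lock table that also records who waits for whom. A deadlock is a cycle in that wait-for graph, which is conceptually what InnoDB's deadlock detector looks for. This sketch is hypothetical: DeadlockReplay and the resource names are illustrative, every lock is an X record lock, each secondary entry is modeled separately (index value plus primary key), and InnoDB details such as implicit locks on modified secondary-index entries and lock release on rollback are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

class DeadlockReplay {
    // resource -> transaction holding an X record lock on it
    private final Map<String, String> owner = new HashMap<>();
    // blocked transaction -> resource it is waiting for
    private final Map<String, String> waitingFor = new HashMap<>();

    // Requests an X lock and returns "granted", "waiting" or "deadlock".
    String lock(String trx, String resource) {
        String holder = owner.get(resource);
        if (holder == null || holder.equals(trx)) {
            owner.put(resource, trx);
            return "granted";
        }
        waitingFor.put(trx, resource);
        return closesCycle(trx) ? "deadlock" : "waiting";
    }

    // Walks trx -> holder of what trx waits for -> what that holder waits for -> ...
    // Any new cycle must pass through the transaction that just started waiting.
    private boolean closesCycle(String start) {
        String current = start;
        for (int i = 0; i <= waitingFor.size(); i++) {
            String resource = waitingFor.get(current);
            if (resource == null) {
                return false;
            }
            String holder = owner.get(resource);
            if (holder == null) {
                return false;
            }
            if (holder.equals(start)) {
                return true;
            }
            current = holder;
        }
        return false;
    }

    public static void main(String[] args) {
        DeadlockReplay db = new DeadlockReplay();
        String[][] steps = {
            {"T1", "PRIMARY id=1"},            // T1 update1 (by id)
            {"T2", "PRIMARY id=2"},            // T2 update1 (by id)
            {"T1", "idx_seller_transNo id=1"}, // T1 update2 scans the shared prefix
            {"T1", "PRIMARY id=1"},
            {"T1", "idx_seller_transNo id=2"},
            {"T1", "PRIMARY id=2"},            // blocked by T2
            {"T2", "idx_seller_transNo id=1"}, // blocked by T1: circular wait
        };
        for (String[] step : steps) {
            System.out.println(step[0] + " -> " + step[1] + ": " + db.lock(step[0], step[1]));
        }
    }
}
```

The last request is reported as "deadlock" because T2 waits for a record T1 holds while T1 already waits for a record T2 holds.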

Solution

At this point, we have analyzed the root cause of the deadlock and the mechanism behind it, so the problem itself is not difficult to solve.

There are two possible approaches:

Modify index

Modify the code (including SQL statements)

Modify the index: change the prefix length of fund_transfer_order_no in the prefix index idx_seller_transNo.

For example, changing it to 50 avoids this deadlock. However, this only helps if the UPDATE statement actually uses idx_seller_transNo when it executes.

If the MySQL query optimizer decides after cost analysis to use the index KEY idx_seller (seller_id) instead, the deadlock can still occur, for the same reason described in this article.

Therefore, the fundamental solution is to change the code:

Perform all updates by primary key ID.

Within one transaction, avoid using multiple UPDATE statements to modify the same record.
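A minimal sketch of the code-level fix, under stated assumptions: the two updates are merged into one statement located by id. The SQL string, the method name updateFundStreamIdAndStatus and the in-memory table are hypothetical, not the author's actual change, and the in-memory map only illustrates the statement's logic, not locking. Note that merging also puts state = 'NEW' on the order-number write, which the original updateFundStreamId did not have; if the order number must be written regardless of state, keep two statements but have both filter by id.

```java
import java.util.HashMap;
import java.util.Map;

class PrimaryKeyUpdateSketch {
    // Hypothetical combined statement (MyBatis-style placeholders), located by primary key.
    static final String UPDATE_BY_ID_SQL =
            "update fund_transfer_stream set gmt_modified = now(), "
            + "fund_transfer_order_no = #{fundTransferOrderNo}, state = #{state} "
            + "where id = #{id} and seller_id = #{sellerId} and state = 'NEW'";

    // In-memory stand-in for one row of fund_transfer_stream (relevant columns only).
    private static final class Row {
        final String sellerId;
        String fundTransferOrderNo;
        String state;

        Row(String sellerId, String state) {
            this.sellerId = sellerId;
            this.state = state;
        }
    }

    private final Map<Long, Row> rows = new HashMap<>();

    void insert(long id, String sellerId, String state) {
        rows.put(id, new Row(sellerId, state));
    }

    String stateOf(long id) {
        return rows.get(id).state;
    }

    // Mirrors UPDATE_BY_ID_SQL and returns the affected-row count.
    // Because the row is found through PRIMARY, the statement never scans
    // idx_seller_transNo entries that another row shares.
    int updateFundStreamIdAndStatus(String sellerId, long id, String orderNo, String state) {
        Row row = rows.get(id);
        if (row == null || !row.sellerId.equals(sellerId) || !"NEW".equals(row.state)) {
            return 0;
        }
        row.fundTransferOrderNo = orderNo;
        row.state = state;
        return 1;
    }

    // Hypothetical replacement for doProcessing: one UPDATE instead of two.
    int doProcessing(String sellerId, long id, String fundTransferOrderNo) {
        return updateFundStreamIdAndStatus(sellerId, id, fundTransferOrderNo, "PROCESSING");
    }

    public static void main(String[] args) {
        PrimaryKeyUpdateSketch dao = new PrimaryKeyUpdateSketch();
        dao.insert(1L, "38921111", "NEW");
        System.out.println(dao.doProcessing("38921111", 1L, "99010015000805619031958363857")); // 1
        System.out.println(dao.stateOf(1L)); // PROCESSING
        System.out.println(dao.doProcessing("38921111", 1L, "99010015000805619031958363857")); // 0
    }
}
```

With a single statement per transaction that locates its row by primary key, each transaction locks only its own row, so the cross-row lock ordering described above cannot arise.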

Summary and reflection

In the week after the deadlock, I spent time studying it almost every day. The problem was identified early and a fix was ready, but I still had not figured out the underlying mechanism.

I made many inferences and hypotheses along the way, and they were overturned again and again. In the end, you have to rely on practice to verify your ideas.

So I installed a database locally, ran some hands-on tests, and checked the lock status in real time with show engine innodb status;. That is how I finally worked out the mechanism.

A few brief takeaways:

Don't guess when you hit a problem!!! Reproduce it yourself, then analyze it.

Don't ignore the context!!! At first I only paid attention to the deadlock log and kept overlooking the fact that the transaction in the code also executed another SQL statement (updateFundStreamId).

No matter how solid your theoretical knowledge is, you may not recall it at the critical moment!!!

The pits we fall into are the ones we dug ourselves!!!


Source: blog.51cto.com/14073073/2550987