Analysis of a production SQL timeout: MySQL gap locks | JD Logistics Technical Team

Preface

I once ran into a production incident in which SQL statements timed out because of MySQL gap locks. This article records the analysis.

Background information

Distributed transaction message table: the business implements a distributed transaction solution on top of a message table and local transactions.

Message table name: mq_messages

Data volume: more than 30 million rows

Indexed columns: create_time and status

status: there are two values, 1 and 2. More than 99% of the rows have status 2, which means the distributed transaction has completed and the row can be deleted.

Message table processing logic:

1. An independent scheduled task deletes historical data with status=2. The SQL is:

    delete from mq_messages where create_time<xxx and status=2 limit 200

2. Scheduled task frequency: the task runs once every 3 minutes, and each run executes the delete 200 times. The WHERE condition matches more than 90% of the rows.

Business logic: while the online business is running, rows with status=1 are continuously inserted into the table, and the primary key id increases over time.

How the SQL timeout occurred

During the traffic peak of a large promotion, the database connections were exhausted. The initial diagnosis was that the large data volume had caused a table lock. To keep the connections from filling up again, the rows with status 2 had to be deleted as quickly as possible, so the cleanup was run manually with the following SQL:

delete from mq_messages where status=2 limit 2000

The task still ran once every three minutes, and each run executed the delete 200 times.

The database connections immediately filled up again and the database stopped responding.

Review analysis

Was there a table lock in production?

Initialize table structure (simplified table structure)

CREATE TABLE `my_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` int(11) NOT NULL,
  `b` int(11) NOT NULL,
  `state` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
KEY `a` (`a`),
KEY `state` (`state`) USING BTREE
) 
ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

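A stored procedure prepares the test data

DELIMITER $$
CREATE PROCEDURE pro_copy_date()
BEGIN
  -- Insert 100,000 rows with id = a = b = 1..100000 and state = 1
  SET @i = 1;
  WHILE @i <= 100000 DO
    INSERT INTO my_test VALUES (@i, @i, @i, 1);
    SET @i = @i + 1;
  END WHILE;
END $$
-- Restore the default delimiter so the statements below terminate on ";"
DELIMITER ;

CALL pro_copy_date();
-- Mark all but the last 10 rows as state = 2
UPDATE my_test SET state = 2 WHERE id <= 99990;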

Verification

1. Basic data

The table holds 100,000 rows; only the last 10 rows (id>99990) have state=1.

2. The transaction isolation level is REPEATABLE READ

3. Start transaction A and do not commit it

Execute: DELETE FROM my_test WHERE state=2 LIMIT 2000;

4. Start another transaction B

• Updating the row with id=2001 succeeds.

• Updating the row with id=2000 is blocked.

This shows that there is no table lock.

5. Start another transaction C

• Inserting a row with state 2 succeeds.

• Inserting a row with state 0 succeeds.

• Inserting a row with state 1 is blocked.

This shows that the gap between state 1 and state 2 is locked, so no row can be inserted into it.
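The verification steps above can be replayed with three client sessions. The following is a sketch rather than the exact statements used in the original test: the column values written by sessions B and C are illustrative assumptions, and the outcomes in the comments are the ones reported in the steps above, which depend on the optimizer choosing the state index for the DELETE.

```sql
-- Session A: take the locks and leave the transaction open
BEGIN;
DELETE FROM my_test WHERE state = 2 LIMIT 2000;

-- Session B: probe row locks through the primary key
BEGIN;
UPDATE my_test SET b = b + 1 WHERE id = 2001;  -- succeeds: row not locked by A
UPDATE my_test SET b = b + 1 WHERE id = 2000;  -- blocks: row locked by A
-- (waits until A ends or innodb_lock_wait_timeout expires)

-- Session C: probe gap locks on the state index
BEGIN;
INSERT INTO my_test (a, b, state) VALUES (100001, 100001, 2);  -- succeeds
INSERT INTO my_test (a, b, state) VALUES (100002, 100002, 0);  -- succeeds
INSERT INTO my_test (a, b, state) VALUES (100003, 100003, 1);  -- blocks

-- Session A: release all locks without deleting anything
ROLLBACK;
```

Rolling back session A keeps the test data intact, so the experiment can be repeated.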

Conclusion

There was no table lock in production; the blocking was caused by a gap lock.

Gap locks

Gap lock analysis of the production scenario

The state column has two values, 1 and 2. This yields three gaps, (-∞, 1), (1, 2), (2, +∞), and two distinct values, 1 and 2. Because a next-key lock is a left-open, right-closed interval, the corresponding next-key lock intervals are (-∞, 1], (1, 2], (2, +∞).

When DELETE FROM my_test WHERE state=2 LIMIT 2000 executes, the rows scanned run from (state=2, id=1) to (state=2, id=2000). state=2 falls in the interval (1, 2], so the locked range runs from (state=1, id=100000) to (state=2, id=2000).

In production, the locked range runs from (state=1, id=the largest id with status 1) to (state=2, id=the largest id among the rows being deleted). Production only inserts rows with state=1, and ids are increasing, so each newly inserted id is the largest in the table. A new row therefore always falls inside the locked range, and the insert is blocked.
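One way to check which ranges a transaction has actually locked is to query the lock table in performance_schema while the transaction is still open. This is a sketch that assumes MySQL 8.0 or later, where performance_schema.data_locks is available, and it uses the my_test table from the reproduction above.

```sql
-- Run in a separate session while the DELETE transaction is still open.
-- Assumes MySQL 8.0+; performance_schema.data_locks does not exist in 5.7.
SELECT
    ENGINE_TRANSACTION_ID,
    INDEX_NAME,    -- which index the lock is on, e.g. state or PRIMARY
    LOCK_TYPE,     -- TABLE or RECORD
    LOCK_MODE,     -- X = next-key, X,REC_NOT_GAP = row only, X,GAP = gap only
    LOCK_STATUS,   -- GRANTED or WAITING
    LOCK_DATA      -- index values of the locked record
FROM performance_schema.data_locks
WHERE OBJECT_NAME = 'my_test';
```

Record locks with LOCK_MODE X on the state index are next-key locks. A blocked insert shows up as a row whose LOCK_MODE contains INSERT_INTENTION and whose LOCK_STATUS is WAITING.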

What gap locks are for

Solving phantom reads

A phantom read occurs when a transaction queries the same range twice and the second query sees rows that the first did not.

Phantom reads refer specifically to newly inserted rows.

Under the REPEATABLE READ isolation level, ordinary queries are snapshot reads and do not see rows inserted by other transactions. Phantom reads therefore appear only with "current reads". InnoDB solves phantom reads with gap locks.
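The difference between a snapshot read and a current read can be seen with two sessions. This sketch assumes the my_test2 table and the four rows (0,0,0), (5,5,5), (10,10,10), (15,15,15) defined in the next section; the inserted row (1,5,1) is an illustrative value. Session A takes no locking read before session B inserts, so nothing blocks the insert.

```sql
-- Session A (REPEATABLE READ)
BEGIN;
SELECT * FROM my_test2 WHERE b = 5;             -- snapshot read: (5,5,5)

-- Session B (autocommit): A holds no locks yet, so this succeeds
INSERT INTO my_test2 VALUES (1, 5, 1);

-- Session A
SELECT * FROM my_test2 WHERE b = 5;             -- snapshot read: still (5,5,5)
SELECT * FROM my_test2 WHERE b = 5 FOR UPDATE;  -- current read: (1,5,1), (5,5,5)
COMMIT;

-- Cleanup so that later examples start from the original four rows
DELETE FROM my_test2 WHERE id = 1;
```

The plain SELECT keeps reading from the snapshot taken at the start of the transaction, while the FOR UPDATE read sees the latest committed data, which is where the phantom row appears.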

Problems caused by phantom reads

Create a new test table:

CREATE TABLE `my_test2` (
  `id` INT (11) NOT NULL,
  `b` INT (11) DEFAULT NULL,
  `c` INT (11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `c` (`c`)
) ENGINE = INNODB;

-- Insert test data
INSERT INTO my_test2 VALUES (0, 0, 0), (5, 5, 5), (10, 10, 10), (15, 15, 15);

Test SQL 1

begin;
select * from my_test2 where b=5 for update;

This statement matches the row with b=5, whose primary key is id=5, so after the select runs, a write lock is placed on the row with id=5. The lock is released when the transaction commits.

Since there is no index on column b, the query performs a full table scan. Are the other scanned rows, which do not satisfy the condition, locked as well?

If only the record with ID 5 is locked:

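| Time | Transaction A | Transaction B | Transaction C |
|------|---------------|---------------|---------------|
| T1 | BEGIN; SELECT * FROM my_test2 WHERE b=5 FOR UPDATE; Result: (5,5,5) | | |
| T2 | | UPDATE my_test2 SET b=5 WHERE id = 0; | |
| T3 | SELECT * FROM my_test2 WHERE b=5 FOR UPDATE; Result: (0,5,0), (5,5,5) | | |
| T4 | | | INSERT INTO my_test2 VALUES (1,5,1); |
| T5 | SELECT * FROM my_test2 WHERE b=5 FOR UPDATE; Result: (0,5,0), (1,5,1), (5,5,5) | | |
| T6 | COMMIT; | | |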



| Time | Transaction A | Transaction B |
|------|---------------|---------------|
| T1 | BEGIN; SELECT * FROM my_test2 WHERE b=5 FOR UPDATE; | |
| T2 | | UPDATE my_test2 SET b=5 WHERE id = 0; UPDATE my_test2 SET c=5 WHERE id = 0; |
| T3 | COMMIT; | |

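If only the row with id=5 is locked, the semantics of transaction A's locking statement are broken. That statement is meant to "lock all rows with b=5 and prevent other transactions from performing locking reads or writes on them", yet transaction B can turn the row with id=0 into a b=5 row and then modify it.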

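| Time | Transaction A | Transaction B | Transaction C |
|------|---------------|---------------|---------------|
| T1 | BEGIN; SELECT * FROM my_test2 WHERE b=5 FOR UPDATE; UPDATE my_test2 SET c=10 WHERE b=5; | | |
| T2 | | UPDATE my_test2 SET b=5 WHERE id = 0; | |
| T3 | | | INSERT INTO my_test2 VALUES (1,5,1); |
| T4 | COMMIT; | | |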

T1: in the row with id=5, c is changed to 10. The transaction has not committed, so nothing has been written to the binlog yet.

T2: the row with id=0 becomes (0,5,0), and the change is written to the binlog.

T3: the row with id=1 is inserted as (1,5,1), and the change is written to the binlog.

T4: transaction A commits and is written to the binlog.

At this point the data on the primary is (0,5,0), (1,5,1), (5,5,10).

The binlog therefore contains:

UPDATE my_test2 SET b=5 WHERE id = 0;
INSERT INTO my_test2 VALUES (1,5,1);
UPDATE my_test2 SET c=10 WHERE b=5;

After the replica replays the binlog, its data becomes (0,5,10), (1,5,10), (5,5,10), so the primary and the replica are inconsistent.

The inconsistency arises because only the rows that needed changing at that moment were locked, which cannot prevent existing rows from being changed to b=5.

What happens if every scanned row is locked? Since b has no index, the statement has to scan the entire table to find the rows to update, so every row in the table is locked.

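| Time | Transaction A | Transaction B | Transaction C |
|------|---------------|---------------|---------------|
| T1 | BEGIN; SELECT * FROM my_test2 WHERE b=5 FOR UPDATE; UPDATE my_test2 SET c=10 WHERE b=5; | | |
| T2 | | UPDATE my_test2 SET b=5 WHERE id = 0; (blocked) | |
| T3 | | | INSERT INTO my_test2 VALUES (1,5,1); |
| T4 | COMMIT; | | |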

T1: in the row with id=5, c is changed to 10. The transaction has not committed, so nothing has been written to the binlog yet.

T2: the row with id=0 is locked, so the update cannot proceed and waits for the lock to be released.

T3: the row with id=1 is inserted as (1,5,1), and the change is written to the binlog.

T4: transaction A commits and is written to the binlog.

T5: now that transaction A has committed, the lock on id=0 is released. Transaction B's update succeeds, the row becomes (0,5,0), and the change is written to the binlog.

At this point the data on the primary is (0,5,0), (1,5,1), (5,5,10).

The binlog therefore contains:

INSERT INTO my_test2 VALUES (1,5,1);
UPDATE my_test2 SET c=10 WHERE b=5;
UPDATE my_test2 SET b=5 WHERE id = 0;

After the replica replays the binlog, its data becomes (0,5,0), (1,5,10), (5,5,10), so the data is still inconsistent.

Locking every row scanned during the search avoids the inconsistency caused by updates, but a row inserted into a gap between existing rows can still appear as new b=5 data. To solve this, the gaps between the rows must be locked as well.

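| Time | Transaction A | Transaction B | Transaction C |
|------|---------------|---------------|---------------|
| T1 | BEGIN; SELECT * FROM my_test2 WHERE b=5 FOR UPDATE; UPDATE my_test2 SET c=10 WHERE b=5; | | |
| T2 | | UPDATE my_test2 SET b=5 WHERE id = 0; (blocked) | |
| T3 | | | INSERT INTO my_test2 VALUES (1,5,1); (blocked) |
| T4 | COMMIT; | | |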

T1: in the row with id=5, c is changed to 10. The transaction has not committed, so nothing has been written to the binlog yet.

T2: the row with id=0 is locked, so the update cannot proceed and waits for the lock to be released.

T3: the gap (0,5) is locked, so the insert cannot proceed and waits for the lock to be released.

T4: transaction A commits and is written to the binlog.

T5: now that transaction A has committed, the lock on id=0 is released. Transaction B's update succeeds, the row becomes (0,5,0), and the change is written to the binlog.

T6: now that transaction A has committed, the gap lock on (0,5) is released. Transaction C's insert succeeds, the row (1,5,1) is created, and the change is written to the binlog.

At this point the data on the primary is (0,5,0), (1,5,1), (5,5,10).

The binlog therefore contains:

UPDATE my_test2 SET c=10 WHERE b=5;
UPDATE my_test2 SET b=5 WHERE id = 0;
INSERT INTO my_test2 VALUES (1,5,1);

After the replica replays the binlog, its data becomes (0,5,0), (1,5,1), (5,5,10), which matches the primary and resolves the inconsistency.

The cases above show that locking only the rows being modified causes two problems:

1. The locking semantics are broken

2. Data inconsistency

Solving phantom reads

The cases above show that even if every existing row is locked, newly inserted rows cannot be blocked. A row lock can only lock a row, whereas inserting a new row updates the "gap" between rows. To solve the phantom read problem, InnoDB therefore had to introduce a new kind of lock, the gap lock.

A gap lock locks the gap between two values. The table holds 4 rows, which gives five gaps: (-∞, 0), (0, 5), (5, 10), (10, 15), (15, +∞). When a scan identifies the rows to modify, it locks not only the rows it scans but also the gaps on both sides of them.

A gap lock together with a row lock is called a next-key lock, and each next-key lock is a left-open, right-closed interval. In the situation above there are therefore five next-key locks: (-∞,0], (0,5], (5,10], (10,15], (15, +∞).

Gap locks can be held by multiple transactions at the same time

Gap locks differ from row locks: a write (exclusive) row lock can be held by only one transaction, but a gap lock on the same gap can be held by several transactions.

For example, open two transactions:

1. Transaction A executes SELECT * FROM my_test2 WHERE id=2 FOR UPDATE; which locks the gap (0,5).

2. Transaction B executes SELECT * FROM my_test2 WHERE id=3 FOR UPDATE; which also locks the gap (0,5) and succeeds.

Both gap locks have the same purpose, protecting the gap against inserts, so they do not conflict with each other.
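The two transactions above can be tried out with the following sessions. This is a sketch that assumes my_test2 contains only its original four rows; session C and the row (4,4,4) are illustrative additions used to show that the shared gap is still protected against inserts.

```sql
-- Session A
BEGIN;
SELECT * FROM my_test2 WHERE id = 2 FOR UPDATE;  -- empty result, gap lock on (0,5)

-- Session B
BEGIN;
SELECT * FROM my_test2 WHERE id = 3 FOR UPDATE;  -- empty result, not blocked by A

-- Session C: an insert into the gap has to wait for both gap locks
INSERT INTO my_test2 VALUES (4, 4, 4);           -- blocks until A and B both end

-- Sessions A and B
ROLLBACK;
```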

Locking rules

Principle 1: the basic unit of locking is the next-key lock, which is a left-open, right-closed interval.

Principle 2: only objects accessed during the search are locked.

Optimization 1: for an equality query on a unique index, the next-key lock degenerates into a row lock.

Optimization 2: for an equality query on an index, when the scan moves to the right and the last value visited does not satisfy the equality condition, the next-key lock degenerates into a gap lock.

In addition, a range query on a unique index continues until it reaches the first value that does not satisfy the condition.

Locking rules: gap lock for an equality query

Transaction A executes UPDATE my_test2 SET b=100 WHERE id=7;

According to principle 1, the locked interval is (5,10].

According to optimization 2, this is an equality query and id=10 does not satisfy the condition, so the next-key lock degenerates into a gap lock and the final locked range is (5,10).

Therefore transaction B's insert into the gap is blocked, while transaction C's update succeeds.
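The statements used by transactions B and C are not given in the text, so the following sessions are only a sketch with illustrative values: an insert of id=8, which falls inside (5,10), and an update of the existing row id=10, which lies on the open end of the interval. The sketch assumes my_test2 contains only its original four rows.

```sql
-- Session A
BEGIN;
UPDATE my_test2 SET b = 100 WHERE id = 7;     -- matches no row, gap lock on (5,10)

-- Session B
INSERT INTO my_test2 VALUES (8, 8, 8);        -- blocks: id=8 falls inside (5,10)

-- Session C
UPDATE my_test2 SET b = b + 1 WHERE id = 10;  -- succeeds: id=10 is not locked

-- Session A
ROLLBACK;
```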


Locking rules: equality query on a non-unique index

Transaction A executes SELECT id FROM my_test2 WHERE c=5 LOCK IN SHARE MODE;

According to principle 1, the locked interval is (0,5]. Because c is not a unique index, the scan has to continue to the right, so (5,10] is visited as well. According to optimization 2, that lock degenerates into the gap (5,10). The locked range on index c is therefore (0,10).

Because this query is served by a covering index and does not need to look up the primary key index, the row with id=5 in the primary key index is not locked.

So an update of the row with id=5 succeeds, but an insert into the locked range is blocked.

Transaction A executes SELECT * FROM my_test2 WHERE c=5 LOCK IN SHARE MODE;

Because selecting all columns requires looking up the row with id=5 in the primary key index, principle 2 means that row is locked as well, so the update is blocked.

Note that if the statement is SELECT id FROM my_test2 WHERE c=5 FOR UPDATE; it is still served by the covering index, but with FOR UPDATE MySQL assumes you are about to update the row, so it also locks the row with id=5 in the primary key index.
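The covering-index case can be reproduced with the sessions below. The statements for sessions B and C are illustrative assumptions (an update of a non-indexed column on id=5 and an insert of c=7), and the outcomes assume that the optimizer serves the query from index c and that my_test2 contains only its original four rows.

```sql
-- Session A
BEGIN;
SELECT id FROM my_test2 WHERE c = 5 LOCK IN SHARE MODE;  -- locks (0,10) on index c only

-- Session B
UPDATE my_test2 SET b = b + 1 WHERE id = 5;  -- succeeds: primary key row not locked

-- Session C
INSERT INTO my_test2 VALUES (7, 7, 7);       -- blocks: c=7 falls inside (0,10)

-- Session A
ROLLBACK;
```

Replacing session A's statement with SELECT * ... LOCK IN SHARE MODE or SELECT id ... FOR UPDATE also locks the primary key row, in which case session B's update is expected to block, as described above.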

Locking rules: non-unique index with duplicate values

Insert two new rows, (20,20,5) and (30,30,5).

Execute the SQL: DELETE FROM my_test2 WHERE c=5 LIMIT 2;

According to the locking rules, the scan visits only rows with c=5 and stops once LIMIT 2 is satisfied, so the locked range is

(c=0,id=0) to (c=5,id=20)

Verifying the result from another session:

INSERT INTO my_test2 VALUES(-1,0,0); -- not blocked

INSERT INTO my_test2 VALUES(1,0,0); -- blocked

INSERT INTO my_test2 VALUES(19,0,5); -- blocked

INSERT INTO my_test2 VALUES(21,0,5); -- not blocked

The internals of a database are broad and deep. This article records some investigation and discussion based on a production scenario, in the hope that it offers some insight for similar situations. The article inevitably has shortcomings, and I welcome readers' comments and suggestions. Thanks!

Author: JD Logistics Liu Hao

Source: JD Cloud Developer Community, Ziyuanqishuo Tech. Please credit the source when reprinting.


Origin my.oschina.net/u/4090830/blog/10141958