"MySQL Practical Combat 45 Lectures" - Study Notes 20: "Phantom reads, locking in a full table scan, gap locks, next-key locks"

This article introduces a lock mechanism that InnoDB uses under the repeatable read (RR) isolation level: the gap lock (Gap Lock). A gap lock differs from the table locks and row locks discussed so far. What it conflicts with is the operation "insert a record into this gap"; gap locks do not conflict with one another. Because several transactions can hold a lock on the same gap at the same time, deadlocks may occur;

A gap lock combined with the row lock on the record that closes the gap is called a next-key lock, so each next-key lock is a left-open, right-closed interval. A full-table-scan statement such as select * from t for update (which uses no secondary index) locks every record and every gap in the table, which is very expensive; it is therefore recommended to go through the primary key or a secondary index whenever possible when updating;

This article only introduces the concepts of gap lock and next-key lock and the reasons they were introduced; it does not explain the locking rules, which are covered in the next article;

How is a "current read" locked under the repeatable read isolation level?

For the convenience of discussing the locking rules, here is a simple table structure description. The table creation and initialization statements are as follows:

# A simple table with 3 columns: primary key id, indexed column c, non-indexed column d
CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `c` int(11) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `c` (`c`)
) ENGINE=InnoDB;

# Initialize the table with 6 rows
insert into t values
(0,0,0),
(5,5,5),
(10,10,10),
(15,15,15),
(20,20,20),
(25,25,25);

(1) A locking read on a specific primary key value; the SQL statement is as follows:

select * from t where id=5 lock in share mode;

Since id is the primary key, the statement locates the row id=5 directly through the index, so the read lock is placed on this row only. But what about the following SQL statement?

(2) A locking read whose condition is on a non-indexed column; the SQL statement is as follows:

begin;
select * from t where d=5 for update;
commit;

It is easy to see that this statement matches the row (5,5,5) with d=5, whose primary key is id=5, so once the select statement has executed, a write lock is held on the row id=5; under the two-phase locking protocol, that write lock is released when the commit statement executes;

Since there is no index on column d, this query performs a full table scan. Are the other scanned rows, which do not satisfy d=5, locked as well? (All scenarios in this article use InnoDB's default transaction isolation level, repeatable read);

Assumption 1: Assume that only the row with id=5 is locked, while other scanned rows are not locked

Based on the above assumptions (this is an assumption, not the real situation!), try to analyze the following scenario;
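The scenario figure is not reproduced in these notes. The timeline below is a sketch reconstructed from the description that follows it; the session labels and the T1..T6 ordering come from the text, while the assumption that sessionB and sessionC run in autocommit mode is inferred from the statement that they commit right after executing. The query results in the comments describe the hypothetical behavior under Assumption 1, not what a real server does (on a real server, sessionB would block at T2).

```sql
-- Hypothetical timeline under Assumption 1. Each session is a separate connection.

-- T1, sessionA
begin;
select * from t where d=5 for update;   -- Q1: would return the row id=5

-- T2, sessionB (autocommit, assumed)
update t set d=5 where id=0;

-- T3, sessionA
select * from t where d=5 for update;   -- Q2: would return the rows id=0 and id=5

-- T4, sessionC (autocommit, assumed)
insert into t values(1,1,5);

-- T5, sessionA
select * from t where d=5 for update;   -- Q3: would return the rows id=0, id=1 and id=5

-- T6, sessionA
commit;
```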

Now analyze what the three "current reads" Q1/Q2/Q3 return;

(1) Q1 returns only the row with id=5;
(2) At T2, sessionB changes d to 5 in the row with id=0, so at T3, Q2 returns two rows: id=0 and id=5;
(3) At T4, sessionC inserts a new row (1,1,5), so at T5, Q3 returns three rows: id=0, id=1 and id=5;

Of these, Q3 reading the row id=1 is what is called a "phantom read";

A phantom read occurs when a transaction queries the same range twice and the second query sees "extra" rows that the first query did not see;

Under the repeatable read isolation level, ordinary queries are snapshot reads (consistent reads) and do not see data inserted by other transactions; therefore, phantom reads can only appear under "current reads";

The modification made by sessionB, which sessionA's later select statement sees through a "current read", is not a phantom read; the term refers only to "newly inserted rows";

The three queries Q1/Q2/Q3 all use for update, so they are all current reads. The rule for a current read is that it reads the latest committed version of every record. The statements of sessionB and sessionC are committed as soon as they execute, so Q2 and Q3 should see the effects of sessionB and sessionC, which does not contradict the transaction visibility rules;

So the conclusion is that, under Assumption 1, phantom reads can occur!

What's wrong with phantom reads?

1. Breaking the semantics of locking

First, in terms of semantics, sessionA declares at T1: "I want to lock all rows with d=5, and no other transaction may read or write them." In fact, this semantics is broken: by modifying an existing row or inserting a new one, sessionB and sessionC produce new rows that satisfy d=5, and these rows are not locked;

2. Breaking the logical consistency of data and logs

Locks exist to guarantee data consistency, and this consistency covers not only the internal state of the database at a given moment, but also the logical consistency between the data and the logs;

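To illustrate this problem, add an update statement to sessionA at T1, namely update t set d=100 where d=5. The results listed below also imply that sessionB and sessionC each run a second statement that sets c=5 on the row they touched. Let's look at the entire execution process;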
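Since the figure for this scenario is missing, here is a sketch of the extended timeline. sessionA's extra update comes from the text; the second statement in sessionB and in sessionC (setting c=5) is an assumption inferred from the row values (0,5,5) and (1,5,5) and from the remark that each of those transactions writes two statements to the binlog. As before, this sequence runs without lock waits only under Assumption 1.

```sql
-- Hypothetical timeline under Assumption 1. Each session is a separate connection.

-- T1, sessionA
begin;
select * from t where d=5 for update;   -- Q1
update t set d=100 where d=5;           -- row id=5 becomes (5,5,100)

-- T2, sessionB
update t set d=5 where id=0;            -- (0,0,5)
update t set c=5 where id=0;            -- (0,5,5); assumed second statement

-- T3, sessionA
select * from t where d=5 for update;   -- Q2

-- T4, sessionC
insert into t values(1,1,5);            -- (1,1,5)
update t set c=5 where id=1;            -- (1,5,5); assumed second statement

-- T5, sessionA
select * from t where d=5 for update;   -- Q3

-- T6, sessionA
commit;
```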

The SQL execution process (there is no lock wait anywhere in it):

(1) After T1, the row with id=5 becomes (5,5,100); this result is only formally committed at T6;
(2) After T2, the row with id=0 becomes (0,5,5);
(3) After T4, the table has an extra row (1,5,5);

To sum up, after T6, the three rows id=0/id=1/id=5 in the database are (0,5,5) / (1,5,5) / (5,5,100);

The binlog is written when a transaction commits (two-phase commit), so the log contains the following, in order:

(1) At T2, the sessionB transaction commits and its two statements are written; the row with id=0 becomes (0,5,5);
(2) At T4, the sessionC transaction commits and its two statements are written; the table gains a row (1,5,5);
(3) At T6, the sessionA transaction commits and the statement update t set d=100 where d=5 is written; d is updated to 100 in every record that satisfies d=5;

To sum up, replaying this binlog turns the three rows id=0/id=1/id=5 into (0,5,100) / (1,5,100) / (5,5,100);

That is to say, the two rows id=0 and id=1 are inconsistent! The result recovered from the binlog does not match the data in the actual table. This is a serious problem and cannot be allowed;
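The replay can be checked by hand. The sketch below assumes a statement-format binlog and the two-statement form of sessionB and sessionC inferred above; it applies the statements in commit order to a fresh copy of table t (for example on a scratch instance), which is what a replica or a point-in-time restore would effectively do.

```sql
-- Replay in binlog (commit) order on a freshly initialized table t.
update t set d=5 where id=0;    -- sessionB, committed at T2: (0,0,5)
update t set c=5 where id=0;    --                            (0,5,5)
insert into t values(1,1,5);    -- sessionC, committed at T4: (1,1,5)
update t set c=5 where id=1;    --                            (1,5,5)
update t set d=100 where d=5;   -- sessionA, committed at T6: hits id=0, id=1 and id=5

select * from t where id in (0,1,5);
-- (0,5,100), (1,5,100), (5,5,100)
```

The source database holds (0,5,5) and (1,5,5) for the first two rows, so the replayed copy diverges from it.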

Assumption 2: Add write locks to all rows encountered during scanning

The analysis above shows that it is unacceptable for the statement "select * from t where d=5 for update" to lock only the row that matches d=5, that is, the row with id=5: it leads to phantom reads, breaks the semantics of locking the records with d=5, and makes the binlog logically inconsistent with the actual data;

Now assume that every row touched during the scan is also write-locked, and see whether that solves the problem;

Analysis: Since sessionA write-locks every row, sessionB is blocked when it executes its first update statement and can continue only after sessionA commits at T6; therefore, the data inconsistency in the row id=0 is resolved;

However, for the row id=1, the result in the database is (1,5,5), while replaying the binlog gives (1,5,100), which means the phantom read problem is still not resolved;
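The binlog order under Assumption 2 can be sketched as follows. It again assumes a statement-format binlog and the two-statement form of sessionB and sessionC; sessionB's statements now land after sessionA's because sessionB was blocked until T6.

```sql
-- Binlog (commit) order under Assumption 2, replayed on a fresh table t.
insert into t values(1,1,5);    -- sessionC, committed at T4: (1,1,5)
update t set c=5 where id=1;    --                            (1,5,5)
update t set d=100 where d=5;   -- sessionA, committed at T6: hits id=1 and id=5
update t set d=5 where id=0;    -- sessionB, after sessionA commits: (0,0,5)
update t set c=5 where id=0;    --                                   (0,5,5)

select * from t where id in (0,1,5);
-- (0,5,5), (1,5,100), (5,5,100)
```

The row id=0 now matches the source database, but id=1 replays as (1,5,100) while the source database holds (1,5,5).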

The reason is simple: at T1, when all rows are locked, the row with id=1 does not exist yet, and a row that does not exist cannot be locked. That is, even if every existing record is locked, newly inserted records can still be read by a "current read" (a phantom read), which is why phantom reads have to be singled out and solved separately;

How are phantom reads solved?

Phantom reads arise because a row lock can only lock a row that already exists; it cannot lock a record that will be inserted later. To solve the problem, InnoDB had to introduce a new kind of lock, the gap lock (Gap Lock);

Since a new record can only be inserted into a gap between the primary key id values that already exist in the table (the primary key is unique), the gap lock, as the name implies, locks the gap between two values;

For example, the table t from the beginning of the article is initialized with 6 records, which creates 7 gaps: (-∞,0), (0,5), (5,10), (10,15), (15,20), (20,25) and (25,+∞);

In this way, executing select * from t where d=5 for update not only adds row locks to the 6 existing records (the full table scan touches all 6), but also adds 7 gap locks at the same time; this ensures that no new record can be inserted;

That is to say, during the row-by-row scan, row locks are added to the rows and gap locks are added to the gaps on both sides of each row; a data row is an entity that can be locked, and the gap between two data rows is also an entity that can be locked;
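With row locks and gap locks both in place, the earlier scenario plays out differently on a real server. A minimal sketch, assuming the repeatable read isolation level and three separate connections (the session labels are only for illustration):

```sql
-- sessionA
begin;
select * from t where d=5 for update;   -- full scan: locks every row and every gap

-- sessionB (second connection)
update t set d=5 where id=0;            -- blocks: row id=0 is write-locked by sessionA

-- sessionC (third connection)
insert into t values(1,1,5);            -- blocks: the gap (0,5) is locked by sessionA

-- sessionA
commit;                                 -- the blocked statements can now proceed
```

A blocked statement waits up to innodb_lock_wait_timeout seconds (50 by default) before failing with a lock wait timeout error, so sessionA should commit within that window when trying this out.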

But gap locks are not like the locks we have met before. The conflict rules for row locks are: read locks do not conflict with read locks, while a read lock conflicts with a write lock and write locks conflict with each other. In other words, what conflicts with a row lock is "another row lock";

Gap locks are different. What conflicts with a gap lock is the operation "insert a record into this gap"; gap locks do not conflict with one another. For example:


Here (both sessions run a locking read with the condition c=7) sessionB is not blocked. Because table t has no record with c=7, sessionA takes a gap lock on (5,10), and sessionB takes a gap lock on the same gap. The two locks share one goal: protect the gap (5,10) and allow no value to be inserted into it. The two "current read" statements themselves do not conflict;
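The figure for this example is missing, so the statements below are a sketch of what it likely showed. The condition c=7 comes from the text; the exact lock modes (share mode in sessionA, for update in sessionB) and the extra sessionC insert are assumptions added for illustration.

```sql
-- sessionA
begin;
select * from t where c=7 lock in share mode;   -- no matching row: gap (5,10) on index c is locked

-- sessionB (second connection)
begin;
select * from t where c=7 for update;           -- not blocked: locks the same gap (5,10)

-- sessionC (third connection, hypothetical)
insert into t values(7,7,7);                    -- blocks: c=7 falls inside the locked gap

-- sessionA and sessionB
rollback;                                       -- run in both sessions; the insert can then proceed
```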

Note: (5,10) here refers to values of c, not of the primary key. Because column c is indexed, the lock is taken on the gap around the index position where the condition value would sit. If the query condition cannot use an index and the whole table is scanned, gap locks are added between all rows of the table. The detailed locking rules are introduced in the next article;

next-key lock

A gap lock plus the row lock on the record at its right end is called a next-key lock; a gap lock on its own is an open interval, so each next-key lock is a left-open, right-closed interval;

For example, executing select * from t for update on the table t above locks every record in the table (a full table scan) and forms 7 next-key locks: (-∞,0], (0,5], (5,10], (10,15], (15,20], (20,25], (25,+supremum]. (Because +∞ would leave the last interval open on the right, InnoDB adds to each index a supremum pseudo-record that is larger than any real value, so that the last lock also fits the rule that "every next-key lock is a left-open, right-closed interval".)
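These locks can be inspected on a live server. The sketch below assumes MySQL 8.0 or later, where the performance_schema.data_locks table exists, and an account allowed to read it; older versions expose lock information differently.

```sql
-- sessionA
begin;
select * from t for update;

-- sessionB (second connection): list the locks currently held on table t
select index_name, lock_type, lock_mode, lock_status, lock_data
from performance_schema.data_locks
where object_name = 't';

-- sessionA
rollback;
```

One would expect a table-level intention lock plus one record lock per row on the PRIMARY index, and one more whose lock_data names the supremum pseudo-record, which corresponds to the interval (25,+supremum].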

Concurrency issues caused by gap locks and next-key locks

Consider the following business logic: lock an arbitrary row; insert it if it does not exist, and update its data if it does. The code is as follows:

begin;
select * from t where id=N for update;

/* if the row does not exist */
insert into t values(N,N,N);
/* if the row exists */
update t set d=N where id=N;

commit;

You may think of the SQL form insert ... on duplicate key update. However, the official MySQL documentation warns against this form on tables with multiple unique keys, where it is not safe under concurrency, and company MySQL development guidelines generally discourage or prohibit it. For details, see the article "Mysql deadlock troubleshooting: insert on duplicate deadlock one-time troubleshooting and analysis process". The replace statement has similar problems; see "Detailed explanation of insert...on duplicate key update syntax";

Going back to this example: once this business logic runs concurrently, it hits a deadlock. That may seem strange, since the logic takes a for update lock before every operation, which is already the strictest mode. How can there still be a deadlock? Here, two sessions simulate the concurrency, with N=9;

In this sequence, both select ... for update statements succeed, because the row id=9 does not exist and each session only takes a gap lock on (5,10); each session's insert then has to wait for the gap lock held by the other. At this point the two sessions are waiting for each other, forming a deadlock. InnoDB's deadlock detection discovers the cycle immediately; because rolling back either transaction costs the same, it chooses to make sessionA's insert statement return an error;
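The two-session sequence can be sketched as follows. The figure is missing, so the exact step order is an assumption reconstructed from the description: both sessions run the business logic above with N=9, and sessionB reaches its insert first.

```sql
-- Step 1, sessionA
begin;
select * from t where id=9 for update;   -- no such row: gap lock on (5,10)

-- Step 2, sessionB (second connection)
begin;
select * from t where id=9 for update;   -- not blocked: gap lock on the same gap

-- Step 3, sessionB
insert into t values(9,9,9);             -- blocks: waits for sessionA's gap lock

-- Step 4, sessionA
insert into t values(9,9,9);             -- waits for sessionB's gap lock: deadlock
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
```

Once sessionA's transaction is rolled back as the deadlock victim, its gap lock is released and sessionB's insert proceeds.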

It can be seen that gap locks make the same statement lock a larger range, which in practice reduces concurrency;

Summary

1. A phantom read occurs when a transaction queries the same range twice and the second query sees rows that the first did not; the term refers only to "newly inserted rows";

2. Under the repeatable read isolation level, ordinary queries are snapshot reads and do not see data inserted by other transactions; therefore, phantom reads can only appear under "current reads";

3. Phantom reads break the semantics of locking the rows that satisfy a condition, because new rows are inserted after the lock is taken;

4. Phantom reads make the data recovered from the binlog inconsistent with the real data: a new row is inserted before the transaction commits, so when the binlog is replayed, the transaction's update statement affects a row it never touched in the source database;

5. To solve the phantom read problem, InnoDB introduces the gap lock (Gap Lock), which locks the gap between two values on an index, either the primary key or a secondary index;

6. Unlike row locks, gap locks do not conflict with one another. What conflicts with a gap lock is the operation "insert a record into this gap";

7. A gap lock plus the row lock on the record at its right end is called a next-key lock, and each next-key lock is a left-open, right-closed interval;

8. Gap locks and next-key locks solve the phantom read problem, but they can also cause deadlocks under concurrency: acquiring a gap lock never conflicts, so several transactions can hold a lock on the same gap, and each one's insert into that gap then conflicts with the locks held by the others;

9. Gap locks were introduced to solve phantom reads, but they also reduce concurrency. If the isolation level is set to read committed, gap locks are no longer used, and the data/binlog inconsistency is avoided by combining that level with binlog_format=row; the cost is that a transaction no longer keeps one "consistent view" for its whole duration, so it cannot be relied on for work such as taking a consistent backup while the data is being updated;
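The combination mentioned in point 9 could be configured roughly as follows. This is a sketch, not a recommendation: it assumes an account with the privilege to change global variables, and a server version that exposes the transaction_isolation variable (older releases call it tx_isolation).

```sql
-- Use read committed for the current session only.
set session transaction isolation level read committed;

-- Log row images instead of statements, so a replay does not depend on
-- which rows a statement happens to match. Applies to sessions opened afterwards.
set global binlog_format = 'ROW';

-- Check the resulting settings.
select @@transaction_isolation, @@global.binlog_format;
```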

Read this article in conjunction with the next article;

Next article: "MySQL Practical Combat 45 Lectures" - Study Notes 21 "Locking Rules, Locking Cases, Deadlock Examples"

References for this chapter: 20 | What is a phantom read, and what is wrong with phantom reads? - Geek Time

Origin blog.csdn.net/minghao0508/article/details/128179144