Does the MySQL repeatable read isolation level solve phantom reading?

Hello everyone, I am Xiaolin.

As I mentioned in the previous article, the default isolation level of the MySQL InnoDB engine is "repeatable read", and it avoids phantom reads to a large extent (though not completely). It does so in two ways:

  • For snapshot reads (ordinary select statements), phantom reads are avoided through MVCC. Under the repeatable read isolation level, the data a transaction sees throughout its execution stays consistent with the data it saw when it started. Even if another transaction inserts a row in the meantime, that row cannot be queried, so phantom reads are avoided.
  • For current reads (select ... for update and similar statements), phantom reads are avoided through next-key locks (record lock + gap lock). Executing select ... for update adds a next-key lock. If another transaction tries to insert a record within the locked range, the insert statement blocks and cannot complete, so phantom reads are avoided.

These two mechanisms resolve phantom reads to a large extent, but there are still some cases in which phantom reads can occur.

This time, I will talk about this issue with you.

What is a phantom read?

First, let's take a look at how the MySQL documentation defines Phantom Read:

The so-called phantom problem occurs within a transaction when the same query produces different sets of rows at different times. For example, if a SELECT is executed twice, but returns a row the second time that was not returned the first time, the row is a “phantom” row.

For example, suppose a transaction executes the following query at times T1 and T2, with no other statements in between:

SELECT * FROM t_test WHERE id > 100;

If the result sets returned at T1 and T2 differ, a phantom read has occurred. For example:

  • The query returns 5 rows at T1 and 6 rows at T2: a phantom read has occurred.
  • The query returns 5 rows at T1 and 4 rows at T2: this is also a phantom read.

How does snapshot read avoid phantom read?

The repeatable read isolation level is implemented with MVCC (Multi-Version Concurrency Control). After a transaction starts, a Read View is created when its first query statement executes, and all subsequent query statements reuse this Read View. Through it, the transaction locates, in the undo log version chain, the data as it was at the start of the transaction, so every query in the transaction returns the same data. Even if other transactions insert new records in the meantime, those records cannot be queried, which avoids the phantom read problem.
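The visibility check described above can be sketched as a small Python model. This is only an illustration, not InnoDB's actual code: the ReadView class, its field names, and the transaction ids 40, 51, and 52 are assumptions made up for this example, and the rule is simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadView:
    """Hypothetical, simplified model of a Read View."""
    creator_trx_id: int    # transaction that created the view
    active_ids: frozenset  # transactions still uncommitted when the view was created
    max_trx_id: int        # next transaction id to be handed out at that moment

    def is_visible(self, row_trx_id: int) -> bool:
        if row_trx_id == self.creator_trx_id:
            return True   # a transaction always sees its own changes
        if row_trx_id >= self.max_trx_id:
            return False  # written by a transaction that started after the view
        return row_trx_id not in self.active_ids


def snapshot_read(view, version_chain):
    """Walk the version chain (newest first) and return the first visible version."""
    for trx_id, value in version_chain:
        if view.is_visible(trx_id):
            return value
    return None


# Transaction A (id 51) creates its view; transaction B (id 52) starts later.
view_a = ReadView(creator_trx_id=51, active_ids=frozenset({51}), max_trx_id=52)
old_row = [(40, "existing row")]       # committed before A started
new_row = [(52, "row inserted by B")]  # inserted after the view was created
```

In this model, snapshot_read(view_a, old_row) returns the existing row, while snapshot_read(view_a, new_row) returns None, which is why transaction A never notices the insert.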

As an experiment, the database table t_stu is as follows, where id is the primary key.

Then under the repeatable read isolation level, the execution order of two transactions is as follows:

[Figure: execution order of transaction A and transaction B]

The experiment shows that even though transaction B inserts a record in between, the two queries in transaction A return the same result set, so no phantom read occurs.

How does current reading avoid phantom reading?

In MySQL, only ordinary select queries are snapshot reads. Everything else is a current read, such as update, insert, and delete. These statements read the latest version of the data before executing and then carry out their operation.

This is easy to understand. Suppose you want to update a record that another transaction has already deleted and committed. Acting on stale data would cause a conflict, so an update must see the latest data.

In addition, select ... for update is also a current read: it reads the latest data every time it is executed.

Next, suppose that the current read select ... for update did not take any locks (in reality it does), and run another experiment.

In that case, the record inserted by transaction B would be returned by transaction A's second query (because it is a current read), so the two queries would return different result sets and a phantom read would occur.

Therefore, to solve the phantom read problem caused by current reads under the repeatable read isolation level, the InnoDB engine introduces gap locks.

Suppose the table has a gap lock on the id range (3, 5). Other transactions then cannot insert a record with id = 4, which effectively prevents phantom reads.
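As a rough illustration of the (3, 5) example, here is a toy Python model of the check an insert has to pass. The GapLock class and the insert_must_wait function are hypothetical names invented for this sketch; InnoDB's real lock manager is far more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapLock:
    """Hypothetical model of a gap lock on the open interval (low, high)."""
    owner: str  # transaction holding the lock
    low: int    # exclusive lower bound
    high: int   # exclusive upper bound

    def covers(self, key: int) -> bool:
        return self.low < key < self.high


def insert_must_wait(key, trx, locks):
    """True if another transaction holds a gap lock that covers the key."""
    return any(lock.owner != trx and lock.covers(key) for lock in locks)


# Transaction A holds a gap lock on the id range (3, 5).
locks = [GapLock(owner="A", low=3, high=5)]
```

With these locks, insert_must_wait(4, "B", locks) is True, so transaction B's insert of id = 4 has to wait, while an insert of id = 6 falls outside the gap and is not blocked by this lock.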

[Figure: gap lock (https://cdn.xiaolincoding.com/gh/xiaolincoder/mysql/lock/gap lock.drawio.png)]

To give a specific example, the scenario is as follows:

After transaction A executes this locking read statement, it adds a next-key lock covering the id range (2, +∞] to the records in the table (a next-key lock is a combination of a gap lock and a record lock).

Then, when transaction B executes its insert statement, it finds that transaction A holds a next-key lock on the insertion position. Transaction B therefore generates an insert intention lock and waits until transaction A commits. This prevents transaction A from seeing a phantom read caused by transaction B inserting a new record.

Are phantom reads completely resolved?

Although the repeatable read isolation level avoids phantom reads to a large extent, it still cannot eliminate them completely.

Here is an example of a scenario in which a phantom read occurs under the repeatable read isolation level.

Take the same table as an example:

Transaction A queries the record with id = 5. No such record exists in the table at this point, so the query returns nothing.

# Transaction A
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t_stu where id = 5;
Empty set (0.01 sec)

Then transaction B inserts a record with id = 5 and commits the transaction.

# Transaction B
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t_stu values(5, '小美', 18);
Query OK, 1 row affected (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

Now transaction A updates the record with id = 5. Transaction A cannot see this record, yet it is able to update it, which is an odd situation. When transaction A then queries id = 5 again, it sees the record inserted by transaction B, so a phantom read has occurred in this unusual scenario.

# Transaction A
mysql> update t_stu set name = '小林coding' where id = 5;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from t_stu where id = 5;
+----+--------------+------+
| id | name         | age  |
+----+--------------+------+
|  5 | 小林coding   |   18 |
+----+--------------+------+
1 row in set (0.00 sec)

The timing diagram of phantom reading is as follows:

[Figure: timing diagram of the phantom read]

Under the repeatable read isolation level, transaction A generates a Read View when it executes its first ordinary select statement. Transaction B then inserts a new record with id = 5 into the table and commits. Next, transaction A updates the record with id = 5. At that moment, the value of the hidden trx_id column of the new record becomes the transaction id of transaction A, so when transaction A queries the record again with an ordinary select statement, it can see the record, and a phantom read occurs.
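The sequence above can be traced with a toy Python model. It is a simplified sketch rather than InnoDB's implementation: the transaction ids 51 (transaction A) and 52 (transaction B), the VIEW_A dictionary, and the function names are all assumptions chosen for illustration.

```python
def is_visible(row_trx_id, creator_trx_id, active_ids, max_trx_id):
    """Simplified Read View check: may a snapshot read see this row version?"""
    if row_trx_id == creator_trx_id:
        return True   # a transaction always sees its own changes
    if row_trx_id >= max_trx_id:
        return False  # written by a transaction that started after the view
    return row_trx_id not in active_ids


# Hypothetical ids: transaction A is 51, transaction B is 52.
VIEW_A = dict(creator_trx_id=51, active_ids={51}, max_trx_id=52)


def snapshot_read(version_chain):
    """Newest-first walk of the version chain using A's Read View."""
    for trx_id, row in version_chain:
        if is_visible(trx_id, **VIEW_A):
            return row
    return None


# Transaction B inserts id = 5 and commits: the row's trx_id is 52.
chain = [(52, (5, "小美", 18))]
before_update = snapshot_read(chain)  # A's plain select finds nothing

# A's update is a current read, so it finds the row and puts a new
# version stamped with A's own id at the head of the chain.
chain.insert(0, (51, (5, "小林coding", 18)))
after_update = snapshot_read(chain)   # A's plain select now sees the row
```

Here before_update is None and after_update is the updated row: the update re-stamps the record with transaction A's own id, and the Read View always treats the creator's own changes as visible.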

Because of this special case, we consider that MVCC in MySQL InnoDB cannot completely avoid phantom reads.

Besides the scenario above, phantom reads also occur in the following scenario:

  • Transaction A first executes SELECT * FROM t_test WHERE id > 100; (a snapshot read) and gets 3 rows. Transaction B then inserts a row with id = 200 and commits. When transaction A executes SELECT * FROM t_test WHERE id > 100 for update; (a current read), it gets 4 rows, so a phantom read occurs here as well.
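The difference between the two reads in this scenario can be sketched in Python. This is a deliberately simplified model: the row ids 101, 150, and 180, the transaction ids 40 and 52, and the single VIEW_MAX_TRX_ID threshold (which ignores transactions that were still active when the view was created) are assumptions made for the example.

```python
# Each row is (id, trx_id of the transaction that wrote it). Ids are hypothetical.
committed_rows = [(101, 40), (150, 40), (180, 40)]
VIEW_MAX_TRX_ID = 52  # A's Read View: transaction ids >= 52 started after it was created


def snapshot_read(rows):
    """Plain SELECT ... WHERE id > 100: only versions the Read View may see."""
    return [rid for rid, trx_id in rows if rid > 100 and trx_id < VIEW_MAX_TRX_ID]


def current_read(rows):
    """SELECT ... WHERE id > 100 FOR UPDATE: latest committed rows, no Read View."""
    return [rid for rid, _ in rows if rid > 100]


first = snapshot_read(committed_rows)  # transaction A's snapshot read
committed_rows.append((200, 52))       # transaction B inserts id = 200 and commits
second = current_read(committed_rows)  # transaction A's current read
```

In this model first holds 3 ids and second holds 4, including 200. Repeating the snapshot read would still return the original 3 rows, so the phantom only appears because the second statement is a current read.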

To avoid phantom reads in these special scenarios, try to execute a current read statement such as select ... for update immediately after starting the transaction. It places a next-key lock on the records, which prevents other transactions from inserting a new record.

Origin blog.csdn.net/qq_34827674/article/details/126755426