A soul-searching question: how does MySQL's repeatable read solve the phantom read problem?

A phantom read occurs when the same query is executed twice within one transaction and the two executions return different result sets: the later query sees rows that the earlier one did not.
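To make the definition concrete, here is a tiny Python sketch. It compares the row ids returned by two runs of the same query and reports the rows that only the second run saw. The function name `phantom_rows` and the sample ids are illustrative only.

```python
def phantom_rows(first_ids, second_ids):
    """Ids returned by the second run of a query but not by the first.

    A non-empty result means the second execution of the same query saw
    rows that did not exist for the first one: a phantom read.
    """
    return sorted(set(second_ids) - set(first_ids))


# Illustrative ids from two runs of the same query in one transaction.
first_run = [2, 3, 4, 6, 7, 8, 9]
second_run = [2, 3, 4, 6, 7, 8, 9, 10]
print(phantom_rows(first_run, second_run))  # [10]
```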

The default transaction isolation level of MySQL InnoDB is repeatable read. The essence of repeatable read is that, within one transaction, querying the same data row always returns the same result.

These definitions show that repeatable read and phantom reads address substantially different problems. Repeatable read concerns the contents of the same row, while phantom reads concern the number of rows. So how does MySQL solve phantom reads? Let's find out. First, the table of contents:

1. How MySQL solves phantom reads

1.1 Snapshot read and current read

1.2 How snapshot read solves phantom reads

1.3 How current read solves phantom reads

2. Does repeatable read completely solve phantom reads?

2.1 Little-known phantom reads

3. Conclusion


1. How MySQL solves phantom reads

First, our premise: the MySQL storage engine is InnoDB, and the transaction isolation level is repeatable read.

As mentioned in the previous article, MySQL InnoDB relies on MVCC to implement transaction isolation levels. MVCC stands for Multi-Version Concurrency Control. Put simply, the same record can exist in multiple versions in the system at the same time.
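The idea of several coexisting versions can be pictured as a linked list of row versions, newest first, where each version points to the one it replaced. The following is a simplified Python model loosely inspired by InnoDB's hidden transaction-ID and roll-pointer columns (DB_TRX_ID, DB_ROLL_PTR). The class name, field names and transaction ids are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    """One version of a row: its value plus the id of the transaction that wrote it."""
    trx_id: int
    balance: int
    prev: Optional["RowVersion"] = None  # older version, reachable through the undo log


def versions(head):
    """Walk the version chain from the newest version back to the oldest."""
    out = []
    node = head
    while node is not None:
        out.append((node.trx_id, node.balance))
        node = node.prev
    return out


# Hypothetical history of one row: written by trx 100, then updated by 105 and 112.
v1 = RowVersion(trx_id=100, balance=100)
v2 = RowVersion(trx_id=105, balance=200, prev=v1)
v3 = RowVersion(trx_id=112, balance=300, prev=v2)

print(versions(v3))  # [(112, 300), (105, 200), (100, 100)]
```

The head of the chain (`v3`) is the latest version. The older versions remain reachable for readers that are not allowed to see the newest one.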

1.1 Snapshot read and current read

Current read: because of MySQL's MVCC, several versions of the same data row may exist at the same time. A current read always reads the latest version of a record and locks the records it reads. If other concurrent transactions are modifying the same row, the current transaction blocks until they release their locks.

For example, select ... lock in share mode (shared lock), select ... for update, update, insert and delete (exclusive locks) are all current reads, and these operations lock the records they read.

Snapshot read: a non-blocking read that takes no locks. An ordinary select is a snapshot read. Snapshot reads are implemented with MVCC, so whatever a transaction reads at any point is some historical version of the data, not necessarily the latest data at that moment.

MVCC can be viewed as a variant of locking. It avoids locking operations, which greatly reduces system overhead and improves performance.

Note that under MySQL's SERIALIZABLE isolation level, snapshot reads are upgraded to current reads, so even a plain select takes locks (InnoDB does this when autocommit is disabled).
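The rules above can be summarized in a small classifier. This is a hypothetical helper, `read_kind`, which uses crude string matching purely for illustration; it is not how MySQL parses SQL. It also accepts `FOR SHARE`, the MySQL 8.0 spelling of `LOCK IN SHARE MODE`.

```python
def read_kind(sql, isolation="REPEATABLE READ", autocommit=False):
    """Classify a statement as a 'current' (locking) or 'snapshot' (MVCC) read.

    Simplified rules taken from the text above:
    - select ... for update / lock in share mode (for share), update, insert
      and delete are current reads;
    - a plain select is a snapshot read, except under SERIALIZABLE with
      autocommit disabled, where InnoDB turns it into a locking read.
    """
    s = " ".join(sql.upper().replace(";", " ").split())  # normalize case and spaces
    if s.startswith(("UPDATE", "INSERT", "DELETE")):
        return "current"
    if s.endswith(("FOR UPDATE", "LOCK IN SHARE MODE", "FOR SHARE")):
        return "current"
    if isolation == "SERIALIZABLE" and not autocommit:
        return "current"
    return "snapshot"


print(read_kind("select * from bank_balance where balance > 0"))             # snapshot
print(read_kind("select * from bank_balance where balance > 0 for update"))  # current
print(read_kind("select * from bank_balance", isolation="SERIALIZABLE"))     # current
```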

1.2 How snapshot read solves phantom reads

Suppose we have an account balance table, bank_balance, with the following structure and 9 initial rows.

CREATE TABLE bank_balance (
  id int NOT NULL AUTO_INCREMENT,
  user_name varchar(45) NOT NULL COMMENT 'username',
  balance int NOT NULL DEFAULT '0' COMMENT 'Balance, unit: RMB cents, for example, 100 means RMB 1, the default is 0',
  wealth tinyint NOT NULL DEFAULT '0' COMMENT 'Wealth, 0: poor, 1: rich',
  PRIMARY KEY (id),
  UNIQUE KEY idx_bank_balance_user_name (user_name)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Initial data rows:

mysql> select * from bank_balance;
+----+-----------+-----------+--------+
| id | user_name | balance   | wealth |
+----+-----------+-----------+--------+
|  1 | 小埃      |         0 |      0 |
|  2 | 小克      | 300000000 |      0 |
|  3 | Tom       |       500 |      0 |
|  4 | Eric      |       100 |      0 |
|  5 | AI        |         0 |      0 |
|  6 | Alex      |       100 |      0 |
|  7 | Max       |       100 |      0 |
|  8 | Mike      |       100 |      0 |
|  9 | Lyn       |       200 |      0 |
+----+-----------+-----------+--------+
9 rows in set (0.01 sec)
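As a quick sanity check on the data, the following Python sketch filters the rows the same way `where balance > 0` would. Only the row values come from the output above; the names `ROWS` and `positive_balance_ids` are hypothetical. The result explains why the query in the following example returns 7 rows.

```python
# Initial bank_balance rows as (id, user_name, balance, wealth), copied from the output above.
ROWS = [
    (1, "小埃", 0, 0),
    (2, "小克", 300000000, 0),
    (3, "Tom", 500, 0),
    (4, "Eric", 100, 0),
    (5, "AI", 0, 0),
    (6, "Alex", 100, 0),
    (7, "Max", 100, 0),
    (8, "Mike", 100, 0),
    (9, "Lyn", 200, 0),
]


def positive_balance_ids(rows):
    """Mimic `select * from bank_balance where balance > 0`, returning only the ids."""
    return [row_id for row_id, _name, balance, _wealth in rows if balance > 0]


print(positive_balance_ids(ROWS))       # [2, 3, 4, 6, 7, 8, 9]
print(len(positive_balance_ids(ROWS)))  # 7
```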

Suppose two transactions, A and B, operate on this balance table concurrently. The timeline of their operations is as follows:

Transaction A runs two queries, at ③ and ⑤. Both use the same SQL statement, select * from bank_balance where balance > 0, to fetch all rows with balance > 0; as an ordinary select, this is a snapshot read.

  • ① and ②: Open the transactions.
  • ③: Transaction A runs select * from bank_balance where balance > 0 and gets 7 rows (ids 2, 3, 4, 6, 7, 8 and 9).

  • ④: Transaction B inserts the row (10, 'Loop', 100, 0).

  • ⑤: Transaction A runs select * from bank_balance where balance > 0 again and still gets 7 rows.

  • ⑥ and ⑦: Commit the transactions.

Why does the query at ⑤ still return 7 rows? Recall how MVCC works. At ③, transaction A generates a ReadView that records the transactions active at that moment, and transaction B falls within that range of active transactions. At ⑤, the hidden transaction ID column of the row inserted by B therefore fails A's visibility check. A then follows that row's undo log version chain, looking for a version it is allowed to see. Because B newly inserted the row, there is no older version and the chain ends in null, so the row is not returned.
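The visibility check can be sketched in Python. This is a simplified model whose field names loosely mirror InnoDB's internal ReadView members (creator id, list of active ids, up/low limit ids). The transaction ids are made up, and details such as delete-marking are ignored. If B had not yet been assigned an id when the view was built, its id would be at least `low_limit_id`, and the same check would still hide its insert.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class RowVersion:
    trx_id: int                          # hidden column: id of the transaction that wrote this version
    data: tuple                          # row contents of this version
    prev: Optional["RowVersion"] = None  # older version, reached through the undo log


@dataclass(frozen=True)
class ReadView:
    creator_trx_id: int         # transaction that owns this view
    active_ids: FrozenSet[int]  # transactions still active when the view was created
    low_limit_id: int           # next transaction id to be assigned at creation time

    @property
    def up_limit_id(self):
        # Smallest active id; with no active transactions, fall back to low_limit_id.
        return min(self.active_ids) if self.active_ids else self.low_limit_id

    def is_visible(self, trx_id):
        if trx_id == self.creator_trx_id:
            return True   # a transaction always sees its own changes
        if trx_id < self.up_limit_id:
            return True   # committed before the view was created
        if trx_id >= self.low_limit_id:
            return False  # started after the view was created
        return trx_id not in self.active_ids  # visible unless still active at creation


def snapshot_read(view, head):
    """Walk the version chain until a version visible to `view` is found."""
    node = head
    while node is not None:
        if view.is_visible(node.trx_id):
            return node.data
        node = node.prev
    return None  # no visible version: the row does not exist for this view


# Hypothetical ids: A is trx 200, B is trx 201, both active when A builds its view at ③.
view_a = ReadView(creator_trx_id=200, active_ids=frozenset({200, 201}), low_limit_id=202)
inserted_by_b = RowVersion(trx_id=201, data=(10, "Loop", 100, 0))
print(snapshot_read(view_a, inserted_by_b))  # None: the new row is not returned at ⑤
```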

1.3 How current read solves phantom reads

The table and timeline are the same as above, but the query is replaced with the current read select * from bank_balance where balance > 0 for update. If no locks were taken, a phantom read would occur, as follows:

  • ① and ②: Open the transactions.
  • ③: Transaction A runs select * from bank_balance where balance > 0 for update and gets 7 rows.

  • ④: Transaction B inserts the row (10, 'Loop', 100, 0).
  • ⑤: Transaction A runs select * from bank_balance where balance > 0 for update again and gets 8 rows, including the new row with id = 10.

  • ⑥ and ⑦: Commit the transactions.

Steps ③ and ⑤ both query the records with balance > 0, but the results differ: this is a phantom read.
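The hypothetical lock-free scenario can be simulated in a few lines of Python. A current read always looks at the latest version of each row. Without a lock protecting the range, B's insert at ④ lands between A's two reads. The dict-based "table" and the function name are illustrative assumptions; the balances are the initial data from the table above.

```python
def current_read_ids(table):
    """A current read looks at the latest version of each row; `table` maps id -> balance."""
    return sorted(row_id for row_id, balance in table.items() if balance > 0)


# Latest balances by id, matching the initial data above.
table = {1: 0, 2: 300000000, 3: 500, 4: 100, 5: 0, 6: 100, 7: 100, 8: 100, 9: 200}

first = current_read_ids(table)   # ③: 7 rows
table[10] = 100                   # ④: B inserts (10, 'Loop', 100, 0); no lock stops it
second = current_read_ids(table)  # ⑤: 8 rows
print(len(first), len(second))    # 7 8
```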

To solve the phantom read problem, the MySQL InnoDB engine introduces the next-key lock, which combines a gap lock with a record lock.

Record locks, as the name suggests, are locks on data rows. So what is a gap lock?

Suppose the bank_balance table contains only the records with primary key ids 4 and 6, both with balance > 0. When transaction A runs select * from bank_balance where balance > 0 for update, other transactions cannot insert a record with id = 5. This is because transaction A has locked the open interval (4, 6), and that lock is a gap lock.

If the record with id = 6 is locked as well, the gap lock and the record lock together cover the left-open, right-closed interval (4, 6]. This combined lock is called a next-key lock. If the record with id = 4 is also locked, so that the closed interval [4, 6] is covered, that record belongs to the next-key lock whose interval ends at 4.
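The difference between the two lock ranges can be expressed as two interval tests. This is a minimal Python sketch; the helper names are hypothetical, and the ids 4, 5 and 6 come from the example above.

```python
def in_gap_lock(key, lo, hi):
    """Gap lock on the open interval (lo, hi): covers only the space between records."""
    return lo < key < hi


def in_next_key_lock(key, lo, hi):
    """Next-key lock (lo, hi]: the gap before record `hi` plus the record itself."""
    return lo < key <= hi


# Records with id 4 and 6 exist; transaction A holds locks on the range between them.
print(in_gap_lock(5, 4, 6))       # True:  inserting id = 5 must wait
print(in_gap_lock(6, 4, 6))       # False: record 6 itself is not part of the gap
print(in_next_key_lock(6, 4, 6))  # True:  the next-key lock also covers record 6
```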

Back to the earlier example, in which transaction B inserts after transaction A's query:

At ③, transaction A executes the locking read select * from bank_balance where balance > 0 for update. Because the balance column has no index, the query scans the whole primary key and locks every record in the table. From the primary key ids of the existing records, this forms the following next-key locks: (-∞, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6], (6, 7], (7, 8], (8, 9] and (9, +∞]. Each next-key lock is a left-open, right-closed interval.

Then, at ④, transaction B executes the insert statement. The position id = 10 falls inside the interval (9, +∞], which transaction A has locked. B's insert requests an insert intention lock, which conflicts with A's lock, so B blocks until transaction A commits. This avoids the phantom read described above.
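The following Python sketch derives the next-key intervals from the existing primary keys and checks whether an insert would have to wait. It is a simplified model: `next_key_intervals` and `insert_blocked` are hypothetical helpers, the last interval stands for the supremum pseudo-record, and insert intention locks are reduced to a plain "falls inside a locked interval" test.

```python
import math


def next_key_intervals(keys):
    """One left-open, right-closed interval (prev, k] per existing key, plus (max, +inf]."""
    ordered = sorted(keys)
    lows = [-math.inf] + ordered
    highs = ordered + [math.inf]
    return list(zip(lows, highs))


def insert_blocked(key, intervals):
    """An insert must wait if its key falls into any interval another transaction locked."""
    return any(lo < key <= hi for lo, hi in intervals)


# ③: A's `... where balance > 0 for update` scans the whole primary key (ids 1..9).
locks = next_key_intervals(range(1, 10))
print(locks[0], locks[-1])        # (-inf, 1) (9, inf)
print(insert_blocked(10, locks))  # True: B's insert of id = 10 waits until A commits
```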

The example above is a special case because the whole table is locked. Suppose instead that the table contains only two records, (4, 'Eric', 100, 0) and (10, 'Loop', 100, 0). When we execute select * from bank_balance where id > 8 for update, only two next-key locks are formed: (4, 10] and (10, +∞]. If another transaction then executes insert into bank_balance values(5, 'MALL', 100, 0), it will block. Executing insert into bank_balance values(2, 'MALL', 100, 0) will not block, because id = 2 is not locked.

Note in particular that next-key locks are formed from the existing records, not from the query condition. Some readers have asked why the two next-key locks in the example above are not (8, 10] and (10, +∞]; this is the reason.
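To see why the first interval starts at 4 rather than 8, here is a simplified Python model of a primary-key range scan for `where id > lower for update`. It locks each scanned record together with the gap before it, then the supremum. The helper names are hypothetical, and real InnoDB behaviour has more cases (and has changed across versions) than this sketch covers.

```python
import math


def range_scan_locks(keys, lower):
    """Next-key locks for `select ... where id > lower for update` on the primary key.

    Simplified model: each scanned record k (k > lower) is locked together with
    the gap before it, (previous record, k], and the scan ends by locking the
    supremum, (last record, +inf]. Interval bounds come from existing records,
    never from the value `lower` in the query condition.
    """
    locks = []
    prev = -math.inf
    for k in sorted(keys):
        if k > lower:
            locks.append((prev, k))
        prev = k
    locks.append((prev, math.inf))  # supremum pseudo-record
    return locks


def insert_blocked(key, locks):
    """An insert waits if its key falls into a locked (lo, hi] interval."""
    return any(lo < key <= hi for lo, hi in locks)


locks = range_scan_locks([4, 10], lower=8)
print(locks)                     # [(4, 10), (10, inf)]
print(insert_blocked(5, locks))  # True:  insert of id = 5 waits
print(insert_blocked(2, locks))  # False: id = 2 is not locked
```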

2. Does repeatable read completely solve phantom reads?

2.1 Little-known phantom reads

MySQL InnoDB's default repeatable read isolation level, together with next-key locks, solves the phantom read problem to a certain extent, but phantom reads can still occur in special cases.

In the first case, transaction A starts first and uses a snapshot read. Transaction B starts later, inserts a new row and commits. Transaction A then updates that row, after which A's query can see the row that B inserted.

③: There is no row with id = 5 in the table, so transaction A's query returns 0 rows.

④-⑥: Transaction B starts, inserts a record with id = 5 and commits.

⑦: Transaction A updates the record with id = 5.

⑧: Transaction A queries the record with id = 5 and gets 1 row, which is a phantom read.

Under the MVCC rules, transaction A's query at ⑧ should not return the record with id = 5. Because A updated that record first, the record is returned anyway. This is quite convoluted, and following it requires a careful understanding of how MVCC works:

The snapshot read at ③ takes no locks, so transaction B can insert successfully. The update at ⑦ is a current read, so it sees the latest version and can update the row with id = 5. The new version it writes carries A's own transaction ID in its hidden transaction ID column. A ReadView always treats its own transaction's changes as visible, so the snapshot read at ⑧ returns the record with id = 5.
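This first case can be replayed with a simplified ReadView model in Python. The ids and row contents are hypothetical, and for simplicity A is given its transaction id up front. The key point is the first visibility rule: once A's update stamps the row with A's own id, the version becomes visible to A's existing view.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ReadView:
    creator_trx_id: int
    active_ids: FrozenSet[int]
    low_limit_id: int

    def is_visible(self, trx_id):
        if trx_id == self.creator_trx_id:
            return True  # own changes are always visible
        up_limit_id = min(self.active_ids) if self.active_ids else self.low_limit_id
        if trx_id < up_limit_id:
            return True
        if trx_id >= self.low_limit_id:
            return False
        return trx_id not in self.active_ids


@dataclass
class RowVersion:
    trx_id: int
    data: str
    prev: Optional["RowVersion"] = None


def snapshot_read(view, head):
    """Return the newest version visible to `view`, or None."""
    node = head
    while node is not None:
        if view.is_visible(node.trx_id):
            return node.data
        node = node.prev
    return None


TRX_A, TRX_B = 300, 301  # hypothetical transaction ids

# ③: A's snapshot read builds its view; B has not started yet.
view_a = ReadView(creator_trx_id=TRX_A, active_ids=frozenset({TRX_A}), low_limit_id=301)

# ④-⑥: B inserts the row with id = 5 and commits.
inserted = RowVersion(trx_id=TRX_B, data="id=5 as inserted by B")
print(snapshot_read(view_a, inserted))  # None: still invisible to A

# ⑦: A's update is a current read; it finds the row and writes a new version
# stamped with A's own id, linked to B's version through the undo log.
updated = RowVersion(trx_id=TRX_A, data="id=5 as updated by A", prev=inserted)
print(snapshot_read(view_a, updated))   # id=5 as updated by A  (the phantom at ⑧)
```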

In the second case, a transaction does not use a current read at the beginning. If another transaction inserts data and commits, and the first transaction then performs a current read, a phantom read occurs.

③: There is no row with id = 5 in the table, so transaction A's snapshot read returns 0 rows.

④-⑥: Transaction B starts, inserts a record with id = 5 and commits.

⑦: Transaction A queries the row with id = 5 using a current read and gets 1 row, which is a phantom read.

This happens because a snapshot read does not take next-key locks, so other transactions can insert rows within this transaction's query range. When another transaction inserts data and the current transaction then performs a current read, the current read finds the new records, which is a phantom read.
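The second case can be modelled by contrasting the two kinds of read on the same version chain. The Python sketch below uses a simplified ReadView and hypothetical ids. `current_read` simply returns the newest version; the locks a real current read would take are not modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ReadView:
    creator_trx_id: int
    active_ids: FrozenSet[int]
    low_limit_id: int

    def is_visible(self, trx_id):
        if trx_id == self.creator_trx_id:
            return True  # own changes are always visible
        up_limit_id = min(self.active_ids) if self.active_ids else self.low_limit_id
        if trx_id < up_limit_id:
            return True
        if trx_id >= self.low_limit_id:
            return False
        return trx_id not in self.active_ids


@dataclass
class RowVersion:
    trx_id: int
    data: str
    prev: Optional["RowVersion"] = None


def snapshot_read(view, head):
    """Return the newest version visible to `view`, or None."""
    node = head
    while node is not None:
        if view.is_visible(node.trx_id):
            return node.data
        node = node.prev
    return None


def current_read(head):
    """A current read ignores the ReadView and returns the latest version."""
    return head.data if head is not None else None


TRX_A, TRX_B = 400, 401  # hypothetical transaction ids
view_a = ReadView(creator_trx_id=TRX_A, active_ids=frozenset({TRX_A}), low_limit_id=401)  # ③

row5 = RowVersion(trx_id=TRX_B, data="id=5 inserted by B")  # ④-⑥: B inserts and commits

print(snapshot_read(view_a, row5))  # None: the snapshot read still sees 0 rows
print(current_read(row5))           # id=5 inserted by B  (⑦: the current read sees 1 row)
```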

In general, it is recommended during development to issue the for update query as early as possible after starting a transaction. This takes next-key locks immediately and avoids phantom reads.

3. Conclusion

I am tin, an ordinary engineer trying to make myself better. My experience and knowledge are limited. If you find anything inappropriate in this article, you are very welcome to contact me with suggestions, and I will carefully review and revise it.



Source: blog.csdn.net/wdj_yyds/article/details/131897705