Detailed explanation of the principle of MySQL gap locks

Table of contents

I. Introduction

II. MVCC in MySQL

2.1 What is MVCC

2.2 Composition of MVCC

2.2.1 Undo log multi-version chain

2.2.2 ReadView

2.2.3 Snapshot read and current read

III. Transaction issues at the RR level

3.1 Problems solved by RR isolation level

3.1.1 Phantom read problem

3.2 Demonstration of phantom reading effect

3.2.1 Prepare test tables and data

3.2.2 Modify transaction level

3.2.3 Open two sessions and perform transaction operations

3.3 Gap locks solve phantom reading problems

3.3.1 Overview of Gap Lock

3.3.2 Solve the problem of phantom reading based on snapshot reading

3.3.3 The current read is based on the gap lock to solve the phantom read problem

3.4 Does repeatable reading necessarily solve the problem of phantom reading?

3.4.1 Cause Analysis

3.4.2 Summary

IV. Closing remarks


I. Introduction

Locks are an important mechanism that MySQL provides to isolate the reads and writes of different transactions. A good locking mechanism improves the ability to process transactions concurrently under multi-threading. MySQL has many types of locks for different usage scenarios. By granularity, they can be divided into table locks and row locks. By lock state, they can be divided into shared locks and exclusive locks. By mode, they can be divided into optimistic locks and pessimistic locks, and so on. Different kinds of locks suit different usage scenarios, and the use of locks is also closely tied to MySQL's transaction isolation mechanism. This article discusses in depth a kind of lock that is easily overlooked in MySQL, the gap lock, and the issues related to it.

II. MVCC in MySQL

Before talking about gap locks, you need to understand MySQL's MVCC mechanism. Gap locks exist because of how MySQL handles transactions, and the underlying control of transactions is provided by MVCC. Following this line of thought, we can clear the fog step by step.

2.1 What is MVCC

MVCC stands for multi-version concurrency control. It implements concurrency control in the database by managing multiple versions of each data row.

This technique is what makes consistent reads possible under InnoDB's transaction isolation levels. In other words, a query can read rows that another transaction is currently updating and see their values from before the update, so the query does not have to wait for the other transaction to release its lock.

2.2 Composition of MVCC

The implementation of MVCC relies mainly on the following three parts:

  • Hidden fields: every data row carries hidden fields (described in section 2.2.1);
  • The undo log version chain, which records the previous versions of a data row for rollback;
  • ReadView (read view), which MVCC uses to decide which version of a row a snapshot read returns. It records the ids of the transactions that are currently active (uncommitted) in the system, typically as a list;

The core of MVCC is the undo log multi-version chain plus the Read View. The "MV" part keeps historical versions of the data in the undo log to provide multi-version management. The "CC" part is handled by the Read View, whose rules decide whether a given version is visible. Different isolation levels generate the Read View at different times, and this is how the different isolation levels are implemented.

2.2.1 Undo log multi-version chain

The undo log is also called the rollback log. It records the data as it was before a modification and has two functions: providing rollback (guaranteeing the atomicity of transactions) and supporting MVCC (multi-version concurrency control).

For example, suppose an update statement modifies the row with id 1. If the transaction fails to commit, the change must be rolled back. How does the MySQL engine know what to roll back to? It uses the undo log, which records the data before modification and can therefore be used for transaction rollback.

Each row has two hidden fields that record which transaction last changed it and where its previous version is:

  • trx_id: transaction id, the id of the transaction that last updated this row (InnoDB names this column DB_TRX_ID);
  • roll_pointer: rollback pointer, which points to the previously generated undo log record (InnoDB names this column DB_ROLL_PTR);

The figure below is a schematic diagram of the undo log version chain for a MySQL transaction operation. It records the undo log state when multiple transactions modify the same row;

As the figure shows, each row may have multiple versions, and the versions are linked together through the undo log chain. With this design, a transaction that needs to roll back can follow the chain to the previous version, and a reading transaction can only see values committed before its snapshot, not values committed later.
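The version chain described above can be sketched as a linked list. This is a minimal illustration, not InnoDB's actual storage format: the `RowVersion` class and the `update`, `rollback`, and `chain` helpers are hypothetical names introduced only for this example, and the values and transaction ids are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    """One version of a row; a simplified stand-in for an undo log record."""
    value: int
    trx_id: int                                  # transaction that wrote this version
    roll_pointer: Optional["RowVersion"] = None  # previous (older) version, if any


def update(current: RowVersion, new_value: int, trx_id: int) -> RowVersion:
    """Write a new version whose roll_pointer links back to the old one."""
    return RowVersion(value=new_value, trx_id=trx_id, roll_pointer=current)


def rollback(current: RowVersion) -> RowVersion:
    """Undo the latest change by stepping back along the roll_pointer."""
    if current.roll_pointer is None:
        raise ValueError("no older version to roll back to")
    return current.roll_pointer


def chain(current: Optional[RowVersion]) -> list:
    """Return (trx_id, value) pairs from the newest version to the oldest."""
    out = []
    while current is not None:
        out.append((current.trx_id, current.value))
        current = current.roll_pointer
    return out


row = RowVersion(value=3, trx_id=10)   # inserted by transaction 10
row = update(row, 4, trx_id=20)        # updated by transaction 20
row = update(row, 5, trx_id=30)        # updated by transaction 30
print(chain(row))                      # newest version first
print(chain(rollback(row)))            # after undoing transaction 30's change
```

Each update keeps the old version reachable, which is what lets a rollback restore it and lets a reader walk back to an older version.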

2.2.2 ReadView

Read View is the consistent read view that InnoDB uses when implementing MVCC. It supports the implementation of the RC (Read Committed) and RR (Repeatable Read) isolation levels.

  • A simple way to understand Read View is as a photograph that records the state of the data at a certain moment. Whenever the data is read later through that view, it is still the data in the original photo and does not change;
  • As noted in section 2.2, Read View is what MVCC uses to pick the version a snapshot read returns, and it records the ids of the transactions that are active (uncommitted) at the moment it is created;

Read View has four important fields:

  • m_ids: the ids of the transactions that are active (started but not yet committed) when the Read View is created;
  • min_trx_id: the smallest value in m_ids;
  • max_trx_id: the id that will be assigned to the next transaction, that is, the largest id assigned so far plus one;
  • creator_trx_id: the id of the transaction that created this Read View;

The figure below shows the fields in Read View that describe the state of the current transaction; compare it with the field descriptions above. When a transaction executes a query for the first time, a consistent view (Read View) is generated that saves this information. When the transaction queries again, it takes the latest version of the record and compares its trx_id against the Read View. If that version does not satisfy the visibility rules, the query follows the rollback pointer to the previous version in the undo log and compares again, until it reaches a version that satisfies the rules.

How does Read View determine whether a certain version of a record is visible? The rules are roughly as follows:

1) If the trx_id of the version falls in the green part (trx_id < min_trx_id), the version was generated by a transaction that had already committed, so it is readable;

2) If the trx_id of the version falls in the red part (trx_id >= max_trx_id), the version was generated by a transaction that started after the Read View was created, so it cannot be read;

3) If the trx_id of the version falls in the yellow part (min_trx_id <= trx_id < max_trx_id), there are two cases:

  • If the trx_id is in m_ids, the transaction was still uncommitted when the Read View was created, so this version is unreadable (the exception is the transaction's own changes, trx_id = creator_trx_id, which it can always read);
  • If the trx_id is not in m_ids, the transaction had already committed, so this version is readable;
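The visibility rules can be written as one small function. This is a sketch under the field definitions given above, not InnoDB's source code: the `ReadView` class and `is_visible` function are hypothetical names, and the transaction ids in the example are invented. It also treats a version written by the view's own transaction as visible.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ReadView:
    m_ids: FrozenSet[int]   # ids of transactions active when the view was created
    min_trx_id: int         # smallest id in m_ids
    max_trx_id: int         # id that will be assigned to the next transaction
    creator_trx_id: int     # id of the transaction that owns this view


def is_visible(trx_id: int, view: ReadView) -> bool:
    """Decide whether a row version written by trx_id is visible to the view."""
    if trx_id == view.creator_trx_id:
        return True                      # a transaction always sees its own writes
    if trx_id < view.min_trx_id:
        return True                      # committed before the view was created
    if trx_id >= view.max_trx_id:
        return False                     # started after the view was created
    return trx_id not in view.m_ids      # in between: visible only if already committed


# Hypothetical state: transactions 12 and 15 are active, 15 owns the view,
# and the next transaction to start would receive id 18.
view = ReadView(m_ids=frozenset({12, 15}), min_trx_id=12,
                max_trx_id=18, creator_trx_id=15)
for trx_id in (8, 12, 13, 15, 18):
    print(trx_id, is_visible(trx_id, view))
```

A reader applies this check to the newest version first and follows the rollback pointer to older versions until the check passes.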

Among MySQL's transaction isolation levels, RC (read committed) and RR (repeatable read) are both implemented on top of MVCC. They differ in when the Read View is generated:

  • At the RC isolation level, a new Read View is generated every time a select statement is executed;
  • At the RR isolation level, a Read View is generated when the first select statement is executed, and all subsequent select statements in the same transaction reuse it;
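The difference between the two levels can be seen in a toy single-row store. This is a simplified sketch, not how InnoDB is implemented: the `Engine` class and `two_selects` function are hypothetical, the Read View is reduced to a tuple of (active ids, next id, creator id), and the row values are made up.

```python
class Engine:
    """Toy single-row MVCC store used only to illustrate Read View timing."""

    def __init__(self, value):
        self.next_trx_id = 2
        self.active = set()             # ids of uncommitted transactions
        self.versions = [(1, value)]    # (trx_id, value) pairs, newest first

    def begin(self):
        trx_id = self.next_trx_id
        self.next_trx_id += 1
        self.active.add(trx_id)
        return trx_id

    def commit(self, trx_id):
        self.active.discard(trx_id)

    def write(self, trx_id, value):
        self.versions.insert(0, (trx_id, value))

    def read_view(self, creator):
        # Snapshot of the active set and of the next id to be assigned.
        return (frozenset(self.active), self.next_trx_id, creator)

    def read(self, view):
        m_ids, max_trx_id, creator = view
        for trx_id, value in self.versions:   # walk from newest to oldest
            if trx_id == creator or (trx_id < max_trx_id and trx_id not in m_ids):
                return value
        return None


def two_selects(level):
    """Select, let another transaction commit an update, then select again."""
    db = Engine(value=3)
    reader = db.begin()
    view = db.read_view(reader)         # created by the first select
    first = db.read(view)

    writer = db.begin()
    db.write(writer, 4)
    db.commit(writer)

    if level == "RC":
        view = db.read_view(reader)     # RC: a fresh Read View for every select
    second = db.read(view)              # RR: the first Read View is reused
    return first, second


print("RC:", two_selects("RC"))
print("RR:", two_selects("RR"))
```

Under RC the second select sees the value committed in between, while under RR it still sees the value from the first select.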

2.2.3 Snapshot read and current read

Snapshot read

Snapshot read, also called consistent read, reads snapshot data. A plain SELECT without a locking clause is a snapshot read, that is, a non-blocking read that takes no locks, for example: SELECT * FROM user WHERE ...

Snapshot reads exist to improve concurrency. They are implemented on top of MVCC, which in many cases avoids locking and reduces overhead.

Current read

A current read returns the latest version of the record (the latest data, not a historical version). While reading, it must also ensure that other concurrent transactions cannot modify the record, so the record it reads is locked. A locking SELECT, as well as any insert, delete, or update, is a current read. For example:

SELECT * FROM student LOCK IN SHARE MODE; # shared lock

SELECT * FROM student FOR UPDATE; # exclusive lock

III. Transaction issues at the RR level

RR stands for repeatable read: the data a transaction sees throughout its execution stays consistent with what it saw on its first read. When learning MySQL's transaction isolation levels and the problems each one solves, do you remember which problems this level solves, and which ones remain?

3.1 Problems solved by RR isolation level

The following table lists in detail the problems that each transaction isolation level can and cannot solve. In the standard definition, the RR level solves the dirty read and non-repeatable read problems, but phantom reads are still not resolved.

3.1.1 Phantom read problem

Simply put, a phantom read happens when a transaction reads a range of rows, another transaction inserts a new row into that range, and the first transaction then reads the range again and finds a new "phantom" row.

Note that at the repeatable read isolation level, ordinary queries are snapshot reads by default (subsequent queries keep using the snapshot taken by the first one), so rows inserted by other transactions are not seen. Phantom reads therefore only appear with current reads (for example, a query with for update). This point is easy to get confused about, and it is a common interview question: can phantom reads occur at the RR isolation level? The answer depends on whether the read is a snapshot read or a current read, as the case demonstrations later show;

In MVCC multi-version concurrency control, read operations can be divided into two categories: snapshot read ( Snapshot Read) and current read ( Current Read). The snapshot read and current read have been introduced above, and the main problems they solve are as follows:

Snapshot read

A snapshot read lets an ordinary SELECT read data without locking it, which solves two problems that locking would otherwise cause:

1) Data cannot be read while it is being modified, because of the lock;

2) Data cannot be modified while it is being read, because of the lock.

Current read

A current read returns the latest data in the database. Unlike a snapshot read, it must both read the latest data and preserve transaction isolation, so it has to lock the data it reads. Insert, update, and delete operations, as well as select ... for update, are current reads and require locks.

3.2 Demonstration of phantom reading effect

The following demonstrates the effect of phantom reading based on the read committed transaction isolation level

3.2.1 Prepare test tables and data

Create the following table and insert several pieces of data;

CREATE TABLE `test` (
  `id` int(12) NOT NULL,
  `x` int(12) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

insert into test values(1,3);
insert into test values(2,3);
insert into test values(3,3);
insert into test values(5,3);
insert into test values(17,3);

Complete operation steps

order | Transaction A                              | Transaction B
1     | begin;                                     |
2     | select * from test where x=3 for update;   |
3     |                                            | insert into test values(19,3);
4     | select * from test where x=3 for update;   |
5     | commit;                                    |

3.2.2 Modify transaction level

Check the current transaction isolation level of the database. By default it is repeatable read;

SELECT @@transaction_isolation; # use @@tx_isolation on versions before MySQL 5.7.20

To simulate the phantom read effect, first change the transaction isolation level of the session with the following command:

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

After the setting is complete, query again and you can see that the transaction isolation level has become read committed;

3.2.3 Open two sessions and perform transaction operations

Execute the following command in the first mysql session window

begin;
select * from test where x=3 for update;

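At this point, insert a row in the second session window by running the insert from step 3 of the table above, insert into test values(19,3);

Then query the rows with x=3 again in the first session window. The row inserted by the second session appears in the result, which is a phantom read;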

3.3 Gap locks solve phantom reading problems

3.3.1 Overview of Gap Lock

How do phantom reads come about? Row locks can only lock rows that already exist, but inserting a new record changes the "gap" between existing records, which a row lock does not cover. To solve the phantom read problem that current reads cause at the repeatable read isolation level, the InnoDB engine introduces next-key locks, which are a combination of record locks and gap locks.

  • Record lock: a lock on a single index record. (Supported at both the RC and RR isolation levels);
  • Gap lock: a lock on the gap between index records (excluding the records themselves), which prevents other transactions from inserting into that gap. (Range lock, used at the RR isolation level);
  • Next-key lock: a combination of a record lock and a gap lock, which locks an index record together with the gap before it. (Record lock + range lock, used at the RR isolation level);
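To make the three lock ranges concrete, the sketch below computes them for the primary key values of the test table (1, 2, 3, 5, 17). It only illustrates the intervals and does not model InnoDB's lock manager; the function names `gap_for_insert` and `next_key_intervals` are hypothetical, and infinity stands in for the ends of the index.

```python
import bisect
import math


def gap_for_insert(keys, new_key):
    """Return the open interval (low, high) between existing keys that new_key falls into."""
    ordered = sorted(keys)
    if new_key in ordered:
        raise ValueError("key already exists; it is a record, not a gap")
    pos = bisect.bisect_left(ordered, new_key)
    low = ordered[pos - 1] if pos > 0 else -math.inf
    high = ordered[pos] if pos < len(ordered) else math.inf
    return (low, high)


def next_key_intervals(keys):
    """Return (low, high) pairs read as (low, high]: a record plus the gap before it.

    The last pair ends at +infinity and stands for the gap after the largest record.
    """
    bounds = [-math.inf] + sorted(keys) + [math.inf]
    return list(zip(bounds[:-1], bounds[1:]))


ids = [1, 2, 3, 5, 17]
print(gap_for_insert(ids, 4))     # the gap a new id 4 would land in
print(gap_for_insert(ids, 20))    # the gap after the largest record
print(next_key_intervals(ids))
```

An insert has to land in exactly one of these gaps, so a lock on the gap is enough to hold back the insert even though the new row does not exist yet.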

You can compare the following picture to understand the meaning of the above locks

3.3.2 Solve the problem of phantom reading based on snapshot reading

The complete operation steps and sequence are as follows

order | Transaction 1                      | Transaction 2
1     | begin;                             |
2     | select * from test where id>1;     | begin;
3     |                                    | insert into test values(20,3);
4     |                                    | commit;
5     | select * from test where id>1;     |
6     | commit;                            |

Still using the table above, adjust the transaction isolation level to repeatable read before starting;

SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT @@transaction_isolation; # use @@tx_isolation on versions before MySQL 5.7.20

Open the first session and query the data with id>1

begin;
select * from test where id > 1;

Open a second session and insert a piece of data

begin;
insert into test values(20,3);
commit;

The first session queries the rows with id>1 again. The row inserted by the second session does not appear in the result of the current transaction;

Commit the transaction of the first session and query again; the row can now be found

Summary:

At the repeatable read isolation level, MVCC is used to avoid phantom reads. Concretely, a Read View (a snapshot of the current state of the database) is generated by the first select statement after the transaction starts, and every subsequent snapshot read in the transaction uses this same Read View.

In the operations above, the Read View is generated in step 2, so the data read in step 5 is the same as in step 2, and the phantom read is avoided.

3.3.3 The current read is based on the gap lock to solve the phantom read problem

Operations such as select ... lock in share mode (shared lock), select ... for update, update, insert, and delete are all current reads, and they read the latest version of the record. For current reads, next-key locks (record lock plus gap lock) are used to avoid phantom reads: the locks block conflicting current reads and inserts from other transactions.

The operation steps are as follows:

order | Transaction A                                 | Transaction B
1     | begin;                                        |
2     | select * from test where id>1 for update;     | begin;
3     |                                               | insert into test values(20,3);

The first session transaction performs the following operations

begin;
select * from test where id>1 for update;

The second session starts a transaction and inserts a row (if id 20 already exists from the previous demonstration, use any unused id greater than 1 instead)

begin;
insert into test values(20,3);

As the result shows, the second session's insert is blocked and cannot complete;

In step 2, transaction A's select ... for update is a current read. It locks the existing records with id>1 and also the gaps around them, including the gap after the largest record, so the whole range (1,+∞) is covered. These are exclusive locks, and transaction B's insert falls inside the locked range, so it must wait until transaction A commits or rolls back. This is how phantom reads are avoided for current reads.
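The blocking behaviour can be modelled with a toy lock table. This is only a sketch of the idea, not InnoDB's lock manager: the `RangeLocks` class and its methods are hypothetical, locks are reduced to open ranges owned by a named transaction, and "must wait" is represented by a boolean instead of real blocking.

```python
import math


class RangeLocks:
    """Toy lock table: each entry is (owner, low, high) for the open range (low, high)."""

    def __init__(self):
        self.locks = []

    def lock_range(self, owner, low, high=math.inf):
        """Record that owner holds a lock covering every key between low and high."""
        self.locks.append((owner, low, high))

    def release(self, owner):
        """Drop all locks held by owner, as happens on commit or rollback."""
        self.locks = [lock for lock in self.locks if lock[0] != owner]

    def can_insert(self, owner, key):
        """An insert must wait if another transaction locked the range containing key."""
        return not any(
            holder != owner and low < key < high
            for holder, low, high in self.locks
        )


locks = RangeLocks()
locks.lock_range("A", low=1)        # models: select * from test where id>1 for update
print(locks.can_insert("B", 20))    # transaction B has to wait
locks.release("A")                  # transaction A commits
print(locks.can_insert("B", 20))    # transaction B can now insert
```

Because the lock covers a range and not individual rows, it stops inserts of rows that did not exist when transaction A ran its query.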

3.4 Does repeatable reading necessarily solve the problem of phantom reading?

Mysql's default transaction isolation level (repeatable read) can solve the problem of phantom reading in most scenarios, but it still cannot be completely solved in some scenarios, see the following operation;

order | Transaction A                                 | Transaction B
1     | begin;                                        |
2     | select * from test where id>1;                | begin;
3     |                                               | insert into test values(21,3);
4     |                                               | commit;
5     | select * from test where id>1 for update;     |
6     | commit;                                       |

Interested readers can follow these steps to see the effect. The operations can be analyzed as follows:

  • Transaction A uses a snapshot read in step 2. At this point a Read View is generated, and the query returns all rows in the range id>1 that are visible to it;
  • Transaction B inserts a row with id 21 in step 3. Transaction A's snapshot read did not lock anything, so transaction B can insert normally;
  • In step 5, transaction A's query returns the row inserted by transaction B, so a phantom read occurs;

3.4.1 Cause Analysis

Step 5 uses for update, which makes it a current read. A current read does not consult the Read View; it reads the latest committed data, so it returns the row inserted by transaction B.
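The difference between the two reads in this scenario can be reproduced with a toy table. This is a simplified sketch, not InnoDB's behaviour in full: the `Table` class is hypothetical, the Read View is reduced to a single "next transaction id" value, inserts commit immediately, and a transaction's own writes are not modelled.

```python
class Table:
    """Toy table mapping each row id to the id of the transaction that inserted it."""

    def __init__(self, ids):
        self.rows = {row_id: 1 for row_id in ids}   # all inserted by transaction 1
        self.next_trx_id = 2

    def begin(self):
        trx_id = self.next_trx_id
        self.next_trx_id += 1
        return trx_id

    def insert(self, trx_id, row_id):
        self.rows[row_id] = trx_id                  # committed immediately in this sketch

    def snapshot_read(self, view_max_trx_id, low):
        """Rows with id > low written by transactions older than the Read View."""
        return sorted(r for r, t in self.rows.items()
                      if r > low and t < view_max_trx_id)

    def current_read(self, low):
        """Rows with id > low in the latest committed state, as a locking read sees them."""
        return sorted(r for r in self.rows if r > low)


table = Table([1, 2, 3, 5, 17])
trx_a = table.begin()
view_max = table.next_trx_id                        # Read View taken at A's first select
step2 = table.snapshot_read(view_max, low=1)

trx_b = table.begin()
table.insert(trx_b, 21)                             # B inserts and commits

step5_snapshot = table.snapshot_read(view_max, low=1)
step5_current = table.current_read(low=1)
print(step2)
print(step5_snapshot)
print(step5_current)
```

The snapshot read keeps returning the same rows, while the current read in the same transaction also returns row 21, which is exactly the phantom seen in step 5.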

3.4.2 Summary

Combining the analysis above, the conclusions are as follows

  • MySQL's default isolation level, repeatable read, largely solves the phantom read problem. For snapshot reads, MVCC solves it: a Read View is generated when the first query is executed, and every subsequent snapshot read uses this Read View;
  • For current reads, it is solved by locking. The locks block conflicting current reads and inserts from other transactions, which avoids phantom reads;
  • However, repeatable read cannot completely solve phantom reads. For example, if a transaction first uses a snapshot read and then a current read, a phantom read may still occur.

IV. Closing remarks

Transaction isolation levels are a very important topic in MySQL, yet their underlying principles are something many developers do not understand well. When transactions and locks are combined, it is especially easy to get confused. Whether for interviews or for troubleshooting production failures, it is well worth understanding the different isolation levels, the problems each one solves, and the principles behind them. This is the end of this article; thank you for reading.


Origin blog.csdn.net/zhangcongyi420/article/details/132415844