MySQL MVCC

Foreword

SQL defines four transaction isolation levels: read uncommitted (RU), read committed (RC), repeatable read (RR), and serializable.

The default isolation level of MySQL that we are familiar with is the third, RR (repeatable read). Unlike the RR level in the SQL standard, which still permits phantom reads, MySQL's RR is implemented on top of the MVCC mechanism. Under this isolation level, dirty writes, dirty reads, non-repeatable reads, and phantom reads can be prevented for ordinary (non-locking) queries.

So what kind of mechanism is MVCC?

MVCC stands for multi-version concurrency control. It is implemented with the undo log multi-version chain plus the ReadView mechanism.

Let's walk through how the MVCC mechanism works and how the RC and RR isolation levels are built on it.

1. The prelude to MVCC: what is the undo log version chain?

We have talked about the undo log before. Each row of data has two hidden fields, trx_id and roll_pointer. trx_id is the id of the transaction that most recently updated the row, and roll_pointer points to the undo log generated by that update, which holds the row as it was before the update.

Suppose transaction A (id=50) inserts a row with value A. The row's trx_id is then 50, and its roll_pointer points to an empty undo log, because the row did not exist before.

Figure 1-1

Then transaction B (id=60) updates the row, changing the value to B. The row's trx_id becomes 60, and its roll_pointer points to the undo log generated by this update:

Figure 1-2

The row now records the transaction id of transaction B and the modified value B, and its roll_pointer points to the undo log that holds the version from before the modification.

Then transaction C (id=70) changes the value to C, as shown in the figure:

Figure 1-3

Here the roll_pointer again points to the undo log holding the version from before this modification, which in turn links back to the undo log from transaction A's insert.

So trx_id and roll_pointer are updated every time the row is modified, and the undo logs holding the earlier snapshots are linked together through roll_pointer to form a version chain.
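To make the chain concrete, here is a minimal Python sketch of the three steps above. It is only a toy model, not how InnoDB stores rows: the `Version` class and the `update` and `chain` helpers are names made up for illustration, and the "empty undo log" of the first insert is modelled as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One version of a row: either the live row or an undo log entry."""
    value: str
    trx_id: int                               # transaction that wrote this version
    roll_pointer: Optional["Version"] = None  # previous version (None after the first insert)


def update(row: Version, new_value: str, trx_id: int) -> Version:
    """Return the new live row; the old row becomes the head of the undo chain."""
    return Version(new_value, trx_id, roll_pointer=row)


def chain(row: Optional[Version]) -> list:
    """Follow roll_pointer from newest to oldest and collect (value, trx_id)."""
    versions = []
    while row is not None:
        versions.append((row.value, row.trx_id))
        row = row.roll_pointer
    return versions


row = Version("A", 50)      # transaction A inserts value A
row = update(row, "B", 60)  # transaction B updates to value B
row = update(row, "C", 70)  # transaction C updates to value C
print(chain(row))           # [('C', 70), ('B', 60), ('A', 50)]
```

Each update only prepends a new version, so the older snapshots stay reachable through roll_pointer.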

2. What is the ReadView that works on top of the undo log version chain?

When a transaction executes, it generates a ReadView, which contains four parts:

  • m_ids: the ids of the transactions that are running in MySQL and have not yet committed at this moment
  • min_trx_id: the smallest value in m_ids
  • max_trx_id: the next transaction id that MySQL will generate, which serves as the upper bound
  • creator_trx_id: the id of the current transaction

Let's understand the usefulness of ReadView through an example.

Assume the database already holds a row inserted by an earlier transaction with id 32; call its value the original value.

Figure 2-1

Next, two transactions A and B are executed concurrently. Transaction A has transaction id 45 and transaction B has transaction id 59. Transaction B wants to update this row of data, and transaction A wants to query this row of data.

Figure 2-2

Now transaction A opens a ReadView. Its m_ids contains the ids of transactions A and B, 45 and 59, so min_trx_id is 45, max_trx_id is 60, and creator_trx_id is 45, the id of transaction A itself.

Figure 2-3
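As a rough sketch, the ReadView that transaction A just opened could be modelled like this in Python. The `ReadView` class and the `open_read_view` helper are hypothetical names for illustration; the set of active ids and the next transaction id are simply passed in, whereas a real server tracks them itself.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ReadView:
    m_ids: Tuple[int, ...]  # ids of transactions still uncommitted when the view was opened
    min_trx_id: int         # smallest id in m_ids
    max_trx_id: int         # next transaction id that would be generated
    creator_trx_id: int     # id of the transaction that opened the view


def open_read_view(active_ids: Iterable[int], next_trx_id: int, creator_trx_id: int) -> ReadView:
    """Build a ReadView from the (assumed) system state at this moment."""
    ids = tuple(sorted(active_ids))
    return ReadView(ids, ids[0], next_trx_id, creator_trx_id)


# Transactions 45 (A) and 59 (B) are active; the next id to hand out is 60.
view = open_read_view(active_ids={45, 59}, next_trx_id=60, creator_trx_id=45)
print(view)
```

The view is frozen on purpose: once opened, it is a fixed snapshot of who was active, which is what the visibility checks below rely on.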

When transaction A performs its first query, it checks whether the row's trx_id is less than the ReadView's min_trx_id. The trx_id is 32, which is less than min_trx_id (45), so the transaction that last modified the row had already committed before transaction A started, and the row can be read.

Figure 2-4

Then transaction B modifies the row, changing the original value to value B. The row's trx_id becomes 59, and its roll_pointer points to the undo log that holds the version from before the modification.

Figure 2-5

Transaction A queries again and finds that the row's trx_id is now 59, which is greater than min_trx_id (45) and less than max_trx_id (60) in its ReadView. The transaction that wrote this version may therefore be one of the transactions in m_ids, so transaction A checks whether 59 is in m_ids. It is, which confirms that this version was written by a transaction running concurrently with transaction A, so it cannot be read.

Figure 2-6

Since this version of the row cannot be read, what should be returned?

Transaction A follows the row's roll_pointer down the undo log chain and finds the nearest undo log, whose trx_id is 32. Since 32 is less than min_trx_id (45), this version was committed before transaction A opened its ReadView, so it is the one returned.

Here you can see the role of the undo log version chain: because earlier snapshots are kept in a chain, a previous version can be read quickly.

Through the above undo log version chain + ReadView, it can be guaranteed that transaction A will not read the value updated by concurrently executed transaction B.

Next, suppose transaction A updates the row itself, changing it to value A. The row's trx_id becomes 45, and the version written by transaction B is preserved as a snapshot in the undo log.

Figure 2-7

Transaction A queries the row again and finds trx_id=45, which equals creator_trx_id (45) in its ReadView. The row was modified by transaction A itself, so of course it can be read.

Figure 2-8

Then, while transaction A is still running, transaction C starts with id 78, updates the row to value C, and commits.

Figure 2-9

Transaction A queries the row again and finds trx_id=78, which is greater than max_trx_id (60) in its ReadView. This means another transaction updated the row while transaction A was running, so this version cannot be read.

Figure 2-10

Transaction A then follows the undo log version chain and finds the version with trx_id=45, the value it wrote itself.

Through the undo log version chain plus ReadView, a transaction can read only data that was committed before it started or that it modified itself. This is how data isolation is achieved when multiple transactions execute concurrently.
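The whole walkthrough in this section can be replayed with a small Python sketch. It is a simplified model under stated assumptions, not InnoDB's actual code: `Version`, `ReadView`, `is_visible` and `read` are illustrative names, and a trx_id equal to max_trx_id is treated as invisible, since max_trx_id is the next id to be generated and so belongs to a transaction that starts later.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    value: str
    trx_id: int
    roll_pointer: Optional["Version"] = None  # previous version in the undo chain


@dataclass
class ReadView:
    m_ids: List[int]
    min_trx_id: int
    max_trx_id: int
    creator_trx_id: int


def is_visible(trx_id: int, view: ReadView) -> bool:
    if trx_id == view.creator_trx_id:
        return True                  # written by the reading transaction itself
    if trx_id < view.min_trx_id:
        return True                  # committed before the view was generated
    if trx_id >= view.max_trx_id:
        return False                 # started after the view was generated
    return trx_id not in view.m_ids  # in range: visible only if not active back then


def read(row: Optional[Version], view: ReadView) -> Optional[str]:
    """Walk the version chain until a visible version is found."""
    while row is not None and not is_visible(row.trx_id, view):
        row = row.roll_pointer
    return row.value if row is not None else None


view_a = ReadView(m_ids=[45, 59], min_trx_id=45, max_trx_id=60, creator_trx_id=45)
row = Version("original value", 32)
seen = [read(row, view_a)]         # trx_id 32 < 45: visible
row = Version("value B", 59, row)  # transaction B updates
seen.append(read(row, view_a))     # 59 is in m_ids: fall back to trx_id 32
row = Version("value A", 45, row)  # transaction A updates
seen.append(read(row, view_a))     # 45 is the creator: visible
row = Version("value C", 78, row)  # transaction C updates and commits
seen.append(read(row, view_a))     # 78 >= 60: fall back to trx_id 45
print(seen)
# ['original value', 'original value', 'value A', 'value A']
```

Note that nothing in `read` depends on whether the writer has committed by now; only the state captured in the ReadView matters.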

3. RC isolation level based on ReadView mechanism

The read committed (RC) isolation level means that while a transaction is running, once other transactions commit, it can read the data they updated. Non-repeatable reads and phantom reads can therefore occur.

The core of the RC isolation level is that a new ReadView is generated every time a query is issued.

Suppose two transactions operate on the same row concurrently: transaction A (id 50) queries the row, and transaction B (id 70) updates it.

Now transaction A initiates a query and opens a ReadView. Because transaction B is executed concurrently, the structure in ReadView is:

Figure 3-1

As long as transaction A keeps using this ReadView, it cannot read the value modified by transaction B, no matter how transaction B modifies the data or even commits. The reason is simple: transaction B's id is in the ReadView's m_ids list of active transactions.

So how can transaction A read the value that transaction B updated and committed?

The answer is to generate a new ReadView for every query.

Now suppose transaction B has changed the row to value B and committed. Transaction A queries again and generates a fresh ReadView, in which transaction A is the only active transaction in the database.

Figure 3-2

Transaction A checks the row against the new ReadView and finds trx_id=70. This lies between min_trx_id and max_trx_id, but it is not in the m_ids list, which means transaction B had already committed before this ReadView was generated.

Because transaction B committed before the ReadView was generated, transaction A can read the value transaction B wrote. This is how read committed is achieved.

Figure 3-3

So the key to the read committed isolation level is that each query generates a new ReadView.
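Here is a small Python sketch of that difference, comparing the stale ReadView with a freshly generated one. It is a toy model: the helper names are made up, the set of active transactions and the next id (71) are tracked by hand, and the trx_id 30 of the original row is an assumed value, since the text does not give one.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Version:
    value: str
    trx_id: int
    roll_pointer: Optional["Version"] = None


@dataclass
class ReadView:
    m_ids: List[int]
    min_trx_id: int
    max_trx_id: int
    creator_trx_id: int


def open_read_view(active: Set[int], next_trx_id: int, creator: int) -> ReadView:
    ids = sorted(active)  # copies the ids, so later commits do not alter this view
    return ReadView(ids, ids[0], next_trx_id, creator)


def read(row: Optional[Version], view: ReadView) -> Optional[str]:
    while row is not None:
        t = row.trx_id
        visible = t == view.creator_trx_id or t < view.min_trx_id or (
            t < view.max_trx_id and t not in view.m_ids)
        if visible:
            return row.value
        row = row.roll_pointer
    return None


NEXT_TRX_ID = 71                     # assumed: no transaction newer than B exists
active = {50, 70}                    # transaction A (50) and transaction B (70)
row = Version("original value", 30)  # 30 is an assumed id of an earlier, committed writer

first_view = open_read_view(active, NEXT_TRX_ID, creator=50)
first = read(row, first_view)

row = Version("value B", 70, row)    # transaction B updates the row ...
active.discard(70)                   # ... and commits

stale = read(row, first_view)        # old view still lists 70 as active
fresh = read(row, open_read_view(active, NEXT_TRX_ID, creator=50))  # RC: new view per query
print(first, "|", stale, "|", fresh)
# original value | original value | value B
```

The only thing that changes between `stale` and `fresh` is which ReadView is used, which is exactly the knob that RC turns.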

4. RR isolation level based on ReadView mechanism

Next is MySQL's default isolation level, repeatable read, and how it avoids both non-repeatable reads and phantom reads.

Repeatable read, as the name implies, means that a transaction gets the same value no matter how many times it reads the same data. Even if other transactions modify the data and commit, it does not see their values. Likewise, rows newly inserted by other transactions are not visible to it, so both non-repeatable reads and phantom reads are avoided.

Assume the database already holds a row, and transactions A and B run at the same time. Transaction A has id 60 and transaction B has id 70.

Figure 4-1

At this point transaction A issues its first query, which generates a ReadView.

Figure 4-2

Transaction A checks the row against its ReadView and finds that trx_id=50 is less than min_trx_id, which means the row was inserted by a transaction that committed before transaction A started, so the row can be read.

Figure 4-3

At this time, transaction B comes to update the row of data, changes the value to value B, generates an undo log, and commits transaction B.

Figure 4-4

Even though transaction B has committed, its id is still in the m_ids of transaction A's ReadView, because m_ids records the ids of the other transactions that were running when the ReadView was generated. The id stays in m_ids unless a new ReadView is generated, as happens at the RC isolation level.

So when transaction A queries the row again, transaction B's id is in the m_ids list, which means transaction B is still treated as an active transaction. Transaction A therefore cannot read value B and follows the undo log version chain to find the visible value instead.

Figure 4-5

This shows how ReadView avoids the non-repeatable read problem.

What about phantom reads caused by inserted data?

Suppose transaction A first runs "select * from table where id > 10". At this point the only row it can find is the one holding the original value.
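The repeatable read behaviour can be sketched in Python by caching the ReadView on the first query and reusing it afterwards. This is only an illustration under assumptions: `RepeatableReadTxn` is a hypothetical class, and the active set and next transaction id (71) are supplied by hand.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Version:
    value: str
    trx_id: int
    roll_pointer: Optional["Version"] = None


@dataclass
class ReadView:
    m_ids: List[int]
    min_trx_id: int
    max_trx_id: int
    creator_trx_id: int


def read(row: Optional[Version], view: ReadView) -> Optional[str]:
    while row is not None:
        t = row.trx_id
        visible = t == view.creator_trx_id or t < view.min_trx_id or (
            t < view.max_trx_id and t not in view.m_ids)
        if visible:
            return row.value
        row = row.roll_pointer
    return None


class RepeatableReadTxn:
    """Toy transaction that generates its ReadView once, on the first query."""

    def __init__(self, trx_id: int) -> None:
        self.trx_id = trx_id
        self.view: Optional[ReadView] = None

    def query(self, row: Version, active: Set[int], next_trx_id: int) -> Optional[str]:
        if self.view is None:  # RR: later queries reuse the first view
            ids = sorted(active)
            self.view = ReadView(ids, ids[0], next_trx_id, self.trx_id)
        return read(row, self.view)


active = {60, 70}                    # transaction A (60) and transaction B (70)
row = Version("original value", 50)
txn_a = RepeatableReadTxn(60)

first = txn_a.query(row, active, next_trx_id=71)
row = Version("value B", 70, row)    # transaction B updates the row ...
active.discard(70)                   # ... and commits
second = txn_a.query(row, active, next_trx_id=71)
print(first, "|", second)
# original value | original value
```

Because the cached view still lists 70 in m_ids, the second query falls back to the version with trx_id=50, which gives the same result as the first query.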

Figure 4-6

Now there is a transaction C that inserts a piece of data and then commits.

Figure 4-7

Then transaction A queries again and finds two rows that match the condition: one holding the original value and one holding value C.

However, checking against the ReadView shows that value C has trx_id=80, which is greater than max_trx_id (71). This means transaction C started after transaction A issued its first query, so this row cannot be read.

Figure 4-8

Therefore, in this query, transaction A still gets only one row.

In this way, by relying on the ReadView mechanism, transaction A does not experience a phantom read.
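The same check can be sketched in Python for a range query. Everything here is a simplified model: the table is a plain dict from primary key to the newest row version, the keys 11 and 12 are made-up values that satisfy id > 10, and `select_id_gt` is a hypothetical helper standing in for the SQL statement.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Version:
    value: str
    trx_id: int
    roll_pointer: Optional["Version"] = None


@dataclass
class ReadView:
    m_ids: List[int]
    min_trx_id: int
    max_trx_id: int
    creator_trx_id: int


def read(row: Optional[Version], view: ReadView) -> Optional[str]:
    while row is not None:
        t = row.trx_id
        visible = t == view.creator_trx_id or t < view.min_trx_id or (
            t < view.max_trx_id and t not in view.m_ids)
        if visible:
            return row.value
        row = row.roll_pointer
    return None  # no visible version: the row does not exist for this view


def select_id_gt(table: Dict[int, Version], lower: int, view: ReadView) -> List[Tuple[int, str]]:
    """Rough stand-in for: select * from table where id > lower."""
    result = []
    for key in sorted(table):
        if key > lower:
            value = read(table[key], view)
            if value is not None:
                result.append((key, value))
    return result


view_a = ReadView(m_ids=[60, 70], min_trx_id=60, max_trx_id=71, creator_trx_id=60)
table = {11: Version("original value", 50)}  # key 11 is an assumed primary key

before = select_id_gt(table, 10, view_a)
table[12] = Version("value C", 80)           # transaction C (id 80) inserts and commits
after = select_id_gt(table, 10, view_a)
print(before)  # [(11, 'original value')]
print(after)   # [(11, 'original value')]
```

The inserted row has no older version in its chain, so once trx_id=80 fails the check there is nothing to fall back to and the row is simply skipped.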

Seeing this, I believe everyone will understand how the RR isolation level avoids non-repeatable reads and phantom reads based on the ReadView mechanism.

Summary

Over this series of chapters and the analysis of the underlying principles, we have seen how the database problems of dirty writes, dirty reads, non-repeatable reads, and phantom reads arise.

We have also seen how MySQL implements the RR isolation level on top of the undo log multi-version chain plus the ReadView mechanism to avoid dirty writes, dirty reads, non-repeatable reads, and phantom reads.
