[MySQL study notes (18)] transaction isolation level, MVCC, purge

This article is published by the official account [Developing Pigeon].



One. Transaction isolation level

(1) Overview

        MySQL is client/server (C/S) software, so many clients can connect to the same server. Each client that connects forms a session and sends requests within its own session. A request statement may be part of a transaction, and the server may be processing transactions from many clients at the same time. Because a transaction corresponds to a state transition in the real world, the data must still conform to real-world rules after the transaction executes. This is the consistency of the transaction.

        When several transactions want to access the same data at the same time, we need some means to make them execute one by one in order, or at least to make the final result the same as if they had run one by one. In other words, each transaction executes in isolation without interfering with the others; this is the isolation of transactions. The simplest approach is to run all transactions one after another on a single thread. This is called serial execution, and it is far too inefficient. Alternatively, while one transaction is accessing some data, other transactions that try to access the same data can be restricted and made to queue; this is called serializable execution.


(2) Problems caused by concurrent transactions

        A handy way to remember the four problems is in pairs. Dirty writes and dirty reads go together: both involve data modified by an uncommitted transaction, which the other transaction writes in one case and reads in the other. Non-repeatable reads and phantom reads go together: in both, another transaction modifies data that an uncommitted transaction has already read.

1. Dirty write

        If a transaction modifies data that another, still uncommitted, transaction has already modified, a dirty write has occurred. As a memory aid, "dirty" refers to data modified by an uncommitted transaction, and "write" refers to another transaction modifying it again. When two transactions modify the same data this way, the final result may match neither transaction's intent: for example, if the first transaction later rolls back and restores its old value, the second transaction's update is wiped out even though it committed.
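        A toy simulation makes the rollback problem concrete. This is a minimal sketch, assuming a hypothetical in-memory store with no write locks at all (something no real isolation level allows); `ToyTxn` and its undo list are illustrative names, not InnoDB structures.

```python
# Toy model with NO write locks, to show why dirty writes are forbidden.
# ToyTxn is a hypothetical helper for illustration, not an InnoDB structure.


class ToyTxn:
    def __init__(self, store):
        self.store = store
        self.undo = []  # (key, before-image) pairs written by this transaction

    def write(self, key, value):
        # Record the old value so this transaction can roll back later.
        self.undo.append((key, self.store.get(key)))
        self.store[key] = value

    def rollback(self):
        # Restore before-images in reverse order of writing.
        for key, old in reversed(self.undo):
            self.store[key] = old
        self.undo.clear()

    def commit(self):
        self.undo.clear()


def dirty_write_demo():
    store = {"x": "initial"}
    t1, t2 = ToyTxn(store), ToyTxn(store)
    t1.write("x", "t1")  # T1 modifies x and has not committed
    t2.write("x", "t2")  # T2 overwrites T1's uncommitted value: a dirty write
    t2.commit()          # T2 commits
    t1.rollback()        # T1 rolls back to the before-image it recorded
    return store["x"]    # T2's committed update is gone


print(dirty_write_demo())  # initial
```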

2. Dirty Read

        If a transaction reads data modified by another uncommitted transaction, a dirty read has occurred. More strictly: transaction T1 first modifies the value of data item x; transaction T2 then reads the value of x written by the still-uncommitted T1; T1 then aborts while T2 commits. T2 has read a value that never existed in any committed state.
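        The same sequence as a toy sketch, assuming a single shared dictionary with no isolation whatsoever; the function and variable names are hypothetical.

```python
# Toy model of a dirty read: everyone reads and writes one shared dict,
# so readers can see values written by uncommitted transactions.


def dirty_read_demo():
    store = {"x": 1}
    before_image = store["x"]  # T1 remembers the old value for rollback
    store["x"] = 2             # T1 modifies x but has not committed
    t2_read = store["x"]       # T2 reads T1's uncommitted value
    store["x"] = before_image  # T1 aborts, so x goes back to 1
    # T2 commits having read 2, a value that never existed in a committed state.
    return t2_read, store["x"]


print(dirty_read_demo())  # (2, 1)
```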

3. Non-repeatable read

        If a transaction modifies data that another, still uncommitted, transaction has read, a non-repeatable read has occurred. In other words, a transaction reads the same data twice and gets different results because another transaction modified and committed it in between.

4. Phantom read

        If a transaction queries some records using certain search conditions and, before it commits, another transaction writes records that meet those conditions (INSERT, UPDATE, DELETE), a phantom read has occurred: when the first transaction runs the same search again, the result differs from its first search. The two concepts look similar, but a phantom read concerns the set of records a search returns, whereas a non-repeatable read concerns the value of a record that has already been read.
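        To contrast the two anomalies, here is a toy sketch that assumes every read sees the latest committed data (so no dirty reads) but nothing stronger; the table layout and the `search` helper are hypothetical.

```python
def search(table, predicate):
    """Return the sorted ids of rows that satisfy the search condition."""
    return sorted(row_id for row_id, row in table.items() if predicate(row))


def anomalies_demo():
    table = {1: {"name": "a", "age": 20}, 2: {"name": "b", "age": 30}}

    def adults(row):
        return row["age"] >= 18

    # Non-repeatable read: T1 reads the same record twice.
    first_value = table[1]["age"]
    table[1]["age"] = 21                 # T2 updates record 1 and commits
    second_value = table[1]["age"]

    # Phantom read: T1 runs the same search twice.
    first_ids = search(table, adults)
    table[3] = {"name": "c", "age": 40}  # T2 inserts a matching record and commits
    second_ids = search(table, adults)

    return (first_value, second_value), (first_ids, second_ids)


print(anomalies_demo())  # ((20, 21), ([1, 2], [1, 2, 3]))
```

        The first pair shows one record's value changing; the second shows the result set of the same search gaining a record.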

(3) The four isolation levels in the SQL standard

        Ordered by severity, the four problems described above are:

dirty write > dirty read > non-repeatable read > phantom read

        Serial execution, the strongest form of isolation, also gives the worst performance. We can give up some isolation in exchange for performance, which is why isolation levels were defined in the first place. The higher the isolation level, the fewer problems and the worse the performance; the lower the isolation level, the more problems and the better the performance.

1. Read Uncommitted (READ UNCOMMITTED)

        The lowest isolation level, where the most problems can occur: dirty reads, non-repeatable reads, and phantom reads. As the name says, a transaction can read data written by uncommitted transactions, which is exactly a dirty read. Since even dirty reads are not prevented, the less serious problems are not prevented either.

2. Read Committed (READ COMMITTED)

        Prevents dirty reads, but non-repeatable reads and phantom reads can still occur. As the name says, a transaction can read only data of committed transactions, which rules out dirty reads but not the problems further down the list.

3. Repeatable Read (REPEATABLE READ)

        Prevents dirty reads and non-repeatable reads, but phantom reads can still occur.


4. Serializable (SERIALIZABLE)

        Prevents dirty reads, non-repeatable reads, and phantom reads. Transactions execute in a serializable manner, giving the strongest isolation and the lowest performance.

        Why does none of the levels mention dirty writes? Because dirty writes damage consistency so badly that they are not allowed at any isolation level.
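        The four levels can be summarized as a lookup table. This sketch encodes the SQL standard as described above (each higher level rules out the next most severe anomaly, and dirty writes are never allowed); it does not model MySQL's stronger REPEATABLE READ, and the names `LEVELS`, `ANOMALIES` and `may_occur` are illustrative.

```python
# SQL-standard isolation levels, weakest to strongest, and the anomalies
# from most to least severe, as listed in this section.
LEVELS = ["READ UNCOMMITTED", "READ COMMITTED", "REPEATABLE READ", "SERIALIZABLE"]
ANOMALIES = ["dirty write", "dirty read", "non-repeatable read", "phantom read"]


def may_occur(level: str, anomaly: str) -> bool:
    """True if the SQL standard allows `anomaly` at isolation `level`."""
    if anomaly == "dirty write":
        return False  # never allowed at any level
    rank = LEVELS.index(level)           # 0 = weakest isolation
    severity = ANOMALIES.index(anomaly)  # 1 = dirty read ... 3 = phantom read
    # Each step up in isolation rules out the next most severe anomaly.
    return severity > rank


for level in LEVELS:
    allowed = [a for a in ANOMALIES if may_occur(level, a)]
    print(f"{level:<17} {', '.join(allowed) or 'none'}")
```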


(4) The four isolation levels supported in MySQL

        Different database vendors support the four SQL-standard isolation levels to different degrees. MySQL supports all four, and at its REPEATABLE READ level it can largely prevent phantom reads as well. The default level is REPEATABLE READ.

1. Set the isolation level of the transaction

set global transaction isolation level serializable;

        Putting the GLOBAL keyword after SET makes the setting global: it affects only sessions created after the statement executes and has no effect on sessions that already exist.

set session transaction isolation level serializable;

        Putting the SESSION keyword after SET makes the setting apply to the current session: it takes effect for all subsequent transactions in that session.

        If neither keyword is used, the setting applies only to the next transaction started in the current session; the transactions after it revert to the previous isolation level.
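        The interaction of the three scopes can be modeled in a few lines. This is a hypothetical sketch of the behavior described above, not MySQL code: `Server`, `Session` and their attributes are illustrative names, and it assumes a new session copies the global level when it connects.

```python
# Hypothetical model of the three SET TRANSACTION scopes; names are illustrative.


class Server:
    def __init__(self, default_level="REPEATABLE READ"):
        self.global_level = default_level  # changed by SET GLOBAL TRANSACTION ...

    def connect(self):
        # A new session copies the global level when it connects.
        return Session(self.global_level)


class Session:
    def __init__(self, level):
        self.session_level = level   # changed by SET SESSION TRANSACTION ...
        self.next_txn_level = None   # set by SET TRANSACTION ... (no keyword)

    def begin(self):
        """Start a transaction and return the isolation level it runs at."""
        level = self.next_txn_level or self.session_level
        self.next_txn_level = None   # the one-shot setting is used up
        return level


def scope_demo():
    server = Server()
    old = server.connect()
    server.global_level = "SERIALIZABLE"       # SET GLOBAL ...
    new = server.connect()
    after_global = (old.begin(), new.begin())  # the existing session is unaffected
    old.session_level = "READ COMMITTED"       # SET SESSION ...
    old.next_txn_level = "READ UNCOMMITTED"    # SET TRANSACTION ... (no keyword)
    after_oneshot = (old.begin(), old.begin())
    return after_global, after_oneshot


print(scope_demo())
# (('REPEATABLE READ', 'SERIALIZABLE'), ('READ UNCOMMITTED', 'READ COMMITTED'))
```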

2. Modify the default isolation level of the server

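        Set the startup option transaction-isolation, or modify the system variable transaction_isolation. The variable can be set at three scopes: GLOBAL, SESSION, and the next transaction only.

SET GLOBAL transaction_isolation=…
SET SESSION transaction_isolation = xxx
SET transaction_isolation = xxx
SET @@transaction_isolation = xxx

        The first form changes the global value. The second and third both change the session value (a plain SET on this variable defaults to SESSION scope). Only the @@ form without GLOBAL or SESSION limits the change to the next transaction.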

Two. MVCC

(1) Version chain

        For tables that use the InnoDB storage engine, every clustered index record contains two necessary hidden columns: trx_id, the id of the transaction that last modified the record, and roll_pointer, which points to the undo log written when the record was last modified. Through roll_pointer, all the undo logs for a record are chained into a linked list called the version chain. The head of the version chain holds the record's latest value, and the record's earlier versions can be reached from it.

        In addition, each version in the chain also stores the id of the transaction that produced it. InnoDB uses a record's version chain to control the behavior of concurrent transactions that access that record; this mechanism is called multi-version concurrency control (MVCC).
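        A version chain can be sketched as a singly linked list. This is a simplification, assuming each node stores a full copy of the row; in InnoDB only the newest version lives in the clustered index record and older versions are rebuilt from undo log records. The `Version` class, the `update`/`history` helpers and the transaction ids are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Version:
    value: str
    trx_id: int                        # transaction that produced this version
    roll_pointer: Optional["Version"]  # previous version (stands in for the undo log)


def update(head: Version, value: str, trx_id: int) -> Version:
    """Write a new version; the old head stays reachable through roll_pointer."""
    return Version(value, trx_id, head)


def history(head: Optional[Version]) -> List[Tuple[int, str]]:
    """Walk the version chain from newest to oldest."""
    out = []
    node = head
    while node is not None:
        out.append((node.trx_id, node.value))
        node = node.roll_pointer
    return out


head = Version("a", 80, None)  # row inserted by transaction 80
head = update(head, "b", 100)  # transaction 100 updates it
head = update(head, "c", 200)  # transaction 200 updates it again
print(history(head))  # [(200, 'c'), (100, 'b'), (80, 'a')]
```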


(2) ReadView

        Under different isolation levels, a transaction may read different versions of a record. The core problem is deciding which version in the version chain is visible to the current transaction. InnoDB solves it with a ReadView (consistent read view), which holds four important fields:

1. m_ids

        The list of ids of the read-write transactions that are active in the system, i.e. not yet committed, at the moment the ReadView is generated.

2. min_trx_id

        The smallest id among the active read-write transactions at the moment the ReadView is generated, i.e. the minimum value in m_ids.

3. max_trx_id

        The id the system would assign to the next transaction at the moment the ReadView is generated. Note that this is not the maximum value in m_ids.

4. creator_trx_id

        The transaction id of the transaction that generated the ReadView.

(3) Judging the visibility of a record version with a ReadView

        1. If the trx_id of the accessed version equals the ReadView's creator_trx_id, the current transaction is accessing a record it modified itself, so this version is visible to the current transaction.

        2. If the trx_id of the accessed version is less than the ReadView's min_trx_id, the transaction that generated this version had already committed when the current transaction generated the ReadView, so this version is visible.

        3. If the trx_id of the accessed version is greater than or equal to the ReadView's max_trx_id, the transaction that generated this version started after the current transaction generated the ReadView, so this version is not visible.

        4. If the trx_id of the accessed version lies between min_trx_id (inclusive) and max_trx_id (exclusive), check whether it is in m_ids. If it is, the transaction that generated this version was still active (uncommitted) when the ReadView was generated, so the version is not visible; otherwise, that transaction had committed and the version is visible.

        If a version is not visible to the current transaction, follow the version chain to the next (older) version and apply the same checks. If even the last version in the chain is not visible, the record is not included in the query result.
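        The four rules and the chain walk can be written down directly. This is a minimal sketch, assuming a read-only transaction uses creator_trx_id 0 and that min_trx_id falls back to max_trx_id when there are no active transactions; `ReadView`, `make_read_view`, `is_visible` and `consistent_read` are illustrative names, not InnoDB's source.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ReadView:
    m_ids: FrozenSet[int]  # ids of active (uncommitted) read-write transactions
    min_trx_id: int        # smallest id in m_ids
    max_trx_id: int        # id the system would assign to the next transaction
    creator_trx_id: int    # id of the transaction that created this view


def make_read_view(active_ids, next_trx_id, creator_trx_id=0):
    ids = frozenset(active_ids)
    # With no active transactions, min_trx_id falls back to max_trx_id.
    return ReadView(ids, min(ids, default=next_trx_id), next_trx_id, creator_trx_id)


def is_visible(view, trx_id):
    if trx_id == view.creator_trx_id:  # rule 1: the view's own change
        return True
    if trx_id < view.min_trx_id:       # rule 2: committed before the view was made
        return True
    if trx_id >= view.max_trx_id:      # rule 3: started after the view was made
        return False
    return trx_id not in view.m_ids    # rule 4: visible unless still active


@dataclass
class Version:
    value: str
    trx_id: int
    roll_pointer: Optional["Version"] = None  # older version


def consistent_read(view, head):
    """Walk the chain from newest to oldest; None means 'not in the result'."""
    node = head
    while node is not None:
        if is_visible(view, node.trx_id):
            return node.value
        node = node.roll_pointer
    return None


chain = Version("c", 200, Version("b", 100, Version("a", 80)))
print(consistent_read(make_read_view({100, 200}, 201), chain))       # a
print(consistent_read(make_read_view({200}, 201), chain))            # b
print(consistent_read(make_read_view({200}, 201, 200), chain))       # c (own change)
print(consistent_read(make_read_view({80, 100, 200}, 201), chain))   # None
```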


(4) When the ReadView is generated

        At the READ COMMITTED isolation level, a new ReadView is generated before every read, so the ReadView keeps being refreshed: its active-transaction list and other fields reflect which transactions have committed by that point. As a result, successive reads in the same transaction may return different results, which is the non-repeatable read problem.

        At the REPEATABLE READ isolation level, a ReadView is generated only when the first query statement executes, and that same ReadView is reused for every later read. Because its active-transaction list and other fields never change, every read in the transaction sees the same versions in the version chain, which solves the non-repeatable read problem.
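        The difference between the two levels comes down to when the ReadView is taken. A minimal sketch, assuming a hypothetical `TrxSystem` that hands out increasing ids and tracks active ones; a ReadView is represented as a plain tuple (m_ids, min_trx_id, max_trx_id, creator_trx_id), and the reader is a read-only transaction with creator id 0.

```python
class TrxSystem:
    """Hypothetical miniature transaction system: active ids plus the next id."""

    def __init__(self, next_id=100):
        self.next_id = next_id
        self.active = set()

    def begin_rw(self):
        trx_id = self.next_id
        self.next_id += 1
        self.active.add(trx_id)
        return trx_id

    def commit(self, trx_id):
        self.active.discard(trx_id)

    def read_view(self, creator=0):  # creator=0 models a read-only reader
        ids = frozenset(self.active)
        # (m_ids, min_trx_id, max_trx_id, creator_trx_id)
        return ids, min(ids, default=self.next_id), self.next_id, creator


def visible(view, trx_id):
    m_ids, min_id, max_id, creator = view
    if trx_id == creator or trx_id < min_id:
        return True
    if trx_id >= max_id:
        return False
    return trx_id not in m_ids


def read(view, chain):
    """chain is a list of (trx_id, value), newest version first."""
    for trx_id, value in chain:
        if visible(view, trx_id):
            return value
    return None


def two_reads(isolation):
    trx_sys = TrxSystem()
    chain = [(50, "old")]             # written long ago by committed trx 50
    writer = trx_sys.begin_rw()
    chain.insert(0, (writer, "new"))  # writer updates the row, not yet committed
    view = trx_sys.read_view()        # reader's first SELECT
    first = read(view, chain)
    trx_sys.commit(writer)
    if isolation == "READ COMMITTED":
        view = trx_sys.read_view()    # READ COMMITTED: a new ReadView per read
    second = read(view, chain)        # REPEATABLE READ: reuse the first ReadView
    return first, second


print(two_reads("READ COMMITTED"))   # ('old', 'new')
print(two_reads("REPEATABLE READ"))  # ('old', 'old')
```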


(5) Secondary indexes and MVCC

        The lookups discussed so far used the primary key on clustered index records. If a query uses a secondary index instead, how is visibility determined?

The steps are as follows:

        1. The Page Header of a secondary index page has an attribute, PAGE_MAX_TRX_ID. Whenever a record on the page is inserted, deleted, or modified, if the id of the transaction performing the operation is greater than the attribute's current value, the attribute is updated to that id; it therefore holds the largest id of any transaction that has modified the page. When a SELECT statement accesses a secondary index record, it first checks whether the current ReadView's min_trx_id is greater than the page's PAGE_MAX_TRX_ID. If it is, every transaction that modified the page has already committed, so all records on the page are visible to this ReadView; otherwise, the query must go back to the clustered index (look up the table by primary key) and judge visibility there.

        2. After going back to the table by primary key, find the first visible version of the clustered index record using the method described earlier, then check whether the secondary index column's value in that version equals the value used in the query. If it does, send the record to the client; otherwise, skip it.
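        The two steps can be sketched as one decision function. This assumes a ReadView tuple (m_ids, min_trx_id, max_trx_id, creator_trx_id), a hypothetical dictionary standing in for the clustered index, and ignores delete marks on secondary index entries; `should_return` and its parameters are illustrative names, with `page_max_trx_id` playing the role of the Page Header attribute from step 1.

```python
def visible(view, trx_id):
    m_ids, min_id, max_id, creator = view
    if trx_id == creator or trx_id < min_id:
        return True
    if trx_id >= max_id:
        return False
    return trx_id not in m_ids


def should_return(view, page_max_trx_id, pk, column, query_value, clustered):
    """Decide whether a secondary-index hit on primary key `pk` goes to the client.

    clustered maps primary key -> version chain [(trx_id, row_dict), ...], newest first.
    """
    min_trx_id = view[1]
    # Step 1: every transaction that touched this page committed before the
    # view was made, so all records on the page are visible.
    if min_trx_id > page_max_trx_id:
        return True
    # Step 2: back to the table; take the first visible clustered-index version...
    for trx_id, row in clustered[pk]:
        if visible(view, trx_id):
            # ...and keep it only if its secondary column still matches the query.
            return row[column] == query_value
    return False  # no visible version at all


clustered = {
    1: [(200, {"name": "bob"}), (80, {"name": "alice"})],  # trx 200 renamed row 1
    2: [(70, {"name": "carol"})],
}
view = (frozenset({200}), 200, 201, 0)  # trx 200 is still active
print(should_return(view, 200, 1, "name", "alice", clustered))  # True
print(should_return(view, 200, 1, "name", "bob", clustered))    # False
print(should_return(view, 80, 2, "name", "carol", clustered))   # True (page shortcut)
```

        The second call shows why step 2 compares column values: the index entry for "bob" exists, but the version of row 1 visible to this view still says "alice".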


Three. Purge

        As mentioned earlier, insert undo logs can be released as soon as their transaction commits, but update undo logs are also needed to support MVCC and cannot be deleted right away. Each group of undo logs written by a transaction contains a History list node. When a transaction commits, the group of update undo logs it produced is inserted at the head of the History list. So when can the space of these update undo logs be released?

        At an appropriate time, we should completely delete these update undo logs, along with the records that were only delete-marked; this operation is called purge. Update undo logs are kept for MVCC, so once even the earliest ReadView in the system no longer needs them, their job is done. If a transaction had already committed when a ReadView was generated, that ReadView certainly never needs the undo logs the transaction produced, because it can already see the transaction's latest changes.

        When a transaction commits, it is assigned a value called transaction no, which records the order in which transactions commit; the History list likewise keeps undo log groups in commit order. A ReadView also records a transaction no, equal to the largest transaction no in the system at the time it is created plus 1. All ReadViews in the system are linked into a list in order of creation. When purge runs, it takes the earliest ReadView in the system (creating a new one if none exists), then takes undo log groups with smaller transaction no values from the History list. If a group's transaction no is smaller than the earliest ReadView's transaction no, the group's space is released.

        Note that at the REPEATABLE READ isolation level the original ReadView is reused for the whole transaction. If a transaction runs for a long time without committing, the earliest ReadView is never released, so update undo logs accumulate in the system, taking up space and hurting performance.
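        The purge condition can be sketched as a scan of the History list from its oldest end. This is a simplification, assuming the list is given oldest-first (the real History list grows at its head) and that each live ReadView carries the transaction no described above; `purge` and its parameters are hypothetical names.

```python
def purge(history, live_view_trx_nos, next_trx_no):
    """Release undo groups that no live ReadView can still need.

    history: [(trx_no, undo_group), ...] in commit order, oldest first.
    live_view_trx_nos: transaction no of each live ReadView, oldest first.
    next_trx_no: largest committed transaction no plus 1.
    """
    # Use the earliest ReadView; with none alive, a fresh view would get next_trx_no.
    limit = live_view_trx_nos[0] if live_view_trx_nos else next_trx_no
    purged = []
    i = 0
    while i < len(history) and history[i][0] < limit:
        purged.append(history[i][1])
        i += 1
    return purged, history[i:]


history = [(1, "undo-1"), (2, "undo-2"), (3, "undo-3")]
# A long-running REPEATABLE READ transaction took its view after trx no 1 committed.
print(purge(history, [2], 4))  # (['undo-1'], [(2, 'undo-2'), (3, 'undo-3')])
# Once that transaction ends and no ReadView is left, everything can go.
print(purge(history, [], 4))   # (['undo-1', 'undo-2', 'undo-3'], [])
```

        As long as the old view with transaction no 2 stays alive, every later undo group stays in the History list, which is the accumulation problem described above.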


Source: blog.csdn.net/Mrwxxxx/article/details/114041800