[Advanced articles] Detailed explanation of MySQL's MVCC implementation mechanism



0. Preface

In the database field, concurrent access to data is a common requirement. To ensure data consistency and transaction isolation, different database systems use different concurrency control techniques. One of the most important is Multiversion Concurrency Control (MVCC), which is used by the InnoDB storage engine in MySQL.

MVCC allows transactions to access a consistent data view by creating a "snapshot" of data at a certain point in time without being affected by other transaction operations. This can not only improve concurrency performance, but also avoid locking when reading data, which greatly enhances the concurrent processing capability of the database.

In this post we take an in-depth look at how MySQL implements MVCC, covering concepts such as the undo log, the Read View, and the version chain. We explain in detail how MVCC improves the database's concurrent processing capability while preserving data consistency. Understanding how MVCC works in MySQL helps with database design and optimization.

Although the title says this is a detailed explanation of MySQL's MVCC implementation, please don't assume that MVCC is unique to MySQL. Many other database systems use it too.
Here are some databases that use the Multiversion Concurrency Control (MVCC) strategy or something close to it:

  1. PostgreSQL: It uses MVCC to provide a consistent view across multiple concurrent users.

  2. MySQL: Under the READ COMMITTED and REPEATABLE READ isolation levels, MySQL's InnoDB storage engine uses MVCC to resolve read-write conflicts, serving ordinary reads from a snapshot instead of the latest data.

  3. Oracle: Although Oracle uses MVCC, its implementation is different from PostgreSQL and MySQL's InnoDB. In Oracle, read operations do not block write operations and vice versa.

  4. SQLite: SQLite uses "snapshot isolation". Its core concept is similar to MVCC, which provides a snapshot at the beginning of a transaction instead of real-time data.

  5. CouchDB and MongoDB: These two NoSQL databases also employ MVCC or similar technologies.

  6. Apache HBase: An open source, non-relational distributed database modeled on Google's Bigtable and written in Java; it also uses MVCC.

  7. Apache Cassandra: An open source distributed NoSQL database originally developed at Facebook for high-throughput read and write workloads such as Inbox search. It keeps a write timestamp on every cell, which is similar in spirit to versioning, although conflicts are resolved by last-write-wins rather than by transaction snapshots.

  8. MariaDB: As an open source branch version of MySQL, MariaDB's InnoDB storage engine also uses MVCC.

  9. Microsoft SQL Server: SQL Server uses MVCC under the Read Committed Snapshot and Snapshot isolation levels.

1. Basic introduction

1.1. What is MVCC?

MVCC stands for Multi-Version Concurrency Control. It is a method for handling concurrent access to a database. In high-concurrency scenarios, MVCC usually performs better than plain locking.

The working principle of MVCC is that every time the data is updated, the original data will not be directly overwritten, but a new version will be added to the data. Also, each transaction sees a consistent snapshot version that was determined when the transaction started.

In this way, the write operation can be realized without blocking the read operation, that is, the read-write separation is realized, and the concurrent processing capability of the database is improved.

The MVCC implementation mechanism mainly includes the following aspects:

  1. Data versioning: Each piece of data has a version number, and every time the data is modified, a new version will be generated. When querying, you only need to find the corresponding version in the version list, without waiting for the completion of data modification.

  2. Read view: When a transaction performs a snapshot read, a read view is generated that records the IDs of all other transactions that are active (not yet committed) at that moment. A query then sees only the data versions written by transactions that had already committed when the read view was created, plus the transaction's own changes.

  3. Undo log: When modifying data, the data version before modification will be written into the undo log. If other transactions need to access the data version before modification, they can be obtained directly from the undo log.

MVCC only takes effect under the READ COMMITTED and REPEATABLE READ isolation levels. Under these two levels, ordinary SELECT statements do not take row locks, which greatly improves query performance and reduces the probability of lock conflicts, allowing more applications to run concurrently.
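The idea of keeping several versions and letting each reader pick one can be sketched in a few lines of Python. This is a toy model, not InnoDB's implementation: the class `VersionedStore` and its commit counter are hypothetical names introduced only for illustration, and every write is treated as immediately committed.

```python
class VersionedStore:
    """Toy multi-version store: writes append versions, reads pick one by snapshot."""

    def __init__(self):
        self._versions = {}   # key -> list of (commit_no, value), oldest first
        self._commit_no = 0   # hypothetical global commit counter

    def write(self, key, value):
        """Append a new committed version instead of overwriting the old one."""
        self._commit_no += 1
        self._versions.setdefault(key, []).append((self._commit_no, value))
        return self._commit_no

    def snapshot(self):
        """In this toy model a snapshot is just the commit counter at this moment."""
        return self._commit_no

    def read(self, key, snapshot):
        """Return the newest version committed at or before the snapshot."""
        for commit_no, value in reversed(self._versions.get(key, [])):
            if commit_no <= snapshot:
                return value
        return None


store = VersionedStore()
store.write("price", 10)
snap = store.snapshot()                  # a reader takes its snapshot here
store.write("price", 12)                 # a later writer adds a new version
old_price = store.read("price", snap)    # the reader still sees its snapshot
new_price = store.read("price", store.snapshot())
print(old_price, new_price)
```

The reader that took its snapshot before the second write keeps seeing the old price and never waits for the writer.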

1.2. What are current read and snapshot read?

In the database, read operations are mainly divided into two types: current read (Current Read) and snapshot read (Snapshot Read).

  1. Current read: Reads the latest committed version of the record. If another transaction has committed a change, the current read sees that change. A current read locks the rows it reads so that other transactions cannot modify them concurrently. Current reads occur in statements that write or explicitly lock, such as UPDATE, DELETE, INSERT, and SELECT ... FOR UPDATE.

  2. Snapshot read: read the version recorded at the beginning of the transaction, that is, read the data in the snapshot. Even if the data is modified by other transactions during the reading process, the read data content will not change. Snapshot read does not lock the read data, and does not prevent other transactions from modifying the data. Snapshot reads mainly occur in ordinary SELECT statements.

The main difference between these two read operations is whether to lock the read data and which version of the data is read.

1.3. The relationship between current read, snapshot read and MVCC

Both current reads and snapshot reads are read operations in MySQL. They differ in which data version they read and in whether they take locks. MVCC (Multi-Version Concurrency Control) is the mechanism that makes snapshot reads possible.

MVCC is a mechanism for implementing transaction concurrency control. In MVCC, every modification to data creates a new version of the data. Different transactions see different versions of the data, depending on when the transaction started and the isolation level of the transaction.

When doing snapshot reads (non-locking reads), MVCC allows a transaction to read an older version of the data. This means that, at the same point in time, different transactions can see different versions of the same row, which avoids blocking caused by waiting for locks. It is also what lets snapshot reads avoid non-repeatable reads and phantom reads.

When performing a current read (locked read), such as UPDATE or SELECT FOR UPDATE, the latest version of the data is read and locked to prevent other transactions from modifying it.

Therefore, MVCC, current reads, and snapshot reads are closely related. MVCC improves the overall performance of the database by providing a mechanism that enables current reads and snapshot reads to work efficiently at the same time in concurrent transactions.
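To make the contrast concrete, here is a minimal sketch of the two kinds of read against one row. It assumes a simplified model in which a snapshot is a commit number and a row lock is a single owner field; `Row`, `snapshot_read`, and `current_read` are hypothetical names, not MySQL APIs.

```python
class Row:
    def __init__(self, value):
        self.versions = [(0, value)]   # (commit_no, value), oldest first
        self.locked_by = None          # name of the transaction holding the row lock


def snapshot_read(row, snapshot_no):
    """Plain SELECT: pick the version visible to the snapshot, take no lock."""
    for commit_no, value in reversed(row.versions):
        if commit_no <= snapshot_no:
            return value
    return None


def current_read(row, trx_name):
    """UPDATE / SELECT ... FOR UPDATE: lock the row and read the newest version."""
    if row.locked_by not in (None, trx_name):
        raise RuntimeError(f"{trx_name} must wait for {row.locked_by}")
    row.locked_by = trx_name
    return row.versions[-1][1]


row = Row(500)
row.versions.append((1, 600))              # another transaction commits 600
seen_by_snapshot = snapshot_read(row, 0)   # snapshot taken before that commit
seen_by_current = current_read(row, "A")   # locks the row, reads the newest version
try:
    current_read(row, "B")                 # B would have to wait for A's lock
    b_blocked = False
except RuntimeError:
    b_blocked = True
print(seen_by_snapshot, seen_by_current, b_blocked)
```

A real engine would make transaction B wait instead of raising an error; the exception only marks the point where blocking would occur.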

1.4. What problems can MVCC solve, and what are its benefits?

1.4.1. Improve concurrency performance

MVCC allows multiple read operations and write operations to proceed simultaneously without waiting for locks, thus greatly improving concurrency performance.

In traditional database concurrency control, in order to ensure data consistency, locks are usually used to prevent other transactions from reading or modifying data before the current transaction is completed. Although this method can guarantee data consistency, its concurrency performance is poor because there is blocking between read operations and write operations.

In the MVCC mechanism, for read operations, the system will create a snapshot of the data version instead of directly locking the data, so that even if other transactions are modifying the data, the current transaction can also perform read operations. For example, suppose we have an online shopping system. When user A checks the information of a product, even if the merchant is modifying the price of the product at this time, user A can check the product information normally without being blocked.

For write operations, MVCC avoids directly modifying data by making copies of older versions of the data. In this way, when other transactions need to read data, even if the data has been modified, the data can be obtained by reading the copy of the old version of the data without waiting for the completion of the write operation. For example, in the above-mentioned online shopping system, when a merchant modifies the price of a product, if a user is viewing the product at this time, the user still sees the old price, and there is no need to wait for the merchant to complete the operation of modifying the price.

1.4.2. Avoid deadlocks

Since the read operation does not need to be locked, the possibility of deadlock is reduced.

In traditional locking mechanisms, a deadlock occurs when two or more transactions are waiting for each other's locks. For example, if transaction A has locked resource 1 and is trying to acquire resource 2, and transaction B has locked resource 2 and is trying to acquire resource 1, a deadlock will occur because each transaction is waiting for the other to release the resource it needs.

In MVCC, since read operations do not require locking, transactions can read data without affecting other transactions. This means that read operations will not be blocked waiting for other transactions to release locks, thereby reducing the possibility of deadlocks.

For example, suppose transaction A and transaction B both need to read and modify the same piece of data. With MVCC, transaction A can read from its snapshot first. Even if transaction B then starts modifying the data, transaction A's read is not blocked. Only when transaction A itself wants to write the same row does it have to wait, and then only until transaction B's write has finished and transaction B has committed. Because reads never wait for locks, there are fewer lock waits overall and therefore fewer opportunities for deadlock.

1.4.3. Solve the problems of dirty read, non-repeatable read and phantom read

In the MVCC model, each transaction works with a snapshot of the database as of a certain point in time, and a new version of a data item is created whenever it is modified. This mechanism addresses a series of transaction isolation problems, namely dirty reads, non-repeatable reads, and phantom reads:

  1. Dirty read: A dirty read means reading data written by another transaction that has not yet committed. Under MVCC, a transaction reads only versions that are visible to its snapshot and never reads the changes of other uncommitted transactions, so dirty reads are avoided.

  2. Non-repeatable read: A non-repeatable read means that reading the same data several times within one transaction returns different results because other transactions committed changes in between. Under REPEATABLE READ, MVCC creates a snapshot for the transaction and uses it for the rest of the transaction, so reading the same data several times always returns the same result.

  3. Phantom read: A phantom read means that two queries run one after the other within a transaction return different sets of rows: the second query shows rows that the first did not, or rows from the first have disappeared. Under the REPEATABLE READ (RR) isolation level, InnoDB prevents phantom reads for snapshot reads through MVCC, because the transaction keeps seeing the same consistent snapshot. For current reads, InnoDB relies on next-key locks rather than MVCC.

1.4.4. Implementing non-blocking reads

Under the MVCC model, read operations do not block write operations, and write operations do not block read operations.

1.4.5. Provide a consistent view

With MVCC, each transaction takes a snapshot at its start, and then performs all operations on that snapshot, which guarantees that the data seen during transaction execution is consistent.

2. The realization principle of MVCC

2.1. Implicit fields

To implement MVCC, the InnoDB engine adds hidden system fields to every row.
According to the official documentation there are three such fields: DB_ROW_ID, DB_TRX_ID, and DB_ROLL_PTR. Some blog posts mention a fourth field, DELETED_BIT, but I could not confirm it in the official documentation. If anyone finds an official description, please @ me, thank you very much.
I tried to verify it in the official documentation and source code of MySQL 5.7 and 8.0, but without success.

  1. MVCC official documentation for version 5.7: https://dev.mysql.com/doc/refman/5.7/en/innodb-multi-versioning.html
  2. MVCC official documentation for version 8.0: https://dev.mysql.com/doc/refman/8.0/en/innodb-multi-versioning.html
  3. MySQL source code: https://github.com/mysql/mysql-server/tree/8.0
  • DB_ROW_ID: 6 bytes, an implicit auto-increment row ID (hidden primary key). If the table has no primary key, InnoDB automatically generates a clustered index on DB_ROW_ID.
  • DB_TRX_ID: 6 bytes, the ID of the transaction that inserted this record or last modified it.
  • DB_ROLL_PTR: 7 bytes, the rollback pointer. It points to the previous version of this record, which is stored in the undo log.
  • DELETED_BIT: 1 byte 【I haven't confirmed this yet; I only saw it explained in some blog posts】. This field marks whether the row is deleted. When a record is deleted, it is not physically removed right away; only its deletion flag is set. (The official documentation does say that a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted, but it does not list that bit as a hidden column or call it DELETED_BIT.)

In this way, the InnoDB storage engine implements the MVCC function through implicit fields such as DB_TRX_ID, DB_ROLL_PTR, and DB_ROW_ID, combined with Undo Log, while ensuring the ACID nature of transactions (atomicity, consistency, isolation, and durability).
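As a rough mental model of how the hidden fields link versions together, consider the sketch below. It is only an illustration: `RowVersion`, `update`, and `history` are hypothetical names, the rollback pointer is modeled as a plain Python reference, and the real on-disk layout is more involved.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RowVersion:
    name: str
    price: float
    db_row_id: int                        # DB_ROW_ID: hidden row ID
    db_trx_id: Optional[int]              # DB_TRX_ID: last transaction to write the row
    db_roll_ptr: Optional["RowVersion"]   # DB_ROLL_PTR: previous version (in the undo log)


def update(row: RowVersion, trx_id: int, **changes) -> RowVersion:
    """Return a new version stamped with trx_id that points back to the old one."""
    return replace(row, db_trx_id=trx_id, db_roll_ptr=row, **changes)


def history(row: Optional[RowVersion]) -> list:
    """Follow the rollback pointers and collect transaction IDs, newest first."""
    trx_ids = []
    while row is not None:
        trx_ids.append(row.db_trx_id)
        row = row.db_roll_ptr
    return trx_ids


v0 = RowVersion("apple", 3.0, db_row_id=1, db_trx_id=10, db_roll_ptr=None)
v1 = update(v0, trx_id=11, price=3.5)
v2 = update(v1, trx_id=12, name="green apple")
print(history(v2))
```

Each update leaves the older version reachable through the rollback pointer, which is what later allows a reader to walk back to a version it is allowed to see.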

2.2. Undo log

Undo logs are an important part of data consistency, transaction rollback, and multi-version concurrency control (MVCC) in MySQL's InnoDB. Undo logs mainly consist of the following parts:

  1. Undo log records: Each Undo log record contains information about a data version. When a transaction modifies a record in the database, InnoDB will generate an Undo log record in the Undo log that contains the information before the modification of the record.

  2. Undo log segment (Undo Log Segment): Undo log records are grouped and stored in undo log segments. When a transaction first modifies data, it is assigned an undo log segment, and the undo log records it generates are written there.

  3. Rollback segment (Rollback Segment): The rollback segment is the container of the Undo log segment, and each rollback segment can contain multiple Undo log segments. There are 128 rollback segments by default in InnoDB.

  4. Undo tablespace (Undo Tablespace): The Undo tablespace is the physical space for storing Undo logs, and can store multiple rollback segments.

In InnoDB, an undo log is created during its transaction, but it can outlive that transaction. If a transaction fails or is explicitly rolled back, InnoDB uses the undo log records to restore the data to its original state and so keep it consistent. If the transaction commits, its undo log records are kept until no read view still needs the old versions; they are then marked as reclaimable and removed by a later purge operation.

Concepts and theory can be rather dry, so let's take the product table in a shopping mall system as an example.

Scenario example

For example, there is a list of products.
Step 1: A transaction inserts a new record into the commodity table, the record is as follows, the name is radish, the price is 1, the implicit primary key is 1, and the transaction ID and rollback pointer are assumed to be NULL.
Step 2: Now comes a transaction 1 to modify the name of the record and change it to cucumber.
Step 3: There is another transaction 2 to modify the same record in the product table, and change the price to 2.5.

We can infer that the undo log records and the table contents change as follows:

  1. The initial state of the product table is as follows:

| name | price | implicit primary key | transaction ID | rollback pointer |
|---|---|---|---|---|
| radish | 1 | 1 | NULL | NULL |

  2. Then, transaction 1 modifies the name field. The corresponding undo log record is as follows:

| transaction ID | table name | row ID | operation type | old value | new value |
|---|---|---|---|---|---|
| 1 | product table | 1 | UPDATE | radish | cucumber |

  3. The state of the product table becomes:

| name | price | implicit primary key | transaction ID | rollback pointer |
|---|---|---|---|---|
| cucumber | 1 | 1 | 1 | points to the undo log record of transaction 1 |

  4. Next, transaction 2 modifies the price field. The corresponding undo log record is as follows:

| transaction ID | table name | row ID | operation type | old value | new value |
|---|---|---|---|---|---|
| 2 | product table | 1 | UPDATE | 1 | 2.5 |

  5. The state of the product table becomes:

| name | price | implicit primary key | transaction ID | rollback pointer |
|---|---|---|---|---|
| cucumber | 2.5 | 1 | 2 | points to the undo log record of transaction 2 |

During this process, InnoDB will generate an undo log record for each modification operation. If the transaction needs to be rolled back later, InnoDB can use these undo log records to restore the original state of the data.
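The radish and cucumber scenario can be replayed with a small Python sketch. It assumes a deliberately simplified undo record (a dictionary holding the old value and the previous transaction ID); the names `update` and `rollback_last` are hypothetical and do not correspond to InnoDB functions.

```python
# Current row in the product table, including the hidden transaction ID column.
row = {"name": "radish", "price": 1, "row_id": 1, "trx_id": None}
undo_log = []   # newest record last


def update(trx_id, column, new_value):
    """Record the old value in the undo log, then change the row in place."""
    undo_log.append({
        "trx_id": trx_id,
        "prev_trx_id": row["trx_id"],
        "column": column,
        "old_value": row[column],
        "new_value": new_value,
    })
    row[column] = new_value
    row["trx_id"] = trx_id


def rollback_last():
    """Undo the most recent change by restoring the saved old value."""
    record = undo_log.pop()
    row[record["column"]] = record["old_value"]
    row["trx_id"] = record["prev_trx_id"]


update(1, "name", "cucumber")    # transaction 1 renames the product
update(2, "price", 2.5)          # transaction 2 changes the price
after_updates = dict(row)
rollback_last()                  # transaction 2 rolls back before committing
after_rollback = dict(row)
print(after_updates)
print(after_rollback)
```

After the rollback, the price is 1 again while the name change made by transaction 1 is untouched, because only transaction 2's undo record was applied.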

2.3. Read View

Basic introduction

Read View (read view) is the mechanism InnoDB uses to implement consistent non-locking reads, that is, the read side of MVCC.

When a transaction needs to perform a consistent non-locking read operation, InnoDB generates a read view for the transaction. This read view contains the transaction IDs of all transactions that are currently active. If these transactions have new data changes after the time point when the read view is generated, they will not be seen by the read view. In other words, only the data changed by the transaction that has been committed when the read view is generated will be seen by the read view.

Specifically, if the transaction runs under the READ COMMITTED isolation level, InnoDB generates a new read view before each SELECT statement. If the transaction runs under the REPEATABLE READ isolation level, InnoDB generates a read view only for the first consistent read in the transaction and then reuses that read view for the rest of the transaction.

Scenario example

Suppose we have a simple bank account table with two fields "Account ID" and "Balance":

| Account ID | Balance |
|---|---|
| 1 | 500 |
| 2 | 1000 |
| 3 | 1500 |

Now, suppose we have two concurrent transactions, Transaction A and Transaction B.

1. Transaction A starts. It wants to see the balances of account 1 and account 2, so it executes a SELECT statement. Since transaction A runs under the REPEATABLE READ isolation level, InnoDB generates a read view for it. This read view captures the state of the database at that moment, i.e. account 1 has a balance of 500 and account 2 has a balance of 1000.

2. At this point, transaction B starts, deposits 100 yuan into account 1, and then commits. The actual state of the database is now account 1 with a balance of 600 and account 2 with a balance of 1000.

3. Transaction A executes the SELECT statement again to check the balances of account 1 and account 2. Because transaction A still uses the read view generated by its first SELECT, it sees account 1 with a balance of 500 and account 2 with a balance of 1000, rather than the current state of the database. This is how read views implement consistent non-locking reads.

In this example, although transaction B changes the state of the database during the execution of transaction A, due to the existence of the read view, the data seen by transaction A is still consistent and will not be affected by transaction B. This is how InnoDB implements MVCC by using read views.

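Principle

A Read View snapshot mainly includes the following fields:

  1. m_ids: The list of IDs of the transactions that were active when the read view was created. Active transactions are transactions that have not yet committed.

  2. min_trx_id: The smallest transaction ID in m_ids.

  3. max_trx_id: The ID that will be assigned to the next transaction at the time the read view is created. For example, if the transaction IDs in m_ids are (1, 2, 3) and no later transaction has been started, the next ID to be allocated is 4, so max_trx_id is 4.

  4. creator_trx_id: The ID of the transaction that executes the SELECT and created the read view.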

How does a Read View decide which version in the version chain is visible? (Important!)

The check consists of four rules, (1) to (4), applied in turn.

trx_id denotes the transaction ID stored in the row version being examined (its DB_TRX_ID).

(1) If trx_id equals creator_trx_id, the version was written by the reading transaction itself, so it is visible.

(2) If trx_id is smaller than the smallest active transaction ID, the transaction that wrote the version had already committed when the read view was created, so the version is visible.

(3) max_trx_id is the ID that would be assigned to the next transaction when the read view was generated. If trx_id is greater than or equal to max_trx_id, the version was written by a transaction that started after the read view was created, so it is not visible.

(4) Otherwise trx_id lies between the two bounds, and m_ids decides: if trx_id is not in the active list, the writing transaction had already committed and the version is visible; if it is in the list, the version is not visible.

If a version is not visible, InnoDB follows its rollback pointer to the previous version in the undo log and applies the same rules again.

In this way, InnoDB gives each transaction its own consistent view, so that even if the data in the database changes while the transaction is running, the data the transaction sees stays consistent and is not affected by other transactions. This consistent read is the so-called snapshot read, and it is the key to implementing MVCC.
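The four visibility rules translate almost directly into code. The sketch below is a simplified model, not InnoDB source: `ReadView`, `Version`, and `read` are illustrative names, `min_trx_id` is the name used here for the smallest active transaction ID, and rule (3) is written with "greater than or equal to" because `max_trx_id` is the ID reserved for the next transaction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadView:
    m_ids: frozenset       # IDs of transactions active when the view was created
    min_trx_id: int        # smallest ID in m_ids
    max_trx_id: int        # ID the next new transaction would receive
    creator_trx_id: int    # transaction that created the view

    def is_visible(self, trx_id: int) -> bool:
        if trx_id == self.creator_trx_id:    # rule (1): my own change
            return True
        if trx_id < self.min_trx_id:         # rule (2): committed before the view
            return True
        if trx_id >= self.max_trx_id:        # rule (3): started after the view
            return False
        return trx_id not in self.m_ids      # rule (4): in between, check the active list


@dataclass
class Version:
    value: object
    trx_id: int                        # plays the role of DB_TRX_ID
    prev: Optional["Version"] = None   # plays the role of DB_ROLL_PTR


def read(version: Optional[Version], view: ReadView):
    """Walk the version chain from newest to oldest until a visible version is found."""
    while version is not None:
        if view.is_visible(version.trx_id):
            return version.value
        version = version.prev
    return None


view = ReadView(m_ids=frozenset({3, 5}), min_trx_id=3, max_trx_id=6, creator_trx_id=5)
oldest = Version(500, trx_id=2)
middle = Version(600, trx_id=3, prev=oldest)    # written by a still-active transaction
newest = Version(700, trx_id=6, prev=middle)    # written after the view was created
result = read(newest, view)
print(result)
```

Here the two newest versions are skipped (one written after the view was created, one by a transaction that is still active), and the reader ends up with the value written by transaction 2.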

2.4. Overall process

MySQL's MVCC multi-version concurrency control mainly involves the following steps:

  1. Transaction start: When a transaction starts and executes the first operation, the system will assign a unique transaction ID to the transaction.

  2. Read operation: When a snapshot read occurs, InnoDB creates a Read View (read view). The read view records the IDs of all transactions that are active at that moment. While it uses this read view, the transaction can only see modifications made by transactions that committed before the read view was created; modifications committed by other transactions after that point are invisible to it.

  3. Write operation: When a transaction performs a write, InnoDB does not simply discard the old data. It saves the old version in the undo log and generates a new version of the row, which records the ID of the transaction that created it. The new version points back to the old one, so the versions of a row form a linked list (the version chain) ordered from newest to oldest.

  4. Transaction commit: When a transaction commits, the system removes its ID from the global list of active transactions.

  5. Version recycling: When the system judges that a certain version of data is no longer needed (that is, no active transaction needs to access this version of data), it will recycle this version of data to free up storage space.
[Figure: overall MVCC execution process]
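The bookkeeping behind steps 1, 2, and 4 (assigning IDs, building a read view from the active list, and removing an ID on commit) can be sketched as follows. `TrxSystem` and its methods are hypothetical names for illustration, and the read view is a plain dictionary in which `min_trx_id` denotes the smallest active transaction ID.

```python
class TrxSystem:
    """Toy transaction registry: hands out IDs and tracks which ones are active."""

    def __init__(self):
        self.next_trx_id = 1
        self.active = set()

    def begin(self):
        """Step 1: assign a unique, increasing transaction ID."""
        trx_id = self.next_trx_id
        self.next_trx_id += 1
        self.active.add(trx_id)
        return trx_id

    def commit(self, trx_id):
        """Step 4: remove the ID from the global list of active transactions."""
        self.active.discard(trx_id)

    def open_read_view(self, creator_trx_id):
        """Step 2: capture the active list and the next ID at this moment."""
        m_ids = sorted(self.active)
        return {
            "m_ids": m_ids,
            "min_trx_id": m_ids[0] if m_ids else self.next_trx_id,
            "max_trx_id": self.next_trx_id,
            "creator_trx_id": creator_trx_id,
        }


system = TrxSystem()
t1 = system.begin()
t2 = system.begin()
view_before = system.open_read_view(t2)   # t1 is still active here
system.commit(t1)
view_after = system.open_read_view(t2)    # t1 has left the active list
print(view_before)
print(view_after)
```

A read view opened before transaction 1 commits still lists it as active, so its changes stay invisible to that view; a view opened afterwards no longer lists it.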

3. MVCC-related issues

3.1. How does RR solve non-repeatable read on the basis of RC level?

RC (Read Committed) level and RR (Repeatable Read) level are two common transaction isolation levels.

At the RC level, each read returns the latest committed version of the row. A transaction that executes the same query at different times may therefore get different results. This is the so-called "non-repeatable read".

To solve the "non-repeatable read" problem of the RC level, a transaction at the RR level creates a snapshot (a Read View, not a physical copy of the data) at its first consistent read. For the rest of the transaction, even if other transactions modify the data, every snapshot read goes through that same snapshot, so reading the same data several times within the transaction returns the same result.

This is how the RR level solves the non-repeatable read problem on the basis of the RC level. This increases data consistency by sacrificing certain concurrency performance. However, in many scenarios, data consistency is more important than concurrency performance, so the RR level is also widely used.

3.2. What is the difference between InnoDB snapshot reads at the RC and RR levels?

Now that we have understood the difference between RC (Read Committed) and RR (Repeatable Read), what is the difference between InnoDB snapshot reads under these two isolation levels?
The main difference is when snapshot reads are created.

  1. At the RC level, a new Read View (snapshot) is created before each statement is executed, no matter how many statements the transaction runs. Therefore, even within the same transaction, reading the same row several times may return different versions of the data, that is, a "non-repeatable read".

  2. At the RR level, a Read View is created only when the transaction executes its first snapshot read, and all subsequent snapshot reads in the transaction see only the data versions visible to that Read View. Later modifications by other transactions are invisible to the current transaction, so reading the same data several times in the same transaction always returns the same result, that is, "repeatable read" is guaranteed.

  3. The difference between InnoDB snapshot reads at the RC and RR levels is mainly determined by the isolation level characteristics of RC and RR. Compared with the RC level, the RR level provides higher data consistency, but the concurrency performance may decrease.
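The timing difference can be shown with a very small simulation. It is a sketch under simplifying assumptions: a "view" is reduced to the commit number that was current when the view was taken, and `Database` and `Transaction` are hypothetical classes, not MySQL objects.

```python
class Database:
    def __init__(self, balance):
        self.versions = [(0, balance)]   # (commit_no, balance), oldest first

    def commit_update(self, balance):
        """Another transaction commits a new balance as a new version."""
        self.versions.append((len(self.versions), balance))

    def read_as_of(self, commit_no):
        """Return the newest balance committed at or before commit_no."""
        for number, balance in reversed(self.versions):
            if number <= commit_no:
                return balance
        return None


class Transaction:
    def __init__(self, db, isolation):
        self.db = db
        self.isolation = isolation   # "RC" or "RR"
        self.view = None

    def select(self):
        # RC: take a fresh view for every statement.
        # RR: take a view on the first read only, then reuse it.
        if self.isolation == "RC" or self.view is None:
            self.view = self.db.versions[-1][0]
        return self.db.read_as_of(self.view)


db = Database(500)
rc = Transaction(db, "RC")
rr = Transaction(db, "RR")
first = (rc.select(), rr.select())
db.commit_update(600)                # a concurrent transaction commits
second = (rc.select(), rr.select())
print(first, second)
```

Both transactions read 500 at first; after the concurrent commit, the RC transaction sees 600 while the RR transaction keeps seeing 500.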



Origin blog.csdn.net/wangshuai6707/article/details/132711781