Understanding InnoDB transaction principles (redo log, undo log, locks, MVCC)

Transactions

In MySQL, transactions are an InnoDB-engine feature (MyISAM does not support them). A set of SQL statements is treated as one indivisible whole: either every statement in it succeeds, or the whole set fails.

The four ACID properties of transactions

  • Atomicity (原子性): an atom is an indivisible unit. All the SQL in a transaction is treated as one whole that either entirely succeeds or entirely fails — the smallest unit that cannot be divided further.
  • Consistency (一致性): the data must be in a consistent state before and after the transaction executes, with the total amount unchanged. Example: A has 10 yuan and B has 10 yuan. A wants to transfer 10 yuan to B, so a transaction is started: A is debited 10 yuan, B is credited 10 yuan, and the transaction is committed. Afterwards A has 0 yuan and B has 20 yuan — the total amount of money is unchanged and nothing disappears into thin air.
    But if a transaction were started and only A's 10 yuan were deducted, wouldn't that be inconsistent? My understanding: that is A running a single SQL statement directly, without using a transaction.
  • Isolation (隔离性): a transaction is not affected by concurrent threads; each thread keeps running in an independent environment. Example: A starts a transaction and changes the data with id 2 to 0. Before the transaction commits, other threads cannot operate on the data with id 2 — so even under multiple threads, each still runs in an independent environment.
  • Durability (持久性): once a transaction is committed or rolled back, the result is permanent. For example: A starts a transaction and transfers 10 yuan to B. Once the transaction commits, that 10 yuan never comes back; once the transaction rolls back, the 10 yuan is not transferred to B after all.

Transaction-related SQL statements

Once a transaction is opened there must be a rollback or a commit; even without a manual operation, it will be rolled back automatically after a timeout. Open, commit, and rollback form one set of operations.

  • Open a transaction: begin; or START TRANSACTION;
  • Commit a transaction: commit;
  • Roll back a transaction: rollback;

Transactions are committed automatically by default, i.e. each SQL statement is an independent transaction. If you want one transaction to cover a set of SQL statements, you must enable manual transaction commit:

  • Check whether the current session's transactions auto-commit: SELECT @@autocommit — a result of 1 means auto-commit, 0 means manual commit.
  • Set the current session to manual commit: SET @@autocommit = 0;
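A minimal sketch of the transfer example above as an explicit transaction, assuming a hypothetical account(id, name, money) table:

```sql
-- demo table (assumed for illustration)
CREATE TABLE account (
    id    INT PRIMARY KEY,
    name  VARCHAR(20),
    money INT
);
INSERT INTO account VALUES (1, 'A', 10), (2, 'B', 10);

START TRANSACTION;                                       -- or BEGIN;
UPDATE account SET money = money - 10 WHERE name = 'A';  -- debit A
UPDATE account SET money = money + 10 WHERE name = 'B';  -- credit B
COMMIT;                                                  -- both changes persist as one unit
-- on an error, ROLLBACK; would undo both updates instead
```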

Transaction principles

How transactions solve isolation

If transactions had only ACD and not isolation (I), they would face thread-safety problems.
A database server can be used by multiple connections at the same time, and the server creates a thread for each connection, so multiple threads work concurrently. That raises a question: what happens when multiple threads operate on the same piece of data? For example: the amount for id=1 starts at 0 yuan. Thread A starts a transaction and adds 10 yuan to the amount for id=1; meanwhile thread B starts a transaction and subtracts 20 yuan from it. A commits its transaction and B rolls back. Ideally the amount for id=1 should be 10 yuan, but it actually ends up at -10 yuan. This is the thread-safety problem (线程安全问题) caused by multiple threads operating on the same data, and this particular example is the most severe of the safety issues: a dirty write (脏写).
Problems caused by multi-threading, from low to high severity:
Between threads, the same data may see write-write (写写), read-write (读写), or read-read (读读) access. Read-read causes no safety issue at all, but write-write and read-write lead to dirty writes and dirty reads respectively.

  • Dirty write (脏写): introduced in the example above. Anything that performs multi-threaded operations must solve the dirty write problem.
  • Dirty read (脏读): a thread reads another thread's uncommitted data. For example, the amount for id=1 is 0 yuan; user A starts a transaction and transfers 10 yuan into it. At this moment thread B queries id=1, sees 10 yuan, and acts on that value; then A rolls back and id=1 returns to 0 yuan. Isn't thread B's operation now based on wrong data?

Transactions use locks to solve these multi-threaded safety problems (details will be introduced later).

  • Solving dirty writes (脏写): a thread takes a lock when writing. Before operating on the data, other threads check whether the lock exists: if not, they proceed; if it exists, any write waits until thread A releases the lock. This way, while thread A is writing, other threads cannot take part and can only read, giving thread A an independent working space — but the dirty read problem remains.
  • Solving dirty reads (脏读): building on the above, when other threads find that the data they want to operate on is locked, any operation waits. That way, before the transaction commits, none of thread A's operations on the data are disturbed by other threads' writes, and the uncommitted changes are not seen by other threads.
    But how is a read judged? How do we tell one thread's reads from its writes? Locks are therefore divided into read and write locks: a write lock is an exclusive lock, a read lock is a shared lock. When writing, take a write lock, so other threads fail to acquire both read locks and write locks; when reading, take a read lock, so other threads can still acquire read locks but fail to acquire write locks. In other words: no reading while writing, no writing while reading, no writing while writing — but reading while reading is fine.
    The problem with using only locks: if one thread is writing, other threads can neither read nor write and must wait for the lock to be released, degrading into single-threaded operation and hurting concurrency. A sketch of the two lock modes follows this list.
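A two-session sketch of shared vs. exclusive locks (MySQL 8.0 syntax; older versions spell `FOR SHARE` as `LOCK IN SHARE MODE`), reusing the assumed `account` table:

```sql
-- Session A: take an exclusive (write) lock on one row
START TRANSACTION;
SELECT * FROM account WHERE id = 1 FOR UPDATE;

-- Session B: a shared (read) lock on the same row blocks
-- until session A commits or rolls back
SELECT * FROM account WHERE id = 1 FOR SHARE;  -- waits

-- Session B: a plain SELECT does not block, because it is a
-- snapshot read served by MVCC (introduced below)
SELECT * FROM account WHERE id = 1;
```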

Multi-Version Concurrency Control (多版本并发控制, MVCC) — introduced in detail later — can only improve the efficiency of reading threads.
After each modification a piece of data gains a corresponding historical version. While another thread is writing, the current version cannot be read, but the historical version from before the uncommitted transaction can be, which reduces waiting on reads and improves concurrency.
But this brings three problems:

  • Non-repeatable read (不可重复读): if, between a transaction's first and second read of some data, another transaction commits a modification to it, the two reads see different versions and the data is inconsistent before and after. Example: A and B operate on an account holding 1,000 yuan at the same time. A opens a transaction and sees a balance of 1,000 yuan; meanwhile B opens a transaction and deposits 1,000 yuan into the account. A checks again and still sees 1,000 yuan. After B commits, A checks a third time and finds the balance has become 2,000. Within one transaction the same query returned inconsistent results. Solving this is simple — read only one version of the data within a transaction — but that creates the following problem.
  • Phantom read (幻读, hit only when inserting data): the data is real, it just was not read. Reads may serve a historical version (to solve non-repeatable reads, data stays invisible even after the modifying thread commits), while an insert, once it acquires the lock, operates on the latest data. The latest data may disagree with the historical version being read, which makes the reads feel unreal — like a hallucination. For example: thread A starts a transaction and modifies the name of the user with id=1; meanwhile thread B reads the user with id=1, seeing the version from before A's transaction commits. After changing the name, A inserts a row with id=2 and commits. Thread B cannot see the row with id=2, since it reads the historical version — yet when B tries to insert id=2, it finds the row already exists. Question: what level of lock is used when thread B queries?
    Solution: use gap locks or next-key locks (临键锁) to prevent inserts within the queried interval; since no rows can be inserted there, the phantom read problem is solved. MySQL already solves phantom reads at the repeatable read level through such locking (a sketch follows this list). But if reads take locks while a ReadView is in use, what about the case where the writing thread has already finished inserting data? And if the writing thread itself were locked out, it could not get its own work done.
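A sketch of how a locking read under repeatable read blocks inserts into the scanned range via next-key locks, assuming a hypothetical `user` table with a secondary index on `age`:

```sql
-- Session A (REPEATABLE READ): locking read over a range
START TRANSACTION;
SELECT * FROM user WHERE age BETWEEN 10 AND 20 FOR UPDATE;
-- InnoDB locks the matching index records plus the gaps between them

-- Session B: inserting into a locked gap blocks until A finishes
INSERT INTO user (id, name, age) VALUES (99, 'x', 15);  -- waits

-- Session A
COMMIT;  -- session B's insert can now proceed
```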

Isolation summary

Problems facing isolation: those caused by thread concurrency and those caused by MVCC

  • Dirty write (脏写): must always be solved

  • Dirty read (脏读)

  • Non-repeatable read (不可重复读)

  • Phantom read (幻读)
To improve concurrency, the InnoDB engine makes trade-offs on isolation: the higher the isolation level, the lower the system's concurrency. By degree of trade-off there are four isolation levels, sorted from low to high security:

  • Read Uncommitted (读未提交, RU): solves only dirty writes. When a thread performs DML on a piece of data, the data is locked so it cannot be modified and other threads' DML is not allowed — but it can still be read. Dirty reads remain possible.

  • Read Committed (读已提交, RC): solves dirty reads on top of read uncommitted. When a thread performs DML on a piece of data, it locks the data so the current version can be neither modified nor read; only MVCC's ReadView can be read.

  • Repeatable Read (可重复读, RR, the default): reads only one ReadView version, solving the data inconsistency caused by reading different versions.

  • Serializable (序列化): during a write operation, any read by other threads can only wait. (Switching between these levels is shown in the snippet below.)
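Checking and switching the isolation level (the variable is `@@transaction_isolation` in MySQL 8.0; older versions use `@@tx_isolation`):

```sql
-- inspect the current session's isolation level
SELECT @@transaction_isolation;   -- 'REPEATABLE-READ' by default

-- change it for the current session only
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- or for new sessions server-wide (requires privileges)
SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;
```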

How transactions solve atomicity, consistency, and durability

If data writes operated on the disk directly, random IO would occur, causing severe performance problems. Therefore data is first loaded into memory, operated on in memory, and finally written back to disk in batches by other threads.
Durability:
To guarantee that the dirty pages in memory are eventually written to disk despite possible sudden failures (crashes, power loss, etc.), the redo log is needed for record-keeping. The redo log is first written in memory and then written to disk after the transaction commits (the frequency can be configured); its sequential IO is faster than the random IO of writing the data itself. If a crash happens after a transaction committed but before its pages were flushed to disk, then after a restart the contents of the redo log are read and the writes are replayed, guaranteeing durability. If a crash happens after a transaction starts but before it commits, the in-memory data and the in-memory redo records disappear; the commit is reported as failed and the data on disk is unchanged — durability still holds. Once the changes of a successfully committed transaction are safely on disk, its records in the redo log become useless and will be overwritten by new records.

Atomicity: the scenarios above cover a transaction committing successfully or the in-memory data disappearing mid-transaction. But if something goes wrong at the code level, the transaction is rolled back (or rolled back manually), which means all operations on the in-memory data must be traceable back to the state before the transaction started. The redo log only records where and how data was modified on disk, not what the data in memory looked like before. This requires the undo log: every time a SQL statement changes in-memory data, the undo log records an opposite SQL statement that restores the change. If the transaction commits successfully, the new data is persisted to disk; if the transaction rolls back, the undo log brings back the previous data — that is atomicity.
Consistency: with atomicity, all the SQL statements either succeed or fail together; with durability, every successfully committed transaction eventually reaches the disk. The two combined achieve consistency.

Redo log

The role of the redo log is to guarantee that the data modified in memory by committed transactions — the dirty pages (脏页) — can still be flushed to disk; it is a log backup on disk.
The redo log consists of two parts: one part in memory, the redo log buffer, with a default size of 16M; the other part on disk, the redo log files — two files, ib_logfile0 and ib_logfile1, 48M each, written in a circular fashion.

To modify data you must first load it into memory — the InnoDB Buffer Pool — because disk IO is far too slow. But if all modified data in memory were flushed to disk right after the transaction commits, every minor modification could require a random IO, which is inefficient; and if you do not flush, the data in memory is unsafe: in the worst case a transaction starts, the in-memory modification completes, the commit succeeds — and then the system crashes, so the dirty pages in memory never reach the disk, data is lost, and durability is violated.

InnoDB's solution: while modifying the data in memory, write the modification into the redo log buffer at the same time. After the transaction commits, the dirty pages in memory are not flushed immediately — that is left to the checkpoint mechanism, which guarantees the data eventually lands on disk; instead the contents of the redo log buffer are written to disk. Writing the redo log file is sequential and the content is small, which is much faster than flushing the data pages directly. The transaction does not count as committed until the redo write to disk finishes; if an accident happens in between, the transaction simply fails to commit and rolls back. This ensures that after a commit there is always a corresponding redo log backup.
If dirty page data is lost in an accident while being flushed, then after the service restarts it can be recovered from the redo log file on disk, because the redo log records which data belongs at which disk location and is itself persistent. In this way a small redo log solves the insecurity of in-memory data — a technique also known as WAL (Write-Ahead Logging). Once the dirty pages corresponding to a redo record have been successfully flushed to disk, the record is no longer useful and will be overwritten by new records.
There are two kinds of flushing here:

  1. redo flushes the records in the redo buffer to disk after the transaction is committed
  2. Dirty pages in memory are regularly flushed to disk

The redo log does not have to be flushed exactly when the transaction commits; the flush frequency can be configured (via the innodb_flush_log_at_trx_commit parameter):

  • 0: no flush at commit; the master thread flushes the redo log buffer to disk once per second. This may lose up to about a second of data, but it uses fewer IOs and is more efficient.

  • 1 (the default): flush to disk when the transaction commits, and also flush once per second. So redo flushing does not only happen at commit: during a transaction there is a 1-second flush interval, and the buffer is also flushed when more than half of it is used. This is the safest setting.

  • 2: at commit, the contents of the redo log buffer are only written into the OS page cache, without syncing; the operating system decides when to sync them to the disk file. This performs better, and the data survives a crash of the MySQL service alone — but a system crash or power failure can still lose it, so it is not fully reliable.

The redo log records a physical log: it records modifications at specific locations on disk, so recovery is faster than replaying commands, since nothing needs to be re-executed.
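The three bullets above correspond to the values of `innodb_flush_log_at_trx_commit`:

```sql
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- 0: flush roughly once per second (fewest IOs, may lose ~1s of commits)
-- 1: write and flush on every commit (default, safest)
-- 2: write to the OS page cache on commit, let the OS sync (middle ground)
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
```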

Advantages of redo log

  • Reduces the disk flush frequency: without it, dirty pages would be written to disk at every transaction commit.
  • Disk writes are sequential, so flushing is faster.
  • It has a memory part and a disk part: the memory part records continuously as the transaction executes, and the disk part provides the final persistence.

Since data is modified in memory and, thanks to redo, does not need to be flushed to disk immediately — but the redo log is limited in size and old records get overwritten, and memory is limited too — when should the data in memory be flushed to disk?

Checkpoint mechanism

Its goal is to flush the dirty pages in memory to disk at appropriate moments. All data between two checkpoints must be recorded in the redo log, otherwise the backup is incomplete and cannot fully restore the data if an accident occurs.
There are two types of checkpoint:

  • Sharp checkpoint: when the database shuts down normally, flush all in-memory dirty pages to disk.
  • Fuzzy checkpoint: while the database is running, flush some of the dirty pages to disk at various moments, because flushing everything at once would cause performance problems.
    So what are the flushing moments during normal operation? There are four triggers:
  1. Master Thread Checkpoint: asynchronous, scheduled flushing — the Master Thread flushes to disk at a specified interval.
  2. FLUSH_LRU_LIST Checkpoint: when the Buffer Pool is short of memory, the page cleaner uses the LRU page-replacement algorithm to flush the least recently used pages to disk.
  3. Async/Sync Flush Checkpoint: when redo log space is insufficient, the page cleaner flushes some dirty pages to disk so the related dirty-page records in the redo log can be released to make room for new data. Synchronous or asynchronous? That depends on how much redo log space remains.
  4. Dirty Page too much: when the proportion of dirty pages in memory is too high, the Master Thread flushes dirty pages to disk to ensure the Buffer Pool has enough free space.

Provided data safety is maintained, the later dirty pages are flushed to disk, the better the performance. From this angle, to optimize MySQL performance the checkpoint trigger conditions should be hit as rarely as possible: increase the Buffer Pool's memory and increase the redo log's space.
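The relevant knobs can be inspected and, in part, adjusted at runtime (sizes below are only examples):

```sql
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';     -- Buffer Pool size, in bytes
SHOW VARIABLES LIKE 'innodb_log_file_size';        -- size of each redo log file
SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';  -- "dirty pages too much" threshold

-- the Buffer Pool can be resized online in MySQL 5.7+;
-- the redo log file size can only be changed in the config file
SET GLOBAL innodb_buffer_pool_size = 2147483648;   -- 2G
```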
But when dirty pages in memory are flushed to disk, are they written straight to the data's location on disk as random IO? No — there is an intermediate buffer.

Double Write (doublewrite buffer)

Why is it called double writing? Because a piece of data in memory is first written sequentially into the doublewrite buffer on disk, and only after that write is confirmed successful is the same in-memory data written discretely into the corresponding tablespace.

Why does the doublewrite mechanism exist?
First, its drawback: the same data is written to disk twice, one extra IO. Even though that write is sequential, the extra IO reduces overall performance by roughly 5-10%.
So why introduce it anyway? Fundamentally, to solve page corruption while flushing in-memory data — the partial write problem (部分写问题).
The page corruption problem: the InnoDB engine works in pages of 16K, while the operating system (Linux) also works in pages, but of 4K. When in-memory data is flushed to disk, it must first be handed to OS pages, which the OS then writes to disk — so flushing one InnoDB page takes four OS-page writes. If a crash or power failure strikes during those 4 writes, an incomplete page lands on disk and the page is corrupted (a partial write). Durability must not be violated, so how do we recover data that never made it to disk correctly?
The first idea is the redo log, which exists precisely to guarantee memory data reaches disk. But what does the redo log record? A physical record: the data at offset xxx of page xxx is xxxx. The crash left the recorded page only partially written — the page itself is corrupted — so the redo record pointing at that page address is invalid as well. Redo cannot help.
What about the binlog? The binlog's job is data backup and master-slave replication; it pays no attention to memory versus disk state, i.e. it cannot tell what has reached disk from what is still a dirty page.
A full restore from backup would work, but the cost is far too high to be realistic.
Hence the doublewrite mechanism: do not write the pages recorded in the redo log in place first, so that an accident during the write cannot corrupt and invalidate the pages redo refers to.
So where is the data written first? Somewhere on disk, necessarily — otherwise it is meaningless, since what we are protecting is the exchange between memory and disk.
InnoDB's answer: first write sequentially into the system tablespace, and only after that succeeds write discretely into the corresponding independent tablespaces. InnoDB also optimizes the doublewrite speed: because the first write is sequential, the one extra IO operation costs as little performance as possible.
That is why the doublewrite mechanism (双写机制) was introduced.


Doublewrite files
With this mechanism, there must be a corresponding storage file.
The doublewrite buffer has two parts, one in memory and one on disk, each fixed at 2M.
The disk part of the doublewrite buffer is 128 pages in the system tablespace, i.e. two extents of 1M each, 2M in total. When Buffer Pool data is to be flushed, it is first copied into the memory part of the doublewrite buffer; the memory part is then written to the disk part, 1M at a time in two passes. Only after that write succeeds is the same data written discretely into the corresponding independent tablespaces.


When memory triggers the checkpoint mechanism, it starts writing some pages to disk. The specific process is as follows:

  1. Copy the dirty pages to be written from memory into the memory part of the doublewrite buffer
  2. The doublewrite memory part is written sequentially into the system tablespace in two passes, one 1M extent at a time
  3. After the write is confirmed successful, the doublewrite memory part writes the same data discretely into the corresponding independent tablespaces

What if an accident occurs during the doublewrite process?
An accident during the first write: the first write copies the memory data (the doublewrite memory buffer) into the system tablespace file as a backup. If an accident occurs here, the pages recorded in redo are not damaged, so redo can restore them directly.
An accident during the second write: the second write copies the memory data (the doublewrite memory buffer) into the corresponding independent tablespace. An accident while writing to disk here can corrupt the page, and at that point the page recorded in redo is invalid too — but thanks to the copy in the system tablespace, the system checks on restart whether the pages involved are intact, and if a page is found corrupted it is restored from the system tablespace.
The redo write itself cannot "have an accident" in this scheme: a transaction only counts as complete after its redo has been written successfully.
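Doublewrite is on by default and can be observed through a server variable (disabling it is only sensible on storage that guarantees atomic 16K writes):

```sql
SHOW VARIABLES LIKE 'innodb_doublewrite';  -- ON by default
-- to disable, set it at startup in my.cnf:
--   [mysqld]
--   innodb_doublewrite = 0
```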


Undo log (rollback log)

The redo log only covers transactions that commit normally. If an exception during the transaction causes a rollback, or the transaction is rolled back manually, the data from before the transaction started cannot be restored from redo: after the SQL executed, the data in memory already changed, and redo records only modifications to disk data. Another log is needed — the undo log, the rollback log. After each SQL statement executes, an instruction for restoring the original data must be recorded in the rollback log; for example, after inserting the row id=2, "delete id=2" must be recorded in the undo log. It is not full SQL — more like pseudo-SQL. When a rollback is performed, recovery happens by executing the reverse operations in the transaction's undo log. This guarantees atomicity and consistency.
The undo log records a logical log: it restores the data that actually changed, even though the physical location may have changed.

So why use a logical log? Consider the rollback log when data is being inserted: the original data did not contain it, so there is no prior physical state to record. Besides, logical records are likely more convenient for MVCC's ReadView.
Undo log functions

  • The undo log can roll back data: executing the reverse SQL returns the data to its previous state
  • The undo log can record the historical versions of a piece of data, i.e. it supports MVCC — but that applies to update and delete operations; for an insert record, the undo entry loses its purpose once the transaction commits and is deleted

Undo log storage

  • The undo log is managed in segments, namely rollback segments (回滚段); each rollback segment has 1024 undo slots
  • Each transaction uses only one rollback segment, but one rollback segment may serve multiple transactions at the same time

When a transaction commits, its undo records are kept if MVCC still needs them; otherwise they are put into the purge list for recycling, and it is judged whether the pages can be reused by the next transaction.
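The undo configuration can be inspected through server variables (a sketch):

```sql
SHOW VARIABLES LIKE 'innodb_undo%';          -- undo tablespaces, directory, truncation
SHOW VARIABLES LIKE 'innodb_purge_threads';  -- background threads recycling old undo records
```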

Locks

Locks are used for allocation control when multiple threads compete for limited resources. In order to prevent thread safety issues in the database and allow each thread to obtain resources in an orderly manner, locks are required.
A lock is a kind of marker: different locks use different markers, and the code handles each marker differently.
The InnoDB engine uses row-level locks, whose finer granularity keeps the scope of a lock small and concurrency high when operating on data. However, if the target rows cannot be located through an index, the lock degenerates into a table-level lock: one thread operating on one row must lock the whole table, which sharply reduces the system's concurrency.

By function, locks divide into read locks (读锁) and write locks (写锁), just as all thread operations on the database divide into reads and writes. Multiple threads reading the same piece of data together is no problem; but reading data that is being written produces a dirty read, and — most seriously — writing data that is being written produces a dirty write and corrupts the data. Therefore: when reading, take a read lock, which can only be acquired if the data has no write lock; this lock prevents writing but still allows reading, and each thread releases its read lock when the read finishes. When writing, take a write lock, which can only be acquired if the data has neither read locks nor write locks; this lock prevents other threads from writing or reading during the write, and the writing thread releases it when the operation completes.
That is: reading during reading is allowed; writing during writing, writing during reading, and reading during writing are not.
Through the cooperation of write locks and read locks, dirty writes and dirty reads are both solved (a read lock can only be taken when no write lock exists, and once it is taken, no write lock can be added).


Knowing these two kinds of lock, we can also divide locks by scope into three kinds: page-level locks, table-level locks, and row-level locks, each further subdivided by specific function. In the InnoDB engine, row-level locks are clearly the most commonly used.

In fact, locks can also be split into plain read-write locks, which only care about the concurrent safety of the data in the table and ignore the table structure (e.g. table read/write locks and row-level read/write locks), and the non-plain kinds (metadata locks, intention locks, and so on).

Table locks

Table read locks and table write locks

When the engine locks the whole table for multi-threaded reads and writes: taking the whole-table read lock while reading means every row in the table can only be read, not written; while inserting data under the whole-table write lock, other threads can neither write to nor read from the table.
Locked:

  • Table write lock (写锁): lock tables 表名 write — before the lock is released, only the locking thread can read and write the table; other threads can neither read nor write. If locking fails, the table may already hold a read or write lock.
  • Table read lock (读锁): lock tables 表名 read — the whole table can only be read, by any thread (including the locker); no thread can write. If locking fails, the table may already hold a write lock.

The isolation of transactions is controlled by locks automatically; manual locking and unlocking like this has nothing to do with transactions.
There are two ways to unlock: issue unlock tables; or simply disconnect the session that holds the lock.
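A short two-session sketch of the commands above, reusing the assumed `account` table:

```sql
-- Session A
LOCK TABLES account READ;      -- anyone may read, nobody may write
SELECT * FROM account;         -- ok
UPDATE account SET money = 0;  -- error: the table is read-locked

-- Session B
SELECT * FROM account;         -- ok
UPDATE account SET money = 0;  -- blocks until session A unlocks

-- Session A
UNLOCK TABLES;
```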

Metadata lock (MDL)

The metadata lock is added automatically by the engine to guarantee that a table's structure cannot be modified while its data is being read or written, avoiding conflicts between DDL statements and DML/DQL.
Table-level read and write locks and row-level read and write locks all automatically add a metadata lock to the table involved. Metadata locks target DDL statements: while such an MDL lock exists, no DDL can be performed, yet for the DML/DQL read-write locks themselves the MDL is transparent. DDL in turn takes its own exclusive MDL (a write lock), and while that exists on a table, the others cannot be added.
The two parties to an MDL lock are DML/DQL on one side and DDL on the other; it is through the MDL lock that they form a mutual exclusion.
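A sketch of the DDL-vs-DML mutual exclusion (the blocked ALTER shows up as "Waiting for table metadata lock" in SHOW PROCESSLIST):

```sql
-- Session A: an open transaction holds a shared MDL on the table
START TRANSACTION;
SELECT * FROM account;

-- Session B: the DDL needs an exclusive MDL and blocks
ALTER TABLE account ADD COLUMN note VARCHAR(50);  -- waits

-- Session A
COMMIT;  -- MDL released; session B's ALTER proceeds
```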

Intention locks

An intention lock is more like a marker, introduced to handle conflicts between table-level and row-level locks. When a table is to be write-locked, it is hard to cheaply ensure both that the table carries no table-level read/write lock and that no row in it carries a read/write lock: knowing the state of every row would require traversing and checking each one, hurting locking efficiency. So whenever a row is locked, an intention lock is automatically placed on its table, marking that some row in it is locked; when the table is then subjected to DDL or a table write lock, the intention lock can be checked directly.
Likewise, when we want to put a read lock on a table, rows are allowed to hold row-level read locks but no write locks; when we put a write lock on a table, no row may hold a read or write lock.
Intention locks are therefore also divided into shared (for reads) and exclusive (for writes). If the rows hold only row-level read locks, the table carries an intention shared lock, which is mutually exclusive with a table-level write lock; as soon as any row holds a write lock, the table carries an intention exclusive lock, which is mutually exclusive with both table-level read locks and write locks.
A table's intention locks express the mutual exclusion between row-level locks and the corresponding table-level locks, removing the need to check row locks one by one when locking a table and improving locking efficiency.
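In MySQL 8.0 the intention lock and the row lock it marks can be observed side by side in performance_schema (a sketch):

```sql
-- Session A
START TRANSACTION;
SELECT * FROM account WHERE id = 1 FOR UPDATE;

-- Any session: the table-level IX intention lock plus the record's X lock
SELECT object_name, lock_type, lock_mode
FROM performance_schema.data_locks;
--  account | TABLE  | IX
--  account | RECORD | X,REC_NOT_GAP
```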

Row-level locks

Row read locks and row write locks

They work the same as table locks, except the scope narrows from the whole table to a single row in it, so the unlocked parts can still be operated on by other threads and overall concurrency is higher — but the data must be located through an index, otherwise the lock degenerates to the whole table (sketch below).
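A sketch of the index requirement, assuming `id` is the primary key of the `account` table and `name` has no index:

```sql
-- Session A: the row is located via the primary key, so only it is locked
START TRANSACTION;
UPDATE account SET money = 100 WHERE id = 1;

-- Session B: a different row stays writable
UPDATE account SET money = 200 WHERE id = 2;  -- ok

-- but filtering on the non-indexed column locks every row scanned,
-- effectively a table-level lock:
-- UPDATE account SET money = 0 WHERE name = 'A';
```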

Gap locks

Next-key locks

MVCC

Multi-Version Concurrency Control (多版本并发控制, MVCC) maintains multiple versions of a piece of data, so the state of the data before each transaction touched it — its historical versions — can be obtained. For example: a new row with id=2 and name='Zhang San' is inserted into the table; later the row is modified to name='Li Si', overwriting the original data. In the undo log, however, is a record of how to restore the current data to name='Zhang San'. While the transaction that changed the name to 'Li Si' is uncommitted, at read committed or higher isolation levels other transactions cannot read the 'Li Si' version — a current read (当前读) is not possible. To reduce waiting between threads and improve concurrency, they may instead read the latest data from before that transaction, i.e. the 'Zhang San' version: this is a snapshot read (快照读). The snapshot read removes the blocking of other threads' reads caused by the writing thread's data lock, though write operations must still wait. Under the repeatable read isolation level, the data read this way may not be the latest committed version.

Implementation principle

The implementation of MVCC is inseparable from the table's hidden fields, the undo log, and the ReadView. When we create a table, besides the columns we explicitly design, the InnoDB engine adds 2 or 3 hidden fields for us, used only internally by the system. Why is the number not fixed? Because one of them, as often mentioned, is not created if we specify a primary key — in that case there are only two hidden fields. The three hidden fields, visible in the table file, are introduced in detail below.
Hidden fields

  • DB_TRX_ID: the latest modifying transaction ID — the ID of the transaction that inserted this record or last modified it.
  • DB_ROLL_PTR: the rollback pointer; together with the undo log it points to the previous version of this row, and that version's pointer points to the one before it, forming a version chain.
  • DB_ROW_ID: if the table has no primary key, the system automatically creates this field so a primary key index can be built; each row gets a unique, non-null value as the index key. If a primary key is specified, the field is not created.

From the hidden fields we know that each row records the transaction ID of its latest modification. Its version chain exists thanks to the undo log, used for data recovery: it records how to restore each row to its state after the last committed transaction, and that previous version likewise records how to restore the version before it, and so on. Logically this forms a version chain per row, each version pointing to the one before it. The head of the chain is stored in the hidden field DB_ROLL_PTR, which connects the chain to the current row: starting from the current row's DB_ROLL_PTR you can follow the pointers to every transaction's version of the data.

A historical version only forms once another thread opens a transaction on the current data: the new (even if uncommitted) row points to the old version's address in the undo log, and the new row's transaction ID field records the currently open transaction's ID. The undo log does not care about transaction commits — only about transactions opening and rolling back.
Demo :

  1. Insert a new row
  2. After the new row is committed, another thread opens a transaction and modifies this row, changing age to 3 — whether or not it has committed yet
  3. As this row keeps being modified, more historical version information forms; the row's historical versions are recorded in the undo log. Each version carries its own version's transaction ID and rollback pointer. The data of a historical version is only logical — what is actually recorded is how to restore data to that state.

Every row thus carries the transaction ID of its latest operation (possibly committed, possibly not) plus its historical versions. But how does another transaction judge which version of the row it should read?

ReadView
The ReadView (读视图) is the basis on which a transaction executes MVCC; it is used to judge which version of the data should be read.

A ReadView contains 4 core fields. (Note: active transactions are those that have been opened but not yet committed; transaction IDs are assigned in increasing order.)

  • m_ids: the set of currently active transaction IDs
  • min_trx_id: the smallest active transaction ID
  • max_trx_id: the current maximum transaction ID + 1 — a transaction with this ID does not exist yet; it is the ID of the next transaction to be opened
  • creator_trx_id: the ID of the transaction performing the MVCC read; everything revolves around this transaction reading the data

With the transaction-ID bookkeeping, the historical version chain, and the current transaction's ID in hand, what remains is the reading rule — deciding which version the current transaction reads. The rule below is checked version by version; if a version does not satisfy it, the next version is compared.

The fields used to judge whether a version is readable: m_ids, min_trx_id, max_trx_id, creator_trx_id, and the row's DB_TRX_ID.

| Condition | Readable? | Explanation |
| --- | --- | --- |
| row's DB_TRX_ID == creator_trx_id | yes | this version was modified by the reading transaction itself, so it can be read |
| row's DB_TRX_ID < min_trx_id | yes | the transaction that produced this version had already committed, so it can be read |
| row's DB_TRX_ID >= max_trx_id | no | a ReadView is generated at the moment of reading, so its statistics may lag, while the row's transaction ID is always current; an ID at or beyond the counted maximum belongs to a transaction opened after this thread's ReadView, so it cannot be read (edge case: such a transaction may even have committed already and still be unreadable, but the probability is tiny) |
| min_trx_id <= row's DB_TRX_ID < max_trx_id | depends | if the ID is not in m_ids, that transaction opened later but has already committed, so it is readable; if it is in m_ids, the transaction is currently open and cannot be read |

Each version of a row is checked one by one; if a version does not satisfy the rule, the next version is compared.

(Figure: an example where the current row's transaction ID exceeds the ReadView's max_trx_id, so this version is unreadable and an older one is consulted.)

Read Committed (RC): detailed principle

Under read committed, a writing thread takes the write lock and other threads can only perform MVCC reads. The problem is non-repeatable reads: every read generates a new ReadView, and in the interval between two reads the ReadView's contents can change. For example, the version being examined may satisfy row DB_TRX_ID > max_trx_id, so this thread cannot read it; but by the time of the next read the same ID satisfies DB_TRX_ID < min_trx_id, so that version has become readable. The two reads disagree — the non-repeatable read problem.
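A transcript of the non-repeatable read under RC, matching the earlier 1,000-yuan example:

```sql
-- Session A (READ COMMITTED)
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT money FROM account WHERE id = 1;  -- 1000

-- Session B
UPDATE account SET money = money + 1000 WHERE id = 1;  -- autocommits

-- Session A: this read builds a fresh ReadView
SELECT money FROM account WHERE id = 1;  -- 2000: non-repeatable read
COMMIT;
```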

Repeatable Read (RR): detailed principle

Non-repeatable reads happen because each read generates a different ReadView. If every read in the transaction reuses the ReadView of the first read, the rule applied to every read is identical and the data read stays the same.
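The same transcript under RR (the default), where the first read's ReadView is reused:

```sql
-- Session A (REPEATABLE READ)
START TRANSACTION;
SELECT money FROM account WHERE id = 1;  -- 1000; ReadView created here

-- Session B
UPDATE account SET money = money + 1000 WHERE id = 1;  -- autocommits

-- Session A: the first ReadView is reused, the result is stable
SELECT money FROM account WHERE id = 1;  -- still 1000
COMMIT;
SELECT money FROM account WHERE id = 1;  -- now 2000
```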

Although every read within a transaction now returns the same data, the cost is giving up the latest committed data: the latest committed data is real, yet the transaction keeps reading the past, and because write operations cannot use MVCC and must return to reality, phantom reads may occur — already introduced above and not repeated here.


Non-repeatable reads and phantom reads both arise from using the MVCC mechanism: non-repeatable reads belong to the read committed (读已提交) isolation level and phantom reads to the repeatable read (可重复读) level, so MVCC is used only at the RC and RR levels.


MVCC feels to me like a parallel space: we can observe things in the parallel space but cannot modify anything there; any modification changes the current space, so to modify you must return to the current space. For example: the row with id 2 has been deleted, but my historical version still shows it — and when the current transaction tries to modify the row with id 2, it finds the row has disappeared.

Picture a square as a row of data, with each layer beneath it representing one of its versions; ReadView then decides which version is read.


Origin blog.csdn.net/m0_52889702/article/details/128403946