MySQL study notes (19) - MySQL transaction log

1. Introduction

  • Transactions have four characteristics: atomicity, consistency, isolation, and durability. What mechanisms are these four characteristics built on?
  • The isolation of transactions is implemented by the lock mechanism.
  • The atomicity, consistency, and durability of transactions are guaranteed by the transaction's redo log and undo log.
    • REDO LOG is called the redo log. It lets the engine redo writes, restoring the page modifications made by committed transactions. It is used to ensure the durability of transactions.
    • UNDO LOG is called the rollback log. It rolls row records back to a specific version. It is used to ensure the atomicity and consistency of transactions.
  • Some DBAs may think that UNDO is the reverse process of REDO, but it is not. Both REDO and UNDO can be regarded as recovery operations, but they differ:
    • redo log: a log generated by the storage engine layer (InnoDB) that records page modifications at the "physical level".
      • For example: at page number xxx, offset yyy, the data 'zzz' was written. Its main purpose is to ensure the reliability of the data.
    • undo log: a log generated by the storage engine layer (InnoDB) that records logical operations.
      • For example, when an INSERT statement is executed on a row, the undo log records the opposite DELETE operation.
      • It is mainly used for transaction rollback (the undo log records the inverse of each modification) and for consistent non-locking reads (the undo log rolls a row record back to a specific version: MVCC, that is, multi-version concurrency control).

2. redo log

The InnoDB storage engine manages storage space in units of pages. Before a page can be accessed, it must be loaded from disk into the Buffer Pool in memory. All changes first update the data in the buffer pool; the dirty pages in the buffer pool are then flushed to disk at a certain frequency (the checkpoint mechanism). The buffer pool bridges the speed gap between the CPU and the disk, so that overall performance does not drop too quickly.

Why do you need REDO logs

In a database management system, the REDO log is a mechanism used to ensure the durability of data. When the database performs write operations, the REDO log records the details of these operations so that, even if an abnormal condition such as a system crash occurs, the database can correctly restore them during recovery.

Specifically, when the database performs a write operation, it first writes the change to the REDO log and only later writes the modified data pages to disk. This means that even if the system crashes before the data pages are written, the database can use the information in the REDO log to redo the write.

In addition, REDO logs help achieve transaction durability and consistency. Before a transaction commits, all of its modifications are recorded in the REDO log, and the log is written to disk. If the system crashes after the commit but before the modified pages have been flushed, the database replays the REDO log to restore the committed changes; changes that belong to transactions that had not committed are rolled back with the undo log. This keeps the data consistent and complete.

Therefore, the REDO log is very important to the security and reliability of the database and is an integral part of the database system.

The benefits and characteristics of REDO logs

Benefits

  • The redo log reduces how often dirty pages must be flushed to disk
  • The redo log takes up very little space

A redo record only stores the tablespace ID, page number, offset, and the value to be written, so it needs very little storage space and can be flushed to disk quickly.
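To make the "physical" nature of such a record concrete, here is a minimal Python sketch. The `RedoRecord` class and `apply_redo` function are hypothetical names chosen for illustration; InnoDB's real record format is more elaborate. The point is only that a few bytes of log are enough to describe a change to a 16 KB page.

```python
from dataclasses import dataclass

PAGE_SIZE = 16 * 1024  # InnoDB's default page size in bytes


@dataclass(frozen=True)
class RedoRecord:
    """Hypothetical, simplified physical redo record."""
    space_id: int  # tablespace ID
    page_no: int   # page number inside the tablespace
    offset: int    # byte offset inside the page
    data: bytes    # bytes to write at that offset


def apply_redo(pages: dict, rec: RedoRecord) -> None:
    """Re-apply one record to the page it describes."""
    key = (rec.space_id, rec.page_no)
    page = pages.setdefault(key, bytearray(PAGE_SIZE))
    page[rec.offset:rec.offset + len(rec.data)] = rec.data


pages = {}
rec = RedoRecord(space_id=0, page_no=42, offset=100, data=b"zzz")
apply_redo(pages, rec)
print(bytes(pages[(0, 42)][100:103]))
```

In this toy model the record carries three payload bytes plus three small integers, while the page it modifies is 16 KB.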

Features

  • Redo logs are written to disk sequentially
    • While a transaction executes, each statement may generate several redo records. These records are written to disk in the order they are generated, that is, with sequential IO, which is faster than random IO.
  • During transaction execution, the redo log keeps recording
    • One difference between the redo log and the bin log is that the redo log is generated by the storage engine layer, while the bin log is generated by the server layer.
    • Suppose a transaction inserts 100,000 rows into a table. Throughout the inserts, records are appended to the redo log sequentially, but nothing is written to the bin log file until the transaction commits.

The composition of redo

Redo log can be simply divided into the following two parts:

  • Redo log buffer, stored in memory; it is volatile.

    • Parameter setting: innodb_log_buffer_size

      • Size of the redo log buffer. The default is 16M, the maximum is 4096M, and the minimum is 1M.
      mysql> show variables like '%innodb_log_buffer_size%';
      +------------------------+----------+
      | Variable_name          | Value    |
      +------------------------+----------+
      | innodb_log_buffer_size | 16777216 |
      +------------------------+----------+

  • Redo log files, stored on disk; they are persistent.

The REDO log files are shown in the figure, and ib_logfile0 and ib_logfile1 are the REDO logs.

image-20230324202944695

The overall process of redo

Taking an update transaction as an example, the redo log flow process is shown in the following figure:

image-20230324203215616

  • Step 1: Read the original data from disk into memory and modify the in-memory copy
  • Step 2: Generate a redo log record containing the modified value and write it to the redo log buffer
  • Step 3: When the transaction commits, flush the contents of the redo log buffer to the redo log file by appending to it
  • Step 4: Periodically flush the modified data in memory to disk

Redo log flushing strategy

The redo log is not written directly to disk. InnoDB first writes redo records to the redo log buffer and then flushes them to the actual redo log file at a certain frequency.

image-20230324205058590

Note that flushing the redo log buffer to the redo log file does not actually write to disk. The data only goes into the file system cache (page cache), an optimization modern operating systems make to speed up file writes; the operating system decides when the real write happens (for example, when the page cache has grown large enough). This creates a problem for InnoDB: if synchronization is left to the operating system and the system goes down, the data is lost (although the probability of the whole system going down is relatively small).

To handle this, InnoDB provides the innodb_flush_log_at_trx_commit parameter, which controls how the logs in the redo log buffer are flushed to the redo log file when a transaction commits. It supports three strategies:

  • 0: When a transaction commits, the Redo Log is not flushed to disk; instead, the unflushed Redo Log is flushed to disk in batches once per second. This is the best-performing flush strategy, but if the MySQL process crashes or the machine loses power, you may lose up to one second of data.

    image-20230324205703774

  • 1 (default): The Redo Log is written to disk synchronously every time a transaction commits. This is the safest flush strategy, but performance is worse, because each commit has to wait for the Redo Log to reach the disk.

    image-20230324205723417

  • 2: Each time a transaction commits, the Redo Log is written to the system cache (not the disk), and the Redo Log in the cache is flushed to disk in batches once per second. This is a balance between performance and safety. It ensures that the Redo Log reaches the disk at least once per second while avoiding the cost of a synchronous disk write on every commit. A crash of the MySQL process alone loses nothing, but an operating system crash or power loss can lose up to one second of data.

    image-20230324205812517

Note: A background thread writes the contents of the redo log buffer to the page cache once per second and then calls fsync to flush them to disk.
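The three strategies can be compared with a small Python model. This is a sketch under simplifying assumptions, not InnoDB's actual code: the class `RedoLogModel`, its three lists, and the helper `run` are hypothetical, and a "MySQL crash" is modeled as losing only the log buffer while an "OS crash" also loses the page cache.

```python
class RedoLogModel:
    """Toy model of innodb_flush_log_at_trx_commit (illustration only)."""

    def __init__(self, flush_at_commit: int):
        if flush_at_commit not in (0, 1, 2):
            raise ValueError("flush_at_commit must be 0, 1 or 2")
        self.flush_at_commit = flush_at_commit
        self.log_buffer = []  # inside the MySQL process, volatile
        self.page_cache = []  # OS file system cache, survives a MySQL crash
        self.disk = []        # durable storage

    def _write(self):
        # Move redo records from the log buffer to the page cache.
        self.page_cache.extend(self.log_buffer)
        self.log_buffer.clear()

    def _fsync(self):
        # Force the page cache contents down to disk.
        self.disk.extend(self.page_cache)
        self.page_cache.clear()

    def commit(self, trx: str):
        self.log_buffer.append(trx)
        if self.flush_at_commit == 1:
            self._write()
            self._fsync()
        elif self.flush_at_commit == 2:
            self._write()
        # With 0, nothing happens at commit time.

    def background_tick(self):
        # The once-per-second background flush.
        self._write()
        self._fsync()

    def surviving(self, crash: str):
        if crash == "mysql":
            return self.disk + self.page_cache
        if crash == "os":
            return list(self.disk)
        raise ValueError("crash must be 'mysql' or 'os'")


def run(setting: int):
    """Commit t1, let one second pass, commit t2, then crash."""
    log = RedoLogModel(setting)
    log.commit("t1")
    log.background_tick()
    log.commit("t2")
    return log.surviving("mysql"), log.surviving("os")


for setting in (0, 1, 2):
    print(setting, run(setting))
```

In this model, setting 0 loses t2 in both kinds of crash, setting 1 loses nothing, and setting 2 loses t2 only when the operating system itself goes down.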

3. Undo log

The redo log guarantees transaction durability, and the undo log guarantees transaction atomicity. Before a transaction modifies data, it first writes the corresponding undo log record.

How to understand Undo logs

Transactions need to guarantee atomicity: the operations in a transaction either all complete or none of them take effect. But sometimes things go wrong in the middle of a transaction, for example:

  • Case 1: Errors may occur during transaction execution, such as errors in the server itself, errors in the operating system, or even a sudden power failure.

  • Case 2: The programmer can manually issue a ROLLBACK statement during transaction execution to end the current transaction.

When either situation occurs, the data must be changed back to its original state. This process is called rollback. It creates the impression that the transaction did nothing at all, so the atomicity requirement is met.

Whenever we change a record (the change can be an INSERT, DELETE, or UPDATE), we need to leave ourselves a way back by writing down what is needed for rollback. For example:

  • When you insert a record, write down at least its primary key value, so that a rollback only needs to delete the record with that primary key. (For each INSERT, the InnoDB storage engine records a corresponding DELETE.)
  • When you delete a record, write down at least the contents of the record, so that a rollback can insert a record with those contents back into the table. (For each DELETE, the InnoDB storage engine records a corresponding INSERT.)
  • When you modify a record, write down at least the old values from before the modification, so that a rollback can update the record back to them. (For each UPDATE, the InnoDB storage engine records a reverse UPDATE that puts back the row as it was before the modification.)

MySQL calls the information recorded for rollback the undo log, or rollback log.

Note that a query (SELECT) does not modify any user records, so no undo log needs to be recorded when a query is executed.

In addition, the undo log itself generates redo log: writing undo log records is accompanied by writing redo log records, because the undo log also needs durability protection.
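The bookkeeping described above for INSERT, DELETE, and UPDATE can be sketched in a few lines of Python. This is a toy model with hypothetical names (`table`, `undo_log`, and the four functions), not InnoDB's storage format; it only shows that each change records its logical inverse and that rollback replays the inverses in reverse order.

```python
table = {1: {"name": "tom"}}  # primary key -> row, the committed state
undo_log = []                 # inverse operations, in execution order


def insert(pk, row):
    table[pk] = dict(row)
    undo_log.append(("delete", pk, None))  # inverse of INSERT is DELETE


def delete(pk):
    old = table.pop(pk)
    undo_log.append(("insert", pk, old))   # inverse of DELETE is INSERT


def update(pk, **changes):
    old = dict(table[pk])
    table[pk].update(changes)
    undo_log.append(("update", pk, old))   # inverse restores the old row


def rollback():
    # Apply the inverse operations, newest first.
    while undo_log:
        op, pk, old = undo_log.pop()
        if op == "delete":
            del table[pk]
        else:  # "insert" or "update": put the old row back
            table[pk] = old


# One transaction that touches the table three times, then rolls back.
update(1, name="Sun")
insert(2, {"name": "amy"})
delete(1)
rollback()
print(table)
```

After `rollback()` the table is back to the single row it started with, as if the transaction had never run.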

The role of undo log

  • Function 1: rolling back data
    • The undo log is a logical log, so it only restores the database logically to its original state. All modifications are logically undone, but the data structures and the pages themselves may look very different after the rollback. It does not physically restore the database to what it was before the statement or transaction was executed.
  • Function 2: MVCC
    • The other role of undo is MVCC: InnoDB implements MVCC through the undo log. When a user reads a row that is currently held by another transaction, the current transaction can read the row's previous version through undo, which makes non-locking reads possible.
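As a rough illustration of the MVCC role, the following Python sketch walks a chain of row versions until it finds one the reader is allowed to see. The `Version` class and `read` function are hypothetical, and the visibility rule (a plain set of transaction IDs the reader may see) is a deliberate simplification of the rule InnoDB actually applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One version of a row; roll_ptr links to the previous version."""
    value: str
    trx_id: int
    roll_ptr: Optional["Version"] = None


def read(latest: Optional[Version], visible_trx_ids: set) -> Optional[str]:
    """Follow roll_ptr until a version written by a visible transaction."""
    version = latest
    while version is not None and version.trx_id not in visible_trx_ids:
        version = version.roll_ptr
    return None if version is None else version.value


# Transaction 10 wrote "tom"; transaction 20 later changed it to "Sun".
old = Version("tom", trx_id=10)
new = Version("Sun", trx_id=20, roll_ptr=old)

print(read(new, {10}))      # reader that must not see transaction 20
print(read(new, {10, 20}))  # reader that may see both
```

The first reader gets the older version without waiting for any lock held by transaction 20, which is the non-locking read described above.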

undo storage structure

Rollback segment and undo page

InnoDB manages the undo log in segments, called rollback segments. Each rollback segment records 1024 undo log segments, and undo pages are allocated within each undo log segment. An undo page is a memory cache area used to store the data recorded for modifications during transaction processing.

  • Before InnoDB 1.1 (not including 1.1), there was only one rollback segment, so at most 1024 transactions could be active at the same time, although that is sufficient for most applications.
  • Starting from version 1.1, InnoDB supports up to 128 rollback segments, so the limit on concurrent online transactions rises to 128*1024.
mysql> show variables like 'innodb_undo_logs'; # the value is 128

Although InnoDB 1.1 supports 128 rollback segments, they are all stored in the shared tablespace ibdata. Starting from InnoDB 1.2, rollback segments can be configured further through parameters. These parameters include:

  • innodb_undo_directory: sets the path where the rollback segment files are located. This means rollback segments can be stored outside the shared tablespace, that is, in independent tablespaces. The default value of this parameter is ".", which means the directory of the current InnoDB storage engine.
  • innodb_undo_logs: sets the number of rollback segments; the default value is 128. In InnoDB 1.2, this parameter replaces the innodb_rollback_segments parameter of earlier versions.
  • innodb_undo_tablespaces: sets the number of files that make up the rollback segments, so that rollback segments can be distributed more evenly across multiple files. After setting this parameter, you will see files prefixed with undo in the innodb_undo_directory path; these are the rollback segment files.

Undo log related parameters rarely need to be changed.

Reuse of undo pages

When a transaction starts and needs to write undo log, it first looks for a free slot in an undo log segment. If there is one, it requests an undo page and writes the undo log to that page. The default page size in MySQL is 16 KB.

Allocating a whole page to every transaction is very wasteful (unless the transaction is very long). Suppose the application's TPS (transactions processed per second) is 1000: one second then needs 1000 pages, about 16 MB of storage, and one minute needs about 1 GB. If this continued, disk space would grow very quickly and a lot of it would be wasted, unless MySQL cleaned up very diligently.

So undo pages are designed to be reusable. When a transaction commits, its undo page is not deleted immediately. Because of reuse, an undo page may contain undo logs of several transactions. After a transaction commits, its undo log is put into a linked list, and InnoDB checks whether the used space of the undo page is less than 3/4. If it is, the page can be reused: it is not reclaimed, and the undo logs of other transactions can be appended after the existing ones. Because the undo logs are scattered this way, reclaiming the corresponding disk space is not very efficient.
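The numbers in this section are easy to check. The short Python sketch below redoes the one-page-per-transaction estimate and expresses the 3/4 reuse rule as a predicate; the function names are hypothetical and the rule is reduced to a single comparison, so treat it as an illustration rather than InnoDB's implementation.

```python
from fractions import Fraction

PAGE_SIZE = 16 * 1024             # bytes, the default page size
REUSE_THRESHOLD = Fraction(3, 4)  # a page is reusable below this fill level


def undo_bytes_per_second(tps: int, page_size: int = PAGE_SIZE) -> int:
    """Space consumed per second if every transaction took a fresh page."""
    return tps * page_size


def can_reuse(used_bytes: int, page_size: int = PAGE_SIZE) -> bool:
    """True when less than 3/4 of the undo page is in use."""
    return Fraction(used_bytes, page_size) < REUSE_THRESHOLD


per_second = undo_bytes_per_second(1000)
mib_per_second = per_second / 2**20
mib_per_minute = mib_per_second * 60

print(mib_per_second, mib_per_minute)
print(can_reuse(4 * 1024), can_reuse(12 * 1024))
```

At 1000 TPS this gives 15.625 MiB per second and 937.5 MiB per minute, which matches the rough "16 MB" and "1 GB" figures in the text. A page with 4 KB used is reusable, while one with exactly 12 KB used (3/4 of the page) is not.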

Rollback segments and transactions

  1. Each transaction will only use one rollback segment, and one rollback segment may serve multiple transactions at the same time.

  2. When a transaction starts, a rollback segment is assigned to it. During the transaction, when data is modified, the original data is copied to the rollback segment.

  3. In the rollback segment, the transaction keeps filling extents until the transaction ends or all the space is used up. If the current extent is not enough, the transaction requests the next extent in the segment. If all allocated extents are used up, the transaction either overwrites the original extents or, if the rollback segment allows it, extends the segment with new extents.

  4. Rollback segments live in undo tablespaces. A database can have multiple undo tablespaces, but only one undo tablespace can be in use at a time.

mysql> show variables like 'innodb_undo_tablespaces';

# Number of undo tablespaces; the minimum is 2. Truncation of an undo tablespace is initiated by the purge coordinator thread. While one undo tablespace is being truncated, another must remain available for use.
  5. When a transaction commits, the InnoDB storage engine does the following two things:
    • Put the undo log into a list for a later purge operation
      • The purge operation exists to support MySQL's MVCC. A record cannot be processed immediately when the transaction commits, because other transactions may still be using the row, so the InnoDB storage engine has to keep the previous version of the record.

      • If the row record is no longer referenced by any other transaction, the real delete can be performed. So purge does not process the current transaction's delete; it cleans up after earlier delete and update operations. The only operation it actually performs is delete, removing the previous versions of the row.

    • Determine whether the page holding the undo log can be reused; if so, allocate it to the next transaction

Data Classification in Rollback Segments

  1. Uncommitted rollback data (uncommitted undo information): the transaction associated with this data has not committed. The data is used to implement read consistency, so it cannot be overwritten by data from other transactions.

  2. Committed but unexpired rollback data (committed undo information): the transaction associated with this data has committed, but the data is still held for the time set by the undo retention parameter.

  3. Committed and expired data (expired undo information): the transaction has committed and the data has been kept longer than the time specified by the undo retention parameter, so it is expired. When the rollback segment is full, this committed and expired data is overwritten first.

After a transaction commits, the undo log and the page holding it cannot be deleted immediately, because other transactions may still need the undo log to get a previous version of the row record. When a transaction commits, its undo log is put into a linked list. Whether the undo log and the page holding it can finally be deleted is decided by the purge thread.

Types of undo

  • insert undo log
    • The insert undo log is the undo log generated by insert operations. Because a record created by an insert is visible only to the inserting transaction and not to other transactions (a requirement of transaction isolation), this undo log can be deleted directly after the transaction commits. No purge operation is required.
  • update undo log
    • The update undo log is the undo log generated by delete and update operations. This undo log may be needed by the MVCC mechanism, so it cannot be deleted when the transaction commits. At commit it is put into the undo log linked list, where it waits for the purge thread to perform the final deletion.

The life cycle of undo log

Brief generation process

The following is the simplified process of a transaction that uses undo and redo.
Assume there are 2 values, A=1 and B=2, and the transaction changes A to 3 and B to 4.

1. start transaction;
2. Record A=1 to the undo log;
3. update A = 3;
4. Record A=3 to the redo log;
5. Record B=2 to the undo log;
6. update B = 4;
7. Record B=4 to the redo log;
8. Flush the redo log to disk;
9. commit
  • If the system goes down during any of steps 1-8, the transaction has not committed, and it has no effect on the data on disk.
  • If the system goes down between steps 8 and 9, recovery can either roll the transaction back or go on to complete the commit, because the redo log has already been persisted.
  • If the system goes down after step 9 and the changes in memory have not yet been flushed back to disk, then after the system recovers, the data can be brought up to date on disk from the redo log.
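The nine steps and the three crash cases can be traced with a small Python simulation. It is a sketch under stated assumptions: the function `simulate` is hypothetical, dirty pages are assumed never to reach disk before the crash, and for a crash between steps 8 and 9 the model takes the "roll back" option mentioned above.

```python
def simulate(crash_after_step: int) -> dict:
    """Run steps 1..crash_after_step, crash, recover; return the disk data."""
    disk = {"A": 1, "B": 2}  # data files on disk
    memory = dict(disk)      # in-memory copy of the data
    undo = []                # old values, recorded before each change
    redo_buffer = []         # redo records still in memory
    redo_on_disk = []        # redo records that were flushed
    committed = False

    for step in range(1, crash_after_step + 1):
        if step == 2:
            undo.append(("A", memory["A"]))
        elif step == 3:
            memory["A"] = 3
        elif step == 4:
            redo_buffer.append(("A", 3))
        elif step == 5:
            undo.append(("B", memory["B"]))
        elif step == 6:
            memory["B"] = 4
        elif step == 7:
            redo_buffer.append(("B", 4))
        elif step == 8:
            redo_on_disk.extend(redo_buffer)
            redo_buffer.clear()
        elif step == 9:
            committed = True
        # Step 1 (start transaction) changes nothing in this model.

    # Crash: everything in memory is gone. Because no dirty page was flushed,
    # an uncommitted transaction left nothing on disk that needs undoing.
    if committed:
        for key, value in redo_on_disk:
            disk[key] = value  # replay the redo log
    return disk


for step in (0, 4, 8, 9):
    print(step, simulate(step))
```

Only a crash after step 9 ends with A=3 and B=4 on disk; every earlier crash leaves the original values A=1 and B=2.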

The process with only the Buffer Pool:

image-20230327193914352

redo + undo

image-20230327193950114

Before updating the data in the Buffer Pool, we first write the data's pre-transaction state to the Undo Log. If an error occurs in the middle of the update, we can use the Undo Log to roll back to the state before the transaction started.

Detailed generation process

For the InnoDB engine, in addition to the data of the record itself, each row record has several hidden columns:

  • DB_ROW_ID : If no primary key is explicitly defined for the table, and no unique index is defined in the table, InnoDB will automatically add a hidden row_id column as the primary key for the table.
  • DB_TRX_ID : Each transaction will be assigned a transaction ID. When a record is changed, the transaction ID of this transaction will be written into trx_id.
  • DB_ROLL_PTR : The rollback pointer is essentially a pointer to the undo log.

image-20230327194457270

When performing an INSERT

begin;
INSERT INTO user (name) VALUES ("tom");

The inserted row generates an insert undo log, and the row's rollback pointer points to it. The undo log records its own serial number (undo no), the primary key column(s) and value(s) of the inserted row, and so on, so a rollback can delete the corresponding row directly by primary key.

image-20230327194756686

When performing an UPDATE

An update operation generates an update undo log. There are two cases: updates that change the primary key and updates that do not. Suppose we now execute:

UPDATE user SET name="Sun" WHERE id=1;

image-20230327194904935

At this point the old record is written to a new undo log, and the row's rollback pointer points to this new undo log, whose undo no is 1. The new undo log in turn points to the old undo log (undo no=0).

Suppose we now execute:

UPDATE user SET id=2 WHERE id=1 ;

image-20230327195048245

When the primary key is updated, the delete mark (deletemark) of the original row is set first. Nothing is really deleted at this point; the real deletion is left to the purge thread. A new row is then inserted. Undo logs are generated for these operations as well, and the undo log serial number keeps increasing.

As we can see, an undo log is generated every time data is changed. When a record is changed several times, several undo logs are generated. Each undo log records the state before the change, and the serial numbers increase, so to roll back we walk backwards through the serial numbers until we reach the original data.

How the undo log is rolled back

Taking the above example as an example, assuming that rollback is executed, the corresponding process should be as follows:

  1. Delete the data with id=2 through the log of undo no=3

  2. Restore the deletemark of the data with id=1 to 0 through the undo no=2 log

  3. Restore the name of the data with id=1 to tom through the undo no=1 log

  4. Delete the data with id=1 through undo no=0 log

Undo log deletion

  • For insert undo log

Because the record of an insert operation is visible only to the transaction itself and not to other transactions, the undo log can be deleted directly after the transaction commits; no purge operation is needed.

  • For update undo log

This undo log may be needed by the MVCC mechanism, so it cannot be deleted when the transaction commits. At commit it is put into the undo log linked list, where it waits for the purge thread to perform the final deletion.

The purge thread has two main functions: cleaning up undo pages, and removing the data rows marked with Delete_Bit in pages. In InnoDB, a DELETE in a transaction does not actually delete the row; it is a Delete Mark operation that sets Delete_Bit on the record without removing it. This is a kind of "false deletion": it is just a mark, and the real deletion is done by the background purge thread.
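The difference between marking and purging can be shown with a toy Python model. All names here (`rows`, `delete`, `select_all`, `purge`, and the `still_needed` argument) are hypothetical; in particular, the real purge thread works out for itself which old versions are still needed, whereas this sketch is simply told.

```python
rows = {
    1: {"name": "tom", "delete_mark": False},
    2: {"name": "Sun", "delete_mark": False},
}


def delete(pk):
    """DELETE inside a transaction: only set the mark, keep the row."""
    rows[pk]["delete_mark"] = True


def select_all():
    """Queries skip rows that carry the delete mark."""
    return {pk: r["name"] for pk, r in rows.items() if not r["delete_mark"]}


def purge(still_needed):
    """Physically remove marked rows that no reader still needs."""
    doomed = [pk for pk, r in rows.items()
              if r["delete_mark"] and pk not in still_needed]
    for pk in doomed:
        del rows[pk]


delete(1)
purge(still_needed={1})            # some reader may still need row 1
kept_while_needed = 1 in rows
purge(still_needed=set())          # nobody needs it any more
print(kept_while_needed, 1 in rows, select_all())
```

Row 1 disappears from query results as soon as it is marked, but it stays in storage until a purge pass finds that no reader needs it.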

4. Summary

image-20230327195436357

  • The redo log is a physical log that records the physical changes made to data pages. The undo log is not the reverse process of the redo log.
  • The undo log is a logical log. When a transaction is rolled back, it only restores the database logically to its original state.

reference

https://www.bilibili.com/video/BV1iq4y1u7vj/?spm_id_from=333.337.search-card.all.click&vd_source=25b05e9bd8b4bdac16ca2f47bbeb7990
