Redo log in MySQL

Foreword

The redo log guarantees durability

We all know that durability is one of the four ACID properties of a transaction. Specifically, once a transaction has committed successfully, its changes to the database are saved permanently and cannot be lost afterwards, whatever happens.
So how does MySQL ensure durability? The simplest approach is to flush every data page touched by the transaction to disk each time a transaction commits. But doing so causes serious performance problems, mainly in two respects:
Because InnoDB interacts with the disk in units of pages, a transaction may modify only a few bytes of a data page, and flushing the complete page to disk for that is a waste of resources.
A transaction may modify multiple data pages that are not physically contiguous, and writing them with random IO performs poorly.
Therefore, MySQL uses the redo log, which records only the changes a transaction makes to data pages. This largely solves the performance problem, because the log records are much smaller and are written with sequential IO.

How the redo log works

The redo log is specific to the InnoDB storage engine and gives MySQL its crash recovery capability.
For example, if the MySQL instance hangs or the host goes down, the InnoDB storage engine uses the redo log on restart to restore the data, ensuring its durability and integrity.
Data in MySQL is organized in pages. When you query a record, a whole page of data is loaded from disk. The loaded page is called a data page and is placed in the Buffer Pool.
Subsequent queries look in the Buffer Pool first and go to disk only on a miss, which reduces disk IO and improves performance.
An update works the same way: if the data to be updated is in the Buffer Pool, it is updated there directly; if not, the page is first loaded from disk into the Buffer Pool and then updated.
InnoDB then records "what modification was made on which data page" in the redo log buffer, and the buffer is later flushed to the redo log file.
Ideally, the flush would happen as soon as the transaction commits, but in practice the timing of the flush is determined by a configurable policy.

Each redo record consists of "tablespace number + data page number + offset + length of the modified data + the modified data itself".
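
To make the record layout above concrete, here is a minimal Python sketch. The fixed-width header (`<IIHH`) and the `RedoRecord` class are assumptions made for illustration; InnoDB's real on-disk record format is different and more compact.

```python
import struct
from dataclasses import dataclass

# Hypothetical fixed-width header: tablespace number, page number,
# offset inside the page, length of the modified data.
# Illustrative only; this is not InnoDB's actual on-disk format.
_HEADER = struct.Struct("<IIHH")


@dataclass
class RedoRecord:
    space_id: int   # tablespace number
    page_no: int    # data page number inside the tablespace
    offset: int     # byte offset of the change inside the page
    data: bytes     # the bytes written at that offset

    def encode(self) -> bytes:
        """Serialize as header followed by the modified bytes."""
        header = _HEADER.pack(self.space_id, self.page_no, self.offset, len(self.data))
        return header + self.data

    @classmethod
    def decode(cls, raw: bytes) -> "RedoRecord":
        """Parse a record produced by encode()."""
        space_id, page_no, offset, length = _HEADER.unpack_from(raw)
        start = _HEADER.size
        return cls(space_id, page_no, offset, raw[start:start + length])

    def apply(self, page: bytearray) -> None:
        """Replay the change onto an in-memory copy of the page."""
        page[self.offset:self.offset + len(self.data)] = self.data


rec = RedoRecord(space_id=5, page_no=42, offset=100, data=b"\x00\x00\x00\x07")
raw = rec.encode()
page = bytearray(16 * 1024)   # one 16KB data page
RedoRecord.decode(raw).apply(page)
```

In this sketch a 4-byte change costs 16 bytes of log, compared with 16KB for rewriting the whole page.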

WAL in the redo log (write the log first, then write to disk [write to the redo log file])

The redo log consists of two parts: the log buffer in memory (redo log buffer) and the log file on disk (redo log file). Every time MySQL executes a DML statement, it first writes the record to the redo log buffer, and at a later point writes multiple records at once to the redo log file. This technique of writing the log first, before the modified data pages are written to disk, is the WAL (Write-Ahead Logging) technique often mentioned in MySQL.

In an operating system, data in a user-space buffer generally cannot be written to disk directly; it must pass through a buffer in the operating system's kernel space (the OS buffer). Therefore, writing the redo log buffer to the redo log file actually means writing to the OS buffer first and then flushing it to the redo log file with the fsync() system call.
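
The two stages (user-space buffer to OS buffer, then OS buffer to disk) can be sketched with Python's standard `os.write` and `os.fsync` calls. The function name `append_and_sync` and the file name `demo_redo.log` are made up for this example; this shows the general write-then-fsync pattern, not InnoDB's own code.

```python
import os
import tempfile


def append_and_sync(path: str, payload: bytes, sync: bool = True) -> int:
    """Append payload to the file at path and optionally force it to disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Stage 1: copy from the user-space buffer into the OS buffer.
        written = os.write(fd, payload)
        if sync:
            # Stage 2: ask the kernel to flush the OS buffer to the disk.
            os.fsync(fd)
        return written
    finally:
        os.close(fd)


log_path = os.path.join(tempfile.mkdtemp(), "demo_redo.log")
append_and_sync(log_path, b"record-1;")               # written and flushed
append_and_sync(log_path, b"record-2;", sync=False)   # written, flush left to the OS
```

Without the `fsync` step the data is visible to readers of the file but may still be lost on power failure, which is exactly the difference the flush policies below trade on.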

The key flush-policy parameter: innodb_flush_log_at_trx_commit

MySQL supports three timings for writing the redo log buffer to the redo log file, configured through the innodb_flush_log_at_trx_commit parameter. The meaning of each value is as follows:

innodb_flush_log_at_trx_commit=0 (delayed write): When a transaction commits, the log in the redo log buffer is not written to the OS buffer. Instead, a background thread of the InnoDB storage engine writes the contents of the redo log buffer to the OS buffer once per second and calls fsync() to flush it to the redo log file. In other words, with a value of 0 the log is written and flushed to disk (approximately) once per second, so a crash can lose up to about 1 second of transactions.

innodb_flush_log_at_trx_commit=1 (write immediately, flush immediately): Every time a transaction commits, the log in the redo log buffer is written to the OS buffer and fsync() is called to flush it to the redo log file. As long as the transaction commits successfully, its redo log is on disk, so no committed data is lost even if the system crashes. Because every commit writes to disk, IO performance is the poorest of the three settings. This is the default value.
innodb_flush_log_at_trx_commit=2 (write immediately, delayed flush): When a transaction commits, the log is only written to the OS buffer; fsync() is then called once per second to flush the log from the OS buffer to the redo log file.

Summary:
0 means that the "log buffer" is synchronized to the "os buffer" and flushed from the "os buffer" to the disk log file every second.
1 means that the "log buffer" is synchronized to the "os buffer" and flushed from the "os buffer" to the disk log file for each transaction commit.
2 means that the "log buffer" is synchronized to the "os buffer" for each transaction commit, but it is flushed from the "os buffer" to the disk log file every second.

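How to choose

If you are sensitive to data loss, set it to 1 to guarantee that the log is on disk at commit, at the cost of performance.
If you can tolerate some data loss, set it to 0 or 2 for better performance, accepting that up to about 1 second of transactions may be lost. With 2, a crash of the mysqld process alone loses nothing, because the log is already in the OS buffer; only an operating system crash or power failure loses data.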
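
The difference between the three values can be seen in a small simulation. This is a toy model under simplifying assumptions: the class `RedoFlushModel`, its three lists, and the two crash kinds are invented for the example and only mimic the behaviour described above.

```python
class RedoFlushModel:
    """Toy model of the three innodb_flush_log_at_trx_commit values."""

    def __init__(self, policy: int):
        if policy not in (0, 1, 2):
            raise ValueError("policy must be 0, 1 or 2")
        self.policy = policy
        self.log_buffer = []   # redo log buffer, inside the mysqld process
        self.os_buffer = []    # OS buffer, survives a mysqld crash
        self.disk = []         # redo log file, survives everything

    def _write(self):
        self.os_buffer.extend(self.log_buffer)
        self.log_buffer.clear()

    def _fsync(self):
        self.disk.extend(self.os_buffer)
        self.os_buffer.clear()

    def commit(self, txn: str):
        self.log_buffer.append(txn)
        if self.policy in (1, 2):
            self._write()      # 1 and 2: write to the OS buffer at commit
        if self.policy == 1:
            self._fsync()      # 1 only: also flush to disk at commit

    def background_tick(self):
        """What the background thread does roughly once per second."""
        self._write()
        self._fsync()

    def surviving(self, crash: str):
        """Transactions recoverable after a 'mysqld' or an 'os' crash."""
        if crash == "mysqld":
            return self.disk + self.os_buffer   # the OS still holds its buffer
        if crash == "os":
            return list(self.disk)
        raise ValueError("crash must be 'mysqld' or 'os'")


for policy in (0, 1, 2):
    model = RedoFlushModel(policy)
    model.commit("t1")
    print(policy, model.surviving("mysqld"), model.surviving("os"))
```

Right after a commit and before the next background tick, only policy 1 keeps the transaction across an OS crash, and policy 2 keeps it across a mysqld crash.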

Redo log file

In the default MySQL data directory there are two files, ib_logfile0 and ib_logfile1. Every flush of the redo log writes into these files. (This is the layout in versions before MySQL 8.0.30; later versions keep the redo log files in an #innodb_redo directory.)
There can be multiple redo log files, which generally appear as a log file group. The files are named uniformly in the form ib_logfile+number, starting from 0.
Each file in the log file group has the same size.
Writing starts from file 0 and continues with 1, 2, 3...

Log file group

The redo log is stored on disk not as a single file but as a log file group, in which every redo log file has the same size.
For example, the group can be configured as 4 files of 1GB each, so the whole redo log file group can hold 4GB of log.
The group is used like a circular array: writing starts at the beginning, and on reaching the end it wraps around to the beginning and continues.
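
The circular layout amounts to simple modular arithmetic. The sketch below uses the 4 x 1GB example from the text; the function `locate` is hypothetical and ignores the header blocks that real redo log files reserve.

```python
FILE_SIZE = 1 << 30   # 1GB per file, as in the example above
N_FILES = 4           # files in the log file group


def locate(log_offset: int, file_size: int = FILE_SIZE, n_files: int = N_FILES):
    """Map a monotonically growing log offset to (file name, offset in file)."""
    capacity = file_size * n_files      # total size of the group
    pos = log_offset % capacity         # wrap around at the end of the group
    return f"ib_logfile{pos // file_size}", pos % file_size


print(locate(0))                   # start of the first file
print(locate(FILE_SIZE + 10))      # 10 bytes into the second file
print(locate(4 * FILE_SIZE + 5))   # wrapped around to the first file
```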

Redo log flushing and data page flushing

In InnoDB, both the redo log and the data pages need to be flushed to disk. The main purpose of the redo log is to relax the requirements on data page flushing.
In the circular log file group, write pos is the LSN (log sequence number) position up to which the redo log has been written, and checkpoint is the LSN position in the redo log up to which the corresponding data page changes have already been flushed to disk.
The part from write pos forward to checkpoint is the free part of the redo log, available for new records; the part from checkpoint forward to write pos holds the records of data page changes that have not yet been flushed to the data files. When write pos catches up with checkpoint, InnoDB must first push checkpoint forward to make room for new log records.
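
The relationship between write pos, checkpoint, and free space can be sketched as follows. The class `RedoRing` is a hypothetical model that tracks both positions as ever-growing LSN counters (the position inside the files is the counter modulo the capacity), and `advance_checkpoint` only pretends that the corresponding dirty pages were flushed.

```python
class RedoRing:
    """Toy model of write pos and checkpoint in a fixed-size circular log."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.write_lsn = 0        # "write pos": end of the written log
        self.checkpoint_lsn = 0   # "checkpoint": log before this is reusable

    def used(self) -> int:
        """Log bytes whose data page changes are not yet flushed."""
        return self.write_lsn - self.checkpoint_lsn

    def free(self) -> int:
        """Bytes that can be written before write pos catches checkpoint."""
        return self.capacity - self.used()

    def advance_checkpoint(self, n: int) -> None:
        """Pretend the dirty pages covered by the next n log bytes were flushed."""
        self.checkpoint_lsn = min(self.write_lsn, self.checkpoint_lsn + n)

    def append(self, n: int) -> None:
        """Write n bytes of log, pushing the checkpoint forward if needed."""
        if n > self.capacity:
            raise ValueError("record larger than the whole log")
        shortfall = n - self.free()
        if shortfall > 0:
            self.advance_checkpoint(shortfall)   # make room first
        self.write_lsn += n


ring = RedoRing(capacity=100)
ring.append(60)   # 40 bytes free
ring.append(50)   # needs 10 more bytes, so checkpoint moves forward by 10
```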

When InnoDB starts, it always performs a recovery operation, regardless of whether the last shutdown was normal or abnormal. Because the redo log records physical changes to data pages, recovery is much faster than with a logical log such as the binlog. On restart, InnoDB first checks the LSN of the data pages on disk; if a data page's LSN is smaller than the LSN in the log, recovery starts from the checkpoint. There is another case: the crash happened while a checkpoint was flushing to disk, and the flushing of the data pages had progressed further than the flushing of the log pages. The LSN recorded in such a data page is then greater than the LSN in the log, and the part beyond the log's progress is not redone, because those changes have already been applied and do not need to be applied again.
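
The LSN comparison during recovery can be sketched like this. The data shapes (a dict of pages, redo records as tuples) and the function `recover` are assumptions chosen for the example; the point is only that a record is replayed when its LSN is newer than both the checkpoint and the page it targets.

```python
def recover(pages, redo_records, checkpoint_lsn):
    """Replay redo records onto pages and return the LSNs that were applied.

    pages:        {page_no: {"lsn": int, "data": dict}}
    redo_records: list of (lsn, page_no, key, value), sorted by lsn
    """
    applied = []
    for lsn, page_no, key, value in redo_records:
        if lsn <= checkpoint_lsn:
            continue          # covered by the checkpoint, already on disk
        page = pages[page_no]
        if page["lsn"] >= lsn:
            continue          # page was flushed after this change, skip it
        page["data"][key] = value
        page["lsn"] = lsn     # the page now reflects this record
        applied.append(lsn)
    return applied


pages = {
    1: {"lsn": 100, "data": {"c": 1}},   # stale page, needs redo
    2: {"lsn": 130, "data": {"c": 9}},   # flushed just before the crash
}
records = [
    (90, 1, "c", 1),
    (110, 1, "c", 2),
    (120, 2, "c", 8),
    (130, 2, "c", 9),
    (140, 1, "c", 3),
]
applied_lsns = recover(pages, records, checkpoint_lsn=100)
```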

When is the redo log flushed to disk?

When shutting down the server gracefully

When the transaction is committed (innodb_flush_log_at_trx_commit = 1)

When a transaction commits, all logs in the log buffer are flushed to disk to ensure durability. Note that this may flush the logs of other transactions as well as those of the committing transaction.

Background thread (innodb_flush_log_at_trx_commit = 0 or innodb_flush_log_at_trx_commit = 2)

innodb_flush_log_at_trx_commit = 0: every second, write the log buffer to the OS buffer and call fsync() to flush it to the redo log file
innodb_flush_log_at_trx_commit = 2: every second, call fsync() to flush the log in the OS buffer to the redo log file

When redo log buffer space is insufficient

The redo log buffer has a limited size, so continually appending logs to it will soon fill it up. When the redo log written to the log buffer occupies about half of the buffer's total capacity, those logs need to be flushed to disk.
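
The half-full trigger can be modelled in a few lines. The `LogBuffer` class and its `flush` callback are hypothetical, and the exact threshold check is a simplification of the "about half" rule stated above.

```python
class LogBuffer:
    """Toy log buffer that flushes once it is at least half full."""

    def __init__(self, capacity: int, flush):
        self.capacity = capacity
        self.flush = flush            # callable that receives the bytes to persist
        self.pending = bytearray()

    def append(self, record: bytes) -> None:
        self.pending += record
        # Flush when pending data reaches half of the capacity.
        if len(self.pending) * 2 >= self.capacity:
            self.flush(bytes(self.pending))
            self.pending.clear()


flushed = []
buf = LogBuffer(capacity=10, flush=flushed.append)
buf.append(b"abcd")   # 4 of 10 bytes used, below half
buf.append(b"e")      # 5 of 10 bytes used, triggers a flush
```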

Checkpoint trigger rules

The redo log buffer and the redo log files are both stored in blocks, called redo log blocks, with a fixed block size of 512 bytes. The redo log has a fixed size and can be regarded as a logical log group made up of a certain number of log blocks.
It is written from the beginning to the end, and on reaching the end it wraps around to the beginning and continues in a loop.
There are two marked positions:
write pos is the position currently being written. It moves forward as records are written, and after reaching the end of file 3 it returns to the beginning of file 0.
checkpoint is the current position up to which records can be erased. It also moves forward and wraps around. Before a record is erased, the change it describes must have been written to the data file on disk.
When write pos catches up with checkpoint, the redo log is full. No new records can be written at that point, and the checkpoint rule must be executed to free up writable space.
The checkpoint rule means that once a checkpoint is triggered, the dirty data pages and log pages in the buffer are flushed to disk, so that checkpoint can move forward.

Execution steps of an update statement


  1. The executor first asks the engine for the row with ID=2. ID is the primary key, so the engine finds the row directly with a tree search. If the data page containing the row with ID=2 is already in memory, the row is returned to the executor directly; otherwise the page is first read from disk into memory and the row is then returned.
  2. The executor takes the row returned by the engine and adds 1 to the value: if it was N, it is now N+1. This gives a new row, and the executor calls the engine interface to write it.
  3. The engine writes the new row into memory and records the update in the redo log, which is now in the prepare state. It then tells the executor that execution is complete and the transaction can be committed at any time. [Write redo log (prepare state)]
  4. The executor generates the binlog for this operation and writes the binlog to disk. [Write binlog]
  5. The executor calls the engine's commit-transaction interface, the engine changes the redo log it just wrote to the commit state, and the update is complete. [Commit transaction (commit state)]
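
The five steps above can be traced in a toy Python model. Everything here is hypothetical scaffolding: plain dicts and lists stand in for the buffer pool, the data file, the redo log and the binlog, and `run_update` and the `xid` transaction id are names invented for the sketch.

```python
def run_update(buffer_pool, disk, redo_log, binlog, row_id, xid):
    """Add 1 to the value of one row, following the five steps in order."""
    # Step 1: fetch the row, loading it from disk on a buffer pool miss.
    if row_id not in buffer_pool:
        buffer_pool[row_id] = disk[row_id]
    # Step 2: the executor computes the new value (N becomes N+1).
    new_value = buffer_pool[row_id] + 1
    # Step 3: the engine updates memory and writes redo in the prepare state.
    buffer_pool[row_id] = new_value
    redo_log.append({"xid": xid, "row": row_id, "value": new_value, "state": "prepare"})
    # Step 4: the executor writes the binlog.
    binlog.append({"xid": xid, "row": row_id, "value": new_value})
    # Step 5: the engine switches the redo record to the commit state.
    redo_log[-1]["state"] = "commit"
    return new_value


disk = {2: 10}            # row with ID=2 currently holds 10 on disk
buffer_pool, redo_log, binlog = {}, [], []
result = run_update(buffer_pool, disk, redo_log, binlog, row_id=2, xid="tx-1")
```

Note that the value on disk is unchanged when the update returns: the data page is flushed later, and the redo log is what makes the committed change durable in the meantime.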

MySQL's two-phase commit


The process above uses a two-phase commit. Why? To keep the binlog and the redo log logically consistent.

Suppose MySQL crashes at each point while the update statement above is executing, and see how the two logs stay consistent.
If MySQL crashes and restarts before step 3, the transaction is not committed and the data is unaffected. Any change made in memory is lost with the crash.
If it crashes after step 3 completes, the redo log has been written but is still in the prepare state. On restart, InnoDB finds the redo log in the prepare state with no corresponding binlog, so the transaction is rolled back and its changes are not restored.
If it crashes after step 4 completes, the binlog has been written. On restart, InnoDB finds the prepared redo log together with its binlog, so the redo log is changed to the commit state and the transaction is committed.
In this way, the logical consistency of the redo log and the binlog is guaranteed.
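
The restart decision for each crash point can be written as a small function. This is a sketch under assumptions: log entries are plain dicts keyed by a hypothetical transaction id `xid`, and `resolve_after_crash` only illustrates the rule described above rather than the server's actual recovery code.

```python
def resolve_after_crash(redo_log, binlog):
    """Decide per transaction whether recovery commits it or rolls it back."""
    binlog_xids = {entry["xid"] for entry in binlog}
    decisions = {}
    for rec in redo_log:
        if rec["state"] == "commit":
            decisions[rec["xid"]] = "commit"      # both phases finished
        elif rec["xid"] in binlog_xids:
            decisions[rec["xid"]] = "commit"      # prepared and binlog written
        else:
            decisions[rec["xid"]] = "rollback"    # prepared, but no binlog
    return decisions


redo_after_crash = [
    {"xid": "tx-1", "state": "commit"},    # crashed after step 5
    {"xid": "tx-2", "state": "prepare"},   # crashed after step 4
    {"xid": "tx-3", "state": "prepare"},   # crashed after step 3
]
binlog_after_crash = [{"xid": "tx-1"}, {"xid": "tx-2"}]
outcome = resolve_after_crash(redo_after_crash, binlog_after_crash)
```

A transaction therefore ends up either in both logs or in neither, which is the consistency the two phases are meant to provide.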

Two-phase commit is a commonly used scheme to maintain logical consistency of data across systems.

Why introduce the redo log


Only a small amount of data is modified, so writing the whole page to disk each time wastes resources

InnoDB manages storage space in units of pages (the data page size is 16KB). Every insert, delete, or update ultimately operates on a complete page: the whole page is loaded into the buffer pool and the affected records are then modified. If only one record is modified, flushing the complete data page to disk is too wasteful.

Writing back to disk directly is random IO and inefficient

Flushing data pages is random writing, because a data page may be located anywhere in the data file on disk, so performance is very poor.
A redo log record, by contrast, may occupy only tens of bytes, containing just the tablespace number, data page number, offset, and updated value. The redo log is also written cyclically to fixed files and written to disk sequentially, so flushing it is very fast.

Therefore, recording modifications in the redo log far outperforms flushing data pages, which also lets the database handle higher concurrency.
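
A rough calculation shows the size of the gap. The workload below (one transaction touching 3 pages, with 50-byte redo records) is an assumption picked to match the "tens of bytes" figure above, not a measurement.

```python
PAGE_SIZE = 16 * 1024   # InnoDB data page size from the text, in bytes


def page_flush_bytes(pages_touched: int) -> int:
    """Bytes written if every touched page is flushed in full."""
    return pages_touched * PAGE_SIZE


def redo_bytes(changes: int, bytes_per_record: int) -> int:
    """Bytes written if only redo records are flushed."""
    return changes * bytes_per_record


pages_touched, record_size = 3, 50   # assumed workload
ratio = page_flush_bytes(pages_touched) / redo_bytes(pages_touched, record_size)
print(f"page flush: {page_flush_bytes(pages_touched)} bytes, "
      f"redo log: {redo_bytes(pages_touched, record_size)} bytes, "
      f"ratio: about {ratio:.0f}x")
```

Under these assumptions the redo log writes a few hundred times fewer bytes at commit, and writes them sequentially rather than at three scattered locations.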

Video link: https://www.bilibili.com/video/BV1Pv411h7Ep
Article source: https://javaguide.cn/database/mysql/mysql-logs.html#redo-log
Log file: https://blog.csdn.net/Merciful_Lion/article/details/124715392
https://blog.csdn.net/weixin_40449300/article/details/117927295
https://www.cnblogs.com/liang24/p/14089065.html
https://blog.csdn.net/weixin_43213517/article/details/117457184
https://www.zhihu.com/question/486105337/answer/2538190061

Origin blog.csdn.net/yzx3105/article/details/130685375