[MySQL study notes (16)] super detailed explanation of the redo log

This article is published by the official account [Developing Pigeon]. Welcome to follow!

One. redo log

(1) Overview

When a transaction commits, its changes have only been made to pages in the in-memory Buffer Pool. If a failure occurs at this point and the data in memory is lost, the changes of the committed transaction are lost with it, which is unacceptable. A simple approach is to flush every page the transaction modified to disk before the transaction commits, but this is wasteful: perhaps only a single byte in a page was modified, yet the whole page has to be written. It is also slow, because one transaction may contain multiple statements and one statement may modify multiple pages, so flushing them causes random I/O.

The redo log solves this problem. The goal is for the changes a committed transaction made to the database to be permanent: even if the system crashes, the modifications can be restored after a restart. It is therefore enough to record in the redo log, every time a transaction commits, what the transaction changed in the in-memory pages, which satisfies the durability requirement. This kind of log is called the redo log because, when the system restarts after a crash, the log is used to redo the updates to the database.

(2) Advantages of redo logs

Its advantage is that it takes up very little space: a record only stores the tablespace ID, page number, offset, and the value to be written. In addition, redo logs are written to disk sequentially. While a transaction executes, each statement may generate several redo logs, and they are written to disk in the order in which they were generated, which is sequential I/O.

(3) Redo log format

1. General format

There are many types of redo logs in InnoDB. Most redo logs share a common structure that stores the log type, tablespace ID, page number, and the specific content of the redo log.

2. Simple redo log

A simple redo log only needs to record the offset within a page at which a few bytes were modified, and what the modified content is. This kind of simple redo log is called a physical log, and it is divided into several types according to the amount of data written to the page.
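
As a rough illustration of why such a record is small, the sketch below packs a simple redo log into a fixed-width layout and replays it onto a page. The layout (1-byte type, 4-byte tablespace ID, 4-byte page number, 2-byte offset, then the new bytes), the type value 1, and the names make_record and apply_record are assumptions made for this example; InnoDB's real encoding is more compact and differs in detail.

```python
import struct

PAGE_SIZE = 16 * 1024             # InnoDB's default page size
# Hypothetical fixed-width header: type, tablespace ID, page number, offset.
HEADER = struct.Struct(">BIIH")

def make_record(log_type, space_id, page_no, offset, new_bytes):
    """Build a simple physical redo record: header plus the bytes to write."""
    return HEADER.pack(log_type, space_id, page_no, offset) + new_bytes

def apply_record(record, pages):
    """Redo the change: overwrite the bytes at the recorded offset of the page."""
    log_type, space_id, page_no, offset = HEADER.unpack_from(record)
    new_bytes = record[HEADER.size:]
    page = pages[(space_id, page_no)]
    page[offset:offset + len(new_bytes)] = new_bytes

# One page of tablespace 5, page number 42, initially all zeros.
pages = {(5, 42): bytearray(PAGE_SIZE)}
record = make_record(1, 5, 42, 100, b"\x00\x00\x00\x07")
apply_record(record, pages)
print(len(record), "bytes of redo log instead of", PAGE_SIZE, "bytes of page")
```

Under these assumptions a 4-byte change costs 15 bytes of log, compared with 16384 bytes if the whole page had to be flushed.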

3. Complex redo logs

Sometimes, when a statement is executed, many pages are modified, including the system data page and the user data page. For example, an INSERT statement may update as many B+ trees as there are indexes in the table. For a certain B+ tree, it may update the leaf node page or the inner node page.

(4) Mini-Transaction

1. Write to the redo log in the form of a group

The redo logs generated while a statement executes are divided into several indivisible groups. For example, the redo logs generated when a record is inserted into a page of the B+ tree for the clustered index form one group and cannot be split; the redo logs generated when a record is inserted into a page of the B+ tree for a secondary index are likewise indivisible. The reason is that when a record is inserted, the target leaf node page is located first, and that page may have enough free space for the insertion (optimistic insertion) or it may not (pessimistic insertion). A pessimistic insertion has to allocate new data pages, update statistics, and so on. These operations must be atomic and cannot be interrupted partway, so the redo logs are recorded as a group.

How are redo logs marked as one group? A special type of redo log is appended after the last redo log of the group. It consists of only a type field and marks the end of a group of logs. When the system restarts after a crash and the parser reaches a log of this type, it knows that a complete group of redo logs has been parsed; otherwise, the redo logs parsed so far for that group are discarded. Operations that generate only a single redo log use the 1-byte type field directly: the first bit indicates whether the log is a single log, and the remaining 7 bits give the redo log type.
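
The grouping rule can be sketched as a small parser. To keep it short, each redo log is reduced to its type byte only. The constant names follow those used in the InnoDB source, but the numeric values, the sample type numbers 9, 13 and 2, and the function complete_groups should be read as assumptions for illustration.

```python
MLOG_SINGLE_REC_FLAG = 0x80   # assumed: first (high) bit of the type byte marks a single log
MLOG_MULTI_REC_END = 31       # assumed: type value of the end-of-group marker

def complete_groups(type_bytes):
    """Split a stream of redo log type bytes into complete groups.

    Logs that come after the last end marker belong to an unfinished
    group and are discarded, as they would be during crash recovery.
    """
    groups, pending = [], []
    for t in type_bytes:
        if t & MLOG_SINGLE_REC_FLAG:
            # A single log is a group by itself; the low 7 bits are its type.
            groups.append([t & 0x7F])
        elif t == MLOG_MULTI_REC_END:
            # End marker reached: everything collected so far is one complete group.
            groups.append(pending)
            pending = []
        else:
            pending.append(t)
    return groups   # anything still in `pending` is incomplete and dropped

# Two logs plus end marker, one single log, then a group cut off by a crash.
stream = [9, 13, 31, 0x80 | 2, 9, 13]
print(complete_groups(stream))   # [[9, 13], [2]]
```

The last two logs in the sample stream have no end marker after them, so the parser treats them as the remains of an MTR that never finished and ignores them.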

2. Mini-transaction

The process of making one atomic access to the underlying pages is called a Mini-transaction (MTR). A transaction can contain multiple statements, a statement can consist of multiple MTRs, and each MTR can contain a group of redo logs.

Two. Redo log writing process

(1) The page where the redo log is stored

To manage redo logs, InnoDB stores the redo logs generated by MTRs in pages of 512 bytes each. Each page consists of a log block header, a log block body, and a log block trailer. The body stores the actual redo logs; the header and trailer store management information.

Attributes in the header:

no: the number of the page;
data_len: the number of bytes already used in the page; the initial value is 12 and it becomes 512 when the page is full;
checkpoint_no: the checkpoint sequence number;
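
The behaviour of data_len can be sketched with a small model of one 512-byte page. The 12-byte header follows from the initial value of data_len given above; the 4-byte trailer size, the class name LogBlock, and its append method are assumptions for this example.

```python
BLOCK_SIZE = 512
HEADER_SIZE = 12     # matches the initial value of data_len
TRAILER_SIZE = 4     # assumed trailer size
BODY_SIZE = BLOCK_SIZE - HEADER_SIZE - TRAILER_SIZE   # 496 bytes for redo logs

class LogBlock:
    """Simplified model of one redo log page."""

    def __init__(self, no, checkpoint_no=0):
        self.no = no
        self.data_len = HEADER_SIZE
        self.checkpoint_no = checkpoint_no
        self.body = bytearray()

    def append(self, data):
        """Copy as much of `data` as fits; return the bytes that did not fit."""
        room = BODY_SIZE - len(self.body)
        self.body += data[:room]
        if len(self.body) == BODY_SIZE:
            self.data_len = BLOCK_SIZE          # a full page reports 512
        else:
            self.data_len = HEADER_SIZE + len(self.body)
        return data[room:]

block = LogBlock(no=1)
leftover = block.append(b"x" * 200)
print(block.data_len, len(leftover))   # 212 0
leftover = block.append(b"y" * 300)
print(block.data_len, len(leftover))   # 512 4
```

The 4 leftover bytes in the second call would continue in the body of the next page.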

(2) redo log buffer

As with the Buffer Pool, redo logs are not written straight to disk. When the server starts, it allocates a contiguous memory area called the redo log buffer (log buffer for short), which is divided into a number of consecutive redo log pages. The default size is 16MB.

(3) Write redo log to log buffer

Redo logs are written to the log buffer sequentially, and a global variable buf_free indicates the position in the log buffer at which the next redo logs should be written. Redo logs are not inserted into the log buffer one by one as they are generated. Instead, the logs generated while an MTR runs are held temporarily in one place, and when the MTR ends, the whole group of redo logs it produced is copied into the log buffer at once. Different transactions may execute concurrently, so the redo logs of MTRs belonging to different transactions may be interleaved in the log buffer.
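
The following sketch models this write path. It ignores the page headers and trailers inside the log buffer, and the classes LogBuffer and Mtr with their methods are hypothetical names chosen for the example, not InnoDB's actual interfaces.

```python
class LogBuffer:
    """Simplified log buffer: one contiguous area plus the buf_free position."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.buf_free = 0            # where the next redo logs will be written

    def write(self, data):
        end = self.buf_free + len(data)
        if end > len(self.buf):
            raise MemoryError("log buffer full; a real system would flush first")
        self.buf[self.buf_free:end] = data
        self.buf_free = end

class Mtr:
    """Collects redo logs privately and copies them out as one group on commit."""

    def __init__(self, log_buffer):
        self.log_buffer = log_buffer
        self.records = []

    def log(self, record):
        self.records.append(record)  # not yet visible in the log buffer

    def commit(self):
        self.log_buffer.write(b"".join(self.records))
        self.records = []

log_buffer = LogBuffer(1024)
t1_mtr, t2_mtr = Mtr(log_buffer), Mtr(log_buffer)   # MTRs of two concurrent transactions
t1_mtr.log(b"A1")
t2_mtr.log(b"B1")
t1_mtr.log(b"A2")
t2_mtr.commit()
t1_mtr.commit()
print(bytes(log_buffer.buf[:log_buffer.buf_free]))   # b'B1A1A2'
```

Although A1 was generated before B1, the group from the second transaction's MTR lands first because that MTR finished first; within each group the order of generation is preserved.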

Three. Redo log files

(1) When to flush redo logs

The redo logs in the log buffer will be flushed to disk in some cases.

1. Insufficient log buffer space

When the redo logs in the log buffer take up about 50% of its total capacity, they are flushed to disk.

2. When the transaction is committed

When a transaction commits, the redo logs corresponding to its page modifications must be flushed to disk to guarantee durability.

3. When the server is shut down normally

4. When a checkpoint is performed

(2) redo log file group

By default there are two files, ib_logfile0 and ib_logfile1, in the MySQL data directory, and the logs in the log buffer are flushed to these two disk files. The files used can be changed through startup options. Logs are written to the first file first; when it is full, writing moves on to the next file, and when the last file is full, writing returns to the first file. This raises a problem: won't going back to the first file overwrite the redo logs written there earlier? Checkpoints are introduced to solve this problem.

(3) Redo log file format

Flushing the redo logs in the log buffer to disk essentially means writing the image of the redo log pages to the disk file, so a redo log file is really made up of a number of 512-byte pages. Within a redo log file group, every file has the same size and the same format: the first 4 pages store management information, and the pages after them store the redo log pages.
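
The circular layout of the file group can be sketched as a mapping from a logical position in the stream of redo log pages to a file and an offset inside it. The function locate, its parameters, and the tiny 8192-byte file size are assumptions for illustration; the logical position here simply counts log bytes from 0 and is not the same thing as lsn.

```python
BLOCK_SIZE = 512
FILE_HEADER = 4 * BLOCK_SIZE   # the first 4 pages of every file hold management info

def locate(log_pos, n_files, file_size):
    """Map a logical log position to (file index, byte offset in that file)."""
    usable = file_size - FILE_HEADER        # bytes per file available for log pages
    log_pos %= n_files * usable             # after the last file, wrap to the first
    return log_pos // usable, FILE_HEADER + log_pos % usable

# Two files of 8192 bytes each: 6144 usable bytes per file.
print(locate(0, 2, 8192))       # (0, 2048)  start of the first file's log area
print(locate(7000, 2, 8192))    # (1, 2904)  inside the second file
print(locate(12288, 2, 8192))   # (0, 2048)  wrapped around to the first file
```

The third call shows the wrap-around that makes overwriting possible, which is why it must be known which old logs are safe to reuse.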

Four. Log sequence number (lsn)

(1) Overview

InnoDB maintains a global variable, lsn, which records the total amount of redo log written to the log buffer so far. Its initial value is 8704. The growth of lsn is calculated from the amount of log actually written plus the log block headers and log block trailers that the log occupies. That is, redo logs are written to the log buffer one MTR at a time, and the logs of one MTR may span more than one page; the header and trailer of every page involved count toward the increase of lsn. Each group of redo logs generated by an MTR has a unique lsn value; the smaller the value, the earlier the logs were generated.
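
A small worked calculation makes the rule concrete. The sketch assumes 512-byte pages with a 12-byte header and a 4-byte trailer (the trailer size is an assumption not stated above), and the function advance_lsn is a hypothetical helper, not an InnoDB routine.

```python
BLOCK_SIZE = 512
HEADER_SIZE = 12
TRAILER_SIZE = 4    # assumed trailer size

def advance_lsn(lsn, nbytes):
    """Return the lsn after one MTR writes `nbytes` of redo log."""
    while nbytes > 0:
        pos = lsn % BLOCK_SIZE
        if pos == 0:
            # Starting a fresh page: its header counts toward lsn.
            lsn += HEADER_SIZE
            pos = HEADER_SIZE
        room = BLOCK_SIZE - TRAILER_SIZE - pos   # free bytes left in this page's body
        take = min(room, nbytes)
        lsn += take
        nbytes -= take
        if take == room:
            # Page body is full: its trailer counts toward lsn as well.
            lsn += TRAILER_SIZE
    return lsn

lsn = 8704                       # initial value
lsn = advance_lsn(lsn, 200)      # first MTR writes 200 bytes
print(lsn)                       # 8916 = 8704 + 12 + 200
lsn = advance_lsn(lsn, 1000)     # second MTR spans three pages
print(lsn)                       # 9948 = 8916 + 1000 + 2 * (12 + 4)
```

The second MTR finishes the first page, fills a second one, and starts a third, so two trailers and two headers are added on top of its 1000 bytes.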

(2) flushed_to_disk_lsn

flushed_to_disk_lsn is also a global variable. It marks how much of the redo log in the log buffer has already been flushed to disk. Since lsn counts the redo logs written to the log buffer, the logs between flushed_to_disk_lsn and lsn are the ones that are in the log buffer but have not yet been flushed to disk. If the two values are equal, all redo logs in the log buffer have been flushed to disk.

(3) lsn in the flush list

After an MTR ends, the pages it modified during execution need to be added to the flush list of the Buffer Pool. When a page that has been loaded into the Buffer Pool is modified for the first time, the control block of the page is inserted at the head of the flush list; when the page is modified again later, it is already in the flush list and is not inserted again. In other words, the flush list holds all dirty pages, that is, modified pages that have not yet been flushed to disk. The dirty pages are ordered by the time of their first modification, and because every insertion is at the head, the pages nearer the head were modified later and the pages nearer the tail were modified earlier.

Two attributes about when the page is modified are recorded in the control block:

oldest_modification: when a buffer page in the Buffer Pool is modified for the first time, the lsn value at the start of the MTR that modified the page is written to this attribute;

newest_modification: every time the page is modified, the lsn value at the end of the MTR that modified the page is written to this attribute.
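
The bookkeeping described in this section can be sketched as follows. The ControlBlock class, the note_modification function, the use of 0 to mean that a page is not dirty, and the lsn values in the example are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class ControlBlock:
    page_no: int
    oldest_modification: int = 0   # assumed convention: 0 means the page is not dirty
    newest_modification: int = 0

flush_list = deque()   # left end is the head, right end is the tail

def note_modification(block, mtr_start_lsn, mtr_end_lsn):
    """Record that an MTR spanning [mtr_start_lsn, mtr_end_lsn] modified the page."""
    if block.oldest_modification == 0:
        # First modification: remember where it started and link at the head.
        block.oldest_modification = mtr_start_lsn
        flush_list.appendleft(block)
    # Every modification updates the newest lsn.
    block.newest_modification = mtr_end_lsn

a, b = ControlBlock(page_no=1), ControlBlock(page_no=2)
note_modification(a, 8716, 8916)     # page 1 becomes dirty
note_modification(b, 8916, 9948)     # page 2 becomes dirty
note_modification(a, 9948, 10000)    # page 1 modified again, not re-inserted
print([blk.page_no for blk in flush_list])           # [2, 1]
print(a.oldest_modification, a.newest_modification)  # 8716 10000
```

Page 1 stays at the tail because it was dirtied first, even though it was also the most recently modified page.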

Five. Checkpoint

(1) Overview

The capacity of the redo log file group is limited, so the files in the group have to be reused in a cycle. This requires determining whether the disk space occupied by certain redo logs can be overwritten, that is, whether the dirty pages they correspond to have already been flushed to disk. The global variable checkpoint_lsn indicates the total amount of redo log that can currently be overwritten. Its initial value is 8704. Once a page has been flushed to disk, the redo logs that correspond to it are no longer needed and can be overwritten, so checkpoint_lsn is increased. This process is called performing a checkpoint.

(2) Steps of checkpoint operation

1. Calculate the largest lsn value of the redo logs that can currently be overwritten.

It is enough to find the oldest_modification value of the earliest modified dirty page in the system, which is the node at the tail of the flush list. Any redo log generated while lsn was smaller than this node's oldest_modification can be overwritten, so that oldest_modification value is assigned to checkpoint_lsn. In other words, the log space below checkpoint_lsn can be overwritten.

2. Write checkpoint_lsn, its corresponding offset in the redo log file group, and the number of this checkpoint into the management information of the log files.
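
The two steps can be sketched together. The flush list is reduced to a deque of oldest_modification values (head on the left, tail on the right), and the management information is modelled as a plain dictionary; the function names, the dictionary keys, and the choice to use the current lsn when there are no dirty pages are assumptions for this example.

```python
from collections import deque

def compute_checkpoint_lsn(flush_list, lsn):
    """Step 1: the earliest modified dirty page sits at the tail of the flush list."""
    if not flush_list:
        return lsn               # assumed: with no dirty pages, all logs up to lsn are reusable
    return flush_list[-1]

def do_checkpoint(flush_list, lsn, info):
    """Step 2: record the result in the (simplified) management information."""
    info["checkpoint_lsn"] = compute_checkpoint_lsn(flush_list, lsn)
    info["checkpoint_no"] += 1
    return info

info = {"checkpoint_lsn": 8704, "checkpoint_no": 0}
flush_list = deque([9948, 8916, 8716])   # oldest_modification values, head first

print(do_checkpoint(flush_list, 10000, info))   # {'checkpoint_lsn': 8716, 'checkpoint_no': 1}
flush_list.pop()                                # the earliest dirty page is flushed to disk
print(do_checkpoint(flush_list, 10000, info))   # {'checkpoint_lsn': 8916, 'checkpoint_no': 2}
```

checkpoint_lsn only moves forward when the page at the tail is flushed, which is why flushing the earliest modified dirty pages is what frees log space.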

(3) Flushing dirty pages

Normally, background threads flush dirty pages from the LRU list and the flush list. If the lsn value grows too fast, user threads have to synchronously flush the earliest modified dirty pages from the flush list to disk so that a checkpoint can be performed.