MySQL Knowledge Learning 03 (Detailed explanation of the three major logs binlog, redo log, undo log)

Foreword

MySQL logs mainly include error logs, query logs, slow query logs, transaction logs, and binary logs. Among them, the most important are the binary log, binlog (archive log), and the transaction logs, redo log and undo log (rollback log).

1. redo log

The redo log is unique to the InnoDB storage engine and gives MySQL its crash recovery capability.

For example, if the MySQL instance hangs or the host goes down, the InnoDB storage engine uses the redo log on restart to restore the data, ensuring its durability and integrity.

MySQL stores data in units of pages. When you query a record, a whole page of data is loaded from disk. The loaded page is called a data page, and it is placed in the Buffer Pool.

Subsequent queries look in the Buffer Pool first and go to disk only on a miss, which reduces disk IO overhead and improves performance.

The same applies when updating table data. If the data to be updated is already in the Buffer Pool, it is updated directly there.

Then "what modification was made on which data page" is recorded in the redo log cache (redo log buffer), which is later flushed to the redo log file on disk.

Ideally, the flush happens as soon as the transaction is committed, but in practice the flush timing is determined by a configurable strategy.

Tip: each redo record consists of "tablespace number + data page number + offset + length of the modified data + the modified data itself".
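To make the record layout in the tip concrete, here is a small Python sketch. The class and field names (RedoRecord, space_id, page_no and so on) are hypothetical and only mirror the five parts listed above. InnoDB's real on-disk encoding is different and more compact, so treat this as a mental model rather than the actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoRecord:
    """Toy model of one redo record; not InnoDB's real format."""
    space_id: int   # tablespace number
    page_no: int    # data page number inside the tablespace
    offset: int     # byte offset of the change inside the page
    data: bytes     # the modified bytes themselves

    @property
    def length(self) -> int:
        # "length of the modified data" is derived from the payload
        return len(self.data)

    def apply(self, page: bytearray) -> None:
        """Redo the change by overwriting the bytes at `offset`."""
        page[self.offset:self.offset + self.length] = self.data


page = bytearray(16 * 1024)  # one empty 16KB data page
rec = RedoRecord(space_id=1, page_no=42, offset=100, data=b"\x01\x02")
rec.apply(page)              # replaying the record redoes the change
```

The point of the sketch is that a record only needs to say where a few bytes changed and what they became, which is why it is so much smaller than the page it describes.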

2. Flush timing

The InnoDB storage engine provides the innodb_flush_log_at_trx_commit parameter to control the redo log flushing strategy. It supports three settings:

  • 0: the log is not flushed to disk when a transaction commits
  • 1: the log is flushed to disk every time a transaction commits (the default)
  • 2: on every commit, the contents of the redo log buffer are only written to the page cache

innodb_flush_log_at_trx_commit defaults to 1, which means that when a transaction commits, fsync is called to flush the redo log to disk.

In addition, the InnoDB storage engine has a background thread that, once every second, writes the contents of the redo log buffer to the file system cache (page cache) and then calls fsync to flush it to disk.

In other words, redo log records of a transaction that has not yet committed may also be flushed to disk.

Why?

Because redo log records are written to the redo log buffer while the transaction executes, and the background thread flushes those records to disk.

Besides the once-per-second polling by the background thread, there is another case: when the space used in the redo log buffer is about to reach half of innodb_log_buffer_size, the background thread proactively flushes to disk.

The following sections describe each flushing strategy in turn.

innodb_flush_log_at_trx_commit=0

When the value is 0, if MySQL hangs or the host goes down, up to 1 second of data may be lost.

innodb_flush_log_at_trx_commit=1

When the value is 1, as long as the transaction commits successfully, the redo log records are guaranteed to be on disk, and no data is lost.

If MySQL hangs or the host goes down while the transaction is executing, that part of the log is lost, but since the transaction was never committed, losing the log causes no data loss.

innodb_flush_log_at_trx_commit=2

When the value is 2, as long as the transaction commits successfully, the contents of the redo log buffer are only written to the file system cache (page cache).

If only MySQL hangs, no data is lost, but if the host goes down, up to 1 second of data may be lost.
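The three strategies can be compared with a toy simulation. This is only a sketch under simplifying assumptions: the class RedoLogSim and its methods are invented for illustration, the three lists stand in for the redo log buffer, the page cache, and the disk, and a "mysql" crash is assumed to lose only process memory while an "os" crash also loses the page cache.

```python
class RedoLogSim:
    """Toy model of innodb_flush_log_at_trx_commit; all names are illustrative."""

    def __init__(self, policy: int):
        assert policy in (0, 1, 2)
        self.policy = policy
        self.buffer = []      # redo log buffer (MySQL process memory)
        self.page_cache = []  # file system cache (operating system memory)
        self.disk = []        # redo log file on disk

    def _write(self):
        # move records from the log buffer to the page cache
        self.page_cache += self.buffer
        self.buffer = []

    def _fsync(self):
        # persist everything in the page cache to disk
        self.disk += self.page_cache
        self.page_cache = []

    def commit(self, trx: str):
        self.buffer.append(trx)
        if self.policy == 1:      # write + fsync on every commit
            self._write()
            self._fsync()
        elif self.policy == 2:    # write only, no fsync
            self._write()
        # policy 0: nothing happens at commit time

    def background_tick(self):
        # the once-per-second background thread: write, then fsync
        self._write()
        self._fsync()

    def survivors(self, crash: str):
        """'mysql': only the process dies; 'os': the whole host dies."""
        if crash == "mysql":
            return self.disk + self.page_cache
        return list(self.disk)


def run(policy: int, crash: str):
    sim = RedoLogSim(policy)
    sim.commit("t1")
    sim.background_tick()  # one second passes
    sim.commit("t2")       # committed just before the crash
    return sim.survivors(crash)
```

In this model, run(0, "mysql") keeps only t1, run(1, "os") keeps both transactions, and run(2, ...) keeps t2 after a MySQL crash but loses it after a host crash, which matches the three cases described above.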

3. Log file group

The redo log is stored on disk not as a single file but as a log file group, in which every redo log file has the same size.

For example, it can be configured as a group of 4 files of 1GB each, so the whole redo log file group can record 4GB of content.

It works like a circular array: writing starts at the beginning, and on reaching the end it wraps around to the beginning and continues in a loop.

A log file group has two important attributes: write pos and checkpoint.

  • write pos is the position currently being written; it moves forward as records are written
  • checkpoint is the position currently being erased; it also moves forward

Every time redo log records are flushed to the log file group, write pos moves forward.

Every time MySQL loads the log file group to restore data, it clears the redo log records that have been loaded and moves checkpoint forward.

The empty part between write pos and checkpoint can be used to write new redo log records.

If write pos catches up with checkpoint, the log file group is full and no new redo log records can be written. MySQL then has to pause, clear some records, and advance checkpoint.
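The interplay of write pos and checkpoint can be sketched as a ring buffer. The class LogFileGroup below is a hypothetical illustration, not InnoDB code: sizes are abstract units, and both positions are kept as ever-growing counters so that "full" and "empty" are easy to tell apart, with the physical position obtained by taking the remainder.

```python
class LogFileGroup:
    """Toy circular redo log; sizes are abstract units, names are illustrative."""

    def __init__(self, files: int = 4, file_size: int = 1024):
        self.capacity = files * file_size
        self.write_pos = 0    # total units written so far
        self.checkpoint = 0   # total units erased so far

    def free_space(self) -> int:
        # the empty part between write pos and checkpoint
        return self.capacity - (self.write_pos - self.checkpoint)

    def write(self, n: int) -> bool:
        """Append n units; return False when the group is full."""
        if n > self.free_space():
            return False      # write pos would overtake checkpoint
        self.write_pos += n
        return True

    def advance_checkpoint(self, n: int) -> None:
        # erase up to n units, never moving past write pos
        self.checkpoint = min(self.checkpoint + n, self.write_pos)

    def physical_write_pos(self) -> int:
        # position inside the ring after wrapping around
        return self.write_pos % self.capacity


group = LogFileGroup(files=4, file_size=1024)
filled = group.write(4096)        # fills the whole group
rejected = group.write(1)         # no room left until checkpoint advances
group.advance_checkpoint(1024)    # free one file's worth of space
accepted = group.write(1000)      # now the write wraps to the beginning
```

After the last write, the physical write position has wrapped around to the start of the first file, which is the looping behavior described above.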

4. redo log summary

Now consider a question: why not simply flush the modified data page to disk every time? Why is the redo log needed at all?

Both are flushes to disk, so where is the difference?

1 Byte = 8bit
1 KB = 1024 Byte
1 MB = 1024 KB
1 GB = 1024 MB
1 TB = 1024 GB

In fact, a data page is 16KB, and flushing it to disk is time-consuming. If only a few bytes in the page were modified, is it really necessary to flush the whole page?

Moreover, flushing data pages is random writing, because the location of a data page may be anywhere in the data file on disk, so performance is poor.

With the redo log, one record may take only a few dozen bytes, since it contains just the tablespace number, data page number, file offset, and updated value. On top of that, the log is written sequentially, so flushing is very fast.

Therefore, recording modifications in the redo log performs far better than flushing data pages, which also gives the database higher concurrency.
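A quick back-of-the-envelope calculation shows the size gap. The 16KB page size comes from the text above; the 40-byte record size is an assumption standing in for "a few dozen bytes", and the function name bytes_flushed is made up for this example.

```python
PAGE_SIZE = 16 * 1024      # 16KB data page, as stated above
REDO_RECORD_SIZE = 40      # assumed value for "a few dozen bytes"


def bytes_flushed(updates: int, via_redo: bool) -> int:
    """Bytes written if every update touches a different data page."""
    per_update = REDO_RECORD_SIZE if via_redo else PAGE_SIZE
    return updates * per_update


pages_total = bytes_flushed(1000, via_redo=False)  # flush whole pages
redo_total = bytes_flushed(1000, via_redo=True)    # flush redo records only
ratio = pages_total / redo_total
```

Under these assumptions, flushing whole pages writes roughly 400 times more bytes than writing redo records, and that is before counting the cost of random versus sequential IO.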

In fact, the data pages in memory are also flushed to disk at certain times. We call this page merging, and we will elaborate on it when we discuss the Buffer Pool.

5. binlog

redo log is a physical log: it records "what modification was made on which data page", and it belongs to the InnoDB storage engine.

binlog is a logical log: it records the original logic of the statement, something like "add 1 to field c of the row with ID=2", and it belongs to the MySQL Server layer.

Whatever storage engine is used, a binlog record is generated whenever table data is updated.

So what is binlog used for?

MySQL data backup, master-standby, master-master, and master-slave setups are all inseparable from binlog: they rely on it to synchronize data and ensure data consistency.

binlog records all logical operations that update data, and it is written sequentially.

6. Record format

binlog has three record formats, which can be selected with the binlog_format parameter.

  • statement
  • row
  • mixed

With statement, the record contains the original text of the SQL statement. For example, if update T set update_time=now() where id=1 is executed, the statement itself is what gets recorded.

When synchronizing data, the recorded SQL statement is executed again. The problem is that update_time=now() takes the current system time at that moment, so executing it directly yields data that differs from the original database.

To solve this problem, we specify row. The record is then no longer just the SQL statement but also contains the specific data of the operation.

The content recorded in row format is not directly readable; it has to be parsed with the mysqlbinlog tool.

After parsing, update_time=now() appears as a concrete time, update_time=1627112756247. The @1, @2, and @3 after the condition are the original values of the first to third fields of that row (assuming the table has only 3 fields).

This guarantees the consistency of synchronized data, so the format is usually set to row, which makes database recovery and synchronization more reliable.

However, this format needs more capacity for recording, takes up more space, and consumes more IO during recovery and synchronization, which affects execution speed.

So there is a compromise: with mixed, the recorded content is a mixture of the two.

MySQL judges whether the SQL statement may cause data inconsistency. If so, it uses the row format; otherwise it uses the statement format.
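The difference between the two formats can be illustrated with a toy replay model. This is a sketch of the concern described above, not of how MySQL itself applies binlog events: the functions, the dictionary-based events, and the second timestamp are all invented for illustration.

```python
def apply_on_primary(row: dict, now: int) -> dict:
    """The primary runs: update T set update_time=now() where id=1."""
    return {**row, "update_time": now}


def statement_event() -> dict:
    # statement format keeps only the SQL text
    return {"sql": "update T set update_time=now() where id=1"}


def row_event(after: dict) -> dict:
    # row format keeps the concrete values that were written
    return {"after": dict(after)}


def replay_statement(row: dict, event: dict, now: int) -> dict:
    # naive replay: now() is evaluated again at replay time
    return {**row, "update_time": now}


def replay_row(row: dict, event: dict) -> dict:
    # replay copies the recorded values, so no re-evaluation happens
    return dict(event["after"])


original = {"id": 1, "update_time": 0}
primary = apply_on_primary(original, now=1627112756247)

# The replica replays a little later (hypothetical timestamp).
replica_stmt = replay_statement(original, statement_event(), now=1627112759999)
replica_row = replay_row(original, row_event(primary))
```

In this model the statement replay ends up with a different update_time than the primary, while the row replay reproduces the primary's row exactly, at the cost of storing the full values in the event.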

7. Writing mechanism

The timing of binlog writes is simple: during transaction execution, the log is first written to the binlog cache, and when the transaction commits, the binlog cache is written to the binlog file.

Because the binlog of one transaction cannot be split, no matter how large the transaction is it must be written in one go, so the system allocates a block of memory to each thread as its binlog cache.

The binlog_cache_size parameter controls the size of a single thread's binlog cache. If the content exceeds this size, it has to be temporarily stored on disk (Swap).

The binlog flushing process involves two operations:

  • write refers to writing the log to the file system's page cache; the data is not persisted to disk, so it is relatively fast
  • fsync is the operation that persists the data to disk

The timing of write and fsync is controlled by the sync_binlog parameter. Its default was 0 in older versions; since MySQL 5.7.7 the default is 1.

When it is 0, each commit only performs write, and the operating system decides when to perform fsync.

This improves performance, but if the machine goes down, the binlog still in the page cache is lost.

For safety, it can be set to 1, which means fsync is executed on every transaction commit, just like the redo log flushing process.

Finally, there is a compromise: set it to N (N>1), which means every commit performs write, but fsync is only performed after N transactions have accumulated.

In scenarios where IO bottlenecks occur, setting sync_binlog to a relatively large value can improve performance.

Likewise, if the machine goes down, the binlog of up to the last N transactions may be lost.
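The trade-off between fsync frequency and potential loss can be counted with a small simulation. The class BinlogSim and the helper run are hypothetical; the model also assumes that with sync_binlog=0 the operating system never happens to sync before the crash, which is the worst case.

```python
class BinlogSim:
    """Toy model of sync_binlog; names and structure are illustrative."""

    def __init__(self, sync_binlog: int):
        self.sync_binlog = sync_binlog
        self.page_cache = []   # written but not yet persisted
        self.disk = []         # persisted binlog
        self.fsync_calls = 0
        self._since_fsync = 0  # commits since the last fsync

    def commit(self, trx: str) -> None:
        self.page_cache.append(trx)  # write: happens on every commit
        self._since_fsync += 1
        # sync_binlog=0 leaves fsync to the operating system (never, here)
        if self.sync_binlog > 0 and self._since_fsync >= self.sync_binlog:
            self.fsync()

    def fsync(self) -> None:
        self.disk += self.page_cache
        self.page_cache = []
        self.fsync_calls += 1
        self._since_fsync = 0

    def lost_on_host_crash(self):
        # whatever is still only in the page cache disappears
        return list(self.page_cache)


def run(sync_binlog: int, transactions: int = 10) -> BinlogSim:
    sim = BinlogSim(sync_binlog)
    for i in range(1, transactions + 1):
        sim.commit(f"t{i}")
    return sim
```

With 10 commits, run(1) performs 10 fsync calls and loses nothing, run(4) performs only 2 fsync calls but would lose the last 2 transactions on a host crash, and run(0) performs none and would lose everything still in the page cache.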

8. Two-phase commit

  • redo log (the redo log) gives the InnoDB storage engine crash recovery capability.
  • binlog (the archive log) guarantees data consistency across a MySQL cluster architecture.

Both are guarantees of persistence, but each has a different emphasis.

While an update statement executes, both the redo log and binlog are recorded. Taking a basic transaction as the unit, the redo log can be written continuously during execution, whereas binlog is written only when the transaction commits, so the two logs are written at different times.

Back to the topic: what happens if the redo log and binlog are logically inconsistent?

Take an update statement as an example. Suppose field c of the record with id=2 is 0, and we update it to 1 with the SQL statement update T set c=1 where id=2.

Suppose that after the redo log has been written, an exception occurs while binlog is being written. What happens?

Because the exception occurs before binlog is fully written, binlog has no record of this modification. When binlog is later used to restore data, this update is missing and c in the restored row is 0, while in the original database the redo log recovery leaves c at 1. The final data is inconsistent.

To solve the problem of logical consistency between the two logs, the InnoDB storage engine uses a two-phase commit scheme.

The principle is simple: the redo log write is split into two steps, prepare and commit. That is two-phase commit.

With two-phase commit, an exception while writing binlog does no harm: when MySQL restores data from the redo log, it finds that the redo log is still in the prepare stage and there is no corresponding binlog, so it rolls the transaction back.

Consider another scenario: if an exception occurs while the redo log is being set to commit, will the transaction be rolled back?

No. Although the redo log is still in the prepare stage, the corresponding binlog can be found through the transaction id, so MySQL considers the transaction complete and commits it when restoring data.
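The recovery decision described in the two scenarios can be written as a small function. This is a minimal sketch: the function name recover, the string states, and the boolean flag are assumptions made for the example, and the case where the redo log is already in the commit state is included as the natural completion of the rule rather than something spelled out above.

```python
def recover(redo_state: str, binlog_has_trx: bool) -> str:
    """Decide what crash recovery does with one transaction (toy model)."""
    if redo_state == "commit":
        # both phases finished before the crash
        return "commit"
    if redo_state == "prepare":
        # the binlog, looked up by transaction id, is the tie-breaker
        return "commit" if binlog_has_trx else "rollback"
    raise ValueError(f"unknown redo state: {redo_state!r}")


# Crash while writing binlog: redo is prepared, binlog has no record.
case_binlog_failed = recover("prepare", binlog_has_trx=False)

# Crash while setting redo to commit: binlog is already complete.
case_commit_failed = recover("prepare", binlog_has_trx=True)
```

Either way the two logs end up agreeing: a transaction is kept only if binlog has it, and dropped only if binlog does not.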

9. undo log

To guarantee the atomicity of transactions, the operations already performed must be rolled back when an exception occurs.

In MySQL, this recovery mechanism is implemented with the rollback log (undo log). Every modification made by a transaction is first recorded in the rollback log, and only then is the operation performed. If an exception occurs during execution, the information in the rollback log can be used to roll the data back to its state before the modification.

Moreover, the rollback log is persisted to disk before the data. This ensures that even if the database suddenly goes down, when it is started again it can still roll back unfinished transactions by consulting the rollback log.
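The "record first, then modify" idea can be sketched in a few lines. The Transaction class below is a hypothetical in-memory model: a plain dictionary stands in for the table, and the undo log is just a list of old values. It says nothing about how InnoDB actually stores undo records.

```python
_MISSING = object()  # marks "the key did not exist before the update"


class Transaction:
    """Toy transaction that keeps an undo log of old values."""

    def __init__(self, table: dict):
        self.table = table
        self.undo_log = []  # (key, old_value) pairs, oldest first

    def update(self, key, value) -> None:
        # record the old value first, then perform the modification
        self.undo_log.append((key, self.table.get(key, _MISSING)))
        self.table[key] = value

    def rollback(self) -> None:
        # undo the changes in reverse order
        while self.undo_log:
            key, old = self.undo_log.pop()
            if old is _MISSING:
                del self.table[key]
            else:
                self.table[key] = old


table = {"c": 0}
trx = Transaction(table)
trx.update("c", 1)   # modify an existing field
trx.update("d", 7)   # add a field that did not exist
trx.rollback()       # an exception occurred: restore the old state
```

After the rollback, the table is back to its state before the transaction started, which is the atomicity guarantee described above.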

In addition, the implementation of MVCC depends on hidden fields, the Read View, and the undo log. Internally, InnoDB judges the visibility of data using the row's DB_TRX_ID and the Read View. If the row is not visible, it finds the historical version in the undo log through the row's DB_ROLL_PTR.

The data version read by each transaction may differ. Within one transaction, the user can only see modifications committed before the transaction created its Read View, plus modifications made by the transaction itself.
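The visibility rule in the last paragraph can be sketched as a walk along a version chain. Everything here is a simplification: the Version and ReadView classes are hypothetical, the Read View is modeled as just the set of transaction ids that had committed when it was created, and the prev link plays the role of DB_ROLL_PTR. The real Read View structure in InnoDB is more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: int
    trx_id: int                        # plays the role of DB_TRX_ID
    prev: Optional["Version"] = None   # plays the role of DB_ROLL_PTR


@dataclass
class ReadView:
    own_trx_id: int
    committed: frozenset  # ids committed before the view was created

    def visible(self, v: Version) -> bool:
        # own changes, or changes committed before the view existed
        return v.trx_id == self.own_trx_id or v.trx_id in self.committed


def read(latest: Optional[Version], view: ReadView) -> Optional[int]:
    """Follow the chain into older versions until one is visible."""
    v = latest
    while v is not None and not view.visible(v):
        v = v.prev
    return None if v is None else v.value


old = Version(value=0, trx_id=10)
new = Version(value=1, trx_id=20, prev=old)  # written by transaction 20

reader = ReadView(own_trx_id=15, committed=frozenset({10}))
writer = ReadView(own_trx_id=20, committed=frozenset({10}))

seen_by_reader = read(new, reader)  # falls back to the older version
seen_by_writer = read(new, writer)  # sees its own modification
```

The same row therefore yields different values for the two transactions, which is what "the data version read by each transaction may differ" means in practice.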

10. Summary

MySQL's InnoDB engine uses the redo log to guarantee transaction durability, and the undo log (rollback log) to guarantee transaction atomicity.

MySQL data backup, master-standby, master-master, and master-slave setups are all inseparable from binlog: they rely on it to synchronize data and ensure data consistency.

Origin blog.csdn.net/ldy007714/article/details/130486336