How MySQL ensures data reliability (that is, how it makes sure data is not lost)

1. Conclusion:

As long as both the redo log and the binlog are persisted to disk, the data can be recovered after MySQL restarts abnormally.

2. Mechanism

WAL (Write-Ahead Logging) mechanism: changes are first written to the log, and the data itself is persisted to disk later.

3. binlog writing mechanism

Binlog write flow chart, selected from "MySQL45 Lecture"

process

  • Each thread has its own binlog cache, and records are written to it first. All threads share one binlog file.
  • write: the binlog cache is written to the binlog file. At this point the data is only in the file system's page cache.
  • fsync: the binlog file is persisted to disk.

explain

  • write is a memory-to-memory operation (into the page cache), so it is very fast.
  • fsync moves data from memory to disk. It is slow and consumes disk IOPS.

write control policy

The timing of write and fsync is controlled by the parameter sync_binlog:

  1. sync_binlog = 0: each transaction commit only does write, with no fsync.
  2. sync_binlog = 1: fsync is executed on every transaction commit.
  3. sync_binlog = N (N > 1): each transaction commit does write, but fsync is called only after N transactions have accumulated. Commonly used values of N are between 100 and 1000.
    Put simply, sync_binlog controls the timing of fsync. Setting it to 0 is generally not recommended. When IO is the bottleneck, a larger value improves performance, at the cost that the binlog of the most recent N transactions can be lost if the host restarts abnormally.
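
To make the trade-off concrete, here is a minimal sketch in Python that counts how many fsync calls a simplified binlog would issue under each policy. The function name fsync_count is hypothetical, and the model follows the description above (one fsync per N committed transactions); it is not MySQL's actual implementation.

```python
def fsync_count(num_transactions: int, sync_binlog: int) -> int:
    """Count fsync calls in a simplified model of the sync_binlog policy.

    Assumption of this sketch: with sync_binlog = 0 MySQL itself never
    calls fsync (the operating system decides when to flush), so the
    count is 0.
    """
    if num_transactions < 0 or sync_binlog < 0:
        raise ValueError("arguments must be non-negative")
    if sync_binlog == 0:
        return 0
    # One fsync after every `sync_binlog` committed transactions.
    return num_transactions // sync_binlog


if __name__ == "__main__":
    for policy in (0, 1, 100, 1000):
        print(f"sync_binlog={policy}: {fsync_count(10_000, policy)} fsync calls")
```

In this model, raising sync_binlog from 1 to 100 cuts the number of fsync calls for the same workload by a factor of 100, which is exactly the IOPS saving the parameter is meant to buy.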

4. Redo log writing mechanism

Three states of redo log

process

  1. The redo log is first written to the redo log buffer, which lives in the MySQL process. (Memory)
  2. write: the log is written to the file system's page cache, but not yet persisted. (Memory)
  3. fsync: the log is persisted to disk.
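
The same three states exist for any program that appends to a file. As an analogy (this is ordinary Python file I/O, not InnoDB code), the sketch below walks one record through a process-level buffer, the page cache, and the disk. The function name append_durably and the file name are illustrative.

```python
import os
import tempfile


def append_durably(path: str, record: bytes) -> None:
    """Append one record and push it through all three states."""
    with open(path, "ab") as log_file:
        # State 1: a small record typically sits in Python's user-space
        # buffer, comparable to the redo log buffer inside the process.
        log_file.write(record)
        # State 2: flush() hands the bytes to the operating system,
        # where they live in the file system's page cache.
        log_file.flush()
        # State 3: fsync() asks the operating system to persist the
        # file's data to the storage device.
        os.fsync(log_file.fileno())


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "demo_redo.log")
    append_durably(log_path, b"record-1\n")
    print(os.path.getsize(log_path), "bytes persisted")
```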

write control policy

Writing of the redo log is also controlled by a parameter: innodb_flush_log_at_trx_commit.
As the name suggests, it is InnoDB's strategy for flushing the redo log when a transaction commits.

  1. Set to 0: on each transaction commit, the redo log is only left in the redo log buffer.
  2. Set to 1: on each transaction commit, the redo log is persisted directly to disk.
  3. Set to 2: on each transaction commit, the redo log is only written to the page cache.

In addition, InnoDB has a background thread that, once per second, writes the log in the redo log buffer to the page cache and then calls fsync to persist it to disk.
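
The three settings can be summarized as a small lookup, sketched below. The names LogState, state_after_commit and survives_crash are hypothetical. The crash rule is general operating-system behavior rather than something stated above: the page cache survives a crash of the MySQL process but not a crash of the host. The once-per-second background flush is ignored here.

```python
from enum import Enum


class LogState(Enum):
    REDO_LOG_BUFFER = "redo log buffer (MySQL process memory)"
    PAGE_CACHE = "file system page cache"
    DISK = "disk"


def state_after_commit(innodb_flush_log_at_trx_commit: int) -> LogState:
    """Where the redo log of a transaction is at the moment commit returns."""
    states = {
        0: LogState.REDO_LOG_BUFFER,
        1: LogState.DISK,
        2: LogState.PAGE_CACHE,
    }
    if innodb_flush_log_at_trx_commit not in states:
        raise ValueError("expected 0, 1 or 2")
    return states[innodb_flush_log_at_trx_commit]


def survives_crash(state: LogState, host_crashed: bool) -> bool:
    """Simplified rule: only the disk survives a host crash, while the
    page cache also survives a crash of the MySQL process alone."""
    if state is LogState.DISK:
        return True
    if state is LogState.PAGE_CACHE:
        return not host_crashed
    return False


if __name__ == "__main__":
    for value in (0, 1, 2):
        state = state_after_commit(value)
        print(value, state.value,
              "| process crash:", survives_crash(state, host_crashed=False),
              "| host crash:", survives_crash(state, host_crashed=True))
```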

5. Two-phase commit mechanism

two-phase commit
MySQL generally adopts the "double 1" configuration, that is, both sync_binlog and innodb_flush_log_at_trx_commit are set to 1.
In other words, a complete transaction commit has to wait for two flushes to disk: one fsync for the redo log (in the prepare phase) and one fsync for the binlog.
This raises a new question:
if MySQL's TPS is 20,000, two-phase commit implies 40,000 disk writes per second, but suppose the
disk can only handle about 20,000 writes per second. How can 20,000 TPS be reached?
In other words: when the disk is the bottleneck, how can the number of flushes be reduced?
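
The arithmetic behind this question can be written down directly. The sketch below is only a back-of-the-envelope model: the function name and the avg_group_size parameter (how many transactions share one fsync) are hypothetical, introduced here to show what needs to change for the numbers to fit.

```python
def fsyncs_per_second(tps: int, fsyncs_per_commit: int = 2,
                      avg_group_size: float = 1.0) -> float:
    """Disk flushes per second when `avg_group_size` transactions
    share each fsync (1.0 means every transaction flushes on its own)."""
    if avg_group_size <= 0:
        raise ValueError("avg_group_size must be positive")
    return tps * fsyncs_per_commit / avg_group_size


if __name__ == "__main__":
    # Every transaction flushes the redo log and the binlog by itself.
    print(fsyncs_per_second(20_000))
    # Two transactions share each fsync on average.
    print(fsyncs_per_second(20_000, avg_group_size=2))
```

With the numbers from the text, each fsync has to serve at least two transactions on average for the disk to keep up. Letting several transactions share one fsync is what the group commit mechanism described next provides.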

Group commit mechanism (group commit)

LSN

Before introducing group commit, you need to understand the log sequence number (LSN). The LSN increases monotonically and corresponds to the write position of the redo log: each time length bytes of redo log are written, length is added to the LSN.
This is harder to grasp from text alone, and the figure helps.
Figure: log sequence number (LSN)
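
A minimal sketch of this bookkeeping follows. The class name RedoLogPosition and the record lengths 50, 70 and 40 are made up for illustration, chosen so that the final LSN is 160.

```python
class RedoLogPosition:
    """Tracks the LSN as a monotonically increasing write position."""

    def __init__(self, start_lsn: int = 0) -> None:
        self.lsn = start_lsn

    def append(self, length: int) -> int:
        """Write `length` bytes of redo log and return the new LSN."""
        if length <= 0:
            raise ValueError("length must be positive")
        self.lsn += length
        return self.lsn


if __name__ == "__main__":
    position = RedoLogPosition()
    for record_length in (50, 70, 40):
        print("LSN after write:", position.append(record_length))
```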

Example of redo log using group commits

The picture comes from "MySQL45 Lecture"

  1. trx1 is the first to arrive and will be selected as the leader of this group;
  2. When trx1 is about to start writing to disk, there are already three transactions in this group, and the LSN also becomes 160 at this time;
  3. When trx1 writes to disk, it carries LSN=160, so when trx1 returns, all redo logs with LSN less than or equal to 160 have been persisted to disk;
  4. At this time, trx2 and trx3 can return directly.

Summary: in a group commit, the more members the group has, the more disk IOPS are saved.
In a concurrent scenario, to include as many members as possible in one group, the fsync should be called as late as possible after the first transaction has written to the redo log buffer.
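
The example above can be replayed with a toy model. This is a single-threaded sketch, not InnoDB's real implementation. The class GroupCommitLog and its methods are hypothetical, and the record lengths are chosen so that the LSN reaches 160 as in the example.

```python
class GroupCommitLog:
    """Toy model of redo log group commit."""

    def __init__(self) -> None:
        self.written_lsn = 0   # highest LSN written to the redo log buffer
        self.flushed_lsn = 0   # highest LSN known to be on disk
        self.fsync_calls = 0

    def write(self, length: int) -> int:
        """A transaction writes its redo log and gets its LSN back."""
        self.written_lsn += length
        return self.written_lsn

    def commit(self, trx_lsn: int) -> bool:
        """Return True if this call performed an fsync (leader), or
        False if the log was already durable (follower)."""
        if trx_lsn <= self.flushed_lsn:
            return False
        # The leader carries the current LSN, so one fsync makes every
        # record written so far durable.
        self.flushed_lsn = self.written_lsn
        self.fsync_calls += 1
        return True


if __name__ == "__main__":
    log = GroupCommitLog()
    lsn1, lsn2, lsn3 = log.write(50), log.write(70), log.write(40)
    print("trx1 fsync:", log.commit(lsn1))  # leader, flushes up to 160
    print("trx2 fsync:", log.commit(lsn2))  # already durable
    print("trx3 fsync:", log.commit(lsn3))  # already durable
    print("total fsync calls:", log.fsync_calls)
```

Three transactions commit with a single fsync. The later the leader calls commit, the more writes have accumulated and the more followers ride along.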

optimization

MySQL has an optimization for this: to let one fsync cover more group members, it stalls for time.
Writing the redo log in the prepare phase is split into two steps:

  • write: write the redo log buffer to the page cache.
  • fsync: persist the redo log in the page cache to disk.

Writing the binlog is likewise split into two steps:

  • write: write the log from the binlog cache to the binlog file in the page cache.
  • fsync: persist the binlog file to disk.

Two-phase commit refinement
Applying the same stalling principle, the order of the steps is adjusted:
the fsync of the redo log prepare phase is postponed until after the binlog write,
and the binlog fsync is postponed until after the redo log fsync.
With this optimization, both the redo log and the binlog use group commit.
The difference is that group commit helps the binlog less than it helps the redo log, mainly because the wait before the binlog fsync is short, so fewer transactions accumulate in one group.
The effect of binlog group commit can be improved with two parameters:

  1. binlog_group_commit_sync_delay: how many microseconds to wait before calling fsync.
  2. binlog_group_commit_sync_no_delay_count: how many transactions to accumulate before calling fsync.
    The two conditions are combined with OR: fsync is called as soon as either one is met.
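
The OR relationship can be expressed as a small predicate. This is a sketch of the decision only: the function should_fsync and its first two arguments are hypothetical, and treating a count of 0 as "no count limit" is an assumption of this sketch.

```python
def should_fsync(waited_microseconds: int, pending_transactions: int,
                 sync_delay: int, no_delay_count: int) -> bool:
    """Decide whether the binlog fsync should be called now.

    sync_delay     -> binlog_group_commit_sync_delay (microseconds)
    no_delay_count -> binlog_group_commit_sync_no_delay_count
    """
    if sync_delay == 0:
        # No delay configured: fsync immediately.
        return True
    delay_elapsed = waited_microseconds >= sync_delay
    # Assumption: a count of 0 disables the count condition.
    count_reached = no_delay_count > 0 and pending_transactions >= no_delay_count
    # Either condition is enough.
    return delay_elapsed or count_reached


if __name__ == "__main__":
    print(should_fsync(100, 3, sync_delay=1000, no_delay_count=10))
    print(should_fsync(100, 10, sync_delay=1000, no_delay_count=10))
    print(should_fsync(1000, 3, sync_delay=1000, no_delay_count=10))
```

Note that a longer delay gathers larger groups but also adds to the response time of every commit, so these values are a trade-off rather than a free gain.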


Origin blog.csdn.net/YuannaY/article/details/131209498