MySQL (InnoDB analysis): Checkpoint technology

I. Introduction

  • As mentioned in the previous article ( https://blog.csdn.net/m0_46405589/article/details/113844781 ), the buffer pool is designed to bridge the gap between CPU speed and disk speed, so page operations are completed in the buffer pool first. If a DML statement such as UPDATE or DELETE changes a record in a page, the page becomes dirty: the version of the page in the buffer pool is newer than the version on disk. The database must then flush the new version of the page from the buffer pool to disk

Durability (the D in ACID)

  • If the new version of a page were flushed to disk every time the page changed, the overhead would be very large. If the hot data is concentrated in a few pages, database performance would become very poor. In addition, if a crash occurs while the new version of a page is being flushed from the buffer pool to disk, the data cannot be recovered
  • To avoid data loss, current transactional database systems generally adopt the Write Ahead Log strategy: when a transaction commits, the redo log is written first, and then the page is modified. If data is lost because of a crash, the redo log is used to recover it. This is what the D in ACID requires
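
The Write Ahead Log idea above can be illustrated with a small Python sketch. It is a toy model, not InnoDB's real implementation: the class `TinyEngine` and its fields are hypothetical names chosen for illustration, and "durable" storage is simulated with ordinary Python objects.

```python
class TinyEngine:
    """Toy write-ahead-logging engine; not InnoDB's real data structures."""

    def __init__(self):
        self.redo_log = []      # treated as durable: survives a crash
        self.disk_pages = {}    # treated as durable: page_id -> value
        self.buffer_pool = {}   # volatile: lost on a crash

    def commit_update(self, page_id, value):
        # Write-ahead rule: the redo record is written first...
        self.redo_log.append((page_id, value))
        # ...and only then is the in-memory page changed (it is now dirty).
        self.buffer_pool[page_id] = value

    def crash(self):
        # Memory is lost; the redo log and the on-disk pages remain.
        self.buffer_pool = {}

    def recover(self):
        # Replay every redo record, in order, on top of the on-disk pages.
        for page_id, value in self.redo_log:
            self.disk_pages[page_id] = value


engine = TinyEngine()
engine.commit_update("page_1", "v1")
engine.commit_update("page_1", "v2")
engine.crash()      # the dirty page was never flushed
engine.recover()    # the redo log alone restores the committed value
print(engine.disk_pages)
```

Note that recovery here replays the whole log from the beginning, which is exactly the cost that Checkpoint is meant to reduce.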

 

II. Why Checkpoint technology is needed

  • Consider the following scenario. If the redo log could grow without limit and the buffer pool were large enough to hold all of the database's data, there would be no need to flush new versions of pages from the buffer pool back to disk, because after a crash the redo log alone could restore the whole database to the moment of the crash. This requires two prerequisites:
    • 1. The buffer pool can cache all the data in the database
      • When a database is first created its tables are empty, so the buffer pool can indeed cache all of the database files. But as the product gains users and usage grows, the capacity of the back-end database keeps increasing. A 3TB MySQL database is no longer uncommon, but 3TB of memory is very rare
    • 2. The redo log can grow infinitely
      • An unbounded redo log may be possible, but it is too costly and inconvenient for operations and maintenance. The DBA or SA cannot easily know when the redo log is approaching the limit of available disk space, and dynamically expanding the storage device requires particular skills and equipment.
  • Even if both conditions are met, there is one more thing to consider: recovery time after a crash. If the database has been running for months or even years when the crash happens, reapplying the redo log takes a very long time, and the cost of recovery is very high
  • Therefore, Checkpoint technology is meant to solve the following problems:
    • ① Shorten the recovery time of the database
    • ② When the buffer pool runs out of space, flush dirty pages to disk
    • ③ When the redo log is unavailable, flush dirty pages

① Shorten the recovery time of the database

  • When the database crashes, it does not need to redo the entire log, because the pages modified before the Checkpoint have already been flushed back to disk
  • Therefore, the database only needs to apply the redo log written after the Checkpoint, which greatly shortens recovery time
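
The effect of a checkpoint on recovery can be sketched in Python as follows. This is a simplified model under stated assumptions: the function name `records_to_replay`, the tuple layout, and the LSN values are all made up for illustration and do not reflect InnoDB's actual redo record format.

```python
def records_to_replay(redo_log, checkpoint_lsn):
    """Return the redo records that recovery still has to apply.

    redo_log is a list of (lsn, page_id, value) tuples in LSN order.
    Changes with lsn <= checkpoint_lsn are assumed to be on disk already.
    """
    return [record for record in redo_log if record[0] > checkpoint_lsn]


# Illustrative log with made-up LSNs.
redo_log = [(100, "p1", "a"), (200, "p2", "b"), (300, "p1", "c"), (400, "p3", "d")]

without_checkpoint = records_to_replay(redo_log, checkpoint_lsn=0)
with_checkpoint = records_to_replay(redo_log, checkpoint_lsn=300)
print(len(without_checkpoint), len(with_checkpoint))
```

With a checkpoint at LSN 300, only the last record has to be replayed instead of all four.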

② When the buffer pool runs out of space, flush dirty pages to disk

  • When the buffer pool runs out of space, the least recently used page is evicted according to the LRU algorithm. If that page is dirty, a Checkpoint must be forced so that the dirty page, that is, the new version of the page, is flushed back to disk
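
A minimal Python sketch of this eviction rule is shown below. The class `TinyBufferPool`, its `put` method, and the page ids are hypothetical; real InnoDB uses a midpoint-insertion LRU and separate flush lists, which this toy model ignores.

```python
from collections import OrderedDict


class TinyBufferPool:
    """Toy LRU buffer pool; page ids and values are illustrative."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> (value, is_dirty), oldest first
        self.disk = {}               # stands in for the data files

    def put(self, page_id, value, dirty):
        if page_id in self.pages:
            self.pages.pop(page_id)          # re-inserted below as most recent
        elif len(self.pages) >= self.capacity:
            self._evict_one()
        self.pages[page_id] = (value, dirty)

    def _evict_one(self):
        # Take the least recently used page from the front of the dict.
        victim_id, (value, dirty) = self.pages.popitem(last=False)
        if dirty:
            # A dirty victim must be written back before its frame is reused.
            self.disk[victim_id] = value


pool = TinyBufferPool(capacity=2)
pool.put("p1", "new-p1", dirty=True)
pool.put("p2", "p2", dirty=False)
pool.put("p3", "p3", dirty=False)   # pool is full, so p1 is evicted and flushed
print(pool.disk)
```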

③ When the redo log is unavailable, flush dirty pages

  • The redo log can become unavailable because:
    • Current transactional database systems reuse the redo log in a circular fashion instead of letting it grow without limit, which would be harder in terms of cost and management.
    • The reusable part of the redo log is the part that is no longer needed: if the database crashes, recovery does not need these records, so they can be overwritten
  • If a part of the redo log that is about to be reused is still needed, a Checkpoint must be forced so that the pages in the buffer pool are flushed at least up to the current redo log position
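
The reuse rule for a fixed-size, circular redo log can be sketched in Python. The function names and the byte counts are assumptions for illustration; `redo_lsn` stands for the newest LSN written to the log and `checkpoint_lsn` for the LSN up to which pages are already on disk.

```python
def reusable_bytes(total_log_size, redo_lsn, checkpoint_lsn):
    """Bytes of a circular redo log that can be overwritten right now.

    Records between checkpoint_lsn and redo_lsn are still needed for
    recovery; everything else in the fixed-size log may be reused.
    """
    still_needed = redo_lsn - checkpoint_lsn
    return total_log_size - still_needed


def must_checkpoint_before_write(total_log_size, redo_lsn, checkpoint_lsn, write_size):
    """True if a new record of write_size bytes would overwrite needed log."""
    return write_size > reusable_bytes(total_log_size, redo_lsn, checkpoint_lsn)


TOTAL = 1000  # illustrative log size in bytes
print(reusable_bytes(TOTAL, redo_lsn=5200, checkpoint_lsn=4500))
print(must_checkpoint_before_write(TOTAL, 5200, 4500, write_size=400))
```

In this example 700 bytes of log are still needed, so a 400-byte write does not fit until a Checkpoint advances `checkpoint_lsn`.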

III. LSN

  • InnoDB uses the LSN (Log Sequence Number) to mark versions. The LSN is an 8-byte number measured in bytes
  • Every page has an LSN, the redo log has an LSN, and the Checkpoint has an LSN. You can see them in the LOG section of the output of the following command (the lines "Log sequence number", "Log flushed up to", and "Last checkpoint at"):
show engine innodb status\G
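
As a rough illustration, the Python sketch below extracts the LSN values from the LOG section of that output. The sample text and its numbers are made up, the function `parse_lsns` is a hypothetical helper, and the sketch assumes each LSN is printed as a single decimal number; the exact layout varies between MySQL versions.

```python
import re

# Made-up sample of the LOG section; real output contains more lines.
SAMPLE_LOG_SECTION = """\
Log sequence number 2735016708
Log flushed up to   2735016708
Last checkpoint at  2734016708
"""


def parse_lsns(status_text):
    """Pull the three LSN values out of SHOW ENGINE INNODB STATUS text."""
    patterns = {
        "redo_lsn": r"Log sequence number\s+(\d+)",
        "flushed_lsn": r"Log flushed up to\s+(\d+)",
        "checkpoint_lsn": r"Last checkpoint at\s+(\d+)",
    }
    result = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, status_text)
        if match:
            result[name] = int(match.group(1))
    return result


lsns = parse_lsns(SAMPLE_LOG_SECTION)
# Bytes of redo log written since the last checkpoint.
print(lsns["redo_lsn"] - lsns["checkpoint_lsn"])
```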

IV. Sharp Checkpoint and Fuzzy Checkpoint

  • In the InnoDB storage engine, when a Checkpoint occurs, under what conditions, and which dirty pages are selected are all quite complicated. What a Checkpoint does is simply flush dirty pages in the buffer pool back to disk. The differences lie in how many pages are flushed each time, where the dirty pages are taken from, and when the Checkpoint is triggered
  • Inside the InnoDB storage engine there are two kinds of Checkpoint:
    • Sharp Checkpoint
    • Fuzzy Checkpoint

Sharp Checkpoint

  • Sharp Checkpoint flushes all dirty pages back to disk when the database is shut down. This is the default behavior, corresponding to the parameter innodb_fast_shutdown=1

Fuzzy Checkpoint

  • If the database also used Sharp Checkpoint at runtime, its availability would suffer greatly. Therefore, at runtime InnoDB uses Fuzzy Checkpoint to flush pages, that is, it flushes only part of the dirty pages instead of flushing all of them back to disk
  • Fuzzy Checkpoint happens in the following situations:
    • 1. Master Thread Checkpoint
    • 2. FLUSH_LRU_LIST Checkpoint
    • 3. Async/Sync Flush Checkpoint
    • 4. Dirty Page too much Checkpoint

① Master Thread Checkpoint

  • The Checkpoint in the Master Thread (described in detail in a later article) flushes a certain percentage of pages from the buffer pool's dirty page list back to disk roughly every second or every ten seconds. This process is asynchronous: the InnoDB storage engine can perform other operations in the meantime, and user query threads are not blocked

② FLUSH_LRU_LIST Checkpoint

  • Working principle:
    • FLUSH_LRU_LIST Checkpoint exists because the InnoDB storage engine needs to keep about 100 free pages available in the LRU list. If fewer than 100 free pages are available, InnoDB removes pages from the end of the LRU list. If any of these pages are dirty, a Checkpoint is required
    • Because these pages come from the LRU list, this is called FLUSH_LRU_LIST Checkpoint
  • Before InnoDB 1.1.x, the check for enough free space in the LRU list happened in the user query thread, which obviously blocked the user's query
  • Starting from MySQL 5.6, that is, InnoDB 1.2.x, this check is carried out in a separate Page Cleaner thread, and the user can control the number of available pages in the LRU list through the parameter innodb_lru_scan_depth, which defaults to 1024
show variables like 'innodb_lru_scan_depth'\G
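
The "keep enough free pages" rule described above can be sketched in Python. This is an illustrative model only: `refill_free_pages` is a hypothetical function, the LRU list is represented as a deque with the coldest page on the right, and "flushing" is reduced to recording the page id.

```python
from collections import deque


def refill_free_pages(lru, free_count, target_free=100):
    """Evict pages from the LRU tail until target_free pages are free.

    lru is a deque of (page_id, is_dirty) with the coldest page at the right.
    Returns (new_free_count, flushed_page_ids).
    """
    flushed = []
    while free_count < target_free and lru:
        page_id, is_dirty = lru.pop()   # take the page at the tail
        if is_dirty:
            flushed.append(page_id)     # a dirty page is written to disk first
        free_count += 1
    return free_count, flushed


lru = deque([("p1", False), ("p2", True), ("p3", False), ("p4", True)])
free_count, flushed = refill_free_pages(lru, free_count=98, target_free=100)
print(free_count, flushed)
```

Two pages are taken from the tail to reach the target; only the dirty one (`p4`) needs a flush.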

③ Async/Sync Flush Checkpoint

  • Async/Sync Flush Checkpoint refers to the situation where the redo log file is unavailable. Some pages must then be forcibly flushed back to disk, and the dirty pages are selected from the dirty page list.
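  • If the LSN that has been written to the redo log is redo_lsn, and the LSN of the newest page already flushed back to disk is checkpoint_lsn, then define:
checkpoint_age = redo_lsn - checkpoint_lsn
  • Then define the following variables:
async_water_mark = 75% * total_redo_log_file_size
sync_water_mark = 90% * total_redo_log_file_size
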
  • If each redo log file is 1GB and two redo log files are defined, the total redo log size is 2GB. Then async_water_mark=1.5GB and sync_water_mark=1.8GB, so:
    • When checkpoint_age<async_water_mark: no dirty pages need to be flushed to disk
    • When async_water_mark<checkpoint_age<sync_water_mark: Async Flush is triggered, flushing enough dirty pages from the Flush list back to disk that checkpoint_age<async_water_mark holds afterwards
    • checkpoint_age>sync_water_mark rarely occurs unless the redo log files are too small and a bulk insert such as LOAD DATA is running. In that case Sync Flush is triggered, flushing enough dirty pages from the Flush list back to disk that checkpoint_age<async_water_mark holds afterwards
  • It can be seen that Async/Sync Flush Checkpoint exists to guarantee that the redo log can be reused in a circular fashion
  • Prior to InnoDB 1.2.x, Async Flush Checkpoint blocked the user query thread that detected the condition, while Sync Flush Checkpoint blocked all user query threads until the dirty pages had been flushed. Starting from InnoDB 1.2.x, this flushing is also done in the separate Page Cleaner Thread, so it no longer blocks user query threads
  • The official MySQL release does not show whether a flushed page was checkpointed from the Flush list or from the LRU list, nor how many Async/Sync Flush operations were caused by the redo log, but the InnoSQL version provides a way to see this. You can observe it with the following command:
show engine innodb status\G
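
The watermark rules in this subsection can be sketched in Python. The 75% and 90% ratios are taken from the 2GB example above (1.5GB and 1.8GB); the function name `flush_mode`, the return strings, and the handling of exact boundary values are assumptions made for illustration.

```python
GB = 1024 ** 3


def flush_mode(redo_lsn, checkpoint_lsn, total_redo_log_size):
    """Classify which flush, if any, the current checkpoint age calls for."""
    checkpoint_age = redo_lsn - checkpoint_lsn
    async_water_mark = total_redo_log_size * 75 // 100
    sync_water_mark = total_redo_log_size * 90 // 100
    if checkpoint_age < async_water_mark:
        return "none"    # no dirty pages need to be flushed
    if checkpoint_age < sync_water_mark:
        return "async"   # Async Flush
    return "sync"        # Sync Flush


total = 2 * GB  # two redo log files of 1GB each
print(flush_mode(1 * GB, 0, total))
print(flush_mode(16 * GB // 10, 0, total))
print(flush_mode(19 * GB // 10, 0, total))
```

The three calls use checkpoint ages of 1GB, about 1.6GB, and about 1.9GB, which fall below the async mark, between the two marks, and above the sync mark respectively.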

④ Dirty Page too much Checkpoint

  • Working principle: Dirty Page too much means that there are too many dirty pages, which causes the InnoDB storage engine to force a Checkpoint. Its overall purpose is to ensure that there are enough free pages in the buffer pool
  • It is controlled by the parameter innodb_max_dirty_pages_pct, which can be viewed with:
show variables like 'innodb_max_dirty_pages_pct'\G
    • A value of 75 means that when dirty pages make up 75% of the buffer pool, a Checkpoint is forced and part of the dirty pages are flushed to disk. Before InnoDB 1.0.x the default value of this parameter was 90; in later versions it is 75
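
The threshold check described in this subsection can be sketched in Python. The function `needs_dirty_page_checkpoint` and the page counts are hypothetical, and treating "exactly at the threshold" as triggering a flush is an assumption of this sketch.

```python
def needs_dirty_page_checkpoint(dirty_pages, total_pages, max_dirty_pages_pct=75):
    """True when the share of dirty pages reaches the configured percentage."""
    dirty_pct = 100.0 * dirty_pages / total_pages
    return dirty_pct >= max_dirty_pages_pct


# Illustrative buffer pool of 10000 pages.
print(needs_dirty_page_checkpoint(7000, 10000))
print(needs_dirty_page_checkpoint(8000, 10000))
```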

 

 

 

 

 

 


Origin blog.csdn.net/m0_46405589/article/details/113861431