MySQL Buffer Pool: A Detailed Explanation

In application systems, we put frequently accessed data in a cache (for example Redis or MongoDB) to speed up data access and reduce pressure on the database.

Operating systems likewise use a buffer pool mechanism to reduce disk I/O.

As a storage system, MySQL also has a buffer pool mechanism to improve performance and reduce disk I/O. Its structure is shown below:

img

As the structure diagram shows, the Buffer Pool is one of the four major components of InnoDB's in-memory structure. It belongs to the InnoDB storage engine layer, not to the MySQL Server layer, so it is different from the query cache that was removed in MySQL 8.0.

1. What is the Buffer Pool?

The Buffer Pool (BP for short) caches the hottest data pages and index pages, using the page as its unit. The default page size is 16 KB. Internally, the BP manages its pages with linked lists.

img

The figure above shows where the Buffer Pool sits in InnoDB. From its position we can roughly infer its workflow:

All reads and writes of data pages go through the buffer pool.

For a read, InnoDB first checks whether the data page holding the data is already in the buffer pool. If it is not, the page is read from disk into the buffer pool.

For a write, InnoDB first writes the data and the log to the buffer pool and the log buffer, and background threads then flush the buffered contents to disk at a certain frequency. This flushing mechanism is called Checkpoint.

The durability of writes is guaranteed by flushing the redo log to disk; the buffer pool itself only serves to make reads and writes more efficient.

img

The Buffer Pool caches table data and index data. Data on disk is loaded into the buffer pool so that not every access needs disk I/O, which speeds up access.

  • The Buffer Pool is a memory area, a mechanism for reducing disk access.
  • Database reads and writes are performed on the buffer pool, which works together with the undo log, redo log, redo log buffer, and binlog. The data is flushed to disk later.
  • The default size of the Buffer Pool is 128 MB, and it caches data pages of 16 KB each. The related variables can be listed with:
show variables like 'innodb_buffer%';

The Buffer Pool is InnoDB's data cache. Besides index pages and data pages, it also holds undo pages, the insert buffer, the adaptive hash index, lock information, and so on.

Most pages in the buffer pool are data pages (including index pages).

InnoDB also has a log buffer, which holds redo log records.

2. Buffer Pool control block

Data pages are cached in the Buffer Pool, and a cache page has the same size as the default data page on disk (16 KB). To manage cache pages more easily, the Buffer Pool keeps a descriptive area for each of them:

InnoDB creates a separate area for each cached data page to record its metadata, including the tablespace the page belongs to, the page number, the address of the cache page in the Buffer Pool, linked list node information, some lock information, LSN information, and so on. This area is called the control block.

Control blocks and cache pages correspond one to one, and both are stored in the Buffer Pool. The control blocks are stored at the front of the Buffer Pool and the cache pages at the back.

A control block is about 5% of the size of a cache page, roughly 16 * 1024 * 0.05 ≈ 819 bytes.

img

The figure above shows the correspondence between control blocks and data pages. Notice that there is a fragment of unused space between the control blocks and the data pages.

Why is there a fragment?

As mentioned above, a data page is 16 KB and a control block is about 800 bytes. After all control blocks and data pages have been laid out, the space that remains may be too small to hold one more pair of control block and cache page. That remainder is the fragment. If the size of the Buffer Pool is set just right, there may be no fragment at all.
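To make the arithmetic concrete, here is a minimal sketch that divides a contiguous region into pairs and reports the remainder. The class and method names are hypothetical, and the 819-byte control block is only the approximation used in this article; the real control block size depends on the MySQL version and build.

```java
// Hypothetical helper: split a contiguous region into (control block, cache page) pairs.
final class PoolLayout {
    static final long PAGE_SIZE = 16 * 1024;     // 16 KB cache page
    static final long CONTROL_BLOCK_SIZE = 819;  // about 5% of a page (approximation from this article)

    /** Number of complete (control block, cache page) pairs that fit in regionBytes. */
    static long pairCount(long regionBytes) {
        return regionBytes / (PAGE_SIZE + CONTROL_BLOCK_SIZE);
    }

    /** Bytes left over that cannot hold one more pair: the fragment. */
    static long fragmentBytes(long regionBytes) {
        return regionBytes % (PAGE_SIZE + CONTROL_BLOCK_SIZE);
    }

    public static void main(String[] args) {
        long region = 128L * 1024 * 1024; // a 128 MB region
        System.out.println("pairs    = " + pairCount(region));
        System.out.println("fragment = " + fragmentBytes(region) + " bytes");
    }
}
```

Under these assumptions a 128 MB region holds 7801 pairs and leaves a 17125-byte fragment, while a region that is an exact multiple of 16384 + 819 bytes leaves none.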

3. Buffer Pool Management

The Buffer Pool contains three linked lists: the LRU linked list, the free linked list, and the flush linked list. InnoDB uses these three lists to control how data pages are updated and evicted.

3.1 Initialization of Buffer Pool

When the MySQL server starts, it has to initialize the Buffer Pool, that is, allocate the Buffer Pool's memory and divide it into pairs of control blocks and cache pages.

  • Requesting space: when the MySQL server starts, it asks the operating system for a contiguous memory area to serve as the Buffer Pool, based on the configured size (innodb_buffer_pool_size). The memory requested is larger than innodb_buffer_pool_size, mainly because the control block of every cache page has to be stored as well.
  • Dividing space: once the memory has been obtained, it is divided into a number of [control block & cache page] pairs, based on the default cache page size of 16 KB and the control block size of about 800 bytes.

After the space has been divided, all cache pages in the Buffer Pool are empty. As data is inserted, deleted, updated, and queried, the pages holding that data are read from the disk files and placed into cache pages of the Buffer Pool.

3.2 Free linked list

When the Buffer Pool is first initialized, all of its cache pages and control blocks are empty. As reads and writes are performed, data pages on disk are loaded into cache pages of the Buffer Pool. Once the data of a page in the Buffer Pool has been persisted to disk and the page is released, it becomes free again.

This raises a problem: how do we know which cache pages are free and which already hold data? Data can only be written once a free page has been found. One option is to traverse all the pages, but a full traversal on every load is too expensive for any developer who cares about performance, and certainly for the developers of InnoDB. That is why the free linked list exists.

3.2.1 What is the Free linked list?

The Free linked list is a doubly linked list made up of one base node and several child nodes. It records the control blocks of the free cache pages, as shown below:

img

  • The role of the Free linked list: it helps find free cache pages.

  • Base node
      • It is a separately allocated piece of memory (about 40 bytes) and is not part of the contiguous memory of the Buffer Pool.
      • It holds the addresses of the head and tail child nodes of the linked list, as well as the number of nodes currently in the list.

  • Child node
      • Each node is the control block of a free cache page. As long as a cache page is free, its control block is in the Free linked list.
      • Each control block has two pointers, free_pre (pointing to the previous node) and free_next (pointing to the next node).

The purpose of the Free linked list is to describe the free data pages in the Buffer Pool, so the nodes of the Free linked list correspond one to one with free data pages, as the figure above shows.

The figure shows how the Free linked list records free data pages. It is easy to misread it as saying that each control block exists twice, once in the Buffer Pool and once in the Free linked list, as if memory held two identical control blocks. That reading is wrong.

Clearing up the misunderstanding

The Free linked list is itself made up of the control blocks in the Buffer Pool. As mentioned earlier, each control block has two pointers, free_pre and free_next, which point to its previous and next node in the Free linked list.

Through these two pointers, the control blocks in the Buffer Pool are strung together into the Free linked list. The list was drawn as a separate copy only to make the pointer relationships between the blocks easier to see.

With that in mind, the real relationship looks like this:

img

Both figures are drawn here because many blog posts show something like the first one, which gives the false impression that a control block exists both in the Buffer Pool and in the Free linked list. I had the same doubt at first, so I have explained and recorded it here.

3.2.2 The process of loading disk pages into the cache of BufferPool

Loading a disk page into the Buffer Pool cache through the Free linked list takes only three steps:

Step 1

Take a free control block, and with it the corresponding cache page, from the Free linked list.

img

Step 2

Read the data page from disk into that cache page, and at the same time write the related descriptive data into the cache page's control block (for example the tablespace the page belongs to and its page number).

img

Step 3

Remove the control block's node from the Free linked list, which marks the cache page as in use.

img

The following pseudocode describes how a control block is removed from the Free linked list. Assume the control block is structured as follows:
/**
 * Control block
 */
public class CommandBlock {

    /**
     * Id of the control block, i.e. the block itself; it can be read as the address of this control block.
     */
    private String blockId;
    /**
     * Address of the previous node of this control block in the Free linked list.
     */
    private String freePre;
    /**
     * Address of the next node of this control block in the Free linked list.
     */
    private String freeNext;
}

Suppose there is a control block n-1 whose previous node is control block n-2 and whose next node is control block n. Its data then looks like this:

/**
 * Control block n-1
 */
public class CommandBlock {

    /**
     * Id of the control block, i.e. the address of this control block: block_n-1.
     */
    private String blockId = "block_n-1";
    /**
     * Address of the previous node in the Free linked list: block_n-2.
     */
    private String freePre = "block_n-2";
    /**
     * Address of the next node in the Free linked list: block_n.
     */
    private String freeNext = "block_n";
}

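In the figure above, control block n is the one being used. To remove it from the Free linked list, we only need to set freeNext in block_n-1 to null, after which block_n is no longer referenced by the list. This shortcut works because block_n is the tail node; the tail address and node count kept in the base node have to be updated as well.

/**
 * Control block n-1, after block_n has been removed
 */
public class CommandBlock {

    /**
     * Id of the control block, i.e. the address of this control block: block_n-1.
     */
    private String blockId = "block_n-1";
    /**
     * Address of the previous node in the Free linked list: block_n-2.
     */
    private String freePre = "block_n-2";
    /**
     * Address of the next node in the Free linked list: null now that block_n is gone.
     */
    private String freeNext = null;
}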
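The pseudocode above only covers removing the tail node. As a more general illustration, the sketch below models the Free linked list with real object references and unlinks a node from any position. It is a simplified model under stated assumptions, not InnoDB's implementation: the class FreeListSketch and its members are hypothetical, and the base node is reduced to three fields.

```java
// Hypothetical model of the Free linked list: control blocks linked through freePre/freeNext.
final class FreeListSketch {
    static final class Block {
        final String blockId;
        Block freePre;   // previous node in the Free linked list
        Block freeNext;  // next node in the Free linked list
        Block(String blockId) { this.blockId = blockId; }
    }

    // Base node: head address, tail address and node count.
    Block head;
    Block tail;
    int count;

    /** Append a free control block at the tail of the list. */
    void addLast(Block b) {
        b.freePre = tail;
        b.freeNext = null;
        if (tail == null) { head = b; } else { tail.freeNext = b; }
        tail = b;
        count++;
    }

    /** Unlink a control block that is currently in the list, wherever it sits. */
    void remove(Block b) {
        if (b.freePre == null) { head = b.freeNext; } else { b.freePre.freeNext = b.freeNext; }
        if (b.freeNext == null) { tail = b.freePre; } else { b.freeNext.freePre = b.freePre; }
        b.freePre = null;
        b.freeNext = null;
        count--;
    }
}
```

Removing the tail block_n reduces to exactly the case shown above: block_n-1 ends up with freeNext set to null, and the base node's tail and count change with it.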

3.2.3 How to determine whether a data page is cached

We now know that disk pages are loaded into cache pages of the Buffer Pool through the Free linked list. But not every read should go to disk and claim a page from the Free linked list, because the data page may already be in a cache page. So how do we determine whether a data page is already cached?

The database maintains a data page cache hash table, with the tablespace number + data page number as the key and the address of the cache page's control block as the value.

# Note: the value is the address of the control block, not the address of the cache page
{
    tablespace number + data page number: address of the control block
}

When a data page is needed, the data page cache hash table is searched first. On a hit, the value locates the control block directly, and the control block leads to the cache page. On a miss, the page is read from disk and written into a cache page, and finally an entry is added to the data page cache hash table.

Within this process, executing a statement generally goes through the following steps:

  • The database name and table name in the SQL statement tell us which tablespace the data page to be loaded is in.
  • From the tablespace number and the table name, the page number of the index root node is obtained through a consistent hashing algorithm.
  • Starting from the root node's page number, the next-level data page is found, and the address of its control block is looked up in the data page cache hash table.
  • The control block then locates the cache page in the Buffer Pool.

An important point that is easy to get wrong: what the consistent hashing algorithm mentioned above yields is the page number of the index root node as recorded in the data dictionary, not the page number of the data page currently being searched for. Once we have the root node's page number, the search descends the B+tree level by level. Before moving to the next level, it checks the data page cache hash table to see whether that level's data page is already in the Buffer Pool. If it is not, the page is loaded from disk.
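The lookup described in this section can be sketched as follows. This is only an illustration: PageHashSketch, its ControlBlock stand-in, and the way the two numbers are packed into a single key are assumptions made for the example, and the disk read is simulated by a counter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical page hash table: (tablespace number, page number) -> control block.
final class PageHashSketch {
    /** Minimal stand-in for a control block. */
    static final class ControlBlock {
        final int spaceId;
        final int pageNo;
        ControlBlock(int spaceId, int pageNo) {
            this.spaceId = spaceId;
            this.pageNo = pageNo;
        }
    }

    private final Map<Long, ControlBlock> table = new HashMap<>();
    int diskReads; // counts simulated disk reads

    /** Combine tablespace number and page number into one key. */
    static long key(int spaceId, int pageNo) {
        return ((long) spaceId << 32) | (pageNo & 0xFFFFFFFFL);
    }

    /** Return the cached control block, going to "disk" only on a miss. */
    ControlBlock getPage(int spaceId, int pageNo) {
        long k = key(spaceId, pageNo);
        ControlBlock cb = table.get(k);
        if (cb == null) {
            diskReads++;                            // simulated disk read
            cb = new ControlBlock(spaceId, pageNo);
            table.put(k, cb);                       // register the page in the hash table
        }
        return cb;
    }
}
```

Asking for the same (tablespace, page) pair twice returns the same control block and triggers only one simulated disk read, which is the point of checking the hash table before touching the Free linked list.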

3.3 LRU linked list

Before looking at the LRU linked list, consider two questions:

  • First question: as mentioned earlier, when a data page is read from disk into the Buffer Pool, its control block is removed from the Free linked list. Where does that control block go after it is removed?
  • Second question: the Buffer Pool is 128 MB by default. Once every free data page in the Buffer Pool has been filled, what happens to newly requested data?

Both problems are solved by the LRU linked list. Let's look at it with these two questions in mind.

3.3.1 What is an LRU linked list?

The Buffer Pool is InnoDB's built-in cache. Reads and writes are carried out in the Buffer Pool, on its data pages. However, the size of the Buffer Pool is limited (128 MB by default), so we want frequently accessed data to stay in the Buffer Pool and rarely accessed data to be released, freeing its pages to cache other data.

For this reason InnoDB adopts the LRU (Least Recently Used) algorithm: frequently accessed data is placed at the head of the linked list and rarely accessed data at the tail. When there is not enough space, pages are evicted from the tail to free up space.

The LRU linked list is essentially made up of control blocks.

3.3.2 Writing process of LRU linked list

When the database loads a data page from disk into the Buffer Pool, it also writes the related information into the control block, and the control block leaves the Free linked list and joins the LRU linked list. The process is as follows:

img

Let's walk through the whole process:

  • Step 1: From the tablespace number and the table name, obtain the data page number through the consistent hashing algorithm (the tree search is omitted here)
  • Step 2: Check the data page cache hash table to see whether the data page is already loaded
  • Step 3: Get a control block from the Free linked list
  • Step 4: Read the data from disk
  • Step 6: Write the data into the free cache page
  • Step 7: Write the cache page's information back to the control block
  • Step 8: Remove the control block from the Free linked list
  • Step 9: Add the control block removed from the Free linked list to the LRU linked list
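The steps above can be strung together in a small model. The sketch below is a deliberately simplified assumption, not InnoDB code: LoadPathSketch and its members are hypothetical, the three structures are ordinary Java collections, the disk read is simulated, and eviction is left out entirely.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified buffer pool: only the load path is modelled.
final class LoadPathSketch {
    static final class ControlBlock {
        String pageKey;                            // "tablespace:pageNo" once loaded, null while free
        final byte[] frame = new byte[16 * 1024];  // the 16 KB cache page this block describes
    }

    final Deque<ControlBlock> freeList = new ArrayDeque<>();
    final Deque<ControlBlock> lruList = new ArrayDeque<>();
    final Map<String, ControlBlock> pageHash = new HashMap<>();

    LoadPathSketch(int pages) {
        for (int i = 0; i < pages; i++) {
            freeList.addLast(new ControlBlock());
        }
    }

    /** Returns the control block for the page, loading the page on a miss. */
    ControlBlock load(int spaceId, int pageNo) {
        String key = spaceId + ":" + pageNo;
        ControlBlock cb = pageHash.get(key);   // is the page already cached?
        if (cb != null) {
            return cb;
        }
        cb = freeList.pollFirst();             // take a control block; this also unlinks it from the free list
        if (cb == null) {
            throw new IllegalStateException("no free page; eviction is not modelled here");
        }
        readFromDisk(key, cb.frame);           // read the disk page into the cache page
        cb.pageKey = key;                      // write the page information back to the control block
        lruList.addFirst(cb);                  // add the control block to the LRU list
        pageHash.put(key, cb);                 // register the page in the hash table
        return cb;
    }

    /** Stand-in for a real disk read: fills the frame with a marker byte. */
    private void readFromDisk(String key, byte[] frame) {
        Arrays.fill(frame, (byte) key.length());
    }
}
```

In this model taking the block and unlinking it from the free list happen in one call, whereas the steps above list them separately; the end state is the same.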

3.3.3 Elimination mechanism of LRU linked list

The design idea of the LRU algorithm is: nodes at the head of the linked list are the most recently used, and nodes at the tail are the ones that have gone unused the longest. When there is not enough space, the nodes at the tail are evicted to free up space.

The purpose of the LRU algorithm is to keep recently accessed cache pages as close to the head of the list as possible.

  • Design ideas of the LRU algorithm
      • When the accessed page is in the Buffer Pool, the control block of that page is moved to the head of the LRU linked list.
      • When the accessed page is not in the Buffer Pool, its control block is put at the head of the LRU linked list and, if there is no space left, the node at the tail of the LRU linked list is evicted.
  • The implementation process of LRU

img

  • As shown in the figure above, the LRU linked list has a length of 22, and its nodes are the control blocks of data pages 1 to 22. This is the initial state.
  • A request accesses data page 23. Page 23 is not in the Buffer Pool, so it is loaded from disk: page 22 at the tail is evicted, and page 23 is placed at the head of the linked list.
  • Next, data page 7 is accessed. Page 7 is in the linked list, that is, in the Buffer Pool, so it is simply moved to the head of the linked list.
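The example above can be reproduced with a plain LRU over page numbers. This is a minimal sketch with hypothetical names (SimpleLru, access); it uses an ArrayDeque for brevity and assumes a positive capacity, so finding a page is a linear scan, whereas a list of control blocks combined with the page hash table avoids that scan.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical plain LRU over page numbers; head = most recently used.
final class SimpleLru {
    private final Deque<Integer> pages = new ArrayDeque<>();
    private final int capacity;

    SimpleLru(int capacity) {
        this.capacity = capacity;
    }

    /** Access a page; returns the evicted page number, or -1 if nothing was evicted. */
    int access(int pageNo) {
        int evicted = -1;
        boolean hit = pages.remove(Integer.valueOf(pageNo)); // unlink the page if it is already cached
        if (!hit && pages.size() == capacity) {
            evicted = pages.removeLast();                    // evict the least recently used page
        }
        pages.addFirst(pageNo);                              // the accessed page goes to the head
        return evicted;
    }

    // head() and tail() assume the list is not empty.
    int head() { return pages.peekFirst(); }
    int tail() { return pages.peekLast(); }
    int size() { return pages.size(); }
}
```

Filling a list of capacity 22 so that page 1 is at the head and page 22 at the tail, then accessing page 23, evicts page 22 and puts 23 at the head; accessing page 7 afterwards evicts nothing and moves 7 to the head.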

The above is how a plain LRU linked list works. This simple approach causes problems for MySQL, so MySQL does not use it directly but makes some improvements to it. The specific improvements are explained later.

3.4 Flush linked list

As explained earlier, reads and writes first operate on the cache pages in the Buffer Pool, and background threads later write the dirty pages to disk to persist them. This is called flushing dirty pages.

Dirty page: a write updates the cache page first, so for a while the data in the cache page and the data in the disk page are inconsistent. Such a cache page is called a dirty page.

Once dirty pages exist, the disk has to be updated, which is what flushing dirty pages means. How do we determine which cache pages need to be flushed? Flushing every cached page to disk is not feasible, and neither is traversing the pages and comparing them one by one. This is where the Flush linked list comes in.

3.4.1 What is a Flush linked list?

The structure of the Flush linked list is very similar to that of the Free linked list: it also consists of a base node and child nodes.

  • The Flush linked list is a doubly linked list whose nodes are the control blocks of modified (updated) cache pages.
  • The role of the Flush linked list: it helps locate dirty pages, that is, the cache pages that need to be flushed.
  • Base node: as in the Free linked list, it links to the first and last nodes and stores the number of control blocks in the list.
  • Child node
      • Each node is the control block of a dirty page. As soon as a cache page is modified, its control block is put into the Flush linked list.
      • Each control block has two pointers, pre (pointing to the previous node) and next (pointing to the next node).

As mentioned earlier, the control blocks actually live in the Buffer Pool and form a linked list through the references to their previous and next nodes. So it is enough to start from the base node and walk the child nodes one by one to find the data pages that need to be flushed.

3.4.2 Flush linked list writing process

Disk I/O is slow, so when writing data MySQL does not update the disk directly. Instead it goes through the following two steps:

  • Step 1: update the data page in the Buffer Pool, which is one memory operation;
  • Step 2: append the update to the redo log, which is one sequential disk write.

This is the most efficient combination. Writing the redo log sequentially tens of thousands of times per second is not a problem.
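How the Flush linked list tracks dirty pages, and how a background thread might consume it, can be sketched as follows. The sketch is an assumption-laden simplification: FlushListSketch and its methods are hypothetical, an insertion-ordered set stands in for the linked list of control blocks, the redo log is not modelled, and the disk write is simulated by recording page numbers.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical flush list: tracks which cached pages are dirty.
final class FlushListSketch {
    private final Set<Integer> dirtyPages = new LinkedHashSet<>(); // page numbers, in the order first dirtied
    final List<Integer> writtenToDisk = new ArrayList<>();         // record of simulated disk writes

    /** Called when a cached page is modified; a page appears in the list at most once. */
    void markDirty(int pageNo) {
        dirtyPages.add(pageNo);
    }

    boolean isDirty(int pageNo) {
        return dirtyPages.contains(pageNo);
    }

    /** What a background flusher might do: write every dirty page, then empty the list. */
    int flushAll() {
        int flushed = dirtyPages.size();
        writtenToDisk.addAll(dirtyPages);  // simulated write of each dirty page
        dirtyPages.clear();                // the pages are clean again
        return flushed;
    }
}
```

Modifying the same page twice before a flush still results in a single write, because the page's control block is already on the list when the second modification arrives.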

img

The figure above shows how the Flush linked list is written when a data page is updated. It assumes that the data being updated has already been loaded into the Buffer Pool. If the data has not been loaded in advance, does the update read the page from disk first? It does not. To improve performance and reduce disk I/O, MySQL applies a number of optimizations: when the data page is not in the Buffer Pool, the update goes through the write buffer (change buffer). How that works will be explained in a separate article.

Once a control block has been added to the Flush linked list, a background thread can traverse the Flush linked list and write the dirty pages to disk.

3.5 Buffer Pool data page

Having covered the three linked lists and how they are used, we can summarize: the Buffer Pool manages its data with three types of data pages and three linked lists.

img

  • Free Page: the data page is unused and empty, and its control block is in the Free linked list.
  • Clean Page: the data page is in use and holds cached data, and its control block is in the LRU linked list.
  • Dirty Page: the data page is in use and has been modified, so the data in the page no longer matches the data on disk. Once the dirty page's data has been written to disk and memory matches the disk again, the page becomes a clean page. The control block of a dirty page is in both the LRU linked list and the Flush linked list.
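The three page types follow directly from which lists hold a page's control block. A minimal sketch of that rule, with hypothetical names (PageStateSketch, classify) and with list membership reduced to three booleans:

```java
// Hypothetical classification of a cache page by which lists hold its control block.
final class PageStateSketch {
    enum PageState { FREE, CLEAN, DIRTY }

    static PageState classify(boolean inFreeList, boolean inLruList, boolean inFlushList) {
        if (inFreeList) {
            if (inLruList || inFlushList) {
                throw new IllegalArgumentException("a free page cannot be in the LRU or Flush list");
            }
            return PageState.FREE;
        }
        if (!inLruList) {
            throw new IllegalArgumentException("a used page must be in the LRU list");
        }
        // A used page is dirty exactly when its control block is also in the Flush list.
        return inFlushList ? PageState.DIRTY : PageState.CLEAN;
    }
}
```

Combinations that the summary above rules out, such as a control block that is only in the Flush linked list, are rejected rather than mapped to a state.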


Origin blog.csdn.net/weixin_52622200/article/details/131775504