How to deal with high MySQL buffer pool usage: understand the buffer pool thoroughly this time!

 

https://blog.csdn.net/weixin_40009393/article/details/111103350

 

 

In a layered application architecture, the most frequently accessed data is kept in a cache to speed up data access and avoid hitting the database every time.

The operating system has a buffer pool mechanism that avoids going to disk on every access.

As a storage system, MySQL likewise has a buffer pool mechanism to avoid disk I/O every time data is queried.

Today, let's talk about InnoDB's buffer pool.

What does InnoDB's buffer pool cache, and what is it for?

It caches table data and index data. Data on disk is loaded into the buffer pool so that not every access requires disk I/O, which speeds up access.

Since it is so fast, why not put all the data in the buffer pool?

Everything has two sides. Setting aside the volatility of data in memory, the flip side of fast access is small capacity:

(1) A cache is fast but small: the database may store 200 GB of data while the cache holds only 64 GB;

(2) Memory is fast but small: a laptop bought with a 2 TB disk may have only 16 GB of memory;

Therefore, only the "hottest" data can be kept in the "nearest" place, so that disk access is reduced as much as possible.

How should the buffer pool be managed, and its pages evicted, to maximize performance?

Before getting into the details, let's introduce the concept of "read-ahead".

What is read-ahead?

Disk reads and writes are not done on demand but by page: at least one page of data is read at a time (usually 4 KB at the operating-system level; InnoDB's own pages default to 16 KB). If data needed later lies in the same page, the subsequent disk I/O is saved, which improves efficiency.

Why is read-ahead effective?

Data access usually follows the principle of "clustered reads and writes": if some data is used, nearby data is very likely to be used too. This is the so-called "principle of locality", and it is why loading data early is effective and really does reduce disk I/O.
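To make the locality argument concrete, here is a minimal sketch that counts page reads for two access patterns. The 4 KB page size comes from the text; the 64-byte record size, the record counts, and the function name pages_read are assumptions made up for illustration, and the model simply assumes a page stays cached once it has been read.

```python
PAGE_SIZE = 4096  # bytes per page, matching the "usually 4K" figure in the text


def pages_read(offsets, page_size=PAGE_SIZE):
    """Count disk page reads, assuming a page stays cached once loaded."""
    cached = set()
    reads = 0
    for offset in offsets:
        page_no = offset // page_size  # the page that contains this byte offset
        if page_no not in cached:
            cached.add(page_no)
            reads += 1
    return reads


# 100 records of 64 bytes each (hypothetical sizes), stored back to back.
clustered = [i * 64 for i in range(100)]
# The same number of records, but each one sits on a different page.
scattered = [i * PAGE_SIZE for i in range(100)]

print(pages_read(clustered))  # 2
print(pages_read(scattered))  # 100
```

When neighbouring records share a page, one page read serves many later accesses, which is exactly the effect read-ahead relies on.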

How does reading by page (4 KB) relate to InnoDB's buffer pool design?

(1) Reading from disk by page improves performance, so the buffer pool generally caches data by page as well;

(2) The read-ahead mechanism suggests adding pages that "may be accessed" to the buffer pool in advance, to avoid future disk I/O;

What algorithm does InnoDB use to manage these buffer pages?

The first thing that comes to mind is LRU (Least Recently Used).

Aside: memcache and operating systems use LRU for page replacement, but MySQL does things differently.

How does a traditional LRU manage buffer pages?

The most common approach is to put a page that enters the buffer pool at the head of the LRU as the most recently accessed element, so that it is evicted last. There are two cases:

(1) The page is already in the buffer pool: it is only moved to the head of the LRU, and no page is evicted;

(2) The page is not in the buffer pool: besides being put at the head of the LRU, the page at the tail of the LRU must also be evicted;

[Figure: an LRU list of length 10]

As shown in the figure above, suppose the LRU that manages the buffer pool has length 10 and buffers the pages numbered 1, 3, 5, ..., 40, 7.

Suppose the data to be accessed next is in page 4:

[Figure: the LRU list when page 4 is accessed]

(1) Page 4 is already in the buffer pool;

(2) Page 4 is moved to the head of the LRU, and no page is evicted;

Aside: to reduce data movement, an LRU is generally implemented as a linked list.

Suppose the data to be accessed next is in page 50:

[Figure: the LRU list when page 50 is accessed]

(1) Page 50 is not in the buffer pool;

(2) Page 50 is put at the head of the LRU, and page 7 at the tail is evicted at the same time;
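The two cases above can be sketched in a few lines of Python. This is only an illustration of a traditional LRU, not anything taken from MySQL: the class name SimpleLRU is made up, and because the text elides part of the page list, the ten starting pages below are a hypothetical fill-in that merely keeps page 4 inside the list and page 7 at the tail.

```python
from collections import OrderedDict


class SimpleLRU:
    """Plain LRU: the first key is the head (most recent), the last is the tail."""

    def __init__(self, capacity, pages=()):
        self.capacity = capacity
        self.pages = OrderedDict((p, None) for p in pages)  # head first

    def access(self, page_no):
        """Touch a page and return the evicted page number, or None."""
        evicted = None
        if page_no in self.pages:
            # Case (1): already buffered, so only move it to the head.
            self.pages.move_to_end(page_no, last=False)
        else:
            # Case (2): not buffered, so evict the tail if full, then insert at the head.
            if len(self.pages) >= self.capacity:
                evicted, _ = self.pages.popitem(last=True)
            self.pages[page_no] = None
            self.pages.move_to_end(page_no, last=False)
        return evicted

    def order(self):
        return list(self.pages)


lru = SimpleLRU(10, [1, 3, 5, 2, 4, 6, 8, 9, 40, 7])  # hypothetical contents
print(lru.access(4))   # None
print(lru.order())     # [4, 1, 3, 5, 2, 6, 8, 9, 40, 7]
print(lru.access(50))  # 7
print(lru.order())     # [50, 4, 1, 3, 5, 2, 6, 8, 9, 40]
```

OrderedDict is backed by a doubly linked list, so moving a page to the head does not shift the other entries, in line with the aside about linked lists.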

The traditional LRU buffer pool algorithm is very intuitive, and much software, such as operating systems and memcache, uses it. Why is MySQL so fussy that it cannot use it directly?

There are two problems:

(1) Read-ahead failure;

(2) Buffer pool pollution;

What is read-ahead failure?

With read-ahead, a page is put into the buffer pool in advance. If MySQL never ends up reading data from that page, this is called read-ahead failure.

How can read-ahead failure be optimized?

To optimize read-ahead failure, the idea is:

(1) Let pages whose read-ahead failed stay in the buffer pool LRU for as short a time as possible;

(2) Move pages that are actually read to the head of the buffer pool LRU;

This ensures that hot data that is actually read stays in the buffer pool as long as possible.

The specific method is:

(1) Divide the LRU into two parts:

  • The new generation (new sublist)
  • The old generation (old sublist)

(2) The two generations are joined end to end, that is, the tail of the new generation connects to the head of the old generation;

(3) When a new page (such as a read-ahead page) is added to the buffer pool, it is added only at the head of the old generation:

  • If the data is actually read (read-ahead succeeded), the page is moved to the head of the new generation
  • If the data is not read, the page is evicted from the buffer pool earlier than the "hot data pages" in the new generation

[Figure: the buffer pool LRU split into a new generation and an old generation]

For example, the entire buffer pool LRU is as shown above:

(1) The length of the entire LRU is 10;

(2) The first 70% are the new generation;

(3) The last 30% are the old generation;

(4) The old and new generations are connected end to end;

[Figure: page 50 added at the head of the old generation]

If a new page, page 50, is read ahead and added to the buffer pool:

(1) Page 50 is inserted only at the head of the old generation, and the page at the tail of the old generation (which is also the overall tail) is evicted;

(2) If page 50 is never actually read, that is, read-ahead fails, it is evicted from the buffer pool earlier than the data in the new generation;

[Figure: page 50 moved to the head of the new generation]

If page 50 is read right away, for example because a SQL statement accesses row data in the page:

(1) It is immediately moved to the head of the new generation;

(2) The page at the tail of the new generation is pushed into the old generation, and no page is actually evicted at this point;
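The page 50 walkthrough can be replayed with a small sketch of the two-generation list. It is a simplified model of the idea described here, not InnoDB's code: the class name TwoGenLRU, its method names, and the starting pages 1 to 10 are all hypothetical, and both generations are assumed to start full with a 7:3 split as in the example.

```python
from collections import deque


class TwoGenLRU:
    """LRU split into a new and an old generation; index 0 of each deque is its head."""

    def __init__(self, new_pages, old_pages):
        self.new = deque(new_pages)
        self.old = deque(old_pages)
        # Assumption: both generations start full, so their lengths are their capacities.
        self.new_cap, self.old_cap = len(self.new), len(self.old)

    def read_ahead(self, page_no):
        """Load a page speculatively: head of the old generation only."""
        evicted = self.old.pop() if len(self.old) >= self.old_cap else None
        self.old.appendleft(page_no)
        return evicted

    def read(self, page_no):
        """A query really reads a buffered page: move it to the head of the new generation."""
        if page_no in self.new:
            self.new.remove(page_no)
        else:
            self.old.remove(page_no)  # raises ValueError if the page is not buffered
        self.new.appendleft(page_no)
        if len(self.new) > self.new_cap:
            # The tail of the new generation is squeezed into the old generation.
            self.old.appendleft(self.new.pop())


pool = TwoGenLRU(new_pages=[1, 2, 3, 4, 5, 6, 7], old_pages=[8, 9, 10])  # hypothetical pages
print(pool.read_ahead(50))             # 10
print(list(pool.new), list(pool.old))  # [1, 2, 3, 4, 5, 6, 7] [50, 8, 9]
pool.read(50)
print(list(pool.new), list(pool.old))  # [50, 1, 2, 3, 4, 5, 6] [7, 8, 9]
```

Note that read-ahead alone never touches the new generation, so a page that is loaded but never read can only push out old-generation pages.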

The improved buffer pool LRU can solve the problem of "read-ahead failure".

Aside: don't give up eating for fear of choking, that is, don't cancel read-ahead for fear of read-ahead failure. In most cases the principle of locality holds and read-ahead is effective.

The improved LRU with new and old generations still cannot solve the problem of buffer pool pollution.

What is MySQL buffer pool pollution?

When a SQL statement needs to scan a large amount of data in bulk, it may replace all the pages in the buffer pool, swapping out a large amount of hot data and causing MySQL performance to drop sharply. This is called buffer pool pollution.

For example, suppose there is a user table with a large amount of data, and the following statement is executed:

select * from user where name like "%shenjian%";

Although the result set may be small, this kind of LIKE with a leading wildcard cannot use the index, so a full table scan is required and a large number of pages must be accessed:

(1) Add the page to the buffer pool (inserted at the head of the old generation);

(2) Read the relevant rows from the page (the page moves to the head of the new generation);

(3) Compare the name field of each row with the string shenjian and, if it matches, add the row to the result set;

(4) ... until all rows in all pages have been scanned ...

In this way, all of these data pages end up at the head of the new generation even though each is accessed only once, and the real hot data is swapped out in large quantities.
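As for why a leading-wildcard LIKE forces a scan in the first place, the sketch below uses a sorted Python list as a rough stand-in for an index on the name column. It only illustrates the general idea that sorted order helps prefix matching but not substring matching. The names and helper functions are made up, and this is not how MySQL evaluates LIKE internally.

```python
import bisect

# A hypothetical sorted "index" on the name column (an index keeps its keys in sorted order).
names = sorted(["alice", "bob", "shen", "shenjian", "shenjian2", "xshenjianx", "zoe"])


def prefix_search(index, prefix):
    """LIKE 'prefix%': binary-search to the first candidate, then walk forward."""
    start = bisect.bisect_left(index, prefix)
    matches, examined = [], 0
    for name in index[start:]:
        examined += 1
        if not name.startswith(prefix):
            break
        matches.append(name)
    return matches, examined


def substring_search(index, needle):
    """LIKE '%needle%': sorted order does not help, so every entry is examined."""
    matches = [name for name in index if needle in name]
    return matches, len(index)


print(prefix_search(names, "shenjian"))     # (['shenjian', 'shenjian2'], 3)
print(substring_search(names, "shenjian"))  # (['shenjian', 'shenjian2', 'xshenjianx'], 7)
```

The prefix search examines only the entries around the match, while the substring search has to look at every entry, which on a large table means touching a large number of pages.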

How can this kind of buffer pool pollution, caused by scanning a large amount of data, be solved?

The MySQL buffer pool adds an "old generation residence time window" mechanism:

(1) Let T be the residence time window of the old generation;

(2) A page inserted at the head of the old generation is not moved to the head of the new generation even if it is accessed immediately;

(3) Only when a page has both been "accessed" and "stayed in the old generation" for longer than T is it moved to the head of the new generation;


Continuing the example, suppose a bulk scan accesses five pages in sequence: 51, 52, 53, 54, 55.


Without the "old generation residence time window" strategy, these bulk-accessed pages would swap out a large amount of hot data.


With the "old generation residence time window" strategy, pages loaded in large numbers within a short time are not moved to the head of the new generation immediately. Instead, pages that were accessed only once within a short time are evicted first.


Only a page that has stayed in the old generation long enough, that is, longer than T, is moved to the head of the new generation.
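The effect of the window can be simulated with the scan of pages 51 to 55. The sketch below follows this article's description (time measured from when a page enters the old generation) rather than InnoDB's actual bookkeeping. The class WindowedLRU, the helper scan, the explicit millisecond clock, and the starting pages are all assumptions for illustration. A window of 0 stands in for "no window".

```python
from collections import deque


class WindowedLRU:
    """Two-generation LRU with an old-generation residence time window T."""

    def __init__(self, new_pages, old_pages, window_ms):
        self.new = deque(new_pages)  # index 0 is the head of the new generation
        self.old = deque(old_pages)  # index 0 is the head of the old generation
        self.new_cap, self.old_cap = len(self.new), len(self.old)
        self.window_ms = window_ms
        self.entered_old = {p: 0 for p in self.old}  # page -> time it entered the old generation

    def load(self, page_no, now_ms):
        """Bring a page in from disk: evict the overall tail, insert at the old head."""
        if len(self.old) >= self.old_cap:
            del self.entered_old[self.old.pop()]
        self.old.appendleft(page_no)
        self.entered_old[page_no] = now_ms

    def read(self, page_no, now_ms):
        """Read rows from a buffered page, promoting it only if the window allows."""
        if page_no in self.new:
            self.new.remove(page_no)
            self.new.appendleft(page_no)
        elif now_ms - self.entered_old[page_no] > self.window_ms:
            self.old.remove(page_no)
            del self.entered_old[page_no]
            self.new.appendleft(page_no)
            if len(self.new) > self.new_cap:
                demoted = self.new.pop()  # tail of the new generation moves to the old head
                self.old.appendleft(demoted)
                self.entered_old[demoted] = now_ms


def scan(pool, pages):
    """Bulk scan: each page is loaded, then its rows are read 1 ms later."""
    now_ms = 0
    for page_no in pages:
        pool.load(page_no, now_ms)
        pool.read(page_no, now_ms + 1)
        now_ms += 2


hot, cold = [1, 2, 3, 4, 5, 6, 7], [8, 9, 10]  # hypothetical starting pages

no_window = WindowedLRU(hot, cold, window_ms=0)
scan(no_window, [51, 52, 53, 54, 55])
print(list(no_window.new))  # [55, 54, 53, 52, 51, 1, 2]

with_window = WindowedLRU(hot, cold, window_ms=1000)
scan(with_window, [51, 52, 53, 54, 55])
print(list(with_window.new), list(with_window.old))  # [1, 2, 3, 4, 5, 6, 7] [55, 54, 53]
```

Without the window, the scanned pages take over most of the new generation. With a 1000 ms window they only churn through the old generation, and the hot pages 1 to 7 stay where they were.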

Which InnoDB parameters do these principles correspond to?

There are three important parameters.


Parameter: innodb_buffer_pool_size

Introduction: Sets the size of the buffer pool. When memory permits, DBAs often recommend increasing this parameter: the more data and indexes fit in memory, the better the database performs.

Parameter: innodb_old_blocks_pct

Introduction: The old generation's share of the entire LRU list. The default is 37, that is, the ratio of the new generation to the old generation in the LRU is 63:37.

Aside: conceptually, a value of 100 would make the list degenerate into an ordinary LRU. Note, however, that MySQL documents the permitted range of this parameter as 5 to 95.

Parameter: innodb_old_blocks_time

Introduction: The residence time window of the old generation, in milliseconds. The default is 1000, that is, a page is moved to the head of the new generation only when it has both been "accessed" and "stayed in the old generation for more than 1 second".
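To see roughly what these numbers mean, the sketch below turns the parameters into list lengths and a promotion check. The helper names are hypothetical, the 128 MB pool and 16 KB page size are assumed values used only for the arithmetic, and InnoDB's internal rounding may differ from this simple integer division.

```python
def sublist_lengths(total_pages, old_blocks_pct=37):
    """Split an LRU of total_pages into (new, old) lengths by percentage."""
    old_len = total_pages * old_blocks_pct // 100
    return total_pages - old_len, old_len


def may_promote(accessed, ms_in_old, old_blocks_time=1000):
    """The article's rule: accessed AND stayed in the old generation longer than T."""
    return accessed and ms_in_old > old_blocks_time


# Assumed values for illustration: a 128 MB buffer pool made of 16 KB pages.
pool_bytes = 128 * 1024 * 1024
page_bytes = 16 * 1024
total_pages = pool_bytes // page_bytes

print(total_pages)                   # 8192
print(sublist_lengths(total_pages))  # (5161, 3031)
print(sublist_lengths(10, 30))       # (7, 3)
print(may_promote(True, 5))          # False
print(may_promote(True, 1500))       # True
```

The (7, 3) split is the 70%/30% example used earlier. With the default of 37, a little over a third of the list is reserved for pages that have not yet proven they are hot.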

Summary

(1) The buffer pool is a common mechanism to reduce disk access;

(2) The buffer pool usually caches data in units of pages;

(3) The common management algorithm for a buffer pool is LRU; memcache, operating systems, and InnoDB all use it;

(4) InnoDB optimizes the ordinary LRU:

  • The buffer pool is divided into an old generation and a new generation. Pages entering the buffer pool go into the old generation first and move to the new generation only once they are accessed, which solves the problem of read-ahead failure.
  • A page moves to the new generation only once it has been accessed and has stayed in the old generation longer than the configured threshold, which prevents bulk data access from evicting large amounts of hot data.

Ideas are more important than conclusions.

Which problem is being solved is more important than the solution itself.


Origin blog.csdn.net/liuming690452074/article/details/113811983