The buffer pool, thoroughly understood!

In a layered application architecture, the most frequently accessed data is kept in a cache to speed up data access, so that not every request has to go to the database.

Operating systems likewise have a buffer pool mechanism, which avoids going to disk on every access and so speeds up data access.

MySQL, as a storage system, also has a buffer pool mechanism, which avoids disk IO on every query.

Today, let's talk about the InnoDB buffer pool.

What does the InnoDB buffer pool cache? What is it for?

It caches table data and index data. Data on disk is loaded into the buffer pool so that not every access incurs disk IO, which speeds up access.

If it is so fast, why not put all the data into the buffer pool?

Everything has two sides. Leaving aside the volatility of the data, the price of fast access is small storage capacity:

(1) a cache is fast to access but small: a database may store 200G of data while the cache holds perhaps 64G;

(2) memory is fast to access but small: a laptop may come with a 2T disk while its memory is perhaps 16G;

Therefore, only the "hottest" data can be placed in the "nearest" location, so as to reduce disk access as much as possible.

How should the buffer pool be managed and its pages evicted so as to maximize performance?

Before going into the details, let us first introduce the concept of "read-ahead".

What is read-ahead?

Disk reads and writes are not done on demand but by page: at least one page of data (usually 4K) is read at a time. If the data needed next is in that same page, the subsequent disk IO can be skipped, which improves efficiency.

Why is read-ahead effective?

Data access generally follows the principle of "clustered reads and writes": when some data is used, nearby data is very likely to be used as well. This is the so-called "principle of locality". It shows that loading data in advance is effective and really does reduce disk IO.

What does reading by page (4K) have to do with the design of the InnoDB buffer pool?

(1) reading by page improves disk access performance, so the buffer pool generally caches data by page as well;

(2) the read-ahead mechanism suggests that pages which "may be accessed soon" can be loaded into the buffer pool in advance, avoiding future disk IO;
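
The benefit of reading by page can be illustrated with a little arithmetic. The sketch below assumes a 4K page; the byte offsets and the helpers `page_of` and `pages_touched` are made up for illustration and are not part of MySQL.

```python
PAGE_SIZE = 4 * 1024  # assumed page size in bytes (4K, as in the text)


def page_of(offset):
    """Return the number of the page that contains the given byte offset."""
    return offset // PAGE_SIZE


def pages_touched(offsets):
    """Return the distinct pages that must be read to serve these offsets."""
    return sorted({page_of(o) for o in offsets})


# Hypothetical offsets: ten neighbouring 100-byte records vs. ten scattered ones.
clustered = [8192 + i * 100 for i in range(10)]
scattered = [i * 50_000 for i in range(10)]

print(pages_touched(clustered))       # all ten reads fall in a single page
print(len(pages_touched(scattered)))  # every read needs its own page
```

With clustered access, one page read serves all ten records; with scattered access, each record costs a separate page read.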

What algorithm does InnoDB use to manage these buffered pages?

The first thing that comes to mind is LRU (Least Recently Used).

Aside: memcache and the OS both use LRU for page replacement, but MySQL does things differently.

How does a traditional LRU manage buffered pages?

The most common approach is to put a page entering the buffer pool at the head of the LRU, as the most recently accessed element, so that it is the last to be evicted. There are two cases:

(1) the page is already in the buffer pool: it is only "moved" to the head of the LRU, and no page is evicted;

(2) the page is not in the buffer pool: besides "adding" it at the head of the LRU, the page at the tail of the LRU must also be "evicted";

As shown in the figure above, suppose the LRU managing the buffer pool has length 10 and buffers the pages numbered 1, 3, 5, ..., 40, 7.

Suppose the data to be accessed next is in page 4:

(1) page 4 is already in the buffer pool;

(2) page 4 is moved to the head of the LRU, and no page is evicted;

Aside: to reduce data movement, an LRU is generally implemented as a linked list.

Suppose the data to be accessed after that is in page 50:

(1) page 50 is not in the buffer pool;

(2) page 50 is put at the head of the LRU, and page 7 at the tail is evicted;
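
The two cases can be sketched in Python. The figure's full page list is not available here, so the ten page numbers below are made up for illustration; only pages 4, 7, and 50 are taken from the text. `LRU` is a hypothetical class, and an `OrderedDict` stands in for the linked list.

```python
from collections import OrderedDict


class LRU:
    """Plain LRU over page numbers; the first key is the head (most recent)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, page):
        """Touch a page; return the evicted page number, or None."""
        if page not in self.pages:
            self.pages[page] = True               # miss: add the page
        self.pages.move_to_end(page, last=False)  # hit or miss: move to head
        if len(self.pages) > self.capacity:
            evicted, _ = self.pages.popitem(last=True)  # evict the tail
            return evicted
        return None

    def order(self):
        """Return the page numbers from head to tail."""
        return list(self.pages)


# Assumed contents, head to tail (only 4 and 7 come from the text).
initial = [1, 3, 5, 9, 4, 12, 20, 33, 40, 7]
lru = LRU(capacity=10)
for p in reversed(initial):  # touch the tail first so that 1 ends up at the head
    lru.access(p)

hit_evicted = lru.access(4)    # already buffered: moved to the head
miss_evicted = lru.access(50)  # not buffered: added at the head, tail evicted
print(hit_evicted, miss_evicted, lru.order())
```

Accessing page 4 evicts nothing, while accessing page 50 pushes page 7 off the tail.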

The traditional LRU buffer pool algorithm is very intuitive, and the OS, memcache, and much other software use it. Why is MySQL so fussy that it cannot use it directly?

There are two problems:

(1) read-ahead failure;

(2) buffer pool pollution;

What is read-ahead failure?

Because of read-ahead, a page is loaded into the buffer pool in advance, but MySQL ends up never reading any data from it. This is called read-ahead failure.

How can read-ahead failure be dealt with?

The idea is:

(1) let pages whose read-ahead failed stay in the buffer pool LRU for as short a time as possible;

(2) move pages that are actually read to the head of the buffer pool LRU;

This ensures that hot data that is actually read stays in the buffer pool as long as possible.

The specific method is:

(1) split the LRU into two parts:

  • new generation (new sublist)
  • old generation (old sublist)

(2) the new and old generations are joined end to end, that is, the tail of the new generation connects to the head of the old generation;

(3) when a new page (e.g., a read-ahead page) is added to the buffer pool, it is only added at the head of the old generation:

  • if the data is actually read (read-ahead succeeded), the page is moved to the head of the new generation
  • if the data is not read, the page is evicted from the buffer pool earlier than the "hot data pages" in the new generation

For example, take the buffer pool LRU shown in the figure above:

(1) the whole LRU has length 10;

(2) the first 70% is the new generation;

(3) the last 30% is the old generation;

(4) the new and old generations are joined end to end;

Suppose a new page, number 50, is read ahead into the buffer pool:

(1) page 50 is inserted at the head of the old generation, and the page at the tail of the old generation (which is also the tail of the whole list) is evicted;

(2) if page 50 is never actually read, i.e., the read-ahead fails, it will be evicted from the buffer pool earlier than the data in the new generation;

If page 50 is read right away, for example because a SQL statement accesses a row in that page:

(1) it is immediately moved to the head of the new generation;

(2) the page at the tail of the new generation is pushed down into the old generation, and no page is actually evicted this time;
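
The walk-through above can be sketched as follows. This is a simplified model, not InnoDB's implementation: `MidpointLRU` is a hypothetical class, the two generations are plain Python lists, and the page numbers other than 50 are made up (7 pages in the new generation, 3 in the old, matching the 70%/30% split).

```python
class MidpointLRU:
    """LRU split into a new generation followed by an old generation."""

    def __init__(self, new_cap, old_cap):
        self.new_cap, self.old_cap = new_cap, old_cap
        self.new, self.old = [], []  # index 0 is the head of each part

    def load(self, page):
        """Bring a page in (e.g. by read-ahead); return the evicted page or None."""
        self.old.insert(0, page)   # new pages enter at the head of old
        if len(self.old) > self.old_cap:
            return self.old.pop()  # tail of old is the tail of the whole list
        return None

    def read(self, page):
        """Actually read a page that is already buffered: promote it."""
        if page in self.new:
            self.new.remove(page)
        else:
            self.old.remove(page)
        self.new.insert(0, page)   # head of the new generation
        if len(self.new) > self.new_cap:
            # The tail of new slides down to the head of old; nothing is evicted.
            self.old.insert(0, self.new.pop())


lru = MidpointLRU(new_cap=7, old_cap=3)
lru.new = [1, 3, 5, 9, 4, 12, 20]  # assumed hot pages
lru.old = [33, 40, 7]              # assumed old generation

evicted = lru.load(50)  # read-ahead brings in page 50
lru.read(50)            # a query then really reads page 50
print(evicted, lru.new, lru.old)
```

Loading page 50 evicts page 7 from the overall tail. Reading it afterwards moves it to the head of the new generation and pushes page 20 down into the old generation without evicting anything. If `read(50)` were never called, page 50 would fall off the tail after three more loads while the new generation stayed intact.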

The improved LRU buffer pool solves the "read-ahead failure" problem well.

Aside: do not give up eating for fear of choking, though, and turn off read-ahead just because it can fail. In most cases the principle of locality holds and read-ahead is effective.

The improved LRU with new and old generations still does not solve the problem of buffer pool pollution.

What is MySQL buffer pool pollution?

When a single SQL statement scans a large amount of data in bulk, it may cause all the pages in the buffer pool to be replaced, so that a large amount of hot data is swapped out and MySQL performance drops sharply. This is called buffer pool pollution.

For example, suppose there is a user table with a large amount of data, and the following statement is executed:

select * from user where name like "%shenjian%";

Although the result set may be small, this kind of LIKE cannot use an index, so the whole table must be scanned and a large number of pages must be accessed:

(1) a page is added to the buffer pool (inserted at the head of the old generation);

(2) the relevant rows are read from the page (the page is moved to the head of the new generation);

(3) the name field of each row is compared with the string shenjian, and matching rows are added to the result set;

(4) ... and so on, until all rows in all pages have been scanned ...

In this way, all of these data pages end up at the head of the new generation even though they are accessed only once, and a large amount of truly hot data is swapped out.

How can the buffer pool pollution caused by this kind of bulk scan be solved?

The MySQL buffer pool adds a mechanism called the "old generation dwell time window":

(1) let T be the old generation dwell time window;

(2) a page inserted at the head of the old generation is not moved to the head of the new generation right away, even if it is accessed immediately;

(3) only when a page both "is accessed" and "has stayed in the old generation" for longer than T is it moved to the head of the new generation;

Continuing the example, suppose a bulk scan accesses the five pages 51, 52, 53, 54, 55 in turn.

Without the "old generation dwell time window" policy, these bulk-accessed pages would swap out a lot of hot data.

With the "old generation dwell time window" policy, the many pages loaded within a short time are not immediately inserted at the head of the new generation. Instead, those pages that were accessed only briefly are the first to be evicted.

Only a page that has stayed in the old generation long enough, for longer than T, is inserted at the head of the new generation.
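
The effect of the window can be compared side by side in a small simulation. This is a simplified model of the rules described above, not InnoDB's actual code: `WindowedLRU` and `scan` are hypothetical names, time is a logical millisecond counter, the hot page numbers are made up, and the scan is assumed to read each page 1 ms after loading it.

```python
class WindowedLRU:
    """New/old generation LRU with an old generation dwell time window."""

    def __init__(self, new_pages, old_pages, window_ms):
        # Both parts start full; index 0 is the head of each part.
        self.new, self.old = list(new_pages), list(old_pages)
        self.new_cap, self.old_cap = len(self.new), len(self.old)
        self.window_ms = window_ms
        self.entered_old = {p: 0 for p in self.old}  # page -> arrival time

    def load(self, page, now_ms):
        """Bring a page in from disk at the head of the old generation."""
        self.old.insert(0, page)
        self.entered_old[page] = now_ms
        if len(self.old) > self.old_cap:
            del self.entered_old[self.old.pop()]  # evict the overall tail

    def read(self, page, now_ms):
        """Read a buffered page; promote it only if it has dwelt longer than T."""
        if page in self.new:
            self.new.remove(page)
            self.new.insert(0, page)
        elif now_ms - self.entered_old[page] > self.window_ms:
            self.old.remove(page)
            del self.entered_old[page]
            self.new.insert(0, page)
            if len(self.new) > self.new_cap:
                demoted = self.new.pop()  # tail of new slides into old
                self.old.insert(0, demoted)
                self.entered_old[demoted] = now_ms
        # Otherwise the page stays where it is in the old generation.


def scan(lru, pages):
    """Load each page and read its rows 1 ms later, as a bulk scan would."""
    now_ms = 0
    for page in pages:
        lru.load(page, now_ms)
        lru.read(page, now_ms + 1)
        now_ms += 2


hot, cold = [1, 3, 5, 9, 4, 12, 20], [33, 40, 7]  # assumed starting contents
no_window = WindowedLRU(hot, cold, window_ms=0)
with_window = WindowedLRU(hot, cold, window_ms=1000)
scan(no_window, [51, 52, 53, 54, 55])
scan(with_window, [51, 52, 53, 54, 55])
print(no_window.new)
print(with_window.new, with_window.old)
```

Without a window, the five scanned pages take over the head of the new generation and only two of the original hot pages remain there. With a 1000 ms window, the new generation is untouched and the scanned pages simply cycle through the old generation.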

Which InnoDB parameters do the principles above correspond to?

There are three important parameters.

Parameter: innodb_buffer_pool_size

Description: configures the size of the buffer pool. Memory permitting, DBAs often recommend increasing this parameter: the more data and indexes fit in memory, the better the database performs.

Parameter: innodb_old_blocks_pct

Description: the old generation's share of the total length of the LRU list. The default is 37, i.e., across the whole LRU the ratio of new generation to old generation is 63:37.

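Aside: the larger this value, the closer the behavior is to an ordinary LRU. At 100 it would degenerate into one, although MySQL limits this parameter to the range 5 to 95.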

Parameter: innodb_old_blocks_time

Description: the old generation dwell time window, in milliseconds. The default is 1000, i.e., a page is inserted at the head of the new generation only when it satisfies both conditions: it "is accessed" and it "has stayed in the old generation for more than 1 second".
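
As a rough illustration of what innodb_old_blocks_pct means for the split, the sketch below divides an LRU of a given length by percentage. `split_lru` is a hypothetical helper using simple integer arithmetic; InnoDB's internal bookkeeping may round differently.

```python
def split_lru(total_pages, old_blocks_pct=37):
    """Return (new_pages, old_pages) for an LRU holding total_pages pages."""
    old_pages = total_pages * old_blocks_pct // 100
    return total_pages - old_pages, old_pages


default_split = split_lru(1000)     # default of 37 gives a 63:37 split
example_split = split_lru(10, 30)   # the 10-page, 70%/30% example in the text
print(default_split, example_split)
```

With the default of 37, a 1000-page list has 630 pages in the new generation and 370 in the old generation.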

Summary

(1) a buffer pool is a common mechanism for reducing disk access;

(2) a buffer pool generally caches data in units of pages;

(3) the common management algorithm for a buffer pool is LRU; memcache, the OS, and InnoDB all use it;

(4) InnoDB optimizes the ordinary LRU:

  • The buffer pool is divided into an old generation and a new generation. A page entering the buffer pool goes into the old generation first and enters the new generation only once it is accessed. This solves the read-ahead failure problem
  • A page enters the new generation only when it is accessed and its dwell time in the old generation exceeds the configured threshold. This solves the problem of bulk data access evicting a large amount of hot data

The line of thinking matters more than the conclusion.

The problem being solved matters more than the solution.

Origin juejin.im/post/5d11a79ee51d4555e372a624