Two problems the traditional LRU algorithm cannot avoid

 First, how does the traditional LRU algorithm manage data in memory?

 The traditional LRU (Least Recently Used) algorithm is a cache eviction algorithm, mainly used to manage data in memory. When memory space runs short, LRU evicts the data that has not been used for the longest time.

The following are the management steps of the traditional LRU algorithm:

1. Maintain a doubly-linked list: each node in the list represents a cached data block, and the list is ordered by the time of each block's most recent access. The closer a node is to the head of the list, the more recently it was accessed; the closer to the tail, the longer it has gone without being accessed.

2. When a new piece of data enters the cache, it is added to the head of the list.

3. When data needs to be evicted, the block at the tail of the list (the one that has gone longest without being accessed) is removed, and its node is deleted from the list.

4. When a piece of data is accessed, its block is moved to the head of the list, marking it as the most recently accessed.

5. When the requested data block is not in the list, it has either never been cached or has already been evicted, and must be loaded from disk or another storage device.
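
To make these steps concrete, here is a minimal sketch of such a cache in Java, using a hash map for O(1) lookup plus a doubly-linked list ordered by recency. The class and method names are illustrative only and not taken from any particular library:

```java
import java.util.HashMap;
import java.util.Map;

// A minimal LRU cache: the hash map gives O(1) key lookup, and the
// doubly-linked list keeps entries ordered from most recently used (head)
// to least recently used (tail), mirroring steps 1-5 above.
public class LruCache<K, V> {

    private static class Node<K, V> {
        K key;
        V value;
        Node<K, V> prev, next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final int capacity;
    private final Map<K, Node<K, V>> index = new HashMap<>();
    // Sentinel head/tail nodes simplify insertion and removal at both ends.
    private final Node<K, V> head = new Node<>(null, null);
    private final Node<K, V> tail = new Node<>(null, null);

    public LruCache(int capacity) {
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    // Step 4: an access moves the node to the head (most recently used).
    // Step 5: a miss returns null, and the caller must load from disk.
    public V get(K key) {
        Node<K, V> node = index.get(key);
        if (node == null) {
            return null;
        }
        unlink(node);
        insertAtHead(node);
        return node.value;
    }

    // Step 2: new data is inserted at the head of the list.
    // Step 3: if the cache is full, evict the node just before the tail
    // sentinel, i.e. the entry that has gone longest without being accessed.
    public void put(K key, V value) {
        Node<K, V> node = index.get(key);
        if (node != null) {          // update in place and mark as recent
            node.value = value;
            unlink(node);
            insertAtHead(node);
            return;
        }
        if (index.size() == capacity) {
            Node<K, V> victim = tail.prev;
            unlink(victim);
            index.remove(victim.key);
        }
        node = new Node<>(key, value);
        insertAtHead(node);
        index.put(key, node);
    }

    private void unlink(Node<K, V> node) {
        node.prev.next = node.next;
        node.next.prev = node.prev;
    }

    private void insertAtHead(Node<K, V> node) {
        node.next = head.next;
        node.prev = head;
        head.next.prev = node;
        head.next = node;
    }
}
```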

The drawback of the traditional LRU algorithm is that, in real workloads, accesses are not evenly distributed over time: a data block may be genuinely valuable even though it is accessed relatively rarely. In such cases LRU may misjudge and evict these blocks, hurting query performance. Improved variants (such as the 2Q and ARC algorithms) were designed to address this.

 The traditional LRU algorithm cannot avoid two problems:

  • 1. Read-ahead failure lowers the cache hit rate;
  • 2. Cache pollution lowers the cache hit rate.

 What is read-ahead failure?

Because of the principle of locality, every time the system reads data from disk into the buffer, it reads a little extra, since content adjacent to what was requested is likely to be needed soon. But if that prefetched content is never actually used, the benefit of read-ahead disappears, and the prefetched pages occupy space that hot data could have used.
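
As a rough illustration (not the operating system's actual code), the following toy Java simulation shows the effect on a plain LRU built from LinkedHashMap: the prefetched pages land at the hot end and push out genuinely hot pages, even though they may never be read:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReadAheadDemo {
    public static void main(String[] args) {
        final int capacity = 4;
        // accessOrder = true turns LinkedHashMap into a simple LRU list.
        Map<Integer, String> lru = new LinkedHashMap<Integer, String>(capacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity;
            }
        };

        // Hot pages that the workload keeps reusing.
        lru.put(1, "hot-1");
        lru.put(2, "hot-2");
        lru.get(1);
        lru.get(2);

        // Reading page 10 also prefetches its neighbours 11..13 (read-ahead).
        for (int page = 10; page <= 13; page++) {
            lru.put(page, "read-ahead");
        }

        // The hot pages were evicted by pages that may never be used at all.
        System.out.println(lru.keySet()); // prints [10, 11, 12, 13]
    }
}
```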

To put it simply, it is like someone holding a seat without doing any work; quite annoying, and efficiency takes a hit.

What is cache pollution?

Since reading from the cache is much faster than reading from slower storage, in high-performance scenarios the system first looks for the data it needs in the cache. If the data is found, it is returned directly; if not, it is fetched from memory or disk, and the result is then stored in the cache for next time.
In practice, cache space is limited and precious for any system; we cannot put all data in the cache. Even if we could, data safety could not be guaranteed, and if the amount of cached data grew too large, the cache itself would become slower and slower.
So an eviction mechanism is needed, and deciding which data to evict and which to keep is the hard part. Handled badly, this leads to the "cache pollution" problem.
Cache pollution refers to the phenomenon where infrequently used data is moved into the cache, squeezing out frequently used data and reducing cache efficiency.
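
As a rough sketch (the page numbers and capacities are made up), the following shows how a single large scan can pollute a plain LRU and wipe out a small hot working set:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CachePollutionDemo {
    public static void main(String[] args) {
        final int capacity = 8;
        Map<Integer, String> cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity;
            }
        };

        // A small hot working set (pages 0..3) that is reused all the time.
        for (int page = 0; page < 4; page++) {
            cache.put(page, "hot");
        }

        // A one-off "full table scan": 1000 cold pages, each touched once.
        for (int page = 100; page < 1100; page++) {
            cache.put(page, "scanned-once");
        }

        // Re-read the hot pages and count how many survived the scan.
        int hits = 0;
        for (int page = 0; page < 4; page++) {
            if (cache.get(page) != null) {
                hits++;
            }
        }
        System.out.println("hot-page hits after the scan: " + hits + "/4"); // 0/4
    }
}
```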

For example: a company always has exactly 100 employees (the boss cannot afford to hire more). If someone new joins, someone else has to leave. If the one pushed out is an experienced Java veteran (salary aside), that is definitely a loss for the company!


 The solution is as follows:

In order to avoid the impact of "read-ahead failure", Linux and MySQL have made improvements to the traditional LRU linked list:

  • The Linux operating system implements two LRU lists: an active LRU list (active list) and an inactive LRU list (inactive list).
  • The MySQL InnoDB storage engine divides a single LRU list into two areas: a young area and an old area.
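
A simplified sketch of this split-list idea is shown below; it is illustrative only, not Linux's or InnoDB's actual implementation. Newly loaded pages (including read-ahead pages) enter the old/inactive area, so they can be evicted without ever disturbing the young/active area. Note that this version still promotes a page on its very first access, which is exactly the weakness discussed next:

```java
import java.util.LinkedList;

// Simplified two-area LRU: the front of each list is its most recently used end.
public class SplitLruSketch {
    private final int youngCapacity;
    private final int oldCapacity;
    private final LinkedList<Integer> young = new LinkedList<>(); // hot pages
    private final LinkedList<Integer> old = new LinkedList<>();   // probation

    public SplitLruSketch(int youngCapacity, int oldCapacity) {
        this.youngCapacity = youngCapacity;
        this.oldCapacity = oldCapacity;
    }

    // A page read in for the first time (including read-ahead pages) goes to
    // the head of the old area, so it cannot push anything out of the young area.
    public void load(int page) {
        old.addFirst(page);
        if (old.size() > oldCapacity) {
            old.removeLast(); // unused read-ahead ages out without doing harm
        }
    }

    // An access promotes a page from the old area to the young area.
    // This still promotes on the *first* access -- see the caveat below.
    public void access(int page) {
        if (young.remove(Integer.valueOf(page))) {
            young.addFirst(page);                 // already hot: move to front
            return;
        }
        if (old.remove(Integer.valueOf(page))) {
            young.addFirst(page);                 // promote old -> young
            if (young.size() > youngCapacity) {
                old.addFirst(young.removeLast()); // coldest young page demoted
                if (old.size() > oldCapacity) {
                    old.removeLast();
                }
            }
        }
    }
}
```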

However, if we still follow the rule "as soon as data is accessed once, move it to the head of the active list (or young area)", the cache pollution problem remains.

 To avoid the impact of "cache pollution", the Linux operating system and the MySQL InnoDB storage engine each raised the bar for promoting data to hot status:

  • Linux operating system: a memory page is promoted from the inactive list to the active list only when it is accessed a second time.
  • MySQL InnoDB: when a page is accessed a second time, it is not immediately promoted from the old area to the young area; how long it has stayed in the old area is also checked:
    • If the second access is within 1 second (the default) of the first access, the page is not promoted from the old area to the young area;
    • If the second access is more than 1 second after the first access, the page is promoted from the old area to the young area.
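
Below is an illustrative Java sketch of this promotion rule; the class and method names are invented, though the 1-second window corresponds to InnoDB's innodb_old_blocks_time parameter (default 1000 ms):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative promotion rule in the spirit of InnoDB's old -> young check.
// OLD_BLOCKS_TIME_MS mirrors the innodb_old_blocks_time setting (default 1000 ms).
public class OldToYoungPromotion {
    private static final long OLD_BLOCKS_TIME_MS = 1000;

    // When each page was first accessed after entering the old area.
    private final Map<Integer, Long> firstAccessInOld = new HashMap<>();

    // Record the first access of a page that currently sits in the old area.
    public void recordFirstAccess(int page, long nowMs) {
        firstAccessInOld.putIfAbsent(page, nowMs);
    }

    // A later access promotes the page only if it has already stayed in the
    // old area for longer than the window; a burst of accesses within 1 second
    // (e.g. from a scan or from read-ahead) never qualifies, so those pages
    // simply age out of the old area.
    public boolean shouldPromote(int page, long nowMs) {
        Long first = firstAccessInOld.get(page);
        if (first == null) {
            return false; // the page has not been seen in the old area yet
        }
        return nowMs - first > OLD_BLOCKS_TIME_MS;
    }
}
```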

By raising the bar for entering the active list (or young area), the impact of cache pollution is avoided very effectively.

 

Origin blog.csdn.net/m0_62600503/article/details/131291858