Hand-writing Redis in Java from scratch (9): how does the LRU cache eviction algorithm avoid cache pollution?

Preface

Hand-writing Redis in Java from scratch (1): how to implement a fixed-size cache?

Hand-writing Redis in Java from scratch (2): how redis expire expiration works

Hand-writing Redis in Java from scratch (3): how to restart without losing in-memory data?

Hand-writing Redis in Java from scratch (4): adding listeners

Hand-writing Redis in Java from scratch (5): another way to implement the expiration strategy

Hand-writing Redis in Java from scratch (6): AOF persistence explained and implemented

Hand-writing Redis in Java from scratch (7): the LRU cache eviction strategy explained

Hand-writing Redis in Java from scratch (8): performance optimization of the naive LRU eviction algorithm

In the previous two sections we implemented the LRU algorithm and optimized its performance.

As the last section on the LRU algorithm, this one mainly solves the cache pollution problem.

LRU basics

What is it

The LRU algorithm's full name is the Least Recently Used algorithm, and it is widely used in caching mechanisms.

When the space used by the cache reaches its upper limit, part of the existing data must be evicted to keep the cache usable, and the LRU algorithm decides which data to evict.

The basic idea of the LRU algorithm is temporal locality, which follows from the principle of locality:

If an information item is being accessed, it is likely to be accessed again in the near future.
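
To make this concrete, here is a minimal LRU sketch (our own illustration, not this project's implementation) built on java.util.LinkedHashMap in access-order mode, where the eldest entry in iteration order is always the least recently used:

import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache sketch: accessOrder = true makes get() move an entry
// to the tail, so the eldest entry is always the least recently used one.
public class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxSize;

    public SimpleLruCache(int maxSize) {
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the least recently used entry once the size limit is exceeded
        return size() > maxSize;
    }
}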

Further reading

Apache Commons LRUMap source code explained

Using Redis as an LRU map

Hand-writing Redis in Java from scratch (7): the LRU cache eviction strategy explained

Shortcomings of the naive LRU algorithm

LRU is very efficient when hot data exists, but occasional or periodic batch operations cause its hit rate to drop sharply, and the cache becomes heavily polluted.
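
We can reproduce the pollution with the SimpleLruCache sketch above: a one-off scan of cold keys, each touched exactly once, pushes out every hot key:

SimpleLruCache<String, String> cache = new SimpleLruCache<>(3);
cache.put("hotA", "1");
cache.put("hotB", "2");
cache.get("hotA"); // hot data, accessed repeatedly

// a one-off batch scan touches three cold keys exactly once each...
for (int i = 0; i < 3; i++) {
    cache.put("cold" + i, "x");
}

// ...and the hot keys are gone
System.out.println(cache.keySet()); // [cold0, cold1, cold2]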

Extended algorithms

1. LRU-K

The K in LRU-K stands for the number of recent accesses, so LRU can be regarded as LRU-1.

The main purpose of LRU-K is to solve the cache pollution problem of plain LRU. Its core idea is to extend the criterion from "used once recently" to "used K times recently".

Compared with LRU, LRU-K maintains one extra queue that records the access history of all cached data. Data is moved into the cache only once it has been accessed K times.

When data needs to be evicted, LRU-K evicts the entry whose K-th most recent access is the furthest in the past.

When data is accessed for the first time, it is added to the history queue. While its access count in the history queue is below K, it is evicted from that queue according to a policy of its own (FIFO or LRU);

When its access count in the history queue reaches K, the key is removed from the history queue, the data is moved into the cache queue, and the cache queue is re-sorted by access time;

Whenever cached data is accessed again, the cache queue is reordered. When eviction is needed, the entry at the tail of the cache queue is evicted, i.e. the one whose K-th most recent access is the oldest.

LRU-K keeps the advantages of LRU while avoiding its disadvantages. In practice, LRU-2 is usually the best all-round choice.

Since LRU-K also has to track objects that have been accessed but not yet cached, it consumes more memory than LRU.
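
The following is a rough sketch of this bookkeeping for K = 2 (our own illustration; the names historyQueue and cacheQueue are not from the project, and for brevity the "oldest K-th access" ordering is approximated by plain access order):

import java.util.LinkedHashMap;

public class LruKSketch<K, V> {

    private static final int K = 2;
    private final int capacity;

    // access history: key -> access count; evicted by its own policy (here LRU)
    private final LinkedHashMap<K, Integer> historyQueue = new LinkedHashMap<>(16, 0.75f, true);

    // the real cache; "evict the oldest K-th access" is approximated by access order
    private final LinkedHashMap<K, V> cacheQueue = new LinkedHashMap<>(16, 0.75f, true);

    public LruKSketch(int capacity) {
        this.capacity = capacity;
    }

    public void access(K key, V value) {
        if (cacheQueue.containsKey(key)) {
            cacheQueue.get(key); // refresh recency
            return;
        }
        int count = historyQueue.merge(key, 1, Integer::sum);
        if (count >= K) {
            // reached K accesses: drop the history index, move the data into the cache
            historyQueue.remove(key);
            if (cacheQueue.size() >= capacity) {
                K eldest = cacheQueue.keySet().iterator().next();
                cacheQueue.remove(eldest);
            }
            cacheQueue.put(key, value);
        }
    }
}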

2. Two queues (2Q)

The two queues algorithm (hereafter 2Q) is similar to LRU-2. The difference is that 2Q replaces the access-history queue of LRU-2 (which, note, holds no cached data) with a FIFO buffer queue; that is, 2Q maintains two buffer queues, one FIFO queue and one LRU queue.

When data is accessed for the first time, 2Q buffers it in the FIFO queue. When it is accessed a second time, 2Q moves it from the FIFO queue to the LRU queue. Each queue then evicts data by its own policy.

Newly accessed data is inserted into the FIFO queue. If it is never accessed again while in the FIFO queue, it is eventually evicted by the FIFO rule;

If it is accessed again while in the FIFO queue, it is moved to the head of the LRU queue; if it is accessed again while in the LRU queue, it is moved to the head of the LRU queue, and the LRU queue evicts from its tail.

3. Multi Queue (MQ)

The MQ algorithm divides data into multiple queues by access frequency, and different queues have different priorities. The core idea is to keep frequently accessed data cached longer.

In the structure, Q0, Q1 ... Qk represent queues of increasing priority, and Q-history is a queue that holds no cached data but records the index and reference count of entries evicted from the cache.

Newly inserted data goes into Q0, and each queue is managed with LRU:

(1) When a piece of data has been accessed often enough to earn a higher priority, it is removed from its current queue and added to the head of the next-higher queue;

(2) To prevent high-priority data from never being evicted, data that is not accessed within a specified time has its priority lowered: it is removed from its current queue and added to the head of the next-lower queue;

(3) When eviction is needed, it starts from the lowest queue and follows LRU; whenever a queue evicts data, the data is removed from the cache and its index is added to the head of Q-history.

If data recorded in Q-history is accessed again, its priority is recomputed and it is moved to the head of the corresponding target queue.

Q-history evicts its data indexes by LRU.

MQ has to maintain multiple queues as well as the access time of every entry, so it is more complex than LRU.
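
A structural sketch of MQ (again our own illustration; the queue count, the promotion rule, and all names are arbitrary simplifications):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

public class MultiQueueSketch<K, V> {

    private static final int QUEUE_COUNT = 4; // Q0..Q3

    // one access-ordered LRU queue per priority level
    private final List<LinkedHashMap<K, V>> queues = new ArrayList<>();

    // Q-history: keys already evicted from the cache, with their access counts
    private final LinkedHashMap<K, Integer> qHistory = new LinkedHashMap<>(16, 0.75f, true);

    public MultiQueueSketch() {
        for (int i = 0; i < QUEUE_COUNT; i++) {
            queues.add(new LinkedHashMap<>(16, 0.75f, true));
        }
    }

    /** Promotion: once a key is accessed often enough, move it one level up. */
    void promote(K key, V value, int level) {
        if (level + 1 < QUEUE_COUNT) {
            queues.get(level).remove(key);
            queues.get(level + 1).put(key, value);
        }
    }

    /** Eviction: scan upward from Q0; record the victim in Q-history. */
    K evict() {
        for (LinkedHashMap<K, V> queue : queues) {
            if (!queue.isEmpty()) {
                K eldest = queue.keySet().iterator().next(); // LRU within the level
                queue.remove(eldest);
                qHistory.put(eldest, 0); // remember the evicted key
                return eldest;
            }
        }
        return null;
    }
}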

LRU algorithm comparison

Contrast point   Comparison
Hit rate         LRU-2 > MQ(2) > 2Q > LRU
Complexity       LRU-2 > MQ(2) > 2Q > LRU
Cost             LRU-2 > MQ(2) > 2Q > LRU

Personal understanding

In fact, the ideas behind the algorithms above are all similar.

Core purpose: solve the invalidation of hot data and the cache pollution caused by batch operations.

Approach: add an extra queue to hold data that has been accessed only once, and promote data into the LRU according to its access count.

The once-accessed queue can be a FIFO queue or an LRU. Below we implement both variants: 2Q and LRU-2.

2Q

Implementation idea

It is essentially the combination of our earlier FIFO + LRU implementations.

Code

Basic attributes

public class CacheEvictLru2Q<K,V> extends AbstractCacheEvict<K,V> {

    private static final Log log = LogFactory.getLog(CacheEvictLru2Q.class);

    /**
     * Size limit of the first-access queue
     *
     * Caps the O(n) cost of queue operations so they do not take too long.
     * @since 0.0.13
     */
    private static final int LIMIT_QUEUE_SIZE = 1024;

    /**
     * Queue for keys accessed for the first time
     * @since 0.0.13
     */
    private Queue<K> firstQueue;

    /**
     * Head node
     * @since 0.0.13
     */
    private DoubleListNode<K,V> head;

    /**
     * Tail node
     * @since 0.0.13
     */
    private DoubleListNode<K,V> tail;

    /**
     * Map information
     *
     * key: the element
     * value: the node of the element in the doubly linked list
     * @since 0.0.13
     */
    private Map<K, DoubleListNode<K,V>> lruIndexMap;

    public CacheEvictLru2Q() {
        this.firstQueue = new LinkedList<>();
        this.lruIndexMap = new HashMap<>();
        this.head = new DoubleListNode<>();
        this.tail = new DoubleListNode<>();

        this.head.next(this.tail);
        this.tail.pre(this.head);
    }

}

Data eviction

The eviction logic:

Executed when the cache size has reached its maximum limit:

(1) Evict from firstQueue first

(2) If firstQueue is empty, evict from lruMap.

This rests on an assumption: data that has been accessed multiple times is more valuable than data that has been accessed only once.

@Override
protected ICacheEntry<K, V> doEvict(ICacheEvictContext<K, V> context) {
    ICacheEntry<K, V> result = null;
    final ICache<K,V> cache = context.cache();
    // limit exceeded: evict one element
    if(cache.size() >= context.size()) {
        K evictKey = null;
        //1. firstQueue is not empty: remove from that queue first
        if(!firstQueue.isEmpty()) {
            evictKey = firstQueue.remove();
        } else {
            // take the element right before the tail node
            DoubleListNode<K,V> tailPre = this.tail.pre();
            if(tailPre == this.head) {
                log.error("The list is empty, nothing can be removed");
                throw new CacheRuntimeException("Cannot remove the head node!");
            }
            evictKey = tailPre.key();
        }
        // perform the removal
        V evictValue = cache.remove(evictKey);
        result = new CacheEntry<>(evictKey, evictValue);
    }
    return result;
}

Data removal

Called when data is removed:

The logic is similar to before, except that the key must additionally be removed from the FIFO queue.

/**
 * Remove an element
 *
 * 1. Look up the element in the map
 * 2. If absent, return directly; if present:
 * 2.1 Remove the element from the doubly linked list
 * 2.2 Remove the element from the map
 *
 * @param key the element
 * @since 0.0.13
 */
@Override
public void removeKey(final K key) {
    DoubleListNode<K,V> node = lruIndexMap.get(key);
    //1. LRU removal logic
    if(ObjectUtil.isNotNull(node)) {
        // A<->B<->C
        // removing B must yield: A<->C
        DoubleListNode<K,V> pre = node.pre();
        DoubleListNode<K,V> next = node.next();
        pre.next(next);
        next.pre(pre);
        // remove the corresponding entry from the map
        this.lruIndexMap.remove(node.key());
    } else {
        //2. FIFO removal logic (O(n) time complexity)
        firstQueue.remove(key);
    }
}

Data update

When data is accessed, its priority is raised.

(1) If it is in lruMap, remove it first and then insert it at the head

(2) If it is not in lruMap but is in the FIFO queue, remove it from the FIFO queue and add it to the LRU map.

(3) If it is in neither, simply add it to the FIFO queue.

/**
 * Put an element
 * 1. If it already exists in lruIndexMap, handle the LRU queue: remove it first, then insert it.
 * 2. If it already exists in firstQueue, handle the first queue: remove it from firstQueue, then insert it into the LRU.
 * 1 and 2 are different scenarios, but the code is actually the same; the removal logic covers both cases.
 *
 * 3. If it is in neither 1 nor 2, it is a new element: just append it to firstQueue.
 *
 * @param key the element
 * @since 0.0.13
 */
@Override
public void updateKey(final K key) {
    //1.1 is it in the LRU map?
    //1.2 is it in firstQueue?
    DoubleListNode<K,V> node = lruIndexMap.get(key);
    if(ObjectUtil.isNotNull(node)
        || firstQueue.contains(key)) {
        //1.3 remove the old entry
        this.removeKey(key);
        //1.4 add it to the LRU
        this.addToLruMapHead(key);
        return;
    }
    //2. append directly to the tail of firstQueue
    //if(firstQueue.size() >= LIMIT_QUEUE_SIZE) {
    //    // keep the first-access queue from growing forever: remove the head element
    //    firstQueue.remove();
    //}
    firstQueue.add(key);
}

Here I considered an optimization: cap the growth of firstQueue, since traversing it costs O(n), by limiting its maximum size to 1024.

If the limit is exceeded, remove the head element of the FIFO queue first.

However, removing the key from the FIFO only, without removing it from the cache, leaves the activity of the two structures inconsistent;

while removing it from both may evict data before the cache is actually full, which could surprise the user. So this is kept as a future optimization point and is commented out for now.
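
For reference, one possible shape of that optimization (our own sketch, not enabled in the project; it assumes the evict strategy can reach the cache so that both structures stay consistent):

// hypothetical variant: bound firstQueue and evict the overflowing key from the cache too
if(firstQueue.size() >= LIMIT_QUEUE_SIZE) {
    K overflowKey = firstQueue.remove(); // oldest once-accessed key
    cache.remove(overflowKey);           // may shrink the cache below its configured size
}
firstQueue.add(key);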

Test

Code

ICache<String, String> cache = CacheBs.<String,String>newInstance()
        .size(3)
        .evict(CacheEvicts.<String, String>lru2Q())
        .build();

cache.put("A", "hello");
cache.put("B", "world");
cache.put("C", "FIFO");

// access A once
cache.get("A");
cache.put("D", "LRU");

Assert.assertEquals(3, cache.size());
System.out.println(cache.keySet());

Result

[DEBUG] [2020-10-03 13:15:50.670] [main] [c.g.h.c.c.s.l.r.CacheRemoveListener.listen] - Remove key: B, value: world, type: evict
[D, A, C]
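
Reading the output: the get promoted A out of the FIFO queue into the LRU queue, so when D arrived, eviction picked B, the oldest key still in the FIFO queue. The once-accessed data goes first, which is exactly what we wanted.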

LRU-2 implementation

Description

The FIFO queue's shortcoming is quite obvious: removing an arbitrary key from it requires an O(n) traversal.

And the hit rate of 2Q is still slightly worse than that of LRU-2.

Preparation

The LRU map has come up several times by now, so for convenience we encapsulate it as a standalone data structure.

We implement a simple version with a doubly linked list + HashMap.

Node

The node class is the same as before:

public class DoubleListNode<K,V> {

    /**
     * Key
     * @since 0.0.12
     */
    private K key;

    /**
     * Value
     * @since 0.0.12
     */
    private V value;

    /**
     * Previous node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> pre;

    /**
     * Next node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> next;

    //fluent getter & setter
}

Interface

For now we define just the methods we need.

/**
 * LRU map interface
 * @author binbin.hou
 * @since 0.0.13
 */
public interface ILruMap<K,V> {

    /**
     * Remove the eldest element
     * @return details of the removed entry
     * @since 0.0.13
     */
    ICacheEntry<K, V> removeEldest();

    /**
     * Update the information for a key
     * @param key key
     * @since 0.0.13
     */
    void updateKey(final K key);

    /**
     * Remove the information for a key
     * @param key key
     * @since 0.0.13
     */
    void removeKey(final K key);

    /**
     * Whether the map is empty
     * @return whether it is empty
     * @since 0.0.13
     */
    boolean isEmpty();

    /**
     * Whether the map contains the element
     * @param key the element
     * @return the result
     * @since 0.0.13
     */
    boolean contains(final K key);
}

Implementation

The implementation is based on a doubly linked list + HashMap.

It is essentially the implementation from the previous section, tidied up.

import com.github.houbb.cache.api.ICacheEntry;
import com.github.houbb.cache.core.exception.CacheRuntimeException;
import com.github.houbb.cache.core.model.CacheEntry;
import com.github.houbb.cache.core.model.DoubleListNode;
import com.github.houbb.cache.core.support.struct.lru.ILruMap;
import com.github.houbb.heaven.util.lang.ObjectUtil;
import com.github.houbb.log.integration.core.Log;
import com.github.houbb.log.integration.core.LogFactory;

import java.util.HashMap;
import java.util.Map;

/**
 * Implementation based on a doubly linked list
 * @author binbin.hou
 * @since 0.0.13
 */
public class LruMapDoubleList<K,V> implements ILruMap<K,V> {

    private static final Log log = LogFactory.getLog(LruMapDoubleList.class);

    /**
     * Head node
     * @since 0.0.13
     */
    private DoubleListNode<K,V> head;

    /**
     * Tail node
     * @since 0.0.13
     */
    private DoubleListNode<K,V> tail;

    /**
     * Map information
     *
     * key: the element
     * value: the node of the element in the doubly linked list
     * @since 0.0.13
     */
    private Map<K, DoubleListNode<K,V>> indexMap;

    public LruMapDoubleList() {
        this.indexMap = new HashMap<>();
        this.head = new DoubleListNode<>();
        this.tail = new DoubleListNode<>();

        this.head.next(this.tail);
        this.tail.pre(this.head);
    }

    @Override
    public ICacheEntry<K, V> removeEldest() {
        // take the element right before the tail node
        DoubleListNode<K,V> tailPre = this.tail.pre();
        if(tailPre == this.head) {
            log.error("The list is empty, nothing can be removed");
            throw new CacheRuntimeException("Cannot remove the head node!");
        }

        K evictKey = tailPre.key();
        V evictValue = tailPre.value();

        return CacheEntry.of(evictKey, evictValue);
    }

    /**
     * Put an element
     *
     * (1) Remove the key if it already exists
     * (2) Insert the new element at the head
     *
     * @param key the element
     * @since 0.0.12
     */
    @Override
    public void updateKey(final K key) {
        //1. perform removal
        this.removeKey(key);

        //2. insert the new element at the head
        //head<->next
        //becomes: head<->new<->next
        DoubleListNode<K,V> newNode = new DoubleListNode<>();
        newNode.key(key);

        DoubleListNode<K,V> next = this.head.next();
        this.head.next(newNode);
        newNode.pre(this.head);
        next.pre(newNode);
        newNode.next(next);

        //2.2 put it into the map
        indexMap.put(key, newNode);
    }

    /**
     * Remove an element
     *
     * 1. Look up the element in the map
     * 2. If absent, return directly; if present:
     * 2.1 Remove the element from the doubly linked list
     * 2.2 Remove the element from the map
     *
     * @param key the element
     * @since 0.0.13
     */
    @Override
    public void removeKey(final K key) {
        DoubleListNode<K,V> node = indexMap.get(key);

        if(ObjectUtil.isNull(node)) {
            return;
        }

        // remove the list node
        // A<->B<->C
        // removing B must yield: A<->C
        DoubleListNode<K,V> pre = node.pre();
        DoubleListNode<K,V> next = node.next();

        pre.next(next);
        next.pre(pre);

        // remove the corresponding entry from the map
        this.indexMap.remove(key);
    }

    @Override
    public boolean isEmpty() {
        return indexMap.isEmpty();
    }

    @Override
    public boolean contains(K key) {
        return indexMap.containsKey(key);
    }
}
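
A quick usage sketch of this structure (illustrative). Note that removeEldest, as written above, only reads the tail entry; the caller is expected to pair it with a removal, which in the eviction path happens through cache.remove:

ILruMap<String, String> lruMap = new LruMapDoubleList<>();
lruMap.updateKey("A"); // most recent: A
lruMap.updateKey("B"); // most recent: B, A
lruMap.updateKey("A"); // touching A moves it back to the head: A, B

ICacheEntry<String, String> eldest = lruMap.removeEldest(); // reads B
lruMap.removeKey(eldest.key());                             // actually unlinks B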

Implementation idea

The LRU implementation itself stays unchanged; we can simply replace the FIFO with an LRU map.

For clarity, the FIFO's counterpart is named firstLruMap; it stores elements the user has accessed only once.

Elements that have been accessed two or more times are stored in the original LRU, moreLruMap.

All other logic is the same as in 2Q.

Implementation

Basic attributes

Define two LRU maps to store the access information separately:

public class CacheEvictLru2<K,V> extends AbstractCacheEvict<K,V> {

    private static final Log log = LogFactory.getLog(CacheEvictLru2.class);

    /**
     * LRU for keys accessed for the first time
     * @since 0.0.13
     */
    private final ILruMap<K,V> firstLruMap;

    /**
     * LRU for keys accessed two or more times
     * @since 0.0.13
     */
    private final ILruMap<K,V> moreLruMap;

    public CacheEvictLru2() {
        this.firstLruMap = new LruMapDoubleList<>();
        this.moreLruMap = new LruMapDoubleList<>();
    }

}

Eviction implementation

Similar to the 2Q mode, we evict from firstLruMap first.

@Override
protected ICacheEntry<K, V> doEvict(ICacheEvictContext<K, V> context) {
    ICacheEntry<K, V> result = null;
    final ICache<K,V> cache = context.cache();
    // limit exceeded: evict one element
    if(cache.size() >= context.size()) {
        ICacheEntry<K,V>  evictEntry = null;
        //1. firstLruMap is not empty: evict from it first
        if(!firstLruMap.isEmpty()) {
            evictEntry = firstLruMap.removeEldest();
            log.debug("Evicting from firstLruMap: {}", evictEntry);
        } else {
            //2. otherwise evict from moreLruMap
            evictEntry = moreLruMap.removeEldest();
            log.debug("Evicting from moreLruMap: {}", evictEntry);
        }
        // perform the cache removal
        final K evictKey = evictEntry.key();
        V evictValue = cache.remove(evictKey);
        result = new CacheEntry<>(evictKey, evictValue);
    }
    return result;
}

Remove

/**
 * Remove an element
 *
 * 1. If present in the multi-access LRU, remove it there
 * 2. Otherwise remove it from the first-access LRU
 *
 * @param key the element
 * @since 0.0.13
 */
@Override
public void removeKey(final K key) {
    //1. multi-access LRU removal logic
    if(moreLruMap.contains(key)) {
        moreLruMap.removeKey(key);
        log.debug("key: {} removed from moreLruMap", key);
    } else {
        firstLruMap.removeKey(key);
        log.debug("key: {} removed from firstLruMap", key);
    }
}

Update

/**
 * Update information
 * 1. If it already exists in moreLruMap, handle the "more" queue: remove it first, then insert it.
 * 2. If it already exists in firstLruMap, handle the "first" queue: remove it from firstLruMap, then insert it into the LRU.
 * 1 and 2 are different scenarios, but the code is actually the same; the removal logic covers both cases.
 *
 * 3. If it is in neither 1 nor 2, it is a new element: just insert it at the head of firstLruMap.
 *
 * @param key the element
 * @since 0.0.13
 */
@Override
public void updateKey(final K key) {
    //1. the element is already in the multi-access LRU or in the first-access LRU
    if(moreLruMap.contains(key)
        || firstLruMap.contains(key)) {
        //1.1 remove the old entry
        this.removeKey(key);
        //1.2 add it to the multi-access LRU
        moreLruMap.updateKey(key);
        log.debug("key: {} accessed more than once, added to moreLruMap", key);
    } else {
        // 2. add it to the first-access LRU
        firstLruMap.updateKey(key);
        log.debug("key: {} accessed for the first time, added to firstLruMap", key);
    }
}

In fact, the LRU-2 code has become clearer, mainly because we extracted the LRU map as a standalone data structure.

Test

Code

ICache<String, String> cache = CacheBs.<String,String>newInstance()
        .size(3)
        .evict(CacheEvicts.<String, String>lru2())
        .build();
cache.put("A", "hello");
cache.put("B", "world");
cache.put("C", "FIFO");
// access A once
cache.get("A");
cache.put("D", "LRU");
Assert.assertEquals(3, cache.size());
System.out.println(cache.keySet());

Log

To make analysis easier, a bit of logging was added while implementing this.

[DEBUG] [2020-10-03 14:39:04.966] [main] [c.g.h.c.c.s.e.CacheEvictLru2.updateKey] - key: A accessed for the first time, added to firstLruMap
[DEBUG] [2020-10-03 14:39:04.967] [main] [c.g.h.c.c.s.e.CacheEvictLru2.updateKey] - key: B accessed for the first time, added to firstLruMap
[DEBUG] [2020-10-03 14:39:04.968] [main] [c.g.h.c.c.s.e.CacheEvictLru2.updateKey] - key: C accessed for the first time, added to firstLruMap
[DEBUG] [2020-10-03 14:39:04.970] [main] [c.g.h.c.c.s.e.CacheEvictLru2.removeKey] - key: A removed from firstLruMap
[DEBUG] [2020-10-03 14:39:04.970] [main] [c.g.h.c.c.s.e.CacheEvictLru2.updateKey] - key: A accessed more than once, added to moreLruMap
[DEBUG] [2020-10-03 14:39:04.972] [main] [c.g.h.c.c.s.e.CacheEvictLru2.doEvict] - Evicting from firstLruMap: EvictEntry{key=B, value=null}
[DEBUG] [2020-10-03 14:39:04.974] [main] [c.g.h.c.c.s.l.r.CacheRemoveListener.listen] - Remove key: B, value: world, type: evict
[DEBUG] [2020-10-03 14:39:04.974] [main] [c.g.h.c.c.s.e.CacheEvictLru2.updateKey] - key: D accessed for the first time, added to firstLruMap
[D, A, C]
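
The trace shows the same protection as 2Q: the get moved A into moreLruMap, so when D forced an eviction, B, the eldest entry in firstLruMap, was the one removed.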

Summary

We made two main improvements to the LRU algorithm:

(1) Performance: optimized from O(n) to O(1)

(2) Resilience to batch operations, avoiding cache pollution

Besides LRU, there are other eviction strategies as well.

Consider the following question:

Data A has been accessed 10 times, and data B has been accessed 2 times. Which of the two is the hot data?

If you answered A, you have arrived at yet another eviction algorithm: LFU, which holds that the more often data is accessed, the hotter it is.

In the next section we will learn how to implement the LFU eviction algorithm together.

Open source address: https://github.com/houbb/cache

If you found this article helpful, please like, comment, bookmark, and follow. Your encouragement is my biggest motivation~

So far, through these two optimizations, we have solved both the performance problem and the cache pollution caused by batch operations.

What did you take away from this? If you have more ideas, feel free to discuss them with me in the comments. I look forward to hearing your thoughts.
