Handwriting redis from scratch (8): performance optimization of a simple LRU eviction algorithm

Preface

Java implements redis by hand from scratch (1) How to achieve a fixed-size cache?

Java implements redis by hand from scratch (2) redis expire expiration principle

Java implements redis by hand from scratch (3) How to restart without losing memory data?

Java implements redis by hand from scratch (4) add a listener

Java implements redis by hand from scratch (5) another way to implement the expiration strategy

Java implements redis by hand from scratch (6) AOF persistence principle in detail and implementation

We have already implemented several features of redis. In Java implements redis by hand from scratch (1) How to achieve a fixed-size cache?, we implemented a first-in, first-out (FIFO) eviction strategy.

In practice, however, the LRU/LFU eviction strategies are generally recommended instead.

LRU basics

What is it

The full name of the LRU algorithm is the Least Recently Used algorithm; it is widely used in caching mechanisms.

When the space used by the cache reaches its upper limit, part of the existing data needs to be evicted to keep the cache usable, and the data to evict is selected by the LRU algorithm.

The basic idea of the LRU algorithm is temporal locality, based on the principle of locality:

If an information item is being accessed, it is likely to be accessed again in the near future.

Further reading

Apache Commons LRUMap source code explained in detail

Using Redis as an LRU map

Java handwritten redis from scratch (7) detailed explanation and implementation of the redis LRU eviction strategy

Simple implementation ideas

Based on array

Solution: Attach an additional attribute to each piece of data, a timestamp, and update it to the current time every time the data is accessed.

When the data space is full, scan the entire array and evict the entry with the smallest timestamp.

Shortcomings: maintaining the timestamps requires extra space, and eviction requires scanning the entire array.

The time complexity is bad, and the space complexity is not good either.
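As a concrete illustration, the array scheme above might be sketched like this (a minimal sketch; the class and its names, e.g. `ArrayLru` and `tick`, are hypothetical and not part of the project):

```java
import java.util.Objects;

public class ArrayLru<K, V> {
    private final Object[] keys;
    private final Object[] values;
    private final long[] stamps;   // the extra space: one timestamp per slot
    private int size;
    private long tick;             // logical clock, bumped on every access

    public ArrayLru(int capacity) {
        this.keys = new Object[capacity];
        this.values = new Object[capacity];
        this.stamps = new long[capacity];
    }

    @SuppressWarnings("unchecked")
    public V get(K key) {
        for (int i = 0; i < size; i++) {          // O(n) lookup
            if (Objects.equals(keys[i], key)) {
                stamps[i] = ++tick;               // refresh timestamp on access
                return (V) values[i];
            }
        }
        return null;
    }

    public void put(K key, V value) {
        for (int i = 0; i < size; i++) {          // O(n): update in place if present
            if (Objects.equals(keys[i], key)) {
                values[i] = value;
                stamps[i] = ++tick;
                return;
            }
        }
        int slot = size;
        if (size == keys.length) {                // full: O(n) scan for smallest stamp
            slot = 0;
            for (int i = 1; i < size; i++) {
                if (stamps[i] < stamps[slot]) slot = i;
            }
        } else {
            size++;
        }
        keys[slot] = key;                         // overwrite the evicted slot
        values[slot] = value;
        stamps[slot] = ++tick;
    }
}
```

Every operation is a linear scan, which is exactly the weakness described above.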

Based on a doubly linked list of limited length

Solution: When a piece of data is accessed, insert it at the head of the linked list if it is not already there; if it is, move it to the head. When the data space is full, evict the data at the tail of the linked list.

Shortcomings: inserting or looking up data requires scanning the entire linked list.

This is how we implemented it in the previous section, and the drawback is still obvious: every check for whether an element exists costs O(n) query time.
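The list-only scheme can be sketched with the JDK's LinkedList (a minimal illustration; `ListLru` and `access` are made-up names, not from the project):

```java
import java.util.LinkedList;

public class ListLru<K> {
    private final int capacity;
    private final LinkedList<K> list = new LinkedList<>();

    public ListLru(int capacity) {
        this.capacity = capacity;
    }

    /** Records an access to key; returns the evicted tail key when full, else null. */
    public K access(K key) {
        list.remove(key);             // O(n) scan to find the existing node
        list.addFirst(key);           // most recently used goes to the head
        if (list.size() > capacity) {
            return list.removeLast(); // least recently used falls off the tail
        }
        return null;
    }
}
```

The `list.remove(key)` call is the O(n) scan that the hash-table variant below eliminates.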

Based on doubly linked list and hash table

Solution: To fix the list-scanning defect above, pair the list with a hash table that maps each key to its node in the list, reducing the time complexity of both insert and read operations from O(n) to O(1).

Shortcomings: this is the optimization we proposed in the previous section, but it still has a drawback: the space usage roughly doubles.

Choice of data structure

(1) Array-based implementation

An array or ArrayList is not recommended here: reads are O(1), but updates are relatively slow, even though the JDK uses System.arraycopy under the hood.

(2) Implementation based on linked list

If we choose a linked list, we cannot simply store the key and its index in a HashMap, because traversing the list is still O(n); a doubly linked list can theoretically halve that, but it is still not the O(1) we want.

(3) Based on a doubly linked list plus a hash map

We keep the doubly linked list unchanged.

In the Map, we store for each key the corresponding node of the doubly linked list.

The implementation then essentially becomes maintaining a doubly linked list.

Code

  • Node definition
/**
 * Doubly linked list node
 * @author binbin.hou
 * @since 0.0.12
 * @param <K> key
 * @param <V> value
 */
public class DoubleListNode<K,V> {

    /**
     * Key
     * @since 0.0.12
     */
    private K key;

    /**
     * Value
     * @since 0.0.12
     */
    private V value;

    /**
     * Previous node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> pre;

    /**
     * Next node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> next;

    //fluent get & set
}
  • Core code implementation

We keep the original interface unchanged, and the implementation is as follows:

public class CacheEvictLruDoubleListMap<K,V> extends AbstractCacheEvict<K,V> {

    private static final Log log = LogFactory.getLog(CacheEvictLruDoubleListMap.class);

    /**
     * Head node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> head;

    /**
     * Tail node
     * @since 0.0.12
     */
    private DoubleListNode<K,V> tail;

    /**
     * Map info
     *
     * key: the element
     * value: the node corresponding to the element in the list
     * @since 0.0.12
     */
    private Map<K, DoubleListNode<K,V>> indexMap;

    public CacheEvictLruDoubleListMap() {
        this.indexMap = new HashMap<>();
        this.head = new DoubleListNode<>();
        this.tail = new DoubleListNode<>();

        this.head.next(this.tail);
        this.tail.pre(this.head);
    }

    @Override
    protected ICacheEntry<K, V> doEvict(ICacheEvictContext<K, V> context) {
        ICacheEntry<K, V> result = null;
        final ICache<K,V> cache = context.cache();
        // Over the limit: remove the element at the tail
        if(cache.size() >= context.size()) {
            // Get the element just before the tail sentinel
            DoubleListNode<K,V> tailPre = this.tail.pre();
            if(tailPre == this.head) {
                log.error("The list is empty; nothing can be removed");
                throw new CacheRuntimeException("Cannot remove the head node!");
            }

            K evictKey = tailPre.key();
            V evictValue = cache.remove(evictKey);
            result = new CacheEntry<>(evictKey, evictValue);
        }

        return result;
    }

    /**
     * Put an element
     *
     * (1) Remove it if it already exists
     * (2) Insert the new element at the head
     *
     * @param key element
     * @since 0.0.12
     */
    @Override
    public void update(final K key) {
        //1. Remove the existing entry
        this.remove(key);

        //2. Insert the new element at the head
        //head<->next
        //becomes: head<->new<->next
        DoubleListNode<K,V> newNode = new DoubleListNode<>();
        newNode.key(key);

        DoubleListNode<K,V> next = this.head.next();
        this.head.next(newNode);
        newNode.pre(this.head);
        next.pre(newNode);
        newNode.next(next);

        //2.2 Put it into the map
        indexMap.put(key, newNode);
    }

    /**
     * Remove an element
     *
     * 1. Look up the element in the map
     * 2. Return directly if absent; if present:
     * 2.1 Remove the node from the doubly linked list
     * 2.2 Remove the entry from the map
     *
     * @param key element
     * @since 0.0.12
     */
    @Override
    public void remove(final K key) {
        DoubleListNode<K,V> node = indexMap.get(key);

        if(ObjectUtil.isNull(node)) {
            return;
        }

        // Remove the list node
        // A<->B<->C
        // Removing B should yield: A<->C
        DoubleListNode<K,V> pre = node.pre();
        DoubleListNode<K,V> next = node.next();

        pre.next(next);
        next.pre(pre);

        // Remove the corresponding map entry
        this.indexMap.remove(key);
    }

}

The implementation is not difficult; it is a plain doubly linked list.

We just use the map to bring node lookup down to O(1).

Test

Let's verify our implementation:

ICache<String, String> cache = CacheBs.<String,String>newInstance()
        .size(3)
        .evict(CacheEvicts.<String, String>lruDoubleListMap())
        .build();
cache.put("A", "hello");
cache.put("B", "world");
cache.put("C", "FIFO");

// access A once
cache.get("A");
cache.put("D", "LRU");

Assert.assertEquals(3, cache.size());
System.out.println(cache.keySet());
  • Log
[DEBUG] [2020-10-03 09:37:41.007] [main] [c.g.h.c.c.s.l.r.CacheRemoveListener.listen] - Remove key: B, value: world, type: evict
[D, A, C]

Because we accessed A once, B became the least recently used element and was evicted.

Based on LinkedHashMap

In fact, LinkedHashMap itself is a combined data structure for list and hashMap, we can directly use LinkedHashMap in jdk to achieve.

Direct realization

public class LRUCache<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public LRUCache(int capacity) {
        // Note: set LinkedHashMap's accessOrder to true
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evict the eldest entry once size exceeds capacity
        return super.size() > capacity;
    }
}

By default, LinkedHashMap does not evict data, so we override its removeEldestEntry() method: when the number of entries exceeds the preset upper limit, the eldest entry is evicted. Setting accessOrder to true makes the map keep its entries in access order.

The amount of code for the entire implementation is not large, mainly using the characteristics of LinkedHashMap.
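To make the behavior concrete, here is a self-contained, runnable sketch of the LinkedHashMap approach (the class name LruLinkedHashMap is illustrative, not from the project):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruLinkedHashMap<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public LruLinkedHashMap(int capacity) {
        // accessOrder = true: iteration order follows access order, LRU first
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true evicts the eldest entry
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruLinkedHashMap<String, String> cache = new LruLinkedHashMap<>(3);
        cache.put("A", "hello");
        cache.put("B", "world");
        cache.put("C", "FIFO");
        cache.get("A");           // touch A: it becomes most recently used
        cache.put("D", "LRU");    // size would exceed 3, so B is evicted
        System.out.println(cache.keySet()); // [C, A, D]
    }
}
```

This reproduces the same eviction trace as the test below: after touching A, the least recently used entry is B, and inserting D pushes it out.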

Simple transformation

We made a simple modification to adapt this approach to the interface we defined.


Test

  • Code
ICache<String, String> cache = CacheBs.<String,String>newInstance()
        .size(3)
        .evict(CacheEvicts.<String, String>lruLinkedHashMap())
        .build();
cache.put("A", "hello");
cache.put("B", "world");
cache.put("C", "FIFO");
// access A once
cache.get("A");
cache.put("D", "LRU");

Assert.assertEquals(3, cache.size());
System.out.println(cache.keySet());
  • Log
[DEBUG] [2020-10-03 10:20:57.842] [main] [c.g.h.c.c.s.l.r.CacheRemoveListener.listen] - Remove key: B, value: world, type: evict
[D, A, C]

Summary

The O(n) list-traversal problem mentioned in the previous section has been basically solved in this section.

But this algorithm still has a problem: during occasional batch operations, hot data can be squeezed out of the cache by cold data. In the next section, we will learn how to further improve the LRU algorithm.

This article mainly explains the ideas; due to space limitations, not all of the implementation is included here.

Open source address: https://github.com/houbb/cache

If you found this article helpful, please like, comment, bookmark, and follow~

Your encouragement is my biggest motivation~

Origin blog.51cto.com/9250070/2539978