History of cache evolution you should know

1. Background

This article was written after I heard a talk on iQIYI's Java caching journey at a technical salon last week. Let me briefly walk through how iQIYI's Java caching evolved.

The evolution can be divided into several stages:

  • Stage 1: data synchronization plus Redis

Data is synchronized to Redis through a message queue, and the Java application reads the cache directly from Redis. The advantage of this stage is that, because a distributed cache is used, data updates propagate quickly. The disadvantage is just as obvious: everything depends on the stability of Redis. Once Redis goes down, the entire cache is unavailable, causing a cache avalanche with every request hitting the DB.

  • Stages 2 and 3: Java Map, then Guava cache

These stages use an in-process cache as the first level and Redis as the second level. Advantage: the application is no longer at the mercy of external systems; even if Redis goes down, the local cache still works. Disadvantages: an in-process cache cannot be updated in real time like a distributed cache, and because JVM memory is limited the cache must be bounded in size, so entries get evicted and the hit rate becomes a concern.

  • Stage 4: Guava Cache Refresh

To mitigate the problems above, Guava Cache's refresh-after-write can be used so entries are refreshed a fixed time after being written. This solves the problem of entries never updating, but still does not achieve real-time refresh.

  • Stage 5: Asynchronous refresh of external cache

This stage extends Guava Cache and uses Redis as a message-queue-style notification mechanism to tell the other Java application instances to refresh their local caches.

That is a brief overview of the five stages of iQIYI's cache development. Of course there were other optimizations as well, such as GC tuning, cache penetration protection, and cache coverage. Interested readers can follow the official account and contact me to discuss.

Primitive Society - Querying the database directly

The above is iQIYI's evolution path, but in most projects the first step is usually not Redis; it is querying the database directly.

When traffic is low, querying the database or reading a file directly is the most convenient approach, and it fully meets the business requirements.

Ancient Society - HashMap

When the application gets a certain amount of traffic, or the same database query runs very frequently, we can use the HashMap or ConcurrentHashMap that ships with Java and write something like this:

public class CustomerService {
    // simple in-process cache; note that HashMap itself is not thread-safe
    private HashMap<String, String> hashMap = new HashMap<>();
    private CustomerMapper customerMapper;

    public String getCustomer(String name) {
        String customer = hashMap.get(name);
        if (customer == null) {
            // cache miss: load from the database and remember the result
            customer = customerMapper.get(name);
            hashMap.put(name, customer);
        }
        return customer;
    }
}

However, there is a problem with this approach: HashMap has no eviction mechanism, so its memory grows without bound, and for that reason the plain HashMap cache was quickly abandoned. That does not mean it is completely useless. Just as not everything from ancient society is outdated, a HashMap still works well as a cache in scenarios where no eviction is needed. A typical example is reflection: looking up a Method or Field through reflection every time is slow, so caching the lookup result in a HashMap improves performance considerably.
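For instance, here is a minimal sketch of such a reflection cache backed by a ConcurrentHashMap (the class and method names are just illustrative):

import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MethodCache {
    // the set of (class, method name) pairs is bounded, so no eviction is needed
    private static final Map<String, Method> METHOD_CACHE = new ConcurrentHashMap<>();

    public static Method getMethod(Class<?> clazz, String name) throws NoSuchMethodException {
        String key = clazz.getName() + "#" + name;
        Method cached = METHOD_CACHE.get(key);
        if (cached == null) {
            // reflective lookup is relatively slow, so do it once and cache the result
            // (this sketch only handles no-arg methods)
            cached = clazz.getMethod(name);
            METHOD_CACHE.put(key, cached);
        }
        return cached;
    }
}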

Modern Society - LRUHashMap

The problem that stumped us in the "ancient" stage is that data could not be evicted, so memory grows without bound, which is clearly unacceptable. Some will say: then just evict some data. Fine, but how? Randomly? Of course not. Imagine you have just loaded A into the cache and it gets evicted right before the next access, which then hits the database again. What would be the point of caching at all?

So clever people invented several eviction algorithms. The three most common are FIFO, LRU and LFU (there are others such as ARC and MRU; interested readers can look them up):

  • FIFO: first in, first out. Whatever entered the cache first is evicted first. This is the simplest policy, but it leads to a low hit rate: if a frequently accessed item happens to have been loaded before everything else, later and less popular items will squeeze it out even though it is still hot.
  • LRU: least recently used. This avoids the problem above: every time an item is accessed it is moved to the tail of the queue, and when eviction is needed the head of the queue is removed. But it still has a weakness: if an item is accessed 10,000 times in the first 59 minutes of an hour (obviously a hot item) and then not at all in the last minute, while other items keep being accessed, the hot item ends up being evicted.
  • LFU: least frequently used. This improves on the above by using extra space to record how often each item is used and evicting the one with the lowest frequency, which avoids LRU's blind spot for access patterns over time.

Those are the three eviction strategies. Going from FIFO to LRU to LFU, each one costs more to implement and also delivers a better hit rate. Generally we pick something in the middle: implementation cost that is not too high, with a reasonably good hit rate. How do we implement an LRUMap? We can build a simple one by extending LinkedHashMap and overriding its removeEldestEntry method.

class LRUMap extends LinkedHashMap<Object, Object> {

    private final int max;
    private final Object lock;

    public LRUMap(int max, Object lock) {
        // capacity of max * 1.4 with load factor 0.75 means the map never needs to resize;
        // accessOrder = true keeps entries ordered by access, which is what LRU needs
        super((int) (max * 1.4f), 0.75f, true);
        this.max = max;
        this.lock = lock;
    }

    /**
     * Override LinkedHashMap's removeEldestEntry method.
     * It is checked on every put; when it returns true, the eldest entry is removed.
     */
    @Override
    protected boolean removeEldestEntry(Map.Entry<Object, Object> eldest) {
        return size() > max;
    }

    public Object getValue(Object key) {
        synchronized (lock) {
            return get(key);
        }
    }

    public void putValue(Object key, Object value) {
        synchronized (lock) {
            put(key, value);
        }
    }

    public boolean removeValue(Object key) {
        synchronized (lock) {
            return remove(key) != null;
        }
    }

    public boolean removeAll() {
        synchronized (lock) {
            clear();
            return true;
        }
    }
}

LinkedHashMap maintains a linked list of entries (the objects that hold each key and value). On every get or put, the newly inserted entry, or the entry just looked up, is moved to the tail of that list. Notice that in the constructor the capacity is deliberately set to max * 1.4, while removeEldestEntry only starts evicting when size > max, so the map never actually reaches its resize threshold. By overriding a few simple methods of LinkedHashMap, we have implemented our own LRUMap.
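A quick usage sketch of the LRUMap above, showing the access-order eviction (max = 3, with a dedicated lock object):

public class LRUMapDemo {
    public static void main(String[] args) {
        LRUMap lruMap = new LRUMap(3, new Object());
        lruMap.putValue("a", 1);
        lruMap.putValue("b", 2);
        lruMap.putValue("c", 3);
        lruMap.getValue("a");                      // touch "a": access order is now b, c, a
        lruMap.putValue("d", 4);                   // size exceeds max = 3, so the eldest ("b") is evicted
        System.out.println(lruMap.getValue("b"));  // null
        System.out.println(lruMap.getValue("a"));  // 1
    }
}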

Modern Society - Guava cache

By this point LRUMap exists and can evict cached data, but it still has several problems:

  • Lock contention is severe. In the code above, the lock is a single global lock at the method level, so under a heavy call volume performance inevitably suffers.
  • Expiration time is not supported
  • Auto refresh is not supported

So the folks at Google could not put up with these problems and built Guava cache, which you can use as easily as this:

public static void main(String[] args) throws ExecutionException {
    LoadingCache<String, String> cache = CacheBuilder.newBuilder()
            .maximumSize(100)
            // expires 30ms after write
            .expireAfterWrite(30L, TimeUnit.MILLISECONDS)
            // expires 30ms after the last access
            .expireAfterAccess(30L, TimeUnit.MILLISECONDS)
            // refreshed 20ms after write
            .refreshAfterWrite(20L, TimeUnit.MILLISECONDS)
            // weak keys: the entry can be reclaimed when the GC runs
            .weakKeys()
            .build(createCacheLoader());
    System.out.println(cache.get("hello"));
    cache.put("hello1", "I am hello1");
    System.out.println(cache.get("hello1"));
    cache.put("hello1", "I am hello2");
    System.out.println(cache.get("hello1"));
}

public static com.google.common.cache.CacheLoader<String, String> createCacheLoader() {
    return new com.google.common.cache.CacheLoader<String, String>() {
        @Override
        public String load(String key) throws Exception {
            return key;
        }
    };
}

Below I will explain, from Guava cache's underlying principles, how it solves the problems of LRUMap.

Lock contention

Guava cache borrows an idea from ConcurrentHashMap: segment locks, with each segment responsible for its own eviction. Guava decides the number of segments with a specific algorithm. Note that with too few segments, contention remains severe; with too many segments, eviction becomes nearly random. In the extreme, every entry gets its own segment, each segment handles eviction on its own, and the result is effectively random eviction. Guava cache uses the following code to calculate the number of segments:

    int segmentShift = 0;
    int segmentCount = 1;
    while (segmentCount < concurrencyLevel && (!evictsBySize() || segmentCount * 20 <= maxWeight)) {
      ++segmentShift;
      segmentCount <<= 1;
    }

segmentCount above is the final number of segments; the loop ensures that each segment gets at least 10 entries. If concurrencyLevel is not set, its default value is 4, so there will be at most 4 segments. For example, with a size of 100 the cache is split into 4 segments, each holding at most 25 entries. In Guava cache, a write operation takes the segment lock directly. A read does not need the lock if the entry is already loaded and not expired; if the value is missing or stale, the segment is locked for a second look-up and, if necessary, the value is loaded through the CacheLoader we configured. In the example above the loader simply returns the key; in real business code it usually queries the database.
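Here is a minimal sketch of such a database-backed loader, assuming a hypothetical CustomerMapper DAO (not from the original article):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.concurrent.TimeUnit;

public class CustomerCache {
    // hypothetical DAO, standing in for a real database mapper
    interface CustomerMapper {
        String get(String name);
    }

    private final LoadingCache<String, String> cache;

    public CustomerCache(CustomerMapper customerMapper) {
        this.cache = CacheBuilder.newBuilder()
                .maximumSize(100)
                .expireAfterWrite(10, TimeUnit.MINUTES)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String name) {
                        // on a miss, the loading read locks the segment and falls through to the DB
                        return customerMapper.get(name);
                    }
                });
    }

    public String getCustomer(String name) {
        return cache.getUnchecked(name);
    }
}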

Expiration

Compared with LRUMap, Guava cache adds two kinds of expiration: expireAfterWrite (expire some time after a write) and expireAfterAccess (expire some time after the last access). Interestingly, an expired entry in Guava cache does not expire immediately; there is no background thread constantly scanning. Instead, expiration is handled during read and write operations, which avoids the global locking that a background scanning thread would require. See the code below:

public static void main(String[] args) throws ExecutionException, InterruptedException {
    Cache<String, String> cache = CacheBuilder.newBuilder()
            .maximumSize(100)
            // expires 5ms after write
            .expireAfterWrite(5, TimeUnit.MILLISECONDS)
            .concurrencyLevel(1)
            .build();
    cache.put("hello1", "I am hello1");
    cache.put("hello2", "I am hello2");
    cache.put("hello3", "I am hello3");
    cache.put("hello4", "I am hello4");
    // sleep at least 5ms
    Thread.sleep(5);
    System.out.println(cache.size());
    cache.put("hello5", "I am hello5");
    System.out.println(cache.size());
}

Output:
4
1

From this output we can tell that expiration is only processed at put time. Note that I set concurrencyLevel(1) above to force a single segment; otherwise this effect would not show up so cleanly. As mentioned in the previous section, expiration is handled per segment, and each segment maintains two queues:


    final Queue<ReferenceEntry<K, V>> writeQueue;

  
    final Queue<ReferenceEntry<K, V>> accessQueue;

writeQueue is ordered by write time: the head holds data written earliest and the tail holds data written latest. accessQueue is ordered by access time, just like an LRU, and is used for size-based eviction: if the segment exceeds its maximum capacity (for example the 25 mentioned above), the entry at the head of accessQueue is evicted.

void expireEntries(long now) {
      drainRecencyQueue();

      ReferenceEntry<K, V> e;
      while ((e = writeQueue.peek()) != null && map.isExpired(e, now)) {
        if (!removeEntry(e, e.getHash(), RemovalCause.EXPIRED)) {
          throw new AssertionError();
        }
      }
      while ((e = accessQueue.peek()) != null && map.isExpired(e, now)) {
        if (!removeEntry(e, e.getHash(), RemovalCause.EXPIRED)) {
          throw new AssertionError();
        }
      }
    }

This is how Guava cache processes expired entries: it peeks at both queues and removes whatever has expired. Expired entries are generally processed around a put operation, or when a read finds an expired entry and then triggers expiration for the whole segment, or when lockedGetOrLoad is called for the locked second read.

void evictEntries(ReferenceEntry<K, V> newest) {
      // ... irrelevant code omitted

      while (totalWeight > maxSegmentWeight) {
        ReferenceEntry<K, V> e = getNextEvictable();
        if (!removeEntry(e, e.getHash(), RemovalCause.SIZE)) {
          throw new AssertionError();
        }
      }
    }
/**
 * Returns the next evictable entry from the accessQueue.
 */
ReferenceEntry<K, V> getNextEvictable() {
      for (ReferenceEntry<K, V> e : accessQueue) {
        int weight = e.getValueReference().getWeight();
        if (weight > 0) {
          return e;
        }
      }
      throw new AssertionError();
    }

This is the code for size-based eviction: it walks the accessQueue and evicts from its head. Eviction is generally triggered when the elements in a segment change, such as on insert, update and load operations.

Auto Refresh

Automatic refresh is relatively simple in Guava cache: on each query it checks whether the entry meets the refresh condition and, if so, refreshes it.
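A minimal sketch of refresh-after-write with Guava, assuming a hypothetical loadFromDb lookup; overriding reload makes the refresh asynchronous so readers keep getting the old value while the new one loads:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RefreshExample {
    // hypothetical pool used only for background reloads
    private static final ListeningExecutorService pool =
            MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(2));

    // loadFromDb is a stand-in for a real database query
    private static String loadFromDb(String key) {
        return "value-of-" + key;
    }

    public static void main(String[] args) throws Exception {
        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                .maximumSize(100)
                // a stale entry is refreshed on the first read after 1s
                .refreshAfterWrite(1, TimeUnit.SECONDS)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return loadFromDb(key);
                    }

                    @Override
                    public ListenableFuture<String> reload(String key, String oldValue) {
                        // reload asynchronously: the reader that triggers the refresh
                        // still gets the old value immediately
                        return pool.submit(() -> loadFromDb(key));
                    }
                });

        System.out.println(cache.get("hello"));
        Thread.sleep(1100);
        // this read notices the entry is older than 1s and triggers a background reload
        System.out.println(cache.get("hello"));
    }
}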

Other features

There are some other features in Guava cache:

Weak and soft references

In Guava cache, keys and values can be held by weak references (and values also by soft references), and each Segment has two reference queues:

    final @Nullable ReferenceQueue<K> keyReferenceQueue;

  
    final @Nullable ReferenceQueue<V> valueReferenceQueue;

These two queues record references that the garbage collector has reclaimed; each queued reference carries the hash of its entry, so the corresponding entry can be located by that hash and removed from the cache after collection.
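A minimal sketch of configuring these references through the builder (the key and value here are just illustrative):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class ReferenceExample {
    public static void main(String[] args) {
        // weak keys are compared by identity and collected once no longer strongly referenced;
        // soft values can be reclaimed by the GC under memory pressure
        Cache<String, byte[]> cache = CacheBuilder.newBuilder()
                .weakKeys()
                .softValues()
                .build();
        cache.put("big", new byte[1024 * 1024]);
        System.out.println(cache.getIfPresent("big") != null);
    }
}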

Removal listener

When data is removed from a Guava cache, how do you know whether it expired, was evicted for size, or was collected because its reference was reclaimed? You can add a listener with removalListener(RemovalListener listener) to be notified of removals and log them or do other processing, which is useful for analysing evictions.

All removal reasons are recorded in the RemovalCause enum: EXPLICIT (removed by the user), REPLACED (overwritten by the user), EXPIRED, COLLECTED (the reference was reclaimed by the GC), and SIZE (evicted because the cache exceeded its maximum size).
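A minimal sketch of wiring up such a listener (the keys and sizes are just illustrative):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalNotification;

public class RemovalListenerExample {
    public static void main(String[] args) {
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .maximumSize(2)
                // log every removal together with its cause
                .removalListener((RemovalNotification<String, String> notification) ->
                        System.out.println(notification.getKey() + " removed because " + notification.getCause()))
                .build();
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3"); // may trigger a size-based removal with cause SIZE
    }
}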

Summary of guava cache

Having read the Guava cache source carefully, it is essentially an LRU map with good concurrency and a rich API. iQIYI's cache evolution was also built on it: by extending Guava cache, they were able to propagate cache updates between Java application instances.

Towards the future - caffeine

Guava cache is indeed powerful and meets most needs, but it is essentially a wrapper around LRU, so it is outclassed by better eviction algorithms. Caffeine implements W-TinyLFU, a policy that combines the strengths of LFU and LRU. Here is a comparison of hit rates across algorithms:

Optimal is the theoretical best hit rate, and LRU really does look like the little brother next to the other algorithms, while W-TinyLFU comes closest to the ideal. And it is not only hit rate where Caffeine beats Guava cache; it also wins on read and write throughput.

By now you must be wondering why Caffeine is so good. Don't worry, I'll explain below.

W-TinyLFU

We covered what traditional LFU is above. As long as the probability distribution of data access stays constant over time, LFU's hit rate is very high. Take iQIYI as an example again: a new drama comes out and we cache it with LFU. It is accessed hundreds of millions of times in the first few days, and LFU records that frequency. But new dramas always go out of fashion; a month later the first few episodes are stale, yet their recorded visit count is so high that no other drama can ever evict them. That is the limitation of plain LFU, which is why LFU variants appeared that decay counts over a time window, or only count frequency over a recent period. LFU also needs extra space to record the access frequency of every item, including items that are not even in the cache, so the extra bookkeeping can become very large.

Imagine we keep that bookkeeping in a HashMap in which every data item has an entry; when the amount of data is very large, this HashMap becomes very large too.

Back to LRU: it is not useless either. LRU handles burst traffic very well, precisely because it does not need to accumulate frequency before admitting new data.

So W-TinyLFU combines LRU and LFU, as well as some features of other algorithms.

Frequency recording

The first problem is frequency recording. The goal is to record access frequencies that change over time, using only limited space. W-TinyLFU uses a Count-Min Sketch to record access frequency; it is a relative of the Bloom filter. The idea: to record a key, hash it with several different hash functions and increment the counter at each resulting position. Why several hash functions? Because this is a lossy, compressed structure, collisions are inevitable. Suppose we keep a long[] array and compute one hash position per key. Zhang San and Li Si might both hash to position 1, so long[1] accumulates both of their counts: Zhang San visits 10,000 times, Li Si visits once, and long[1] reads 10,001. If we then ask for Li Si's frequency we get 10,001, even though he only visited once. With multiple hash functions (think of it as a long[][] two-dimensional array), Zhang San and Li Si may collide under the first function but very likely not under the second and third: if each function has roughly a 1% collision probability, the probability of colliding under all four is 1% to the fourth power. When we read Li Si's frequency, we take the minimum count across all the functions, which is why it is called Count-Min Sketch.

Compare that with the earlier approach. A simple example: if a HashMap records the frequencies and I have 100 distinct keys, the HashMap has to store 100 counters, even if my cache's capacity is 1, because LFU's rules require recording the frequency of all 100 keys. The more data there is, the more I have to record.

For Count-Min Sketch, let me go straight to Caffeine's implementation (the FrequencySketch class). If your cache size is 100, it allocates a long array whose length is the closest power of two not smaller than 100, namely 128, and this array records the access frequencies. Caffeine caps each frequency at 15; 15 in binary is 1111, which is 4 bits, and a long has 64 bits, so each long could hold 16 such counters. Caffeine does not use all 16 for one hash function, though: it uses four hash functions and splits each long into four segments, each segment holding one counter per hash function. The benefit is that hash collisions are further reduced: the original 128 slots effectively become 128 x 4.

The layout of one long is as follows: call the four segments A, B, C and D, and call the four hash functions within each segment s1, s2, s3 and s4. Now an example: what happens when I want to record an access to the number 50? Again assume size = 100.

  1. First determine which segment the hash of 50 falls into. hash & 3 always yields a number below 4; suppose hash & 3 = 0, so it falls into segment A.
  2. Hash 50 again with the other hash functions to get positions in the long array. Suppose s1 gives 1, s2 gives 3, s3 gives 4 and s4 gives 0.
  3. Then increment the s1 counter in segment A of long[1] (1As1 for short), and likewise increment 3As2, 4As3 and 0As4.

At this point some people will ask: isn't a maximum frequency of 15 too small? It doesn't matter. Taking size = 100 as an example, once the counters have been incremented about 1,000 times globally, all of them are divided by 2 to decay, and counting continues afterwards. The W-TinyLFU paper shows that this decay adapts the sketch well to access frequencies that shift over time.
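To make the idea concrete, here is a simplified Count-Min Sketch in plain Java, with four hash functions, counters capped at 15 and a halving decay. It is only an illustration of the idea under those assumptions; Caffeine's real FrequencySketch packs sixteen 4-bit counters into each long, as described above.

public class SimpleFrequencySketch {
    private static final int[] SEEDS = {0x9E3779B9, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F};
    private final int[][] counters;   // 4 rows of counters, one per hash function
    private final int mask;           // table length is a power of two
    private final int sampleSize;     // when this many increments happen, counters are halved
    private int increments;

    public SimpleFrequencySketch(int capacity) {
        int size = Integer.highestOneBit(Math.max(capacity, 2) - 1) << 1; // next power of two
        this.counters = new int[4][size];
        this.mask = size - 1;
        this.sampleSize = 10 * size;
    }

    public void increment(Object key) {
        int hash = key.hashCode();
        for (int i = 0; i < 4; i++) {
            int index = rehash(hash, SEEDS[i]) & mask;
            if (counters[i][index] < 15) {           // cap each counter at 15, like a 4-bit field
                counters[i][index]++;
            }
        }
        if (++increments >= sampleSize) {
            halve();                                  // age out old history
        }
    }

    public int frequency(Object key) {
        int hash = key.hashCode();
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < 4; i++) {
            int index = rehash(hash, SEEDS[i]) & mask;
            min = Math.min(min, counters[i][index]);  // take the minimum across hash functions
        }
        return min;
    }

    private void halve() {
        for (int[] row : counters) {
            for (int j = 0; j < row.length; j++) {
                row[j] >>>= 1;
            }
        }
        increments /= 2;
    }

    private static int rehash(int hash, int seed) {
        int h = hash * seed;
        return h ^ (h >>> 16);
    }
}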

Read and write performance

We said that in Guava cache, read and write operations are interleaved with expiration processing, meaning a put may also do eviction work, so read/write performance takes a hit. As the benchmark above shows, Caffeine comfortably beats Guava cache on both reads and writes. The main reason is that in Caffeine this maintenance work is asynchronous: events are submitted to a queue whose data structure is a RingBuffer (if that is unfamiliar, read up on the high-performance lock-free queue Disruptor). Then, on the default ForkJoinPool.commonPool() or a thread pool you configure yourself, the events are drained from the queue and the subsequent eviction and expiration work is performed.

There are separate queues for reads and writes. Caffeine assumes cache reads far outnumber writes, so for write operations all threads share a single RingBuffer.

Reads are far more frequent than writes, so to further reduce contention each thread gets its own RingBuffer for read events.
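A minimal sketch of steering that maintenance work onto your own thread pool instead of ForkJoinPool.commonPool() (the pool here is just an example):

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CaffeineExecutorDemo {
    public static void main(String[] args) {
        // assumption: a dedicated single-threaded pool just for cache maintenance work
        ExecutorService maintenancePool = Executors.newSingleThreadExecutor();

        Cache<String, String> cache = Caffeine.newBuilder()
                .maximumSize(10_000)
                // by default Caffeine drains its read/write buffers and runs eviction
                // on ForkJoinPool.commonPool(); a custom executor can be supplied instead
                .executor(maintenancePool)
                .build();

        cache.put("hello", "world");
        System.out.println(cache.getIfPresent("hello"));
    }
}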

Data Eviction Policy

In Caffeine, all data lives in a ConcurrentHashMap, unlike Guava cache, which implements its own ConcurrentHashMap-like structure. Caffeine keeps three LRU queues that reference the records:

  • Eden queue: Caffeine restricts it to 1% of the cache capacity, so with size = 100 its effective size is 1. Newly arrived data is recorded here so that burst traffic is not evicted merely because it has no access history yet. For example, when a new drama launches it has no access frequency at first; placing it in Eden protects it from being pushed out by other entries right after it goes online. Eden is the comfortable zone: data here is hard for other data to evict.

  • Probation queue: being in this queue means your data is relatively cold and is about to face eviction. Its effective size is size minus Eden minus Protected.

  • Protected queue: in this queue you can relax for now, you will not be evicted for the time being; but if the Probation queue is empty or the Protected queue is full, you may still end up facing eviction. To get into this queue, an entry in Probation must be accessed once more, which promotes it to Protected. Its effective size is (size minus Eden) x 80%; with size = 100 that is 79.

The three queues are related as follows:

  1. All new data goes into Eden.
  2. When Eden is full, the data evicted from it moves into Probation.
  3. If data in Probation is accessed, it is promoted to Protected.
  4. If Protected is full, some of its data is demoted back to Probation.

When data is evicted, it is evicted from Probation. The entry at the head of this queue is called the victim; as the oldest entry, a plain LRU would simply evict it, but here, since it is only on probation (awaiting execution, so to speak), it merely becomes the victim. The entry at the tail of the queue is called the candidate, or the attacker. The victim and the attacker then face off, and the outcome is decided by the frequencies recorded in our Count-Min Sketch, as follows (a small sketch of this decision follows the list):

  • If the attacker's frequency is greater than the victim's, the victim is evicted.
  • If the attacker's frequency is <= 5, the attacker is evicted instead. The comment in the source explains the reasoning: setting a warm-up threshold leads to a higher overall hit rate.
  • Otherwise, one of the two is evicted at random.
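A simplified sketch of this victim-versus-attacker decision, reusing the SimpleFrequencySketch from the earlier example; the class and method names are illustrative, not Caffeine's internal API:

import java.util.concurrent.ThreadLocalRandom;

public class AdmissionPolicy {
    private final SimpleFrequencySketch sketch; // the simplified sketch from the earlier example

    public AdmissionPolicy(SimpleFrequencySketch sketch) {
        this.sketch = sketch;
    }

    /** Returns true if the candidate should replace the victim at the head of Probation. */
    public boolean admit(Object candidateKey, Object victimKey) {
        int candidateFreq = sketch.frequency(candidateKey);
        int victimFreq = sketch.frequency(victimKey);
        if (candidateFreq > victimFreq) {
            return true;   // the candidate is hotter, so the victim is evicted
        }
        if (candidateFreq <= 5) {
            return false;  // a cold candidate is dropped, which keeps warm-up cheap
        }
        // otherwise the outcome is randomized so the policy is harder to game
        return ThreadLocalRandom.current().nextBoolean();
    }
}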

How to use

If you are familiar with Guava and worried about the cost of switching, don't be: Caffeine's API deliberately mirrors Guava's, and you will find it is basically the same.

public static void main(String[] args) {
    Cache<String, String> cache = Caffeine.newBuilder()
            .expireAfterWrite(1, TimeUnit.SECONDS)
            .expireAfterAccess(1, TimeUnit.SECONDS)
            .maximumSize(10)
            .build();
    cache.put("hello", "hello");
}
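And a minimal sketch of a loading cache with Caffeine, assuming a hypothetical loadFromDb lookup, to show how closely the API mirrors Guava's:

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

import java.util.concurrent.TimeUnit;

public class CaffeineLoadingDemo {
    // loadFromDb is a hypothetical stand-in for a real database query
    private static String loadFromDb(String key) {
        return "value-of-" + key;
    }

    public static void main(String[] args) {
        LoadingCache<String, String> cache = Caffeine.newBuilder()
                .maximumSize(10_000)
                .refreshAfterWrite(1, TimeUnit.MINUTES)
                .build(key -> loadFromDb(key)); // loads on a miss, refreshes in the background when stale

        System.out.println(cache.get("hello"));
    }
}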

By the way, more and more open source frameworks have dropped Guava cache in favour of Caffeine, Spring 5 among them. In my own work I compared Guava cache and Caffeine, chose Caffeine, and it has performed well in production. So don't worry that Caffeine is immature or that nobody uses it.

Finally

This article has covered iQIYI's caching journey and the development history of local caches (from ancient times to the future), along with the basic principles behind each implementation. Of course, that is not everything you need to use caches well: topics like keeping a local cache in sync when data changes elsewhere, distributed caches and multi-level caches still remain. A dedicated article on making good use of caches will follow, and I also plan to write source-code analyses of Guava cache and Caffeine. If you are interested, follow the official account to see new articles as soon as they are published.

Finally, a small plug: if this article helped you, follow my technical public account. Your follows and shares are the greatest support for me. O(∩_∩)O
