Are you still using Guava Cache? It is the king of Java local caching

When it comes to local caching, everyone thinks of Guava Cache. Its advantages: it encapsulates get and put operations, provides thread-safe cache access, supports expiration and eviction policies, and offers cache monitoring. When the cached data exceeds the maximum size, entries are evicted with an LRU algorithm. In this article we look at a newer local caching framework: Caffeine Cache. It stands on the shoulders of a giant, Guava Cache, and optimizes the eviction algorithm based on its ideas.

This article mainly introduces how to use Caffeine Cache, both standalone and in Spring Boot.

The advantage of Caffeine Cache: the W-TinyLFU algorithm

Speaking of optimization, what exactly has Caffeine Cache optimized? We just mentioned LRU; other common cache eviction algorithms include FIFO and LFU:

  • FIFO: first in, first out. The entry that entered the cache earliest is evicted first, which generally yields a very low hit rate.
  • LRU: least recently used. Each time an entry is accessed it moves to the tail of the queue; when eviction is needed, the entry at the head is removed. The problem: if an entry is accessed 1,000 times within one minute and then not at all during the next minute while other entries are accessed, that hot entry is evicted anyway.
  • LFU: least frequently used. Extra space records each entry's access frequency, and the entry with the lowest frequency is evicted. This avoids LRU's blindness to access history over longer periods.
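The LRU policy just described can be sketched in a few lines with Java's LinkedHashMap (a minimal illustration with a hypothetical LruCache class, not how Guava or Caffeine implement it):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU sketch: an access-order LinkedHashMap moves each read entry
// to the tail, and the eldest (head) entry is evicted once capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = access order, which gives the LRU behavior
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the head when over capacity
    }
}
```

With capacity 2, putting a and b, reading a, then putting c evicts b: a survives only because it was read more recently, which is exactly the weakness described above when recency and true popularity diverge.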

The three strategies above each have advantages and disadvantages; implementation cost rises from one to the next, and so does the hit rate. Although Guava Cache offers many features, at its core it is still an LRU wrapper. If a better algorithm could provide the same features, Guava Cache would pale in comparison.

"Limitations of LFU": In LFU, as long as the probability distribution of data access patterns remains stable over time, the hit rate can be very high. For example, when a new TV series comes out, we cache it with LFU; it is accessed hundreds of millions of times within a few days, and LFU records that frequency. But the series eventually goes stale: a month later the early episodes are outdated, yet their recorded frequency is so high that other shows can never displace them. This is where the model breaks down.

"Advantages and limitations of LRU": LRU copes very well with bursty traffic because it does not need to accumulate frequency data. But LRU's prediction of the future from history is limited: it simply assumes the most recently accessed entry is the most likely to be accessed again and gives it the highest priority.

Given the limitations of the existing algorithms, the cache hit rate suffers to some degree, and the hit rate is a key cache metric. The HighScalability website published an article about W-TinyLFU, a modern cache design invented by a former Google engineer. Caffeine Cache is built on this algorithm; thanks to the Window TinyLFU eviction policy, Caffeine provides a near-optimal hit rate.

When the data access pattern does not change over time, LFU yields the best cache hit rate. However, LFU has two disadvantages:

  • First, it needs to maintain frequency information for every record and update it on every access, which is a significant overhead;
  • Second, if the access pattern changes over time, LFU's frequency information cannot adapt: records that were accessed frequently in the past may occupy the cache while newly popular records cannot get in.
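The first drawback, per-record frequency bookkeeping, can be made concrete with a naive sketch (a hypothetical NaiveLfuCounter class, not a real cache implementation): every access updates a counter, and every key pays for one.

```java
import java.util.HashMap;
import java.util.Map;

// Naive LFU bookkeeping: one counter per key, updated on every access.
// This per-record metadata is exactly the overhead described above.
class NaiveLfuCounter<K> {
    private final Map<K, Long> freq = new HashMap<>();

    void recordAccess(K key) {
        freq.merge(key, 1L, Long::sum); // one counter update per access
    }

    // Eviction candidate: the key with the minimum recorded frequency
    K lowestFrequencyKey() {
        K victim = null;
        long min = Long.MAX_VALUE;
        for (Map.Entry<K, Long> e : freq.entrySet()) {
            if (e.getValue() < min) {
                min = e.getValue();
                victim = e.getKey();
            }
        }
        return victim;
    }
}
```

TinyLFU's Count-Min Sketch, described below, replaces this exact per-key map with a small fixed-size array of counters.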

Therefore, most cache designs are based on LRU or its variants. In contrast, LRU does not need to maintain expensive per-record metadata and can also reflect access patterns that change over time. However, under many workloads LRU still needs considerably more space to reach a hit rate comparable to LFU. A "modern" cache should therefore combine the strengths of both.

TinyLFU maintains frequency information for recent access records and acts as an admission filter: when a new record arrives, it is inserted into the cache only if it meets TinyLFU's requirements. As mentioned earlier, a modern cache needs to solve two challenges:

  • One is how to avoid the high cost of maintaining frequency information;
  • The other is how to reflect access patterns that change over time.

Consider the former first. TinyLFU uses stream sketching techniques, and Count-Min Sketch is an effective way to solve this problem: it can store frequency information in far less space while guaranteeing a very low false-positive rate. The second challenge is harder, because it is difficult for any sketching data structure to reflect change over time. For Bloom filters there are timing variants, but a "timing" Count-Min Sketch is not so easy to build. TinyLFU therefore uses a sliding-window time-decay mechanism built on a simple reset operation: every time a record is added to the sketch, a counter is incremented; when the counter reaches a size W, every value in the sketch is divided by 2. This reset operation acts as a decay.

W-TinyLFU mainly addresses sparse bursts of access. In scenarios where a small number of elements receive a large burst of traffic, TinyLFU alone cannot retain them, because they cannot accumulate a high enough frequency within the window. W-TinyLFU is therefore a combination of LFU and LRU: LFU handles the majority of access patterns, while a small LRU window absorbs burst traffic.

For recording frequencies you might think of a HashMap storing a frequency value per key. But if the amount of data is very large, the HashMap itself becomes very large. This leads to the Bloom filter idea: for each key, store only a small flag indicating whether the key is in the set, by hashing the key to positions with k hash functions.

W-TinyLFU uses a Count-Min Sketch, itself a variant of the Bloom filter, to record access frequency.

To record a value, we hash it with several hash functions and increment the counter at each corresponding position. Why several hash functions? Because this is a compression scheme, collisions are inevitable. Suppose we kept a single counter array and hashed each key to one slot. Zhang San and Li Si might both hash to slot 1, so both of their accesses would increment counter[1]: if Zhang San is accessed 100 times and Li Si once, counter[1] reads 101, and looking up Li Si's frequency would return 101 even though he was accessed only once. Multiple hash functions solve this. Think of it as a long[][] two-dimensional array: Zhang San and Li Si may collide under the first hash function, but with high probability not under the second and third. If each function collides with a probability of about 1%, the probability that all four collide is 1% to the fourth power. When reading Li Si's frequency, we take the minimum count across all hash functions, hence the name Count-Min Sketch.
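The idea can be sketched as follows (a simplified, hypothetical CountMinSketch class; Caffeine's real FrequencySketch is more compact): each of d hash rows gets one increment per recorded access, the estimate is the minimum across rows, and a TinyLFU-style reset halves every counter.

```java
// Simplified Count-Min Sketch: d rows of counters, one hash function per row.
// increment() bumps one counter in every row; estimate() takes the minimum
// across rows, so a collision in one row is corrected by the others.
// reset() halves all counters, the TinyLFU aging step described earlier.
class CountMinSketch {
    private final int depth;      // number of hash functions (rows)
    private final int width;      // counters per row
    private final long[][] table;

    CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
    }

    private int index(Object key, int row) {
        int h = key.hashCode() * (31 * (row + 1) + 1); // cheap per-row hash mix
        return (h & 0x7fffffff) % width;
    }

    void increment(Object key) {
        for (int i = 0; i < depth; i++) {
            table[i][index(key, i)]++;
        }
    }

    long estimate(Object key) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++) {
            min = Math.min(min, table[i][index(key, i)]);
        }
        return min;
    }

    void reset() {
        for (long[] row : table) {
            for (int j = 0; j < row.length; j++) {
                row[j] >>>= 1; // halve every counter
            }
        }
    }
}
```

In the Zhang San / Li Si example, even if the two collide in one row, the minimum over the remaining rows keeps Li Si's estimate near his true count; the sketch can only overcount, never undercount.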

Usage

Add the dependency (this article uses version 2.6.2):

<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>2.6.2</version>
</dependency>

2.1 Cache filling strategy

Caffeine Cache provides three cache filling strategies: manual, synchronous loading and asynchronous loading.

「1. Manual loading」

Specify a synchronous function on each get; if the key is absent, this function is called to generate the value.

/**
 * Manual loading
 * @param key
 * @return
 */
public Object manualOperator(String key) {
    Cache<String, Object> cache = Caffeine.newBuilder()
        .expireAfterWrite(1, TimeUnit.SECONDS)
        .expireAfterAccess(1, TimeUnit.SECONDS)
        .maximumSize(10)
        .build();
    // If the key is absent, the given function is invoked to generate the value
    Object value = cache.get(key, t -> setValue(key).apply(key));
    cache.put("hello", value);

    // Returns the value if present, otherwise null
    Object ifPresent = cache.getIfPresent(key);
    // Remove a key
    cache.invalidate(key);
    return value;
}

public Function<String, Object> setValue(String key) {
    return t -> key + "value";
}

「2. Synchronous loading」

When constructing the Cache, pass a CacheLoader implementation to the build method; its load method derives the value from the key.

/**
 * Synchronous loading
 * @param key
 * @return
 */
public Object syncOperator(String key) {
    LoadingCache<String, Object> cache = Caffeine.newBuilder()
        .maximumSize(100)
        .expireAfterWrite(1, TimeUnit.MINUTES)
        .build(k -> setValue(key).apply(key));
    return cache.get(key);
}

public Function<String, Object> setValue(String key) {
    return t -> key + "value";
}

「3. Asynchronous loading」

AsyncLoadingCache is the asynchronous variant of LoadingCache: loading runs on an Executor and get returns a CompletableFuture. The asynchronous cache follows a reactive programming model.

For synchronous loading you provide a CacheLoader; for asynchronous loading you provide an AsyncCacheLoader that returns a CompletableFuture.

/**
 * Asynchronous loading
 *
 * @param key
 * @return
 */
public Object asyncOperator(String key){
    AsyncLoadingCache<String, Object> cache = Caffeine.newBuilder()
        .maximumSize(100)
        .expireAfterWrite(1, TimeUnit.MINUTES)
        .buildAsync(k -> setAsyncValue(key).get());

    return cache.get(key);
}

public CompletableFuture<Object> setAsyncValue(String key){
    return CompletableFuture.supplyAsync(() -> {
        return key + "value";
    });
}

2.2 Eviction strategies

Caffeine provides three eviction strategies: size-based, time-based, and reference-based.

"1. Size-based eviction"

There are two size-based strategies: eviction by entry count and eviction by weight.

// Evict based on the number of entries in the cache
LoadingCache<String, Object> cache = Caffeine.newBuilder()
    .maximumSize(10000)
    .build(key -> function(key));

// Evict based on entry weight (the weight only determines the cache's size;
// it is not used to decide whether a particular entry is evicted)
LoadingCache<String, Object> cache1 = Caffeine.newBuilder()
    .maximumWeight(10000)
    .weigher((key, value) -> function1(key))
    .build(key -> function(key));

maximumWeight and maximumSize cannot be used at the same time.

"2. Time-based eviction"

// Expire after a fixed duration
LoadingCache<String, Object> cache = Caffeine.newBuilder()
    .expireAfterAccess(5, TimeUnit.MINUTES)
    .build(key -> function(key));
LoadingCache<String, Object> cache1 = Caffeine.newBuilder()
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .build(key -> function(key));

// Expire according to a custom per-entry policy
LoadingCache<String, Object> cache2 = Caffeine.newBuilder()
    .expireAfter(new Expiry<String, Object>() {
        @Override
        public long expireAfterCreate(String key, Object value, long currentTime) {
            // seconds is assumed to be defined elsewhere
            return TimeUnit.SECONDS.toNanos(seconds);
        }

        @Override
        public long expireAfterUpdate(@Nonnull String key, @Nonnull Object value,
                                      long currentTime, long currentDuration) {
            return currentDuration; // keep the remaining lifetime unchanged
        }

        @Override
        public long expireAfterRead(@Nonnull String key, @Nonnull Object value,
                                    long currentTime, long currentDuration) {
            return currentDuration; // keep the remaining lifetime unchanged
        }
    }).build(key -> function(key));

Caffeine provides three timing eviction strategies:

  • expireAfterAccess(long, TimeUnit): Start timing after the last access or write, and expire after the specified time. If there are always requests to access the key, the cache will never expire.
  • expireAfterWrite(long, TimeUnit): Start timing after the last write to the cache and expire after the specified time.
  • expireAfter(Expiry): Self-defined strategy, expiry time is calculated independently by Expiry.

Expired entries are removed with a combination of lazy deletion (on access) and periodic deletion; both strategies run in O(1) time.
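Lazy deletion can be illustrated in miniature (a hypothetical ExpiringMap class, not Caffeine's actual implementation): each entry carries a deadline, and an expired entry is removed only when it is next looked up, an O(1) check per access.

```java
import java.util.HashMap;
import java.util.Map;

// Miniature lazy deletion: entries store a deadline and are removed
// on the next read after they expire, an O(1) check per access.
class ExpiringMap<K, V> {
    private static final class Entry<V> {
        final V value;
        final long deadlineNanos;

        Entry(V value, long deadlineNanos) {
            this.value = value;
            this.deadlineNanos = deadlineNanos;
        }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();

    void put(K key, V value, long ttlNanos) {
        map.put(key, new Entry<>(value, System.nanoTime() + ttlNanos));
    }

    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        if (System.nanoTime() > e.deadlineNanos) {
            map.remove(key); // lazy deletion: purge the expired entry on access
            return null;
        }
        return e.value;
    }
}
```

Periodic deletion complements this by sweeping entries that are never read again; Caffeine amortizes that maintenance work across normal cache operations.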

"3. Reference-based eviction"

Java has four reference strengths: strong, soft, weak, and phantom. Caffeine can evict entries based on weak and soft references:

// Evict when neither the key nor the value is strongly referenced elsewhere
LoadingCache<String, Object> cache = Caffeine.newBuilder()
    .weakKeys()
    .weakValues()
    .build(key -> function(key));

// Evict when the garbage collector needs to free memory
LoadingCache<String, Object> cache1 = Caffeine.newBuilder()
    .softValues()
    .build(key -> function(key));

Note: AsyncLoadingCache does not support weak references and soft references.

  • Caffeine.weakKeys(): Use weak references to store keys. If there is no strong reference to the key elsewhere, the cache will be recycled by the garbage collector. Since the garbage collector only relies on identity equality, this will cause the entire cache to use identity (==) equality to compare keys instead of equals().

  • Caffeine.weakValues(): Use weak references to store values. If there is no strong reference to the value elsewhere, the entry will be reclaimed by the garbage collector. Since the garbage collector relies only on identity, this causes the cache to compare values with identity (==) instead of equals().

  • Caffeine.softValues(): Use soft references to store values. When the memory is full, the soft-referenced object will be garbage collected in a least-recently-used way. Since the use of soft references needs to wait until the memory is full before reclaiming, we usually recommend configuring the cache with a maximum memory usage. softValues() will use identities (==) instead of equals() to compare values.

Caffeine.weakValues() and Caffeine.softValues() cannot be used together.

"4. Removal listener"

Cache<String, Object> cache = Caffeine.newBuilder()
    .removalListener((String key, Object value, RemovalCause cause) ->
                     System.out.printf("Key %s was removed (%s)%n", key, cause))
    .build();

"5. Write to external storage"

A CacheWriter lets the cache propagate writes and deletions to a third-party store.

LoadingCache<String, Object> cache2 = Caffeine.newBuilder()
    .writer(new CacheWriter<String, Object>() {
        @Override public void write(String key, Object value) {
            // write to external storage
        }
        @Override public void delete(String key, Object value, RemovalCause cause) {
            // delete from external storage
        }
    })
    .build(key -> function(key));

If you have multiple levels of cache, this method is still very practical.

Note: CacheWriter cannot be used with weak keys or AsyncLoadingCache.

"6. Statistics"

Statistics work the same way as in Guava Cache.

Cache<String, Object> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .recordStats()
    .build();

Enable statistics collection with Caffeine.recordStats(); Cache.stats() then returns a CacheStats. CacheStats provides statistical methods including:

  • hitRate(): returns the cache hit rate
  • evictionCount(): the number of evictions
  • averageLoadPenalty(): the average time spent loading new values

Caffeine Cache as the default cache in Spring Boot

In Spring Boot 1.x, the default local cache is Guava Cache. In 2.x (Spring Boot 2.0, built on Spring 5), Caffeine Cache replaced Guava Cache; after all, it has a better eviction strategy.

Let's look at how to use the cache in Spring Boot 2.x.

Introduce dependencies:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cache</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>2.6.2</version>
</dependency>

Add the annotation to enable caching support

Add the @EnableCaching annotation:

@SpringBootApplication
@EnableCaching
public class SingleDatabaseApplication {

    public static void main(String[] args) {
        SpringApplication.run(SingleDatabaseApplication.class, args);
    }
}

Inject relevant parameters in configuration files

Properties file:

spring.cache.cache-names=cache1
spring.cache.caffeine.spec=initialCapacity=50,maximumSize=500,expireAfterWrite=10s

Or a YAML file:

spring:
  cache:
    type: caffeine
    cache-names:
    - userCache
    caffeine:
      spec: maximumSize=1024,refreshAfterWrite=60s

If you use the refreshAfterWrite setting, you must also register a CacheLoader bean; without that setting the bean is not needed. As mentioned above, this CacheLoader is associated with all caches managed by the cache manager, so it must be defined as CacheLoader<Object, Object>; the auto-configuration ignores any other generic type.

import com.github.benmanes.caffeine.cache.CacheLoader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @author: rickiyang
 * @date: 2019/6/15
 * @description:
 */
@Configuration
public class CacheConfig {

    /**
     * Equivalent to specifying a reload strategy in build() when constructing a LoadingCache.
     * This bean must be defined for the refreshAfterWrite=60s property to take effect.
     * @return
     */
    @Bean
    public CacheLoader<String, Object> cacheLoader() {
        CacheLoader<String, Object> cacheLoader = new CacheLoader<String, Object>() {
            @Override
            public Object load(String key) throws Exception {
                return null;
            }
            // Override this method to return the oldValue, thereby refreshing the cache
            @Override
            public Object reload(String key, Object oldValue) throws Exception {
                return oldValue;
            }
        };
        return cacheLoader;
    }
}

Commonly used Caffeine configuration properties:

  • initialCapacity=[integer]: initial cache capacity
  • maximumSize=[long]: maximum number of entries
  • maximumWeight=[long]: maximum total weight of the cache
  • expireAfterAccess=[duration]: expire a fixed duration after the last access or write
  • expireAfterWrite=[duration]: expire a fixed duration after the last write
  • refreshAfterWrite=[duration]: refresh the cache a fixed duration after it was created or last updated
  • weakKeys: store keys as weak references
  • weakValues: store values as weak references
  • softValues: store values as soft references
  • recordStats: enable statistics recording

note:

  • When expireAfterWrite and expireAfterAccess are both present, expireAfterWrite takes precedence.
  • maximumSize and maximumWeight cannot be used at the same time
  • weakValues and softValues cannot be used at the same time

Note that configuring cache items in the configuration file generally meets basic needs, but it is not very flexible; with many cache items the configuration file becomes long. So, in general, you can also initialize Cache instances with a bean.

The following demonstrates bean-based injection:

package com.rickiyang.learn.cache;

import com.github.benmanes.caffeine.cache.CacheLoader;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.apache.commons.compress.utils.Lists;
import org.springframework.cache.CacheManager;
import org.springframework.cache.caffeine.CaffeineCache;
import org.springframework.cache.support.SimpleCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * @author: rickiyang
 * @date: 2019/6/15
 * @description:
 */
@Configuration
public class CacheConfig {

    /**
     * Create a Caffeine-based CacheManager
     * and register some caches at startup
     * @return
     */
    @Bean
    @Primary
    public CacheManager caffeineCacheManager() {
        SimpleCacheManager cacheManager = new SimpleCacheManager();
        ArrayList<CaffeineCache> caches = Lists.newArrayList();
        List<CacheBean> list = setCacheBean();
        for(CacheBean cacheBean : list){
            caches.add(new CaffeineCache(cacheBean.getKey(),
                    Caffeine.newBuilder().recordStats()
                            .expireAfterWrite(cacheBean.getTtl(), TimeUnit.SECONDS)
                            .maximumSize(cacheBean.getMaximumSize())
                            .build()));
        }
        cacheManager.setCaches(caches);
        return cacheManager;
    }

    /**
     * Initialize the cache names
     * @return
     */
    private List<CacheBean> setCacheBean(){
        List<CacheBean> list = Lists.newArrayList();
        CacheBean userCache = new CacheBean();
        userCache.setKey("userCache");
        userCache.setTtl(60);
        userCache.setMaximumSize(10000);

        CacheBean deptCache = new CacheBean();
        deptCache.setKey("deptCache");
        deptCache.setTtl(60);
        deptCache.setMaximumSize(10000);

        list.add(userCache);
        list.add(deptCache);

        return list;
    }

    class CacheBean {
        private String key;
        private long ttl;
        private long maximumSize;

        public String getKey() {
            return key;
        }

        public void setKey(String key) {
            this.key = key;
        }

        public long getTtl() {
            return ttl;
        }

        public void setTtl(long ttl) {
            this.ttl = ttl;
        }

        public long getMaximumSize() {
            return maximumSize;
        }

        public void setMaximumSize(long maximumSize) {
            this.maximumSize = maximumSize;
        }
    }

}

We created a SimpleCacheManager as the Cache management object and initialized two Cache instances, for user and dept data respectively. The construction parameters here are deliberately simple; configure them according to your needs.

Use annotations to create, read, update and delete cache entries

We can use Spring's @Cacheable, @CachePut, @CacheEvict and other annotations to work with the Caffeine cache conveniently.

If multiple caches are used, such as Redis and Caffeine, one CacheManager must be marked @Primary; if no cacheManager is specified in the @Cacheable annotation, the primary one is used.

There are mainly 5 cache annotations:

  • @Cacheable triggers cache population (generally placed on read/create methods; it first checks whether a cached value exists, uses it if so, and otherwise executes the method and caches the result)
  • @CacheEvict triggers cache eviction (used on delete methods)
  • @CachePut updates the cache without affecting method execution (used on update methods; the annotated method always executes)
  • @Caching combines multiple cache annotations on one method
  • @CacheConfig sets common cache-related configuration at the class level (used together with the other annotations)

Talk about the difference between @Cacheable and @CachePut:

  • @Cacheable: whether the annotated method executes depends on the @Cacheable conditions; in many cases the method is not executed at all.

  • @CachePut: this annotation never prevents method execution; whatever conditions are configured, the method runs, so it is mostly used for updates.

Briefly, the attributes of the @Cacheable annotation:

public @interface Cacheable {

    /**
     * Name(s) of the cache(s) to use
     */
    @AliasFor("cacheNames")
    String[] value() default {};

    /**
     * Same as value(); decides which cache(s) to use
     */
    @AliasFor("value")
    String[] cacheNames() default {};

    /**
     * SpEL expression for the cache key; if unset, all method parameters form the key by default
     */
    String key() default "";

    /**
     * Names a KeyGenerator bean used to generate the key; cannot be combined with key()
     */
    String keyGenerator() default "";

    /**
     * Name of the CacheManager bean to use; the bean must already be defined
     */
    String cacheManager() default "";

    /**
     * Names a CacheResolver bean to pick the cache; same purpose as cacheManager,
     * but the two cannot be used at the same time
     */
    String cacheResolver() default "";

    /**
     * SpEL condition that triggers caching, evaluated before the method executes
     */
    String condition() default "";

    /**
     * SpEL condition evaluated after the method executes, so it can reference the returned value
     */
    String unless() default "";

    /**
     * For synchronization: when the cached value is missing (expired, absent, etc.) and multiple
     * threads call the annotated method concurrently, only one thread is allowed through to execute it
     */
    boolean sync() default false;

}

Annotation-based usage:

package com.rickiyang.learn.cache;

import com.rickiyang.learn.entity.User;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

/**
 * @author: rickiyang
 * @date: 2019/6/15
 * @description: Local cache
 */
@Service
public class UserCacheService {

    /**
     * Lookup:
     * checks the cache first; on a miss, queries the database and caches the result
     * @param id
     */
    @Cacheable(value = "userCache", key = "#id", sync = true)
    public User getUser(long id) {
        // TODO: query the database and return the user
        return null;
    }

    /**
     * Update/save
     * @param user
     */
    @CachePut(value = "userCache", key = "#user.id")
    public void saveUser(User user) {
        // TODO: save to the database
    }

    /**
     * Delete
     * @param user
     */
    @CacheEvict(value = "userCache", key = "#user.id")
    public void delUser(User user) {
        // TODO: delete from the database
    }
}

If you prefer not to use annotations, you can also obtain a cache from the CacheManager by name and operate on it directly.

Note that the keys above use SpEL expressions. Spring Cache provides SpEL context data (the root object, method, arguments, result, and so on) for building keys; see the table in the official Spring documentation.

note:

1. When we want to use a property of the root object as the key, we can omit "#root", because Spring uses root-object properties by default. For example:

@Cacheable(key = "targetClass + methodName +#p0")

2. Method parameters can be referenced as "#parameterName" or "#p<parameterIndex>". For example:

@Cacheable(value="userCache", key="#id")
@Cacheable(value="userCache", key="#p0")

SpEL also provides a variety of operators that can be used in these key and condition expressions.

Origin blog.csdn.net/weixin_45784983/article/details/108538000