Those Things About Caching

Caching is ubiquitous: main memory in a PC, the L2 cache in a CPU, cache control in the HTTP protocol, and CDN acceleration all apply the idea of caching to solve performance problems.

Caching is a silver bullet for solving system performance and stability problems under high concurrency.

This article discusses the issues to consider when developing with Redis, the distributed cache we use most often.

1. How to decouple business logic from caching?

In most cases, cache operations are interwoven with the business logic, for example:

public class UserServiceImpl implements UserService {
    @Autowired
    private RedisTemplate<String, User> redisTemplate;
    
    @Autowired
    private UserMapper userMapper;
    
    public User getUserById(Long userId) {
        String cacheKey = "user_" + userId;
        User user = redisTemplate.opsForValue().get(cacheKey);
        if(null != user) {
            return user;
        }
        user = userMapper.getUserById(userId);
        redisTemplate.opsForValue().set(cacheKey, user); // if user is null, caching it is pointless
        return user;
    }
    
    public void deleteUserById(Long userId) {
        userMapper.deleteUserById(userId);
        String cacheKey = "user_" + userId;
        redisTemplate.delete(cacheKey); // delete() lives on RedisTemplate, not on opsForValue()
    }
}

The code above exhibits the following problems:

  1. Cache operations are cumbersome and generate a lot of repetitive code;
  2. Cache operations are tightly coupled to the business logic, which hurts later maintenance;
  3. When the business data is null, we cannot tell whether it was ever cached, so the cache never hits for that key;
  4. During development we often need to toggle caching on and off to troubleshoot; with code like this there is no easy switch;
  5. As the business grows more complex and caching spreads through the codebase, it becomes hard to locate which entries must be actively evicted;
  6. If you ever want to replace Redis with another caching technology, you will cry...

High coupling causes many more problems than these, so I won't list them all. Next, I will introduce a cache management framework open-sourced by the author, AutoLoadCache, and show how it solves the problems above.

Drawing on the ideas of Spring Cache, AutoLoadCache uses AOP plus annotations to decouple caching from business logic. Let's refactor the code above with AutoLoadCache for comparison:

public interface UserMapper {
    @Cache(expire = 120, key = "'user_' + #args[0]")
    User getUserById(Long userId);
    
    @CacheDelete({ @CacheDeleteKey(value = "'user' + #args[0].id") })
    void updateUser(User user);
}

public class UserServiceImpl implements UserService {
    
    @Autowired
    private UserMapper userMapper;
    
    public User getUserById(Long userId) {
        return userMapper.getUserById(userId);
    }
    @Transactional(rollbackFor=Throwable.class)
    public void updateUser(User user) {
        userMapper.updateUser(user);
    }
}

2. How to improve the performance of cached key generation expressions?

After using annotations to decouple caching from business logic, our main remaining task is designing cache keys. The finer the granularity of a cache key, the better the cache's reusability.

In the example above, we used Spring EL expressions to generate cache keys. What if you worry that Spring EL performs poorly, or you don't want to use Spring at all?

To meet these needs, the framework supports pluggable expression parsers: extend com.jarvis.cache.script.AbstractScriptParser to add your own.

Besides Spring EL, the framework currently supports OGNL and JavaScript expressions. If performance requirements are very high, OGNL is a good choice; its performance is very close to native code.
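Independent of AutoLoadCache, here is a minimal illustration, using Spring's own SpEL API, of how a key expression like the one above evaluates. Caching the parsed Expression object, as expression-driven frameworks typically do, avoids paying the parse cost on every call:

import org.springframework.expression.Expression;
import org.springframework.expression.spel.standard.SpelExpressionParser;
import org.springframework.expression.spel.support.StandardEvaluationContext;

public class KeyExpressionDemo {
    public static void main(String[] args) {
        SpelExpressionParser parser = new SpelExpressionParser();
        // Parse once and reuse: parsing dominates the cost, evaluation is cheap.
        Expression exp = parser.parseExpression("'user_' + #args[0]");

        StandardEvaluationContext ctx = new StandardEvaluationContext();
        ctx.setVariable("args", new Object[] { 123L });

        String key = exp.getValue(ctx, String.class);
        System.out.println(key); // prints "user_123"
    }
}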

3. How to solve the cache key conflict problem?

In practice, multiple modules may share one Redis server or one Redis cluster, which can lead to cache key conflicts.

To solve this, AutoLoadCache adds a namespace. If a namespace is set, it is prepended to every cache key:

public final class CacheKeyTO implements Serializable {

    private final String namespace;

    private final String key; // cache key

    private final String hfield; // hash field; when set, the value is stored in a Redis hash

    public String getCacheKey() { // builds the final cache key
        if(null != this.namespace && this.namespace.length() > 0) {
            return new StringBuilder(this.namespace).append(":").append(this.key).toString();
        }
        return this.key;
    }
}
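For illustration, with hypothetical values:

// Based on the getCacheKey() logic above:
// namespace = "order-service", key = "user_123"  ->  cache key "order-service:user_123"
// namespace = null or "",      key = "user_123"  ->  cache key "user_123"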

4. Compress cache data and improve serialization and deserialization performance

We want cached payloads to be as small as possible, to reduce memory usage and bandwidth pressure; at the same time, serialization and deserialization performance matters.

To meet different needs, AutoLoadCache provides serialization tools based on JDK serialization, Hessian, JacksonJson, Fastjson, JacksonMsgpack, and other technologies. You can also extend it by implementing the com.jarvis.cache.serializer.ISerializer interface.

The JDK's built-in serialization produces very large payloads and performs poorly, so it is not recommended. JacksonJson and Fastjson are JSON-based, which imposes a constraint: the parameters and return values of every cached method must be concrete types, not indeterminate ones (no Object, List<?>, and so on). Also, converting some objects to JSON silently drops certain properties; in those cases JSON cannot be used. Hessian is an excellent choice: the technology is very mature and stable, and Alibaba's Dubbo and HSF RPC frameworks both use Hessian for serialization.
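For reference, here is a standalone sketch of Hessian round-tripping using Hessian's own API. This is not AutoLoadCache's ISerializer implementation, whose exact method signatures are not shown here:

import com.caucho.hessian.io.Hessian2Input;
import com.caucho.hessian.io.Hessian2Output;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public final class HessianCodec {
    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Hessian2Output out = new Hessian2Output(bos);
        out.writeObject(obj);
        out.close(); // flushes the stream
        return bos.toByteArray();
    }

    public static Object deserialize(byte[] bytes) throws IOException {
        Hessian2Input in = new Hessian2Input(new ByteArrayInputStream(bytes));
        return in.readObject();
    }
}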

5. How to reduce the number of concurrent back-to-source?

When the cache misses, we must go back to the data source for the data. If 100 concurrent requests arrive for the same data, all 100 hit the data source and then write to the cache at the same time, wasting resources and possibly overloading the data source until it cannot serve.

AutoLoadCache has two mechanisms to solve this problem:

  1. Dogpile-prevention mechanism

    When multiple concurrent requests ask for the same data, one request is elected to load it from the data source, and the other requests simply wait for the data it brings back. (A sketch of this pattern follows below.)

  2. Auto-load mechanism

    The auto-load mechanism puts request information and cache expiry times into a queue. A background thread pool periodically scans the queue; if an entry is about to expire, it loads the latest data from the data source and writes it back into the cache. This converts an unpredictable number of concurrent user requests into a fixed number of refresh requests.

    The auto-load mechanism was originally designed to:

    1. keep very frequently used data in the cache long-term;
    2. avoid repeatedly executing time-consuming business queries.

Writing data to the cache is slower than serving reads, so both mechanisms also reduce the concurrency of cache writes, improving the cache service's performance and throughput.
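Here is a minimal sketch of the dogpile-prevention idea, using one FutureTask per key. It illustrates the pattern only; it is not AutoLoadCache's internal code:

import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.FutureTask;

public class SingleFlightLoader<K, V> {
    private final ConcurrentHashMap<K, FutureTask<V>> inFlight = new ConcurrentHashMap<>();

    public V load(K key, Callable<V> loader) throws Exception {
        FutureTask<V> task = new FutureTask<>(loader);
        FutureTask<V> existing = inFlight.putIfAbsent(key, task);
        if (existing == null) {
            try {
                task.run();           // this caller was elected to hit the data source
                return task.get();
            } finally {
                inFlight.remove(key); // allow later reloads of the same key
            }
        }
        return existing.get();        // everyone else waits for the elected loader
    }
}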

6. Asynchronous refresh

When a cache entry expires, requests penetrate through to the data source, which can destabilize the system.

AutoLoadCache reduces this risk by issuing an asynchronous request to reload the data from the data source shortly before the cache entry expires.
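A minimal sketch of this idea, built on a Spring RedisTemplate rather than AutoLoadCache's internals; the threshold, pool size, and method names here are illustrative assumptions:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.springframework.data.redis.core.RedisTemplate;

public class AsyncRefresher {
    private final RedisTemplate<String, Object> redisTemplate;
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public AsyncRefresher(RedisTemplate<String, Object> redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    // Call on every cache hit: if the remaining TTL is below the threshold,
    // reload in the background so user requests never pay the load cost.
    // A real implementation would also dedupe concurrent refreshes of one key
    // (e.g. with the single-flight pattern sketched earlier).
    public void maybeRefresh(String key, long thresholdSeconds, long expireSeconds, Callable<Object> loader) {
        Long ttl = redisTemplate.getExpire(key, TimeUnit.SECONDS);
        if (ttl != null && ttl >= 0 && ttl < thresholdSeconds) {
            pool.submit(() -> {
                Object fresh = loader.call(); // may hit the data source
                redisTemplate.opsForValue().set(key, fresh, expireSeconds, TimeUnit.SECONDS);
                return null;
            });
        }
    }
}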

7. Batch delete cache

In many cases the query conditions are complex, and we cannot derive or reconstruct the cache keys that need to be deleted.

To solve this, AutoLoadCache manages such caches with Redis hashes: entries that must be deleted as a batch are placed in the same hash, so deleting the batch is just deleting that hash. All that remains is to design cache keys with sensible granularity.

The hash field is set via the hfield attribute of @Cache.

Let's take a product review scenario:

public interface ProuductCommentMapper {
    @Cache(expire=600, key="'prouduct_comment_list_'+#args[0]", hfield = "#args[1]+'_'+#args[2]")
    // e.g. prouductId=1, pageNo=2, pageSize=3 is equivalent to the Redis command: HSET prouduct_comment_list_1 2_3 <serialized List<Long>>
    public List<Long> getCommentListByProuductId(Long prouductId, int pageNo, int pageSize);
        
    @CacheDelete({@CacheDeleteKey(value="'prouduct_comment_list_'+#args[0].prouductId")}) 
    // e.g. when #args[0].prouductId = 1, this is equivalent to the Redis command: DEL prouduct_comment_list_1
    public void addComment(ProuductComment comment) ;
    
}

If, when a comment is added, we only want to actively evict the first three pages of comments, we can move the page number into the key and keep only the page size in the hash field:

public interface ProuductCommentMapper {
    @Cache(expire=600, key="'prouduct_comment_list_'+#args[0]+'_'+#args[1]", hfield = "#args[2]")
    public List<Long> getCommentListByProuductId(Long prouductId, int pageNo, int pageSize);
        
    @CacheDelete({
        @CacheDeleteKey(value="'prouduct_comment_list_'+#args[0].prouductId+'_1'"),
        @CacheDeleteKey(value="'prouduct_comment_list_'+#args[0].prouductId+'_2'"),
        @CacheDeleteKey(value="'prouduct_comment_list_'+#args[0].prouductId+'_3'")
    }) 
    public void addComment(ProuductComment comment) ;
    
}
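With the refactored keys, adding a comment with prouductId = 1 evicts exactly the first three pages, equivalent to the Redis commands:

DEL prouduct_comment_list_1_1
DEL prouduct_comment_list_1_2
DEL prouduct_comment_list_1_3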

8. Double-write inconsistency

First look at the following code:

public interface UserMapper {
    @Cache(expire = 120, key = "'user_' + #args[0]")
    User getUserById(Long userId);
    
    @CacheDelete({ @CacheDeleteKey(value = "'user' + #args[0].id") })
    void updateUser(User user);
}

public class UserServiceImpl implements UserService {
    
    @Autowired
    private UserMapper userMapper;
    
    public User getUserById(Long userId) {
        return userMapper.getUserById(userId);
    }
    @Transactional(rollbackFor=Throwable.class)
    public void updateUser(User user) {
        userMapper.updateUser(user); 
    }
}

When updateUser updates user information, it also actively deletes the cached entry. But if another request loads the user before the transaction commits, the old row from the database gets written back into the cache, and the cache stays inconsistent with the database until the next active delete or until the entry expires. To address this, the AutoloadCache framework introduces another annotation, @CacheDeleteTransactional:

public class UserServiceImpl implements UserService {
    
    @Autowired
    private UserMapper userMapper;
    
    public User getUserById(Long userId) {
        return userMapper.getUserById(userId);
    }
    @Transactional(rollbackFor=Throwable.class)
    @CacheDeleteTransactional
    public void updateUser(User user) {
        userMapper.updateUser(user); 
    }
}

With @CacheDeleteTransactional, AutoloadCache first records the cache keys to delete in a ThreadLocal buffer, then performs the actual deletions after the transaction commits. Strictly speaking this does not "solve" the inconsistency problem; it only alleviates it.

Double-write inconsistency is genuinely hard to solve. Even with the database alone (single write), inconsistency can occur when data is read and updated concurrently; all we can do is reduce how often it happens. For more important data, we must not compute on cached values and write the result back to the database. Deducting inventory is a good example: add version information to the data and use techniques such as optimistic locking to avoid inconsistency.
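As a sketch of the optimistic-locking shape mentioned above, here is a hypothetical MyBatis-style mapper; the table, column, and method names are made up for illustration:

import org.apache.ibatis.annotations.Param;
import org.apache.ibatis.annotations.Update;

public interface ProductStockMapper {
    // Returns 1 only if the row still carries the expected version (and enough stock);
    // returns 0 if another writer got there first, in which case the caller retries
    // with a freshly read version.
    @Update("UPDATE product SET stock = stock - #{count}, version = version + 1 "
          + "WHERE id = #{id} AND version = #{version} AND stock >= #{count}")
    int deductStock(@Param("id") Long id, @Param("version") long version, @Param("count") int count);
}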

9. Support multiple cache operations

In most cases we both read and write the cache, but sometimes we only need to read data from it, or only write data to it. In those cases we can specify the cache operation type through the opType attribute of @Cache. The following operation types are currently supported:

  1. READ_WRITE: read and write the cache. If the cache holds data, use it; if not, load the data and write it to the cache. This is the default;
  2. WRITE: load the latest data from the data source and write it to the cache, keeping the cache in sync with the data source;
  3. READ_ONLY: only read from the cache, never load from the data source. For scenarios where the cache is written in one place and read in another;
  4. LOAD: only load data from the data source; neither read from nor write to the cache.

Note that the operation type given in @Cache is static. To adjust the operation type at runtime, use the CacheHelper.setCacheOpType() method.
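For example, a write-through variant of the earlier user lookup might look like the sketch below. The attribute name opType and the type names come from the list above, but exposing them as an importable CacheOpType enum is an assumption to verify against the framework's source:

public interface UserMapper {
    // Always load fresh data and overwrite the cache entry (write-through).
    @Cache(expire = 120, key = "'user_' + #args[0]", opType = CacheOpType.WRITE)
    User refreshUserById(Long userId);
}

// At runtime, per the text above, something like the following would
// force the current call to bypass the cache entirely:
// CacheHelper.setCacheOpType(CacheOpType.LOAD);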

Finally, you are welcome to Star and Fork the AutoLoadCache open-source project on GitHub.
