How I use caching in the project, with solutions for cache avalanche, breakdown, penetration, and data consistency

Redis

Preface

An earlier stress test showed that the throughput of the endpoint that assembles the homepage data was very low. We first optimized the logic by collapsing multiple database queries into a single query and assembling the result in the Java code, then ran the stress test again: throughput improved, but still not enough. Next we indexed the relevant database columns; throughput improved again, but not by much. The remaining optimization is caching. Homepage data is read far more than it is written, and a cache is exactly the right fit for that scenario.

The idea: put part of the data in the cache to speed up access, while the database remains responsible for persisting the data.

First, we need to consider which data should be placed in the cache:

  • Data with low immediacy and consistency requirements, such as logistics status, product categories, and product lists. These suit caching with an expiration time chosen according to how often the data is updated.
  • Data with a high volume of reads and a low update frequency, the classic read-heavy, write-light scenario. For example, it is acceptable that buyers only see a product published from the back office within five minutes.

Data that requires strong immediacy or strict consistency, or that is updated frequently, should be queried from the database directly!

Adding the cache logic

Let's first sort out the logic for adding the cache:

  • First of all, what exactly do we put in the cache? If the entire project were implemented in Java, we could serialize objects with JDK serialization and store them in Redis directly. In a large project, however, we have to consider cross-platform and cross-language compatibility, so we store values as JSON strings, since JSON works across languages and platforms.
  • Saving means converting the object into a JSON string and writing it to Redis; fetching means parsing the string read from Redis back into an object. This is simply serialization and deserialization.
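
Here is a minimal sketch of that save/load round trip, using the same fastjson and StringRedisTemplate APIs that appear in the code later in this post (the key name, TTL, and catalogMap variable are illustrative):

import java.util.concurrent.TimeUnit;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.TypeReference;

// Save: serialize the object to a JSON string and store it with an expiration time
String valueJson = JSON.toJSONString(catalogMap);
stringRedisTemplate.opsForValue().set("catalogJson", valueJson, 1, TimeUnit.DAYS);

// Load: read the JSON string back and deserialize it into the original generic type
String cached = stringRedisTemplate.opsForValue().get("catalogJson");
Map<String, List<Catelog2Vo>> result = (cached == null) ? null
        : JSON.parseObject(cached, new TypeReference<Map<String, List<Catelog2Vo>>>() {});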

An episode while using Redis

When the basic logic was done, I ran a stress test and hit an off-heap (direct) memory overflow exception. The reasons are as follows:

  • Since Spring Boot 2.0, Lettuce is the default Redis client, and it uses Netty for network communication underneath.
  • A Lettuce bug caused the off-heap memory to overflow. Increasing the JVM heap with -Xmx did not solve it; the exception still appeared sooner or later. You can also raise -Dio.netty.maxDirectMemory, but exceptions still appear eventually: that flag only grants more direct memory, it does not fix the root cause.
  • Solutions: (1) upgrade the Lettuce client, or (2) switch to Jedis. I used the second one (the dependency change is sketched below).
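
For reference, a sketch of what the Maven change looks like, assuming the project pulls in Redis through spring-boot-starter-data-redis (exact coordinates and versions depend on your Spring Boot version):

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
    <exclusions>
        <!-- exclude the default Lettuce client -->
        <exclusion>
            <groupId>io.lettuce</groupId>
            <artifactId>lettuce-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- bring in Jedis instead -->
<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
</dependency>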

So far, is this caching enough? Of course not. Under high concurrency, this naive read-through alone brings a series of problems!

Namely: cache penetration, cache avalanche, and cache breakdown. Let me explain how I solved each of them in the project.

Common caching problems and my solutions in the project

First, the concepts:

  • Cache penetration: querying for data that is in neither the cache nor the database. Attackers exploit this by deliberately requesting non-existent keys; hundreds of thousands of such requests all fall through to the database and can bring it down.
  • Cache avalanche: a large batch of cached keys expires at the same moment while hundreds of thousands of concurrent requests arrive for that data; all of them fall through to the database and crash it. That is the avalanche effect.
  • Cache breakdown: the key of one extremely hot item expires at some moment, and hundreds of thousands of concurrent requests for that single key hit the database at once, taking it down.

Solutions:

  • Cache penetration (the approach I took: cache a null value with a short expiration time; see the sketch after this list):
    • Cache a null/placeholder value with a short TTL, so repeated lookups for a missing key are answered by the cache instead of the database
    • Use a Bloom filter to reject keys that cannot exist at all, accepting that this scheme has a certain rate of false positives
  • Cache avalanche:
    • To prevent a large batch of keys from expiring at the same time, add a random offset to each key's expiration time (also shown in the sketch below)
  • Cache breakdown:
    • Solve it with locking: when a burst of requests arrives, let exactly one thread query the database, then put the result into the cache so everyone else reads from there
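
A minimal sketch of the first two defenses (null-value caching for penetration, randomized TTLs for avalanche); the key names, TTLs, and dbLookup helper are hypothetical, not from the project:

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public String getWithProtection(String key) {
    String cached = stringRedisTemplate.opsForValue().get(key);
    if (cached != null) {
        // An empty string is our cached "null marker": the key is known not to exist
        return cached.isEmpty() ? null : cached;
    }
    String value = dbLookup(key); // hypothetical database query
    if (value == null) {
        // Penetration defense: cache the miss with a short TTL so repeats skip the DB
        stringRedisTemplate.opsForValue().set(key, "", 60, TimeUnit.SECONDS);
        return null;
    }
    // Avalanche defense: base TTL of one day plus a random offset of up to one hour
    long ttlSeconds = TimeUnit.DAYS.toSeconds(1) + ThreadLocalRandom.current().nextLong(3600);
    stringRedisTemplate.opsForValue().set(key, value, ttlSeconds, TimeUnit.SECONDS);
    return value;
}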

Now take a look at my code (the lock-based fix for cache breakdown):

//Business logic that actually queries the database
private Map<String, List<Catelog2Vo>> getDataFromDb() {

    //After acquiring the lock, check the cache once more; only query if it is still empty
    String catalogJson = stringRedisTemplate.opsForValue().get("catalogJson");
    if (!StringUtils.isEmpty(catalogJson)) {
        //Cache is not empty, return directly
        Map<String, List<Catelog2Vo>> result = JSON.parseObject(catalogJson,
                new TypeReference<Map<String, List<Catelog2Vo>>>() {});
        return result;
    }

    System.out.println("Queried the database");

    /**
     * Collapse multiple database queries into a single one
     */
    List<CategoryEntity> selectList = this.baseMapper.selectList(null);

    //1. Find all categories
    //1.1) Find all level-1 categories
    List<CategoryEntity> level1Categorys = getParent_cid(selectList, 0L);

    //Assemble the data
    Map<String, List<Catelog2Vo>> parentCid = level1Categorys.stream().collect(Collectors.toMap(k -> k.getCatId().toString(), v -> {
        //1. For each level-1 category, find its level-2 categories
        List<CategoryEntity> categoryEntities = getParent_cid(selectList, v.getCatId());
        //2. Wrap the result above
        List<Catelog2Vo> catelog2Vos = null;
        if (categoryEntities != null) {
            catelog2Vos = categoryEntities.stream().map(l2 -> {
                Catelog2Vo catelog2Vo = new Catelog2Vo(v.getCatId().toString(), null, l2.getCatId().toString(), l2.getName().toString());

                //1. Find the level-3 categories of the current level-2 category and wrap them as VOs
                List<CategoryEntity> level3Catelog = getParent_cid(selectList, l2.getCatId());

                if (level3Catelog != null) {
                    List<Catelog2Vo.Category3Vo> category3Vos = level3Catelog.stream().map(l3 -> {
                        //2. Wrap into the target format
                        Catelog2Vo.Category3Vo category3Vo = new Catelog2Vo.Category3Vo(l2.getCatId().toString(), l3.getCatId().toString(), l3.getName());
                        return category3Vo;
                    }).collect(Collectors.toList());
                    catelog2Vo.setCatalog3List(category3Vos);
                }

                return catelog2Vo;
            }).collect(Collectors.toList());
        }

        return catelog2Vos;
    }));

    //3. Put the query result into the cache, converting the object to JSON
    String valueJson = JSON.toJSONString(parentCid);
    stringRedisTemplate.opsForValue().set("catalogJson", valueJson, 1, TimeUnit.DAYS);

    return parentCid;
}

/**
 * Query from the database and assemble the data: local lock
 * @return
 */
public Map<String, List<Catelog2Vo>> getCatalogJsonFromDbWithLocalLock() {

    // //If the cache has it, use the cached value
    // Map<String, List<Catelog2Vo>> catalogJson = (Map<String, List<Catelog2Vo>>) cache.get("catalogJson");
    // if (cache.get("catalogJson") == null) {
    //     //Call the business logic
    //     //Put the returned data back into the cache
    // }

    //As long as every thread uses the same lock object, that lock serializes all of them
    //1. synchronized (this): all Spring Boot components are singletons in the container
    //TODO Local lock: synchronized / JUC Lock; in a distributed deployment, locking every instance requires a distributed lock
    synchronized (this) {
        //After acquiring the lock, check the cache once more; only query if it is still empty
        return getDataFromDb();
    }
}

Looking at the code above, do you think the local lock is problem-free? For a single-instance application it is completely fine, but for a distributed project there is still an issue. The following picture shows it:

[Figure: a clustered deployment in which each service instance takes its own local lock and queries the database independently]

Our project runs as a distributed cluster, so one service is deployed on many instances. Suppose 100,000 requests come in and, after load balancing, each instance receives 10,000 of them. If they all find the cache empty at the same moment, every instance sends a query to the database. With only a few instances this is survivable, but it defeats our intent: we want the database queried exactly once, with all subsequent requests served from Redis. A distributed lock solves this!

How should the distributed lock be designed? A picture makes it easier to understand:
[Figure: a distributed lock — all service instances compete for a single lock held in shared storage]

In layman's terms, everyone goes to the same place to "grab the spot". Whoever grabs it executes the logic; everyone else must wait until the lock is released. The spot can live in Redis, in the database, or in any store that every instance can reach, and waiting can be done by spinning (retrying).

My plan is to keep the lock in Redis. Redis is a natural fit for distributed locks: its commands are enough to build one.

SET key value [EX seconds|PX milliseconds] [NX|XX]
// We can use this form:
SET key value EX <seconds> NX
// i.e., set the lock only when the key does not already exist, together with a TTL
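
In Spring Data Redis this command maps to setIfAbsent; a minimal sketch (assuming Spring Data Redis 2.1+, where the TTL overload exists; key name and TTL are illustrative):

import java.util.UUID;
import java.util.concurrent.TimeUnit;

String uuid = UUID.randomUUID().toString();
// Equivalent to: SET lock <uuid> EX 300 NX — take the lock and its TTL in one atomic step
Boolean locked = stringRedisTemplate.opsForValue().setIfAbsent("lock", uuid, 300, TimeUnit.SECONDS);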

How do we implement this distributed lock? Step by step:

Option One:

[Figure: Option One — take the lock with SET ... NX, run the business logic, then delete the lock]

This design has a problem: a thread acquires the lock, executes the business logic, and just as it is about to delete the lock the server goes down. The lock then exists forever and can never be released: a deadlock.

The fix: set an expiration time, so even if the server crashes and cannot release the lock manually, the lock is released automatically when it expires.

Option Two:

[Figure: Option Two — take the lock first, then set its expiration time in a separate step]

This solves Option One's problem, but creates another: if the server goes down after acquiring the lock but before setting the expiration time, we are deadlocked again.

The fix: make acquiring the lock and setting the expiration time atomic. The SET ... EX ... NX command guarantees exactly that.

Option Three:
[Figure: Option Three — SET key value EX seconds NX takes the lock and its TTL atomically; the lock is then deleted unconditionally]

This makes taking the lock atomic, but should we delete the lock unconditionally? If our business logic runs for a long time, the lock may already have expired and another thread may now hold it; when the first thread finishes and deletes "the lock", it actually deletes someone else's.

The fix: store your own UUID as the lock value. After the business finishes, read the lock and check whether the value is the UUID you set; delete only if it is, otherwise skip it. But the check and the delete must themselves be atomic. Why? Between reading the value and deleting the key there is a window in which the lock can expire and be acquired by someone else; we would still believe the lock is ours and delete it by mistake.
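
The standard way to make that check-and-delete atomic is to run it as a Lua script inside Redis; a minimal sketch with Spring Data Redis (the script is the well-known compare-and-delete idiom, and uuid is the value we stored when taking the lock):

import java.util.Collections;
import org.springframework.data.redis.core.script.DefaultRedisScript;

String script = "if redis.call('get', KEYS[1]) == ARGV[1] "
              + "then return redis.call('del', KEYS[1]) else return 0 end";
// Delete the lock only if it still holds our UUID; Redis runs the script atomically
Long released = stringRedisTemplate.execute(
        new DefaultRedisScript<>(script, Long.class),
        Collections.singletonList("lock"), uuid);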

Option Four:

[Figure: Option Four — atomic lock acquisition (SET with NX, EX, and a UUID value) plus atomic check-and-delete via a Lua script]

Option Four is the final version. In short, both acquiring the lock and deleting it must be atomic!
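
Putting the pieces together, here is a sketch of what the database-query method looks like with the Redis lock; the method name, lock key, TTL, and retry delay are illustrative, and getDataFromDb is the method shown earlier:

public Map<String, List<Catelog2Vo>> getCatalogJsonFromDbWithRedisLock() {
    String uuid = UUID.randomUUID().toString();
    // Take the lock and its TTL in one atomic step (the fix from Option Two)
    Boolean locked = stringRedisTemplate.opsForValue()
            .setIfAbsent("lock", uuid, 300, TimeUnit.SECONDS);
    if (Boolean.TRUE.equals(locked)) {
        try {
            return getDataFromDb();
        } finally {
            // Release only our own lock, atomically (the fix from Option Four)
            String script = "if redis.call('get', KEYS[1]) == ARGV[1] "
                          + "then return redis.call('del', KEYS[1]) else return 0 end";
            stringRedisTemplate.execute(new DefaultRedisScript<>(script, Long.class),
                    Collections.singletonList("lock"), uuid);
        }
    } else {
        // Lock not acquired: back off briefly, then spin and retry
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return getCatalogJsonFromDbWithRedisLock();
    }
}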

Next comes a key question: how do we keep the cached data consistent with the database?

There are two options:

  • Double-write mode
  • Failure (invalidation) mode

Let's draw a picture and analyze the workflow of the double-write mode:

[Figure: double-write mode — every write updates the database and then writes the cache]

And a picture for the workflow of the failure mode:

[Figure: failure mode — every write updates the database and then deletes the cache key, to be repopulated on the next read]

In fact, both schemes can produce inconsistent data. In double-write mode, suppose two write requests arrive one after the other; because of network delays, the cache write of the first request lands after the cache write of the second, so the cache ends up holding stale data rather than the latest value. In failure mode, look at the picture: suppose a reader misses the cache while the second write request is still in flight, reads the old value from the database, and only writes it back into the cache after the second write has completed and deleted the key; again the cache is left holding stale data.

How can we solve the above problems?

Solutions:

  • For user-dimension data (orders, user profiles), the chance of concurrent writes to the same entry is tiny, so there is no need to engineer around inconsistency. Cache the data with an expiration time; each read after expiry triggers a refresh.
  • For base data such as menus and product introductions, you can also use canal to subscribe to the MySQL binlog: whenever the database changes, canal picks up the change event, does whatever processing is needed, and syncs it to Redis.
  • Cached data plus an expiration time is enough to satisfy most business requirements for caching.
  • If writes are somewhat more frequent, we can guarantee consistent concurrent reads and writes by locking: writers queue up and execute in order, while reads do not block each other, i.e., a read-write lock (this suits business data that is not core data, where temporarily reading slightly stale values is acceptable). A sketch follows this list.
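
As a sketch of the read-write-lock idea in a distributed setting, here is one way to do it with Redisson's RReadWriteLock (Redisson is not used elsewhere in this post; it is simply one common client that provides a distributed read-write lock, and the lock name is illustrative):

import java.util.concurrent.TimeUnit;
import org.redisson.api.RReadWriteLock;
import org.redisson.api.RedissonClient;

// Readers share the lock and do not block each other; writers are exclusive and queued
public String readCatalog(RedissonClient redisson) {
    RReadWriteLock rwLock = redisson.getReadWriteLock("catalog-rw-lock");
    rwLock.readLock().lock();
    try {
        return stringRedisTemplate.opsForValue().get("catalogJson");
    } finally {
        rwLock.readLock().unlock();
    }
}

public void writeCatalog(RedissonClient redisson, String valueJson) {
    RReadWriteLock rwLock = redisson.getReadWriteLock("catalog-rw-lock");
    rwLock.writeLock().lock();
    try {
        stringRedisTemplate.opsForValue().set("catalogJson", valueJson, 1, TimeUnit.DAYS);
    } finally {
        rwLock.writeLock().unlock();
    }
}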

Summary

Having said all that, let's summarize!
Data that we put into the cache should not demand strong real-time behavior or strict consistency. So we attach an expiration time whenever we cache data, which guarantees we pick up fresh data periodically. We should not over-design and inflate the system's complexity. For data with genuinely high real-time and consistency requirements, just query the database; if that is slower, so be it.


Original post: blog.csdn.net/MarkusZhang/article/details/107851730