8 Diagrams to Help You Analyze Data Consistency Between Redis and MySQL

Preface

Originally published on the WeChat official account: bigsai

For a web application, growth in users and traffic drives changes and progress in the project's technology and architecture. You may find yourself in one of the following situations:

  1. Page concurrency and traffic are low, and MySQL alone is enough to support the business logic. No caching is really needed; at most, static pages can be cached.
  2. Page concurrency has grown noticeably, the database is under some pressure, and some rarely-updated data is queried repeatedly, or some queries are slow. Then you can consider caching: store the frequently hit objects in Redis as key-value pairs, so that on a hit the request skips the inefficient database and fetches the data from fast Redis instead.
  3. You may also run into other problems; static page caching, CDN acceleration, and even load balancing can also raise the system's concurrency capacity. They are not covered here.


Caching ideas are everywhere

Let's start building an intuition for caching with an algorithm problem.

Question 1:

  • Given an integer n (n < 20), compute n!.

Analysis 1:

  • Consider only the algorithm; ignore overflow.
    We know that n! = n * (n-1) * (n-2) * ... * 1 = n * (n-1)!,
    so the problem can be solved with a recursive function.
// Recursive factorial ("jiecheng" is the pinyin for factorial)
static long jiecheng(int n)
{
    if (n == 1 || n == 0) return 1;
    return n * jiecheng(n - 1);
}

This way, every input request requires O(n) multiplications.
Question 2:

  • Given t groups of input (possibly hundreds or thousands), each containing one number xi (xi < 20), compute xi!.

Analysis 2:

  • If you use recursion and input t groups of data, the total number of multiplications is roughly x1 + x2 + ... + xt.
    When xi or t is large, this becomes a heavy burden: the time complexity is O(t * n).
  • So can we change our thinking? Yes: precompute a lookup table. Tabulation is often used in ACM-style competitive programming, typically to handle multiple sets of input and output, or to store graph-search results and paths. For factorials we only need one array: fill it from front to back with the required values, then simply read array entries when answering queries. The idea is straightforward:
import java.util.Scanner;

public class Test {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        int t = sc.nextInt();
        // Precompute 0! through 20! once (20! still fits in a long)
        long[] jiecheng = new long[21];
        jiecheng[0] = 1;
        for (int i = 1; i < 21; i++) {
            jiecheng[i] = jiecheng[i - 1] * i;
        }
        // Each query is now a single array lookup
        for (int i = 0; i < t; i++) {
            int x = sc.nextInt();
            System.out.println(jiecheng[x]);
        }
    }
}
  • The precomputation is only O(n), and the idea here is similar to caching: compute the data once and store it in the jiecheng[21] array, so that every later access is just a read of a static array value, an O(1) operation.

Application scenarios of caching

Caching suits high-concurrency scenarios and increases a service's capacity. The core idea is to move frequently accessed or expensive-to-query data from slow media to faster media, for example from disk to memory. Most relational databases read and write from disk, so their efficiency and resources are limited, whereas Redis is memory-based and its read/write speed is in a different league. When a relational database hits its performance ceiling under high concurrency, you can strategically place frequently accessed data in Redis to increase system throughput and concurrency.

For common websites and scenarios, a relational database is typically slow in two places:

  • read/write disk IO performance is poor
  • producing a result may require a relatively large amount of computation

Using a cache therefore reduces both the number of disk IOs and the amount of computation in the relational database. The cache's fast reads also come from two aspects:

  • it is memory-based, so reads and writes are faster
  • a hash lookup locates the result directly, without recomputation

So for any site of reasonable size, caching is necessary, and Redis is undoubtedly one of the best choices.


Things to pay attention to

Improper use of caching can cause many problems, so some details need careful thought and design. The thorniest of these, data consistency, is analyzed separately below.

Whether to use a cache

A project should not use a cache just for the sake of caching; caching does not suit every scenario. If the data requires extremely strong consistency, or changes frequently but is queried rarely, or there is simply no concurrency pressure, then caching is not necessarily needed. It may even waste resources, make the project bloated and hard to maintain, and introduce the data consistency issues that come with a Redis cache.

Reasonable cache design

When designing a cache you will very likely run into multi-table queries. If a cached key-value pair is built from a multi-table query, consider carefully whether to split it or keep it combined. If there are many possible combinations but only a few appear frequently, those can be cached directly. The concrete design should follow the project's business needs; there is no absolute standard.

Expiration strategy selection

  • The cache holds relatively hot, commonly used data, and Redis resources are limited, so you need a reasonable strategy for removing expired entries. From operating systems courses we know page-replacement algorithms such as first-in-first-out (FIFO), least recently used (LRU), optimal replacement (OPT), and least frequently used (LFU); these can also inform a Redis cache design. A time-based FIFO is the easiest to implement, and Redis supports per-key expiration (TTL) out of the box.
  • The expiration time should be set according to the system's situation; with better hardware it can be somewhat longer. Neither too long nor too short is good: too short and the cache hit rate suffers, too long and a lot of cold data sits in Redis without being released.
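To make per-key expiration concrete, here is a minimal sketch of the lazy-expiry idea, assuming a plain in-memory map standing in for Redis; the class and method names are made up for illustration, and in production you would simply use Redis's own SETEX/EXPIRE commands:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of per-key TTL with lazy expiry. The maps below stand in
// for Redis storage (illustrative assumption only).
class TtlCache {
    static final Map<String, String> values = new ConcurrentHashMap<>();
    static final Map<String, Long> deadlines = new ConcurrentHashMap<>();

    // Store a value that expires ttlMillis from now (like Redis SETEX)
    static void setex(String key, long ttlMillis, String value) {
        values.put(key, value);
        deadlines.put(key, System.currentTimeMillis() + ttlMillis);
    }

    // Return the value, or null if missing or expired (lazy deletion:
    // the expired entry is removed only when it is touched)
    static String get(String key) {
        Long deadline = deadlines.get(key);
        if (deadline != null && System.currentTimeMillis() > deadline) {
            values.remove(key);
            deadlines.remove(key);
            return null;
        }
        return values.get(key);
    }
}
```

Real Redis combines this lazy deletion with periodic sampling of expired keys, so memory is reclaimed even for keys that are never read again.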

Data consistency issues ★

Data consistency was already mentioned above: if the consistency requirements are extremely strict, caching is not recommended. Let's now sort through the cached-data scenarios.
Data consistency problems come up constantly with a Redis cache. For a given cache, several situations are listed below:

Read

Read: read from Redis first. If the key is not in Redis, read from MySQL and backfill the Redis cache.
The flowchart below describes this common scenario, and it is uncontroversial:

[flowchart]
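The read path above can be sketched as follows. The two maps are stand-ins for Redis and MySQL (an assumption for illustration); in a real system `cache` would be a Redis client and `db` a data-access object:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the cache-aside read path: try Redis, fall back to MySQL,
// then backfill the cache.
class CacheAsideRead {
    static final Map<String, String> cache = new ConcurrentHashMap<>(); // "Redis"
    static final Map<String, String> db = new ConcurrentHashMap<>();    // "MySQL"

    static String read(String key) {
        String value = cache.get(key);   // 1. try the cache first
        if (value == null) {
            value = db.get(key);         // 2. cache miss: read from the DB
            if (value != null) {
                cache.put(key, value);   // 3. backfill the cache for next time
            }
        }
        return value;
    }
}
```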

Write 1: update the database first, then update the cache (normal, low concurrency)

[flowchart]

Update the record in the database, then update the Redis cache. This is a common practice: the cache is derived from the database and takes its values from it.

However, problems can arise. For example, if updating the cache fails (a crash or similar), the database and Redis become inconsistent: the DB holds the new data while the cache still holds the old data.
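A minimal sketch of Write 1, again using maps as stand-ins for MySQL and Redis (illustrative assumption only); the comment marks the window where a crash produces the inconsistency described above:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Write 1: update MySQL first, then refresh the cache.
class WriteThenRefresh {
    static final Map<String, String> db = new ConcurrentHashMap<>();    // "MySQL"
    static final Map<String, String> cache = new ConcurrentHashMap<>(); // "Redis"

    static void write(String key, String value) {
        db.put(key, value);     // 1. update the database
        // If the process dies here, the DB has new data but the cache
        // keeps serving the old value until it expires.
        cache.put(key, value);  // 2. update the cache
    }
}
```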

Write 2: delete the cache first, then write to the database (low-concurrency optimization)

[flowchart]

Problem it solves

This approach effectively avoids the failed-cache-write problem of Write 1: delete the cache instead of updating it. Ideally the next read finds Redis empty, fetches the latest value from MySQL, and backfills the cache. However, this only works in low-concurrency scenarios and does not suit high concurrency.
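Write 2 can be sketched like this, with maps standing in for Redis and MySQL (illustrative assumption only); note that the write path never puts anything into the cache, leaving the backfill to the next read:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Write 2: invalidate the cache first, then write the database.
class DeleteThenWrite {
    static final Map<String, String> cache = new ConcurrentHashMap<>(); // "Redis"
    static final Map<String, String> db = new ConcurrentHashMap<>();    // "MySQL"

    static void write(String key, String value) {
        cache.remove(key);  // 1. delete the cache entry first
        db.put(key, value); // 2. then write the database; the next read
                            //    lazily backfills the cache from the DB
    }
}
```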

Remaining problem

Write 2 seems to solve the failed-Redis-write problem and looks like the better scheme, but it still breaks under high concurrency. In Write 1 we saw that a successful DB update plus a failed cache update leaves dirty data. The ideal in Write 2 is that the next thread to read will repopulate the cache with fresh data. The question is: what if that next thread arrives too early, at exactly the wrong moment?

[flowchart]

With multiple threads you cannot know who runs first or who is faster. As the figure above shows, the Redis cache and MySQL can still diverge. You could lock the key, but a lock is a heavyweight tool with too much impact on concurrency, so avoid it if you can. Under high concurrency this scheme can still leave old data in the cache and new data in the DB, and if the cache entry never expires, the inconsistency persists indefinitely.

Write 3: delayed double delete strategy

[flowchart]

This is the delayed double delete strategy. It alleviates the inconsistency between the Redis cache and MySQL caused by a read thread slipping in while MySQL is being updated in Write 2. The procedure: delete the cache -> update the database -> after a delay (a few hundred ms, asynchronously) delete the cache again. Even if the Write 2 race occurs in the middle of the update and causes an inconsistency, the delayed second delete (the exact delay depends on the business, usually several hundred ms) resolves it quickly.
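The strategy can be sketched as follows, with maps standing in for Redis and MySQL (illustrative assumption only) and a scheduler performing the asynchronous second delete:

```java
import java.util.Map;
import java.util.concurrent.*;

// Delayed double delete: delete cache, update DB, delete cache again later.
class DelayedDoubleDelete {
    static final Map<String, String> cache = new ConcurrentHashMap<>(); // "Redis"
    static final Map<String, String> db = new ConcurrentHashMap<>();    // "MySQL"
    static final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    static ScheduledFuture<?> write(String key, String value, long delayMillis) {
        cache.remove(key);   // 1. first delete
        db.put(key, value);  // 2. update MySQL
        // 3. delete again after a delay, wiping any stale value that a
        //    concurrent reader may have backfilled in between
        return scheduler.schedule(() -> cache.remove(key),
                                  delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

Returning the `ScheduledFuture` lets callers observe when the second delete has run; in production code the scheduler would be shared and sized for the workload.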

But this scheme still has loopholes: the second delete may fail, and MySQL still takes read pressure under high concurrency, among other things. You can of course use a message queue such as MQ to handle these steps asynchronously. In practice it is hard for any solution to be completely foolproof, which is why even experts get criticized for small design flaws. As a novice author I will not embarrass myself further here; everyone is welcome to contribute ideas.

Write 4: operate the cache directly, write to SQL periodically (suits high concurrency)

When a burst of concurrent writes comes in, even the previous schemes with asynchronous message queues struggle to give users a smooth experience, and large volumes of SQL operations put heavy pressure on the system. So another approach is to operate the cache directly and periodically flush the cache to SQL. Redis, a memory-based non-relational key-value store, is far faster than a traditional relational database.

[flowchart]

This design suits high-concurrency business. Here the Redis data is primary and the MySQL data is secondary, written periodically (like a backup store). Of course, such high-concurrency scenarios often have different requirements around the business, ordering, and so on, and may need a message queue to absorb the uncertainty and inconsistency brought by high concurrency and multithreading, improving the reliability of the business.
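A minimal write-behind sketch, with maps standing in for Redis and MySQL (illustrative assumption only); in production `flush()` would run on a scheduler or be fed by a message queue:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Write 4: writes hit only the cache; dirty keys are persisted in batches.
class WriteBehind {
    static final Map<String, String> cache = new ConcurrentHashMap<>(); // "Redis"
    static final Map<String, String> db = new ConcurrentHashMap<>();    // "MySQL"
    static final Set<String> dirty = ConcurrentHashMap.newKeySet();

    static void write(String key, String value) {
        cache.put(key, value); // fast path: memory only
        dirty.add(key);        // remember what still needs persisting
    }

    // Periodic flush: persist the latest cached value of every dirty key.
    static void flush() {
        for (String key : dirty) {
            db.put(key, cache.get(key));
            dirty.remove(key);
        }
    }
}
```

Note the trade-off: data written since the last flush exists only in memory, so a Redis crash loses it; this is the price paid for absorbing write bursts.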

In short, the higher the concurrency and the stricter the data consistency requirements, the more complex the design becomes and the more it has to take into account. The above is the author's own study of (and free-wheeling digression on) Redis data consistency problems; if anything is explained unreasonably, corrections are welcome!

Finally, if you found this useful, please like, comment, and share. You are welcome to follow the original WeChat official account "bigsai": besides knowledge and practical content, plenty of advanced study material has been prepared for you; just reply with the password "bigsai" to receive it!


Source: blog.51cto.com/14983647/2548012