Analysis and solutions for the double-write consistency problem between the Redis cache and the database under high concurrency

1. What is double-write inconsistency between cache and database?

      When data is updated, we need to update not only the database but also the cache. Updating both raises a question: should the database be updated first, or the cache? In single-threaded execution with no exceptions, the cache and the database stay consistent regardless of the order. Under exceptions or concurrent operations, however, double-write inconsistency can occur. This article focuses on how to ensure double-write consistency between the Redis cache and the database under high concurrency.

  • Single-threaded double-write inconsistency example
    This case is easy to solve: do not modify the cache inside the database transaction. Instead of updating the cache, simply delete it: update the database, then delete the corresponding cache. The cache is only written back during query operations, so even if the cache deletion fails, or some other business exception occurs after the deletion succeeds, consistency is not seriously affected in the single-threaded case.

  • High-concurrency double-write inconsistency example
    Generally we update the database first and then delete the cache; when a query misses the cache, it reads the database and writes the result back into the cache. Under concurrent operations, thread T2 may query the database and be just about to write the result into the cache when thread T3 writes new data to the database and deletes the cache. The value T2 then writes into the cache is dirty data.

2. Common solutions to ensure double-write consistency under high concurrency

      For data with a low probability of concurrent modification (such as order data or user data scoped to a single user), this issue hardly needs to be considered, because cache inconsistencies rarely occur. Adding an expiration time to the cached data, so that reads trigger a refresh every so often, is enough. Even under very high concurrency, if the business can tolerate short-lived cache inconsistency (product names, product images, and so on), a cache plus an expiration time still covers most caching requirements.
      In code, we generally update the database first and then delete the cache: the cache is only touched once the database update has succeeded. This should be the approach most people use, and it is the process explained here. On top of it there are several common and workable refinements: delayed double deletion, distributed locks, MQ asynchronous consumption, subscribing to the database change log, and so on. Deleting the cache first and then updating the database has problems of its own and is not discussed here; the methods above are explained in the following sections, starting from the minimal sketch of the basic pattern below.
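      To make this baseline concrete, here is a minimal cache-aside sketch assuming Spring Data Redis's StringRedisTemplate. ProductDao, Product, the Json helper, the product: key prefix and the 30-minute TTL are hypothetical placeholders, not part of the original article.

    import java.time.Duration;

    import org.springframework.data.redis.core.StringRedisTemplate;

    // Minimal cache-aside sketch: the read path repopulates the cache with a
    // TTL; the write path updates the database first, then deletes the cache.
    // ProductDao, Product and Json are hypothetical placeholders.
    public class ProductService {

        private final StringRedisTemplate redis;
        private final ProductDao productDao;

        public ProductService(StringRedisTemplate redis, ProductDao productDao) {
            this.redis = redis;
            this.productDao = productDao;
        }

        public Product getProduct(long id) {
            String key = "product:" + id;
            String cached = redis.opsForValue().get(key);
            if (cached != null) {
                return Json.fromJson(cached, Product.class);       // cache hit
            }
            Product product = productDao.findById(id);             // miss: read the DB
            if (product != null) {
                // The expiration time bounds how long any inconsistency can last.
                redis.opsForValue().set(key, Json.toJson(product), Duration.ofMinutes(30));
            }
            return product;
        }

        public void updateProduct(Product product) {
            productDao.update(product);                  // 1. update the database first
            redis.delete("product:" + product.getId());  // 2. then delete the cache
        }
    }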

2.1. Delayed double deletion (unreliable)

      Delayed double deletion is theoretically unreliable; it can only reduce the occurrence of double-write inconsistency in certain specific situations. For example, suppose three threads start at the same time. Thread T2 queries the database, gets a result, and is about to write it to the cache when thread T1 gains the CPU and executes its database update logic; after the update, T1 deletes the cache with a delay of 0.5 s. During that window T2 performs its cache write, and what it writes is dirty data. After 0.5 s, T1 deletes the cache (ignore thread T3 for now), so when thread T4 later queries and finds nothing in the cache, it re-queries the database and obtains the latest value. This logic is very fragile, however: T3 may be stalled for some reason and only start executing after T4 has already queried the database. T3 then runs its update logic and its cache deletion before T4 writes the previously queried value into the cache; that value is dirty data, and the cache is inconsistent again.
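      A minimal sketch of the delayed deletion step, reusing the hypothetical redis and productDao collaborators from the earlier sketch. The 0.5 s delay matches the example above, and a JDK scheduler stands in for whatever delay mechanism is actually used.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    import org.springframework.data.redis.core.StringRedisTemplate;

    // Delayed double deletion sketch: delete the cache right after the database
    // update, then delete it once more after a short delay to sweep away any
    // dirty value written by a concurrent reader in between.
    public class DelayedDoubleDeleteService {

        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        private final StringRedisTemplate redis;
        private final ProductDao productDao;   // hypothetical DAO, as before

        public DelayedDoubleDeleteService(StringRedisTemplate redis, ProductDao productDao) {
            this.redis = redis;
            this.productDao = productDao;
        }

        public void updateProduct(Product product) {
            String key = "product:" + product.getId();
            productDao.update(product);   // 1. update the database
            redis.delete(key);            // 2. first deletion
            // 3. second deletion after 0.5 s; the window is a guess, which is
            //    exactly why this approach is unreliable in extreme cases
            scheduler.schedule(() -> redis.delete(key), 500, TimeUnit.MILLISECONDS);
        }
    }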


2.2. Distributed read-write lock (reliable)

      If cache inconsistency cannot be tolerated at all, read-write locks can be added to coordinate concurrent reads and writes: read-read is shared, read-write is mutually exclusive, and write-write is mutually exclusive. This guarantees double-write consistency for the business, but the Redis performance overhead is relatively large, since every query request must call Redis for a lock check, and lock hold times must be controlled to avoid long waits and deadlocks. Two implementation processes are given here, each with its own advantages and disadvantages; choose according to the specific business. A sketch covering both follows the two processes.

  • Process 1: if lock acquisition fails, poll and wait until the lock is acquired.


  • Process 2: if lock acquisition fails, execute the query anyway but do not write the result into the cache.
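      A minimal sketch of both processes using Redisson's read-write lock (RReadWriteLock is a real Redisson API; the lock key prefix, the wait and lease times, and the hypothetical ProductDao/Json collaborators from the earlier sketches are assumptions).

    import java.util.concurrent.TimeUnit;

    import org.redisson.api.RLock;
    import org.redisson.api.RReadWriteLock;
    import org.redisson.api.RedissonClient;
    import org.springframework.data.redis.core.StringRedisTemplate;

    // Read-write lock sketch: read-read shared, read-write and write-write
    // mutually exclusive, so a reader can never cache a value a concurrent
    // writer is invalidating. Process 1 blocks until the lock is acquired;
    // process 2 gives up immediately and serves the DB value without caching.
    public class LockedProductService {

        private final RedissonClient redisson;
        private final StringRedisTemplate redis;
        private final ProductDao productDao;   // hypothetical, as before

        public LockedProductService(RedissonClient redisson,
                                    StringRedisTemplate redis,
                                    ProductDao productDao) {
            this.redisson = redisson;
            this.redis = redis;
            this.productDao = productDao;
        }

        public Product getProduct(long id) throws InterruptedException {
            RReadWriteLock rwLock = redisson.getReadWriteLock("lock:product:" + id);
            RLock readLock = rwLock.readLock();
            // Process 2: do not wait for the lock (waitTime = 0, lease 10 s).
            // For process 1, call readLock.lock() instead and block until acquired.
            if (!readLock.tryLock(0, 10, TimeUnit.SECONDS)) {
                return productDao.findById(id);    // serve the DB value, skip caching
            }
            try {
                String cached = redis.opsForValue().get("product:" + id);
                if (cached != null) {
                    return Json.fromJson(cached, Product.class);
                }
                Product product = productDao.findById(id);
                if (product != null) {
                    redis.opsForValue().set("product:" + id, Json.toJson(product));
                }
                return product;
            } finally {
                readLock.unlock();
            }
        }

        public void updateProduct(Product product) {
            RReadWriteLock rwLock =
                    redisson.getReadWriteLock("lock:product:" + product.getId());
            RLock writeLock = rwLock.writeLock();
            writeLock.lock();
            try {
                productDao.update(product);                  // update the DB first
                redis.delete("product:" + product.getId());  // then delete the cache
            } finally {
                writeLock.unlock();
            }
        }
    }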

2.3. MQ asynchronous consumption (unreliable)

      Asynchronous consumption through MQ is similar to delayed double deletion, and likewise it only solves part of the problem. In the example given here, the three threads all run in parallel. After thread T1 updates the database successfully, it sends an MQ message; thread T2 queries the product information and writes it into the cache, writing dirty data (ignore T3 for now). After T2's cache write, the MQ consumer processes the message and deletes the cache, so later queries get the latest data. Now look at thread T3: while it is querying the product information and preparing to write it into the cache, the MQ consumer processes the message and deletes the cache first; T3 then writes to the cache, and the cache holds dirty data once again, the same weakness as delayed double deletion.
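      A minimal sketch of the MQ flow, using Spring Kafka for concreteness (any broker works; the cache-invalidation topic, the consumer group, and the hypothetical collaborators from the earlier sketches are assumptions).

    import org.springframework.data.redis.core.StringRedisTemplate;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.core.KafkaTemplate;

    // MQ asynchronous deletion sketch: after the database update and the first
    // cache deletion, the key is published to MQ so a consumer deletes it once
    // more asynchronously.
    public class MqInvalidationService {

        private final KafkaTemplate<String, String> kafka;
        private final StringRedisTemplate redis;
        private final ProductDao productDao;   // hypothetical, as before

        public MqInvalidationService(KafkaTemplate<String, String> kafka,
                                     StringRedisTemplate redis,
                                     ProductDao productDao) {
            this.kafka = kafka;
            this.redis = redis;
            this.productDao = productDao;
        }

        public void updateProduct(Product product) {
            productDao.update(product);                    // 1. update the database
            redis.delete("product:" + product.getId());    // 2. delete the cache
            // 3. publish the key for a second, asynchronous deletion
            kafka.send("cache-invalidation", String.valueOf(product.getId()));
        }

        @KafkaListener(topics = "cache-invalidation", groupId = "cache-cleaner")
        public void onInvalidation(String productId) {
            redis.delete("product:" + productId);          // second deletion
        }
    }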


2.4. Subscribing to the database change log (unreliable)

      When a piece of data is modified in MySQL, MySQL generates a change log (binlog). We can subscribe to this log to obtain the exact data that was changed, and then delete the corresponding cache entries based on it. The most mature open-source middleware for subscribing to the change log is Alibaba's canal: canal subscribes to the binlog and forwards the data to MQ for the corresponding consumer to process; canal only collects the data and does no business processing. This method is therefore similar to MQ asynchronous consumption: it merely collects the data changes through canal before sending them to MQ, so the same problems as MQ asynchronous consumption remain.
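      A minimal sketch of a canal client that turns binlog row changes into cache deletions (canal's Java client API is real; the server address, the destination "example", the shop.product table filter and the key format are assumptions, and in practice the change is usually forwarded to MQ rather than deleting the cache directly).

    import java.net.InetSocketAddress;

    import com.alibaba.otter.canal.client.CanalConnector;
    import com.alibaba.otter.canal.client.CanalConnectors;
    import com.alibaba.otter.canal.protocol.CanalEntry;
    import com.alibaba.otter.canal.protocol.Message;

    // Canal subscription sketch: pull binlog batches, extract the primary key
    // of each changed row, and invalidate the matching cache entry.
    public class BinlogCacheInvalidator {

        public static void main(String[] args) throws Exception {
            CanalConnector connector = CanalConnectors.newSingleConnector(
                    new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
            connector.connect();
            connector.subscribe("shop\\.product");           // only the product table
            while (true) {
                Message message = connector.getWithoutAck(100);
                long batchId = message.getId();
                if (batchId == -1 || message.getEntries().isEmpty()) {
                    Thread.sleep(1000);                      // nothing new, back off
                    continue;
                }
                for (CanalEntry.Entry entry : message.getEntries()) {
                    if (entry.getEntryType() != CanalEntry.EntryType.ROWDATA) {
                        continue;                            // skip transaction markers
                    }
                    CanalEntry.RowChange change =
                            CanalEntry.RowChange.parseFrom(entry.getStoreValue());
                    for (CanalEntry.RowData row : change.getRowDatasList()) {
                        // for DELETE events the key is in getBeforeColumnsList() instead
                        row.getAfterColumnsList().stream()
                           .filter(CanalEntry.Column::getIsKey)
                           .forEach(col -> System.out.println(
                                   "delete cache key product:" + col.getValue()));
                    }
                }
                connector.ack(batchId);                      // confirm the batch
            }
        }
    }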


3. Summary

      Several solutions have been introduced here. Except for locking, none of them can completely guarantee double-write consistency. There are other solutions that can also ensure consistency, but they must be designed together with the business, so they are not detailed here. Everything above targets read-heavy, write-light scenarios, where adding a cache improves performance. If writes are frequent and reads are few, and cache inconsistency cannot be tolerated, there is no need to add a cache at all; operate on the database directly. Data placed in the cache should be data without strict real-time or consistency requirements. Remember not to pile on heavy over-design and extra control, increasing system complexity, just to use a cache while guaranteeing absolute consistency.

3.1. How to choose a suitable solution

      Different caching strategies have different impacts across application scenarios and requirements, so the appropriate strategy, or combination of strategies, must be chosen according to the specific situation.

3.1.1. Update the database first and then delete the cache (normal consistency)

      Generally speaking, when writing code we should update the database first and delete the cache afterwards; the later solutions are also built on top of this order. Of course, different businesses may differ. For the product information here, the database must be updated first and only then is the cache deleted. This keeps the inconsistency problem to a minimum, suits businesses with low concurrency that read much and write little, and is simple to implement without depending on any other middleware.

3.1.2. Delayed double deletion, MQ asynchronous consumption, subscribing to the database change log (advanced consistency)

      Delayed double deletion, MQ asynchronous consumption, and subscribing to the database change log are really all measures that improve consistency. They are usable under certain conditions, but if extreme situations are considered, problems remain, and they are somewhat more troublesome to implement. MQ asynchronous consumption and change-log subscription also depend on external middleware, which increases system complexity and can lead to unexpected problems.

3.1.3. Distributed read-write lock (ultimate consistency)

      Distributed read-write locks can fully guarantee consistency. If the business has very high data-consistency requirements, consider using them; if the requirements are not that high, solving the problem with locks is not really appropriate. For example, if product information is only used for display and only changes during background editing, then performing a lock check on every query adds a great deal of unnecessary overhead, because every query request must call Redis for the lock. When using distributed read-write locks to solve consistency problems, stress testing is a must, to avoid running short of resources once online.

3.2. Other solutions

      Every solution must be combined with the business, and the optimal solution differs from business to business. Here are some approaches drawn from actual cases.

3.2.1. Manually refresh the cache after product information is updated

      Whatever the solution, a manual refresh is actually quite reliable. The admin backend provides a function for refreshing the product-information cache in batches. When product information changes, if there is any concern that the cache has not picked up the latest product information, or that the data actually shown is old, a manual refresh solves it: simply select the products that need refreshing and refresh them in a batch.
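      A sketch of the batch refresh, written as a method that would live in the hypothetical ProductService from the earlier cache-aside sketch: each selected product is re-read from the database, treated as the source of truth, and the cached value is overwritten or cleared.

    // Batch refresh for the admin feature (method and type names are
    // assumptions): rewrite the cache per product from the database.
    public void refreshProducts(List<Long> productIds) {
        for (Long id : productIds) {
            Product product = productDao.findById(id);
            String key = "product:" + id;
            if (product == null) {
                redis.delete(key);        // the product no longer exists
            } else {
                redis.opsForValue().set(key, Json.toJson(product), Duration.ofMinutes(30));
            }
        }
    }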

3.2.2. Asynchronous cache deletion through delayed messages

      The design principle here actually combines delayed double deletion, MQ asynchronous consumption, and subscribing to the database change log. After our product information is updated, the cache is deleted and a delayed message is delivered, either directly or from the change-log subscription. Suppose execution is delayed by 3 seconds: when the consumer receives the message 3 seconds later, it deletes the cache again, so even if old data was inserted into the cache in the meantime, this delayed consumption removes it within the 3 s window. If extreme cases must be considered, the delay can simply be made longer (5 s, 10 s, 15 s, ...), because MQ asynchronous consumption does not affect the main business flow, product information is not modified very often, and merely refreshing the cache is fast. This solution is entirely feasible; the delay time just has to be controlled according to the business and actual conditions.
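      A sketch of the delayed delivery using RocketMQ's built-in delay levels (RocketMQ is one option among several; the topic name, producer wiring and the choice of level 2, roughly 5 s under the default level table, are assumptions). The consumer side simply deletes the key it receives, as in the MQ sketch above.

    import org.apache.rocketmq.client.producer.DefaultMQProducer;
    import org.apache.rocketmq.common.message.Message;

    // Delayed-message sketch: after the update and the immediate deletion,
    // schedule one more cache deletion a few seconds in the future.
    public class DelayedInvalidationProducer {

        private final DefaultMQProducer producer;

        public DelayedInvalidationProducer(DefaultMQProducer producer) {
            this.producer = producer;
        }

        public void scheduleCacheDelete(long productId) throws Exception {
            Message msg = new Message("cache-invalidation",
                    String.valueOf(productId).getBytes());
            // RocketMQ default delay levels: 1=1s, 2=5s, 3=10s, 4=30s, ...
            msg.setDelayTimeLevel(2);     // deliver roughly 5 s later
            producer.send(msg);
        }
    }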


Origin: blog.csdn.net/weixin_44606481/article/details/134261751