Redis and database synchronization problem

The cache acts as a database

For example, session data, which is accessed very frequently, is well suited to this approach; and since no database is involved at all, there are no consistency problems;

The cache acts as a hotspot cache in front of the database

Read operation

Read operations follow a fixed routine:

  1. When the client requests data, the server first checks its cache; on a hit, the cached value is returned directly;

  2. On a miss, the server queries the database and backfills the result into the cache;

  3. The data is returned to the client;
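The three steps above can be sketched as follows; this is a minimal illustration in which plain dicts stand in for Redis and the database, and the key names are made up:

```python
# Stand-ins for Redis and the database (illustrative only).
cache = {}
database = {"user:1": "alice"}

def read(key):
    # 1. Try the cache first; on a hit, return directly.
    if key in cache:
        return cache[key]
    # 2. On a miss, load from the database and backfill the cache.
    value = database.get(key)
    if value is not None:
        cache[key] = value
    # 3. Return the data to the client.
    return value
```

The first call for a key misses and backfills the cache; subsequent calls are served from the cache.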

Write operation

Various situations can cause the database and the cache to diverge; this is the double-write consistency problem between the cache and the database;

There are currently three caching strategies, namely:

  • Cache Aside update strategy: the application updates both the cache and the database itself;

  • Read/Write Through update strategy: the application updates the cache first, and the cache is responsible for synchronously updating the database;

  • Write Behind Caching update strategy: the application updates the cache first, and the cache updates the database asynchronously, on a schedule;

Each of the three strategies has its own advantages and disadvantages; choose according to the business scenario;

Cache Aside update strategy

The general flow of this strategy: when a request comes in, try the cache first; on a hit, return the cached data directly; on a miss, fetch the data from the database and then backfill it into the cache; the specific flow chart is as follows:


However, this flow runs into problems in some special cases:

Problem 1: update the database first, then update the cache

Under high concurrency, two threads can leave dirty data in the cache:

  1. Thread A performs a write and successfully updates the database;

  2. Thread B performs the same kind of write; before thread A has updated the cache, thread B updates the database with newer data and writes that newer data into the cache;

  3. Thread A, whose cache update lands only after thread B has finished, overwrites the cache with its relatively old data;
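The interleaving above can be replayed deterministically. In this sketch a single function performs the steps in the problematic order; plain dicts stand in for Redis and the database:

```python
cache = {}
database = {}

def interleaved_updates():
    database["k"] = "A"   # 1. thread A updates the database
    database["k"] = "B"   # 2. thread B updates the database with newer data...
    cache["k"] = "B"      #    ...and writes it into the cache first
    cache["k"] = "A"      # 3. thread A's delayed cache update lands last
    return cache["k"], database["k"]
```

The function returns `("A", "B")`: the cache is left holding thread A's stale value while the database holds thread B's newer one.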

Problem 2: delete the cache first, then update the database

Likewise, dirty reads can occur under high concurrency:

  1. Thread A successfully deletes the cache entry and is about to update the database;

  2. Thread B performs a read; because the cache entry has been deleted, thread B fetches the old value from the database and backfills it into the cache;

  3. Thread A updates the database only after thread B's read has completed; the cache is now left holding the old data;
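This race can be replayed the same way, with the steps executed in the problematic order (dicts stand in for Redis and the database):

```python
cache = {"k": "old"}
database = {"k": "old"}

def interleaved_read_write():
    del cache["k"]               # 1. thread A deletes the cache entry
    cache["k"] = database["k"]   # 2. thread B misses and backfills the old DB value
    database["k"] = "new"        # 3. thread A finally updates the database
    return cache["k"], database["k"]
```

The result is `("old", "new")`: the database has the new value, but the cache was repopulated with the old one.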


Problem 3: update the database first, then delete the cache

This is currently the most common approach, although a dirty read is still possible in theory:

  1. Thread A performs a read that just misses the cache, so it queries the database;

  2. Thread B performs a write: before thread A gets its result back from the database, thread B writes new data to the database and successfully deletes the cache entry;

  3. Thread A, finishing only after thread B's entire write, backfills its relatively old data into the cache;

In practice, however, this situation almost never occurs: it requires thread A's read to be slower than thread B's whole write, while reads are usually much faster than writes. To guard against it anyway, cache entries are usually given an expiration time;
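A minimal sketch of this write path, including the expiration-time safety net, might look like the following. Plain dicts stand in for Redis and the database, and entries are stored as (value, expiry) pairs; in real Redis the TTL would be set with the key:

```python
import time

cache = {}
database = {}
TTL = 60  # seconds; the expiry bounds how long any stale entry can survive

def write(key, value):
    database[key] = value    # 1. update the database first
    cache.pop(key, None)     # 2. then delete the cache entry

def read(key):
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]      # unexpired cache hit
    value = database.get(key)
    if value is not None:    # miss: backfill with a fresh expiry
        cache[key] = (value, time.time() + TTL)
    return value
```

Even if a stale value sneaks into the cache, it can live at most `TTL` seconds before the next read refreshes it from the database.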

But what if the cache deletion itself fails? That clearly leads to dirty reads. A workable plan is as follows:

  1. Set an expiration time on cache entries (this is mandatory);

  2. Provide a retry safety net: keys whose deletion failed are published to a message queue for consumption;

  3. A consumer takes these keys from the queue and retries the deletion; keys that fail again are re-enqueued, and once a key exceeds a certain number of attempts, manual intervention is required;
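The retry mechanism above can be sketched like this; a `deque` stands in for a real message queue, and the `fail` flag simulates a delete failure (e.g. Redis being unavailable):

```python
from collections import deque

MAX_RETRIES = 3
retry_queue = deque()   # stand-in for a real message queue
escalated = []          # keys handed over for manual intervention

def delete_cache(cache, key, fail=False):
    # 'fail' simulates an unreachable cache server.
    if fail:
        raise RuntimeError("delete failed")
    cache.pop(key, None)

def safe_delete(cache, key, fail=False):
    """Try to delete; on failure, publish the key to the retry queue."""
    try:
        delete_cache(cache, key, fail)
    except RuntimeError:
        retry_queue.append((key, 1))

def consume_retries(cache, fail=False):
    """Consumer: retry deletes, re-enqueue failures, escalate after a limit."""
    while retry_queue:
        key, attempts = retry_queue.popleft()
        try:
            delete_cache(cache, key, fail)
        except RuntimeError:
            if attempts >= MAX_RETRIES:
                escalated.append(key)   # manual intervention needed
            else:
                retry_queue.append((key, attempts + 1))
```

When every delete fails, the key cycles through the queue until it has been attempted `MAX_RETRIES` times, then lands in `escalated` for a human to handle.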


However, the scheme above has to be wired into the business code; it clearly ought to be decoupled from it;

Our company currently uses such a decoupled solution. The specific flow: when database data is updated, the change is recorded in the binlog; the open-source tool canal parses the binlog into a form the application language can consume; a subscriber program receives the parsed change and attempts to delete the corresponding cache entry. If the deletion fails, the key is pushed to a message queue for repeated consumption, and once the deletions for a key have failed a certain number of times, manual intervention is still required.
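A hedged sketch of the subscriber described above: it receives a parsed change event and invalidates the matching cache entry, enqueuing failures for the retry consumer. The event shape (table name plus primary key) and the `table:pk` cache-key scheme are illustrative assumptions, not canal's actual wire format:

```python
def handle_binlog_event(event, cache, retry_queue):
    # Illustrative cache-key scheme; real key layouts vary per application.
    key = f"{event['table']}:{event['pk']}"
    try:
        del cache[key]              # invalidate the now-stale entry
    except KeyError:
        pass                        # nothing cached; nothing to do
    except Exception:
        retry_queue.append(key)     # hand off to the retry consumer
```

With a real Redis client the generic `except Exception` branch would catch connection errors; with the dict used here it never fires.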


Read/Write Through update strategy

In this mode, the application only maintains the cache; keeping the database in sync is delegated to the cache itself;

The strategy comes in two specific variants:

  1. Read Through: the cache is populated from the database during the query;

  2. Write Through: if the write hits the cache, the cache is updated directly, and the cache itself then updates the database;
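Both variants can be sketched with a single class that owns the synchronization; a dict stands in for the database, and the class name is made up for illustration:

```python
class WriteThroughCache:
    """Sketch of Read/Write Through: the application talks only to this
    object, and the cache itself keeps the database (a dict here) in sync."""

    def __init__(self, database):
        self._db = database
        self._data = {}

    def get(self, key):
        # Read Through: on a miss, the cache loads from the database itself.
        if key not in self._data:
            self._data[key] = self._db.get(key)
        return self._data[key]

    def put(self, key, value):
        # Write Through: update the cache, then synchronously the database.
        self._data[key] = value
        self._db[key] = value
```

The application never touches `database` directly; every `put` leaves cache and database consistent.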


Write Behind Caching update strategy

This strategy updates only the cache and does not touch the database immediately; the database is written asynchronously, in batches, at some later point. The benefits: operating directly on the cache is extremely fast, and because the database writes are asynchronous, multiple statements can be merged into a single transaction and committed together, so the throughput is considerable;

However, this strategy cannot provide strong consistency, and its implementation is relatively complex, because it must track which entries still need to be flushed to the database and which are meant to live only in the cache;
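The dirty-key tracking just described can be sketched as follows; a dict stands in for the database, and the class name is illustrative:

```python
class WriteBehindCache:
    """Sketch of Write Behind: writes touch only the cache, and dirty keys
    are flushed to the database (a dict here) in one asynchronous batch."""

    def __init__(self, database):
        self._db = database
        self._data = {}
        self._dirty = set()   # keys not yet written to the database

    def put(self, key, value):
        self._data[key] = value   # only the cache is updated here
        self._dirty.add(key)

    def flush(self):
        # Called periodically or asynchronously; in a real system the
        # batch would be committed as a single database transaction.
        for key in self._dirty:
            self._db[key] = self._data[key]
        self._dirty.clear()
```

Between `put` and `flush` the database is stale, which is exactly the window in which this strategy gives up strong consistency.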

Comparison

In practice, the first strategy is the usual choice, updating the database first and then deleting the cache; the other two are more complicated to implement;

One last point: caching trades strong consistency for performance, so a certain delay is inevitable; we only need to guarantee eventual consistency of the data;


Origin blog.csdn.net/qq_36802726/article/details/105687698