Common solutions to the problem of consistency between Redis and the database

Reprinted from: https://blog.csdn.net/diweikang/article/details/94406186

Caching is widely used in projects. Reading from the cache raises no debate: business operations follow the process shown in the figure below, checking the cache first and, on a miss, loading from the database and backfilling the cache.

Updating the cache, however, is controversial. After updating the database, should you update the cache or delete it? Or should you delete the cache first and then update the database? This post draws on material from around the web to work through the question.

Let me note one thing up front. In theory, setting an expiration time on cached entries is itself a way to guarantee eventual consistency. Under that scheme, every cached entry gets an expiration time, all writes go to the database, and cache operations are only best-effort. That is, even if the database write succeeds and the cache update fails, once the entry expires, subsequent reads naturally fetch the new value from the database and backfill the cache. The strategies discussed below therefore do not rely on cache expiration.

Here, we discuss three update strategies:

  1. Update the database first, then update the cache
  2. Delete the cache first, then update the database
  3. Update the database first, then delete the cache

 

1. Update the database first, then update the cache

This scheme is generally rejected. Why? Two reasons.

Reason one (thread safety perspective)

Suppose requests A and B both perform update operations. The following interleaving can occur:

(1) Thread A updates the database
(2) Thread B updates the database
(3) Thread B updates the cache
(4) Thread A updates the cache

Request A's cache update was issued before request B's, but because of network delays and similar factors, B's update reaches the cache first, and A's lands last. The cache is left holding A's now-stale value (dirty data), so this scheme is ruled out.
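The interleaving above can be reproduced deterministically with plain Python dicts standing in for the database and the cache (the names db and cache and the values "A" and "B" are illustrative assumptions, not from the original post):

```python
# In-memory stand-ins for the database and the cache.
db = {}
cache = {}

# Requests A and B both update key "k"; B's cache write
# overtakes A's due to (simulated) network delay.
db["k"] = "A"         # (1) A updates the database
db["k"] = "B"         # (2) B updates the database
cache["k"] = "B"      # (3) B updates the cache
cache["k"] = "A"      # (4) A's delayed cache update lands last

# The database holds B's value; the cache holds A's stale value.
print(db["k"], cache["k"])  # B A
```

The final state is exactly the dirty-data outcome described above: the cache disagrees with the database until something overwrites or evicts the entry.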


Reason two (business scenario perspective)

There are two points:

(1) If your business writes to the database frequently but reads the data rarely, this scheme refreshes the cache on every write even though the value may never be read before the next update, which wastes performance.

(2) If the value written to the cache is not the raw database value but the result of a series of complex computations, then recomputing the cached value on every database write is also a waste. Deleting the cache is clearly the better fit.

The real controversy is between the next two strategies: delete the cache first and then update the database, or update the database first and then delete the cache.

 

2. Delete the cache first, then update the database

This scheme produces inconsistency as follows. Suppose request A performs an update while request B performs a query concurrently:

(1) Request A performs a write and deletes the cache
(2) Request B queries and finds the cache empty
(3) Request B queries the database and gets the old value
(4) Request B writes the old value into the cache
(5) Request A writes the new value to the database

This interleaving leaves the cache inconsistent with the database. Worse, if no expiration time is set on the cache, the data stays dirty indefinitely.
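A minimal sketch of this interleaving, again using dicts as stand-ins for Redis and the database (the names and the values "old" and "new" are illustrative):

```python
db = {"k": "old"}
cache = {"k": "old"}

# (1) Request A deletes the cache before writing the database.
del cache["k"]
# (2)(3) Request B misses the cache and reads the old value from the DB.
value = cache.get("k") or db["k"]
# (4) Request B backfills the cache with the old value.
cache["k"] = value
# (5) Request A's database write lands last.
db["k"] = "new"

print(db["k"], cache["k"])  # new old
```

With no expiration time set, nothing ever corrects the stale "old" entry in the cache.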

So, how do we solve it? With the delayed double-delete strategy.

The pseudocode is as follows:

    public void write(String key, Object data) {
        redis.delKey(key);      // delete the cache first
        db.updateData(data);    // then write the database
        Thread.sleep(1000);     // wait for in-flight reads to finish
        redis.delKey(key);      // delete the cache again
    }

In plain words:

(1) Delete the cache first
(2) Then write the database (these two steps are the same as before)
(3) Sleep for one second, then delete the cache again

This way, any dirty data written into the cache during that one second is deleted.

So how is this one second determined? How long should we actually sleep?

For this, you should measure how long your own project's read-path business logic takes. Then set the write path's sleep time to that read duration plus a few hundred milliseconds. The goal is to ensure the read request has finished, so that the write request can delete any dirty cache data the read request produced.


What if MySQL runs in a read-write separated (master-slave) architecture?

OK, in that case the inconsistency arises as follows. Again there are two requests: request A performs an update and request B performs a query.

(1) Request A performs a write and deletes the cache
(2) Request A writes the data to the master database
(3) Request B queries the cache and finds no value
(4) Request B queries the database; master-slave replication has not yet finished, so it reads the old value from the slave
(5) Request B writes the old value into the cache
(6) Master-slave replication completes, and the slave now holds the new value

This interleaving is what causes the inconsistency. The delayed double-delete strategy still applies, but the sleep time becomes the master-slave replication lag plus a few hundred milliseconds.


What if this synchronous delete strategy hurts throughput?

Then perform the second delete asynchronously: start a separate thread that carries out the delayed delete, so the write request can return immediately instead of sleeping. This restores throughput.
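A minimal sketch of the asynchronous second delete, with an in-memory dict in place of Redis and threading.Timer scheduling the delayed delete (the names write and del_key are illustrative, and the one-second delay is shortened to 0.1 s so the example runs quickly):

```python
import threading

cache = {"k": "stale"}
db = {}

def del_key(key):
    cache.pop(key, None)

def write(key, data, delay=0.1):
    """Delayed double delete, with the second delete done asynchronously."""
    del_key(key)              # first delete
    db[key] = data            # database write
    # The second delete runs on a background timer thread, so write()
    # returns immediately instead of sleeping.
    t = threading.Timer(delay, del_key, args=(key,))
    t.start()
    return t

t = write("k", "new")
t.join()  # wait here only so the example can verify the final state
print("k" in cache)  # False
```

In a real service the caller would not join the timer; the write returns at once and the delayed delete happens in the background.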


What if the second deletion fails?

This is a very good question, because if the second delete fails, the following can happen. Again, request A performs an update and request B performs a query; for simplicity, assume a single database (no read-write separation):

(1) Request A performs a write and deletes the cache
(2) Request B queries and finds the cache empty
(3) Request B queries the database and gets the old value
(4) Request B writes the old value into the cache
(5) Request A writes the new value to the database
(6) Request A tries to delete the cache entry that request B wrote, but the delete fails

In other words, if the second cache delete fails, the cache and the database are inconsistent again.


How to solve it?

For the concrete solution, see the analysis under update strategy (3) below.

 

3. Update the database first, then delete the cache

First, some background. There is a well-known cache update pattern called the "Cache-Aside pattern", which specifies:

  • Miss: the application first reads the cache; if the data is not there, it reads the database and, on success, puts the value into the cache.
  • Hit: the application reads the data from the cache and returns it.
  • Update: write the data to the database first, and after that succeeds, invalidate the cache entry.
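The three rules above can be sketched as follows, with dicts standing in for the database and the cache (read and update are hypothetical helper names, not part of the pattern's terminology):

```python
db = {"user:1": "alice"}
cache = {}

def read(key):
    # Hit: return directly from the cache.
    if key in cache:
        return cache[key]
    # Miss: read from the database, then backfill the cache.
    value = db[key]
    cache[key] = value
    return value

def update(key, value):
    # Write the database first, then invalidate the cache entry.
    db[key] = value
    cache.pop(key, None)

print(read("user:1"))        # alice (miss, then backfilled)
update("user:1", "bob")      # DB updated, cache entry invalidated
print(read("user:1"))        # bob (miss again, fresh value backfilled)
```

Note that update invalidates rather than refreshes the cache, which is exactly the "update the database first, then delete the cache" strategy under discussion.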

In addition, Facebook's paper "Scaling Memcache at Facebook" describes the same strategy: update the database first, then delete the cache.


Does this scheme really have no concurrency problems?

No. Suppose request A performs a query and request B performs an update. The following can occur:

(1) The cache entry has just expired
(2) Request A queries the database and gets the old value
(3) Request B writes the new value to the database
(4) Request B deletes the cache
(5) Request A writes the old value it read into the cache

If this interleaving occurs, the cache does end up with dirty data.
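This interleaving, too, can be replayed deterministically with dicts as stand-ins (the names and the values "old" and "new" are illustrative):

```python
db = {"k": "old"}
cache = {}  # (1) the cache entry has just expired

# (2) Request A reads the old value from the database.
a_value = db["k"]
# (3) Request B writes the new value to the database.
db["k"] = "new"
# (4) Request B deletes the cache (already empty here, so a no-op).
cache.pop("k", None)
# (5) Request A's backfill lands after B's delete.
cache["k"] = a_value

print(db["k"], cache["k"])  # new old
```

The dirty entry appears only because A's backfill in step (5) outlives B's delete in step (4), which is the timing condition examined next.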


However, what is the probability of this happening?

This interleaving has a built-in precondition: the database write in step (3) must take less time than the database read in step (2), so that step (4) can complete before step (5). But think about it: database reads are generally much faster than writes (that is part of why read-write separation pays off at all; reads are faster and consume fewer resources), so step (3) finishing before step (2) is hard to hit in practice.

But suppose someone insists on being pedantic about even this small window. What then?


How to solve the above concurrency problem?

First, setting an expiration time on the cache is one safeguard. Second, apply the asynchronous delayed-delete approach from strategy (2) to ensure the delete runs only after the read request has finished.


Are there other reasons for the inconsistency?

Yes. This problem affects both update strategy (2) and update strategy (3): what happens if the cache delete itself fails? For example, a write request updates the database, but the cache delete fails; the cache is then left inconsistent. This is also the question left open at the end of strategy (2).


How to solve?

Provide a reliable retry mechanism. Here are two solutions.

Option One:

The process is as follows

(1) Update the database
(2) The cache delete fails for some reason
(3) Send the key that needs deleting to a message queue
(4) Consume the message and retrieve the key to delete
(5) Keep retrying the delete until it succeeds

However, this scheme has a drawback: it intrudes heavily on business code. Hence the second scheme, in which a separate subscriber program subscribes to the database's binlog to extract the keys that need attention, and another non-business component receives that information and performs the cache deletes.


Option II:

The process is shown in the figure below:

(1) Update the database
(2) The database writes the operation to its binlog
(3) The subscriber program extracts the affected data and keys from the binlog
(4) A separate piece of non-business code receives this information
(5) It attempts the cache delete and finds that the delete fails
(6) It sends the information to a message queue
(7) The key is obtained again from the message queue and the delete is retried

Remarks: for the binlog subscription program, MySQL has ready-made middleware called canal that handles subscribing to the binlog. For Oracle, I am not currently aware of an off-the-shelf equivalent. As for the retry mechanism, I used a message queue here; if your consistency requirements are not strict, simply starting a thread inside the application that retries periodically is enough. Use these ideas flexibly; they are offered as a starting point, not a prescription.

 

References:

https://blog.csdn.net/hukaijun/article/details/81010475

https://my.oschina.net/jiagouzhan/blog/2990423
