How to ensure database and cache consistency

Caching is introduced to improve query performance, but under high concurrency, updates can leave the cache and the database inconsistent.

Consistency issues caused by concurrency

Suppose we adopt the scheme of "update the database first, then update the cache", and assume both steps always execute successfully. What happens under concurrency?

Suppose two threads, A and B, both need to update the "same piece" of data. This scenario can happen:

1. Thread A updates the database (X = 1)
2. Thread B updates the database (X = 2)
3. Thread B updates the cache (X = 2)
4. Thread A updates the cache (X = 1)
The final value of X is 1 in the cache and 2 in the database: an inconsistency.

That is to say, although A starts before B, B's database and cache operations take less time than A's, so the execution order is "scrambled" and the final state of this data does not match expectations.
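The interleaving above can be replayed deterministically. In the sketch below, plain Python dicts stand in for a hypothetical database and cache; executing the four steps in exactly that order ends in the inconsistent state:

```python
# Deterministic replay of the race: both threads "succeed", yet the
# cache and the database end up disagreeing. The dicts are stand-ins
# for a real database and cache, not a specific library.

database = {}
cache = {}

database["X"] = 1   # step 1: thread A updates the database (X = 1)
database["X"] = 2   # step 2: thread B updates the database (X = 2)
cache["X"] = 2      # step 3: thread B updates the cache (X = 2)
cache["X"] = 1      # step 4: thread A's slower cache write lands last (X = 1)

print(database["X"], cache["X"])  # database holds 2, cache holds 1
```

Even though every individual write succeeded, the last cache write carried stale data, which is exactly the inconsistency described above.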

Similarly, if you adopt the scheme of "update the cache first, then update the database", there will be similar problems.

Besides concurrency, this scheme also fares poorly from the perspective of "cache utilization".

This is because the cache is updated "blindly" on every data change, yet the cached value may not be read any time soon. As a result, a lot of infrequently accessed data piles up in the cache, wasting cache resources.

Moreover, in many cases the value written to the cache does not correspond one-to-one with a value in the database: often the database is queried first, the result goes through a series of "computations", and only then is the derived value written into the cache.

It can be seen that this "update database + update cache" scheme not only uses the cache inefficiently but also wastes machine resources.

So we need to consider another approach: deleting the cache.

Does deleting the cache guarantee consistency?

There are also two options for deleting the cache:

1. Delete the cache first, then update the database
2. Update the database first, then delete the cache

Here we focus on the "concurrency" issue.

1) Delete the cache first, then update the database

If two threads want to "read and write" data concurrently, the following scenarios may occur:

1. Thread A wants to update X = 2 (original value X = 1)
2. Thread A deletes the cache first
3. Thread B reads the cache, finds it missing, and reads the old value from the database (X = 1)
4. Thread A writes the new value to the database (X = 2)
5. Thread B writes the old value into the cache (X = 1)

The final value of X is 1 (old value) in the cache and 2 (new value) in the database: an inconsistency.

It can be seen that with "delete the cache first, then update the database", data inconsistency can still occur under "read + write" concurrency.

2) Update the database first, then delete the cache

Again, two threads "read and write" the same data concurrently:

1. X does not exist in the cache (database X = 1)
2. Thread A reads the database and gets the old value (X = 1)
3. Thread B updates the database (X = 2)
4. Thread B deletes the cache
5. Thread A writes the old value into the cache (X = 1)

The final value of X is 1 (old value) in the cache and 2 (new value) in the database: again an inconsistency.
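A minimal sketch of this interleaving, again with plain dicts as hypothetical stand-ins for the database and cache:

```python
# Replay of the "update the database first, then delete the cache" race.
# The dicts stand in for a real database and cache.

database = {"X": 1}
cache = {}                         # step 1: X has just expired from the cache

a_read = database["X"]             # step 2: thread A misses the cache, reads old value (1)
database["X"] = 2                  # step 3: thread B updates the database
cache.pop("X", None)               # step 4: thread B deletes the cache (nothing there yet)
cache["X"] = a_read                # step 5: thread A's late cache write plants the old value

print(database["X"], cache["X"])   # database holds 2, cache holds 1
```

The inconsistency only arises because step 5 lands after steps 3-4, which is exactly the timing condition examined next.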

This is "theoretically" possible, but is it really possible in practice?

In fact, the probability is "very low", because all three of these conditions must hold at once:

1. The cache has just expired
2. A read request and a write request arrive concurrently
3. Updating the database + deleting the cache (steps 3-4) takes less time than reading the database + writing the cache (steps 2 and 5)

Think carefully: the probability of condition 3 occurring is actually very low.

This is because a database write generally takes a lock first, so writing to the database usually takes longer than reading from it.

From this point of view, the "update the database first, then delete the cache" scheme can be considered to guarantee data consistency in practice.

Therefore, we should adopt this scheme to operate the database and cache.

Ok, that settles the concurrency problem. Next, let's look at the data inconsistency caused by the second step "failing" to execute.

How to ensure that both steps are executed successfully?

As analyzed earlier, whether we update the cache or delete the cache, if the second step fails, the database and the cache become inconsistent.

The solution is: retry asynchronously.

  1. After updating the database, write a message to MQ (and put both operations in one transaction)
  2. Alternatively, subscribe to the MySQL binlog and write the change events to MQ
  3. Consume the MQ messages to delete the cache, retrying on failure
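Step 3 above, deleting the cache with retry, might look roughly like the sketch below. All names here (`delete_cache_with_retry`, the `FlakyCache` demo class) are illustrative, not a real MQ or cache library API:

```python
import time

def delete_cache_with_retry(cache, key, max_retries=3, backoff_s=0.0):
    """Consume a 'delete cache' message: retry the deletion on failure.
    `cache` is any object with a delete(key) method (hypothetical interface)."""
    for attempt in range(1, max_retries + 1):
        try:
            cache.delete(key)
            return True                    # deletion succeeded: ack the message
        except Exception:
            time.sleep(backoff_s * attempt)  # simple linear backoff before retrying
    return False                           # give up: leave the message for redelivery

class FlakyCache:
    """Toy cache whose delete fails a few times before succeeding (demo only)."""
    def __init__(self, failures):
        self.failures = failures
        self.store = {"X": 1}
    def delete(self, key):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("cache temporarily unreachable")
        self.store.pop(key, None)

flaky = FlakyCache(failures=2)
ok = delete_cache_with_retry(flaky, "X")   # succeeds on the third attempt
```

A real consumer would ack the message only when `True` is returned, letting the queue redeliver it otherwise; that redelivery is the "retry" that ultimately guarantees consistency.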

So far, to keep the database and cache consistent, the recommendation is to adopt the "update the database first, then delete the cache" scheme, paired with a "message queue" or "subscribe to the change log" mechanism.

Master-slave replication lag and delayed double deletion

Now consider cache and database consistency in the case of "read-write separation + master-slave replication lag".

Under the "update the database first, then delete the cache" scheme, "read-write separation + replication lag" can in fact lead to inconsistency:

1. Thread A updates the master library X = 2 (original value X = 1)
2. Thread A deletes the cache
3. Thread B queries the cache, misses, and queries the "slave library", getting the old value (slave X = 1)
4. Master-slave "synchronization" completes (master and slave X = 2)
5. Thread B writes the "old value" into the cache (X = 1)

The final value of X is 1 (old value) in the cache and 2 (new value) in the database: inconsistency again.

Did you notice? The core of both problems is that the cache gets re-planted with an "old value".

So how to solve this kind of problem?

The solution the industry gives is the "delayed double deletion" strategy.

But here comes the question: how long should the delay for this "delayed deletion" be?

The delay must satisfy two conditions:

1. It is greater than the "master-slave replication" lag
2. It is greater than the time for thread B to read the database + write the cache

However, this time is very difficult to assess in distributed, high-concurrency scenarios.

In many cases, we can only roughly estimate the delay from experience, such as 1-5s, which merely reduces the probability of inconsistency as much as possible.
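A rough sketch of "delayed double deletion" under that assumption. All names are illustrative, the dicts stand in for real stores, and real code would typically schedule the second deletion via a delayed message rather than an in-process timer:

```python
import threading

def update_with_delayed_double_delete(database, cache, key, value, delay_s=1.0):
    """Update the database, delete the cache, then delete the cache AGAIN
    after an estimated replication-lag delay (the rough 1-5s guess above).
    `database` and `cache` are plain dicts standing in for real stores."""
    database[key] = value
    cache.pop(key, None)                      # first deletion
    def second_delete():
        cache.pop(key, None)                  # second deletion, after the delay
    timer = threading.Timer(delay_s, second_delete)
    timer.start()
    return timer                              # caller can join() it in tests

db, cache = {"X": 1}, {}
t = update_with_delayed_double_delete(db, cache, "X", 2, delay_s=0.2)
cache["X"] = 1          # a lagging reader replants the old value in between
t.join()                # once the delay passes, the second deletion has run
```

The second deletion sweeps away the stale value the lagging reader planted; if the delay estimate is too short, however, the stale write can still land after it.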

So you see, this scheme only ensures consistency as far as possible; in extreme cases, inconsistency can still occur.

Therefore, in actual use I still recommend adopting "update the database first, then delete the cache", while keeping the "master-slave replication" lag as small as possible to reduce the probability of problems.

Can it be strongly consistent?

Seeing this, you may feel these solutions are still imperfect: what if I want the cache and the database to be "strongly consistent"? Can it be done?

It's actually hard.

To achieve strong consistency, the most common options are distributed transaction and consensus protocols such as 2PC, 3PC, Paxos, and Raft, but their performance is often poor, the schemes are complex, and various fault-tolerance issues must be considered.

On the contrary, let's think about it from another angle at this time. What is the purpose of introducing caching?

That's right, performance.

Once we decide to use caching, we will inevitably face consistency issues. Performance and consistency are like two ends of a scale, and you can't have both.

Moreover, taking the scheme above as an example: before the database and cache operations have both completed, any request that arrives in between may observe the "intermediate state" of the data.

Therefore, if you insist on strong consistency, you must ensure that no requests come in at all until every update operation has completed.

Although this can be achieved with a "distributed lock", the price paid will likely exceed the performance gained by introducing the cache in the first place.

Therefore, once we decide to use a cache, we must tolerate some inconsistency and can only reduce its probability as much as possible.

At the same time, remember that every cache entry has an "expiration time". Even if a short-lived inconsistency occurs, the expiry acts as a safety net, so eventual consistency is still achieved.
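To illustrate expiry as the safety net, here is a toy TTL cache (a sketch, not a production cache): even if a stale value has been planted, it disappears once its TTL elapses, forcing the next read back to the database.

```python
import time

class TTLCache:
    """Toy cache with per-key expiry (illustrative sketch only)."""
    def __init__(self):
        self._store = {}                      # key -> (value, expires_at)

    def set(self, key, value, ttl_s):
        self._store[key] = (value, time.monotonic() + ttl_s)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._store[key]              # expired: next read must hit the database
            return None
        return value

cache = TTLCache()
cache.set("X", 1, ttl_s=0.05)                # even a stale value lives at most 50 ms
```

A short TTL bounds how long any inconsistency can last, at the cost of more database reads; picking the TTL is the same performance-versus-consistency trade-off discussed above.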

Summary

1. To improve application performance, you can introduce a "cache"

2. Once a cache is introduced, you need to consider cache/database consistency; the recommended scheme is "update the database first, then delete the cache"

3. Under the "update the database first, then delete the cache" scheme, to ensure both steps execute successfully, pair it with a "message queue" or "subscribe to the change log" mechanism; essentially, data consistency is guaranteed by "retrying"

4. Under the "update the database first, then delete the cache" scheme, "read-write separation + master-slave replication lag" can also make the cache and database inconsistent. The mitigation is "delayed double deletion": send a delayed message to the queue to delete the cache a second time, and also keep the master-slave lag small, to reduce the probability of inconsistency as much as possible


Origin blog.csdn.net/qq798280904/article/details/130747690