Interesting | How do databases and caches ensure consistency?

Author: Xiaolin Coding
Graphical Computer Basics Website: https://xiaolincoding.com/

One day, the boss said, "Lately the company has more and more users, but the server keeps getting slower. Awang, optimize it for me. I've drawn a nice big cake for you this time!"


Programmer Awang had long been looking forward to hearing the boss "draw a cake", and accepted the task without hesitation.

Awang logged in to the server and, after some investigation, confirmed that the server's performance bottleneck was the database.

That was easy enough to fix: add Redis to the server and use it as a cache in front of the database.

With a cache in place, when a client requests data, a cache hit means the query never touches the database, which reduces the load on the database and improves the server's performance.

Update the database first, or the cache first?

After Awang had this idea, he was ready to start optimizing the server, but there was such a problem in front of him.


With a cache in the picture, updating data means updating not only the database but also the cache. That raises a question about the order of these two update operations:

  • Update the database first, then update the cache;
  • Update the cache first, then update the database;

Awang didn't think too hard about it. He figured the latest data should reach the database first, so that the database always holds the newest value, and so he adopted the plan of "update the database first, then update the cache".

After several late nights, Awang finally "optimized the server", deployed it straight to production, and confidently went to report to the boss.

The boss doesn't understand technology, so he didn't worry much and simply told Awang to keep an eye on the server.

Awang observed for several days and found that the pressure on the database was greatly reduced, and the access speed was also improved a lot.

The good times didn't last. One day the boss received a customer complaint: the customer had just made two updates to his age in a row, but the displayed age was still the value from the first update; the second update never took effect.

The boss immediately called in Awang and reprimanded him: "Such a simple update operation and it has bugs? Where am I supposed to put my face? Do you still want that cake?"

Hearing that the cake he was about to get might vanish, Awang panicked and immediately logged in to the server to troubleshoot. After inspecting the cache and the database, he found the problem.

The database held the data from the client's second update, while the cache still held the data from the first update. In other words, the database and the cache were inconsistent.

This was a serious problem. After a round of analysis, Awang concluded that the inconsistency between the cache and the database was caused by concurrency!

Update the database first, then update the cache

For example, if two requests, "request A" and "request B", update the "same" piece of data at the same time, this sequence can occur:


Request A updates the database to 1 first; before A gets to update the cache, request B updates the database to 2 and then updates the cache to 2; finally, request A updates the cache to 1.

At this point the database holds 2 but the cache holds 1: the cache and the database are inconsistent.

Update the cache first, then update the database

Would the alternative, "update the cache first, then update the database", fare any better?

It still has concurrency problems, and the analysis is the same.

Again suppose two requests, "request A" and "request B", update the "same" piece of data at the same time; this sequence can occur:


Request A updates the cache to 1 first; before A gets to update the database, request B updates the cache to 2 and then updates the database to 2; finally, request A updates the database to 1.

At this point the database holds 1 but the cache holds 2: again, the cache and the database are inconsistent.

So whether you "update the database first, then update the cache" or "update the cache first, then update the database", both schemes suffer from concurrency problems: when two requests update the same data concurrently, the cache and the database can end up inconsistent.
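The two interleavings above can be replayed deterministically in a few lines. The sketch below is single-threaded on purpose: plain dicts stand in for the database and the cache (both names are illustrative), and each statement enacts one step of the race in the order described.

```python
# Deterministic replay of the two race interleavings; dicts stand in
# for the real database and cache.

def update_db_then_cache_race():
    db, cache = {}, {}
    db["x"] = 1        # request A updates the database to 1
    db["x"] = 2        # request B updates the database to 2
    cache["x"] = 2     # request B updates the cache to 2
    cache["x"] = 1     # request A finally updates the cache to 1
    return db["x"], cache["x"]

def update_cache_then_db_race():
    db, cache = {}, {}
    cache["x"] = 1     # request A updates the cache to 1
    cache["x"] = 2     # request B updates the cache to 2
    db["x"] = 2        # request B updates the database to 2
    db["x"] = 1        # request A finally updates the database to 1
    return db["x"], cache["x"]

print(update_db_then_cache_race())   # (2, 1): database and cache disagree
print(update_cache_then_db_race())   # (1, 2): database and cache disagree
```

Either way, whichever request loses the race on the second write leaves the two stores holding different values.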

Update the database first, or delete the cache first?

After locating the problem, Awang decided that when updating data he would delete the cached entry instead of updating it. Then, on a read, if the cache has no data, the data is read from the database and written into the cache.

The strategy Awang came up with has a name: the Cache Aside strategy, known in Chinese as the "bypass cache" strategy.

This strategy can be further subdivided into "read strategy" and "write strategy".


Steps of the write strategy:

  • Update the data in the database;
  • Delete the data in the cache.

Steps of the read strategy:

  • If the read hits the cache, return the cached data directly;
  • If the read misses the cache, read the data from the database, write it into the cache, and return it to the user.
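The read and write strategies above can be sketched in a few lines. This is a minimal illustration, assuming plain dicts as stand-ins for Redis and the database; in a real system `cache` would be a Redis client and `db` a SQL store, and the key name is made up for the demo.

```python
# Cache Aside sketch: dicts stand in for Redis and the database.
db = {"user:1:age": 20}
cache = {}

def read(key):
    if key in cache:         # cache hit: return directly
        return cache[key]
    value = db[key]          # cache miss: read from the database...
    cache[key] = value       # ...fill the cache...
    return value             # ...and return to the user

def write(key, value):
    db[key] = value          # 1. update the database
    cache.pop(key, None)     # 2. delete the cache entry (do NOT update it)

print(read("user:1:age"))    # miss -> 20, now cached
write("user:1:age", 21)      # updates the db, evicts the cached entry
print(read("user:1:age"))    # miss again -> 21
```

Note that the write path only ever deletes; the next read repopulates the cache from the database.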

When Awang got to the "write strategy", he fell into deeper thought: which order should he choose?

  • Delete the cache first, then update the database;
  • Update the database first, then delete the cache.

After the last lesson, Awang no longer picked a plan on a whim. The boss had drawn a big cake this time, and he was determined to get it.

So Awang analyzed both schemes from the perspective of concurrency, to see which one could keep the database and the cache consistent.

Delete the cache first, then update the database

Awang again used the user table as his scenario.

Suppose a user's age is 20. Request A wants to update the age to 21, so it deletes the cached entry first. Meanwhile, request B wants to read the user's age: it queries the cache, finds a miss, reads the age of 20 from the database, and writes it into the cache. Then request A continues and updates the age in the database to 21.


Ultimately, the user's age is 20 (the old value) in the cache and 21 (the new value) in the database: the cache and the database are inconsistent.

So with "delete the cache first, then update the database", concurrent "read + write" requests can still leave the cache and the database inconsistent.

Update the database first, then delete the cache

Now analyze the same concurrent "read + write" scenario for this order.

Suppose a user's data is not in the cache. Request A reads the data: it queries the database and gets an age of 20. Before A writes that value into the cache, request B updates the data: it sets the age in the database to 21 and deletes the cache entry. Then request A writes the age of 20 it read earlier into the cache.


Ultimately, the user's age is 20 (the old value) in the cache and 21 (the new value) in the database: the cache and the database are inconsistent.

In theory, then, "update the database first, then delete the cache" can also cause inconsistency, but in practice the probability of it happening is low.

Because a cache write is usually much faster than a database write, it is hard in practice for request A to finish writing the cache only after request B has both updated the database and deleted the cache.

And if request A writes the cache before request B deletes it, subsequent requests will simply miss the cache (B deleted it) and re-read from the database, so the inconsistency disappears.

Therefore, "update the database first, then delete the cache" can keep the data consistent in practice.

To be extra safe, Awang also gave the cached data an expiration time. Even if the cache goes inconsistent for a while, the entry eventually expires, so eventual consistency is still achieved.
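The expiration-time safety net can be sketched as a cache whose entries carry a deadline; on read, a stale entry is dropped so the next read goes back to the database. Everything here (the dict, the key name, the tiny TTL) is an illustrative assumption for the demo.

```python
import time

cache = {}

def cache_set(key, value, ttl):
    # store the value together with its expiry deadline
    cache[key] = (value, time.monotonic() + ttl)

def cache_get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() >= expires_at:
        del cache[key]      # stale: evict so the next read hits the database
        return None
    return value

cache_set("user:1:age", 20, ttl=0.05)   # 50 ms TTL, demo only
print(cache_get("user:1:age"))          # 20 while fresh
time.sleep(0.06)
print(cache_get("user:1:age"))          # None once expired
```

With Redis itself, this is just the `EX`/`TTL` option on `SET`; the point is that stale data can survive at most one TTL.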

Having thought it through this far, Awang felt like a little genius: he had found a "watertight" plan. He adopted it without a second thought, and after a few more days of work it was done.

He confidently reported to the boss that the last customer complaint was resolved. The boss thought Awang was a good guy for solving the problem so quickly, and told him to observe for a few more days.

But how could things go that smoothly? Before long, the boss received another complaint: a customer said he had clearly updated his data, yet the change only took effect after a while, which he could not accept.

The boss, stone-faced, went looking for Awang and told him to find the problem as soon as possible.

Awang panicked even more on learning there was another bug, and immediately logged in to the server to troubleshoot. After checking the logs, he found the cause.

"Update the database first, then delete the cache" is actually two operations. All the previous analyses are based on the fact that these two operations can be executed successfully at the same time. The problem of this customer complaint is that the cache is deleted at **** (the second operation) failed, causing the data in the cache to be the old value .

Fortunately, the cache had been given an expiration time earlier, which is why the customer saw the update take effect only "after a while". Had there been no expiration time, subsequent requests would have kept reading the old data from the cache, and the problem would have been far worse.

So a new question arose: how do you ensure that both operations of "update the database first, then delete the cache" succeed?

After analyzing the problem, Awang sheepishly reported it to the boss.

Once the boss learned what had happened, he gave Awang a few more days to solve it, and this time the cake was not mentioned again.

How will Awang solve this problem?

Will the cake the boss drew ever be delivered to Awang?

To find out, stay tuned for the next installment of Awang's story.


Summary

That's it for Awang's story for now; let's talk about the technical side.

Although "update the database first, then delete the cache" keeps the database and the cache consistent, every data update deletes the cached entry, which hurts the cache hit rate.

So if the business has high requirements on the cache hit rate, the "update database + update cache" scheme can be adopted instead, because updating the cache never causes a cache miss.

However, as analyzed before, when two update requests execute concurrently this scheme can leave the data inconsistent: the two operations of updating the database and updating the cache are independent, and with no concurrency control, two threads writing in different orders will produce inconsistent data.

So we have to add some means to solve this problem, here are two approaches:

  • Before updating the cache, acquire a distributed lock, so that only one request updates the cache at a time and no concurrency problem arises. Of course, introducing a lock hurts write performance.
  • When updating the cache, attach a short expiration time to the entry, so that even if the cache goes inconsistent, the stale data expires quickly, which many businesses can tolerate.
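The first bullet, a distributed lock, is typically built on Redis's SET with the NX and EX options. The sketch below fakes just those semantics with an in-memory class so it runs standalone; with the real redis-py client the call would be `client.set(name, token, nx=True, ex=ttl)`. All names here are illustrative assumptions.

```python
import time
import uuid

class FakeRedis:
    """Stand-in implementing only the SET NX EX semantics the lock needs."""
    def __init__(self):
        self.data = {}  # key -> (value, expires_at)

    def set(self, key, value, nx=False, ex=None):
        entry = self.data.get(key)
        if nx and entry and time.monotonic() < entry[1]:
            return None                     # key exists and is unexpired: SET NX fails
        expires = time.monotonic() + ex if ex else float("inf")
        self.data[key] = (value, expires)
        return True

    def delete(self, key):
        self.data.pop(key, None)

def acquire_lock(r, name, ttl=10):
    # a unique token lets a production release delete only its own lock
    token = str(uuid.uuid4())
    return token if r.set("lock:" + name, token, nx=True, ex=ttl) else None

r = FakeRedis()
t1 = acquire_lock(r, "user:1")   # first writer wins
t2 = acquire_lock(r, "user:1")   # concurrent second writer is rejected
print(t1 is not None, t2)        # True None
r.delete("lock:user:1")          # release after updating the db and cache
```

The TTL on the lock itself is the usual guard against a crashed holder never releasing it.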

By the way, for the "delete the cache first, then update the database" scheme, the fix for the inconsistency caused by concurrent "read + write" requests is "delayed double deletion".

The pseudocode for delayed double deletion looks like this:

# Delete the cache
redis.delKey(X)
# Update the database
db.update(X)
# Sleep for a while
Thread.sleep(N)
# Delete the cache again
redis.delKey(X)

The sleep exists so that while request A is sleeping, request B has time to finish "read the data from the database, then write the missing entry into the cache"; only after waking up does request A delete the cache again.

Therefore, request A's sleep time must be longer than the time request B needs to "read from the database + write to the cache".

In practice, however, the exact sleep time is guesswork and hard to estimate, so this scheme only ensures consistency as far as possible; in extreme cases the cache can still end up inconsistent.

Therefore, the "update the database first, then delete the cache" solution remains the more recommended one.


Recap of last time

Last time, programmer Awang introduced Redis as a cache layer in front of MySQL to improve data access performance. But it was not that simple, because the double-write consistency between Redis and MySQL had to be considered.

After many setbacks, Awang settled on the strategy of "update the database first, then delete the cache", because it preserves data consistency best even under concurrent reads and writes.

The clever Awang also came up with a safety net: adding an expiration time to the cache.

He thought that would settle the consistency problem for good. Yet after the feature went live, the boss still received a complaint from a user who "had clearly updated the data, but the data only took effect after a while", which the customer could not accept.

The boss told Awang, who panicked even more on learning there was another bug, and immediately logged in to the server to troubleshoot. After checking the logs, he found the cause.

"Update the database first, then delete the cache" is really two operations. This customer's complaint arose because deleting the cache (the second operation) failed, leaving the old value in the cache while the database held the latest value.

Fortunately, the cache had been given an expiration time earlier, which is why the customer saw the update take effect only "after a while". Had there been no expiration time, subsequent requests would have kept reading the old data from the cache, and the problem would have been far worse.

So a new question arose: how do you ensure that both operations of "update the database first, then delete the cache" succeed?

After analyzing the problem, Awang sheepishly reported it to the boss.

Once the boss learned what had happened, he gave Awang a few more days to solve it, and this time the cake was not mentioned again.

  • How will Awang solve this problem?
  • Will the cake the boss drew ever be delivered to Awang?

How to ensure that both operations can be executed successfully?

The user's complaint arose because deleting the cache (the second operation) failed: the cache kept the old value while the database held the latest one, so the database and the cache became inconsistent, which can hurt sensitive business logic.

Here's an example to illustrate.

The application needs to update the value of data X from 1 to 2. It updates the database successfully, but the subsequent deletion of X's entry in the Redis cache fails. Now the database holds the new value 2 while the cache still holds 1: the database and the cache are inconsistent.


Then, any subsequent request for data X queries Redis first; since the cache entry was never deleted, it hits the cache and reads the stale value 1.

In fact, no matter whether you operate the database or the cache first, as long as the second operation fails there will be a data consistency problem.

The cause is known; how do we solve it? There are two approaches:

  • Retry mechanism.
  • Subscribe to the MySQL binlog, then operate the cache.

Let's talk about the first one.

Retry mechanism

We can introduce a message queue: put the data that the second operation (deleting the cache) needs to act on into the message queue, and let a consumer perform the deletion.

  • If the application fails to delete the cache, it re-reads the data from the message queue and retries the deletion; this is the retry mechanism. Of course, if the deletion still fails after a certain number of retries, an error must be reported to the business layer.
  • If the deletion succeeds, the message must be removed from the queue to avoid repeating the operation; otherwise, keep retrying.
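The retry loop above can be sketched with the standard library's queue.Queue standing in for a real message broker, and a deliberately flaky delete that fails twice before succeeding; every name here is an illustrative assumption.

```python
import queue

MAX_RETRIES = 3
mq = queue.Queue()

failures = {"left": 2}                 # simulate two transient Redis failures
def flaky_delete(key):
    if failures["left"] > 0:
        failures["left"] -= 1
        return False
    return True

def enqueue_delete(key):
    mq.put({"key": key, "attempts": 0})

def consume():
    log = []
    while not mq.empty():
        msg = mq.get()
        if flaky_delete(msg["key"]):
            log.append(("deleted", msg["key"]))   # success: message consumed
        elif msg["attempts"] + 1 < MAX_RETRIES:
            msg["attempts"] += 1
            mq.put(msg)                           # transient failure: requeue
        else:
            log.append(("error", msg["key"]))     # exhausted: alert the business layer
    return log

enqueue_delete("user:1:age")
result = consume()
print(result)   # [('deleted', 'user:1:age')] after two failed attempts
```

Capping the retries and surfacing an error is what keeps a permanently broken cache from looping forever.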

Take an example to illustrate the process of the retry mechanism.

[Figure: example flow of the retry mechanism]

Subscribe to MySQL binlog, and then operate the cache

The first step of "update the database first, then delete the cache" is updating the database, and a successful database update produces a change log recorded in the binlog.

So we can subscribe to the binlog to learn exactly which data was changed, and then delete the corresponding cache entries. Alibaba's open-source middleware Canal is built on exactly this idea.

Canal emulates MySQL's master-slave replication protocol: it disguises itself as a MySQL slave and sends a dump request to the MySQL master. The master then pushes the binlog to Canal, which parses the binlog byte stream into readable, structured data for downstream programs to subscribe to.

The following diagram is how Canal works:

[Figure: how Canal works]

So, to make sure the second operation of "update the database first, then delete the cache" succeeds, we can either "retry the cache deletion via a message queue" or "subscribe to the MySQL binlog and then operate the cache". The two methods share one trait: both operate the cache asynchronously.

The boss delivers the cake

Since Awang was familiar with message queues, he chose the "retry cache deletion via a message queue" scheme to fix the user's problem.

After several days and nights of work, the server was done, and he immediately reported to the boss.

The boss told Awang to observe for a while longer; if nothing went wrong, they would discuss the "cake" at the Mid-Autumn Festival.

Time flew, and the Mid-Autumn Festival arrived. In all that time, no user had reported inconsistent data.

Seeing that Awang had performed well this time, made no more mistakes, and even improved the server's access performance, the boss handed him a super-large mooncake: "Look how big and round this cake is; may your code grow just as long."


Seeing the mooncake, Awang couldn't help laughing. So this was the cake the boss had been drawing all along. It really was a big cake...

The above story is purely fictional; any resemblance is coincidental.


Origin: blog.csdn.net/qq_34827674/article/details/123866483