Still confused about data consistency between Redis and MySQL in interviews? Just read this article

1. What is database and cache consistency

Data consistency refers to:

  • If the data is in the cache, the cached value equals the value in the database;

  • If the data is not in the cache, the value in the database is the latest value.

Conversely, the cache is inconsistent with the database when:

  • The cached value ≠ the value in the database;

  • The cache or the database holds stale data, causing threads to read stale data.

Why do data consistency problems arise?

When Redis is used as a cache and the data changes, we have to write to both systems (double write) to keep the cache consistent with the database.

The database and the cache are, after all, two separate systems. To guarantee strong consistency across them, you would have to introduce distributed consensus protocols such as 2PC or Paxos, or distributed locks. These are hard to implement and inevitably hurt performance.

If the consistency requirements really are that strict, is introducing a cache necessary at all?

2. Cache usage strategy

When using cache, there are usually the following cache usage strategies to improve system performance:

  • Cache-Aside Pattern (bypass cache, commonly used in business systems)

  • Read-Through Pattern

  • Write-Through Pattern

  • Write-Behind Pattern

2.1 Cache-Aside (bypass cache)

The so-called "bypass cache" means that reading the cache, reading the database, and populating the cache are all done by the application itself. It is the caching strategy most commonly used in business systems.

2.1.1 Read data


The logic of reading data is as follows:

  1. When the application needs to read data, it first checks whether the cache hits.

  2. If the cache misses, it queries the database, writes the result into the cache so that subsequent reads of the same data hit, and returns the data to the caller.

  3. If the cache hits, return directly.

The timing diagram is as follows:

[Figure: Cache-Aside read timing diagram]

Advantages

  • Only the data actually requested by the application is included in the cache, helping to keep the cache size cost-effective.

  • It is simple to implement and can achieve performance improvements.

The pseudocode of the implementation is as follows:

String cacheKey = "公众号:码哥字节";
String cacheValue = redisCache.get(cacheKey);
// cache hit
if (cacheValue != null) {
  return cacheValue;
} else {
  // cache miss: fetch the data from the database
  cacheValue = getDataFromDB();
  // write the data into the cache
  redisCache.put(cacheKey, cacheValue);
  return cacheValue;
}

Disadvantages

Since data is loaded into the cache only after a miss, the first request for any piece of data pays extra latency for the database query and the cache population.

2.1.2 Update data

When writing data in cache-aside mode, the flow is as follows.

[Figure: Cache-Aside write flow]

  1. Write the data to the database;

  2. Invalidate (delete) the cached entry, or update it.

With cache-aside, the most common write strategy is to write directly to the database, but then the cache may become inconsistent with it.

We should set an expiration time for the cache; this is the fallback that guarantees eventual consistency.

If the expiration time is too short, the application keeps hitting the database. If it is too long and the cache is not invalidated when an update occurs, the cached data is likely to be dirty.
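
For example, a minimal sketch of writing a cache entry with a TTL, assuming a Jedis client; the key, base TTL, and random jitter (which just spreads out expirations) are all illustrative:

import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

try (Jedis jedis = new Jedis("localhost", 6379)) {
  String cacheKey = "公众号:码哥字节";
  String cacheValue = getDataFromDB();                            // assumed DAO call
  int ttlSeconds = 300 + ThreadLocalRandom.current().nextInt(60); // base TTL + jitter
  jedis.setex(cacheKey, ttlSeconds, cacheValue);                  // SET with expiration
}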

The most common approach is to delete the cache entry, thereby invalidating the cached data.
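
In the same pseudocode style as the read path above (updateDB and redisCache.delete are assumed abstractions), the write path looks like this:

// cache-aside write path: update the database, then invalidate the cache
updateDB(cacheKey, newValue);  // 1. persist the new value
redisCache.delete(cacheKey);   // 2. delete the cached entry so the next read reloads it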

Why not update the cache?

Performance problems

When updating the cache is expensive, for example when the cached value must be recomputed by joining several tables, it is recommended to delete the cache directly rather than update it, which also preserves consistency.

Safety problems

In high-concurrency scenarios, an update may write a stale value into the cache. Code Brother analyzes this later, so don't worry about it yet.

2.2 Read-Through (direct reading)

As with cache-aside, on a cache miss the data is loaded from the database, written into the cache, and returned to the application.

Read-through is very similar to cache-aside. The difference lies in who populates the cache: in cache-aside, the application itself is responsible for fetching data from the database and filling the cache.

Read-through, on the other hand, shifts the responsibility of fetching the value from the data store onto the cache provider.

[Figure: Read-Through]

Read-through implements the separation-of-concerns principle: the code interacts only with the cache, and the cache component manages the data synchronization between itself and the database.
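
As an in-process illustration of the pattern, Guava's LoadingCache behaves exactly this way: the application only asks the cache, and the loader fetches from the database on a miss. A sketch, where getDataFromDB(key) is an assumed DAO call:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

LoadingCache<String, String> cache = CacheBuilder.newBuilder()
    .maximumSize(10_000)
    .build(new CacheLoader<String, String>() {
      @Override
      public String load(String key) {
        // called automatically on a miss; the caller never touches the database
        return getDataFromDB(key);
      }
    });

String value = cache.getUnchecked("公众号:码哥字节"); // miss -> load() -> cached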

2.3 Write-Through (synchronous direct write)

Similar to read-through, when a write request arrives, write-through delegates the write responsibility to the cache system, and the cache abstraction layer updates both the cached data and the database. The timing diagram is as follows:

[Figure: Write-Through]

The main advantage of write-through is that the application does not need to implement failure handling and retry logic; that is delegated to the cache abstraction layer.

Advantages and disadvantages

Used on its own, this strategy brings little benefit: it writes to the cache first and then to the database, adding extra latency to every write.

But when write-through is combined with read-through, it keeps the full benefits of read-through while also guaranteeing data consistency, with no need to reason about cache invalidation.

[Figure: Write-Through combined with Read-Through]

This strategy reverses the order in which cache-aside populates the cache. Instead of lazily loading after a miss, the data is written to the cache first, and the cache component then writes it to the database.
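
A minimal sketch of such a cache layer; WriteThroughCache, RedisCache, and Database are hypothetical types standing in for the cache provider's abstraction:

// Hypothetical write-through wrapper: the application only calls put(), and
// the layer synchronously updates the cache and then the database.
class WriteThroughCache {
  private final RedisCache redisCache; // assumed cache client
  private final Database db;           // assumed database client

  WriteThroughCache(RedisCache redisCache, Database db) {
    this.redisCache = redisCache;
    this.db = db;
  }

  void put(String key, String value) {
    redisCache.put(key, value); // 1. write the cache
    db.update(key, value);      // 2. synchronously write the database
  }
}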

Advantages

  • Cache and database data are always up to date;

  • Query performance is the best, because the data to be queried may have already been written to the cache.

Disadvantages

Infrequently requested data is also written to the cache, resulting in a larger and more expensive cache.

2.4 Write-Behind

At first glance this diagram looks identical to write-through, but it is not. The difference is the last arrow: it changes from a solid line to a dashed one.

That means the cache system updates the database asynchronously; the application interacts only with the cache system.

The application does not have to wait for the database update to complete, which improves performance, because the database write is the slowest operation.
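
To make the idea concrete, here is a minimal self-contained sketch: a ConcurrentHashMap stands in for Redis, a BiConsumer for the database writer, and a background thread drains pending writes asynchronously (all names are illustrative):

import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiConsumer;

class WriteBehindCache {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final BlockingQueue<Map.Entry<String, String>> pending = new LinkedBlockingQueue<>();

  WriteBehindCache(BiConsumer<String, String> dbWriter) {
    Thread flusher = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          Map.Entry<String, String> e = pending.take();
          dbWriter.accept(e.getKey(), e.getValue()); // asynchronous database write
        } catch (InterruptedException ex) {
          Thread.currentThread().interrupt();
        }
      }
    });
    flusher.setDaemon(true);
    flusher.start();
  }

  void put(String key, String value) {
    cache.put(key, value);                        // 1. write the cache immediately
    pending.offer(new SimpleEntry<>(key, value)); // 2. defer the database write
  }

  String get(String key) {
    return cache.get(key);                        // reads go to the cache only
  }
}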

[Figure: Write-Behind]

Under this strategy, the consistency between the cache and the database is weak, so it is not recommended for systems with high consistency requirements.

3. Analysis of consistency problems under bypass cache

Cache-Aside (bypass caching) is the strategy most commonly used in business scenarios. Under it, the client reads from the cache first and returns on a hit; on a miss it reads from the database and writes the result into the cache. Read operations therefore never cause inconsistency between the cache and the database.

The focus is on write operations: both the database and the cache must be modified, there is an ordering between the two, and that ordering can leave the data inconsistent. For writes, we need to consider two questions:

  • Update the cache first or update the database?

  • When the data changes, should we modify the cache (update) or delete it (delete)?

Combining these two questions, four solutions emerge:

  1. Update the cache first, then update the database;

  2. Update the database first, then update the cache;

  3. Delete the cache first, then update the database;

  4. Update the database first, then delete the cache.

In the analysis below, there is no need to memorize anything by rote. The key is to check, for each scheme, whether the following two scenarios cause serious problems:

  • If the first operation succeeds and the second fails, what problems arise?

  • Under high concurrency, can reads return inconsistent data?

Why not consider the case where the first operation fails and the second succeeds?

Can you guess?

Because if the first operation fails, the second is never executed; we simply return a 50x error at step one, and no inconsistency arises.

Only the case where the first succeeds and the second fails is the headache. Guaranteeing atomicity across the two would drag us into distributed transactions.

3.1 Update the cache first, then update the database

[Figure: Update the cache first, then update the database]

If the cache is updated successfully first but the database write fails, the cache holds the latest value while the database holds the old one, so the cache contains dirty data.

Subsequent queries will immediately get this cached value, even though the data does not exist in the database.

Caching data that does not exist in the database and returning it to the client is meaningless.

This scheme is rejected outright.

3.2 Update the database first, then update the cache

Everything works as follows:

  • Write the database first, success;

  • Then update the cache, success.

The cache update fails

Now let's see what happens if the atomicity of the two operations is broken: the first step succeeds and the second step fails.

It leaves the database with the latest data and the cache with old data: a consistency problem.

I won't draw this one; it is the previous diagram with Redis and MySQL swapped.

High Concurrency Scenario

Xie Bage often works 996; his back aches, his neck hurts, and he writes more and more bugs, so he books a massage to improve his programming skills.

With the epidemic, orders are hard to come by, and the technicians at this high-end club scramble for this one. High concurrency, brothers.

After he enters the store, the front desk enters the customer's information into the system and executes "set Xie Bage's technician = pending" as the initial value, meaning no one has taken the order yet. It is saved to both the database and the cache, and then the massage service is arranged.

As shown below:

[Figure: High concurrency, update the database first, then update the cache]

  1. Technician No. 98 moves first and sends "set Xie Bage's technician = 98" to the system, which writes it to the database. At that moment the system's network stutters and the request freezes; the data has not yet been written to the cache.

  2. Next, technician No. 520 sends "set Xie Bage's technician = 520" to the system; it is written to the database and also written to the cache.

  3. Then the frozen cache write from technician No. 98 resumes, and "Xie Bage's technician = 98" is written into the cache.

The final result: the value in the database is "Xie Bage's technician = 520", while the value in the cache is "Xie Bage's technician = 98".

Technician No. 520's fresh data in the cache was overwritten by technician No. 98's stale data.

So in a high-concurrency scenario where multiple threads write to the database and then write to the cache, the cache can end up with an old value while the database holds the latest one.

This scheme is rejected as well.

And if the first step fails, a 50x error is returned directly, with no data inconsistency.

3.3 Delete the cache first, then update the database

Following the routine "Code Brother" described: suppose the first operation succeeds and the second fails. What happens? And what happens under high concurrency?

The database write in step two fails

Suppose there are two requests now: write request A and read request B.

Write request A deletes the cache successfully in step one but then fails to write to the database. The written data is lost, and the database keeps the old value.

Then read request B arrives, finds nothing in the cache, reads the old value from the database, and writes it into the cache.

Problems under high concurrency

[Figure: Delete the cache first, then write the database]

  1. Technician No. 98 moves first: the system receives the request and deletes the cached entry, but just as it is about to execute "set Xiao Caiji's technician = 98" against the database, it freezes; the write does not happen in time.

  2. Meanwhile the lobby manager sends a read request to check whether Xiao Caiji has been assigned a technician, so as to arrange the service. The system finds nothing in the cache, reads the old value "Xiao Caiji's technician = pending" from the database, and writes it into the cache.

  3. The frozen database write from technician No. 98, "set Xiao Caiji's technician = 98", then completes.

The cache now holds stale data, and the latest value cannot be read until the entry expires. Xiao Caiji has already been taken by technician No. 98, but the lobby manager thinks no one is serving him.

This scheme is rejected too. In the failure case (the first step succeeds, the second fails), the database keeps the old data and the cache is empty; subsequent reads reload the old value from the database into the cache, so the intended write is simply lost, and we pay an extra cache miss on top.

Between the failure case and the high-concurrency case, the data ends up wrong either way. Rejected.

3.4 Update the database first, then delete the cache

The previous three schemes have all been rejected. Let's analyze whether this final scheme works.

Following the same "routine", check what problems arise under failure and under high concurrency.

With this strategy, if the database write fails, an exception is returned to the client and the cache operation is never performed.

So a failure at the first step causes no data inconsistency.

The cache deletion fails

The key question: the first step writes the latest data to the database successfully, but the cache deletion fails. What then?

You could put the two operations in a single transaction and roll back the database write when the cache deletion fails.

But that is unsuitable for high-concurrency scenarios: it creates large transactions and invites deadlocks.

If you don't roll back, the database holds the new data while the cache still holds the old data, and the two are inconsistent. What should we do?

So we must find a way to make the cache deletion eventually succeed; otherwise we can only wait for the TTL to expire.

Use a retry mechanism.

For example, retry three times; if all three attempts fail, record the failure in the database and let a distributed scheduler such as xxl-job handle it later.

In high-concurrency scenarios, it is best to retry asynchronously, for example by sending a message to MQ middleware for asynchronous decoupling.

Alternatively, use the Canal framework to subscribe to the MySQL binlog, watch for the corresponding update, and perform the cache deletion from there.

High Concurrency Scenario

Let's analyze the problems under concurrent reads and writes.

[Figure: Write the database first, then delete the cache]

  1. Technician No. 98 moves first and takes Xiao Caiji's order: "set Xiao Caiji's technician = 98" executes against the database. The network stutters again, and there is no time to execute the cache deletion.

  2. Candy, the supervisor, sends a read request to check whether Xiao Caiji has been assigned a technician, finds "Xiao Caiji's technician = pending" in the cache, and returns it to the client. The supervisor thinks no one has taken the order.

  3. Technician No. 98 had in fact taken the order; the frozen cache deletion now completes successfully.

So a read may briefly return stale data, but the stale entry is deleted right after, and subsequent requests get the latest value. Not a big problem.

There is also a more extreme case: the cache expires on its own right as concurrent reads and writes arrive. Suppose there are two requests, thread A performing a query and thread B performing an update. The following can happen:

[Figure: Cache invalidation race]

  1. The cache entry reaches its expiration time and becomes invalid.

  2. Thread A's read misses the cache, so it queries the database and gets an old value (old relative to the new value B is about to write). Just as it is about to write this value into the cache, a network problem freezes it.

  3. Thread B performs a write operation, writing the new value to the database.

  4. Thread B executes delete cache.

  5. Thread A resumes from the freeze and writes the old value it had queried into the cache.

"Code Brother, how do we play this? There's still an inconsistency!"

Don't panic. The probability of this happening is very small, because it requires all of the following conditions:

  1. The database write in step (3) must take less time than the read in step (2), so that step (4) can precede step (5).

  2. The cache must have expired at exactly that moment.

Usually a standalone MySQL instance handles roughly 5k QPS and roughly 1k TPS (for reference, Tomcat is about 4k QPS and about 1k TPS).

Database reads are much faster than writes (which is precisely why read-write separation exists), so it is hard for step (3) to outrun step (2), and the cache would also have to expire at just the right moment.

Therefore, when using the bypass cache strategy, the recommended write order is: update the database first, then delete the cache.

4. What are the consistency solutions?

Finally, for the Cache-Aside (bypass cache) strategy with writes done as "update the database first, then delete the cache", let's analyze the available data-consistency solutions.

4.1 Cache delayed double delete

How to avoid dirty data if you delete the cache first and then update the database?

Adopt the delayed double-delete strategy:

  1. Delete the cache first.

  2. Write to the database.

  3. Sleep for 500 milliseconds before deleting the cache.

This way, dirty data is readable for at most about 500 milliseconds. The key question: how do you determine the sleep time?

The purpose of the delay is to ensure that any in-flight read request has finished, so the second deletion removes the dirty cache entry that such a read may have written.

Therefore, measure the read path of your own business logic, and set the delay to that read latency plus a few hundred milliseconds.
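
A minimal sketch of the flow, reusing the assumed redisCache/updateDB abstractions from earlier; the 500 ms delay is illustrative and should come from your own read-latency measurement:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

void updateWithDoubleDelete(String cacheKey, String newValue) {
  redisCache.delete(cacheKey);           // 1. first deletion
  updateDB(cacheKey, newValue);          // 2. write the database
  scheduler.schedule(
      () -> redisCache.delete(cacheKey), // 3. second deletion after the delay
      500, TimeUnit.MILLISECONDS);       //    500 ms = measured read latency + margin
}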

4.2 Cache-deletion retry mechanism

What if the cache deletion fails? For example, if the second deletion in delayed double delete fails, the dirty data cannot be removed.

Use the retry mechanism to ensure that the cache is deleted successfully.

For example, retry three times; if all three fail, record the failure in the database and raise an alert for manual intervention.

In high-concurrency scenarios, it is best to retry asynchronously, for example by sending a message to MQ middleware for asynchronous decoupling.
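
A sketch of that flow; redisCache.delete is assumed to report success, and sendToMq is a hypothetical MQ producer:

// try the deletion up to three times, then hand off to MQ for async retries
boolean deleteWithRetry(String cacheKey) {
  for (int attempt = 1; attempt <= 3; attempt++) {
    if (redisCache.delete(cacheKey)) {
      return true;                          // deletion succeeded
    }
  }
  sendToMq("cache-delete-retry", cacheKey); // fall back to asynchronous retry
  return false;
}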

[Figure: Retry mechanism]

In step (5), if the deletion fails and the maximum retry count has not been reached, the message is re-enqueued until the deletion succeeds; otherwise the failure is recorded in the database for manual intervention.

This solution has a drawback: it intrudes into the business code. Hence the next solution: a separate service that subscribes to the database binlog, extracts the data that needs deleting, and performs the cache deletion.

4.3 Read binlog and delete asynchronously

[Figure: Binlog-based asynchronous deletion]

  1. Update the database;

  2. The database records the change in its binlog;

  3. Use canal to subscribe to the binlog and obtain the target data and key;

  4. The cache-deletion system receives the canal data, parses out the target key, and tries to delete the cache entry;

  5. If the deletion fails, it sends a message to the message queue;

  6. The cache-deletion system later consumes that message from the queue and retries the deletion. A sketch of steps 3 to 5 follows this list.
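
A condensed sketch based on canal's standard Java client. The server address, destination, table filter, cache-key derivation, and the redisCache/sendToMq helpers are all assumptions for illustration:

import java.net.InetSocketAddress;
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;

public class BinlogCacheDeleter {
  public static void main(String[] args) throws Exception {
    CanalConnector connector = CanalConnectors.newSingleConnector(
        new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
    connector.connect();
    connector.subscribe("shop\\.orders");               // watch only the cached tables
    while (true) {
      Message message = connector.getWithoutAck(100);   // pull a batch of binlog entries
      long batchId = message.getId();
      if (batchId == -1 || message.getEntries().isEmpty()) {
        continue;                                       // nothing new yet
      }
      for (CanalEntry.Entry entry : message.getEntries()) {
        if (entry.getEntryType() != CanalEntry.EntryType.ROWDATA) continue;
        CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
        for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
          // derive the cache key from the first column (assumed to be the primary key)
          String cacheKey = "orders:" + rowData.getAfterColumns(0).getValue();
          if (!redisCache.delete(cacheKey)) {           // assumed cache client
            sendToMq("cache-delete-retry", cacheKey);   // step 5: async retry via MQ
          }
        }
      }
      connector.ack(batchId);                           // confirm the batch
    }
  }
}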

Summary

The best-practice caching strategy is the Cache-Aside Pattern, which splits into read-path and write-path best practices.

Read-path best practice: read the cache first and return on a hit; on a miss, query the database and then write the result into the cache.

Write-path best practices:

  • Write to the database first, then operate the cache;

  • Delete the cache instead of modifying it. When the cached value is expensive to recompute, for example when it requires joins across multiple tables, deletion is the recommended choice. Deleting is also simple, and its only side effect is one extra cache miss. Everyone is advised to use this strategy.

On top of these best practices, to keep the cache and the database as consistent as possible, we can use delayed double delete.

To guard against deletion failures, use an asynchronous retry mechanism: send a deletion message to MQ middleware, or use canal to subscribe to the MySQL binlog, listen for write operations, and delete the corresponding cache entries.

So what if absolute consistency must be guaranteed? A conclusion first:

Absolute consistency cannot be achieved; that is determined by the CAP theorem. Caching fits non-strongly-consistent scenarios, which places it in the AP camp of CAP.

So we compromise and settle for the eventual consistency described by the BASE theory.

In fact, once a solution introduces a cache, it usually means giving up strong consistency of the data in exchange for a performance boost.

That is exactly what a tradeoff means.

If you found this article helpful, please like and follow. For more on the Java backend, big data, and algorithms, follow my public account [Architect Lao Bi] and send 666 via private message to receive Java backend, big data, and algorithm PDFs, the latest big-company interview questions, and video lectures.


Origin blog.csdn.net/Javatutouhouduan/article/details/131975559