Foreword

Data held in Redis and MySQL can occasionally drift out of sync. I have not run into this myself at work; a colleague brought the problem up a while ago, so I went and read how our own code handles Redis data consistency. The approach turned out to be simple and crude: read from the cache; on a miss, query MySQL and return the result; then update the cache asynchronously. Under high concurrency this approach will inevitably produce inconsistent data, for reasons I explain below. Our concurrency is very low, though, so nothing has gone wrong so far.
I read both the solution proposed by that colleague and the reference blog mentioned below in this article. They are long and felt too complicated to me, so here I summarize the topic based on my own understanding. As for the details, I will put them into practice later when I have time. If anything here is wrong, corrections are welcome, bring them on~

1. Delete cache or update cache

The reasons for using Redis are well known: it reduces pressure on the database and improves system performance. However, Redis alone cannot meet increasingly complex business needs, so the data is stored in MySQL as well. On a read, the application checks Redis first and goes to MySQL only if the data is not in Redis. At that point, the consistency between the data in Redis and the data in MySQL has to be considered.
By consistency I roughly mean that, at almost the same moment, the data a client reads from the cache is the same as the data in MySQL. But Redis and MySQL are always written one after the other, never at the same instant. So in concrete terms, two questions have to be answered: whether to handle Redis or the database first, and whether the data in Redis should be deleted or updated.
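The read path described above can be sketched roughly as follows. This is only a minimal illustration: the two dicts are in-memory stand-ins for Redis and MySQL (an assumption made so the sketch runs on its own), the function name read is hypothetical, and the cache is refilled synchronously here for simplicity.

```python
# In-memory stand-ins (assumptions): plain dicts play the roles of Redis and MySQL.
cache = {}                        # stands in for Redis
database = {"user:1": "alice"}    # stands in for MySQL


def read(key):
    """Read-through lookup: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is not None:
        return value, "cache"
    value = database.get(key)
    if value is not None:
        cache[key] = value        # refill the cache for later readers
    return value, "database"


first = read("user:1")    # miss: served from the database, then cached
second = read("user:1")   # hit: served from the cache
print(first, second)
```

The first call is served by the database and the second by the cache, which is exactly why a later write has to decide what to do with the cached copy.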

Conclusion: delete the cache instead of updating it
Updating a cached value generally costs more than deleting it (the new value has to be built and written, whereas a delete just removes the key), so we choose to delete the cache.
(Besides the time cost there are also concurrency issues, which I won't go into here. If you are interested, you can read this article.)
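One of the concurrency issues with updating the cache can be shown by replaying a fixed interleaving of two writers. This is a hedged sketch, not a real race: the dicts stand in for MySQL and Redis, and the names run, update_cache and delete_cache are hypothetical. Writer A stores 1 and writer B stores 2; the database sees A then B, but the cache step happens in the opposite order.

```python
def run(cache_step):
    """Replay one fixed interleaving of two writers, A (value 1) and B (value 2)."""
    db, cache = {}, {"k": 0}
    db["k"] = 1                  # A updates the database
    db["k"] = 2                  # B updates the database (final value)
    cache_step(cache, "k", 2)    # B touches the cache first
    cache_step(cache, "k", 1)    # A touches the cache late
    return db, cache


def update_cache(cache, key, value):
    cache[key] = value           # strategy 1: overwrite the cached value


def delete_cache(cache, key, value):
    cache.pop(key, None)         # strategy 2: just drop the cached value


db_u, cache_u = run(update_cache)
db_d, cache_d = run(delete_cache)
print("update:", db_u, cache_u)
print("delete:", db_d, cache_d)
```

With the update strategy the cache ends up holding A's older value while the database holds B's. With the delete strategy the key is simply absent, so the next read reloads the current row.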

2. Update the database first or delete the cache first

So which should be handled first, Redis or the database? Should we update the database first, or delete the cache first?

2.1 Delete the cache first, then update the database

If you delete the cache first and then update the database, there is a window in which the cache has already been deleted but the database has not yet been updated. Another thread that reads during this window finds nothing in Redis, and since the database still holds the old value, it cannot get the latest data: it fetches the old data from the database and writes it back to Redis. Because updating the database takes much longer than deleting a Redis key, this window is relatively long, so the period during which dirty data can arise is longer.
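The window described above can be replayed step by step. This is a minimal sketch with dicts standing in for MySQL and Redis; reader_load is a hypothetical helper, and the "threads" are just statements executed in the problematic order.

```python
db = {"k": "old"}       # stands in for MySQL
cache = {"k": "old"}    # stands in for Redis


def reader_load(key):
    """Reader: on a cache miss, fetch from the database and write back to the cache."""
    if key not in cache:
        cache[key] = db[key]
    return cache[key]


# Writer, step 1: delete the cache entry.
cache.pop("k", None)
# A reader runs in the gap: it misses, reads the OLD row and caches it.
seen_by_reader = reader_load("k")
# Writer, step 2: update the database.
db["k"] = "new"

stale = cache["k"] != db["k"]
print(seen_by_reader, cache, db, stale)
```

After the writer finishes, the database holds the new value but the cache still holds the old one that the reader wrote back.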

2.2 Update the database first, then delete the cache

With this order, the cache is deleted only after the database update has completed, so the window for dirty data is shorter. Suppose thread A is updating the data while the cached value still exists. After thread A finishes updating MySQL, it deletes the cache. If thread B then gets the CPU and reads, it finds no data in Redis and syncs the data from MySQL into Redis. Since deleting the cache takes very little time, the period during which dirty data can be produced is very short.
Conclusion: update the database first, then delete the cache
So we choose to update the database first and then delete the cache, although this can still lead to data inconsistency.
(Besides the time cost there are also concurrency issues, which I won't go into here. If you are interested, you can read this article.)
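The chosen order can be sketched in the same style. This is an illustration under assumptions: dicts stand in for MySQL and Redis, and the between_steps hook is a hypothetical device for letting a reader run between the writer's two steps.

```python
db = {"k": "old"}       # stands in for MySQL
cache = {"k": "old"}    # stands in for Redis


def read(key):
    """Reader: serve from the cache, loading from the database on a miss."""
    if key not in cache:
        cache[key] = db[key]
    return cache[key]


def write(key, value, between_steps=None):
    """Writer: update the database first, then delete the cache entry."""
    db[key] = value
    if between_steps is not None:
        between_steps()          # lets a reader run inside the gap
    cache.pop(key, None)


observed = []
write("k", "new", between_steps=lambda: observed.append(read("k")))
after = read("k")
print(observed, after, cache)
```

A reader that arrives between the two steps still sees the old cached value, but only until the delete runs; the next read reloads the new row.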

3. What if the Redis service is down after the database update completes?

After the steps above, how do we guarantee that the Redis cache entry really gets deleted once the database has been updated?
What if the Redis service goes down right after MySQL is updated? Wouldn't the data still be inconsistent?
Retry? How many times, and at what interval? These are hard to pin down, so how can we make sure the cache entry is eventually deleted? This is where a message queue comes in. Decouple through MQ: put the keys to be deleted into the message queue and let a consumer process them. A cache-deletion message disappears only after it has been consumed successfully. Even if Redis is down, as long as the message queue server is still up and the message has not been consumed, the deletion message stays in the queue. When the Redis service recovers, consumption resumes and the deletion is carried out. You may ask: what if the MQ server goes down as well? I believe there are always more solutions than problems, and one can always be found. I leave that to your imagination~~
Conclusion: use a message queue
(You can also read this article again.)
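The queue-based deletion can be sketched as below. Everything here is an in-memory assumption: a deque stands in for the MQ, FlakyCache is a hypothetical stand-in for a Redis client that can be switched "down", and consume is a hypothetical consumer loop. The point it illustrates is that a message is removed only after the delete has succeeded.

```python
from collections import deque


class FlakyCache:
    """Stand-in for Redis that can be switched 'down' (not a real client)."""

    def __init__(self):
        self.data = {}
        self.up = True

    def delete(self, key):
        if not self.up:
            raise ConnectionError("cache unavailable")
        self.data.pop(key, None)


def consume(queue, cache):
    """Process pending delete messages; remove a message only after success."""
    processed = 0
    while queue:
        key = queue[0]           # peek, do not remove yet
        try:
            cache.delete(key)
        except ConnectionError:
            break                # keep the message queued and try again later
        queue.popleft()          # acknowledge only after a successful delete
        processed += 1
    return processed


cache = FlakyCache()
cache.data["user:1"] = "old"
pending = deque()

cache.up = False                 # Redis goes down right after the MySQL update
pending.append("user:1")         # producer publishes the key to delete
first_pass = consume(pending, cache)

cache.up = True                  # Redis recovers
second_pass = consume(pending, cache)
print(first_pass, second_pass, cache.data, list(pending))
```

While the cache is down the message stays in the queue; once it is back, the same message is consumed and the stale key is removed.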

4. Ideas up in the sky need an implementation on the ground

Companies with high concurrency have certainly paid attention to this kind of data inconsistency problem, and they have their own production solutions, such as Alibaba's open source Canal.
Once a change to MySQL data is committed, the change appears in the binlog and a message is generated. A consumer that listens for these messages then performs follow-up processing, such as deleting the cache. This matches the earlier analysis: update the data first, then delete the cache. Of course, in case MySQL has been changed again in the meantime, the consumer can query the database again after receiving the message and, if the data has indeed changed, sync it to the cache.
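The consumer side of this flow might look roughly like the sketch below. It does not use Canal's client API: the event dict and its fields (table, pk, type) are a hypothetical shape invented for illustration, and dicts again stand in for MySQL and Redis.

```python
db = {("user", 1): {"name": "new"}}     # stands in for MySQL, already updated
cache = {"user:1": {"name": "old"}}     # stands in for Redis, still stale


def on_row_change(event):
    """Handle one hypothetical change event derived from the binlog."""
    key = f'{event["table"]}:{event["pk"]}'
    cache.pop(key, None)                            # drop the stale entry
    row = db.get((event["table"], event["pk"]))     # query the database again
    if row is not None:
        cache[key] = dict(row)                      # sync the current row
    return key


changed_key = on_row_change({"table": "user", "pk": 1, "type": "UPDATE"})
print(changed_key, cache)
```

Because the handler re-reads the row instead of trusting the event payload, a second change made before the message is consumed is also picked up.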
Conclusion: use Canal to solve the data consistency problem
Here is the address of canal: https://github.com/alibaba/canal
If you are interested, you can study it. We now use Canal to keep data consistent between MySQL and Mongo and between MySQL and ES. We no longer have to wait until the data is needed and then sync it to the cache layer by layer through application logic; the latest data is already prepared, and querying it directly is a real pleasure. Of course, when the business is more complicated, there is more to consider. Only through continuous practice and trial and error can you find a solution that suits your business. After all, technology serves the business, and business drives technology.

------------The more you know, the more you don't know-------------

Thoughts on data consistency between redis and mysql
