Interview: How to ensure consistency between the Redis cache and the MySQL database

Foreword

Why do the cache and the database run into consistency problems?

For hot data (data that is queried frequently but modified rarely), we can put it in the Redis cache, because MySQL alone cannot handle the query load. Caching middleware is therefore used to improve query efficiency, but we must ensure that the data read from Redis stays consistent with the data stored in the database.

The client mainly performs two operations against the database: reads and writes. For hot data cached in Redis, when the client reads, the data is returned directly from the cache (a cache hit). When the data is not in the cache, it must be read from the database and loaded into the cache (a cache miss). Either way, the read path by itself does not cause inconsistency between the cache and the database.
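As a sketch, the cache-aside read path above can be written like this, using plain Python dicts as stand-ins for Redis and MySQL (the key names are made up for illustration):

```python
# Simulated stores; in production these would be a Redis client and a MySQL client.
db = {"user:1": "alice"}   # stands in for MySQL
cache = {}                 # stands in for Redis

def read(key):
    """Cache-aside read path: return on a hit, load from the DB on a miss."""
    if key in cache:
        return cache[key]          # cache hit
    value = db.get(key)            # cache miss: fall back to the database
    if value is not None:
        cache[key] = value         # backfill the cache for later reads
    return value
```

The first read misses and backfills the cache; subsequent reads hit it directly.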

Main text

1. Common solutions

Usually, our main purpose in using a cache is to improve query performance. In most cases, the flow is: read from the cache on a hit, and on a miss read from the database and backfill the cache.

  • This is the most common way to use a cache. It looks fine, but it overlooks an important detail: **if a piece of data in the database is updated right after it has been put into the cache, how do we update the cache?** If the cache is not updated, the next read that hits the cache returns stale data.
  • There are currently four main options for keeping the cache up to date:
  1. Write the cache first, then update the database
  2. Update the database first, then write the cache (double write)
  3. Delete the cache first, then update the database
  4. Update the database first, then delete the cache

The last two (delete-based) options are the ones generally recommended for cache consistency.

Write to the cache first, then write to the database

Let's think about it: a write operation has just finished writing the cache when, due to a network problem, the write to the database fails.
**The cache is updated to the latest data, but the database is not, so the cached entry has become dirty data.** If a user's query happens to read it at this moment, there is a serious problem: the data does not exist in the database at all.
We all know that the main purpose of a cache is to keep a temporary copy of database data in memory, to serve later queries faster.
But if a piece of data does not exist in the database, what is the point of caching this "fake data"?

Therefore, writing the cache first and then writing the database is not advisable, and it is rarely used in practice.
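A minimal simulation of why this ordering is dangerous, with dicts standing in for Redis and MySQL and a raised exception standing in for the network failure (the key name is made up for illustration):

```python
db = {}      # stands in for MySQL
cache = {}   # stands in for Redis

def write_cache_first(key, value, db_write_ok=True):
    """'Write the cache first, then the database' -- the problematic order."""
    cache[key] = value             # step 1: the cache write succeeds
    if not db_write_ok:            # simulate a sudden network failure
        raise IOError("database write failed")
    db[key] = value                # step 2: the database write

try:
    write_cache_first("order:9", "paid", db_write_ok=False)
except IOError:
    pass   # the cache now holds "fake data" the database never recorded
```

After the failed write, readers hit the cache and see a value the database has never seen.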

Update the database first, then update the cache

For a user's write operation, writing to the database first and then writing to the cache avoids the previous "fake data" problem, but it creates new problems.
What problems?

  • In high-concurrency scenarios, both the database write and the cache write are remote operations. To avoid deadlocks caused by large transactions, it is usually recommended not to put them in the same transaction. That means if the database write succeeds but the cache write fails, the data already written to the database will not be rolled back.
  • The result is new data in the database while the cache still holds old data:
  1. Request A arrives first and has just finished writing the database, but a network hiccup stalls it before it can write the cache.
  2. Request B then arrives and writes the database.
  3. Request B successfully writes the cache.
  4. Request A recovers from the stall and writes the cache as well.
    Clearly, in this process, request B's new data in the cache is overwritten by request A's old data.
    That is to say: in a high-concurrency scenario, if multiple threads execute "write the database first, then write the cache" at the same time, the database may hold the new value while the cache holds the old value, leaving the two inconsistent.
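The four-step race above can be replayed deterministically; the dicts and the `stock` key are stand-ins for illustration:

```python
db, cache = {}, {}   # stand-ins for MySQL and Redis

# Replay the interleaving deterministically (real threads would make the
# outcome timing-dependent):
db["stock"] = "A"      # 1. request A writes the database, then stalls
db["stock"] = "B"      # 2. request B writes the database
cache["stock"] = "B"   # 3. request B writes the cache
cache["stock"] = "A"   # 4. request A wakes up and writes its now-stale value
```

The database ends with B's value while the cache ends with A's: exactly the inconsistency described above.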

From the above, writing the database first and then writing the cache can still leave the cache stale under concurrency, and it wastes cache writes on data that may never be read, so it is not recommended.

Scenarios where "update the database + update the cache" is appropriate

If our business has high requirements on the cache hit rate, we can adopt the "update the database + update the cache" solution, because updating the cache (rather than deleting it) never causes a cache miss.

Solutions

  • Add a distributed lock before updating the cache, so that only one request updates the cache at a time and there are no concurrency problems. Of course, introducing the lock has an impact on write performance.
  • After updating the cache, give it a short expiration time, so that even if the cache becomes inconsistent, the stale data expires quickly, which many businesses can tolerate.
  • As the two double-write schemes above show, directly updating the cache causes many problems, so let's change our thinking: instead of updating the cache, delete it.
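The first two mitigations (a lock around the cache update plus a short expiration) might be sketched like this. As assumptions for illustration, `threading.Lock` stands in for a real distributed lock (such as one built on Redis `SET ... NX`), and the TTL is stored next to the value:

```python
import threading
import time

db, cache = {}, {}
cache_lock = threading.Lock()   # stand-in for a distributed lock

def update(key, value, ttl=5.0):
    """Update the database, then update the cache under a lock with a short TTL."""
    db[key] = value
    with cache_lock:                              # one cache writer at a time
        cache[key] = (value, time.time() + ttl)   # store value plus expiry

def read(key):
    entry = cache.get(key)
    if entry and entry[1] > time.time():   # expired entries count as misses
        return entry[0]
    return db.get(key)
```

Even if the cache briefly holds a stale value, the short TTL bounds how long it can be served.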

Delete the cache first, then update the database


  • Under high concurrency, the following race can occur: thread A deletes the cache, but its database update has not yet completed. Thread B then reads the cache, finds nothing, reads the old value from the database, and writes that old value back into the cache. Only afterwards does thread A write the new value to the database. How do we solve the data inconsistency in this scenario?
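Replaying that race deterministically, with dicts as stand-ins for Redis and MySQL:

```python
db, cache = {"price": "100"}, {"price": "100"}   # stand-ins for MySQL and Redis

# Deterministic replay of the race:
cache.pop("price", None)    # 1. thread A deletes the cache
old = db["price"]           # 2. thread B misses the cache, reads the old DB value
cache["price"] = old        # 3. thread B writes the old value back to the cache
db["price"] = "120"         # 4. thread A finally writes the new value to the DB
```

The database ends up with the new value while the cache has been repopulated with the old one.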

Solution: delayed double delete

Thread A deletes the cache and starts updating the database. While A's update is still in progress, thread B misses the cache, reads the old value from the database, and writes that old value into the cache. Thread A then sleeps until after B's cache write, and performs the delete operation a second time. When other threads come to read, they miss the cache and get the up-to-date value from the database.
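A sketch of delayed double delete, replaying the interleaving in a single thread for determinism; the sleep interval is an assumed upper bound on how long a concurrent stale read can take:

```python
import time

db, cache = {"price": "100"}, {"price": "100"}
SLEEP = 0.05   # assumed bound on a concurrent read's round trip

# Thread A: first delete, then update the database.
cache.pop("price", None)
db["price"] = "120"

# Thread B (concurrent reader): writes back the old value "100" that it read
# from the database before A's update landed.
cache["price"] = "100"

# Thread A: sleep past the stale-read window, then delete the cache again.
time.sleep(SLEEP)
cache.pop("price", None)
# Subsequent readers now miss the cache and load "120" from the database.
```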

If the second cache deletion fails, a retry mechanism based on a message queue can be used.


Update the database first, then delete the cache


  • This one is more obvious: if thread A's database update succeeds but its cache deletion fails, or has simply not happened yet, then thread B reads the old value from the cache in the meantime, and the two sides are again inconsistent.
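The write and read paths of this strategy might look like the following sketch (dicts again stand in for Redis and MySQL):

```python
db, cache = {}, {}   # stand-ins for MySQL and Redis

def write(key, value):
    """'Update the database first, then delete the cache' write path."""
    db[key] = value          # step 1: persist the new value
    cache.pop(key, None)     # step 2: invalidate; the next read repopulates

def read(key):
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value   # cache-aside backfill on a miss
    return value
```

The invalidation means the cache is only ever repopulated from the database, never written with a possibly stale in-flight value.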

Solutions (retry and binlog):

  • Message queue
    We can introduce a message queue: the data targeted by the second operation (deleting the cache) is added to the queue, and a consumer performs the deletion.

If the application fails to delete the cache, it can re-read the message from the queue and delete the cache again; this is the retry mechanism. Of course, if the deletion still fails after a certain number of retries, we need to report an error to the business layer.
If the deletion succeeds, the message must be removed from the queue to avoid repeated work; otherwise the consumer keeps retrying.
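A minimal sketch of the queue-based retry, with a `deque` standing in for the message queue and a deliberately flaky delete standing in for transient failures (the retry limit and key name are illustrative assumptions):

```python
from collections import deque

cache = {"user:1": "stale"}
queue = deque()       # stand-in for a message queue
MAX_RETRIES = 3
failures_left = [2]   # hypothetical: the first two delete attempts fail

def try_delete(key):
    """Simulated cache delete that fails a few times before succeeding."""
    if failures_left[0] > 0:
        failures_left[0] -= 1
        return False
    cache.pop(key, None)
    return True

def consume():
    """Consumer: retry the deletion, give up and alert after MAX_RETRIES."""
    alerts = []
    while queue:
        key, attempts = queue.popleft()
        if try_delete(key):
            continue                              # success: message is gone
        if attempts + 1 < MAX_RETRIES:
            queue.append((key, attempts + 1))     # requeue for another try
        else:
            alerts.append(key)                    # notify the business layer
    return alerts

queue.append(("user:1", 0))
alerts = consume()
```

With two simulated failures and a limit of three, the third attempt succeeds and no alert is raised.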

  • Subscribe to the MySQL binlog, then operate the cache

The first step of the "update the database first, then delete the cache" strategy is to update the database. If the update succeeds, MySQL generates a change log and records it in the binlog.

So we can obtain the exact data that was changed by subscribing to the binlog, and then delete the corresponding cache entries. Alibaba's open-source Canal middleware is built on this idea.

Canal simulates the MySQL master-slave replication protocol: it disguises itself as a MySQL slave node and sends a dump request to the MySQL master. After receiving the request, MySQL starts pushing the binlog to Canal, which parses the binlog byte stream and converts it into structured, easy-to-consume data for downstream subscribers.
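A rough sketch of the downstream subscriber; note that the event dictionaries here are an assumed shape for illustration only, not Canal's actual message format:

```python
cache = {"user:1": "old", "user:2": "old"}   # stand-in for Redis

# Hypothetical change events, roughly what a subscriber might receive after
# the middleware parses the binlog (assumed shape, for illustration).
events = [
    {"type": "UPDATE", "table": "users", "key": "user:1"},
    {"type": "DELETE", "table": "users", "key": "user:2"},
]

def on_binlog_event(event):
    """Downstream subscriber: invalidate the cached entry for each change."""
    if event["type"] in ("INSERT", "UPDATE", "DELETE"):
        cache.pop(event["key"], None)

for e in events:
    on_binlog_event(e)
```

Because the events are derived from committed database changes, the cache is only invalidated for writes that actually succeeded.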
Therefore, to make sure the second step of "update the database first, then delete the cache" eventually succeeds, we can either retry the cache deletion through a message queue, or subscribe to the MySQL binlog and then operate the cache. The two approaches share one trait: both operate the cache asynchronously.

Interview-version answers

1. Update the database, manually evict the Redis cache, and re-query the latest data to repopulate Redis.
2. Update the MySQL database, then synchronize the data to Redis asynchronously via MQ. The advantage is decoupling; the disadvantage is a higher chance of delay.
3. Update the database, then synchronize to Redis asynchronously via MQ based on subscribing to the database's binlog.
4. Subscribe to the MySQL binlog and synchronize it to Redis asynchronously (e.g. with the Canal framework).

Reference articles

Alibaba Cloud developer community: How to ensure double-write consistency between the database and the cache?
Kobayashi coding: How to ensure consistency between the database and the cache


Origin blog.csdn.net/weixin_59823583/article/details/129072734