Three solutions to Redis data consistency problems


1. What is Redis?

Redis (Remote Dictionary Server) is a high-performance, open-source NoSQL database that stores data as key-value pairs. Most companies use Redis as a distributed cache to improve data query efficiency.

2. Why choose Redis

In the early days of a web application, traffic and concurrency are low and there is relatively little interaction with the database. As the business grows and traffic increases, however, the application servers and the relational database hit bottlenecks, and those bottlenecks are mostly rooted in disk I/O. As the Internet developed further and performance requirements rose, Redis emerged to solve many of these problems. As for why we choose Redis, I can summarize six reasons:
1) It stores data in memory, which reduces the frequency of access to the relational database and relieves database pressure.
2) Its data I/O operations support a very high QPS; the official benchmark is around 100,000.
3) It provides rich data structures, such as string, list, hash, set, and zset.
4) It executes commands on a single thread, avoiding thread-safety issues under concurrency.
5) It supports data persistence, avoiding data loss when the server fails.
6) It also provides higher-level features, such as distributed locks, distributed queues, leaderboards, and "people nearby" geo queries, offering mature solutions for more complex requirements.

3. Application scenarios

● Cache: as an in-memory key-value database, the first scenario that comes to mind is using Redis as a data cache
● Distributed lock: locking shared resources in a distributed environment
● Distributed shared data: sharing data, such as rankings, among multiple applications
● Leaderboard: the self-sorting data structure (zset)
● Message queue: the pub/sub feature can serve as a publisher/subscriber messaging model
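As a sketch of the distributed-lock scenario above: Redis locks are usually built on the `SET key value NX PX ttl` command. The class below simulates only the NX ("set if absent") and compare-and-delete semantics with an in-memory map; a real implementation would issue the command through a client such as Jedis, and the names here are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;

// Simulates the Redis "SET key value NX" lock pattern in memory.
public class SimpleLock {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // NX semantics: acquire only if the key is absent.
    public boolean tryLock(String key, String owner) {
        return store.putIfAbsent(key, owner) == null;
    }

    // Release only if we still own the lock (compare-and-delete).
    public boolean unlock(String key, String owner) {
        return store.remove(key, owner);
    }
}
```

In real Redis the value would be a random token and the delete would be done in a Lua script, so a client never releases a lock that expired and was re-acquired by someone else.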

4. Using Redis as a cache

4.1. The caching workflow
Because of its high concurrency and high performance, caching is widely used in projects. On the read side there is little debate: everyone follows the flow in the figure below.
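The read flow can be sketched as the classic cache-aside pattern: try the cache first, fall back to the database on a miss, then populate the cache. The two maps below stand in for Redis and MySQL; this is an illustrative sketch, not a client library.

```java
import java.util.HashMap;
import java.util.Map;

// Cache-aside read path: cache hit -> return; miss -> query DB and rebuild cache.
public class CacheAsideReader {
    final Map<String, String> cache = new HashMap<>(); // stands in for Redis
    final Map<String, String> db = new HashMap<>();    // stands in for MySQL

    public String read(String key) {
        String value = cache.get(key);   // 1. try the cache
        if (value == null) {
            value = db.get(key);         // 2. miss -> query the database
            if (value != null) {
                cache.put(key, value);   // 3. rebuild the cache entry
            }
        }
        return value;
    }
}
```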

4.2. Data consistency issues
When we use Redis as a cache, requests go to Redis first instead of directly to the database. In this scenario, the cache and the database can become inconsistent.
On an update, operating on the cache and the database must follow one of these four orders:
● Update the cache first, then update the database
● Update the database first, then update the cache
● Delete the cache first, then update the database
● Update the database first, then delete the cache
4.2.1. Update the cache first, then update the database

If the cache update succeeds but the server goes down during the database update, the cache holds the latest data while the database holds the old data.
This is how dirty data is born. Worse, if the cached data comes from a separate table that other tables join against, the results of those joined queries become dirty as well, and a whole chain of problems follows.
4.2.2. Update the database first, then update the cache

Correct data can only be read after the cache expires; until it does, every read returns dirty data.
In both figures above, whenever the second step fails, dirty data is inevitably produced.
4.2.3. Delete the cache first, then update the database
Without high concurrency, this approach can keep the data consistent.

If only the first step succeeds and the second fails, we have merely deleted the cache entry without updating the database. On the next query the cache misses, the database is re-queried, and the cache is rebuilt, so consistency is more or less preserved.
Under concurrent reads and writes, however, inconsistency can still occur:
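The race can be replayed deterministically in code. The interleaving below is fixed by the order of statements, so the stale outcome is reproducible; the map names are illustrative stand-ins for Redis and MySQL.

```java
import java.util.HashMap;
import java.util.Map;

// Deterministic replay of the "delete cache first" race between a writer and a reader.
public class DeleteFirstRace {
    public static Map<String, String> run() {
        Map<String, String> cache = new HashMap<>();
        Map<String, String> db = new HashMap<>();
        db.put("stock", "100");
        cache.put("stock", "100");

        // Writer (user 2), step 1: delete the cache entry.
        cache.remove("stock");
        // Reader (user 1): cache miss, reads the *old* value from the DB.
        String readerValue = db.get("stock");   // still "100"
        // Writer, step 2: update the database.
        db.put("stock", "99");
        // Reader: rebuilds the cache with the stale value it read earlier.
        cache.put("stock", readerValue);

        return cache;   // cache now says 100 while the DB says 99
    }
}
```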

Once this interleaving completes, it is clear that the cache rebuilt by user 1 does not hold the latest data, so the problem remains.
4.2.4. Update the database first, then delete the cache
If the delete fails, the database holds new data while the cache holds old data, and the two are inconsistent.

As before, what happens in the concurrent case if both pieces of code execute successfully?

It can still cause data inconsistency.
However, the conditions for inconsistency here are clearly harder to hit than with the other approaches; all of the following must line up:
● Moment 1: a read request arrives just as the cache expires
● Moment 2: the read request queries the database before the write request updates it
● Moment 3: the write request updates the database and deletes the cache before the read request writes its (stale) value into the cache
This rarely all happens, because in real concurrent code database updates are guarded by locks; otherwise there is no safety. So, to a certain extent, this approach does mitigate the inconsistency problem.
4.3. Summary
No matter which method you choose from the above four methods, in the case of multiple services or concurrency, data inconsistency may actually occur.
In order to solve this problem, there are the following methods:
4.3.1. Delayed double delete
First delete the cache, then update the database, and finally (after a delay of N seconds) delete the cache again: two deletions with a delay in between.

public void write(String key, Object data) throws InterruptedException {
    // Delayed double-delete pseudocode
    deleteRedisCache(key);      // 1. delete the Redis cache entry
    updateMysql(key, data);     // 2. update MySQL
    Thread.sleep(100);          // 3. wait a short while
    deleteRedisCache(key);      // 4. delete the cache entry again
}

The flow chart of delayed double deletion:

The key to this approach is sleeping for a while after updating the database before deleting the cache a second time. The delay must cover the time it takes a concurrent reader (user 1 in the figure) to read the old value from the database and write it into the cache, so that the second delete removes that stale entry.
4.3.2. Send an MQ message and synchronize Redis in a consumer thread

Whether we update or delete the cache, there is no guarantee that operating on the cache and the database will both succeed in one go, so the best we can do is retry. The retry should not be immediate: the cache or the database may be unreachable because of network or other failures, so an immediate retry has a very low success rate, and retrying inline ties up thread resources. So we adopt an asynchronous retry mechanism.
A message queue is a good fit for asynchronous retry, because it guarantees message reliability: messages are not lost, and correct consumption is also guaranteed. A message is removed from the queue if and only if it has been consumed successfully.
Advantage 1: it greatly reduces the latency of the original interface.
Advantage 2: MQ has a built-in retry mechanism, so no hand-written retry code is needed.
Advantage 3: decoupling; querying MySQL and synchronizing Redis are fully separated and do not interfere with each other.
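The retry control flow can be sketched as follows. A real setup would use RabbitMQ or Kafka; here a plain `Queue` plays the broker, and a counter simulates transient Redis outages so the consumer has something to retry against. All names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Asynchronous retry sketch: "delete cache" messages are consumed from a
// queue and re-queued on failure instead of being retried inline.
public class AsyncCacheRetry {
    final Queue<String> mq = new ArrayDeque<>();       // stands in for the broker
    final Map<String, String> cache = new HashMap<>(); // stands in for Redis
    int failuresLeft;                                  // simulated transient outages

    AsyncCacheRetry(int failures) { this.failuresLeft = failures; }

    boolean deleteFromCache(String key) {
        if (failuresLeft > 0) {   // pretend Redis is unreachable
            failuresLeft--;
            return false;
        }
        cache.remove(key);
        return true;
    }

    // Consumer loop: the message is dropped (acked) only on success,
    // otherwise it goes back on the queue for another attempt.
    public int drain() {
        int attempts = 0;
        while (!mq.isEmpty()) {
            String key = mq.poll();
            attempts++;
            if (!deleteFromCache(key)) {
                mq.offer(key);    // failed -> retry later
            }
        }
        return attempts;
    }
}
```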
4.3.3. Subscribe to the binlog with Canal
When business code modifies data, it only updates the database and never touches the cache. So when should the cache be updated?
Taking MySQL as an example, every change to a record produces a binlog entry. We can subscribe to these log events, extract the changed data, and update the cache accordingly. A popular tool for subscribing to MySQL binlogs is Alibaba's open-source Canal, and with it the architecture becomes the following.

By subscribing to the database change log, we learn exactly which data was changed whenever the database is modified, and can delete the corresponding cache entries.
Of course, Canal is usually paired with a message queue, because Canal itself has no data-processing capability.

This approach is fully decoupled: application code no longer has to worry about failed message sends, since Canal publishes everything.
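On the consuming side, the handler only needs to map each change event to a cache key and evict it. The event shape and the "table:pk" key convention below are assumptions for illustration, not Canal's actual API (Canal delivers protobuf-encoded row-change entries).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative consumer for Canal-style change events: each event names a
// table and a primary key, and the handler evicts the matching cache entry.
public class BinlogCacheInvalidator {
    final Map<String, String> cache = new HashMap<>(); // stands in for Redis

    static class RowChangeEvent {
        final String table;
        final String primaryKey;
        RowChangeEvent(String table, String primaryKey) {
            this.table = table;
            this.primaryKey = primaryKey;
        }
    }

    // Map the changed row to the assumed cache key convention "table:pk" and evict it.
    public void onEvent(RowChangeEvent e) {
        cache.remove(e.table + ":" + e.primaryKey);
    }
}
```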

Origin blog.csdn.net/love7489/article/details/130703485