How to ensure consistency between the cache and the database under double writes

As soon as you use a cache, you may be double-writing to both the cache and the database, and wherever there are double writes, there is a data consistency problem.

So, how do we solve the consistency problem?

In general, if the cache is allowed to be occasionally, slightly inconsistent with the database (that is, if your system does not strictly require the cache and the database to stay consistent), it is best not to adopt the following scheme: serializing read and write requests into an in-memory queue.

Serialization guarantees that no inconsistency will occur, but it also drastically reduces the system's throughput: you will need several times more machines than normal to support the same volume of requests.

Cache Aside Pattern

The most classic pattern for reading and writing with a cache + database is the Cache Aside Pattern.

On a read: read the cache first; if the value is not in the cache, read the database, put the result into the cache, and return the response. On an update: update the database first, then delete the cache.
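The read and update paths above can be sketched as follows. This is a minimal illustration, not a production implementation: the "cache" and "database" are plain in-memory maps standing in for, say, Redis and MySQL, and the class and method names (`CacheAside`, `read`, `update`) are invented for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the Cache Aside Pattern. The two maps stand in for a
// real cache (e.g. Redis) and a real database (e.g. MySQL).
public class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> db = new ConcurrentHashMap<>();

    // Read path: try the cache first; on a miss, load from the database,
    // populate the cache, and return the value.
    public String read(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = db.get(key);          // cache miss: go to the database
            if (value != null) {
                cache.put(key, value);    // fill the cache for later reads
            }
        }
        return value;
    }

    // Write path: update the database, then delete (not update) the cache.
    public void update(String key, String newValue) {
        db.put(key, newValue);
        cache.remove(key);                // the next read recomputes the cache
    }
}
```

Note that `update` deletes the cache entry rather than writing the new value into it; the reasons for that choice are discussed next.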

Why delete the cache instead of updating it?

The reason is simple: in many complex caching scenarios, the cached value is not simply a value pulled straight out of the database.

For example, when a field of one table is updated, computing the new cached value may require querying data from two other tables and performing some computation.

In addition, the cost of updating the cache is sometimes very high. Does every database modification really have to update the corresponding cache?

Maybe in some scenarios it does, but for more complex cached-data computations, that is not the case.

If one cache involves several frequently modified tables, the cache will be updated frequently. But the question is: will that cache actually be accessed frequently?

For example: a table field involved in a cache is modified 20 or 100 times in one minute, so the cache is updated 20 or 100 times; yet the cache is read only once in that minute. It is largely cold data.

In fact, if you just delete the cache, then within that minute the cache is recomputed at most once, and the cost drops dramatically: the cache is only computed when the cache is actually used.

Deleting the cache instead of updating it is really a form of lazy computation: do not redo the complex calculation every time regardless of whether it will be used; recompute it only when it actually needs to be used.

This is like the lazy loading idea in MyBatis and Hibernate. When you query a department, and the department carries a list of employees, there is no need to also fetch the data of the 1,000 employees inside it on every department query.

In 80% of cases, querying the department is just to access the department's own information. So query the department first; only when you actually want to access the employees inside it will the database be queried for those 1,000 employees.

The most basic cache inconsistency problem and its solution

Problem: update the database first, then delete the cache. If deleting the cache fails, the database ends up with new data while the cache holds old data, and they become inconsistent.

Solution: delete the cache first, then update the database. If the database update fails, the database still holds the old data and the cache is empty, so the data is not inconsistent.

Because the cache is empty at read time, the old data is read from the database and then written back into the cache.
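The reversed ordering can be sketched as below. Again this is only an illustration with in-memory maps and invented names (`DeleteThenUpdate`, the `simulateDbFailure` flag); the point is that when the database write fails after the cache delete, a subsequent read simply reloads the old database value, so no inconsistency is visible.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of "delete the cache first, then update the database".
// If the database write fails, the cache has already been emptied, so the
// next read reloads the old database value: stale, but consistent.
public class DeleteThenUpdate {
    final Map<String, String> cache = new ConcurrentHashMap<>();
    final Map<String, String> db = new ConcurrentHashMap<>();

    public void update(String key, String newValue, boolean simulateDbFailure) {
        cache.remove(key);                       // step 1: delete the cache
        if (simulateDbFailure) {
            throw new IllegalStateException("db write failed");
        }
        db.put(key, newValue);                   // step 2: update the database
    }

    // On a miss, reload from the database and cache the result.
    public String read(String key) {
        return cache.computeIfAbsent(key, db::get);
    }
}
```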

Analysis of a more complex data inconsistency

The data changes: the cache is deleted first, then the database is about to be modified, but the modification has not happened yet. A read request arrives, reads the cache, finds it empty, queries the database, gets the old pre-modification data, and puts it into the cache. Afterwards, the data-change routine finishes modifying the database.

And now the data in the database and in the cache are different...

Why does this cache problem appear in high-concurrency scenarios with millions of requests?

This problem can only arise when there are concurrent reads and writes on the same piece of data. If your concurrency is very low, especially read concurrency, say 10,000 visits per day, then the inconsistent scenario just described will rarely occur.

But the problem is, with millions of requests per day and tens of thousands of concurrent reads per second, as long as there are data-update requests every second, the cache + database inconsistency described above may occur.

The solution is as follows:

When updating data, route the operation by the data's unique identifier and send it to an internal JVM queue.

When reading data, if the data is not found in the cache, route the operation of re-reading the data from the database and updating the cache by the same unique identifier, and send it to the same internal JVM queue.

Each queue corresponds to one worker thread; each worker thread takes the operations from its queue and executes them serially, one by one.

In this scenario, a data-change operation first deletes the cache, then goes to update the database, but the update has not yet completed.

At this point, if a read request arrives and reads an empty cache, it can first send a cache-update request into the queue; the request backlogs in the queue, and the read then synchronously waits for the cache update to complete.

There is an optimization point here: within one queue, stringing together multiple cache-update requests for the same data is meaningless, so they can be filtered. If you find that a cache-update request for this data is already pending in the queue, there is no need to enqueue another one; just wait for that earlier update operation to complete.

After the worker thread for that queue finishes the previous operation (the database modification), it performs the next operation, the cache update: it reads the latest value from the database and writes it into the cache.

While the read request waits, it keeps polling; if it finds the value has appeared, it returns it directly. If the wait exceeds a certain timeout, the request reads the current old value directly from the database.
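A single-queue version of this scheme can be sketched as below. It is a simplified, assumption-laden illustration: the "database" and "cache" are in-memory maps, the names (`SerializedCacheUpdater`, `pendingRefresh`) are invented, and a real deployment would shard work across many queues by a hash of the key rather than use one queue. It shows the three ingredients from the text: a worker thread draining one queue serially, deduplication of pending cache-refresh requests, and a read that polls until a timeout and then falls back to the database.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of serializing cache updates through an internal JVM queue.
public class SerializedCacheUpdater {
    private final Map<String, String> db = new ConcurrentHashMap<>();
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    // Keys that already have a cache-refresh operation waiting in the queue.
    private final Set<String> pendingRefresh = ConcurrentHashMap.newKeySet();

    public SerializedCacheUpdater() {
        Thread worker = new Thread(() -> {
            try {
                while (true) queue.take().run();   // serial, one op at a time
            } catch (InterruptedException ignored) { }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Write path: the queued operation deletes the cache, then writes the db.
    public void update(String key, String value) {
        queue.add(() -> {
            cache.remove(key);
            db.put(key, value);
        });
    }

    // Read path: on a miss, enqueue at most one refresh for this key, then
    // poll the cache until the timeout; on timeout, read the db directly.
    public String read(String key, long timeoutMs) {
        String v = cache.get(key);
        if (v != null) return v;
        if (pendingRefresh.add(key)) {             // filter duplicate refreshes
            queue.add(() -> {
                String fresh = db.get(key);
                if (fresh != null) cache.put(key, fresh);
                pendingRefresh.remove(key);
            });
        }
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            v = cache.get(key);
            if (v != null) return v;               // refresh completed
            try { Thread.sleep(5); } catch (InterruptedException e) { break; }
        }
        return db.get(key);                        // timed out: read db directly
    }
}
```

Because the cache delete happens inside the queued update operation, any refresh enqueued by a concurrent read lands either before the update (and is then invalidated by the update's cache delete) or after it (and reads the new value), which is exactly the ordering the scheme relies on.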

Under high-concurrency scenarios, the issues to pay attention to in this solution are:

1. Read requests blocked for a long time

Because read requests are now (very slightly) asynchronized, be sure to watch for read-timeout problems: every read request must return within the timeout.

The biggest risk of this solution is that the data may be updated very frequently, producing a large backlog of update operations in the queue; many read requests will then time out, and a large number of requests will go straight to the database. Be sure to run some realistic simulated tests to see how frequently the data is updated.

In addition, because one queue may backlog update operations for multiple data items, you need to test against your own business situation; you may need to deploy multiple service instances, each handling a share of the data updates.

If a memory queue backlogs 100 inventory-modification operations for a product, and each modification takes 10 ms to complete, then the last read request for that product may wait 10 ms * 100 = 1000 ms = 1 s before getting its data. That makes read requests block for a long time.

Therefore, you must run stress tests and online-environment simulations against the actual operation of the business system, to see how many update operations the memory queues may backlog at the busiest times, and how long the read request corresponding to the last update may hang.

If read requests must return within 200 ms, and your calculation shows that even at the busiest time the backlog is only 10 updates, with a wait of at most 200 ms, then that is acceptable.

If the memory queues may backlog an especially large number of updates, then add machines, so that the service instances deployed on each machine handle less data and each memory queue backlogs fewer update operations.

In fact, from prior project experience, the write frequency of data is generally very low, so in the normal case the backlog of update operations in the queue should be small.

For projects like this, with highly concurrent reads served from a cache, write requests are generally very few; a write QPS of a few hundred per second is already quite good.

A rough estimate: if there are 500 write operations per second, divided into five 200 ms time slices, that is 100 write operations per 200 ms; spread across 20 memory queues, each queue may backlog 5 write operations.

If performance testing shows that each write operation usually completes in about 20 ms, then a read request for any memory queue's data hangs for a short while at most and is certainly returned within 200 ms.

From this simple calculation we know that a single machine supporting a write QPS of a few hundred is no problem; if the write QPS grows 10x, then scale out: 10x the machines, each machine with 20 queues.
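The back-of-envelope estimate above can be written out as a small calculation. All the numbers are the illustrative figures from the text (500 writes/s, 20 queues, ~20 ms per write, a 200 ms read budget), not measurements, and the class name is invented for this sketch.

```java
// Back-of-envelope check of the backlog estimate from the text.
public class BacklogEstimate {
    // Worst-case wait for a read stuck behind one queue's write backlog.
    static double worstReadWaitMs(int writesPerSecond, int queues,
                                  double writeCostMs, double windowMs) {
        double writesPerWindow = writesPerSecond * windowMs / 1000.0; // 100 writes per 200 ms
        double backlogPerQueue = writesPerWindow / queues;            // 5 writes per queue
        return backlogPerQueue * writeCostMs;                         // 5 * 20 ms = 100 ms
    }

    public static void main(String[] args) {
        // 500 writes/s, 20 queues, ~20 ms per write, within a 200 ms budget.
        System.out.println("worst-case read wait: "
                + worstReadWaitMs(500, 20, 20.0, 200.0) + " ms");
    }
}
```

With these figures the worst-case wait is 100 ms, comfortably inside the 200 ms read budget, which is why the text concludes a few hundred write QPS per machine is fine.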

2. Read-request concurrency is too high

Here you must also do stress testing. There is a risk that, when the situation above occurs, a sudden flood of read requests all hang on the service with tens of milliseconds of delay; see whether the service can withstand it, and how many machines are needed to survive the peak of this extreme case.

But because not all data is updated at the same moment, the caches will not all expire at the same moment; each time, only the caches of a small number of data items may be invalid, so the read requests corresponding to those items should not produce especially high concurrency.

3. Request routing with multiple service instances deployed

If this service is deployed with multiple instances, you must ensure that requests performing data updates and requests performing cache updates are all routed through the Nginx server to the same service instance.

For example, all read and write requests for the same product are routed to the same machine. You can implement your own routing between services based on a hash of some request parameter, or use Nginx's hash-based routing features, and so on.
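Routing by a hash of the request parameter can be as simple as the sketch below (the class name and the choice of `hashCode()` are illustrative; Nginx can achieve the same effect between services with its hash-based upstream selection). The point is only that the same product id always maps to the same queue or instance index.

```java
// Sketch: route all requests for the same product id to the same
// queue / service instance by hashing the id.
public class HashRouter {
    static int route(String productId, int instances) {
        // Math.floorMod keeps the index non-negative even when hashCode() < 0.
        return Math.floorMod(productId.hashCode(), instances);
    }
}
```

Because the mapping is deterministic, the update request and the cache-refresh request for one product always land on the same instance and the same internal queue, which is what the serialization scheme requires.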

4. Hot-product routing problems causing request skew

If the read and write requests for one product are especially high and all hit the same queue on the same machine, the pressure on that machine may become too large.

But since the cache is emptied only when the product data is updated, which is what triggers the concurrent reads and writes, this depends on the business system: if the update frequency is not too high, the impact of this problem is not especially large, though some machines may run at a higher load.


Origin: blog.csdn.net/suifeng629/article/details/93903185