Learn to use caching properly in 5 minutes!

Cache Operations

Read cache

Reading the cache falls into two cases, a cache hit and a cache miss:

Cache hit
  • First, try to read the data from the cache
  • The data is in the cache, so it is returned
Cache miss
  • First, try to read the data from the cache
  • The cache misses, so the data is read from the database
  • The data is written to the cache
  • The data is returned

The read path is decided by whether the data is in the cache: if it is, that is a cache hit; if not, that is a cache miss.
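The read path can be sketched as follows. This is a minimal illustration, not a production implementation: the in-memory Map stands in for a real cache, the `database` function is a hypothetical lookup, and `databaseReads` exists only to make hits and misses visible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Minimal read path: try the cache, fall back to the database on a miss. */
class ReadCacheSketch {
    private final Map<String, String> cache = new HashMap<>(); // stand-in for a real cache
    private final Function<String, String> database;           // hypothetical database lookup
    int databaseReads = 0;                                      // counts misses, for illustration

    ReadCacheSketch(Function<String, String> database) {
        this.database = database;
    }

    String read(String key) {
        String cached = cache.get(key);      // try the cache first
        if (cached != null) {
            return cached;                   // cache hit: return the cached data
        }
        databaseReads++;
        String value = database.apply(key);  // cache miss: read the database
        if (value != null) {
            cache.put(key, value);           // write the data to the cache
        }
        return value;                        // return the data
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("name", "arch-digest");
        ReadCacheSketch reader = new ReadCacheSketch(db::get);
        System.out.println(reader.read("name")); // miss, loaded from the database
        System.out.println(reader.read("name")); // hit, served from the cache
        System.out.println("database reads: " + reader.databaseReads);
    }
}
```

The second call never touches the database, which is the whole point of the hit path.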

Write cache

Writing to the cache can be divided into updating the cache and deleting the cache.

Update cache

The cache may need to be updated in two situations:

  • Updating simple data types (e.g., string)
  • Updating complex data types (e.g., hash)

A simple data type can be updated in the cache directly. Updating a complex data type adds extra overhead:

  1. Get the data from the cache
  2. Deserialize the data into an object
  3. Update the object
  4. Serialize the updated object and write it back to the cache

Updating a complex-type cache entry takes at least four steps, and the cache has to be updated on every write. In a scenario with many writes and few reads, the data may be updated 7 or 8 times while the cache is read only once, which is hardly worth the cost. In addition, if the cached value has to be computed, recomputing it on every write is clearly unnecessary: the cache update can be postponed until the next read (cache miss).
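To make the cost difference concrete, here is a small sketch that contrasts the four-step update with a one-step delete. It is only an illustration: the "field=value;field=value" string format is an assumption invented for this example (real systems would use JSON or a native hash type), and the Map stands in for the cache.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Contrasts updating a serialized hash in the cache with simply deleting it. */
class WriteCacheSketch {
    // Each hash is stored as one serialized string, "field=value;field=value"
    // (a made-up format; values are assumed not to contain ';').
    final Map<String, String> cache = new HashMap<>();

    static Map<String, String> deserialize(String raw) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : raw.split(";")) {
            if (pair.isEmpty()) continue;
            String[] kv = pair.split("=", 2);
            fields.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return fields;
    }

    static String serialize(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append(';');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Update strategy: four steps on every write. */
    void updateField(String key, String field, String value) {
        String raw = cache.getOrDefault(key, "");       // 1. get the data from the cache
        Map<String, String> fields = deserialize(raw);  // 2. deserialize into an object
        fields.put(field, value);                       // 3. update the object
        cache.put(key, serialize(fields));              // 4. serialize and write back
    }

    /** Delete strategy: one step; the next cache miss repopulates the entry. */
    void invalidate(String key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        WriteCacheSketch writer = new WriteCacheSketch();
        writer.cache.put("user:1", "name=arch-digest;role=editor");
        writer.updateField("user:1", "name", "juejin");
        System.out.println(writer.cache.get("user:1"));
        writer.invalidate("user:1");
        System.out.println(writer.cache.containsKey("user:1"));
    }
}
```

With deletion, the serialization work moves to the read path and is paid only when someone actually reads the entry.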

Delete Cache

Deleting the cache is also called evicting the cache. The operation is very simple: remove the entry directly from the cache store.

Cache operation sequence

A cache is generally used together with a database: data is read from the database and then written to the cache. Why discuss the order of cache operations? Because different orders produce different results in some cases. The common orders are:

  • Database first, then cache
  • Cache first, then database

Whichever order is used, there are two steps, one on the database and one on the cache. The two steps are not a single atomic operation, so data inconsistency can occur in some cases. The sections below describe the inconsistency and concurrency problems of each order.

Database first, then cache

In this order, data is written to the database first, and then the cache is updated or deleted. Either step 1 or step 2 may fail. If step 1 fails, an exception can be thrown for the business caller to catch and handle; since the cache has not been touched at this point, this is simply a failed database write.

If the first step succeeds (the database write succeeds) and the cache operation then fails, there are two options:

  1. Roll back the database: if the business requires strong consistency between the database and the cache, throw a business exception to the caller so the database write is rolled back.
  2. Do nothing: the opposite of rolling back. The business accepts that the cache and the database may be inconsistent until the cache entry expires.

For example, suppose a cached value of string type whose key is name, and both the database and the cache currently hold the value arch-digest.

String name = "arch-digest";

Now update the value of name to juejin, following the database-first order:

//Update the value of name to juejin
public void update(String name){
    db.insert(...);  //update the database
    cache.delete(name); //delete the cache entry
}

Under normal circumstances both db.insert(...) and cache.delete(name) succeed and there is no problem. If cache.delete(name) fails for some reason, the database holds the updated value juejin while the cache still holds arch-digest, so the next cache read returns arch-digest.

public String getNameFromCache(String name){
    String value =  cache.get(name); //read the data from the cache
    ...
    return value;
}

When the getNameFromCache method is called, it keeps returning arch-digest as long as the cached name has not expired, so users see inconsistent data.
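The failure case can be reproduced with a small simulation. This is a sketch under stated assumptions: two in-memory Maps stand in for the database and the cache, and the `cacheUnavailable` flag is a hypothetical switch that simulates a failing cache delete whose error is ignored (the "do nothing" option).

```java
import java.util.HashMap;
import java.util.Map;

/** Simulates "database first, then cache" when the cache delete fails. */
class DbFirstFailureSketch {
    final Map<String, String> database = new HashMap<>();
    final Map<String, String> cache = new HashMap<>();
    boolean cacheUnavailable = false; // hypothetical switch simulating a cache outage

    void update(String key, String value) {
        database.put(key, value);     // step 1: write the database
        if (cacheUnavailable) {
            return;                   // step 2 fails and the error is ignored
        }
        cache.remove(key);            // step 2: delete the cache entry
    }

    String read(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;            // hit: may be stale if a delete was lost
        }
        String value = database.get(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    public static void main(String[] args) {
        DbFirstFailureSketch store = new DbFirstFailureSketch();
        store.database.put("name", "arch-digest");
        store.cache.put("name", "arch-digest");

        store.cacheUnavailable = true;
        store.update("name", "juejin");
        System.out.println("database: " + store.database.get("name"));
        System.out.println("read:     " + store.read("name")); // still served from the stale cache
    }
}
```

Until the stale entry expires or a later delete succeeds, every read keeps returning the old value.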

Cache first, then database

The cache-first order is similar to the database-first order described above, but in addition to possible data inconsistency it also has a concurrency problem.

Take the same update as above. What happens if the database update fails? There are two cases, depending on the cache operation:

  1. Update cache: the cache now holds the latest data while the database still holds the old data, so the next read returns the new data from the cache (inconsistent data).
  2. Delete cache: the cache entry is removed, so the next read fetches the data from the database (consistent data).

Updating and deleting the cache were introduced above, so they are not explained again here. Clearly, when the cache is operated on first, deleting the cache is the better choice, because a failed database write does not leave inconsistent data. Deleting the cache can still cause a concurrency problem, however:

  1. Thread A successfully deletes the cache
  2. Thread B reads the cache and misses
  3. Thread B gets the data from the database
  4. Thread B writes the data from the database to the cache
  5. Thread A successfully writes to the database
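The five steps above can be replayed on a single thread to make the interleaving deterministic. This is only a sketch: two Maps stand in for the database and the cache, and the "threads" are simulated by executing their steps in the listed order.

```java
import java.util.HashMap;
import java.util.Map;

/** Replays the cache-first race step by step on one thread. */
class CacheFirstRaceSketch {
    final Map<String, String> database = new HashMap<>();
    final Map<String, String> cache = new HashMap<>();

    void replay() {
        // Initial state: both stores agree on the old value.
        database.put("name", "arch-digest");
        cache.put("name", "arch-digest");

        cache.remove("name");                  // 1. thread A deletes the cache
        String seenByB = cache.get("name");    // 2. thread B reads the cache and misses
        if (seenByB == null) {
            seenByB = database.get("name");    // 3. thread B reads the old value from the database
            cache.put("name", seenByB);        // 4. thread B writes that old value to the cache
        }
        database.put("name", "juejin");        // 5. thread A writes the new value to the database
    }

    public static void main(String[] args) {
        CacheFirstRaceSketch race = new CacheFirstRaceSketch();
        race.replay();
        System.out.println("database: " + race.database.get("name"));
        System.out.println("cache:    " + race.cache.get("name"));
    }
}
```

After the replay the database holds juejin while the cache holds arch-digest, and nothing in this sequence will correct the cache.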

In high-concurrency scenarios, the cache and the database can therefore still become inconsistent. So what can be done about consistency between the database and the cache?

Data consistency optimization

Note that these are optimizations, not solutions: transactions in a distributed environment are a hard problem, and there is currently no perfect solution. The best we can do is to pick the optimization that suits the business and reduce the likelihood of inconsistent data, or the delay before it is corrected, to an acceptable range.

Common optimization schemes include:

  1. Do nothing
  2. Delayed double deletion
  3. Subscribe to the binlog

The three schemes run from simple to complex; choose the one that best fits the business.

Do nothing

Doing nothing is the simplest approach: when the business can tolerate it, inconsistency between the database and the cache is simply left alone. It may seem a little improper, but it works surprisingly well!

Delayed double deletion

Delayed double deletion can be used to mitigate the concurrency problem of the cache-first order:

  1. Thread A successfully deletes the cache
  2. Thread B reads the cache and misses
  3. Thread B gets the data from the database
  4. Thread B writes the data from the database to the cache
  5. Thread A successfully writes to the database
  6. Thread A sleeps for one second and then deletes the cache again

This scheme adds step 6: after the database write completes, the writing thread sleeps for one second and then deletes the cached data again, so that the next read by any thread misses the cache, gets the data from the database, and updates the cache.

How should the sleep time (one second here) be determined?

Measure how long the read path of your own project's business logic takes. The writer's sleep time should be that read duration plus a few hundred milliseconds. This ensures that the read request has finished, so the write request can delete the dirty data that the read request left in the cache.
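A synchronous version of delayed double deletion might look like the sketch below. The Maps stand in for the database and the cache, and both constructor parameters are assumptions you would have to measure or choose for your own system: `estimatedReadMillis` is the observed duration of the read path and `marginMillis` is the extra safety margin.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Synchronous delayed double deletion: delete, write, sleep, delete again. */
class DelayedDoubleDeleteSketch {
    final Map<String, String> database = new ConcurrentHashMap<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();
    private final long delayMillis;

    /** Delay = estimated read-path duration + safety margin (both are assumptions). */
    DelayedDoubleDeleteSketch(long estimatedReadMillis, long marginMillis) {
        this.delayMillis = estimatedReadMillis + marginMillis;
    }

    void update(String key, String value) {
        cache.remove(key);             // first delete
        database.put(key, value);      // write the database
        try {
            Thread.sleep(delayMillis); // wait for in-flight reads to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag, still clean up
        }
        cache.remove(key);             // second delete removes any stale value cached meanwhile
    }

    public static void main(String[] args) {
        DelayedDoubleDeleteSketch store = new DelayedDoubleDeleteSketch(50, 50);
        store.cache.put("name", "arch-digest");
        store.update("name", "juejin");
        System.out.println("cached after update: " + store.cache.containsKey("name"));
        System.out.println("database: " + store.database.get("name"));
    }
}
```

The writer blocks for the full delay on every update, which is the throughput cost of this variant.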

What if this synchronous eviction strategy reduces throughput?

Make the second deletion asynchronous: start a separate thread to perform the delete. The write request then no longer has to sleep before returning, which increases throughput.
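One way to make the second delete asynchronous is to schedule it on a background executor instead of sleeping in the writer. The sketch below uses the JDK's ScheduledExecutorService; the Maps, the class name, and the delay value are illustrative assumptions, and the daemon thread means a pending delete is lost if the process exits first.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Delayed double deletion with the second delete scheduled asynchronously. */
class AsyncDoubleDeleteSketch {
    final Map<String, String> database = new ConcurrentHashMap<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();
    private final long delayMillis;
    // Daemon thread so an idle scheduler never keeps the JVM alive.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cache-second-delete");
                t.setDaemon(true);
                return t;
            });

    AsyncDoubleDeleteSketch(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /** Returns without sleeping; the second delete runs later on the scheduler thread. */
    ScheduledFuture<?> update(String key, String value) {
        cache.remove(key);        // first delete
        database.put(key, value); // write the database
        return scheduler.schedule(() -> {
            cache.remove(key);    // second delete, off the request thread
        }, delayMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncDoubleDeleteSketch store = new AsyncDoubleDeleteSketch(100);
        ScheduledFuture<?> secondDelete = store.update("name", "juejin");
        store.cache.put("name", "arch-digest"); // a concurrent reader caches the stale value
        secondDelete.get();                     // wait only so the demo can print the result
        System.out.println("cached after second delete: " + store.cache.containsKey("name"));
        store.shutdown();
    }
}
```

Returning the ScheduledFuture is a convenience for demos and tests; a real write path would typically ignore it and return to the caller immediately.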

Subscribe to the binlog

With binlog subscription, whenever MySQL performs an insert, update, or delete, the corresponding binlog message can be pushed to the cache side, and Redis is then updated according to the binlog record.

This mechanism is very similar to MySQL master-slave replication, which also relies on the binlog to keep the replica consistent with the master.

canal (an open source framework from Alibaba) can be used here to subscribe to the MySQL binlog. canal works by imitating the replication request of a MySQL slave, so the Redis update achieves the same effect.

Of course, other third-party message tools such as Kafka or RabbitMQ can also be used to push the updates to the cache.
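On the consuming side, applying change events to the cache could look roughly like this. Everything here is hypothetical: `RowChange` is an invented event type for illustration and is not canal's, Kafka's, or RabbitMQ's API, and the Map stands in for Redis. A real consumer would map the client library's own event objects onto something similar.

```java
import java.util.HashMap;
import java.util.Map;

/** Applies database change events to a cache; the event type is invented for this sketch. */
class BinlogCacheSyncSketch {

    /** Hypothetical change event; real binlog clients expose their own types. */
    static class RowChange {
        enum Type { INSERT, UPDATE, DELETE }

        final Type type;
        final String key;
        final String value; // null for DELETE

        RowChange(Type type, String key, String value) {
            this.type = type;
            this.key = key;
            this.value = value;
        }
    }

    final Map<String, String> cache = new HashMap<>(); // stand-in for Redis

    void apply(RowChange change) {
        switch (change.type) {
            case INSERT:
            case UPDATE:
                cache.put(change.key, change.value); // mirror the new row value
                break;
            case DELETE:
                cache.remove(change.key);            // drop the entry for a deleted row
                break;
        }
    }

    public static void main(String[] args) {
        BinlogCacheSyncSketch sync = new BinlogCacheSyncSketch();
        sync.apply(new RowChange(RowChange.Type.INSERT, "name", "arch-digest"));
        sync.apply(new RowChange(RowChange.Type.UPDATE, "name", "juejin"));
        System.out.println(sync.cache.get("name"));
        sync.apply(new RowChange(RowChange.Type.DELETE, "name", null));
        System.out.println(sync.cache.containsKey("name"));
    }
}
```

Because the cache is driven only by committed changes, the application's write path no longer has to touch the cache at all.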


Origin juejin.im/post/5de8ec366fb9a0165721c742