A simple understanding of caching

1. Cache

When we mention caches, we generally think of the CPU cache or an in-memory cache. The essence of caching is to store part of the data on another medium with faster access, so the system can operate and respond more quickly. For example, we load part of the data from disk into memory and operate on the in-memory data directly, which is hundreds of times faster than reading the data from disk.

Since fast media are so convenient, why not build everything out of them? Here we usually compare memory and disk directly.

The first reason is the huge difference in price. At current market prices, disk costs roughly a few cents to a few dollars per GB, while memory runs to tens of dollars, sometimes close to a hundred, per GB; the gap is enormous. If we stored all data in memory, our wallets could not take it. And if you compare CPU cache to disk by cost, the difference is astronomical, since CPU caches are counted in MB: the Intel i9-9900K processor has a 16MB cache.

Another very important reason is that memory and disk are positioned differently. System memory exists so the machine can process data quickly; it is immediate, not persistent. Memory only works while powered on, and once the machine shuts down or loses power, the data is gone. For data that must be kept permanently, that is simply not an option. Disk is the opposite: it is comparatively slow, but its data is persisted.

Because of their different positions and characteristics, memory and disk each have their own appropriate usage scenarios, and in practice both are in common use.

2. Problems with using a cache

From here on, "cache" refers to data cached in memory: a copy of database data stored in memory.

Usually, when we write a server, memory size is limited, so the cache holds the frequently accessed part of the database's data in memory in order to speed up the server's processing and raise the service's throughput. But using a cache is not that simple; it hides plenty of pits waiting for you to step into.

1. Data consistency

The biggest problem with using a cache is data inconsistency; the more complex the environment, the more obvious it becomes and the harder it is to handle.

The data in the cache is a copy of the database's data. If the workload is read-only, the problem never occurs; once modify or delete operations are involved, the data can become inconsistent.

For example, a user's name is NameA, and at some point he changes it to NameB. If only the database or only the cache is updated, the cache and the database end up inconsistent.

Then someone says: just update the database and the cache at the same time. The idea works, but when you implement it, problems appear under concurrency.

Suppose there are two threads, t1 and t2, where t1 performs a modification and t2 performs a read. If t1 has modified the database but has not yet had time to modify the cache when t2 reads from the cache, then the data t2 reads is not the latest.

Some people then say: at that point, just add a lock. You can use a read-write lock with small granularity, for instance one lock per user id. In a read operation all threads can read concurrently; in a write operation only one thread may operate. That guarantees data consistency while still meeting the performance requirements of the server.
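
Here is a rough sketch of that idea in Python. The db object with get_name/set_name methods is a stand-in for your database layer, and since the standard library has no read-write lock, the sketch includes a tiny reader-preference one:

```python
import threading

class RWLock:
    """A minimal read-write lock: many concurrent readers, one writer."""
    def __init__(self):
        self._readers = 0
        self._readers_mutex = threading.Lock()  # protects the reader count
        self._write_gate = threading.Lock()     # held while anyone writes

    def acquire_read(self):
        with self._readers_mutex:
            self._readers += 1
            if self._readers == 1:
                self._write_gate.acquire()      # first reader blocks writers

    def release_read(self):
        with self._readers_mutex:
            self._readers -= 1
            if self._readers == 0:
                self._write_gate.release()      # last reader lets writers in

    def acquire_write(self):
        self._write_gate.acquire()

    def release_write(self):
        self._write_gate.release()

# One lock per user id keeps the lock granularity small.
_locks_guard = threading.Lock()
_user_locks = {}
cache = {}

def _lock_for(user_id):
    with _locks_guard:  # guard creation so two threads don't make two locks
        return _user_locks.setdefault(user_id, RWLock())

def read_name(user_id, db):
    lock = _lock_for(user_id)
    lock.acquire_read()
    try:
        # Fall through to the database on a cache miss.
        return cache.get(user_id) or db.get_name(user_id)
    finally:
        lock.release_read()

def rename(user_id, new_name, db):
    lock = _lock_for(user_id)
    lock.acquire_write()
    try:
        db.set_name(user_id, new_name)  # update the database...
        cache[user_id] = new_name       # ...and the cache, with readers held off
    finally:
        lock.release_write()
```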

This is one solution for keeping the cache consistent, but that is not to say it is perfect, and it is not necessarily suitable for other application scenarios.

When using a cache there is no perfect solution, only the one most suitable for the most common scenario.

2. Hit rate

After all, the cache only holds a copy of part of the data. When data is read, the cache is read first; if the data is not there, it is read from the database instead. If the requested data is never in the cache, the cache has little effect. The hit rate is the metric commonly used to evaluate this in day-to-day use of a cache.

There is a fairly general approach to improving the hit rate: when the data is not in the cache, the thread reads it from the database and then stores it in the cache, so the next read can be served straight from the cache.
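
In code, this read path (often called cache-aside) looks roughly like the sketch below; the db object with a get_user method is a stand-in for the database layer:

```python
cache = {}

def get_user(user_id, db):
    # Serve from the cache on a hit...
    value = cache.get(user_id)
    if value is not None:
        return value
    # ...and on a miss, read the database and populate the cache,
    # so the next read for this key is a hit.
    value = db.get_user(user_id)
    cache[user_id] = value
    return value
```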

3. Cleaning up space

Memory space is relatively small, and the amount of cached data keeps growing as the program runs, so at some point we have to clean it up: delete the data that will not be used and is no longer needed. How do we find the data that is not frequently used? Listed here are several commonly used eviction policies.

FIFO (first in, first out)

The first-in-first-out strategy is simple and crude: when cleaning up, whatever was stored first is cleared first. In certain situations that is workable, but in general this strategy makes the program rigid rather than "smart."
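
A rough FIFO sketch in Python, using OrderedDict since it remembers insertion order:

```python
from collections import OrderedDict

class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # preserves insertion order

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # evict the oldest entry
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)
```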

LFU (least frequently used)

The least-frequently-used strategy: the data used least often is the first to be cleared. The common practice is to count hits on each piece of data in order to judge how commonly used it is.

In general the hit count should be limited to a time window, recording only the hits within a recent period. This prevents data that was commonly used at the beginning, but is never used later, from surviving on its old count.

For example, a player gets hooked on a certain ship-girl game; after all, it is so grindy, oh no, I mean so much fun. He logs in every day, so his personal profile naturally racks up a high hit count. But one day he stops paying and stops grinding, abandons the game, and never logs in again. From then on his data is in fact no longer commonly used.
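
Here is a rough sketch of windowed LFU. Keeping a timestamp per hit is too expensive for a real cache (real LFU implementations use much cheaper counters), but it shows the idea of counting only recent hits:

```python
import time
from collections import defaultdict

class LFUCache:
    """Counts hits inside a sliding time window (a rough sketch)."""
    def __init__(self, capacity, window_seconds=3600):
        self.capacity = capacity
        self.window = window_seconds
        self.data = {}
        self.hits = defaultdict(list)  # key -> timestamps of recent hits

    def _recent_hits(self, key):
        # Drop hits that fell out of the window, return what remains.
        now = time.monotonic()
        self.hits[key] = [t for t in self.hits[key] if now - t < self.window]
        return len(self.hits[key])

    def get(self, key):
        if key in self.data:
            self.hits[key].append(time.monotonic())
        return self.data.get(key)

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = min(self.data, key=self._recent_hits)  # fewest recent hits
            del self.data[victim]
            del self.hits[victim]
        self.data[key] = value
```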

LRU (least recently used)

The least-recently-used strategy: stamp the data with the time it was last used, and when cleaning up, release the data that has gone unused the longest. This suits scenarios with hot data.
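
LRU is short to sketch in Python, because OrderedDict can move an entry to the end on every access:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # evict the least recently used
        self.data[key] = value
```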

Set an expiration time

When data is added to the cache, give it a time-to-live; once that time is exceeded, the data is released. Redis uses this very frequently.
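
In Redis this corresponds to the EXPIRE command, or SET with the EX option. A minimal pure-Python sketch of the same idea, expiring entries lazily when they are read:

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.data = {}  # key -> (value, expiry timestamp)

    def put(self, key, value):
        self.data[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self.data[key]  # lazily expire on access
            return None
        return value
```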

Random cleanup

As the name suggests: when cleaning up, evict at random. It sounds like a crude way to clean, but there may well be programs that use it (I'm not sure). In my limited experience I have never seen this method in the wild; if you have, please let me know.
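
For what it's worth, Redis does offer an allkeys-random eviction policy, and random eviction itself takes only a few lines to sketch:

```python
import random

def evict_random(cache, n=1):
    # Pick n keys uniformly at random and drop them.
    for key in random.sample(list(cache), min(n, len(cache))):
        del cache[key]
```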

4. Cache avalanche

Cache avalanche means a large number of requests pass straight through the cache layer and hit the database directly (the requests "penetrate" the cache), putting excessive pressure on the database; in severe cases the database goes down.

This penetration can occur because the original cache entries expired (or were cleared for some other reason) and new entries have not been stored yet, letting the requests through to the database. It can also happen when the server has just started and has not had a chance to cache any data.

The solution is to avoid having a large number of cache entries expire at the same time, and, when the server starts, to ramp the request volume routed to it from small to large.
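
A common way to keep entries from all expiring at once is to add random jitter to each TTL when writing to the cache; a minimal sketch (the numbers are arbitrary):

```python
import random

BASE_TTL = 600  # ten minutes

def jittered_ttl():
    # Spread expirations out so entries written together don't all
    # expire together and stampede the database.
    return BASE_TTL + random.randint(0, 120)
```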

3. Summary

Why use a cache?

Because it is fast, nothing more.

What problems come with using a cache?

I won't summarize them here; see the sections above.

Actually, I wanted to write about cache design for concrete business scenarios, and the considerations that go into designing and choosing one, but that would get very long; better to save it for another article.


Origin: juejin.im/post/5da577cce51d4524f007f363