【Cautions in using cache】

1. Cache penetration

Cache penetration refers to querying data that is certain not to exist. Since the cache is written passively on a miss, and for fault-tolerance reasons a lookup that finds nothing in the storage layer is not written to the cache, every request for this non-existent data goes all the way to the storage layer, defeating the purpose of the cache.

An attacker can query the target system for data that is certain not to exist. Sometimes the data is genuinely missing, but a malicious third party can also deliberately construct a large number of non-existent ids to attack the database. If a very large number of cache penetrations occur in a short period, the database will come under great pressure and may even go down.

 

There are many effective ways to mitigate cache penetration. The most common is a Bloom filter: hash all possibly-existing keys into a sufficiently large bitmap, so that a query for a key that certainly does not exist is intercepted by the bitmap, sparing the underlying storage system. In our data-cube service we adopted a simpler, cruder method: if a query returns an empty result (whether because the data does not exist or because of a system failure), we still cache the empty result, but with a short expiration time of no more than five minutes.
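The Bloom filter idea can be sketched as follows. This is a minimal, illustrative implementation on a `BitSet` (not a production library such as Guava's `BloomFilter`); all ids known to exist are hashed into the bitmap up front, so a query for an id that was never added is rejected before touching the DB. A `false` answer is definitive ("definitely absent"); a `true` answer only means "may exist", with a small false-positive rate.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch: k bit positions are derived per id via
// double hashing; an id is "possibly present" only if all k bits are set.
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th bit index from two base hashes (double hashing).
    private int index(long id, int i) {
        int h1 = Long.hashCode(id);
        int h2 = Long.hashCode(id * 0x9E3779B97F4A7C15L);
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(long id) {
        for (int i = 0; i < hashCount; i++) bits.set(index(id, i));
    }

    // false => id definitely absent (request can be rejected without a DB query);
    // true  => id may exist (small false-positive rate).
    public boolean mightContain(long id) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(id, i))) return false;
        }
        return true;
    }
}
```

A request layer would consult `mightContain(id)` first and only fall through to the cache/DB when it returns `true`.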

 

Solution:

(1) Store empty values: for an id that does not exist, still store a special sentinel value in the cache to represent "empty", with a relatively short expiration time for this id. Once the empty entry expires, the database is consulted again to check whether a value for this id now exists.
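A sketch of this empty-value strategy, assuming an in-process map as the cache and a `loadFromDb` callback standing in for the storage layer (both names are illustrative, not a real library API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-the-miss sketch: a DB miss is stored as a NULL sentinel with a
// short TTL, so repeated lookups for a non-existent id stop hitting the DB.
public class NullCachingCache {
    private static final Object NULL = new Object();   // sentinel for "no such row"
    private static final long NULL_TTL_MS = 60_000;    // short TTL for empty results
    private static final long NORMAL_TTL_MS = 600_000; // normal TTL for real values

    private static class Entry {
        final Object value;
        final long expiresAt;
        Entry(Object value, long ttlMs) {
            this.value = value;
            this.expiresAt = System.currentTimeMillis() + ttlMs;
        }
        boolean expired() { return System.currentTimeMillis() > expiresAt; }
    }

    private final Map<Long, Entry> cache = new ConcurrentHashMap<>();

    public Object get(long id, Function<Long, Object> loadFromDb) {
        Entry e = cache.get(id);
        if (e != null && !e.expired()) {
            return e.value == NULL ? null : e.value;   // cached "empty" short-circuits the DB
        }
        Object fromDb = loadFromDb.apply(id);
        if (fromDb == null) {
            cache.put(id, new Entry(NULL, NULL_TTL_MS)); // cache the miss, briefly
            return null;
        }
        cache.put(id, new Entry(fromDb, NORMAL_TTL_MS));
        return fromDb;
    }
}
```

The short TTL on the sentinel is the trade-off: it bounds how long a row that later appears in the DB stays invisible.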

 

2. Cache Concurrency

If a site's concurrency is high and a cache entry expires, multiple processes may query the DB for the same key at the same time and then all set the cache. Under heavy concurrency this puts excessive pressure on the DB and causes frequent, redundant cache updates.

My current approach is to lock around the cache rebuild in the application: if the KEY does not exist, acquire the lock, query the DB, write the result into the cache, and release the lock. Any other process that finds the lock held waits, and after the lock is released either returns the freshly cached data or falls through to the DB to query.

 

Solution:

When a request finds no data in the cache, it must first acquire an exclusive lock, then query the database and update the cache, releasing the lock once this step is complete. While that request is in flight, all other requests for the same id are blocked, and once the update completes they re-fetch the data directly from the cache.
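The steps above can be sketched with a per-key mutex and double-checked lookup. The names are illustrative; in a distributed setup the JVM monitor would typically be replaced by a Redis lock (`SET key value NX EX seconds`):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Mutex-protected cache rebuild: on a miss, only one thread per key queries
// the DB and refills the cache; the others block on the same lock and then
// re-read the freshly written value.
public class LockedCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    private final Map<String, Object> locks = new ConcurrentHashMap<>();

    public Object get(String key, Function<String, Object> loadFromDb) {
        Object v = cache.get(key);
        if (v != null) return v;                       // fast path: cache hit
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {                          // one rebuilder per key
            v = cache.get(key);                        // re-check: another thread may have refilled
            if (v == null) {
                v = loadFromDb.apply(key);             // exactly one DB query per expiry
                if (v != null) cache.put(key, v);
            }
        }
        return v;
    }
}
```

The second `cache.get(key)` inside the critical section is what prevents the waiting threads from each re-querying the DB after the lock is released.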

 

3. Cache invalidation

This problem mainly arises under high concurrency. We typically set cache expiration times to round values such as 5 or 10 minutes, so under heavy load many entries may be created at the same moment with the same expiration time. When that moment arrives, all of these entries become invalid simultaneously, every request falls through to the DB, and the DB may come under excessive pressure. Typical triggers: (1) the system loads cache data in batches at startup with overly uniform expiration times; (2) a large number of entries are updated or loaded during a traffic peak, all with the same expiration time; (3) the cache system itself fails and restarts or crashes.

 

A simple mitigation is to spread out the cache expiration times. For example, add a random offset to the base expiration time, say 1-5 minutes, so that expiration times rarely coincide and a collective-failure event becomes unlikely.

 

Solution:

(1) Spread out cache expiration times as much as possible. You can pick a range, such as 1-200 seconds, and randomize each entry's expiration time within that range to prevent a large number of entries from expiring at the same moment.
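A minimal sketch of TTL jitter, assuming a base TTL of 10 minutes and the 1-200 second range suggested above (both values are illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;

// Instead of a fixed TTL, each entry gets the base TTL plus a random jitter,
// so entries loaded in the same batch do not all expire in the same instant.
public class TtlJitter {
    static final long BASE_TTL_SECONDS = 600;   // assumed base TTL (10 minutes)

    static long ttlWithJitter() {
        // jitter uniformly in [1, 200] seconds
        return BASE_TTL_SECONDS + ThreadLocalRandom.current().nextLong(1, 201);
    }
}
```

The returned value would be passed as the expiry argument when setting the entry (for example Redis's `SET key value EX seconds`).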

 

(2) Cache double-write: write the same entry to two cache systems at the same time (via dual writes in the client, or a replicating proxy in front of Memcached), so that when the primary cache fails, the data can still be retrieved from the backup.

 

(3) Never expire: set no expiration time in the cache system, and perform all cache updates through business-logic code (for example, refreshing entries when the underlying data changes).

 

 

Summary:

1. Cache penetration: querying data that is certain not to exist. For example, querying a non-existent id in an article table hits the DB on every request; if someone exploits this maliciously, it can directly impact the DB.

2. Cache invalidation: if many cache entries become invalid within a short window, pressure on the DB spikes. There is no perfect solution, but you can analyze user behavior and try to spread the expiration points evenly.

A cache avalanche occurs when a large number of cache misses hit at once, for example heavy concurrent access to entries that have just expired.

 

4. Multi-level cache

Using Ehcache and Redis as a second-level cache.

In Hibernate, the first-level cache is the Session-level cache: a transaction-scoped cache managed by Hibernate itself that generally requires no intervention. The second-level cache is the SessionFactory-level cache, a process-scoped cache.

MyBatis's first-level cache is scoped to a single session: queries executed before the session closes are cached, keyed by the SQL statement and its parameters (similar to the MySQL query cache, changing the value of any parameter means the cached entry no longer matches).

MyBatis's second-level cache is a mapper-level cache: multiple SqlSessions executing the SQL statements of the same Mapper share the second-level cache, so it spans SqlSessions.

The second-level cache is scoped to the mapper namespace. Besides turning on the global second-level cache switch in SqlMapConfig.xml, it must also be enabled in each specific mapper.xml.

 

Add <setting name="cacheEnabled" value="true"/> to the core configuration file SqlMapConfig.xml
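A per-mapper example of enabling the second-level cache: add a `<cache/>` element to the mapper XML. The attributes shown below are optional tuning knobs (eviction policy, refresh interval in milliseconds, number of references, read-only mode); the values are illustrative.

```xml
<!-- in the mapper.xml for one namespace: enables its second-level cache -->
<cache eviction="LRU" flushInterval="60000" size="512" readOnly="true"/>
```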

The first-level cache is the SqlSession-level cache. When operating the database, a SqlSession object is constructed, and inside it a data structure (a HashMap in memory) stores the cached data. The cached data areas of different SqlSessions do not affect each other.

 

The scope of the first-level cache is a single SqlSession. If the same SQL statement is executed twice in the same SqlSession, the data queried from the database by the first execution is written to the cache (memory), and the second execution reads from the cache instead of querying the database again, improving query efficiency. When a SqlSession ends, its first-level cache ceases to exist. MyBatis enables the first-level cache by default.


The second-level cache is shared by multiple SqlSessions, and its scope is a single mapper namespace. When different SqlSessions execute the same SQL statement under the same namespace with the same parameters, the first execution writes the data queried from the database into the cache (memory), and subsequent executions obtain it from the cache without querying the database again, improving query efficiency. MyBatis does not enable the second-level cache by default; you need to turn it on in the settings global parameters.
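The multi-level idea mentioned above (a fast in-process cache such as Ehcache backed by a shared cache such as Redis) can be sketched as a two-level lookup. The two maps below merely stand in for Ehcache and Redis; all names are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Two-level lookup: L1 (in-process) first, then L2 (shared/remote), then
// the database, back-filling each level on the way out.
public class TwoLevelCache {
    private final Map<String, Object> local = new ConcurrentHashMap<>();  // stands in for Ehcache
    private final Map<String, Object> remote = new ConcurrentHashMap<>(); // stands in for Redis

    public Object get(String key, Function<String, Object> loadFromDb) {
        Object v = local.get(key);
        if (v != null) return v;                  // L1 hit: no network round trip
        v = remote.get(key);
        if (v != null) {
            local.put(key, v);                    // back-fill L1 from L2
            return v;
        }
        v = loadFromDb.apply(key);
        if (v != null) {
            remote.put(key, v);                   // populate both levels on a DB hit
            local.put(key, v);
        }
        return v;
    }
}
```

The design trade-off is freshness: the L1 copy can go stale relative to L2, so real deployments bound the L1 TTL or broadcast invalidations.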

 

 

 

Data suitable for the second-level cache:

(1) Rarely modified data

(2) Less important data, where occasional concurrency issues are tolerable

Data not suitable for the second-level cache:

(1) Frequently modified data

(2) Financial data, where no concurrency issues are allowed

(3) Data shared with other applications
