Understanding Java caching

CPU usage: if certain operations are CPU-intensive, such as evaluating regular expressions, and they run frequently enough to consume significant CPU, cache their results (or, for regexes, the compiled patterns) instead of recomputing them every time. A sketch follows below.

Database I/O performance: if a large volume of data is queried frequently, or some data rarely changes, a cache can cut down on database I/O and improve performance.
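For the regex case, here is a minimal sketch of the idea: caching compiled Pattern objects in a static map so each expression is compiled only once (RegexCache is an illustrative name):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

public final class RegexCache {
    // Compiled patterns are immutable and thread-safe, so they are ideal cache entries.
    private static final Map<String, Pattern> PATTERNS = new ConcurrentHashMap<>();

    public static boolean matches(String regex, String input) {
        // computeIfAbsent compiles each regex only once, no matter how often it is used.
        return PATTERNS.computeIfAbsent(regex, Pattern::compile).matcher(input).matches();
    }
}
```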

Cache definition

 

A cache stores objects that a program or system calls frequently in memory, so that they can be retrieved quickly on the next use instead of being recreated as new duplicate instances. This reduces system overhead and improves efficiency.

Caching can be divided into two main categories:

File cache: as the name suggests, the data is stored on disk, whether as XML, a serialized DAT file, or some other file format;

Memory cache: a static memory area is created and the data is stored in it. For example, in a B/S architecture, data can be kept in the Application scope or in a static Map.
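A bare-bones sketch of the static-Map style of memory cache described above (names are illustrative; note that a plain map never evicts, so it only suits small, bounded data sets):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class StaticMapCache {
    // A static in-memory cache area shared by the whole application.
    private static final ConcurrentMap<String, Object> CACHE = new ConcurrentHashMap<>();

    public static void put(String key, Object value) {
        CACHE.put(key, value);
    }

    @SuppressWarnings("unchecked")
    public static <T> T get(String key) {
        return (T) CACHE.get(key);
    }
}
```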

Local cache

 

A "local" cache is local relative to anything reached over the network (clustered caches, database access, and so on).

In a system, some data is small in size but accessed very frequently (national administrative-region codes, data dictionaries, and the like). For this scenario, the data should be cached locally in the application to improve access efficiency and avoid unnecessary database round trips (each database access occupies a connection, and the network cost is relatively high).

One thing to watch, however, is the space the cache occupies and its invalidation policy.
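As one way to handle both concerns, here is a sketch of a local cache for a data dictionary using Caffeine, bounding the size and setting an invalidation policy; loadDictFromDb is a hypothetical loader:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;

public class DictCache {
    // Bound both the space the cache may occupy and how long entries stay valid.
    private final Cache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(10_000)                      // cap on entry count
            .expireAfterWrite(Duration.ofMinutes(30)) // invalidation policy
            .build();

    public String getDictValue(String code) {
        // Load from the database only on a miss.
        return cache.get(code, this::loadDictFromDb);
    }

    private String loadDictFromDb(String code) {
        return "..."; // placeholder for the real query
    }
}
```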

Distributed/cluster cache definition

 

An extension of the traditional single-machine cache: a cache that spans multiple servers and can scale at the same time. Put plainly, locally cached data is moved onto a remote cache store that can be clustered, and the network is used to exchange the data.

Commonly used carriers: Redis and Memcached.
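A minimal sketch of using Redis as the shared cache carrier via the Jedis client; loadFromDb and the 5-minute TTL are illustrative assumptions:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class RedisCacheDemo {
    private final JedisPool pool = new JedisPool("localhost", 6379);

    public String getOrLoad(String key) {
        try (Jedis jedis = pool.getResource()) {
            String cached = jedis.get(key);
            if (cached != null) {
                return cached;              // hit on the shared carrier
            }
            String value = loadFromDb(key); // hypothetical loader
            jedis.setex(key, 300, value);   // share it cluster-wide for 5 minutes
            return value;
        }
    }

    private String loadFromDb(String key) {
        return "...";
    }
}
```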

Multi-level cache

 

Definition: the system combines several caching technologies (local and distributed) to cache data. When data is requested, the caches are read in priority order (local first, then distributed). If no cache holds the data, it is read from the database, and the result is then written back to the caches in order (distributed first, then local).

Why use multi-level caching (taking Caffeine as the first-level cache L1 and Redis as the second-level cache L2 as an example):

If there is only L1, all cached data disappears whenever the system is redeployed;

If there is only L2, many systems consume Redis as a cloud service, so every access involves network I/O plus serialization and deserialization. The performance is still very high, but not as fast as local memory (L1).

Commonly used techniques for multi-level caching

 

Use Caffeine as the first-level cache and Redis as the second-level cache; the read path (sketched in code after these steps) is:

First, query Caffeine; if the data is there, return it directly. If not, go to step 2.

Next, query Redis. If the data is found, fill it into Caffeine and return it. If not, go to step 3.

Finally, query MySQL. If the data is found, fill it into Redis and then Caffeine in turn.
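A sketch of this three-step read path; the class layout, TTLs, and queryMysql are illustrative:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class TwoLevelCache {
    private final Cache<String, String> l1 = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofMinutes(5))
            .build();
    private final JedisPool l2 = new JedisPool("localhost", 6379);

    public String read(String key) {
        // Step 1: L1 (Caffeine, local memory).
        String value = l1.getIfPresent(key);
        if (value != null) return value;

        try (Jedis jedis = l2.getResource()) {
            // Step 2: L2 (Redis, over the network); backfill L1 on a hit.
            value = jedis.get(key);
            if (value != null) {
                l1.put(key, value);
                return value;
            }
            // Step 3: MySQL; backfill L2 first, then L1.
            value = queryMysql(key); // hypothetical DAO call
            if (value != null) {
                jedis.setex(key, 600, value);
                l1.put(key, value);
            }
            return value;
        }
    }

    private String queryMysql(String key) {
        return null; // placeholder for the real query
    }
}
```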

For Caffeine, when data is updated, only the cache on the machine that performed the update can be invalidated directly; the other machines can only let their copy expire via a timeout. There are two strategies for that timeout (both shown in the sketch below):

Expire the entry a fixed time after it is written;

Refresh the entry a fixed time after it is written.
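In Caffeine these two strategies correspond to the builder's expireAfterWrite and refreshAfterWrite options; a sketch of both follows (loadFromSource is a hypothetical loader; refreshAfterWrite requires a LoadingCache so the cache knows how to reload):

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.time.Duration;

public class CaffeineTimeoutDemo {
    // Strategy 1: the entry becomes invalid a fixed time after it was written.
    LoadingCache<String, String> expiring = Caffeine.newBuilder()
            .expireAfterWrite(Duration.ofMinutes(10))
            .build(this::loadFromSource);

    // Strategy 2: the entry stays readable, but a read after the refresh window
    // triggers an asynchronous reload; the old value is served until it completes.
    LoadingCache<String, String> refreshing = Caffeine.newBuilder()
            .refreshAfterWrite(Duration.ofMinutes(10))
            .build(this::loadFromSource);

    private String loadFromSource(String key) {
        return "..."; // hypothetical loader (e.g. the L2 cache or the database)
    }
}
```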

Cache updates

 

Delete the cache first and then update the database.

This approach has a serious problem: after the cache is deleted but before the database is updated, a read request can arrive. Since the cache is empty, it reads the database directly, gets the old value, and loads it into the cache, so all subsequent reads see stale data.

Update the database first, then delete the cache (recommended)

Suppose a value is not yet cached, so a read request falls through to the database and reads the old value. An update then runs: it writes the database and deletes the cache, finishing after the read's query completes but before the read backfills the cache. The read then backfills the stale value, leaving the cache inconsistent with the database. The window for this interleaving is tiny, though, so the probability of the problem is very small.
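A minimal sketch of the recommended order, updating the database first and then deleting the cache entry; the key format and update logic are illustrative:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class UserCacheUpdater {
    private final JedisPool jedisPool = new JedisPool("localhost", 6379);

    public void updateUserName(long userId, String newName) {
        updateDatabase(userId, newName);       // 1. update the database first
        try (Jedis jedis = jedisPool.getResource()) {
            jedis.del("user:" + userId);       // 2. then delete the cache entry;
        }                                      //    the next read backfills fresh data
    }

    private void updateDatabase(long userId, String newName) {
        // placeholder for the real UPDATE statement
    }
}
```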

Why delete the cache instead of updating it

Consider multiple concurrent requests updating the same data: there is no guarantee that the order of database updates matches the order of cache updates, so the database and the cache can end up inconsistent. Deleting the cache sidesteps this ordering problem, which is why deletion is generally preferred.

The Three Musketeers of caching

 

Cache penetration

Explanation: cache penetration means the queried data does not exist in the database, so naturally it does not exist in the cache either. Every such lookup misses the cache and falls through to the database, and with enough of these requests the pressure on the database naturally rises.

Solutions:

By convention, cache NULL results as well; results that throw exceptions are not cached (and take care not to cache the thrown exceptions either). This approach raises the cache's maintenance cost: the empty entry must be deleted when real data is later inserted. Alternatively, a short timeout on the NULL entry largely solves that.

Define rules that filter out keys that cannot possibly exist: use a BitMap for small key spaces and a Bloom filter for large ones. For example, if order IDs are known to fall in the range 1-1000, any request outside 1-1000 can simply be rejected up front.
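A sketch of the Bloom-filter approach using Guava's BloomFilter; the expected key count and false-positive rate are illustrative assumptions:

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class OrderIdFilter {
    // Sized for the expected number of keys, with a 1% false-positive rate.
    private final BloomFilter<Long> knownIds =
            BloomFilter.create(Funnels.longFunnel(), 1_000_000, 0.01);

    public void register(long orderId) {
        knownIds.put(orderId); // call this whenever a real order is created
    }

    public boolean mightExist(long orderId) {
        // false => the ID definitely does not exist: reject the request before
        // touching the cache or the database. true => it *probably* exists.
        return knownIds.mightContain(orderId);
    }
}
```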

Cache breakdown

Explanation: an expiration time is set on keys that happen to be hot data. When such a key expires, a large number of requests may arrive at once, all miss the cache, and all hit the database, so database traffic spikes sharply.

Solutions:

Add a distributed lock: when loading the data, take a distributed lock on the data's key, e.g. via Redis's SETNX (see the sketch after this list). The thread that wins the lock queries the database and updates the cache; the other threads retry, so the database is never hit by many threads at once.

Asynchronous loading: since cache breakdown only affects hot data, this subset can be refreshed automatically when it expires instead of being evicted. Eviction exists to keep data fresh, and an automatic refresh serves the same purpose.
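A sketch of the distributed-lock solution using Jedis's SET with the NX and EX options, plus a Lua compare-and-delete so only the lock owner releases it; loadFromDb, the TTLs, and the key formats are illustrative:

```java
import java.util.Collections;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class CacheRebuildLock {
    // Delete the lock only if we still own it (compare-and-delete in Lua).
    private static final String UNLOCK_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "return redis.call('del', KEYS[1]) else return 0 end";

    public String tryRebuild(Jedis jedis, String key) {
        String lockKey = "lock:" + key;
        String token = UUID.randomUUID().toString();
        // SET key value NX EX 30: succeeds only if the lock is free, and
        // auto-expires so a crashed holder cannot block others forever.
        String ok = jedis.set(lockKey, token, SetParams.setParams().nx().ex(30));
        if (!"OK".equals(ok)) {
            return null; // another thread holds the lock: caller should retry
        }
        try {
            String value = loadFromDb(key);  // hypothetical loader
            jedis.setex(key, 600, value);    // rebuild the cache entry
            return value;
        } finally {
            jedis.eval(UNLOCK_SCRIPT, Collections.singletonList(lockKey),
                    Collections.singletonList(token));
        }
    }

    private String loadFromDb(String key) {
        return "...";
    }
}
```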

Cache avalanche

Explanation: a cache avalanche occurs when the cache becomes unavailable, or when a large number of entries share the same timeout and expire within the same window. A flood of requests then hits the database directly, and the resulting overload can bring the whole system down.

Solutions:

Improve the availability of the cache system: monitor its health, and scale the cache appropriately for the business volume.

Use a multi-level cache with different timeout settings at each level, so that even if one level expires, another level still covers it.

Randomize the expiration time. For example, where a fixed 10-minute timeout was used before, let each key expire at a random point between 8 and 13 minutes, so that different keys expire at different times (a jitter sketch follows).
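A tiny sketch of that jitter, assuming Jedis and a TTL drawn uniformly from 8-13 minutes:

```java
import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class JitteredTtl {
    // Instead of a fixed 10-minute TTL, pick a random value in [8, 13) minutes
    // so entries written together do not all expire together.
    public void cacheWithJitter(Jedis jedis, String key, String value) {
        int ttlSeconds = ThreadLocalRandom.current().nextInt(8 * 60, 13 * 60);
        jedis.setex(key, ttlSeconds, value);
    }
}
```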

Cache monitoring

 

Cache monitoring is also widely ignored: if nothing errors after going live, the cache is assumed to be working. But an inexperienced choice of expiration time or cache size can leave the hit rate so low that the cache is mere decoration in the code. Monitoring the cache's metrics therefore matters too; their values tell you which parameters to tune so the cache delivers real value.
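With Caffeine, for instance, enabling recordStats exposes the hit rate and other counters; a minimal sketch (the size bound is illustrative):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.stats.CacheStats;

public class CacheMetricsDemo {
    private final Cache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .recordStats()   // enable hit/miss/eviction counters
            .build();

    public void report() {
        CacheStats stats = cache.stats();
        // A low hit rate suggests the size or expiration settings need tuning.
        System.out.printf("hitRate=%.2f evictions=%d loads=%d%n",
                stats.hitRate(), stats.evictionCount(), stats.loadCount());
    }
}
```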

