[High Concurrency] How to build an application-level cache in a high-concurrency environment?

Foreword

As system load grows, performance degrades, and the natural response is to add a cache to speed up slow data reads and writes. But if you are determined to become a senior architect, can you build an application-level cache reasonably and efficiently in a high-concurrency environment?

Cache hit ratio

The cache hit ratio is the fraction of reads served from the cache: hit ratio = cache reads / total reads, where total reads = cache reads + reads from the slow backing device. The higher the ratio, the better. This is a key monitoring metric: if you deploy a cache, you should monitor this indicator to verify the cache is actually effective.
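A minimal sketch of measuring the hit ratio with Guava Cache's built-in statistics (assuming Guava is on the classpath; names and sizes are illustrative):

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class HitRateDemo {
    public static void main(String[] args) {
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .recordStats()       // enable hit/miss counters
                .maximumSize(1_000)
                .build();

        cache.put("k", "v");
        cache.getIfPresent("k");     // hit
        cache.getIfPresent("miss");  // miss

        // hitRate() = hitCount / (hitCount + missCount)
        System.out.println(cache.stats().hitRate()); // 0.5 here
    }
}
```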

Cache eviction strategies

1. Space-based

Space-based eviction gives the cache a fixed storage budget, such as 10 MB. When the space limit is reached, entries are removed according to some eviction policy.

2. Capacity-based

Capacity-based eviction sets a maximum number of entries for the cache. When the entry count exceeds the maximum, old entries are removed according to some eviction policy.

3. Time-based

TTL (Time To Live): an entry expires a fixed time after it is created, whether or not it is accessed during that period.
TTI (Time To Idle): an entry expires once it has gone unaccessed for a given period. Both are illustrated in the sketch below.
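A minimal Guava Cache sketch combining the capacity- and time-based policies above (the key names and durations are illustrative; a weigher with maximumWeight would cover the space-based variant):

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

public class EvictionDemo {
    public static void main(String[] args) {
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .maximumSize(10_000)                      // capacity-based: max entries
                .expireAfterWrite(10, TimeUnit.MINUTES)   // TTL: expires after creation/update
                .expireAfterAccess(5, TimeUnit.MINUTES)   // TTI: expires after last access
                .build();

        cache.put("user:1", "binghe");
        String v = cache.getIfPresent("user:1"); // null once expired or evicted
    }
}
```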

4. Based on object references

Soft reference: if an object is only softly reachable, the garbage collector may reclaim it when JVM heap memory runs low. Soft references suit caching: under memory pressure, cached objects can be reclaimed to free space for strongly referenced objects, which helps avoid OOM.
Weak reference: a weakly reachable object is reclaimed as soon as the garbage collector runs. Weak references therefore have a shorter life cycle than soft references.

Note: a softly or weakly referenced object is collected only while no strong reference points to it. If any other object still holds a strong reference to it, it will not be collected during garbage collection.
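A minimal sketch of a soft-reference cache using only JDK classes (the key name and value size are illustrative):

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SoftRefCacheDemo {
    public static void main(String[] args) {
        // Values wrapped in SoftReference: the GC may reclaim them when heap memory runs low.
        Map<String, SoftReference<byte[]>> cache = new ConcurrentHashMap<>();
        cache.put("report", new SoftReference<>(new byte[1024 * 1024]));

        SoftReference<byte[]> ref = cache.get("report");
        byte[] data = (ref != null) ? ref.get() : null; // null if already collected
        if (data == null) {
            // cache miss: reload from the slow device and re-cache
        }
    }
}
```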

5. Eviction algorithms

Space-based and capacity-based caches rely on an eviction algorithm to decide which old entries to remove. Common algorithms include the following.

  • FIFO (First In First Out): the entry that entered the cache first is evicted first.
  • LRU (Least Recently Used): the entry that has gone unused for the longest time is evicted.
  • LFU (Least Frequently Used): the entry used least often (lowest access frequency) within a given period is evicted.

In practice, most caches are LRU-based. A hand-rolled LRU can be built on the JDK's LinkedHashMap, as shown below.
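A minimal LRU sketch using LinkedHashMap's access-order mode (standard JDK API, no external dependencies):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true orders entries by most recent access
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry past the limit
    }
}
```

Note that LinkedHashMap is not thread-safe; in a high-concurrency environment, wrap it with Collections.synchronizedMap or use a concurrent cache library instead.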

Cache types

Heap cache: objects are stored in the Java heap. The advantage is that no serialization/deserialization is involved, making this the fastest kind of cache. The drawbacks are just as clear: with a large amount of cached data, GC pause times grow, and capacity is limited by the size of the heap. Cached objects are generally held via soft/weak references, so that when heap memory runs low this memory can be reclaimed to free space for strongly referenced objects and avoid OOM. Heap caches are generally used for hot data, and can be implemented with Guava Cache, Ehcache 3.x, or MapDB.
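For example, a minimal Guava Cache heap cache holding values via soft references (a sketch; the size and key are illustrative):

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class HeapCacheDemo {
    public static void main(String[] args) {
        // Values held via soft references, so the GC can reclaim them under memory pressure.
        Cache<String, Object> heapCache = CacheBuilder.newBuilder()
                .softValues()
                .maximumSize(10_000)
                .build();

        heapCache.put("hot:key", new Object());
    }
}
```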

Off-heap cache: cached data is stored in off-heap memory. This shortens GC pauses (objects move out of the heap, so the GC has less to scan and move) and supports a much larger cache (limited only by machine memory, not by heap size). However, data must be serialized/deserialized on every read, so it is much slower than a heap cache. Can be implemented with Ehcache 3.x or MapDB.

Disk cache: cached data is stored on disk, so it survives a JVM restart, whereas heap and off-heap cache data are lost and must be reloaded. Can be implemented with Ehcache 3.x or MapDB.
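A sketch of an Ehcache 3.x configuration combining the heap, off-heap, and disk tiers described above; the cache name, key/value types, tier sizes, and storage path are illustrative:

```java
import java.io.File;
import org.ehcache.PersistentCacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.EntryUnit;
import org.ehcache.config.units.MemoryUnit;

public class TieredCacheDemo {
    public static void main(String[] args) {
        PersistentCacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
                .with(CacheManagerBuilder.persistence(new File("/tmp/ehcache-data")))
                .withCache("tiered", CacheConfigurationBuilder
                        .newCacheConfigurationBuilder(Long.class, String.class,
                                ResourcePoolsBuilder.newResourcePoolsBuilder()
                                        .heap(1_000, EntryUnit.ENTRIES)   // hottest data, no serialization
                                        .offheap(64, MemoryUnit.MB)       // warm data, outside the GC's reach
                                        .disk(512, MemoryUnit.MB, true))) // survives JVM restarts
                .build(true);

        org.ehcache.Cache<Long, String> cache =
                cacheManager.getCache("tiered", Long.class, String.class);
        cache.put(1L, "hot value");

        cacheManager.close();
    }
}
```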

Distributed cache: ehcache-clustered (with a Terracotta server) provides a distributed cache shared between Java processes. Memcached or Redis can also be used.
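For instance, a minimal sketch using the Jedis client against a local Redis (an assumption, not the article's setup; the setex signature varies slightly between Jedis versions):

```java
import redis.clients.jedis.Jedis;

public class RedisCacheDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.setex("user:1001", 300, "{\"name\":\"binghe\"}"); // cache with a 300 s TTL
            String cached = jedis.get("user:1001");                 // read from the shared cache
            System.out.println(cached);
        }
    }
}
```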

Cache modes

Standalone mode: keep the hottest data in the heap cache, warm data in the off-heap cache, and non-hot data in the disk cache.
Cluster mode: keep the hottest data in the heap cache, warm data in the off-heap cache, and the full data set in the distributed cache; the read path is sketched below.
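A sketch of the cluster-mode read path, checking tiers from fastest to slowest and backfilling on a hit; localCache, redis, and loadFromDatabase are hypothetical stand-ins, not a specific library API:

```java
// Hypothetical cluster-mode read path: L1 heap cache, L2 distributed cache,
// then the backing database. All names below are illustrative.
public String get(String key) {
    String value = localCache.getIfPresent(key);  // 1. heap cache: hottest data
    if (value != null) {
        return value;
    }
    value = redis.get(key);                       // 2. distributed cache: full data set
    if (value != null) {
        localCache.put(key, value);               // backfill the local tier
        return value;
    }
    value = loadFromDatabase(key);                // 3. slow device (database)
    if (value != null) {
        redis.setex(key, 300, value);             // populate the distributed tier with a TTL
        localCache.put(key, value);
    }
    return value;
}
```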

In closing

If you found this article helpful, search WeChat for and follow the "Binghe Technology" official account to learn high-concurrency programming techniques with Binghe.

Finally, attached is a knowledge map of the core skills concurrent programming requires. May it help you avoid detours as you learn.


Origin: blog.csdn.net/l1028386804/article/details/105546816