Java interview must-test points, Lecture 08: Caching, the cornerstone of high-concurrency architecture

This lesson introduces cache-related knowledge and the two most commonly used caches, Memcache and Redis. Focus on the following three aspects:

  1. Typical problems encountered when using cache;

  2. Memcache memory structure;

  3. Redis-related knowledge points and the implementation of common Redis structures.

Caching knowledge points
Types

Caching is an effective way to improve access performance for hotspot data in high-concurrency scenarios, and it is used in most projects. Caches fall into three types: local cache, distributed cache, and multi-level cache.

A local cache means caching inside the process's own memory, for example in the JVM heap; it can be implemented with an LRUMap or a tool such as Ehcache. A local cache is pure memory access with no remote interaction overhead, so it has the best performance, but it is limited by single-machine capacity: it is generally small and cannot be scaled out.

A distributed cache solves this problem well. Distributed caches generally offer good horizontal scalability and can handle scenarios with large data volumes. The drawback is that each access requires a remote request, so performance is not as good as a local cache.

To balance the two, actual business systems generally use a multi-level cache: the local cache keeps only the hotspot data with the highest access frequency, and the remaining hotspot data is placed in the distributed cache.

Eviction strategy

Whether the cache is local or distributed, it keeps data in memory to guarantee high performance. Because of cost and memory limits, when the stored data exceeds the cache capacity, cached data must be evicted. Common eviction strategies include FIFO, which evicts the oldest data; LRU, which evicts the least recently used data; and LFU, which evicts the least frequently used data.
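
As an illustration of the LRU strategy, below is a minimal, non-thread-safe sketch in Java. It relies only on the JDK's LinkedHashMap access-order mode rather than any particular caching library:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A minimal LRU cache built on LinkedHashMap's access-order mode. */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true moves an entry to the tail on every access,
        // so the head is always the least recently used entry.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once capacity is exceeded.
        return size() > capacity;
    }
}
```

A FIFO variant only needs accessOrder = false, which evicts by insertion order instead.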

Memcache

Note that Memcache will be referred to as MC later.

Let’s first take a look at the characteristics of MC:

  • MC uses multi-threaded, asynchronous IO to process requests, which makes good use of multi-core CPUs and delivers excellent performance;

  • MC is functionally simple: it stores data in memory, supports only the KV structure, and provides neither persistence nor master-slave replication;

  • The memory structure of MC and the issue of calcification will be introduced in detail later;

  • MC can set an expiration date for cached data, and the expired data will be cleared;

  • The expiration strategy is lazy expiration: whether a key has expired is checked only when the data is accessed again;

  • When capacity is full, cached data is evicted: besides cleaning up expired keys, data is also evicted according to the LRU policy.

Additionally, there are some limitations when using MC (a usage sketch follows the list):

  • A key cannot exceed 250 bytes;

  • A value cannot exceed 1 MB;

  • The maximum expiration time of a key is 30 days.
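
As a quick usage sketch (assuming the spymemcached Java client and a local MC instance on the default port 11211; the key and value are illustrative):

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class McDemo {
    public static void main(String[] args) throws Exception {
        // Assumes a Memcached instance listening on localhost:11211.
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));

        // Expiration is in seconds; a value larger than 30 days is interpreted
        // as an absolute Unix timestamp, hence the 30-day limit above.
        client.set("user:1001", 3600, "{\"name\":\"alice\"}");

        Object value = client.get("user:1001");  // null if missing or expired
        System.out.println(value);

        client.shutdown();
    }
}
```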

Redis

Let me first briefly describe the characteristics of Redis, to make the comparison with MC easier.

  • Unlike MC, Redis processes requests in single-threaded mode, for two reasons: first, it uses a non-blocking, asynchronous event-handling mechanism; second, all cached data is operated on in memory, so the IO for a single command is short, and a single thread avoids the cost of thread context switching.

  • Redis supports persistence, so it can serve not only as a cache but also as a NoSQL database.

  • Compared with MC, a very big advantage of Redis is that besides KV it supports multiple data structures, such as list, set, sorted set, and hash.

  • Redis provides a master-slave synchronization mechanism and Cluster cluster deployment capabilities, which can provide high-availability services.

Detailed explanation of Memcache (MC)
Memory structure

First, let's look at MC's memory structure. By default MC manages memory through the Slab Allocator, as shown in the figure below. The slab mechanism mainly solves the memory fragmentation caused by frequent malloc/free.

As shown on the left side of the figure, MC divides memory into Slabs of different classes, and each Slab class stores objects of a different size range. Each Slab consists of several Pages, the light green modules in the figure. Pages have the same default size across all Slab classes, 1 MB, which is why by default MC cannot store an object larger than 1 MB. Each Page is divided into many Chunks; a Chunk is the space that actually holds an object, shown in orange in the figure. Chunk sizes differ between Slab classes, and when saving an object MC selects the most suitable Chunk class according to the object's size, to reduce wasted space.

The Slab Allocator takes three parameters when creating Slabs: the growth factor of the Chunk size, the initial Chunk size, and the Page size. Slab classes are created gradually at runtime according to the sizes of the objects being saved.
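
To make the growth factor concrete, here is a small sketch that prints the resulting chunk-size ladder. The starting values are assumptions corresponding to common defaults (a 96-byte smallest chunk, growth factor 1.25, 1 MB pages, 8-byte alignment); the real values depend on the startup options:

```java
public class SlabSizes {
    public static void main(String[] args) {
        // Assumed defaults; real values come from memcached's -n and -f options.
        int chunk = 96;                    // assumed smallest chunk size
        final double factor = 1.25;        // chunk growth factor
        final int pageSize = 1024 * 1024;  // 1 MB page
        for (int slabClass = 1; chunk <= pageSize; slabClass++) {
            System.out.printf("slab class %2d: chunk size %d bytes%n", slabClass, chunk);
            // Grow by the factor, rounded up to 8-byte alignment.
            chunk = (int) Math.ceil(chunk * factor / 8) * 8;
        }
    }
}
```

An object is stored in the smallest chunk that can hold it together with the per-item overhead, which is how a roughly 300-byte user record can end up in the 384-byte chunks of the calcification example below.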

Calcification problem

Consider a scenario where MC stores user information, with a single object around 300 bytes; MC will then create a large number of 384-byte Slabs. After running for a while, an attribute is added to the user record and a single object grows to about 500 bytes, so storing it now requires 768-byte Slabs. But most of MC's capacity has already been allocated as 384-byte Slabs, so there are very few 768-byte Slabs. Although the memory in the 384-byte Slabs is now largely idle, the 768-byte Slabs still frequently evict entries under the LRU algorithm, so MC's eviction rate rises and its hit rate drops. This is the so-called MC calcification problem.

To address calcification, you can enable MC's Automove mechanism, which rebalances Slabs every 10 seconds; or restart the MC instances in batches, taking care to warm each one up for a while to avoid an avalanche. In addition, when using Memcached it is best to estimate the expected average data size in advance and tune the growth factor accordingly, to get the most suitable configuration and avoid wasting large amounts of memory.

Detailed explanation of Redis

The knowledge point structure of Redis is shown in the figure below.

Functions

Let’s look at the functions provided by Redis.

Bitmap stores information bit by bit and can be used to implement a BloomFilter. HyperLogLog provides approximate deduplicated counting and is well suited to deduplication statistics over large-scale data, such as counting UVs. Geospatial can store geographic coordinates, compute the distance between locations, or find locations within a given radius. These three can actually be regarded as data structures in their own right.
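
A brief usage sketch with the Jedis client (assuming a Redis instance on localhost; key names and coordinates are illustrative):

```java
import redis.clients.jedis.Jedis;

public class RedisFeatures {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Bitmap: mark that user 42 was active today (key name illustrative).
            jedis.setbit("active:2023-04-01", 42, true);
            System.out.println(jedis.bitcount("active:2023-04-01"));  // daily active count

            // HyperLogLog: approximate UV counting.
            jedis.pfadd("uv:2023-04-01", "user42", "user43", "user42");
            System.out.println(jedis.pfcount("uv:2023-04-01"));       // ~2

            // Geospatial: store coordinates and measure distance (in meters).
            jedis.geoadd("cities", 116.40, 39.90, "beijing");
            jedis.geoadd("cities", 121.47, 31.23, "shanghai");
            System.out.println(jedis.geodist("cities", "beijing", "shanghai"));
        }
    }
}
```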

The pub/sub function is a publish-subscribe mechanism that can serve as a simple message queue.

Pipeline can send a batch of commands and return all the results at once, which reduces frequent request/response round trips.
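
For example, with Jedis a pipeline buffers commands and sends them in a single round trip (assuming a local Redis instance; keys are illustrative):

```java
import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;

public class PipelineDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            Pipeline p = jedis.pipelined();
            // All commands are buffered and sent in one network round trip.
            p.set("k1", "v1");
            p.incr("counter");
            p.get("k1");
            List<Object> results = p.syncAndReturnAll();  // [OK, 1, v1]
            System.out.println(results);
        }
    }
}
```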

Redis also supports submitting Lua scripts, which run on the server and can perform a series of operations atomically.

The last function is transactions, though Redis does not provide strict transactions: it only guarantees that the commands in a transaction execute serially and that all of them are executed. If a command fails during execution, there is no rollback; execution simply continues.
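
A minimal Jedis sketch of MULTI/EXEC (key names are illustrative); note that a command failing inside EXEC does not undo the ones that already ran:

```java
import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class TxDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            Transaction tx = jedis.multi();
            tx.set("balance:alice", "90");
            tx.incrBy("balance:bob", 10);
            // EXEC runs the queued commands serially; a failing command
            // does not roll back the ones that already executed.
            List<Object> results = tx.exec();
            System.out.println(results);
        }
    }
}
```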

Persistence

Redis provides two persistence methods, RDB and AOF. RDB writes the in-memory data set to disk as a snapshot; the actual work is done by a forked child process, using binary compressed storage. AOF records every write and delete operation as a text log.

RDB saves the entire Redis data set in a single file, which is well suited to disaster recovery. The disadvantages are that if the machine goes down before a snapshot completes, the data written since the previous snapshot is lost, and that taking a snapshot may make the service briefly unavailable.

AOF appends write operations to the log file and offers flexible sync policies: sync every second, sync on every modification, or no sync. The disadvantages are that for the same data set the AOF file is larger than the RDB file, and that recovering from an AOF is usually slower than from an RDB.

High availability

Now let's look at Redis high availability. Redis supports master-slave replication and provides the Cluster deployment mode, and Sentinels monitor the state of the Redis master: when the master fails, they elect a new master from the slave nodes according to certain rules and repoint the other slaves to the new master.

Briefly, master election follows three rules:

  • The lower a slave's configured priority value, the more likely it is to be selected;

  • With the same priority, the slave that has replicated more data is preferred;

  • With everything else equal, the slave with the smaller runid is more likely to be selected.

In a Redis deployment, the sentinels themselves also run as multiple instances, and they use the Raft protocol to ensure their own high availability.

Redis Cluster uses a sharding mechanism: the key space is internally divided into 16384 slots, distributed across all the master nodes, with each master responsible for a subset of the slots. On every data operation, CRC16 is computed over the key to determine which slot it belongs to, and therefore which master handles it; data redundancy is provided by the slave nodes.
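
A sketch of the slot computation; this is the CRC16-XMODEM variant that Redis Cluster uses, while hash tags and the client-side routing logic are omitted:

```java
public class SlotCalc {
    /** CRC16-XMODEM (polynomial 0x1021), the variant used by Redis Cluster. */
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    public static void main(String[] args) {
        String key = "user:1001";
        int slot = crc16(key.getBytes()) % 16384;  // which of the 16384 slots owns this key
        System.out.println("key " + key + " -> slot " + slot);
    }
}
```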

Key expiration mechanism

Redis keys can be given an expiration time. For expired keys, Redis combines passive and active expiration: like MC, access triggers a passive deletion check, and in addition a periodic task actively deletes expired keys.

Eviction strategy

Redis provides six eviction strategies. Three apply only to keys with an expiration set: LRU (volatile-lru), shortest remaining time to live (volatile-ttl), and random (volatile-random). Two apply to all keys: LRU (allkeys-lru) and random (allkeys-random). You can also choose not to evict at all (noeviction): when capacity is full, storing a new object returns an error, but existing keys can still be read.

New features

It is worth learning about the new features in Redis 4.0 and 5.0. For example, Stream in 5.0 is a message queue that supports multicast, that is, written once and read by multiple consumers; the module mechanism introduced in 4.0 is also worth knowing.

Data structures

Internally, Redis uses a dictionary to store data of every type, such as the dictht in the figure below. The dictionary consists of dictEntry items; each dictEntry holds pointers to the key and the value, plus a pointer to the next dictEntry.

In Redis, every object is wrapped in a redisObject, shown as the light green module in the figure. The redisObject records the object's type, one of the five types Redis supports: string, hash, list, set, and sorted set. It also records the concrete encoding used to store the object, such as the ones in the dashed module on the far right of the figure.

The specific data storage methods are introduced below in combination with types.

  • The string type is the most commonly used type in Redis; internally it is stored as an SDS (Simple Dynamic String). An SDS is similar to Java's ArrayList in that it pre-allocates redundant space to reduce frequent memory allocations.

  • The list type is implemented with ziplist (a compressed list) and linkedlist (a doubly linked list). A ziplist is stored in one contiguous block of memory, which is space-efficient but unfriendly to modification, so it suits small amounts of data. A linkedlist has very low insertion complexity, but its memory overhead is high, and because node addresses are not contiguous it is prone to memory fragmentation. In addition, version 3.2 introduced quicklist, which combines the advantages of both: a quicklist is itself a doubly linked, non-circular list in which every node is a ziplist.

  • The hash type has two implementations in Redis: ziplist and hashtable. When every key and value string in the hash is shorter than 64 bytes and there are fewer than 512 key-value pairs, the ziplist is used to save space; beyond those thresholds, it switches to a hashtable.

  • The set type is implemented internally as either an intset or a hashtable: when the set has fewer than 512 elements and all of them are integers, an intset is used; otherwise a hashtable is used.

  • Sorted set is an ordered set, implemented as either a ziplist or a skiplist. Its encoding-conversion thresholds differ slightly from those of hash and list: when the sorted set has fewer than 128 elements and every element is shorter than 64 bytes, a ziplist is used; otherwise it converts to a skiplist.

Tip: Redis allocates memory with jemalloc. jemalloc divides memory into three size ranges, small, large, and huge, and subdivides each range into size classes; when storing data it picks the best-fitting size class, which helps reduce memory fragmentation.

Caching FAQ

We have compiled a table for several problems we often encounter when using cache, as shown in the figure below.

Cache update method

The first issue is how to update the cache, something to consider as soon as you decide to use one.

Cached data needs to be updated when the data source changes; the data source may be a DB or a remote service. One approach is active update: when the data source is a DB, the cache can be updated directly after the DB is updated.

When the data source is not a DB but some other remote service, it may not be possible to sense data changes promptly. In that case, you generally set an expiration time on the cached data, which is the maximum tolerable window of inconsistency.

In this scenario you can choose expiration-driven update: when the key is missing or expired, first request the latest data from the source, then write it back to the cache and reset the expiration time.

But this has a problem: if the remote service being depended on fails at refresh time, the data becomes unavailable. The improved approach is asynchronous update: do not clear the data on expiry; keep serving the old value while an asynchronous thread performs the refresh, which avoids the window of unavailability at the moment of expiry. There is also a purely asynchronous style that refreshes the data in batches at fixed intervals. In practice, choose the update method according to the business scenario.
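
A minimal in-process sketch of the expiration-driven (cache-aside) update described above; the loader interface and TTL handling are illustrative, and no particular cache library is assumed:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CacheAside<K, V> {
    private record Entry<V>(V value, long expiresAt) {}

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // the data source: a DB or remote service
    private final long ttlMillis;

    public CacheAside(Function<K, V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    public V get(K key) {
        Entry<V> e = cache.get(key);
        if (e != null && e.expiresAt() > System.currentTimeMillis()) {
            return e.value();               // cache hit, still fresh
        }
        // Miss or expired: fetch from the source, re-cache, reset the TTL.
        V fresh = loader.apply(key);
        cache.put(key, new Entry<>(fresh, System.currentTimeMillis() + ttlMillis));
        return fresh;
    }
}
```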

Data inconsistency

The second problem is data inconsistency. It is fair to say that whenever you use a cache, you must consider how to face this problem. Cache inconsistency is generally caused by a failed active update, for example the Redis update request timing out over the network after the DB has been updated, or an asynchronous update failing.

The remedy depends on the service. If it is not particularly latency-sensitive, add retries. If it is latency-sensitive, handle failed updates with an asynchronous compensation task; or, if short-lived inconsistency does not hurt the business, it is enough that the next update succeeds and eventual consistency is guaranteed.

Cache penetration

The third problem is cache penetration, which may be caused by malicious external attacks. For example, user information is cached, but an attacker frequently requests the interface with user IDs that do not exist: the cache lookup misses, the subsequent DB query also misses, and a large number of requests penetrate the cache straight to the DB.

The solution is as follows.

  1. For a nonexistent user, cache an empty placeholder object to stop the same ID from hitting the DB again. However, this does not always solve the problem well and may leave a large amount of useless data in the cache.

  2. Use a BloomFilter. A BloomFilter's defining feature is membership testing: if a key is not in the BloomFilter, the data definitely does not exist; if it is in the BloomFilter, the actual data may still not exist. That makes it very suitable for this kind of problem; a sketch follows below.
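
A sketch of the BloomFilter approach using Guava's implementation; the sizing numbers and the loader method are illustrative:

```java
import java.nio.charset.StandardCharsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class PenetrationGuard {
    // Sized for 1M user IDs at a 1% false-positive rate (illustrative numbers).
    private final BloomFilter<String> knownIds =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    /** Register every real user ID, e.g. at warm-up or when a user is created. */
    public void register(String userId) {
        knownIds.put(userId);
    }

    public String getUser(String userId) {
        if (!knownIds.mightContain(userId)) {
            return null;  // definitely not a real user: never touch cache or DB
        }
        // May exist (false positives are possible): fall through to cache, then DB.
        return loadFromCacheOrDb(userId);
    }

    private String loadFromCacheOrDb(String userId) {
        return "...";  // placeholder for the real cache/DB lookup
    }
}
```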

Cache breakdown

The fourth problem is cache breakdown: when a piece of hot data expires, a large number of concurrent requests for that data penetrate to the data source.

There are the following ways to solve this problem (a combined sketch follows the list).

  1. Use a mutex for updates, ensuring that within the same process only one request at a time goes to the DB for a given piece of data, reducing DB pressure.

  2. Use random backoff: on a miss, sleep for a short random time, query again, and perform the update only if it still misses.

  3. To prevent multiple hot keys from expiring at the same time, set the expiration to a fixed time plus a small random jitter.
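
A combined sketch of the mutex update (point 1) and TTL jitter (point 3), using only the JDK; names are illustrative and lock-map cleanup is omitted for brevity:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

public class BreakdownGuard<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Map<K, Object> locks = new ConcurrentHashMap<>();
    private final Function<K, V> loader;  // the slow data source, e.g. the DB

    public BreakdownGuard(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        V v = cache.get(key);
        if (v != null) return v;
        // Per-key mutex: within this process only one thread reloads a given key.
        synchronized (locks.computeIfAbsent(key, k -> new Object())) {
            v = cache.get(key);  // double-check: another thread may have reloaded it
            if (v == null) {
                v = loader.apply(key);
                cache.put(key, v);
            }
        }
        return v;
    }

    /** Fixed TTL plus a small random jitter, so hot keys do not all expire together. */
    static long jitteredTtl(long baseMillis) {
        return baseMillis + ThreadLocalRandom.current().nextLong(Math.max(1, baseMillis / 10));
    }
}
```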

Cache avalanche

The fifth problem is cache avalanche: the cache itself goes down, and all requests penetrate to the DB.

Solution:

  1. Use a fast-fail circuit breaker strategy to reduce the instantaneous pressure on the DB;

  2. Use master-slave mode and cluster mode to ensure the high availability of the cache service.

In actual scenarios, these two methods are used in combination.

Interview points and bonus points
Interview points

The main interview points for this lesson are understanding cache characteristics and mastering the features and usage of MC and Redis.

  1. You need to know the usage scenarios of cache and how different types of cache are used, such as:

    1. Cache DB hotspot data to reduce DB pressure; cache dependent services to improve concurrency performance;
    2. For pure KV caching scenarios, you can use MC, but if you need to cache special data formats such as list and set, you can use Redis;
    3. If you need to cache a list of videos recently played by a user, you can use Redis's list to save it. When you need to calculate ranking data, you can use Redis's zset structure to save it.
  2. Know the common commands of MC and Redis, such as atomic increment and decrement, and the commands that operate on the different data structures.

  3. Understanding how MC and Redis lay out data in memory helps you estimate capacity requirements.

  4. Understand the key-expiration and deletion strategies of MC and Redis, such as actively triggered periodic deletion and passively triggered lazy deletion.

  5. Understand the principles of Redis persistence, master-slave replication, and Cluster deployment, such as how RDB and AOF are implemented and how they differ.

Bonus points

If you want to perform better in interviews, you should also know the following bonus points.

First, introduce the use of caching from real application scenarios. For example, when calling a back-end service interface for information, you can use local + remote multi-level caching; for dynamic ranking scenarios, consider implementing them with Redis's sorted set; and so on.

Second, experience designing and using a distributed cache is best, for example: in which scenarios the project used Redis, which data structures were used, and which types of problems were solved; or, when using MC, tuning the slab allocation parameters according to the estimated data size; and so on.

Third, it is best to understand the problems that can arise when using a cache. For example, Redis handles requests on a single thread, so time-consuming single-request tasks should be avoided to keep requests from blocking each other; the Redis service should not be deployed on the same machine as other CPU-intensive processes; and Swap should be disabled to keep Redis's cached data from being swapped to disk, which would hurt performance. Another example is the MC calcification problem mentioned earlier.

Fourth, understand typical Redis application scenarios, for example implementing a distributed lock with Redis, implementing a BloomFilter with Bitmap, doing UV statistics with HyperLogLog, and so on.

Finally, learn about the new features in Redis 4.0 and 5.0, such as Stream, a persistent message queue that supports multicast, and custom functional extensions through the Module system.

Summary of real questions

The actual interview questions in this class are summarized below and the key points are explained.

Questions 1 to 4 have been mentioned before and will not be repeated again.

Question 5 can be answered from the angles of master-slave read-write separation, multiple slave replicas, multi-port instances, and Cluster deployment to support horizontal scaling. For high availability, answer that Sentinel ensures that when the master goes down, a new master is re-elected from the slaves and the failover is completed.

Question 6: you can implement a delay queue with Redis's sorted set, using the timestamp as the score; the consumer uses zrangebyscore to fetch the entries whose delay time has arrived (a sketch of this and the lock below follows the list).

  • A distributed lock can be implemented in simple scenarios with SET plus the NX option (the classic setnx pattern): if the command succeeds, the lock is acquired; otherwise acquisition fails. The PX option should be passed in the same SET command to set a timeout, preventing a deadlock if the instance holding the lock goes down.

  • In strict scenarios, consider the RedLock solution, though it is more complex to implement.
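
A combined Jedis sketch of the delay queue and the simple distributed lock (key names and the token are illustrative; the release shown is simplified, since a safe release verifies the owner token, usually via a Lua script):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class DelayQueueAndLock {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Delay queue on a sorted set: the score is the time the task becomes due.
            long dueAt = System.currentTimeMillis() + 5_000;
            jedis.zadd("delay:queue", dueAt, "task-1");

            // Consumer side: fetch every task whose due time has already passed.
            for (String task : jedis.zrangeByScore("delay:queue", 0, System.currentTimeMillis())) {
                if (jedis.zrem("delay:queue", task) > 0) {  // ZREM wins the race to claim it
                    System.out.println("processing " + task);
                }
            }

            // Distributed lock: SET with NX and PX in one atomic command.
            String reply = jedis.set("lock:order:1001", "owner-token",
                    SetParams.setParams().nx().px(30_000));
            if ("OK".equals(reply)) {
                try {
                    // ... critical section ...
                } finally {
                    // Simplified release; a safe release checks the owner token first.
                    jedis.del("lock:order:1001");
                }
            }
        }
    }
}
```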

The next lesson will explain knowledge about queues and databases.
