Summary of Redis high-frequency interview questions (Part 2)

Table of contents

1. What is a Big Key in Redis

2. What problems will Big Key cause?

3. How to find Big Keys

4. Why the KEYS * command should be used with caution in a Redis production environment

5. How to handle a large number of keys expiring at the same time

6. Use batch operations to reduce network transfers

7. Cache penetration

8. Cache breakdown

9. Cache Avalanche

10. Cache pollution (cache full)

11. Redis supports a total of eight eviction strategies

12. Database and cache consistency


1. What is a Big Key in Redis

Big Key in Redis refers to key-value pairs that take up a lot of memory. In Redis, memory is a precious resource. If some key-value pairs occupy too much memory, it will cause the performance of the Redis server to degrade, and may even cause memory overflow.

Specifically, a Big Key in Redis usually refers to a key whose value exceeds a certain size threshold (for example, 10KB). Such keys may hold large amounts of text or binary data, or a very large number of hash fields or collection elements. These Big Keys require special handling and optimization to reduce memory usage and improve the performance and reliability of the Redis server.

The reason for the emergence of Redis Big Key may be that a large amount of text or binary data is stored, or a large number of hash fields or collection elements are stored.

2. What problems will Big Key cause?

Big Key (big key) is a key-value pair that occupies a large amount of memory in Redis. If there are too many Big Keys, it will have a negative impact on the performance and reliability of the Redis server, including the following hazards:

  1. Memory occupation: Big Key occupies a large amount of memory. If the memory of the Redis server is insufficient, the performance of the Redis server will decrease, and it may even cause the Redis server to crash.

  2. Operation delay: When reading and writing Big Key, it will occupy a lot of CPU resources and IO resources, resulting in increased operation delay. If there are multiple clients operating the Big Key at the same time, there will be a lot of waiting, which will lead to a decrease in the performance of the Redis server.

  3. Persistence delay: If the Big Key needs to be persisted, such as writing to an RDB file or an AOF file, a large amount of CPU resources and IO resources will be occupied, resulting in an increase in the persistence delay. If the load of the Redis server is too high, it may cause persistence failure or high latency, thus affecting the reliability and consistency of data.

  4. Data loss: If the memory of the Redis server is insufficient, memory overflow may occur, resulting in loss of Big Key data. If the Big Key contains important data, it may cause data loss or data inconsistency.

To sum up, Big Keys have a negative impact on the performance and reliability of the Redis server, potentially causing excessive memory usage, increased operation latency, slower persistence, and even data loss. Therefore, Big Keys require special handling and optimization to reduce memory usage and improve the performance and reliability of the Redis server.

3. How to find Big Keys

1. Use the --bigkeys option that ships with redis-cli to quickly find Big Keys. The specific steps are as follows:

  1. Start the Redis client with the --bigkeys option, for example:
    redis-cli --bigkeys

  2. redis-cli connects to the Redis server and scans the entire keyspace with the SCAN command, tracking the biggest key it has seen for each data type.
  3. When the scan finishes, it reports the biggest key of each type together with summary statistics, for example:
    # Scanning the entire keyspace to find biggest keys as well as
    # average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
    # per 100 SCAN commands (not usually needed).
    [00.00%] Biggest string found so far 'my_big_string' with 100 bytes
    [00.00%] Biggest list found so far 'my_big_list' with 10 elements
    [00.00%] Biggest set found so far 'my_big_set' with 5 members
    [99.99%] Biggest zset found so far 'my_big_zset' with 1000 members
    [99.99%] Biggest hash found so far 'my_big_hash' with 10 fields
    -------- summary --------
    Sampled 10000 keys in the keyspace!
    Total key length in bytes is 51451 (avg len 5.15)
    Biggest string found 'my_big_string' has 100 bytes
    Biggest list found 'my_big_list' has 10 items
    Biggest set found 'my_big_set' has 5 members
    Biggest zset found 'my_big_zset' has 1000 members
    Biggest hash found 'my_big_hash' has 10 fields

    Through the above method, all Big Keys can be quickly found, together with their size and type, so that further optimization and processing can be carried out. Note that --bigkeys scans the entire keyspace with SCAN; although it does not block the server the way KEYS * would, it still adds load, so it is best run during off-peak hours (the -i option mentioned in the output can be used to throttle the scan).

2. Use Redis commands (use with caution in production environments): Redis provides commands for inspecting keys. For example, the KEYS command lists key names, the TYPE command returns a key's type, the STRLEN command returns the length of a string value, and the LLEN command returns the length of a list. These commands can be used to spot Big Keys, for example a string value or a list whose length exceeds a threshold.

3. Use Redis tools: some tools help discover Big Keys. For example, redis-rdb-tools can parse an RDB file offline and list the size of every key, and redis-cli can connect to the Redis server and run the commands above to inspect individual keys. These tools make it easier to locate Big Keys.

4. Use third-party tools: in addition to the tools that ship with Redis, various third-party monitoring and memory-analysis tools can scan the Redis keyspace, find all Big Keys, and sort them by size.
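If the built-in options are not enough, the same scan can be done from application code. The following is a minimal sketch, assuming the redis-py client, a Redis 4.0+ instance (MEMORY USAGE requires 4.0 or later), and an arbitrary 10KB threshold; it is an illustration, not part of the tooling described above:

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    THRESHOLD_BYTES = 10 * 1024  # treat anything above ~10KB as a Big Key

    big_keys = []
    # scan_iter walks the keyspace incrementally with SCAN,
    # so it does not block the server the way KEYS * would.
    for key in r.scan_iter(count=1000):
        size = r.memory_usage(key)  # MEMORY USAGE <key>
        if size and size > THRESHOLD_BYTES:
            big_keys.append((key, size))

    # print the 20 biggest keys found
    for key, size in sorted(big_keys, key=lambda kv: kv[1], reverse=True)[:20]:
        print(f"{key}\t{size} bytes")

Like --bigkeys, this adds load while it runs, so it should also be scheduled for quiet periods.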

4. Why the KEYS * command should be used with caution in a Redis production environment

In Redis, the KEYS command performs pattern matching on key names: it returns every key name that matches a given pattern, which makes ad-hoc lookups convenient. In development we may be tempted to use KEYS * to list all key names, but in a Redis production environment the command must be used with great caution, for the following reasons:

  1. Performance issues: KEYS * traverses the entire keyspace and builds the complete list of key names in memory, which can consume a lot of system resources, degrade the performance of the Redis server, and in extreme cases bring the server down.

  2. Blocking problem: Redis executes commands on a single main thread, so while KEYS * is running it blocks the command requests of all other clients until it completes. This can stall other clients and hurt the response speed and stability of the system.

If you need to query key names with a specific prefix, use the Redis SCAN command instead: it iterates over the keyspace in batches, avoiding loading all key names at once, which improves performance and stability. For example, the following command starts an iteration over key names with the prefix "user:":

SCAN 0 MATCH user:*

This command returns a new cursor and a batch of matching key names; pass the returned cursor into the next SCAN call and repeat until the cursor comes back as 0, which marks the end of the iteration. Note that SCAN may return the same key more than once, and keys added or deleted while the scan is in progress may or may not be included, so the caller should deduplicate the results where necessary.
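A minimal sketch of the full cursor loop, assuming the redis-py client (the prefix user: is only an example):

    import redis

    r = redis.Redis(decode_responses=True)

    matched = set()  # use a set because SCAN may return a key twice
    cursor = 0
    while True:
        cursor, keys = r.scan(cursor=cursor, match="user:*", count=100)
        matched.update(keys)
        if cursor == 0:  # a cursor of 0 means the iteration is complete
            break

    print(f"found {len(matched)} keys with prefix user:")

In practice redis-py's scan_iter() helper wraps exactly this loop.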

In addition to using the SCAN command, you can also maintain your own index structures in Redis, or use commands appropriate to the data structure, to query data more efficiently and accurately. For example, with the Redis hash data structure you can use the HGETALL command to fetch all fields and values of a single hash instead of scanning key names.

5. How to handle a large number of keys expiring at the same time

In Redis, when a key expires, Redis will automatically delete it from the database. If there are a large number of keys in a Redis database, and the expiration time of these keys is similar, when these keys expire, some problems may arise, for example:

  1. Memory usage problem: When a large number of keys expire, the memory occupied by these keys may not be released in time, causing the memory usage of the Redis server to continue to increase until the Redis server runs out of memory and crashes.

  2. Performance issues: When a large number of keys are expired, the Redis server needs to spend a lot of CPU time to process the deletion of these expired keys, which may cause the performance of the Redis server to degrade, thereby affecting the response speed and stability of the Redis server.

To avoid these problems, we can take some measures to mitigate the impact of many keys expiring at once, for example:

  1. Reasonably set the expiration time of keys: When using Redis, it is necessary to set the expiration time of keys reasonably according to business requirements and system resource conditions, so as to avoid centralized expiration of a large number of keys. The expiration time can be adjusted according to factors such as data access patterns and business needs.

  2. Spread out expiration times: by adding a random offset to key expiration times, you avoid a large batch of keys expiring at the same moment. When setting a key's expiration time, add some random jitter so that the expiration times are not exactly the same (see the sketch after this list).

  3. Use the lazy-free feature of Redis: lazy free lets Redis release the memory of expired or deleted keys asynchronously in a background thread (for example via the lazyfree-lazy-expire configuration option), which avoids blocking the main thread and improves the performance and stability of the Redis server.
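A minimal sketch of TTL jitter, assuming the redis-py client; the base TTL, jitter range, and key names are arbitrary examples:

    import random
    import redis

    r = redis.Redis()

    def set_with_jitter(key, value, base_ttl=3600, jitter=300):
        # TTL is base_ttl plus a random offset, so keys written together
        # do not all expire at the same moment.
        ttl = base_ttl + random.randint(0, jitter)
        r.set(key, value, ex=ttl)

    for i in range(1000):
        set_with_jitter(f"session:{i}", "payload")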

6. Use batch operations to reduce network transfers

Redis supports batch operation commands. By sending multiple command requests to the Redis server at one time, the number of network transmissions can be reduced, thereby improving the performance and efficiency of the Redis server. Common batch operation commands are:

  1. MGET and MSET commands: The MGET command can get the values ​​of multiple keys at one time, and the MSET command can set the values ​​of multiple keys at one time. For example, the following command can get the values ​​of keys a, b, and c at once:
    MGET a b c
    
  2. DEL command: The DEL command can delete multiple keys at once. For example, the following command deletes keys a, b, and c at once:
    DEL a b c
    
  3. Pipelining: a pipeline is not a single server command but a client-side technique. The client packages multiple commands, sends them to the Redis server in one network round trip, and then reads all the replies at once, which reduces the number of network transmissions and improves the performance of the Redis server. Most Redis client libraries expose a pipeline API (a minimal client-side sketch follows this section), and redis-cli provides a --pipe mode for bulk-loading many commands. Note that a plain pipeline only batches commands; unlike MULTI/EXEC it does not make them atomic.
    

    It should be noted that when using batch operation commands, it is necessary to reasonably set the command parameters and adjust the configuration and parameters of the Redis server according to the actual situation to achieve the best performance and stability.
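As an illustration of batching from application code, here is a minimal sketch using the redis-py client (an assumption; any client with pipeline support works): pipeline() queues the commands locally and execute() sends them in a single round trip.

    import redis

    r = redis.Redis(decode_responses=True)

    # transaction=False gives a plain pipeline; True would wrap it in MULTI/EXEC
    pipe = r.pipeline(transaction=False)
    pipe.set("a", 1)
    pipe.incr("b")
    pipe.get("c")
    results = pipe.execute()  # one round trip, replies in command order
    print(results)            # e.g. [True, 1, None]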

7. Cache penetration

Cache penetration refers to querying data that does not exist: since the data is not in the cache, the request bypasses the cache and goes straight to the database, putting excessive pressure on the database. Cache penetration can be caused by factors such as malicious attacks, unreasonable expiration times for cached data, and uneven distribution of business data.

Malicious attacks are a common form of cache penetration. Attackers deliberately send non-existent query requests in order to consume database resources. For example, an attacker can query the database bypassing the cache by adding special characters in query parameters or constructing malicious requests.

Unreasonable cache data expiration time is another common cause of cache penetration. If the data expiration time in the cache is set too short or not set at all, then the query request will bypass the cache and query the database after the data expires.

The uneven distribution of business data can also lead to the problem of cache penetration. For example, if some hot data is accessed very frequently while other data is accessed infrequently, only some of the hot data may be stored in the cache while other data is not cached. When the query request accesses data that is not in the cache, the cache will be bypassed to query the database, resulting in excessive pressure on the database.

In order to solve the problem of cache penetration, the following measures can be taken:

  1. Bloom filter: use a Bloom filter to screen query requests and quickly tell whether a requested key can possibly exist. A Bloom filter is a hash-based data structure that can test set membership very quickly with little memory; it can report false positives but never false negatives, so a request for a key the filter says does not exist can be rejected without touching the database.

  2. Cache empty objects: when a query finds that the data does not exist, store an empty placeholder value in the cache (usually with a short expiration time). Subsequent requests for the same key then hit the cached empty value instead of bypassing the cache and going to the database (a sketch follows this list).

  3. Set a reasonable cache expiration time: Setting a reasonable cache expiration time can reduce the problem of cache penetration. Generally speaking, the cache expiration time should match the access frequency of business data to ensure that the data stored in the cache is all hot data.

  4. Limiting malicious requests: Limiting the access frequency and query parameters of query requests can effectively prevent malicious attacks and cache penetration problems.
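A minimal sketch of caching empty results, assuming the redis-py client and a hypothetical load_from_db() lookup; the sentinel value and TTLs are arbitrary:

    import redis

    r = redis.Redis(decode_responses=True)

    NULL_PLACEHOLDER = "__null__"  # sentinel meaning "the database has no such row"

    def load_from_db(key):
        ...  # hypothetical database lookup; returns None when the row does not exist

    def get_with_null_caching(key, ttl=600, null_ttl=60):
        cached = r.get(key)
        if cached is not None:
            return None if cached == NULL_PLACEHOLDER else cached

        value = load_from_db(key)
        if value is None:
            # Cache the miss briefly so repeated lookups for a nonexistent key
            # do not keep hammering the database.
            r.set(key, NULL_PLACEHOLDER, ex=null_ttl)
            return None

        r.set(key, value, ex=ttl)
        return value

The short null_ttl keeps genuinely missing keys from occupying cache space for long.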

8. Cache breakdown

Cache breakdown refers to the situation where, under highly concurrent access, the cached copy of a piece of hot data expires and a flood of requests for that data hits the database at the same moment, causing a sudden surge in database load that seriously affects the stability and performance of the system. It typically happens with very frequently accessed data whose cache entry has a short expiration time: the instant the entry expires, a large number of concurrent requests all miss the cache at once and bypass it to query the database directly.

The solutions to cache breakdown generally include the following:

  1. Locking: when the cache entry expires, use a distributed lock to serialize the rebuild, so that only one request queries the database and refills the cache while the other requests wait for the result (see the sketch after this list).

  2. Current limiting: When the access frequency of hotspot data is very high, current limiting can be used to control the number of concurrent requests and avoid the influx of a large number of requests in an instant.

  3. Data preheating: preload hot data into the cache, and quickly obtain data from the cache when the cache fails, avoiding a large number of requests from bypassing the cache to directly query the database.

  4. Delayed cache loading: When the cache fails, instead of querying the data in the database immediately, it waits for a period of time before querying. If there is the same query request during the period, the data in the cache is returned directly.

  5. Use a multi-level cache: store data in several cache layers at the same time, for example hot data in a local in-memory cache as well as in a distributed cache. When the local cache misses, the data can still be served from the distributed cache, so a large number of requests do not fall straight through to the database.

  6. Set hotspot data to never expire.
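A minimal sketch of the locking approach, assuming the redis-py client and a hypothetical load_from_db() function; lock TTLs and retry timing are arbitrary:

    import time
    import uuid
    import redis

    r = redis.Redis(decode_responses=True)

    def load_from_db(key):
        ...  # hypothetical database lookup; assumed to return the value to cache

    def get_hot_key(key, ttl=300, lock_ttl=10):
        value = r.get(key)
        if value is not None:
            return value

        lock_key = f"lock:{key}"
        token = str(uuid.uuid4())
        # SET NX EX acquires the lock only if nobody else holds it.
        if r.set(lock_key, token, nx=True, ex=lock_ttl):
            try:
                value = load_from_db(key)
                r.set(key, value, ex=ttl)
            finally:
                # Release the lock only if we still own it
                # (a small Lua script would make this check-and-delete atomic).
                if r.get(lock_key) == token:
                    r.delete(lock_key)
            return value

        # Someone else is rebuilding the entry: wait briefly, then reread the cache.
        time.sleep(0.05)
        return r.get(key)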

9. Cache Avalanche

Cache avalanche refers to the situation where, under highly concurrent access, a large amount of cached data expires at the same time or the cache service becomes unavailable, so that a flood of requests pours into the database, the database load surges, and the stability and performance of the system are seriously affected. Cache avalanches are usually caused by factors such as the cache server going down or a large batch of cached entries sharing the same expiration time.

Different from cache breakdown, cache avalanche is caused by a large amount of data in the cache being invalidated at the same time or the cache service is unavailable, rather than being caused by a very high access frequency of a certain hot data.

The solutions to cache avalanche generally have the following types:

  1. Data distribution: Evenly distribute the data in the cache to different servers to avoid problems caused by a cache server downtime or data failure at the same time.

  2. Current limiting: In the case of cache failure, the current limiting method is adopted to control the number of concurrent requests and avoid the influx of a large number of requests in an instant.

  3. Backup cache: Back up the data in the cache to another cache server or local file system. When the cache fails or the service is unavailable, you can quickly restore the data from the backup, avoiding a large number of requests to bypass the cache and directly query the database.

  4. Service downgrade: When the cache fails or the service is unavailable, some functions or interfaces can be temporarily shielded through service downgrade to ensure the normal operation of core functions.

  5. Add randomness to cache expiration times: add random jitter to cache TTLs (as in the sketch in section 5) so that a large number of entries do not expire at the same moment and trigger an avalanche.

10. Cache pollution (cache full)

The cache pollution problem refers to some data in the cache that will only be accessed once or a few times. After being accessed, it will never be accessed again, but this part of the data still remains in the cache and consumes cache space.

Cache pollution builds up gradually as data keeps accumulating. As the service continues to run, the cache fills with data that will never be accessed again. Cache space is limited, and once it is full, writing new data incurs extra overhead: Redis must consult the eviction policy, select the data to evict according to that policy, and delete it before the write can proceed, which affects Redis performance.

The solutions generally have the following types:

1. Set the maximum cache size

The design choice of the system is a process of trade-offs: a large-capacity cache can bring performance acceleration benefits, but the cost will be higher, and a small-capacity cache does not necessarily have the effect of accelerating access. Generally speaking, I would recommend setting the cache capacity to 15% to 30% of the total data volume, taking into account access performance and memory space overhead.

For Redis, once you determine the maximum size of the cache, such as 4GB, you can use the following command to set the size of the cache:

CONFIG SET maxmemory 4gb

However, it is inevitable that the cache will eventually fill up, so a data eviction strategy is also required.

 2. Use a cache eviction strategy

11. Redis supports a total of eight eviction strategies

Redis supports a total of eight eviction strategies, namely noeviction, volatile-random, volatile-ttl, volatile-lru, volatile-lfu, allkeys-random, allkeys-lru, and allkeys-lfu (the two LFU strategies were added in Redis 4.0).

They fall into three categories:

No eviction

  • noeviction (the default policy): nothing is evicted; when memory is insufficient, new write operations return an error.

Evict only among keys that have an expiration time set

  • Random: volatile-random randomly evicts some of the keys that have an expiration time set. This strategy suits data with no particular importance ranking, where randomly dropping some expiring keys has little impact.
  • TTL: volatile-ttl evicts, among the keys with an expiration time set, the ones with the shortest remaining time to live first. This strategy suits data whose importance is reflected in its expiration time, so data that is about to expire anyway is dropped first.
  • LRU: volatile-lru evicts, among the keys with an expiration time set, the least recently used keys. This strategy suits workloads where recently accessed data is more valuable than data that has not been touched for a long time.
  • LFU: volatile-lfu evicts, among the keys with an expiration time set, the least frequently used keys. This strategy suits data whose importance differs, so rarely used data is evicted first.

Evict among all keys

  • Random: allkeys-random randomly evicts keys from the whole keyspace. This strategy suits data with no particular importance ranking, where randomly dropping some keys has little impact.
  • LRU: allkeys-lru evicts the least recently used keys from the whole keyspace. This strategy suits workloads where recently accessed data should be kept.
  • LFU: allkeys-lfu evicts the least frequently used keys from the whole keyspace. This strategy suits workloads where frequently accessed data should be kept.
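For reference, the policy is chosen with the maxmemory-policy configuration directive, together with the maxmemory limit from the previous section. A minimal sketch, assuming the redis-py client and that runtime configuration changes are allowed (the same settings can be made permanent in redis.conf):

    import redis

    r = redis.Redis()

    # Equivalent to: CONFIG SET maxmemory 4gb
    #                CONFIG SET maxmemory-policy allkeys-lru
    r.config_set("maxmemory", "4gb")
    r.config_set("maxmemory-policy", "allkeys-lru")

    print(r.config_get("maxmemory-policy"))  # {'maxmemory-policy': 'allkeys-lru'}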

12. Database and cache consistency

One of the most common problems when using a cache is consistency between the cache and the database. Since the cache and the database are two independent systems, when data is updated or deleted the copy in the cache can become stale, so the cache and the database no longer agree. This can cause problems such as:

  1. Dirty (stale) data: the data in the cache is out of date or invalid, but clients still read it from the cache, which can lead to errors or incorrect results.

  2. Orphaned cache entries: when data is deleted from the database but the corresponding entry still exists in the cache, clients keep reading data that no longer exists, again leaving the cache inconsistent with the database.

In order to solve these problems, it is necessary to ensure that when data in the database is updated or deleted, the corresponding cached data is also updated or deleted. The following strategies can be used to ensure consistency between the database and the cache:

1. Delayed double deletion scheme

This is a simple and relatively effective scheme. Its core idea is: when updating, delete the cached entry first, then update the database, and after a short delay delete the cached entry again. The second delete removes any stale value that a concurrent read may have written back into the cache between the first delete and the database update (a sketch follows below).

The delay therefore has to be long enough for the database update to complete and for such in-flight reads to finish, so an appropriate waiting time must be chosen in the code. The scheme can still fail if the cache is rewritten by other operations after the second delete, so it narrows the inconsistency window rather than eliminating it completely.
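A minimal sketch of delayed double deletion, assuming the redis-py client and a hypothetical update_db() write; the delay is an arbitrary example, and in practice the second delete is often issued asynchronously (for example from a delayed task) instead of sleeping in the request thread:

    import time
    import redis

    r = redis.Redis()

    def update_db(key, value):
        ...  # hypothetical database write

    def update_with_double_delete(key, value, delay_seconds=0.5):
        r.delete(key)              # 1st delete: drop the stale cached value
        update_db(key, value)      # update the source of truth
        time.sleep(delay_seconds)  # let in-flight reads of the old value finish
        r.delete(key)              # 2nd delete: clear anything a concurrent read cached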

2. Update the cache scheme

The core idea of ​​this solution is to update the database first, and then update the data in the cache when updating the database. This ensures that the data in the cache is always up to date.

However, this solution faces a problem: when multiple requests update the same data at the same time, dirty data may appear in the cache. To solve this problem, distributed locks or optimistic locks can be introduced to avoid concurrent updates.

3. Double write consistency scheme

The core idea of ​​this solution is to update the data in the cache at the same time as the database is updated to ensure that the data in the cache and the data in the database are always consistent.

However, this solution faces a problem: if updating the database fails while the cache has already been updated, the cache is left holding dirty data. To solve this, transactions can be used to keep the two in step: if the database update fails, roll back the transaction and also roll back (or delete) the cached value. Alternatively, a read-after-write consistency scheme can be used: update the database first and then update the cache, so the cache should always hold the latest value; if the cached data has expired, read it from the database again.

4. Use message queues

By putting the data update operation into the message queue, the database is updated first, and then the cache is updated asynchronously through the message queue to ensure that the cache and database data are consistent. Highly reliable and high-performance asynchronous updates can be achieved through message queues, while avoiding direct interaction between the cache and the database. It should be noted that the use of message queues will increase the complexity and delay of the system, which needs to be weighed in actual situations.


Origin blog.csdn.net/qq_33129875/article/details/129468452