[Practice] Detailed Explanation of Redis Usage Specification List


0. Preface

Redis, as an open source, in-memory data structure storage system, is widely welcomed and used for its excellent performance and flexible data structure. However, while Redis is relatively intuitive to use, exploiting its full potential and optimizing its performance requires a solid understanding of how it works and best practices.

This post compiles material from around the web into a detailed Redis usage specification. Whether you are new to Redis or a veteran looking to sharpen your skills, you should find something valuable here to help you avoid common pitfalls in real-world use.

References

  1. Redis official documentation: https://redis.io/
  2. Redis Command Reference: https://redis.io/commands
  3. Redis Best Practices: https://redislabs.com/ebook/appendix-a/a-3-scripting-and-security/
  4. Redis in Action (book)
  5. Redis Design and Implementation (book)

1. Key-value pair usage specification

1. Key naming convention

The key naming convention of Redis can be defined according to actual needs and personal preferences, but the following are some common naming conventions and best practices:

  1. Concise and clear: Key should be concise and clear, and can clearly express its meaning. Avoid using long or complex key names to reduce storage space and improve performance.

  2. Use namespaces: To avoid conflicts between keys, you can use namespaces to classify or group keys. For example, "namespace:key" forms such as "user:123" and "order:456" can be used.

  3. Use a colon to separate levels: Use a colon as a level separator to indicate a level relationship in Key naming. For example, you can use "category:books" and "category:movies" to represent data under different categories.

  4. Avoid using special characters: Avoid using special characters in key naming, such as spaces, tabs, or newlines, so as not to cause trouble in parsing or processing.

  5. Use a consistent naming style: Pick a naming style and keep it consistent across the application. For example, you can choose to use all lowercase letters, use underscores as word separators, or use camelCase.

  6. Readability and maintainability: Choose key names that are readable and maintainable so that other developers can easily understand and manipulate them. Avoid oversimplified or abbreviated key names unless really required in a specific case.

  7. Avoid naming conflicts: Make sure the Key name does not conflict with other naming in other data storage systems or databases to avoid confusion and errors.

In short, a good key naming should be concise, clear, readable and maintainable, and can clearly express its meaning and relationship. Depending on the specific situation, it can be named according to the above suggestions, or it can be customized according to the specifications of the team or project.
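As a small illustration of rules 2 and 3 above (the class name and key formats are made-up examples, not a standard API), a tiny helper that builds colon-separated, namespaced keys:

```java
public class RedisKeys {
    // Join name parts with ':' to form a namespaced key, e.g. "user:123:profile".
    static String key(String... parts) {
        return String.join(":", parts);
    }

    public static void main(String[] args) {
        System.out.println(key("user", "123"));          // user:123
        System.out.println(key("category", "books"));    // category:books
        System.out.println(key("user", "123", "cart"));  // user:123:cart
    }
}
```

Centralizing key construction in one helper also makes it easy to keep the naming style consistent across a codebase.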

2. Avoid bigkeys

Indeed, it is good practice to avoid using "bigkey". In Redis, "bigkey" refers to a key that takes up a lot of memory space, which can negatively impact performance and resource consumption.

2.1. Problems caused by "bigkey"

  1. Memory consumption: "bigkey" takes up a lot of memory, especially when using Redis's RDB persistence or AOF log persistence. If a large amount of memory is used to store a single key, the memory of the Redis instance will be insufficient, affecting the storage and reading of other keys.

  2. Read and write performance: Read and write operations on "bigkey" require more time and resources. When a large key needs to be read or updated, Redis needs to spend more time serializing and deserializing the data, and passing more bytes in network transmission.

  3. Expiration processing: If a "bigkey" has an expiration time set, when the Key expires, Redis may block for a period of time to delete it, thus affecting the execution of other operations.

2.2. Solutions to the "bigkey" problem

2.2.1. Data sharding

Split large data into smaller pieces and store them under multiple keys. Sharding spreads the data evenly, reduces the size of any single key, and thereby improves Redis's performance and resource utilization.

Guidelines for data sharding:

  1. Sharding strategy: It is very important to choose an appropriate sharding strategy. A common method is to use a hash function to map a unique identifier of data (such as ID or attribute) to different keys. This ensures that data with the same identity is always stored in the same shard.

  2. Number of shards: There is a trade-off when deciding on the number of shards. A smaller number of shards can reduce the complexity of management and maintenance, but it may cause some shards to be too large, and there is still a "bigkey" problem. A higher number of shards spreads the data more evenly, but adds some additional overhead and complexity.

  3. Shard mapping: Maintain a shard mapping table that records the mapping between each item's unique identifier and its shard, so the correct shard can be located quickly for reads and writes.

  4. Consistent hashing: The consistent hashing algorithm is a commonly used data sharding algorithm, which can minimize the amount of data migration when increasing or decreasing shards. Consistent hashing can provide better load balancing and fault tolerance.
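A minimal sketch of guideline 1, hash-based shard selection; the key format and shard count here are illustrative assumptions, not recommendations:

```java
public class ShardedKeys {
    static final int SHARD_COUNT = 16; // illustrative; pick based on data volume

    // Map an item's unique identifier to one of SHARD_COUNT sub-keys, so one
    // huge key becomes SHARD_COUNT smaller ones (e.g. a big hash split into
    // "user:events:shard:0" .. "user:events:shard:15").
    static String shardKey(String baseKey, String id) {
        int shard = Math.floorMod(id.hashCode(), SHARD_COUNT);
        return baseKey + ":shard:" + shard;
    }

    public static void main(String[] args) {
        // The same identifier always lands on the same shard.
        System.out.println(shardKey("user:events", "user-42"));
        System.out.println(shardKey("user:events", "user-42")
                .equals(shardKey("user:events", "user-42"))); // true
    }
}
```

Note that plain modulo hashing reshuffles most keys when SHARD_COUNT changes; that is exactly the problem consistent hashing (guideline 4) mitigates.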

2.2.2. Data compression

Large values can be compressed before storage to reduce memory usage and network transmission overhead. Note that Redis does not expose compression commands to clients (internally it uses LZF, for example when writing RDB files); compressing values with an algorithm such as LZF or Snappy happens in the client before the write. Efficient serialization and compression choices are covered in more detail in the next section.

3. Use efficient serialization methods and compression methods

3.1. Serialization method

An efficient serialization method converts data into a byte sequence with low overhead for both storage and transmission. Common serialization methods for Redis values include JSON, MessagePack, and Protocol Buffers; choose one according to the characteristics and requirements of your data. Serialization methods differ both in serialization speed and in the memory footprint of the serialized data. For example, in the Java ecosystem, protostuff and kryo are generally more efficient than Java's built-in serialization (java-build-in-serializer).
Common Serialization Methods

  1. JSON is suitable for scenarios where the data structure is relatively simple and requires high readability and cross-platform compatibility.

  2. MessagePack is an efficient binary serialization format that can compactly serialize data into a sequence of bytes. It has high serialization and deserialization performance and supports multiple programming languages. The MessagePack serialization method in Redis can save storage space and improve performance, but is less readable. It is suitable for scenarios that require high storage space and transmission efficiency.

  3. Protocol Buffers is a language-neutral, platform-neutral serialization format developed by Google. It uses an Interface Definition Language (IDL) to describe data structures and generates corresponding code for efficient serialization and deserialization. Protocol Buffers offers high performance and a small serialized size in Redis, but requires an extra code-generation step. It is suitable for scenarios with high requirements on performance and storage space.

Comparison of the three serialization methods:

  1. JSON. Advantages: easy to understand and read, good cross-platform compatibility. Shortcomings: high serialization and storage overhead. Applicable scenario: relatively simple data structures with high requirements on readability and compatibility.

  2. MessagePack. Advantages: high serialization and deserialization performance, small storage footprint. Shortcomings: poor readability, not supported in every programming language. Applicable scenario: high requirements on storage space and transmission efficiency.

  3. Protocol Buffers. Advantages: high performance, small serialized size. Shortcomings: requires an extra code-generation step, steeper learning curve. Applicable scenario: high requirements on both performance and storage space.

You can choose a suitable serialization method according to the characteristics and requirements of the data. JSON is an option if readability and compatibility are key factors. If storage space and transmission efficiency are priorities, you can choose MessagePack. If you need high performance and small serialization size, and are willing to accept an extra code generation step, you can choose Protocol Buffers.

3.2. Compression method

For data with heavy repetition or redundancy, compression can reduce storage space and network transmission overhead. Redis applies compression only internally (LZF for RDB files, controlled by the rdbcompression option); compressing values with LZF, Snappy, or a similar algorithm is done in the client before writing. Choose the algorithm that fits the characteristics of your data.

1. LZF compression algorithm:
LZF is a fast lossless compression algorithm, suited to scenarios that prioritize compression speed over compression ratio, and it consumes relatively little CPU during compression and decompression. Suppose Redis stores a large volume of repetitive text, such as log messages with many near-duplicate lines: compressing them with LZF before storage can significantly reduce storage and transmission overhead.

For the Java sample below, add the following dependencies to the pom.xml file:

<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
    <version>2.9.0</version>
</dependency>
<dependency>
    <groupId>com.ning</groupId>
    <artifactId>compress-lzf</artifactId>
    <version>1.0.3</version>
</dependency>

Use the LZFEncoder.encode() method to compress a byte array, save the compressed bytes to Redis, and decompress them later with the LZFDecoder.decode() method:

import redis.clients.jedis.Jedis;
import com.ning.compress.lzf.LZFEncoder;
import com.ning.compress.lzf.LZFDecoder;

public class RedisLZFExample {

    public static void main(String[] args) throws Exception {
        // Create a connection to Redis
        Jedis jedis = new Jedis("localhost");

        String str = "Log data: This is a repeated message.";
        byte[] original = str.getBytes("UTF-8");

        // Compress the bytes
        byte[] compressed = LZFEncoder.encode(original);

        // Save the compressed data in Redis
        jedis.set("myKey".getBytes(), compressed);

        // Retrieve the compressed data from Redis
        byte[] retrieved = jedis.get("myKey".getBytes());

        // Decompress the bytes
        byte[] decompressed = LZFDecoder.decode(retrieved);

        // Convert the decompressed bytes back to a String
        String result = new String(decompressed, "UTF-8");

        System.out.println("Original:     " + str);
        System.out.println("Decompressed: " + result);

        jedis.close();
    }
}

2. Snappy compression algorithm:
Snappy is a fast lossless compression algorithm from Google that, like LZF, favors speed over compression ratio; it is very fast at both compression and decompression, with a ratio broadly comparable to LZF's. Snappy is likewise not built into Redis: it is applied in the client, via a library such as snappy-java, before the write. Suppose Redis stores a large number of files whose contents share similar blocks or metadata: compressing them with Snappy before storage can substantially reduce storage space and network transmission overhead.

When choosing a compression method, evaluate it against the characteristics of your data and the requirements of your application. LZF and Snappy are both speed-oriented algorithms with modest compression ratios, so benchmark them on your own data; if a higher compression ratio matters more than speed, consider a ratio-oriented algorithm such as zlib or zstd instead.

It should be noted that compression algorithms are not suitable for all types of data. For data that is already highly compressed (such as data that has been used with other compression algorithms) or data with low redundancy, compression may not have a significant effect. Therefore, the characteristics of the data and the expected effect should be carefully evaluated before applying a compression algorithm.
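Snappy and LZF both require third-party Java libraries. As a runnable, dependency-free sketch of the same client-side compress-before-store pattern, here is the equivalent round trip using the JDK's built-in Deflater/Inflater (zlib); the actual Redis read/write calls would be the same jedis.set/jedis.get byte-array calls shown in the LZF example:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ClientSideCompression {
    // Compress a byte array before storing it in Redis.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress bytes read back from Redis.
    static byte[] decompress(byte[] input) {
        try {
            Inflater inflater = new Inflater();
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                out.write(buffer, 0, inflater.inflate(buffer));
            }
            inflater.end();
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        }
    }

    public static void main(String[] args) {
        String logs = "This is a repeated message. ".repeat(100);
        byte[] compressed = compress(logs.getBytes(StandardCharsets.UTF_8));
        String restored = new String(decompress(compressed), StandardCharsets.UTF_8);
        System.out.println("compressed/original bytes: " + compressed.length + "/" + logs.length());
        System.out.println(restored.equals(logs)); // prints true
    }
}
```

On highly repetitive input like the log string above, the compressed size is a small fraction of the original; on already-compressed data it can even grow slightly, which is exactly the caveat in the paragraph above.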

4. Shared pool of integer objects

Personally, I don't think this item adds much as a usage rule, since it is fully automatic, but it is useful background. Redis optimizes memory usage with a technique called the shared pool of integer objects: commonly used small integers are pre-created and stored in an internal pool, and whenever one of these integers is needed it is fetched from the pool instead of being allocated anew.

This technique is used in many Redis functions, for example in places like counters, hash table sizes, and list lengths.

In the Redis source code, the shared pool of integer objects is defined as an array with a size of 10,000. The index of the array is the corresponding integer value, and the elements of the array are the corresponding integer objects.

When an integer object is needed, Redis will first check whether the integer is within the range of the shared pool (0-9999). If it is within the range, it will be obtained directly from the shared pool, otherwise a new integer object will be created.

This technique reduces memory usage and the number of object allocations, and it is completely transparent: developers do not need to do anything, Redis handles it automatically. Note that it applies only to small integers; Redis does not provide a shared pool for large integers or floating-point numbers.
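The pool is invisible from the client side, but Java itself uses the same trick in its Integer cache (the JLS guarantees interning of boxed values in -128..127), which makes for a quick local illustration of the idea:

```java
public class SharedSmallIntegers {
    public static void main(String[] args) {
        // Java interns small boxed integers, much as Redis pre-creates and
        // shares the integer objects 0..9999: boxing a small value returns
        // the one cached object instead of allocating a new one.
        Integer a = Integer.valueOf(100);
        Integer b = Integer.valueOf(100);
        System.out.println(a == b); // prints true: both refer to the cached object
    }
}
```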

2. Data storage specification

2.1. Use Redis to save hot data

Redis is an in-memory database with very fast read and write speeds, suitable for storing frequently accessed hot data. For example, user's session information, popular product information, etc. can be stored in Redis to improve the response speed of the system.

For example, for an e-commerce website, the sales data of popular products is hot data that often needs to be queried. These data are stored in Redis, and when users query the sales data of these commodities, they can be obtained directly from Redis without querying in the database, which greatly improves the response speed.
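The read path described above is the classic cache-aside pattern. A minimal, self-contained sketch follows; a HashMap stands in for Redis so the example runs anywhere, the key format is a made-up example, and the real calls would be jedis.get(key) and jedis.setex(key, ttlSeconds, value):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CacheAside {
    // Cache-aside read path: try the cache first, fall back to the database,
    // then populate the cache for subsequent readers.
    static String getSales(Map<String, String> cache, String productId,
                           Function<String, String> dbLookup) {
        String key = "sales:product:" + productId;   // hypothetical key format
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                           // cache hit: no database query
        }
        String value = dbLookup.apply(productId);    // cache miss: go to the database
        cache.put(key, value);                       // populate the cache
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        // First read misses the cache and hits the "database".
        System.out.println(getSales(cache, "42", id -> "1000 units"));
        // Second read is served from the cache; the lookup below is never called.
        System.out.println(getSales(cache, "42", id -> { throw new IllegalStateException(); }));
    }
}
```

With Redis in place of the map, the setex TTL (next section) keeps the cached sales figure from going permanently stale.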

2.2. Different business data are stored by instance

In order to ensure data independence and security, different business data should be stored in different Redis instances. In this way, even if a problem occurs in a certain instance, other business data will not be affected.

For example, a company may have multiple business lines, such as user management, order management, commodity management, etc., and the data of each business line should be stored in different Redis instances. Data isolation of different business lines can reduce the risk of data leakage and avoid data confusion.

2.3. When saving data, set the expiration time

Redis data is stored in memory, if the amount of data is too large, it may cause memory overflow. Therefore, when saving data, you should set a reasonable expiration time to automatically clean up data that is no longer needed. At the same time, setting the expiration time can also prevent the data from expiring and ensure the real-time performance of the data.

For example, we can set the expiration time of a user's login session in Redis to 30 minutes. If the user performs no operation within 30 minutes, Redis automatically deletes the data and frees the memory. At the same time, this limits how long the login information could be misused if it were ever leaked.

2.4. Control the capacity of Redis instance

Each Redis instance has its maximum memory capacity, beyond this capacity, Redis may have problems. Therefore, you should regularly monitor the memory usage of the Redis instance, clean up unnecessary data in time, and control the capacity of the Redis instance.
Setting the memory size of a Redis single instance between 2 and 6GB can ensure performance while avoiding excessive delays in RDB snapshots or master-slave cluster data synchronization. Such a setting can ensure that Redis will not be blocked due to data backup or synchronization when processing normal requests, thereby improving the overall performance and stability of the system.

For example, suppose our server memory is 16GB. In order to ensure the normal operation of Redis, we should set the maximum memory limit of the Redis instance below 12GB to prevent memory overflow. We should also regularly check the memory usage of the Redis instance. If the memory usage is close to the maximum limit, unnecessary data should be cleaned up in time.
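For the 16GB-server example above, an illustrative redis.conf fragment (the values are assumptions to be tuned per workload, not recommendations):

```
# Cap the instance below physical memory to leave headroom for forks and the OS
maxmemory 12gb
# Evict least-recently-used keys once the cap is reached
maxmemory-policy allkeys-lru
```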

3. Command usage specification

3.1 Disable some commands online

In the online environment, some Redis commands may affect the stability and performance of the system, so we should disable these commands. For example, we can disable the FLUSHDB and FLUSHALL commands to prevent data loss caused by misuse. In addition, we can also disable the DEBUG and CONFIG commands to prevent unauthorized users from modifying the system configuration.
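On self-managed instances, one common way to disable such commands is renaming them to an empty string in redis.conf (on Redis 6 and later, ACLs are the preferred mechanism):

```
rename-command FLUSHALL ""
rename-command FLUSHDB ""
rename-command DEBUG ""
rename-command CONFIG ""
```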

3.2 Use the MONITOR command with caution

The MONITOR command can be used to monitor all requests of the Redis server in real time, but this command consumes a lot of CPU and network resources. Therefore, in an online environment, we should use the MONITOR command with caution. If we need to monitor the performance of the Redis server, we can use the INFO command or use a dedicated monitoring tool.

3.3 Use full operation commands with caution

Full operation commands, such as KEYS, SMEMBERS, etc., will return a large amount of data, which may cause network delays and high CPU usage. In the case of a large amount of data, these commands may block the Redis server and affect the execution of other commands. Therefore, we should use the full operation command with caution. If we need to obtain a large amount of data, we can use commands such as SCAN, SSCAN, etc. These commands can obtain data in batches to avoid blocking the server.
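SCAN's contract, returning one small batch per call plus a cursor for the next call, can be illustrated locally as follows. This is only a sketch of the batching behavior over an in-memory list; with Jedis the real call is jedis.scan(cursor, new ScanParams().match("user:*").count(100)), where the server maintains the cursor:

```java
import java.util.ArrayList;
import java.util.List;

public class ScanLikeIteration {
    // Return up to `count` keys starting at `cursor`; the caller advances the
    // cursor by the batch size, mimicking SCAN's incremental iteration.
    static List<String> scanPage(List<String> keys, int cursor, int count) {
        return new ArrayList<>(keys.subList(cursor, Math.min(cursor + count, keys.size())));
    }

    public static void main(String[] args) {
        List<String> keys = List.of("user:1", "user:2", "user:3", "user:4", "user:5");
        int cursor = 0;
        List<String> collected = new ArrayList<>();
        while (cursor < keys.size()) {
            List<String> page = scanPage(keys, cursor, 2); // small batch per "round trip"
            collected.addAll(page);
            cursor += page.size();
        }
        System.out.println(collected.size()); // 5: all keys, never fetched all at once
    }
}
```

The point of the pattern is that each round trip is short, so the single-threaded server stays responsive between batches, unlike a single blocking KEYS call.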

The following checklist was compiled by an experienced Redis practitioner; I have added a brief explanation to each item.

4. Business level:

  1. The length of the key should be as short as possible: this is because the key in Redis is stored using the dictionary data structure, the size of the key will directly affect the memory usage of Redis, and a key that is too long will take up more memory space.

  2. Avoid bigkey: bigkey refers to a key whose size exceeds 10kb or the number of list, set, sorted set, or hash elements exceeds 5000. Redis is single-threaded, and processing bigkeys consumes a lot of CPU time, which will cause blocking of other commands.

  3. For version 4.0+, it is recommended to enable lazy-free: this optimizes the memory reclamation strategy. When deleting a large amount of data, Redis performs the reclamation in a background thread so the main thread is not blocked.

  4. Use Redis as a cache and set an expiration time: this can prevent the memory from being filled with a large amount of data, causing the service to fail.

  5. Do not use commands with high complexity: These commands often have high time complexity, which will consume a lot of CPU time and affect the performance of Redis.

  6. Try not to query the entire amount of data at one time. It is recommended to write a large amount of data in multiple batches: this can prevent Redis from being blocked due to one-time operation of a large amount of data.

  7. For batch operations, it is recommended to replace GET/SET with MGET/MSET, and replace HGET/HSET with HMGET/HMSET: batch operations can reduce network latency and improve Redis performance.

  8. Do not use KEYS/FLUSHALL/FLUSHDB commands: These commands will block Redis and affect service performance.

  9. Avoid centralized expiration of keys: If a large number of keys expire at a certain time, it will cause Redis to block.

  10. Choose an appropriate elimination strategy according to the business scenario: such as volatile-lru (select the least-used key from the keys with an expiration time set for elimination), etc.
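For item 7, a small sketch of preparing a single MGET in place of N separate GETs; the key format is a made-up example, and the actual Jedis call is noted in the comment:

```java
public class BatchKeys {
    // Build the key array for one MGET call; with Jedis this is
    // jedis.mget(keys) -- one network round trip instead of ids.length GETs.
    static String[] userKeys(long... ids) {
        String[] keys = new String[ids.length];
        for (int i = 0; i < ids.length; i++) {
            keys[i] = "user:" + ids[i];
        }
        return keys;
    }

    public static void main(String[] args) {
        String[] keys = userKeys(1, 2, 3);
        System.out.println(String.join(",", keys)); // user:1,user:2,user:3
    }
}
```

The same round-trip saving applies to HMGET/HMSET for hash fields, and to pipelining when the commands are heterogeneous.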

5. Operation and maintenance level:

  1. Deploy instances by business line: Avoid mixed deployment of multiple business lines, and when something goes wrong, the scope of impact is smaller.

  2. Ensure that the machine has sufficient CPU, memory, bandwidth, and disk resources: these are the basis for the normal operation of Redis.

  3. It is recommended to deploy master-slave clusters and distribute them on different machines: to ensure data stability and reduce the risk of single point of failure.

  4. The machines deployed by the master and slave nodes are independent, and cross-deployment should be avoided as much as possible: in order to avoid a problem with one node and affect other nodes.

  5. It is recommended to deploy sentinel clusters to achieve automatic failover: automatically switch between master and slave to ensure high availability of the system.

  6. Do a good job of capacity planning in advance to prevent out-of-memory caused by a sudden increase in the memory used by the instance when the master-slave is fully synchronized: avoid service failures caused by out-of-memory.

  7. Do a good job of machine CPU, memory, bandwidth, and disk monitoring: find problems in time to prevent service failures.

  8. Set the maximum number of connections for the instance: prevent too many client connections from causing excessive load on the instance and affecting service performance.

  9. The memory of a single instance is recommended to be controlled below 10G: large instances may be blocked during full master-slave synchronization and backup.

  10. Set a reasonable slowlog threshold and monitor it: Too many slowlogs can indicate a performance problem.

  11. Set a reasonable repl-backlog to reduce the probability of master-slave full synchronization: master-slave full synchronization often consumes a lot of resources and affects service performance.

  12. Set a reasonable slave client-output-buffer-limit to avoid interruption of master-slave replication.

  13. It is recommended to backup on the slave node without affecting the performance of the master node: backup tasks often consume a lot of resources and affect service performance.

  14. Do not enable AOF or configure AOF to refresh every second to avoid disk IO from slowing down Redis performance: AOF will increase disk IO and affect service performance.

  15. When adjusting maxmemory, pay attention to the adjustment order of the master-slave nodes. If the order is wrong, the master-slave data will be inconsistent: ensure data consistency.

  16. Deploy and monitor instances, and use long connections when collecting INFO information to avoid frequent short connections: frequent network connections consume a lot of resources and affect service performance.

  17. Do a good job in instance runtime monitoring, focusing on expired_keys, evicted_keys, and latest_fork_usec: these indicators reflect the operating status of Redis, and short-term sudden increases may cause blocking risks.

  18. When scanning online instances, remember to set the sleep time to avoid performance jitter caused by excessively high OPS: scanning operations often consume a lot of resources and affect service performance.

Origin: blog.csdn.net/wangshuai6707/article/details/132679549