Don't use the Redis KEYS command indiscriminately: use SCAN instead in production

KEYS command

Time complexity: O(N), where N is the number of keys in the database, assuming that key names and the given pattern have limited length.

The Redis KEYS command returns all keys that match a given pattern.

Although the time complexity of this operation is O(N), the constant factor is quite low. For example, Redis running on an ordinary laptop can scan a database of 1 million keys in about 40 milliseconds.
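As a rough sanity check, the 40 ms per million keys figure can be extrapolated linearly. The helper below is a hypothetical back-of-the-envelope calculation, not a benchmark: real speed depends on hardware, key names, and the pattern, and the cost of building and sending a large reply comes on top.

```python
def estimated_keys_block_ms(n_keys: int, ms_per_million: float = 40.0) -> float:
    """Linear estimate of how long one KEYS call keeps Redis busy, in ms.

    Assumes the ~40 ms per 1 million keys laptop figure scales linearly.
    """
    return n_keys * ms_per_million / 1_000_000


if __name__ == "__main__":
    for n in (1_000_000, 10_000_000, 250_000_000):
        print(f"{n:>11,} keys -> ~{estimated_keys_block_ms(n):,.0f} ms")
```

At that rate, a database of 250 million keys would keep the single command thread busy for roughly 10 seconds. That is long enough for the cluster-failover scenario discussed in this article to become realistic.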

Command format: KEYS pattern

Warning: be very careful when using KEYS in a production environment. It can ruin performance when executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. Don't use KEYS in your regular application code. If you need to find a subset of the keys in your keyspace, consider using SCAN or sets.

Supported matching patterns:

  • h?llo matches hello, hallo and hxllo

  • h*llo matches hllo and heeeello

  • h[ae]llo matches hello and hallo, but not hillo

  • h[^e]llo matches hallo, hbllo, ... but not hello

  • h[a-b]llo matches hallo and hbllo

Use \ to escape special characters if you want to match them literally.
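To make the pattern rules concrete, here is a minimal Python sketch that translates a Redis-style glob into a regular expression. The function names are hypothetical, and this is not how Redis matches internally (Redis uses its own matcher written in C). Unusual inputs, such as empty or unterminated brackets, get simplified handling. Python's own fnmatch module is not a drop-in substitute, because it negates character classes with [!...] rather than [^...].

```python
import re


def redis_glob_to_regex(pattern: str) -> str:
    """Translate a Redis-style glob (?, *, [..], [^..], [a-z], \\x) to a regex."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:            # \x matches x literally
            out.append(re.escape(pattern[i + 1]))
            i += 2
        elif c == "*":                         # any run of characters, even empty
            out.append(".*")
            i += 1
        elif c == "?":                         # exactly one character
            out.append(".")
            i += 1
        elif c == "[" and pattern.find("]", i + 1) != -1:
            j = pattern.find("]", i + 1)
            body = pattern[i + 1:j]
            negate = body.startswith("^")      # [^...] excludes the listed chars
            if negate:
                body = body[1:]
            if not body:                       # simplification for [] and [^]
                out.append("." if negate else "(?!)")
            else:
                # Escape characters that are special inside a regex class;
                # '-' is kept so ranges like a-z still work.
                cls = "".join("\\" + ch if ch in "\\[]^&~|" else ch for ch in body)
                out.append("[" + ("^" if negate else "") + cls + "]")
            i = j + 1
        else:                                  # ordinary character (or lone '[')
            out.append(re.escape(c))
            i += 1
    return "".join(out)


def redis_glob_match(pattern: str, text: str) -> bool:
    """True if the whole of `text` matches the Redis-style `pattern`."""
    return re.fullmatch(redis_glob_to_regex(pattern), text, flags=re.DOTALL) is not None
```

For example, redis_glob_match("h[^e]llo", "hallo") is true and redis_glob_match("h[^e]llo", "hello") is false, as described in the list. The MATCH option of SCAN, described later, uses the same glob syntax.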

Background

1. Redis executes commands on a single thread, so every command is atomic and there are no data anomalies caused by concurrent execution.

2. Running time-consuming commands is therefore very dangerous: they occupy the only command-processing thread for a long time, and every other request is slowed down while they run.

Scenario

    Because Redis is single-threaded, a KEYS command executed in a production environment gets slower and slower as the database grows. While KEYS runs, it occupies the processing thread, so Redis blocks and its CPU usage rises. All other requests are slowed down, and in a bad case the server running Redis may go down. Avoid KEYS in production applications. Imagine Redis being blocked for more than 10 seconds: in a cluster deployment, the cluster may decide that the node has failed and trigger a failover.

    If no application thread can get data from Redis, a serious situation can snowball into an avalanche. If all threads fall back to fetching the data from the database at the same moment, the database may go down too.

Other dangerous commands

Whenever a command has O(N) time complexity, be careful not to use it casually in production. HGETALL, LRANGE, SMEMBERS, ZRANGE, SINTER, and similar commands are not forbidden, but they are all O(N). When you use them, make sure N is bounded and known in advance; otherwise they can bring the cache down.
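One way to keep N bounded for LRANGE is to page through a list in fixed-size windows instead of requesting everything with LRANGE key 0 -1. The hypothetical helper below only computes the (start, stop) arguments. Each pair would be sent as LRANGE key start stop, with the list length obtained beforehand (for example with LLEN). LRANGE's stop index is inclusive, hence the - 1. If the list changes between calls, the windows can skip or repeat elements.

```python
from typing import Iterator, Tuple


def lrange_windows(length: int, page_size: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) pairs that cover a list of `length` elements.

    LRANGE's stop index is inclusive, so each window spans at most
    page_size elements.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for start in range(0, length, page_size):
        yield start, min(start + page_size, length) - 1
```

For a list of 250 elements with page_size=100, this yields (0, 99), (100, 199), and (200, 249). That is three bounded calls instead of one unbounded call.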

1. FLUSHDB clears all keys in the current database.

2. FLUSHALL clears the data of the entire Redis server (it deletes all keys in all databases).

3. CONFIG lets any connected client change the server configuration at runtime.

How to disable dangerous commands

In redis.conf, add the following lines to the SECURITY section to disable the specified commands:

rename-command FLUSHALL ""
rename-command FLUSHDB  ""
rename-command CONFIG   ""
rename-command KEYS ""

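In addition, when disabling FLUSHALL you may need to set appendonly no in the configuration file. Otherwise the server can fail to start, for example when the existing AOF file contains a FLUSHALL entry that can no longer be replayed.

If you want to keep a command available but make it hard to invoke, rename it to a hard-to-guess name instead: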

rename-command CONFIG b840fc02d524045429941cc15f59e41cb7be6c52
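The replacement name above is a 40-character hexadecimal string. A sketch for generating such a name with Python's secrets module follows. The helper name is hypothetical. The random name must then be kept as secret as a password, since anyone who knows it can still run the command.

```python
import secrets


def obscure_rename_line(command: str, nbytes: int = 20) -> str:
    """Return a redis.conf rename-command line with a random replacement name.

    nbytes=20 yields 40 hex characters, the same length as the example above.
    """
    new_name = secrets.token_hex(nbytes)   # cryptographically random hex string
    return f"rename-command {command.upper()} {new_name}"


if __name__ == "__main__":
    print(obscure_rename_line("config"))
```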

TIP: Renaming commands that are logged to the AOF file or propagated to replicas (slave servers) may cause problems.

Suggestions for improvement

1. If you have this kind of requirement, you can index the keys yourself. For example, store the keys of each category in a separate set and look them up by category, so that the data can be retrieved quickly. The obvious drawback is that the index consumes precious memory, so weigh the trade-off carefully. There are also cheaper options: for keys that follow a regular naming scheme, for instance, you can store only the first and last values of the range.

2. Use the SCAN family of commands as an improved replacement for KEYS and SMEMBERS.
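The first suggestion, indexing keys by category in sets, can be modelled in a few lines. The sketch below is an in-memory stand-in, not Redis code. The class name and the idea of one index set per category are assumptions for illustration. Against a real server, each write would be paired with an SADD to an index set and each delete with an SREM. Listing a category would read that set, using SSCAN when it is large. MULTI/EXEC can group a value write and its index update.

```python
class IndexedStore:
    """In-memory model of 'values plus one index set per category'.

    Hypothetical: against Redis, set() would be SET plus SADD to an index
    set, delete() would be DEL plus SREM, and keys_in() would read the index
    set (SMEMBERS, or SSCAN when it is large).
    """

    def __init__(self) -> None:
        self.values = {}    # key -> value
        self.indexes = {}   # category -> set of keys (the extra memory cost)

    def set(self, category: str, key: str, value: str) -> None:
        self.values[key] = value
        self.indexes.setdefault(category, set()).add(key)   # keep index in sync

    def delete(self, category: str, key: str) -> None:
        self.values.pop(key, None)
        self.indexes.get(category, set()).discard(key)

    def keys_in(self, category: str) -> set:
        """Return the keys of one category without scanning every key."""
        return set(self.indexes.get(category, set()))
```

Every key name is stored twice, once as the key and once in its index set. This is the memory cost the suggestion warns about.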

SCAN command

Redis has supported the SCAN command since version 2.8. SCAN walks the keyspace in batches. A complete traversal takes longer in total, but it does not freeze the Redis service the way KEYS can. Its basic usage is as follows:

Command format: SCAN cursor [MATCH pattern] [COUNT count]

SCAN takes three parameters: the cursor, an optional glob-style pattern to match, and an optional COUNT that hints how many elements to examine in a single call.

The SCAN command and the related SSCAN, HSCAN, and ZSCAN commands incrementally iterate over collections of elements:

  • SCAN iterates over the keys in the current database.

  • SSCAN iterates over the members of a set key.

  • HSCAN iterates over the field-value pairs of a hash key.

  • ZSCAN iterates over the elements of a sorted set key (both members and scores).

All four commands support incremental iteration and return only a small number of elements per call. They can therefore be used in production without the problems of KEYS and SMEMBERS, which may block the server for several seconds when run against a large database or a large set.

However, incremental iteration has drawbacks too. SMEMBERS returns every element the set contains at that moment. With an incremental command such as SCAN, the collection may be modified while the iteration is in progress, so incremental commands can only offer limited guarantees about the returned elements.

Because SCAN, SSCAN, HSCAN, and ZSCAN work very similarly, they are described together here, but remember:

  • The first argument of SSCAN, HSCAN, and ZSCAN is always the name of the key being iterated.

  • SCAN does not take a key as its first argument, because it iterates over all keys in the current database.

Basic usage of SCAN command

SCAN is a cursor-based iterator. Every call returns a new cursor, which the caller passes as the cursor argument of the next call to continue the iteration.

When the cursor parameter of the SCAN command is set to 0, the server will start a new iteration, and when the server returns a cursor with a value of 0 to the user, it means that the iteration has ended.

The following is an example of the iterative process of the SCAN command:

redis 127.0.0.1:6379> scan 0
1) "17"
2)  1) "key:12"
    2) "key:8"
    3) "key:4"
    4) "key:14"
    5) "key:16"
    6) "key:17"
    7) "key:15"
    8) "key:10"
    9) "key:3"
    10) "key:7"
    11) "key:1"

redis 127.0.0.1:6379> scan 17
1) "0"
2) 1) "key:5"
   2) "key:18"
   3) "key:0"
   4) "key:2"
   5) "key:19"
   6) "key:13"
   7) "key:6"
   8) "key:9"
   9) "key:11"

In the above example, the first iteration uses 0 as the cursor, which means to start a new iteration.

The second call uses the cursor returned by the first call: the first element of the first reply, 17.

As the example shows, SCAN replies with a two-element array. The first element is the new cursor for the next call. The second element is an array containing the elements returned in this call.

In the second call of the SCAN command, the command returned cursor 0, which means that the iteration has ended and the entire collection has been completely traversed.

Starting with cursor 0 and calling SCAN repeatedly until it returns cursor 0 is called a full iteration.
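The full-iteration loop can be written generically. In this sketch, the scan argument is a hypothetical callable standing in for whatever client call issues SCAN (or SSCAN/HSCAN/ZSCAN). It takes a cursor string and returns the new cursor and one page of elements. The loop stops only on a returned cursor of "0", so an empty page does not end it.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical stand-in for a client call: cursor in, (next cursor, page) out.
ScanFn = Callable[[str], Tuple[str, List[str]]]


def full_iteration(scan: ScanFn) -> Iterator[str]:
    """Yield every element of one full SCAN-style iteration."""
    cursor = "0"                      # cursor 0 starts a new iteration
    while True:
        cursor, page = scan(cursor)   # always continue from the returned cursor
        yield from page               # a page may be empty; keep going anyway
        if cursor == "0":             # a returned cursor of 0 ends the iteration
            return
```

Replaying the two replies from the session above (cursor 17, then cursor 0) through this loop yields all 20 keys, key:0 through key:19.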

Guarantees of the SCAN command

For a full iteration, SCAN and the other incremental commands guarantee the following: every element that is present in the data set from the start to the end of the full iteration is returned at some point. In other words, if an element exists in the data set for the whole duration of the iteration, SCAN is guaranteed to return it to the user in some call.

However, because incremental commands record the iteration state only in the cursor, they have the following drawbacks:

  • The same element may be returned multiple times. Handling duplicates is up to the application. For example, you can use the returned elements only for operations that are safe to repeat.

  • If an element is added to or removed from the data set during the iteration, it may or may not be returned; this is undefined.
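For the duplicate problem, one simple approach is to remember what has already been seen. The sketch below is hypothetical. It takes pages of elements (for instance, the pages produced by successive SCAN calls) and yields each element once. The trade-off is memory: the seen set grows to the size of the data set. That is why making the per-element operation idempotent is often the cheaper choice for very large iterations.

```python
from typing import Iterable, Iterator, List


def unique_elements(pages: Iterable[List[str]]) -> Iterator[str]:
    """Yield each element once, even if several pages repeat it."""
    seen = set()
    for page in pages:
        for element in page:
            if element not in seen:    # drop elements already returned earlier
                seen.add(element)
                yield element
```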

The number of elements returned by the SCAN command per execution

Incremental iteration commands do not guarantee that each execution will return a given number of elements.

Incremental commands may even return zero elements. As long as the returned cursor is not 0, the application must not consider the iteration finished.

However, the number of elements returned by the command always conforms to certain rules. In practice:

  • For a large data set, an incremental command usually returns at most a few dozen elements per call.

  • For a data set small enough to use an encoded representation (small sets, small hashes, and small sorted sets), an incremental command returns all of its elements in a single call.

Finally, the caller can tune how many elements are returned per call with the COUNT option.

COUNT option

Although the incremental iteration command does not guarantee the number of elements returned per iteration, we can use the COUNT option to adjust the behavior of the command to a certain extent.

Basically, COUNT lets the caller tell the command how much work to do per call when retrieving elements from the data set.

Although COUNT is only a hint, it is effective in most cases.

  • The default value of the COUNT parameter is 10.

  • When iterating a database, or a large enough set, hash, or sorted set represented as a hash table, without the MATCH option, the command usually returns COUNT elements per call or slightly more.

  • When iterating a set encoded as an intset (a small set containing only integers), or a hash or sorted set encoded as a ziplist (small hashes and sorted sets made of small individual values), the command usually ignores COUNT and returns all elements in the first call.

You don't need the same COUNT value for every call

The caller can change COUNT from one call to the next as needed, as long as the cursor returned by the previous call is passed to the next one.
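Because the only state carried between calls is the cursor, COUNT can differ on every call. The sketch below uses a toy, in-memory scan whose cursor is a plain list offset. Real Redis cursors encode a hash-table position instead, so only the call-and-continue contract is modelled here. The count schedule (start small, then grow) is an arbitrary example.

```python
from typing import List, Sequence, Tuple


def toy_scan(data: List[str], cursor: int, count: int = 10) -> Tuple[int, List[str]]:
    """Toy SCAN over a list: cursor is a plain offset, 0 in the reply means done."""
    if count < 1:
        raise ValueError("count must be at least 1")
    end = min(cursor + count, len(data))
    return (0 if end >= len(data) else end), data[cursor:end]


def scan_with_schedule(data: List[str], counts: Sequence[int]) -> Tuple[List[str], int]:
    """Run one full iteration, using counts[i] on the i-th call.

    The last count is reused once the schedule runs out. Returns the
    collected elements and the number of calls made.
    """
    cursor, calls, out = 0, 0, []
    while True:
        count = counts[min(calls, len(counts) - 1)]
        cursor, page = toy_scan(data, cursor, count)   # only the cursor carries over
        out.extend(page)
        calls += 1
        if cursor == 0:
            return out, calls
```

With 25 elements and the schedule [2, 5, 100], the iteration completes in three calls, and the third call picks up exactly where the second left off.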

MATCH option

Like KEYS, the incremental commands accept a glob-style pattern, so that only elements matching the pattern are returned. Pass the pattern with the MATCH option.

The following is an example of using the MATCH option for iteration:

redis 127.0.0.1:6379> sadd myset 1 2 3 foo foobar feelsgood
(integer) 6

redis 127.0.0.1:6379> sscan myset 0 match f*
1) "0"
2) 1) "foo"
   2) "feelsgood"
   3) "foobar"

Note that the pattern is applied after elements are retrieved from the data set and just before they are returned to the client. Therefore, if only a few elements in the data set match the pattern, the command may return no elements for several calls in a row.

The following is an example of this situation:

redis 127.0.0.1:6379> scan 0 MATCH *11*
1) "288"
2) 1) "key:911"

redis 127.0.0.1:6379> scan 288 MATCH *11*
1) "224"
2) (empty list or set)

redis 127.0.0.1:6379> scan 224 MATCH *11*
1) "80"
2) (empty list or set)

redis 127.0.0.1:6379> scan 80 MATCH *11*
1) "176"
2) (empty list or set)

redis 127.0.0.1:6379> scan 176 MATCH *11* COUNT 1000
1) "0"
2)  1) "key:611"
    2) "key:711"
    3) "key:118"
    4) "key:117"
    5) "key:311"
    6) "key:112"
    7) "key:111"
    8) "key:110"
    9) "key:113"
   10) "key:211"
   11) "key:411"
   12) "key:115"
   13) "key:116"
   14) "key:114"
   15) "key:119"
   16) "key:811"
   17) "key:511"
   18) "key:11"

As you can see, most of the above iterations do not return any elements.

In the last call, COUNT is set to 1000 to force the command to do more scanning work in that call, so it returns more elements.
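The empty replies above follow from MATCH being applied after elements are fetched. This toy model makes that visible. Each call examines up to count elements and then filters them, so a call can do real work and still return nothing. It uses Python's fnmatchcase as a stand-in for Redis glob matching (the two agree for a pattern like *11*, but not for every pattern). As before, the cursor is a plain offset, not Redis's real cursor.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def toy_scan_match(data: List[str], cursor: int, pattern: str,
                   count: int = 10) -> Tuple[int, List[str]]:
    """Toy SCAN with MATCH: fetch up to `count` elements, then filter.

    Filtering happens after fetching, so a call can examine `count`
    elements and still return an empty page with a non-zero cursor.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    end = min(cursor + count, len(data))
    matched = [k for k in data[cursor:end] if fnmatchcase(k, pattern)]
    return (0 if end >= len(data) else end), matched


def run(data: List[str], pattern: str, count: int) -> List[List[str]]:
    """Collect the page returned by every call of one full iteration."""
    pages, cursor = [], 0
    while True:
        cursor, page = toy_scan_match(data, cursor, pattern, count)
        pages.append(page)
        if cursor == 0:
            return pages
```

Over 100 keys named key:0 to key:99 with pattern *11* and count 10, nine of the ten calls return an empty page. With count 1000, one call returns ["key:11"].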

Perform multiple iterations concurrently

Any number of clients can iterate over the same data set at the same time. Each call takes a cursor and returns a new one. That cursor carries the entire state of the iteration, so the server does not need to keep any state for it.

Stop iterating midway

Because all states of the iteration are stored in the cursor, and the server does not need to save any state for the iteration, the client can stop an iteration in the middle without any notification to the server.

Abandoning any number of iterations halfway causes no problems.
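Since the server keeps no iteration state, a client is free to interleave iterations, park a cursor, or simply drop it. The toy scan function below is a pure function of its arguments, an assumption that mirrors the stateless design, with a plain offset as the cursor. Two iterations over the same data therefore never interfere, and abandoning one needs no cleanup.

```python
from typing import List, Tuple


def toy_scan(data: List[str], cursor: int, count: int = 3) -> Tuple[int, List[str]]:
    """Toy stateless SCAN: the result depends only on the arguments.

    The cursor is a plain offset (an assumption); 0 in the reply means done.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    end = min(cursor + count, len(data))
    return (0 if end >= len(data) else end), data[cursor:end]


data = [f"key:{i}" for i in range(7)]

# Two independent iterations, each tracked only by its own cursor.
cursor_a, seen_a = 0, []
cursor_b, seen_b = 0, []

cursor_a, page = toy_scan(data, cursor_a)   # A: first call
seen_a += page
cursor_b, page = toy_scan(data, cursor_b)   # B: first call
seen_b += page
# B is abandoned here: nothing to tell the server, just forget cursor_b.

while cursor_a != 0:                         # A continues to the end
    cursor_a, page = toy_scan(data, cursor_a)
    seen_a += page
```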

Calling an incremental command with an invalid cursor

Calling an incremental command with a broken, negative, out-of-range, or otherwise invalid cursor will not crash the server, but the command's behavior becomes undefined.

Undefined behavior means that the guarantees the incremental commands make about the returned elements may no longer hold.

Only two types of cursors are legal:

1. When starting a new iteration, the cursor must be 0.

2. When continuing an iteration, the cursor must be the one returned by the previous call.

Termination guarantees

The algorithm used by the incremental commands only guarantees that an iteration terminates if the size of the data set stays bounded. If the data set keeps growing, an incremental command may never complete a full iteration.

Intuitively, the bigger a data set grows, the more work it takes to visit all its elements. Whether an iteration ends therefore depends on whether the client iterates faster than the data set grows.

Time complexity:

    Each call of an incremental command is O(1); a full iteration is O(N), where N is the number of elements in the data set.

Return value:

    SCAN, SSCAN, HSCAN, and ZSCAN all return a two-element multi-bulk reply. The first element is an unsigned 64-bit number (the cursor) encoded as a string. The second is another multi-bulk reply containing the elements returned by this call.

    Each element returned by SCAN is a database key.

    Each element returned by SSCAN is a set member.

    Each element returned by HSCAN is a field-value pair, made of a hash field and its value.

    Each element returned by ZSCAN is a sorted set element, made of a member and its score.
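Client libraries usually decode these replies for you, but the shape is simple enough to handle by hand. The functions below are a hypothetical sketch. The cursor arrives as a decimal string and must fit in an unsigned 64-bit integer. HSCAN and ZSCAN replies arrive as a flat list that alternates field and value (HSCAN) or member and score (ZSCAN), so the list has to be paired up.

```python
from typing import List, Sequence, Tuple


def parse_scan_reply(reply: Sequence) -> Tuple[int, List[str]]:
    """Split a two-element SCAN/SSCAN/HSCAN/ZSCAN reply into (cursor, elements)."""
    raw_cursor, elements = reply
    cursor = int(raw_cursor)            # the cursor arrives as a decimal string
    if not 0 <= cursor < 2 ** 64:
        raise ValueError(f"cursor is not an unsigned 64-bit value: {raw_cursor!r}")
    return cursor, list(elements)


def pair_up(elements: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair a flat HSCAN/ZSCAN element list: [a1, b1, a2, b2, ...]."""
    if len(elements) % 2:
        raise ValueError("expected an even number of elements")
    return list(zip(elements[0::2], elements[1::2]))


def zscan_pairs(elements: Sequence[str]) -> List[Tuple[str, float]]:
    """Pair ZSCAN output as (member, score), converting scores to float."""
    return [(member, float(score)) for member, score in pair_up(elements)]
```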

For more details, see the documentation of the Redis SCAN command:

http://doc.redisfans.com/key/scan.html


References:

https://redis.io

https://www.dazhuanlan.com/2019/12/17/5df832fd189f6


Origin blog.csdn.net/JineD/article/details/111282163