Use of Redis SCAN

Sometimes you need to find all keys with a specific prefix among the tens of thousands of keys in a Redis instance so you can process the data manually, perhaps to modify their values or to delete the keys. This raises a question: how do you find the keys that match a specific prefix in such a large keyspace?

Redis provides a simple brute-force command, keys, which lists all keys matching a given glob-style pattern.

$ redis-cli keys 'key67*'
  1) "key6764"
  2) "key6738"
  3) "key6774"
  4) "key673"
  5) "key6710"
  6) "key6759"
  7) "key6715"
  8) "key6746"
  9) "key6796"

This command is very easy to use: you just supply a pattern string. But it has two obvious disadvantages.

  • It has no offset or limit parameters: all matching keys are returned at once. If the instance happens to contain millions of matching keys, you will be watching an endless stream of output scroll across the screen.

  • keys is a full-traversal algorithm with O(n) complexity. If the instance holds tens of millions of keys, this command will stall the Redis server, and all other read and write commands will be delayed or even time out with errors, because Redis executes commands sequentially on a single thread; other commands must wait until the current keys command finishes.

For these reasons, it is recommended to disable the keys command in production environments.

To solve this problem, Redis 2.8 added a new command: scan. Compared with keys, scan has the following characteristics:

  • Although its complexity is also O(n), it proceeds in steps through a cursor and does not block the server thread.

  • It provides a count parameter that controls the maximum amount of work per call; count is only a hint for the incremental iteration, and the number of results actually returned can be more or less.

  • Like keys, it supports pattern matching.

  • The server does not need to keep any state for the cursor; the only cursor state is the integer that scan returns to the client.

  • The returned results may contain duplicates, which the client must filter out. This is very important.

  • If the data is modified during the traversal, it is undefined whether the modified data will be visited.

  • A single empty result does not mean the traversal is over; the iteration ends only when the returned cursor is zero.
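As a hedged sketch of the client-side loop these characteristics imply (plain Python, no real Redis connection; fake_scan is a made-up stand-in for the real command): keep passing the returned cursor back in, treat cursor 0 as the only stop signal, and deduplicate on the client.

```python
def fake_scan(cursor, match=None, count=10):
    """Stand-in for Redis SCAN. Simulates two quirks described above:
    a call may return an empty batch with a nonzero cursor, and a key
    may appear more than once across batches."""
    pages = {
        0:     (13912, ["key997", "key9906"]),
        13912: (5292,  []),                      # empty batch, iteration NOT over
        5292:  (0,     ["key997", "key9957"]),   # duplicate "key997"
    }
    return pages[cursor]

def scan_all():
    """Keep calling scan with the returned cursor until it comes back
    as 0; deduplicate with a set."""
    seen = set()
    cursor = 0
    while True:
        cursor, keys = fake_scan(cursor)
        seen.update(keys)
        if cursor == 0:   # cursor 0 is the ONLY end-of-iteration signal
            break
    return seen

print(sorted(scan_all()))  # ['key9906', 'key9957', 'key997']
```

Note that stopping on an empty batch instead of on cursor 0 would have ended this traversal early and missed "key9957".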

Basic use of scan

SCAN cursor [MATCH pattern] [COUNT count]

The initial call is, for example, scan 0. The SCAN command is a cursor-based iterator.

This means that each call must use the cursor returned by the previous call as its cursor argument, in order to continue the iteration.

When the cursor argument is set to 0, the server starts a new iteration; when the server returns a cursor of 0, the iteration has finished. This is the only way to tell that the iteration has ended; you cannot judge it by whether the returned result set is empty.

scan takes three parameters: the first is the cursor integer, the second is the key match pattern, and the third is the count hint for the traversal.

On the first call, the cursor is 0; then use the first integer in the returned result as the cursor for the next call, and keep going until the returned cursor is 0.
$ redis-cli scan 0 match 'key99*' count 1000
1) "13912"
2)  1) "key997"
    2) "key9906"
    3) "key9957"
    4) "key9902"
    5) "key9971"
    6) "key9935"
    7) "key9958"
    8) "key9928"
    9) "key9931"
   10) "key9961"
   11) "key9948"
   12) "key9965"
   13) "key9937"
$ redis-cli scan 13912 match 'key99*' count 1000
1) "5292"
2)  1) "key996"
    2) "key9960"
    3) "key9973"
    4) "key9978"
    5) "key9927"
    6) "key995"
    7) "key9992"
    8) "key9993"
    9) "key9964"
   10) "key9934"

The returned result has two parts: part 1) is the cursor for the next iteration, and part 2) is the result set of this call.

From the output above, you can see that although count was set to 1000, only about a dozen results were returned.

This is because count does not limit the number of returned results; it limits (approximately) the number of hash-table slots the server traverses in a single call.

If you set count to 10, you will find that the result is empty but the cursor is nonzero, which means the traversal has not yet ended.

$ redis-cli scan 0 match 'key99*' count 10
1) "15360"
2) (empty list or set)

$ redis-cli scan 15360 match 'key99*' count 10
1) "2304"
2) (empty list or set)
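To see why an empty batch with a nonzero cursor is normal, here is a toy Python model. It is an assumption-laden simplification: a flat list of buckets walked in order, not Redis's real reverse-binary cursor. count bounds the buckets walked per call, so a call can return nothing while the iteration continues.

```python
# Toy model of SCAN's count: it bounds how many hash-table buckets are
# walked per call, not how many keys are returned.
BUCKETS = [[] for _ in range(32)]
for i in range(100):
    BUCKETS[hash(f"key{i}") % 32].append(f"key{i}")

def toy_scan(cursor, match_prefix="", count=10):
    keys = []
    end = min(cursor + count, len(BUCKETS))
    for b in range(cursor, end):          # walk at most `count` buckets
        keys.extend(k for k in BUCKETS[b] if k.startswith(match_prefix))
    next_cursor = 0 if end == len(BUCKETS) else end
    return next_cursor, keys

cursor, found = 0, []
while True:
    cursor, batch = toy_scan(cursor, match_prefix="key9", count=10)
    found.extend(batch)   # batch may be empty even though cursor != 0
    if cursor == 0:
        break
# every matching key ("key9", "key90".."key99") is found exactly once
```

With only a few matching keys spread over 32 buckets, most 10-bucket batches contain no match at all, which mirrors the empty redis-cli results above.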

More scan commands

The scan command is actually a family of commands: besides traversing all keys, its variants can traverse specific container collections.

  • zscan traverses the elements of a zset;

  • hscan traverses the fields of a hash;

  • sscan traverses the members of a set.

Important point:

The first argument to SSCAN, HSCAN, and ZSCAN is always a database key. SCAN, by contrast, takes no key argument, because it iterates over all keys in the current database.
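A toy Python model (hypothetical, operating on an in-memory dict rather than a real server) illustrates the difference in call shape: the container scans take the key of the container as their first argument, and they follow the same cursor contract.

```python
# In-memory stand-in for one hash key in a Redis database.
DATABASE = {
    "website": {"google": "www.google.com",
                "baidu": "www.baidu.com",
                "bing": "www.bing.com"},
}

def toy_hscan(key, cursor, count=2):
    """Toy HSCAN: the first argument names the hash to iterate; SCAN
    has no such argument because it iterates the whole keyspace."""
    items = sorted(DATABASE[key].items())
    batch = dict(items[cursor:cursor + count])
    next_cursor = cursor + count
    if next_cursor >= len(items):
        next_cursor = 0          # 0 signals the end, as with SCAN
    return next_cursor, batch

cursor, fields = 0, {}
while True:
    cursor, batch = toy_hscan("website", cursor)
    fields.update(batch)
    if cursor == 0:
        break
# fields now holds every field of the "website" hash
```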

Large key scanning

Sometimes, due to careless use by business code, large objects accumulate in a Redis instance, such as a huge hash or a huge zset; these appear quite often.

Such objects are a big problem for Redis Cluster data migration: in a cluster environment, if a key is too large, migrating it can cause the migration to stall.

Memory allocation suffers too: when a very large key needs to grow, it requests a large block of memory at once, which can also cause a stall.

And when such a large key is deleted, all of its memory is reclaimed at once, and again the server can freeze.

In normal business development, try to avoid creating large keys.

If you observe that Redis memory usage fluctuates sharply, a large key is the most likely cause. In that case you need to locate the specific key, trace it back to its business source, and then improve the relevant code design.

So how do you locate a big key?

To avoid stalling a production Redis, use the scan command. For each key scanned, run the type command to get its type, then use the size or length command for that data structure to get its size, keeping the top N largest keys per type as the scan result.
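A minimal sketch of that script in Python, using a fake in-memory client instead of real SCAN/TYPE/STRLEN/LLEN/... calls (all names and sample data here are illustrative):

```python
import heapq

# Fake, in-memory stand-in for a Redis keyspace: key -> (type, value).
# A real script would discover these via SCAN and TYPE.
DATA = {
    "counter":   ("string", "123456"),
    "mylist":    ("list",   list(range(1000))),
    "website":   ("hash",   {"google": "www.google.com", "baidu": "www.baidu.com"}),
    "bbs":       ("set",    {"a", "b", "c"}),
    "page_rank": ("zset",   {"google": 10, "baidu": 8}),
}

# Per-type "size" command, as described above: strlen, llen, hlen,
# scard, zcard all reduce to len() on these stand-in values.
SIZERS = {"string": len, "list": len, "hash": len, "set": len, "zset": len}

def top_n_per_type(n=1):
    """Group keys by type and keep the N largest of each type."""
    tops = {}
    for key, (ktype, value) in DATA.items():   # stands in for SCAN + TYPE
        tops.setdefault(ktype, []).append((SIZERS[ktype](value), key))
    return {t: heapq.nlargest(n, pairs) for t, pairs in tops.items()}

print(top_n_per_type())
```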

Scripting this process yourself is cumbersome; fortunately, Redis officially ships such a scan in the redis-cli command, and we can use it directly.

$ redis-cli --bigkeys
# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest string found so far 'key316' with 3 bytes
[00.00%] Biggest string found so far 'key7806' with 4 bytes
[12.79%] Biggest zset   found so far 'salary' with 1 members
[13.19%] Biggest string found so far 'counter:__rand_int__' with 6 bytes
[13.50%] Biggest hash   found so far 'websit' with 2 fields
[14.37%] Biggest set    found so far 'bbs' with 3 members
[14.67%] Biggest hash   found so far 'website' with 3 fields
[30.41%] Biggest list   found so far 'mylist' with 100000 items
[95.53%] Biggest zset   found so far 'page_rank' with 3 members

-------- summary -------

Sampled 10019 keys in the keyspace!
Total key length in bytes is 68990 (avg len 6.89)

Biggest string found 'counter:__rand_int__' has 6 bytes
Biggest   list found 'mylist' has 100000 items
Biggest    set found 'bbs' has 3 members
Biggest   hash found 'website' has 3 fields
Biggest   zset found 'page_rank' has 3 members

10011 strings with 38919 bytes (99.92% of keys, avg size 3.89)
3 lists with 100003 items (00.03% of keys, avg size 33334.33)
1 sets with 3 members (00.01% of keys, avg size 3.00)
2 hashs with 5 fields (00.02% of keys, avg size 2.50)
2 zsets with 4 members (00.02% of keys, avg size 2.00)

If you are worried that this command will sharply increase Redis's ops and trigger alerts in production, you can add a sleep parameter.

$ redis-cli --bigkeys -i 0.1
(The output is identical to the run above.)

The command above sleeps for 0.1 s after every 100 scan commands, so ops will not spike, but the scan takes longer. Note that the "biggest" key reported by bigkeys is not necessarily the one that occupies the most memory.

Before explaining why, consider how bigkeys works. The principle is simple: it traverses with the scan command and measures each key with a type-specific command:

  • If it is a string, its size is judged by strlen;
  • if it is a list, by llen;
  • if it is a hash, by hlen;
  • if it is a set, by scard;
  • if it is a sorted set, by zcard.

Because sizes are judged this way, only for strings does the result reliably identify the key that occupies the most memory.

For lists this is not necessarily true. For example, take two list keys: numberlist = [0, 1, 2] and stringlist = ["123456789123456789"]. Judged by llen, numberlist is "larger" than stringlist, yet stringlist actually occupies more memory. The other three structures (hash, set, and sorted set) have the same problem.

Be sure to keep this in mind when using bigkeys.
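The caveat can be made concrete with two Python lists standing in for the two Redis lists: the llen-style length comparison and the byte-count comparison disagree.

```python
numberlist = ["0", "1", "2"]
stringlist = ["123456789123456789"]

# llen-style comparison: numberlist "wins" with 3 elements vs 1
assert len(numberlist) > len(stringlist)

# actual payload bytes: stringlist is much bigger
bytes_number = sum(len(s.encode()) for s in numberlist)   # 3 bytes
bytes_string = sum(len(s.encode()) for s in stringlist)   # 18 bytes
assert bytes_string > bytes_number
```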

slowlog command

We said above that the keys command should not be used in production. If a developer runs it anyway, how would we know?

Like other storage systems such as MySQL and MongoDB, Redis can show a slow query log, via the slowlog command.

Usage is as follows:

SLOWLOG subcommand [argument]

The subcommands mainly include:

  • get, usage: slowlog get [argument], returns the most recent slow log entries, up to the number given by argument.
  • len, usage: slowlog len, returns the total number of slow log entries.
  • reset, usage: slowlog reset, clears the slow log.
$ redis-cli slowlog get 5
1) 1) (integer) 2
   2) (integer) 1537786953
   3) (integer) 17980
   4) 1) "scan"
      2) "0"
      3) "match"
      4) "key99*"
      5) "count"
      6) "1000"
   5) "127.0.0.1:50129"
   6) ""
2) 1) (integer) 1
   2) (integer) 1537785886
   3) (integer) 39537
   4) 1) "keys"
      2) "*"
   5) "127.0.0.1:49701"
   6) ""
3) 1) (integer) 0
   2) (integer) 1537681701
   3) (integer) 18276
   4) 1) "ZADD"
      2) "page_rank"
      3) "10"
      4) "google.com"
   5) "127.0.0.1:52334"
   6) ""
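Going by the redis-cli output above, each entry holds an id, a unix timestamp, a duration in microseconds, and the command with its arguments, and newer Redis versions append the client address and name. A small hedged Python helper (illustrative, not an official API) to reshape one such entry:

```python
def parse_slowlog_entry(entry):
    """Parse one SLOWLOG GET entry (shape taken from the redis-cli
    output above; the last two fields appear only in newer versions)."""
    ident, ts, micros, args = entry[:4]
    parsed = {
        "id": ident,
        "timestamp": ts,
        "duration_ms": micros / 1000.0,   # slowlog times are microseconds
        "command": " ".join(args),
    }
    if len(entry) >= 6:
        parsed["client"], parsed["client_name"] = entry[4], entry[5]
    return parsed

# second entry from the output above: the offending `keys *` call
entry = [1, 1537785886, 39537, ["keys", "*"], "127.0.0.1:49701", ""]
info = parse_slowlog_entry(entry)
print(info["command"], info["duration_ms"])  # keys * 39.537
```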

A command is recorded in the slowlog when it takes longer than a configurable threshold, which can be set with config set slowlog-log-slower-than 2000 and does not require restarting Redis.

Note: the unit is microseconds; 2000 microseconds is 2 milliseconds.

rename-command

To keep such problems out of the production environment, we can rename dangerous commands in the configuration file, for example high-risk commands such as keys. The operation is very simple: just add lines like the following to the conf file:

rename-command flushdb flushddbb

rename-command flushall flushallall

rename-command keys keysys

Origin blog.csdn.net/T_LOYO/article/details/128905431