The SCAN and KEYS commands in Redis

Background

  • We run a user center in which millions of users are stored in Redis, with user_id plus an ID-card number as the key. A requirement came up to query all keys matching the prefix user_. When that command was executed, the service froze and some Redis connections timed out. The investigation traced the problem to KEYS matching far too many keys; the root cause lies in how the command works internally.
  • The final solution: use the SCAN command.

Keys

Introduction

  1. KEYS supports simple glob-style pattern matching, with no paging and no cursor; it is a brute-force traversal of the entire keyspace.
  2. Its advantage is convenience, but it has one serious disadvantage: Redis executes commands on a single thread, so a KEYS query that takes a long time blocks every other client until it finishes, causing timeouts. Its time complexity is O(n).
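To make the brute-force nature concrete, here is a toy simulation (not Redis source — the `store` data and `toy_keys` name are made up for illustration). KEYS behaves like a single pass over every key, matching each one against a glob pattern:

```python
import fnmatch

# A toy in-memory keyspace standing in for Redis (hypothetical data).
store = {"user_1001": "...", "user_1002": "...", "order_7": "..."}

def toy_keys(pattern):
    """Mimic KEYS: one brute-force pass over every key, O(n).
    On a real single-threaded Redis, nothing else runs while this executes."""
    return [k for k in store if fnmatch.fnmatch(k, pattern)]

print(toy_keys("user_*"))  # ['user_1001', 'user_1002']
```

With millions of keys, that single uninterruptible pass is exactly what froze the service in the scenario above.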

Scan

Introduction

  1. SCAN is also O(n) overall, but it iterates step by step with a cursor, so it does not block the server for long stretches.
  2. It supports the same glob-style matching as KEYS, but each call must pass in the cursor returned by the previous call, and COUNT can be used to hint at how many elements to examine per step — a reply may contain fewer entries than the hint (http://doc.redisfans.com/key/scan.html#scan).
  3. Any single call may return zero elements or several. As long as the returned cursor is not 0, the iteration is not finished — an empty reply does not mean the data has run out.
  4. One remaining problem: SCAN may return duplicate keys, so the application must deduplicate them itself, for example by collecting results into a Set or a Map, since those structures reject duplicate entries by nature.
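The client-side loop described above can be sketched with a toy simulation (not a real Redis client — `BUCKETS` and `toy_scan` are invented for illustration). Note the two idioms from the list: keep calling until the cursor comes back as 0, and collect results into a set to deduplicate:

```python
import fnmatch

# Toy keyspace: bucket index -> list of keys (hypothetical data).
BUCKETS = [["user_1001"], [], ["user_1002", "order_7"], ["user_1003"]]

def toy_scan(cursor, match="*", count=1):
    """Mimic SCAN: return (next_cursor, keys) starting at bucket `cursor`.
    A step may return zero keys (empty bucket) or several (collision chain)."""
    keys = []
    scanned = 0
    while cursor < len(BUCKETS) and scanned < count:
        keys.extend(k for k in BUCKETS[cursor] if fnmatch.fnmatch(k, match))
        scanned += 1
        cursor += 1
    next_cursor = 0 if cursor >= len(BUCKETS) else cursor  # 0 means "done"
    return next_cursor, keys

# Client loop: iterate until the cursor is 0, deduplicating via a set.
cursor, result = 0, set()
while True:
    cursor, keys = toy_scan(cursor, match="user_*", count=1)
    result.update(keys)
    if cursor == 0:
        break
print(sorted(result))  # ['user_1001', 'user_1002', 'user_1003']
```

A real client would call `SCAN <cursor> MATCH user_* COUNT <n>` against the server, but the control flow is the same.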

SCAN internal exploration

  1. Redis stores all of its top-level keys in a dict (dictionary), its underlying key-value structure. Internally the dict is similar to Java's HashMap: an array of buckets plus linked lists.

  2. A key's bucket index in that array is computed from its hash value; hash collisions are what make the linked lists necessary.

  3. When we use SCAN, the cursor is essentially an index into this bucket array. Because keys are placed by hash value rather than sequentially, any given bucket may hold zero, one, or several keys. This is why a single incremental SCAN step can return nothing at all or several elements at once, as the figure below illustrates.
    (Figure: the dict bucket array with linked-list chains; original image: http://note.youdao.com/yws/res/20441/B47D5C36549948FE8A15FE8C9854E10C)

  4. If we simply traverse in bucket-index order, what happens when the table is resized? After an expansion every key is rehashed and its bucket index can change — so wouldn't a cursor returned before the resize no longer be accurate?

  5. Redis's solution to resizing is not to traverse the bucket array from index 0 to the end, but to advance the cursor with high-order carry addition (reverse binary iteration). This special traversal order avoids both revisiting and skipping buckets when the dictionary expands or shrinks.

  6. High-order carry addition is chosen because of how a resize redistributes slots, exactly as in Java's HashMap:
    *Java's HashMap grows when loadFactor reaches its threshold: it allocates a new array of twice the size and rehashes all elements into it. Rehashing takes an element's hash value modulo the array length; because the length has changed, the slot each element lands in may change as well. And because the array length is always 2^n (this is why the capacity remains a power of two after expansion), the modulo operation is equivalent to a bitwise AND.

In abstract terms, if a slot's binary index is xxx, then after the table doubles, the elements in that slot are rehashed into slots 0xxx and 1xxx (xxx and xxx+8). If the dictionary grows from 16 to 32 slots, the elements in slot xxxx are rehashed into 0xxxx and 1xxxx (xxxx and xxxx+16).*

a mod 8 = a & (8-1) = a & 7
a mod 16 = a & (16-1) = a & 15
a mod 32 = a & (32-1) = a & 31
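These identities, and the 0xxx/1xxx slot split, are easy to verify (a small sketch — the hash value `h` and the `slot` helper are made up for illustration):

```python
# The power-of-two identity: a mod 2^n == a & (2^n - 1).
for a in range(100):
    assert a % 8 == a & 7
    assert a % 16 == a & 15
    assert a % 32 == a & 31

def slot(h, size):
    """Bucket index for hash h in a table of `size` slots (size = 2^n)."""
    return h & (size - 1)

# When an 8-slot table doubles to 16, a key in slot xxx moves to
# 0xxx or 1xxx (i.e. xxx or xxx+8), depending on one extra hash bit.
h = 0b10101            # a hypothetical hash value (21)
old = slot(h, 8)       # low 3 bits: 101 -> slot 5
new = slot(h, 16)      # low 4 bits: 0101 -> slot 5 (or 1101 -> 13)
assert new in (old, old + 8)
print(old, new)  # 5 5
```

So a resize never scatters a slot's contents arbitrarily: each old slot splits into exactly two predictable new slots.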

The original post includes a figure comparing traversal during dictionary shrinking and expansion (image not preserved here).

This is why Redis advances the SCAN cursor with high-order carry addition.
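High-order carry addition means adding 1 at the *highest* bit and carrying toward the low bits — equivalently, reverse the bits, add 1 normally, and reverse back. A minimal sketch (the function names are mine, not from Redis source, though Redis's dictScan uses the same idea):

```python
def rev_bits(v, nbits):
    """Reverse the lowest `nbits` bits of v."""
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (v & 1)
        v >>= 1
    return r

def next_cursor(v, nbits):
    """High-order carry addition: reverse, add 1, reverse back."""
    return rev_bits(rev_bits(v, nbits) + 1, nbits)

# Traversal order for an 8-slot table (3 bits):
order, v = [], 0
while True:
    order.append(v)
    v = next_cursor(v, 3)
    if v == 0:       # cursor wrapped back to 0: iteration complete
        break
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

The payoff: when the table doubles, a cursor like 110 continues as 0110 and 1110 — exactly the two slots the already-unscanned keys split into — so no bucket is revisited or skipped.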

For reference, this is how Java 8's HashMap computes an element's hash: return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
  1. In addition, a resize requires copying and rehashing entries, and the amount of data in Redis can be large, so doing the rehash in one shot would stall the server. Redis therefore uses progressive rehash: the work is spread out gradually instead of happening all at once. SCAN has to account for this too — while a rehash is in progress, it scans both the old and the new hash tables and merges the results.
  2. Now think back to KEYS: does it need to worry about any of this? No — it always scans the full keyspace in a single pass, so resizing is not its concern.
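The "two tables during a rehash" idea can be sketched as follows (a toy model, not Redis source — `ToyDict` and its bucket contents are invented for illustration):

```python
class ToyDict:
    """Minimal model of progressive rehashing: two tables coexist,
    one bucket is migrated per step, and readers must check BOTH."""

    def __init__(self):
        self.old = {0: ["a"], 1: ["b"], 2: ["c"], 3: ["d"]}  # 4 buckets
        self.new = {i: [] for i in range(8)}                 # resizing to 8
        self.rehash_idx = 0  # next old bucket to migrate

    def rehash_step(self):
        """Migrate a single bucket — called piecemeal, never all at once."""
        if self.rehash_idx in self.old:
            for key in self.old.pop(self.rehash_idx):
                self.new[hash(key) & 7].append(key)
        self.rehash_idx += 1

    def all_keys(self):
        """Mid-rehash, a scan must merge results from both tables."""
        keys = [k for bucket in self.old.values() for k in bucket]
        keys += [k for bucket in self.new.values() for k in bucket]
        return sorted(keys)

d = ToyDict()
d.rehash_step()  # only one bucket has moved so far
assert d.all_keys() == ["a", "b", "c", "d"]  # nothing lost mid-rehash
print(d.all_keys())  # ['a', 'b', 'c', 'd']
```

In real Redis, each ordinary command performs a small amount of this migration work, so the cost of resizing is amortized across many requests.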

Summary

  1. How to use the Redis SCAN and KEYS commands
  2. How SCAN iterates internally
  3. The basic data structure of dict and the general resize process

References

"
Redis Deep Adventure" reids command reference: http://doc.redisfans.com/key/scan.html#scan

Origin: blog.csdn.net/weixin_40413961/article/details/108352131