Redis Data Structures: Set

1. Set data type

1.1. Introduction to Set type

A Redis Set is an unordered collection of unique string elements; elements are not kept in insertion order. Sets in Redis are implemented with hash tables (small all-integer sets use a more compact integer set, see section 2), so adding, deleting, and looking up an element in a hash table takes O(1) time on average. Compared with lists, sets have two distinguishing characteristics: they are unordered and they contain no duplicates.

A set can store at most 2^32-1 elements. The concept closely matches the mathematical notion of a set: a collection of distinct objects, concrete or abstract, that share some property.

In short, a Redis set is a collection of unique values. Redis also supports the basic set operations of intersection, union, and difference through a few simple commands.

1.2. Set application scenarios

Common application scenarios include voting systems, tagging systems, mutual friends, mutual follows, shared interests, lotteries, product filtering, and counting distinct visitor IP addresses.

Typical uses:

  • Likes, dislikes, favorites: a Set guarantees that each user can like a given item only once;
  • Mutual follows, tags: Sets support intersection, so they can be used to compute the friends, official accounts, and so on that two users both follow;
  • Sweepstakes: store the usernames of the winners of an activity. Because a Set deduplicates its elements, the same user cannot win the prize twice.

2. Set underlying structure

2.1. Introduction to the underlying structure of Set

Redis stores a Set using one of two encodings: the integer set IntSet or the hash table HashTable. IntSet is used only while both of the following conditions hold; as soon as either one is violated, the set is converted to HashTable:

  • All elements held by the set object are integer values;
  • The number of elements held by the set object does not exceed 512.
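
The two conditions above can be sketched as a small check. This is only an illustration, not Redis code: the names can_use_intset, is_integer_string, and MAX_INTSET_ENTRIES are made up for this example, and the integer test is based on strtoll, which is more permissive than the check Redis itself performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_INTSET_ENTRIES 512  /* hypothetical name for the limit described above */

/* Returns true if s parses completely as a 64-bit signed integer.
 * Simplified assumption: strtoll also accepts leading whitespace and '+'. */
static bool is_integer_string(const char *s) {
    if (s == NULL || *s == '\0') return false;
    char *end = NULL;
    errno = 0;
    (void)strtoll(s, &end, 10);
    return errno == 0 && *end == '\0';
}

/* True when every element is an integer and the element count is within the limit. */
bool can_use_intset(const char *const *elems, size_t n) {
    assert(elems != NULL || n == 0);
    if (n > MAX_INTSET_ENTRIES) return false;
    for (size_t i = 0; i < n; i++) {
        if (!is_integer_string(elems[i])) return false;
    }
    return true;
}
```

With this sketch, the elements "1", "3", "5" qualify for IntSet, while "1", "apple" do not, because one element is not an integer.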

Taking the SADD command as an example, the whole adding process is as follows:

  • Look up the set; if it does not exist, create a new set object.
  • Add the incoming elements one by one; each element is first passed through tryObjectEncoding, which tries to encode it more compactly to save memory.
  • While adding, setTypeAdd decides whether the encoding of the set has to be converted.

void saddCommand(redisClient *c) {
    robj *set;
    int j, added = 0;

    // Look up the set object
    set = lookupKeyWrite(c->db,c->argv[1]);

    // The object does not exist: create a new one and add it to the database
    if (set == NULL) {
        set = setTypeCreate(c->argv[2]);
        dbAdd(c->db,c->argv[1],set);

    // The object exists: check its type
    } else {
        if (set->type != REDIS_SET) {
            addReply(c,shared.wrongtypeerr);
            return;
        }
    }

    // Add all input elements to the set
    for (j = 2; j < c->argc; j++) {
        c->argv[j] = tryObjectEncoding(c->argv[j]);
        // An add counts as successful only if the element was not already in the set
        if (setTypeAdd(set,c->argv[j])) added++;
    }

    // If at least one element was added, run the following
    if (added) {
        // Signal that the key was modified
        signalModifiedKey(c->db,c->argv[1]);
        // Send the keyspace event notification
        notifyKeyspaceEvent(REDIS_NOTIFY_SET,"sadd",c->argv[1],c->db->id);
    }

    // Mark the database as dirty
    server.dirty += added;

    // Reply with the number of elements added
    addReplyLongLong(c,added);
}

Looking more closely at how a single element is added: if the set is already HashTable-encoded, the element is added as an ordinary HashTable entry. If the set is IntSet-encoded, the following checks are made:

  • If the element can be represented as an integer (isObjectRepresentableAsLongLong), it is added to the IntSet.
  • If the IntSet then holds more than 512 elements (REDIS_SET_MAX_INTSET_ENTRIES), the set is converted to the HashTable encoding.
  • In all other cases, the set is converted to HashTable and the element is stored there.
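
The decision described in this list can be modeled as a tiny function. It is a sketch under simplifying assumptions, not the setTypeAdd implementation: the type set_encoding, the function encoding_after_add, and its parameters are hypothetical, and the element is assumed to be new (not a duplicate).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INTSET_ENTRIES 512  /* stands in for REDIS_SET_MAX_INTSET_ENTRIES */

typedef enum { ENC_INTSET, ENC_HASHTABLE } set_encoding;

/* Encoding of the set after one new element has been added.
 * len_before is the number of elements before the add. */
set_encoding encoding_after_add(set_encoding current, size_t len_before,
                                bool value_is_integer) {
    /* A HashTable-encoded set simply receives a normal HashTable add. */
    if (current == ENC_HASHTABLE) return ENC_HASHTABLE;

    /* Invariant of this model: an IntSet never exceeds the limit. */
    assert(len_before <= MAX_INTSET_ENTRIES);

    /* A non-integer element forces the conversion. */
    if (!value_is_integer) return ENC_HASHTABLE;

    /* The integer is added first, then the length is checked. */
    if (len_before + 1 > MAX_INTSET_ENTRIES) return ENC_HASHTABLE;
    return ENC_INTSET;
}
```

For example, adding an integer to an IntSet with 511 elements keeps the IntSet encoding, while adding one to an IntSet with 512 elements yields HashTable.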

2.2. Integer set IntSet

The integer set IntSet is the data structure Redis uses to store a set of integer values. It holds integers only and guarantees that no duplicate elements appear. When a set contains only integer elements and their number is small, Redis therefore chooses the integer set as the underlying implementation.

Internally, an IntSet is an array (the int8_t contents[] array) whose data is kept in sorted order, so that an element can be located with binary search.
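
To illustrate why the sorted order matters, here is a minimal binary search over a sorted array without duplicates. It is a sketch, not the Redis implementation: the function name sorted_search is hypothetical and the elements are plain int64_t values rather than the variable-width encoding an IntSet uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Binary search over a sorted array without duplicates.
 * Returns true if value is found. *pos receives the index of the value, or
 * the index at which it would have to be inserted to keep the array sorted. */
bool sorted_search(const int64_t *a, size_t len, int64_t value, size_t *pos) {
    assert(pos != NULL);
    size_t lo = 0, hi = len;  /* the search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < value) {
            lo = mid + 1;
        } else if (a[mid] > value) {
            hi = mid;
        } else {
            *pos = mid;
            return true;
        }
    }
    *pos = lo;
    return false;
}
```

Returning the insertion position on a miss is a design choice that lets the same search serve both membership tests and ordered, duplicate-free inserts.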

The integer set is defined by the following structure:

typedef struct IntSet {
    // Encoding of the elements
    uint32_t encoding;
    // Number of elements in the set
    uint32_t length;
    // Array holding the elements
    int8_t contents[];
} IntSet;

Let's break it down:

  • encoding: the encoding of the elements, which determines whether contents holds int16_t, int32_t, or int64_t values;
  • length: the number of elements in the array, that is, the overall length of the array;
  • contents[]: the integers themselves, one per array item, arranged in ascending order of value and containing no duplicates.

contents is the underlying storage of the integer set. It holds every element of the set, sorted from smallest to largest and without repetition; both properties are maintained each time an element is inserted. Although the contents array is declared as int8_t, the actual element type depends on the value of encoding, so any operation on an integer set reads encoding first.

For example, after executing SADD numbers 1 3 5, the set object is stored as an integer set: encoding is INTSET_ENC_INT16 (all three values fit in 16 bits), length is 3, and contents holds 1, 3, 5 in ascending order.
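
The link between encoding and contents can be sketched as follows. This is an illustrative approximation rather than Redis source: the helper names value_encoding and get_encoded are hypothetical, and values are read in the host byte order for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each encoding is expressed as the element width in bytes. */
#define INTSET_ENC_INT16 ((uint32_t)sizeof(int16_t))
#define INTSET_ENC_INT32 ((uint32_t)sizeof(int32_t))
#define INTSET_ENC_INT64 ((uint32_t)sizeof(int64_t))

/* Smallest encoding that can hold v. */
uint32_t value_encoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return INTSET_ENC_INT64;
    if (v < INT16_MIN || v > INT16_MAX) return INTSET_ENC_INT32;
    return INTSET_ENC_INT16;
}

/* Read element idx from a raw byte buffer laid out with the given encoding.
 * memcpy avoids alignment problems when reinterpreting the int8_t bytes. */
int64_t get_encoded(const int8_t *contents, uint32_t encoding, size_t idx) {
    assert(encoding == INTSET_ENC_INT16 || encoding == INTSET_ENC_INT32 ||
           encoding == INTSET_ENC_INT64);
    if (encoding == INTSET_ENC_INT64) {
        int64_t v;
        memcpy(&v, contents + idx * sizeof v, sizeof v);
        return v;
    }
    if (encoding == INTSET_ENC_INT32) {
        int32_t v;
        memcpy(&v, contents + idx * sizeof v, sizeof v);
        return v;
    }
    int16_t v;
    memcpy(&v, contents + idx * sizeof v, sizeof v);
    return v;
}
```

In this model the values 1, 3, and 5 all map to the 16-bit encoding, so each element occupies two bytes of contents.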

2.3. Hash table HashTable

Each key-value pair in Redis is held in a dictEntry object, and the hash table is built by wrapping an array of dictEntry pointers. This is the hash table structure dictht:

typedef struct dictht {
    dictEntry **table;       // hash table array
    unsigned long size;      // size of the hash table
    unsigned long sizemask;  // mask used to compute the index; always equal to size-1
    unsigned long used;      // number of nodes currently in the hash table
} dictht;

PS: table is an array, each element of which points to a dictEntry object.

A hashtable-encoded set object uses a dictionary as its underlying implementation. Each key of the dictionary is a string object that holds one set element, and every value of the dictionary is NULL. For example, after executing SADD fruits "apple" "banana" "cherry", the dictionary contains the three keys "apple", "banana", and "cherry", each mapped to NULL.
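
The idea of a dictionary whose values are all NULL can be sketched with a small chained hash table. This is a simplified stand-in, not the Redis dict: the names entry, hset, hset_init, hset_add, hset_contains, and hset_free are hypothetical, the hash function is djb2 rather than the one Redis uses, and there is no resizing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct entry {
    char *key;           /* the set element */
    void *val;           /* always NULL when the table backs a set */
    struct entry *next;  /* next entry in the same bucket */
} entry;

typedef struct {
    entry **table;
    unsigned long size;      /* number of buckets, a power of two */
    unsigned long sizemask;  /* always size - 1 */
    unsigned long used;      /* number of stored entries */
} hset;

static unsigned long hash_str(const char *s) {
    unsigned long h = 5381;  /* djb2, used here only for illustration */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

bool hset_init(hset *h, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0);  /* must be a power of two */
    h->table = calloc(size, sizeof *h->table);
    h->size = size;
    h->sizemask = size - 1;
    h->used = 0;
    return h->table != NULL;
}

bool hset_contains(const hset *h, const char *key) {
    /* hash & sizemask selects the bucket, as the sizemask field suggests */
    for (entry *e = h->table[hash_str(key) & h->sizemask]; e; e = e->next)
        if (strcmp(e->key, key) == 0) return true;
    return false;
}

/* Returns true if key was added, false if it already existed or allocation failed. */
bool hset_add(hset *h, const char *key) {
    if (hset_contains(h, key)) return false;
    entry *e = malloc(sizeof *e);
    if (e == NULL) return false;
    size_t n = strlen(key) + 1;
    e->key = malloc(n);
    if (e->key == NULL) { free(e); return false; }
    memcpy(e->key, key, n);
    e->val = NULL;
    unsigned long idx = hash_str(key) & h->sizemask;
    e->next = h->table[idx];  /* insert at the head of the bucket chain */
    h->table[idx] = e;
    h->used++;
    return true;
}

void hset_free(hset *h) {
    for (unsigned long i = 0; i < h->size; i++) {
        entry *e = h->table[i];
        while (e) { entry *next = e->next; free(e->key); free(e); e = next; }
    }
    free(h->table);
    h->table = NULL;
    h->size = h->sizemask = h->used = 0;
}
```

Adding "apple", "banana", and "cherry" to such a table leaves used at 3, and adding "apple" a second time is rejected, which mirrors the uniqueness guarantee of a set.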


3. Set common commands

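3.1. Add elements to a set

Use the SADD command to add one or more elements to a set:

SADD set value [value ...]

SADD returns the number of elements that were actually added. A value that already exists in the set is not added again, so adding only an existing value returns 0.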

3.2. View all elements of a set

Use the SMEMBERS command to list all elements of a set:

SMEMBERS set

3.3. Determine whether a value is in a set

Use the SISMEMBER command to determine whether a value is in a set:

SISMEMBER set value

It returns 1 if the value is a member of the set and 0 otherwise.

3.4. View the number of elements in a set

Use the SCARD command to get the number of elements stored in a set:

SCARD set

3.5. Delete a specified element from a set

Use the SREM command to remove the element with the specified value from a set:

SREM set value

3.6. Randomly select an element of a set

Use the SRANDMEMBER command to return a random element of a set without removing it:

SRANDMEMBER set

3.7. Randomly delete an element of a set

Use the SPOP command to remove a random element from a set and return it:

SPOP set

3.8. Move a value from one set to another

Use the SMOVE command to move a value from the source set to the target set:

SMOVE source target value

3.9. Set operation: difference

Use the SDIFF command to compute the difference, that is, the elements of set1 that are not in set2:

SDIFF set1 set2

3.10. Set operation: intersection

Use the SINTER command to compute the intersection, that is, the elements present in both sets:

SINTER set1 set2

3.11. Set operation: union

Use the SUNION command to compute the union, that is, the elements present in at least one of the sets:

SUNION set1 set2
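
The semantics of the three set operations can be illustrated with a short sketch over two sorted integer arrays. This only demonstrates what SDIFF, SINTER, and SUNION compute; it is not how Redis implements them, and the names set_op and set_combine are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_DIFF, OP_INTER, OP_UNION } set_op;

/* a and b must be sorted ascending without duplicates.
 * out needs room for na + nb values. Returns the number of values written. */
size_t set_combine(set_op op, const int *a, size_t na,
                   const int *b, size_t nb, int *out) {
    assert(out != NULL);
    size_t i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {            /* element only in a */
            if (op != OP_INTER) out[n++] = a[i];
            i++;
        } else if (a[i] > b[j]) {     /* element only in b */
            if (op == OP_UNION) out[n++] = b[j];
            j++;
        } else {                      /* element in both sets */
            if (op != OP_DIFF) out[n++] = a[i];
            i++;
            j++;
        }
    }
    if (op != OP_INTER) while (i < na) out[n++] = a[i++];  /* rest of a */
    if (op == OP_UNION) while (j < nb) out[n++] = b[j++];  /* rest of b */
    return n;
}
```

With set1 = {1, 2, 3, 4} and set2 = {3, 4, 5}, the difference is {1, 2}, the intersection is {3, 4}, and the union is {1, 2, 3, 4, 5}.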

Origin blog.csdn.net/weixin_45187434/article/details/132463207