Redis from entry to abandonment series (4) Set

The examples in this article are based on Redis 5.0.4.

Set is a commonly used data structure in Redis. When every member is a string that can be parsed as a decimal 64-bit signed integer (and the set holds no more than set-max-intset-entries members, 512 by default), the set is implemented as an intset; otherwise it is implemented as a hashtable.
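To make the "string of integers" condition concrete, here is a minimal sketch of such a check. The name `looks_like_int64` is hypothetical and this is not the Redis function itself; it assumes a strict canonical form (no leading zeros, plus sign, or whitespace), which is my reading of how Redis treats such strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of Redis): returns 1 and stores the parsed
 * value in *out when s is a canonical decimal int64, otherwise returns 0. */
int looks_like_int64(const char *s, int64_t *out) {
    assert(s != NULL);
    size_t n = strlen(s);
    size_t i = 0;
    int negative = 0;
    uint64_t v = 0;

    if (n == 0) return 0;
    if (n == 1 && s[0] == '0') {            /* "0" is the only form starting with 0 */
        if (out) *out = 0;
        return 1;
    }
    if (s[0] == '-') {
        negative = 1;
        i = 1;
        if (n == 1) return 0;               /* a lone "-" is not a number */
    }
    if (s[i] < '1' || s[i] > '9') return 0; /* first digit must be 1..9 */

    for (; i < n; i++) {
        if (s[i] < '0' || s[i] > '9') return 0;
        unsigned d = (unsigned)(s[i] - '0');
        if (v > (UINT64_MAX - d) / 10) return 0;   /* would overflow uint64 */
        v = v * 10 + d;
    }

    /* The negative range reaches one further than the positive range. */
    uint64_t limit = negative ? (uint64_t)INT64_MAX + 1 : (uint64_t)INT64_MAX;
    if (v > limit) return 0;
    if (out) {
        if (negative) *out = (v == limit) ? INT64_MIN : -(int64_t)v;
        else *out = (int64_t)v;
    }
    return 1;
}
```

Under this sketch, members such as "99" or "-7" would qualify for the integer encoding, while "java" or "01" would not.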

Previous article: Redis from entry to abandonment series (3) List

First, let's see how to use the Set type in Redis.

// Add one or more members to the set stored at key
sadd key member [member ...]

Code example:

> sadd books java python c
(integer) 3
// Adding members that already exist returns 0
> sadd books java python c
(integer) 0
----------------------------------
// Return all members of the books set
> smembers books
1) "c"
2) "python"
3) "java"
----------------------------------
// Check whether a member is in the set
> sismember books c
(integer) 1
> sismember books 99
(integer) 0
(integer) 0
----------------------------------
// Intersection of two sets
> sadd new_books java c++ R
(integer) 3
> SINTER books new_books
1) "java"
----------------------------------
// Union of two sets
> SUNION books new_books
1) "java"
2) "python"
3) "c"
4) "c++"
5) "R"
----------------------------------
// Difference of two sets
> SMEMBERS books
1) "c"
2) "python"
3) "java"
> SMEMBERS new_books
1) "R"
2) "c++"
3) "java"
> SDIFF books new_books
1) "python"
2) "c"
> SDIFF new_books books
1) "R"
2) "c++"

That covers the basic usage of the Redis Set type.


Source code analysis

As noted at the beginning of this article, a set is implemented as either an intset or a hashtable. For the hashtable, go back to Redis from entry to abandonment series (2) Hash. This section focuses on intset.

When every member is an integer, Redis stores the set as an intset to save space. A set that stores strings is unordered, but an intset keeps its integers sorted, so lookups can use binary search. Let's take a look at the internal structure of intset first.

typedef struct intset {
    uint32_t encoding;
    uint32_t length;
    int8_t contents[];
} intset;

An intset is described by an encoding (the width in bytes of each element) and a length (the number of elements), so the contents array occupies encoding * length bytes, on top of the two 4-byte header fields. When a value is inserted, Redis first works out the smallest encoding that can hold it:

/* Note that these encodings are ordered, so:
 * INTSET_ENC_INT16 < INTSET_ENC_INT32 < INTSET_ENC_INT64. */
#define INTSET_ENC_INT16 (sizeof(int16_t))
#define INTSET_ENC_INT32 (sizeof(int32_t))
#define INTSET_ENC_INT64 (sizeof(int64_t))

/* Return the required encoding for the provided value. */
static uint8_t _intsetValueEncoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX)
        return INTSET_ENC_INT64;
    else if (v < INT16_MIN || v > INT16_MAX)
        return INTSET_ENC_INT32;
    else
        return INTSET_ENC_INT16;
}
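As a rough illustration of how the encoding drives memory use, the sketch below mirrors the range checks of _intsetValueEncoding and computes the header-plus-contents size. The names `width_for_value` and `intset_bytes` are hypothetical helpers written for this article, not Redis functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element width in bytes needed to store v, using the same range checks
 * as _intsetValueEncoding above. */
size_t width_for_value(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return sizeof(int64_t);
    if (v < INT16_MIN || v > INT16_MAX) return sizeof(int32_t);
    return sizeof(int16_t);
}

/* Total bytes for an intset-like layout: two uint32_t header fields
 * (encoding, length) followed by `length` elements of `width` bytes. */
size_t intset_bytes(size_t width, uint32_t length) {
    assert(width == 2 || width == 4 || width == 8);
    return 2 * sizeof(uint32_t) + width * (size_t)length;
}
```

With these helpers, three small values such as 1, 2, 3 need 8 + 2 * 3 = 14 bytes, while the same three elements in a set that also had to hold a value outside the 32-bit range would need 8 + 8 * 3 = 32 bytes, because every element shares one encoding.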

When an inserted value needs a larger encoding than the one currently in use, Redis upgrades the whole intset to that larger encoding:

/* Upgrades the intset to a larger encoding and inserts the given integer. */
static intset *intsetUpgradeAndAdd(intset *is, int64_t value) {
    uint8_t curenc = intrev32ifbe(is->encoding);
    uint8_t newenc = _intsetValueEncoding(value);
    int length = intrev32ifbe(is->length);
    int prepend = value < 0 ? 1 : 0;

    /* First set new encoding and resize */
    is->encoding = intrev32ifbe(newenc);
    is = intsetResize(is,intrev32ifbe(is->length)+1);

    /* Upgrade back-to-front so we don't overwrite values.
     * Note that the "prepend" variable is used to make sure we have an empty
     * space at either the beginning or the end of the intset. */
    while(length--)
        _intsetSet(is,length+prepend,_intsetGetEncoded(is,length,curenc));

    /* Set the value at the beginning or the end. */
    if (prepend)
        _intsetSet(is,0,value);
    else
        _intsetSet(is,intrev32ifbe(is->length),value);
    is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
    return is;
}
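The back-to-front copy in intsetUpgradeAndAdd is easier to follow on a plain buffer. Below is a minimal sketch, assuming a buffer of int16_t values widened in place to int32_t; `widen16to32_and_add` is a hypothetical name, and the endianness handling (intrev32ifbe) and header fields of the real code are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: buf holds n sorted int16_t values. Grow it to hold
 * n+1 int32_t values, widen the old values in place, and add `value`, which
 * must lie outside the int16_t range. Returns the (possibly moved) buffer,
 * or NULL on allocation failure, in which case buf is left untouched. */
void *widen16to32_and_add(void *buf, size_t n, int32_t value) {
    assert(value < INT16_MIN || value > INT16_MAX);
    int prepend = value < 0 ? 1 : 0;   /* too small: front; too large: back */

    unsigned char *bytes = realloc(buf, (n + 1) * sizeof(int32_t));
    if (bytes == NULL) return NULL;

    /* Walk from the last element to the first so that a widened write never
     * lands on a narrow element that has not been read yet. */
    size_t i = n;
    while (i--) {
        int16_t narrow;
        memcpy(&narrow, bytes + i * sizeof(int16_t), sizeof narrow);
        int32_t wide = narrow;
        memcpy(bytes + (i + (size_t)prepend) * sizeof(int32_t), &wide, sizeof wide);
    }

    /* The new value goes into the slot left free at the front or the back. */
    size_t slot = prepend ? 0 : n;
    memcpy(bytes + slot * sizeof(int32_t), &value, sizeof value);
    return bytes;
}
```

Because the new value is outside the old encoding's range, it is either smaller or larger than every existing element, which is why only the first or last slot is ever needed and the array stays sorted without a search.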

As mentioned earlier, an intset is ordered and uses binary search to look up elements. How is that implemented internally?

/* Search for the position of "value". Return 1 when the value was found and
 * sets "pos" to the position of the value within the intset. Return 0 when
 * the value is not present in the intset and sets "pos" to the position
 * where "value" can be inserted. */
static uint8_t intsetSearch(intset *is, int64_t value, uint32_t *pos) {
    int min = 0, max = intrev32ifbe(is->length)-1, mid = -1;
    int64_t cur = -1;

    /* The value can never be found when the set is empty */
    if (intrev32ifbe(is->length) == 0) {
        if (pos) *pos = 0;
        return 0;
    } else {
        /* Check for the case where we know we cannot find the value,
         * but do know the insert position. */
        if (value > _intsetGet(is,max)) {
            if (pos) *pos = intrev32ifbe(is->length);
            return 0;
        } else if (value < _intsetGet(is,0)) {
            if (pos) *pos = 0;
            return 0;
        }
    }

    while(max >= min) {
        mid = ((unsigned int)min + (unsigned int)max) >> 1;
        cur = _intsetGet(is,mid);
        if (value > cur) {
            min = mid+1;
        } else if (value < cur) {
            max = mid-1;
        } else {
            break;
        }
    }

    if (value == cur) {
        if (pos) *pos = mid;
        return 1;
    } else {
        if (pos) *pos = min;
        return 0;
    }
}

/* Insert an integer in the intset */
intset *intsetAdd(intset *is, int64_t value, uint8_t *success) {
    uint8_t valenc = _intsetValueEncoding(value);
    uint32_t pos;
    if (success) *success = 1;

    /* Upgrade encoding if necessary. If we need to upgrade, we know that
     * this value should be either appended (if > 0) or prepended (if < 0),
     * because it lies outside the range of existing values. */
    if (valenc > intrev32ifbe(is->encoding)) {
        /* This always succeeds, so we don't need to curry *success. */
        return intsetUpgradeAndAdd(is,value);
    } else {
        /* Abort if the value is already present in the set.
         * This call will populate "pos" with the right position to insert
         * the value when it cannot be found. */
        if (intsetSearch(is,value,&pos)) {
            if (success) *success = 0;
            return is;
        }

        is = intsetResize(is,intrev32ifbe(is->length)+1);
        if (pos < intrev32ifbe(is->length)) intsetMoveTail(is,pos,pos+1);
    }

    _intsetSet(is,pos,value);
    is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
    return is;
}

The key part is the while loop: the lookup is a binary search. So how does the set stay ordered?

if (intsetSearch(is,value,&pos)) {
    if (success) *success = 0;
    return is;
}

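Notice this part: even when the search fails, intsetSearch stores in pos the position where the value belongs. intsetAdd then grows the intset, shifts the tail one slot to the right with intsetMoveTail, and writes the value at pos with _intsetSet, so the array stays sorted.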
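The search-then-insert pattern can be sketched on a plain sorted array of int64_t. This is a simplified illustration, not the Redis code: `sorted_search` and `sorted_add` are hypothetical names, the caller is assumed to provide room for one more element, and memmove stands in for intsetMoveTail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Binary search in the sorted array a[0..n). Returns 1 and sets *pos to the
 * index of value when found; otherwise returns 0 and sets *pos to the index
 * where value would have to be inserted to keep the array sorted. */
int sorted_search(const int64_t *a, size_t n, int64_t value, size_t *pos) {
    size_t lo = 0, hi = n;                 /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < value) {
            lo = mid + 1;
        } else if (a[mid] > value) {
            hi = mid;
        } else {
            *pos = mid;
            return 1;
        }
    }
    *pos = lo;
    return 0;
}

/* Insert value into the sorted array a, which holds n elements and has room
 * for at least n+1. Returns the new length; a duplicate leaves it unchanged. */
size_t sorted_add(int64_t *a, size_t n, int64_t value) {
    assert(a != NULL);
    size_t pos;
    if (sorted_search(a, n, value, &pos)) return n;   /* already present */

    /* Shift the tail one slot to the right, then drop the value into the gap. */
    memmove(a + pos + 1, a + pos, (n - pos) * sizeof a[0]);
    a[pos] = value;
    return n + 1;
}
```

The failed search is not wasted work: the position it reports is exactly where the new element belongs, so de-duplication and ordering come from the same lookup.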

Application scenarios

1. De-duplication
2. Finding the hobbies two people have in common (for example with SINTER)

Closing words

Happy 520 everyone~
