How does Redis actually implement the "people nearby" feature?

About the Author

Wan Mi, senior development engineer at Ele.me. Has worked with iOS, Go, and Java; currently focused on big data development. Enjoys cycling and hiking.

Introduction: The "people nearby" scenario is a common location-based service, and it is usually implemented with the spatial indexes of databases such as PG, MySQL, and MongoDB. Redis takes a different approach: it combines its sorted set (zset) with geohash encoding to implement spatial search, and it does so with high efficiency. This article analyzes the algorithm from the source code and works out the time complexity of the query.

A complete "people nearby" service needs the basic "add", "delete", and "query" operations. Each is introduced below, with the focus on how the query works.

Commands

Starting with Redis 3.2, Redis provides location-related features built on geohash and sorted sets. The Redis Geo module contains the following six commands:

  • GEOADD: adds the given position objects (longitude, latitude, name) to the specified key;
  • GEOPOS: returns the positions (longitude and latitude) of the given position objects stored in the key;
  • GEODIST: returns the distance between two given positions;
  • GEOHASH: returns the geohash representation of one or more position objects;
  • GEORADIUS: taking the given longitude and latitude as the center, returns all position objects in the target set whose distance from the center does not exceed the given maximum distance;
  • GEORADIUSBYMEMBER: taking the given position object as the center, returns all position objects whose distance from it does not exceed the given maximum distance.

Combining GEOADD and GEORADIUS is enough to implement the basic "add" and "query" functions of "people nearby". To implement the "people nearby" feature in WeChat, the GEORADIUSBYMEMBER command can be used directly: the "given position object" is the user, and the objects searched for are other users. In essence, though, GEORADIUSBYMEMBER = GEOPOS + GEORADIUS: first find the user's position, then use that position to search for other users that satisfy the distance condition.
The following analyzes the GEOADD and GEORADIUS commands from the source code and explains the theory behind the algorithm.

Redis geo operations include only "add" and "query"; there is no dedicated "delete" command. That is because Redis stores position objects internally in a sorted set (zset), so ZREM can be used to delete them.

The comment at the top of the Redis source file geo.c describes the file only as the implementation of GEOADD, GEORADIUS, and GEORADIUSBYMEMBER (it in fact implements the other three commands as well). This also suggests that the other three are auxiliary commands.

GEOADD

Usage

GEOADD key longitude latitude member [longitude latitude member ...]

Adds the given position objects (longitude, latitude, name) to the specified key.
Here, key is the name of the set, and member is the object that the longitude and latitude belong to. In practice, when too many objects need to be stored, the object set can be sharded in a roundabout way by using multiple keys (for example, one key per province), so that no single set grows too large.

On successful insertion, the return value is:

(integer) N

where N is the number of elements successfully inserted.

Source code analysis

/* GEOADD key long lat name [long2 lat2 name2 ... longN latN nameN] */
void geoaddCommand(client *c) {

    // Validate the arguments
    /* Check arguments number for sanity. */
    if ((c->argc - 2) % 3 != 0) {
        /* Need an odd number of arguments if we got this far... */
        addReplyError(c, "syntax error. Try GEOADD key [x1] [y1] [name1] "
                         "[x2] [y2] [name2] ... ");
        return;
    }

    // Extract the arguments
    int elements = (c->argc - 2) / 3;
    int argc = 2+elements*2; /* ZADD key score ele ... */
    robj **argv = zcalloc(argc*sizeof(robj*));
    argv[0] = createRawStringObject("zadd",4);
    argv[1] = c->argv[1]; /* key */
    incrRefCount(argv[1]);

    // Iterate over the arguments and convert them
    /* Create the argument vector to call ZADD in order to add all
     * the score,value pairs to the requested zset, where score is actually
     * an encoded version of lat,long. */
    int i;
    for (i = 0; i < elements; i++) {
        double xy[2];

        // Extract the longitude and latitude
        if (extractLongLatOrReply(c, (c->argv+2)+(i*3),xy) == C_ERR) {
            for (i = 0; i < argc; i++)
                if (argv[i]) decrRefCount(argv[i]);
            zfree(argv);
            return;
        }
    
        // Convert the longitude/latitude into a 52-bit geohash used as the score, and extract the object name
        /* Turn the coordinates into the score of the element. */
        GeoHashBits hash;
        geohashEncodeWGS84(xy[0], xy[1], GEO_STEP_MAX, &hash);
        GeoHashFix52Bits bits = geohashAlign52Bits(hash);
        robj *score = createObject(OBJ_STRING, sdsfromlonglong(bits));
        robj *val = c->argv[2 + i * 3 + 2];

        // Set the member name and score for the sorted set
        argv[2+i*2] = score;
        argv[3+i*2] = val;
        incrRefCount(val);
    }

    // Call the ZADD command to store the converted objects
    /* Finally call ZADD that will do the work for us. */
    replaceClientCommandVector(c,argc,argv);
    zaddCommand(c);
}

The source code shows that Redis stores position objects internally in a sorted set (zset). Each element of the sorted set is an object with a position, and the element's score is the 52-bit geohash value of its longitude and latitude.

The score is 52 bits because a double has a 52-bit mantissa.
Geohash is base32-encoded, so 52 bits can hold a geohash of at most 10 characters, which corresponds to a grid cell of about 0.6 × 0.6 m. In other words, a position converted by Redis geo theoretically carries an error of about 0.3 × 1.414 = 0.424 m.
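
The grid-size figure can be checked with a little arithmetic. The sketch below is not Redis code. It assumes 26 bits per axis (52 bits in total), a longitude span of 360 degrees, a latitude span of roughly 170.1 degrees, and a rough conversion of about 111,320 m per degree at the equator; the macro and function names are made up for illustration.

```c
#include <assert.h>

/* Assumed spans and conversion factor, for illustration only. */
#define SKETCH_LON_SPAN 360.0          /* -180 .. 180 degrees */
#define SKETCH_LAT_SPAN 170.10225756   /* about -85.05 .. 85.05 degrees */
#define SKETCH_M_PER_DEG 111320.0      /* rough meters per degree at the equator */

/* Size in meters (measured at the equator) of one cell after the span
 * has been halved `steps` times. */
double sketch_cell_meters(double span_degrees, int steps) {
    assert(steps >= 1 && steps <= 26);
    double cells = (double)(1ULL << steps);   /* 2^steps cells along the axis */
    return span_degrees / cells * SKETCH_M_PER_DEG;
}
```

Under these assumptions, `sketch_cell_meters(SKETCH_LON_SPAN, 26)` comes out just under 0.6 m and `sketch_cell_meters(SKETCH_LAT_SPAN, 26)` a little under 0.3 m, which is the same order of magnitude as the cell size quoted above.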

Algorithm summary

A brief summary of what the GEOADD command does:
1. Extract and validate the parameters;
2. Convert the longitude and latitude of each position into a 52-bit geohash value (the score);
3. Call the ZADD command to store each member and its score in the set at key.
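
Step 2 can be illustrated with a minimal sketch of geohash-style bit interleaving. This is a simplified stand-in, not the Redis implementation (which lives in geohashEncodeWGS84 and geohashAlign52Bits). The coordinate limits, the bit layout (latitude on even bits, longitude on odd bits), and all `sketch_` names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Coordinate limits assumed for this sketch. */
#define SKETCH_LON_MIN (-180.0)
#define SKETCH_LON_MAX 180.0
#define SKETCH_LAT_MIN (-85.05112878)
#define SKETCH_LAT_MAX 85.05112878

/* Map a value in [min, max] to an integer cell index in [0, 2^steps). */
static uint32_t sketch_cell_index(double v, double min, double max, int steps) {
    double offset = (v - min) / (max - min);        /* 0.0 .. 1.0 */
    uint64_t cells = 1ULL << steps;
    uint64_t idx = (uint64_t)(offset * (double)cells);
    if (idx >= cells) idx = cells - 1;              /* clamp the v == max case */
    return (uint32_t)idx;
}

/* Encode a coordinate as a 52-bit integer: 26 bits per axis, interleaved.
 * Assumed layout: latitude bits on even positions, longitude bits on odd ones. */
uint64_t sketch_geohash52(double lon, double lat) {
    assert(lon >= SKETCH_LON_MIN && lon <= SKETCH_LON_MAX);
    assert(lat >= SKETCH_LAT_MIN && lat <= SKETCH_LAT_MAX);
    uint32_t x = sketch_cell_index(lat, SKETCH_LAT_MIN, SKETCH_LAT_MAX, 26);
    uint32_t y = sketch_cell_index(lon, SKETCH_LON_MIN, SKETCH_LON_MAX, 26);
    uint64_t bits = 0;
    for (int i = 0; i < 26; i++) {
        bits |= (uint64_t)((x >> i) & 1u) << (2 * i);
        bits |= (uint64_t)((y >> i) & 1u) << (2 * i + 1);
    }
    return bits;
}
```

Because the most significant bits describe the coarsest subdivision, points that share a cell at some precision also share a prefix of this value, which is what makes the result usable as a sorted-set score.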

GEORADIUS

Usage

GEORADIUS key longitude latitude radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [ASC|DESC] [COUNT count] [STORE key] [STOREDIST key]

Taking the given longitude and latitude as the center, returns all position objects in the target set whose distance from the center does not exceed the given maximum distance.
Radius units: m | km | ft | mi (meters | kilometers | feet | miles)
Optional parameters:
- WITHDIST: along with each position object, also returns its distance from the center. The distance uses the same unit as the radius given by the user.
- WITHCOORD: also returns the longitude and latitude of each position object.
- WITHHASH: returns, as a 52-bit unsigned integer, the raw geohash-encoded sorted-set score of each position object. This option is mainly useful for debugging or low-level applications and is of little practical use otherwise.
- ASC | DESC: returns the position objects sorted from nearest to farthest | from farthest to nearest.
- COUNT count: returns only the first N matching position objects (if not set, all matching elements are returned).
- STORE key: saves the geographic information of the results to the specified key.
- STOREDIST key: saves the distance of each result from the center to the specified key.

Because of the STORE and STOREDIST options, GEORADIUS and GEORADIUSBYMEMBER are technically flagged as write commands, so they are sent only to the primary (writable) instance. When QPS is high, this easily puts too much read pressure on the primary. To solve this problem, Redis 3.2.10 and Redis 4.0.0 added two read-only commands, GEORADIUS_RO and GEORADIUSBYMEMBER_RO.
However, in actual development I found that the GeoRadiusParam parameter class in the Java package redis.clients.jedis.params.geo does not include the STORE and STOREDIST options. Whether a georadius call made through it really queries only the primary instance, or whether it is wrapped as a read-only call, is left for interested readers to investigate.

On a successful query, the return value is as follows.
Without any WITH option, a list of members is returned, for example:

["member1","member2","member3"]

With WITH options, each member in the list is itself a nested list, for example:

[
	["member1", distance1, [longitude1, latitude1]]
	["member2", distance2, [longitude2, latitude2]]
]


Source code analysis

This section of source code is long. If you lose patience, just read the // annotations, or jump straight to the summary section.

/* GEORADIUS key x y radius unit [WITHDIST] [WITHHASH] [WITHCOORD] [ASC|DESC]
 *                               [COUNT count] [STORE key] [STOREDIST key]
 * GEORADIUSBYMEMBER key member radius unit ... options ... */
void georadiusGeneric(client *c, int flags) {
    robj *key = c->argv[1];
    robj *storekey = NULL;
    int storedist = 0; /* 0 for STORE, 1 for STOREDIST. */

    // Look up the sorted set by key
    robj *zobj = NULL;
    if ((zobj = lookupKeyReadOrReply(c, key, shared.null[c->resp])) == NULL ||
        checkType(c, zobj, OBJ_ZSET)) {
        return;
    }

    // Determine the center point's longitude/latitude from the user input (coordinates or member)
    int base_args;
    double xy[2] = { 0 };
    if (flags & RADIUS_COORDS) {
		……
    }

    // Get the search radius
    double radius_meters = 0, conversion = 1;
    if ((radius_meters = extractDistanceOrReply(c, c->argv + base_args - 2,
                                                &conversion)) < 0) {
        return;
    }

    // Parse the optional arguments (withdist, withhash, withcoords, sort, count)
    int withdist = 0, withhash = 0, withcoords = 0;
    int sort = SORT_NONE;
    long long count = 0;
    if (c->argc > base_args) {
        ... ...
    }

    // Handle the STORE and STOREDIST arguments
    if (storekey && (withdist || withhash || withcoords)) {
        addReplyError(c,
            "STORE option in GEORADIUS is not compatible with "
            "WITHDIST, WITHHASH and WITHCOORDS options");
        return;
    }

    // Set the sort order
    if (count != 0 && sort == SORT_NONE) sort = SORT_ASC;

    // Compute the target area from the center point and the radius
    GeoHashRadius georadius =
        geohashGetAreasByRadiusWGS84(xy[0], xy[1], radius_meters);

    // Search the center cell and its 8 neighboring geohash cells for elements within range
    geoArray *ga = geoArrayCreate();
    membersOfAllNeighbors(zobj, georadius, xy[0], xy[1], radius_meters, ga);

    // No match: return an empty reply
    /* If no matching results, the user gets an empty reply. */
    if (ga->used == 0 && storekey == NULL) {
        addReplyNull(c);
        geoArrayFree(ga);
        return;
    }

    // Build and send the reply
    ……
    geoArrayFree(ga);
}


The core of the code above is two steps: first, computing the range around the center point, and second, searching the center cell and its 8 surrounding geohash cells. These correspond to the functions geohashGetAreasByRadiusWGS84 and membersOfAllNeighbors. Let's look at each in turn:

  • Computing the range around the center point:

// geohash_helper.c

GeoHashRadius geohashGetAreasByRadiusWGS84(double longitude, double latitude,
                                           double radius_meters) {
    return geohashGetAreasByRadius(longitude, latitude, radius_meters);
}

// Return the 9 geohash boxes that can cover the target area
GeoHashRadius geohashGetAreasByRadius(double longitude, double latitude, double radius_meters) {
    // Local variables
    GeoHashRange long_range, lat_range;
    GeoHashRadius radius;
    GeoHashBits hash;
    GeoHashNeighbors neighbors;
    GeoHashArea area;
    double min_lon, max_lon, min_lat, max_lat;
    double bounds[4];
    int steps;

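    // Compute the longitude/latitude range of the rectangle bounding the target area
    // (the target area is the circle centered on the given coordinates with the given radius)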
    geohashBoundingBox(longitude, latitude, radius_meters, bounds);
    min_lon = bounds[0];
    min_lat = bounds[1];
    max_lon = bounds[2];
    max_lat = bounds[3];

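    // From the center latitude and the radius, compute the geohash precision (steps) of the 9 search boxes.
    // latitude is used to adjust the precision near the poles (the higher the latitude, the fewer the steps).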
    steps = geohashEstimateStepsByRadius(radius_meters,latitude);

    // Set the longitude/latitude limits: -180 <= longitude <= 180, -85 <= latitude <= 85
    geohashGetCoordRange(&long_range,&lat_range);
    
    // Encode the query coordinates as a geohash value at the chosen precision (steps)
    geohashEncode(&long_range,&lat_range,longitude,latitude,steps,&hash);
    
    // Extend the geohash value in the 8 directions to determine the 8 surrounding boxes (neighbors)
    geohashNeighbors(&hash,&neighbors);
    
    // Determine the longitude/latitude range of area from the hash value
    geohashDecode(long_range,lat_range,hash,&area);

    // Handle some special cases
    ……

    // Build and return the result
    radius.hash = hash;
    radius.neighbors = neighbors;
    radius.area = area;
    return radius;
}

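How the precision (steps) might be derived from the radius can be sketched as follows. This is a simplified illustration of the idea behind geohashEstimateStepsByRadius, not a copy of it: the constant, the latitude thresholds, and the function name are assumptions, and the special-case handling elided in the listing above is left out entirely.

```c
#include <assert.h>

/* Assumed bound in meters (about half the equatorial circumference). */
#define SKETCH_MERCATOR_MAX 20037726.37

/* Pick a geohash precision (bits per axis) whose cells are comfortably
 * wider than the search radius, so that a 3x3 block can cover the circle. */
int sketch_estimate_steps(double radius_m, double lat) {
    assert(radius_m >= 0.0);
    if (radius_m == 0.0) return 26;        /* finest precision */
    int step = 1;
    while (radius_m < SKETCH_MERCATOR_MAX) {
        radius_m *= 2.0;                   /* each extra step halves the cell size */
        step++;
    }
    step -= 2;                             /* back off for a safety margin */
    if (lat > 66.0 || lat < -66.0) {       /* cells get narrower toward the poles */
        step--;
        if (lat > 80.0 || lat < -80.0) step--;
    }
    if (step < 1) step = 1;
    if (step > 26) step = 26;
    return step;
}
```

Under these assumptions a 1 km search at the equator yields 14 steps, and the same search at high latitude yields a coarser grid, which matches the note in the listing that the number of steps drops as the latitude rises.
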
  • Searching the center cell and its 8 surrounding geohash cells:

// geo.c

// Collect the matching elements from the 9 hash boxes
int membersOfAllNeighbors(robj *zobj, GeoHashRadius n, double lon, double lat, double radius, geoArray *ga) {
    GeoHashBits neighbors[9];
    unsigned int i, count = 0, last_processed = 0;
    int debugmsg = 0;

    // Get the 9 hash boxes to search
    neighbors[0] = n.hash;
    ……
    neighbors[8] = n.neighbors.south_west;

    // Search each hash box for target points
    for (i = 0; i < sizeof(neighbors) / sizeof(*neighbors); i++) {
        if (HASHISZERO(neighbors[i])) {
            if (debugmsg) D("neighbors[%d] is zero",i);
            continue;
        }

        // Skip possible duplicate hash boxes (can occur when the search radius is > 5000 km)
        if (last_processed &&
            neighbors[i].bits == neighbors[last_processed].bits &&
            neighbors[i].step == neighbors[last_processed].step)
        {
            continue;
        }

        // Search the hash box for objects that satisfy the condition
        count += membersOfGeoHashBox(zobj, neighbors[i], ga, lon, lat, radius);
        last_processed = i;
    }
    return count;
}


int membersOfGeoHashBox(robj *zobj, GeoHashBits hash, geoArray *ga, double lon, double lat, double radius) {
    // Get the minimum and maximum geohash values (52 bits) within the hash box
    GeoHashFix52Bits min, max;
    scoresOfGeoHashBox(hash,&min,&max);

    // Use the minimum and maximum geohash values to select the matching points in zobj
    return geoGetPointsInRange(zobj, min, max, lon, lat, radius, ga);
}


int geoGetPointsInRange(robj *zobj, double min, double max, double lon, double lat, double radius, geoArray *ga) {

    // Set the boundaries of the search range (i.e., the bounds of one of the 9 hash boxes)
    zrangespec range = { .min = min, .max = max, .minex = 0, .maxex = 1 };
    size_t origincount = ga->used;
    sds member;

    // zobj may be encoded as a ZIPLIST or a SKIPLIST; SKIPLIST is used as the example here, and the logic is the same
    if (zobj->encoding == OBJ_ENCODING_ZIPLIST) {
        ……
    } else if (zobj->encoding == OBJ_ENCODING_SKIPLIST) {
        zset *zs = zobj->ptr;
        zskiplist *zsl = zs->zsl;
        zskiplistNode *ln;

        // Get the first element within the hash box range (skip list, with efficiency
        // comparable to a binary search tree); return 0 if there is none
        if ((ln = zslFirstInRange(zsl, &range)) == NULL) {
            /* Nothing exists starting at our min.  No results. */
            return 0;
        }

        // Traverse the set starting from the first element
        while (ln) {
            sds ele = ln->ele;
            // Break once an element falls outside the range
            /* Abort when the node is no longer in range. */
            if (!zslValueLteMax(ln->score, &range))
                break;
            // Check the element (compute its distance from the center point)
            ele = sdsdup(ele);
            if (geoAppendIfWithinRadius(ga,lon,lat,radius,ln->score,ele)
                == C_ERR) sdsfree(ele);
            ln = ln->level[0].forward;
        }
    }
    return ga->used - origincount;
}

int geoAppendIfWithinRadius(geoArray *ga, double lon, double lat, double radius, double score, sds member) {
    double distance, xy[2];

    // Decoding failed: return an error
    if (!decodeGeohash(score,xy)) return C_ERR; /* Can't decode. */

    // Final distance check (compute the spherical distance and test whether it is less than radius)
    if (!geohashGetDistanceIfInRadiusWGS84(lon,lat, xy[0], xy[1],
                                           radius, &distance))
    {
        return C_ERR;
    }

    // Build and append the matching element
    geoPoint *gp = geoArrayAppend(ga);
    gp->longitude = xy[0];
    gp->latitude = xy[1];
    gp->dist = distance;
    gp->member = member;
    gp->score = score;
    return C_OK;
}



Algorithm summary

Setting the many optional parameters aside, here is a brief summary of how the GEORADIUS command uses geohash to find the target position objects:
1. Extract and validate the parameters;
2. Use the center point and the input radius to compute the range to search. This range consists of the highest geohash grid level (precision) that satisfies the condition, together with the corresponding 3 × 3 block of cells (the "nine-cell grid") that can cover the target area (described in detail below);
3. Traverse the nine cells and select position objects according to the range of each geohash cell. Then keep the objects whose distance from the center point is less than the input radius, and return them.

A bare description is hard to follow, so the two figures below give a simple demonstration of the algorithm:

[Figures: georadius search. Left: search center, target circle, and candidate points. Right: geohash grid with the nine cells.]

In the left figure, let the center be the search center. The green circle is the target area, all the points are position objects to be searched, and the red points are the position objects that satisfy the condition.
In the actual search, the geohash grid level is first computed from the search radius (i.e., the cell size in the right figure), and the position of the nine cells is determined (the red nine-cell grid). The points inside the nine cells (blue and red points) are then found in turn and their distance from the center point is computed. Finally, the points within the distance range (red points) are selected.

Algorithm analysis

Why does the algorithm use this query strategy, and what are its advantages? Let's go through it in question-and-answer form.

  • Why look for the highest geohash grid level that satisfies the condition? Why use nine cells?

    These are really one question: in essence, both are a preliminary screening of all element objects. In the multi-level geohash grid, each cell at a lower level is made up of 4 cells at the next higher level (see figure).

    [Figure: each lower-level geohash cell is composed of 4 higher-level cells]

    In other words, the higher the geohash grid level, the smaller the geographic area each cell covers. Once we compute, from the input radius and center point, the highest-level nine-cell grid that can cover the target area, the elements outside the nine cells have already been screened out. The main reason for using nine cells rather than a single cell is to avoid boundary problems while narrowing the search area as much as possible. Imagine a center at longitude 0, latitude 0: even for a 1-meter radius, covering it with a single cell would mean searching the entire earth. Extending one ring outward in the eight directions effectively avoids this problem.
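
The nine-cell idea can be sketched on plain column/row indices. This is only an illustration: Redis computes neighbors directly on the interleaved hash bits in geohashNeighbors, whereas the version below works on a hypothetical (column, row) pair, wraps columns across the antimeridian, and simply drops rows beyond the latitude limits. All names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* A grid cell: ix is the column (longitude), iy is the row (latitude). */
typedef struct { uint32_t ix, iy; } SketchCell;

/* Fill out[] with the center cell and its neighbors at the given precision.
 * Returns the number of cells written: 9 in general, 6 on the top or bottom row. */
int sketch_nine_cells(uint32_t ix, uint32_t iy, int steps, SketchCell out[9]) {
    assert(steps >= 1 && steps <= 26);
    int64_t n = (int64_t)1 << steps;               /* cells per axis */
    assert((int64_t)ix < n && (int64_t)iy < n);
    int count = 0;
    for (int dy = -1; dy <= 1; dy++) {
        int64_t y = (int64_t)iy + dy;
        if (y < 0 || y >= n) continue;             /* no row beyond the latitude limits */
        for (int dx = -1; dx <= 1; dx++) {
            int64_t x = ((int64_t)ix + dx + n) % n;   /* wrap around in longitude */
            out[count].ix = (uint32_t)x;
            out[count].iy = (uint32_t)y;
            count++;
        }
    }
    return count;
}
```

With very coarse grids the wrap-around can produce the same column twice, which is one way to picture why the Redis loop shown earlier skips duplicate boxes.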

  • How are element objects selected by the range of a geohash cell? How efficient is it?
    First, the geohash values inside each geohash cell are contiguous and fall within a fixed range, so it is enough to find the position objects in the sorted set that fall within that range. The sorted set uses the following skip list data structure:

    [Figure: skip list structure of the sorted set]
    It has query efficiency similar to a binary search tree, with average time complexity O(log(N)), and all elements at the bottom level are arranged in order as a linked list. So during a query, once the first value in the set that falls within the target geohash cell is found, the following elements can simply be compared in sequence, with no need for repeated lookups. The nine cells cannot be queried together and must be traversed one by one because the geohash values of different cells are not contiguous with each other. Only a contiguous range gives efficient queries; otherwise many more distance calculations would be needed.
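
The "contiguous, fixed range" property can be sketched as follows. The idea mirrors what scoresOfGeoHashBox is described as doing in the listing above, but the struct and function names here are hypothetical and the layout (2 × step bits, left-aligned into 52 bits) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A geohash cell: `bits` holds 2 * step interleaved bits. */
typedef struct { uint64_t bits; int step; } SketchBox;

/* Left-align the cell's bits inside a 52-bit value. */
static uint64_t sketch_align52(SketchBox b) {
    return b.bits << (52 - 2 * b.step);
}

/* Every 52-bit score inside the box satisfies *min <= score < *max. */
void sketch_score_range(SketchBox box, uint64_t *min, uint64_t *max) {
    assert(box.step >= 1 && box.step <= 26);
    assert(min != NULL && max != NULL);
    *min = sketch_align52(box);   /* the cell prefix followed by all zero bits */
    box.bits++;                   /* the next cell at the same precision */
    *max = sketch_align52(box);   /* exclusive upper bound */
}
```

For example, a box with bits 3 at step 1 maps to the score range from 3 × 2^50 up to, but not including, 2^52, so one ordered range query retrieves everything inside that cell.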

In summary, we have analyzed in detail, from the source code, how the Redis Geo module handles "add" (GEOADD) and "query" (GEORADIUS). The time complexity of the GEORADIUS "people nearby" search works out to O(N + log(M)), where N is the number of position elements within the specified radius and M is the number of elements enclosed by the nine cells for which the distance is calculated. Combined with the in-memory storage of Redis itself, this gives very high efficiency in practice.
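
The shape of that cost, one logarithmic search followed by a linear walk, can be seen in a small stand-in. The sketch below deliberately swaps the skip list for a sorted array with binary search, so it is not how Redis stores a zset; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the first score >= min in an ascending array (binary search). */
static size_t sketch_lower_bound(const uint64_t *scores, size_t n, uint64_t min) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (scores[mid] < min) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Count the scores in [min, max): one logarithmic search, then a linear walk
 * that stops at the first score outside the range. */
size_t sketch_count_in_range(const uint64_t *scores, size_t n,
                             uint64_t min, uint64_t max) {
    assert(scores != NULL || n == 0);
    size_t count = 0;
    for (size_t i = sketch_lower_bound(scores, n, min);
         i < n && scores[i] < max; i++) {
        count++;   /* a real search would run the distance check here */
    }
    return count;
}
```

Running this once per cell, nine times in total, corresponds to the per-box loop in membersOfAllNeighbors.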







Origin juejin.im/post/5da40462f265da5baf410a11