Nginx Data Structures: The Hash Table

1. Hash tables (the concept)

A hash table is a data structure that accesses records directly by key: a function maps each key to a position in the table, which speeds up lookups. This mapping function f is called the hash function (or hash method), and the array that stores the records is called the hash table.

If a record with key K exists in the table, it must be stored at position f(K), so in the ideal case the record can be fetched directly, without comparing it against other keys.

Different keys may produce the same hash address: key1 ≠ key2, yet f(key1) = f(key2). This is called a collision, and keys with the same hash value are called synonyms under that hash function. In summary, given a hash function H(key) and a method for handling collisions, a set of keys is mapped onto a finite, contiguous range of addresses, and each key's image in that range is used as the record's storage location in the table. Such a table is called a hash table, the mapping process is called hashing, and the resulting storage location is called the hash address.
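
To make these definitions concrete, here is a small C sketch. It assumes a deliberately weak, hypothetical hash function (toy_hash: the byte sum of the key modulo the table size; this is not the hash Nginx uses) so that a collision is easy to show: the distinct keys "ab" and "ba" have the same byte sum, 195, and therefore the same hash address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy hash function: f(key) = (sum of the key's bytes) % size.
 * It is deliberately weak so that collisions are easy to produce;
 * it is NOT the hash function Nginx uses. */
static size_t toy_hash(const char *key, size_t size)
{
    size_t sum = 0;

    assert(size > 0);

    for (size_t i = 0; key[i] != '\0'; i++) {
        sum += (unsigned char) key[i];
    }

    return sum % size;
}

/* key1 != key2 but f(key1) == f(key2): a collision, and the two keys
 * are synonyms under this hash function. */
static int is_collision(const char *key1, const char *key2, size_t size)
{
    return strcmp(key1, key2) != 0 && toy_hash(key1, size) == toy_hash(key2, size);
}
```

For example, toy_hash("ab", 8) and toy_hash("ba", 8) both return 195 % 8 = 3, so is_collision("ab", "ba", 8) is true, while "ab" and "ac" (byte sum 196) land in different slots.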

1.1 How to resolve collisions

There are two common, simple approaches: separate chaining and open addressing.

Separate chaining stores all elements that hash to the same slot in a linked list outside the table. A lookup first finds the slot and then walks that slot's list to locate the right element, so the list absorbs the collisions.
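
A minimal separate-chaining sketch in C, for illustration only: the names chain_table, chain_put, chain_get and chain_free are hypothetical, the table has a fixed 8 slots, and duplicate keys and resizing are not handled. Each slot holds the head of a linked list, and a lookup walks that list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CHAIN_SLOTS 8

/* One node of the linked list hanging off a slot. */
typedef struct chain_node {
    const char        *key;
    int                value;
    struct chain_node *next;
} chain_node;

typedef struct {
    chain_node *slots[CHAIN_SLOTS];   /* heads of the per-slot lists */
} chain_table;

static size_t chain_hash(const char *key)
{
    size_t h = 0;

    for (size_t i = 0; key[i] != '\0'; i++) {
        h = h * 31 + (unsigned char) key[i];
    }
    return h % CHAIN_SLOTS;
}

/* Insert at the head of the slot's list; colliding keys share one list. */
static int chain_put(chain_table *t, const char *key, int value)
{
    chain_node *n;
    size_t      slot;

    assert(t != NULL && key != NULL);

    n = malloc(sizeof(*n));
    if (n == NULL) {
        return -1;
    }
    slot = chain_hash(key);
    n->key = key;
    n->value = value;
    n->next = t->slots[slot];
    t->slots[slot] = n;
    return 0;
}

/* Find the slot, then walk its list comparing keys. */
static const int *chain_get(const chain_table *t, const char *key)
{
    for (chain_node *n = t->slots[chain_hash(key)]; n != NULL; n = n->next) {
        if (strcmp(n->key, key) == 0) {
            return &n->value;
        }
    }
    return NULL;
}

static void chain_free(chain_table *t)
{
    for (size_t i = 0; i < CHAIN_SLOTS; i++) {
        chain_node *n = t->slots[i];

        while (n != NULL) {
            chain_node *next = n->next;
            free(n);
            n = next;
        }
        t->slots[i] = NULL;
    }
}
```

The key strings are not copied, so they must outlive the table; chain_free() releases only the list nodes.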

With open addressing, all elements are stored inside the table itself. A lookup probes table entries according to a fixed rule (for example, consecutive non-empty slots, or every slot in the table consistent with the hash function) until it either finds the desired element or determines that the element is not in the table. Open addressing uses no linked lists, and no element is stored outside the table.

Nginx's hash table uses open addressing.
Of the many ways to implement open addressing, Nginx stores colliding elements in consecutive non-empty slots. When an element is inserted, the hash function gives its slot; if that slot is occupied by a different element, the following slots are checked in turn until an empty one is found, and the element is placed there. Lookups work the same way: starting at the position given by the hash function, the elements in consecutive non-empty slots are compared one by one.
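
For contrast with chaining, here is a minimal open-addressing sketch using linear probing over a flat array of 8 slots. The names are hypothetical, and the sketch shows only the general idea of checking consecutive slots; Nginx's actual memory layout, described in section 2, differs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PROBE_SLOTS 8

typedef struct {
    const char *key;     /* NULL marks an empty slot */
    int         value;
} probe_slot;

static size_t probe_hash(const char *key)
{
    size_t h = 0;

    for (size_t i = 0; key[i] != '\0'; i++) {
        h = h * 31 + (unsigned char) key[i];
    }
    return h % PROBE_SLOTS;
}

/* Start at the hashed slot and step to the next one (wrapping around) until
 * an empty slot or the same key is found. Returns -1 if the table is full. */
static int probe_put(probe_slot *table, const char *key, int value)
{
    size_t start;

    assert(key != NULL);
    start = probe_hash(key);

    for (size_t i = 0; i < PROBE_SLOTS; i++) {
        probe_slot *s = &table[(start + i) % PROBE_SLOTS];

        if (s->key == NULL || strcmp(s->key, key) == 0) {
            s->key = key;
            s->value = value;
            return 0;
        }
    }
    return -1;
}

/* Probe the same sequence; reaching an empty slot means the key is absent. */
static const int *probe_get(const probe_slot *table, const char *key)
{
    size_t start = probe_hash(key);

    for (size_t i = 0; i < PROBE_SLOTS; i++) {
        const probe_slot *s = &table[(start + i) % PROBE_SLOTS];

        if (s->key == NULL) {
            return NULL;
        }
        if (strcmp(s->key, key) == 0) {
            return &s->value;
        }
    }
    return NULL;
}
```

Because an empty slot ends a probe sequence, deleting from such a table would need a tombstone marker, so the sketch omits deletion. Nginx's ngx_hash_t is built once by ngx_hash_init() and not modified afterwards, so it has no deletion either.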

2. Implementation of the Nginx hash table

2.1 ngx_hash_elt_t structure

Nginx stores each element of the hash table in an ngx_hash_elt_t structure:

typedef struct {
    /* Pointer to the user-defined element data; if this ngx_hash_elt_t slot is empty, value is 0 */
    void             *value;
    /* Length of the element's key */
    u_short           len;
    /* Start address of the element's key */
    u_char            name[1];
} ngx_hash_elt_t;

Each hash table slot is represented by an ngx_hash_elt_t structure. A slot, however, does not occupy sizeof(ngx_hash_elt_t) bytes: the name member only marks the start address of the key, and keys vary in length. How much space each slot occupies is decided when the hash table is initialized.
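
The following C sketch shows how much space one variable-length element takes. It assumes the ngx_hash_elt_t layout above (a value pointer, a 2-byte len, then the key bytes, padded to pointer alignment) and mirrors the NGX_HASH_ELT_SIZE macro used by ngx_hash_init() below; align_up, elt_size and bucket_bytes are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of a (a must be a power of two),
 * in the spirit of Nginx's ngx_align() macro. */
static size_t align_up(size_t n, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + (a - 1)) & ~(a - 1);
}

/* Bytes one element occupies in a bucket: the value pointer, then the
 * 2-byte len field and the key bytes, padded so that the next element
 * starts on a pointer-aligned address. Mirrors NGX_HASH_ELT_SIZE. */
static size_t elt_size(size_t key_len)
{
    return sizeof(void *) + align_up(key_len + 2, sizeof(void *));
}

/* Bytes a bucket needs for the given keys plus the terminating sentinel,
 * of which only the value pointer (NULL) is ever read. */
static size_t bucket_bytes(const size_t *key_lens, size_t n)
{
    size_t total = sizeof(void *);

    for (size_t i = 0; i < n; i++) {
        total += elt_size(key_lens[i]);
    }
    return total;
}
```

On a typical 64-bit build, elt_size(12) for a key such as "www.test.com" is 8 + align(14, 8) = 24 bytes, so a bucket holding two 12-byte keys needs 24 + 24 + 8 = 56 bytes before ngx_hash_init() rounds it up to ngx_cacheline_size.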

2.2 ngx_hash_t structure

A basic hash table is represented by the ngx_hash_t structure:

typedef struct {
    /* Start address of the hash table, which is also the address of the first slot */
    ngx_hash_elt_t  **buckets;
    /* Total number of slots in the hash table */
    ngx_uint_t        size;
} ngx_hash_t;

The length of each slot (which limits the maximum key length of each element) and the space occupied by the whole table are therefore fixed when the buckets member is allocated in ngx_hash_init().

[Figure: schematic of the basic hash table structure]


As the figure shows, each slot of the hash table starts with an ngx_hash_elt_t structure: its value member points to a structure that is meaningful to the user, and len is the effective length of name (the element's key) stored in that slot. The buckets member of ngx_hash_t points to the start of the table, and size gives the total number of slots.

2.3 ngx_hash_init_t structure

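typedef struct {
    /* Points to the ordinary exact-match hash table */
    ngx_hash_t       *hash;

    /* Hash function used for the elements added during initialization */
    ngx_hash_key_pt   key;

    /* Maximum number of slots in the hash table */
    ngx_uint_t        max_size;
    /* Size of one slot in the hash table; it limits the maximum key length of each element */
    ngx_uint_t        bucket_size;

    /* Name of the hash table */
    char             *name;
    /* Memory pool used to allocate all slots of the hash tables (at most 3: one ordinary
     * hash table, one leading-wildcard hash table and one trailing-wildcard hash table) */
    ngx_pool_t       *pool;
    /* Temporary memory pool that exists only until the hash table is initialized. It is
     * mainly used to allocate temporary dynamic arrays that elements with wildcards
     * need during initialization */
    ngx_pool_t       *temp_pool;
} ngx_hash_init_t;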

This structure is used to initialize a hash table.

2.4 ngx_hash_key_t structure

typedef struct {
    /* Element key */
    ngx_str_t         key;
    /* Hash code computed from the key by the hash function */
    ngx_uint_t        key_hash;
    /* Points to the actual user data */
    void             *value;
} ngx_hash_key_t;

2.5 ngx_hash_init(): initialize a basic hash table

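/* Memory needed by one element (with alignment): sizeof(void *) for the value pointer,
 * plus the 2-byte len field and the key bytes rounded up to a multiple of sizeof(void *) */
#define NGX_HASH_ELT_SIZE(name)                                               \
    (sizeof(void *) + ngx_align((name)->key.len + 2, sizeof(void *)))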

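/*
 * @hinit: points to a structure holding the basic parameters for building the hash table
 * @names: array of element keys; each entry is an ngx_hash_key_t describing an element
 *         to be added to the hash table
 * @nelts: number of entries in the names array
 */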
ngx_int_t ngx_hash_init(ngx_hash_init_t *hinit, ngx_hash_key_t *names, ngx_uint_t nelts)
{
    u_char          *elts;
    size_t           len;
    u_short         *test;
    ngx_uint_t       i, n, key, size, start, bucket_size;
    ngx_hash_elt_t  *elt, **buckets;

    if (hinit->max_size == 0) 
    {
        ngx_log_error(NGX_LOG_EMERG, hinit->pool->log, 0,
                      "could not build %s, you should "
                      "increase %s_max_size: %i",
                      hinit->name, hinit->name, hinit->max_size);
        return NGX_ERROR;
    }

    for (n = 0; n < nelts; n++) 
    {
        /* Make sure a bucket can hold at least one element plus the end sentinel; if any
         * element (for example, one with a very long name) cannot fit into a bucket,
         * log an error and return */
        if (hinit->bucket_size < NGX_HASH_ELT_SIZE(&names[n]) + sizeof(void *))
        {
            ngx_log_error(NGX_LOG_EMERG, hinit->pool->log, 0,
                          "could not build %s, you should "
                          "increase %s_bucket_size: %i",
                          hinit->name, hinit->name, hinit->bucket_size);
            return NGX_ERROR;
        }
    }

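    /* The test below works out, for all elements passed in, how many hash slots (that
     * is, buckets) to allocate so as to save memory while keeping collisions low;
     * failing that, the slot count is simply set to the maximum, hinit->max_size. */
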
    test = ngx_alloc(hinit->max_size * sizeof(u_short), hinit->pool->log);
    if (test == NULL) 
    {
        return NGX_ERROR;
    }

    /* Usable space in a bucket after subtracting the space of the end sentinel */
    bucket_size = hinit->bucket_size - sizeof(void *);

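    /* Compute the minimum number of buckets. The smallest amount of memory one element
     * can occupy is 2 * sizeof(void *) (because of the alignment in NGX_HASH_ELT_SIZE),
     * so a bucket holds at most bucket_size / (2 * sizeof(void *)) elements; dividing
     * the element count nelts by that value gives the minimum number of buckets */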
    start = nelts / (bucket_size / (2 * sizeof(void *)));
    start = start ? start : 1;

    /* If this condition holds, there are a great many elements, so it pays to raise
     * start directly; otherwise the loop below would run too many useless tests */
    if (hinit->max_size > 10000 && nelts && hinit->max_size / nelts < 100) 
    {
        start = hinit->max_size - 1000;
    }

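    /* The for loop below determines the final number of hash slots. It gradually
     * increases the slot count (and with it the number of buckets) and places all
     * elements into those buckets. Collisions may occur; they are tolerable as long
     * as no bucket is full, so filling continues. As soon as any bucket overflows
     * (test[key] records the total size of the elements stored in the bucket of slot
     * key; if it exceeds the usable space bucket_size, the bucket overflows), more
     * slots and buckets are needed. If all elements are placed without any overflow,
     * the current size is the final slot count */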
    for (size = start; size <= hinit->max_size; size++) 
    {

        ngx_memzero(test, size * sizeof(u_short));

        for (n = 0; n < nelts; n++) 
        {
            if (names[n].key.data == NULL) 
            {
                continue;
            }

            key = names[n].key_hash % size;
            test[key] = (u_short) (test[key] + NGX_HASH_ELT_SIZE(&names[n]));

#if 0
            ngx_log_error(NGX_LOG_ALERT, hinit->pool->log, 0,
                          "%ui: %ui %ui \"%V\"",
                          size, key, test[key], &names[n].key);
#endif

            /* Check for overflow; if the bucket overflows, more slots and buckets are needed */
            if (test[key] > (u_short) bucket_size) 
            {
                goto next;
            }
        }

        /* All elements fit into the buckets, so the current size is the required slot count */
        goto found;

    next:

        continue;
    }

    size = hinit->max_size;

    ngx_log_error(NGX_LOG_WARN, hinit->pool->log, 0,
                  "could not build optimal %s, you should increase "
                  "either %s_max_size: %i or %s_bucket_size: %i; "
                  "ignoring %s_bucket_size",
                  hinit->name, hinit->name, hinit->max_size,
                  hinit->name, hinit->bucket_size, hinit->name);

found:

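    /* The slot count is known; next comes building the actual hash structure.
     * Note: the memory of all buckets is contiguous and allocated on demand (each bucket
     * gets exactly as much memory as its elements need, apart from extra alignment) */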

    /* Initialize every entry of test to sizeof(void *), the size of the value member of
     * ngx_hash_elt_t; this reserves room for each bucket's end sentinel */
    for (i = 0; i < size; i++) 
    {
        test[i] = sizeof(void *);
    }

    /* Go through all elements, compute the memory each one occupies in its slot, and add
     * it to the corresponding entry of test, i.e. the element's slot in the table */
    for (n = 0; n < nelts; n++) 
    {
        if (names[n].key.data == NULL) 
        {
            continue;
        }

        /* Find the element's position in the hash table */
        key = names[n].key_hash % size;
        /* Accumulate the actual memory occupied by the elements stored in this slot */
        test[key] = (u_short) (test[key] + NGX_HASH_ELT_SIZE(&names[n]));
    }

    len = 0;

    /* Align each entry of test (the memory each slot actually needs for its elements)
     * to ngx_cacheline_size */
    for (i = 0; i < size; i++) 
    {
        if (test[i] == sizeof(void *))
        {
            continue;
        }

        test[i] = (u_short) (ngx_align(test[i], ngx_cacheline_size));

        /* len accumulates the total memory occupied by all elements */
        len += test[i];
    }

    if (hinit->hash == NULL) 
    {
        hinit->hash = ngx_pcalloc(hinit->pool, sizeof(ngx_hash_wildcard_t)
                                             + size * sizeof(ngx_hash_elt_t *));
        if (hinit->hash == NULL) 
        {
            ngx_free(test);
            return NGX_ERROR;
        }

        buckets = (ngx_hash_elt_t **)
                      ((u_char *) hinit->hash + sizeof(ngx_hash_wildcard_t));

    } 
    else 
    {
        /* Allocate the slot array; each slot is a pointer to an ngx_hash_elt_t structure */
        buckets = ngx_pcalloc(hinit->pool, size * sizeof(ngx_hash_elt_t *));
        if (buckets == NULL)
        {
            ngx_free(test);
            return NGX_ERROR;
        }
    }

    /* Allocate one contiguous block of memory to hold the actual data of all slots */
    elts = ngx_palloc(hinit->pool, len + ngx_cacheline_size);
    if (elts == NULL)
    {
        ngx_free(test);
        return NGX_ERROR;
    }

    /* Align the block to the cache line size */
    elts = ngx_align_ptr(elts, ngx_cacheline_size);

    /* Make buckets[i] point to its position inside the elts block */
    for (i = 0; i < size; i++) 
    {
        if (test[i] == sizeof(void *)) 
        {
            continue;
        }

        buckets[i] = (ngx_hash_elt_t *) elts;
        elts += test[i];
    }

    /* Reset the test array */
    for (i = 0; i < size; i++) 
    {
        test[i] = 0;
    }

    for (n = 0; n < nelts; n++) 
    {
        if (names[n].key.data == NULL) 
        {
            continue;
        }

        /* Compute the element's position in the hash table */
        key = names[n].key_hash % size;
        /* Using key, find the address within the slot where this element is to be stored */
        elt = (ngx_hash_elt_t *) ((u_char *) buckets[key] + test[key]);

        /* Fill in the element stored in this slot */
        elt->value = names[n].value;
        elt->len   = (u_short) names[n].key.len;

        ngx_strlow(elt->name, names[n].key.data, names[n].key.len);

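        /* Update test[key] so that the next element mapped to the same slot is stored right
         * after this one; this shows that Nginx resolves collisions with open addressing,
         * using consecutive non-empty slots */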
        test[key] = (u_short) (test[key] + NGX_HASH_ELT_SIZE(&names[n]));
    }

    /* Walk all slots and store a sentinel whose value is NULL at the end of each one */
    for (i = 0; i < size; i++)
    {
        if (buckets[i] == NULL) 
        {
            continue;
        }

        elt = (ngx_hash_elt_t *) ((u_char *) buckets[i] + test[i]);

        elt->value = NULL;
    }

    ngx_free(test);

    hinit->hash->buckets = buckets;
    hinit->hash->size    = size;

#if 0

    for (i = 0; i < size; i++) {
        ngx_str_t   val;
        ngx_uint_t  key;

        elt = buckets[i];

        if (elt == NULL) {
            ngx_log_error(NGX_LOG_ALERT, hinit->pool->log, 0,
                          "%ui: NULL", i);
            continue;
        }

        while (elt->value) {
            val.len = elt->len;
            val.data = &elt->name[0];

            key = hinit->key(val.data, val.len);

            ngx_log_error(NGX_LOG_ALERT, hinit->pool->log, 0,
                          "%ui: %p \"%V\" %ui", i, elt, &val, key);

            elt = (ngx_hash_elt_t *) ngx_align_ptr(&elt->name[0] + elt->len,
                                                   sizeof(void *));
        }
    }

#endif

    return NGX_OK;
}
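
The size-search loop above is the heart of ngx_hash_init(). Below is a simplified standalone C restatement of it, as a sketch under stated assumptions: pick_hash_size is a hypothetical name, each element's in-bucket size is assumed to be precomputed (as NGX_HASH_ELT_SIZE would), bucket_avail plays the role of bucket_size minus the sentinel, and the large-table start heuristic and the u_short counters are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Smallest slot count in [start, max_size] for which no bucket's total element
 * size exceeds bucket_avail bytes, or 0 if none fits. hashes[k] is element k's
 * key hash and sizes[k] its in-bucket size. */
static size_t pick_hash_size(const size_t *hashes, const size_t *sizes, size_t n,
                             size_t bucket_avail, size_t start, size_t max_size)
{
    size_t *used;   /* bytes used so far in each bucket, like the test array */

    assert(start >= 1 && start <= max_size);

    used = malloc(max_size * sizeof(*used));
    if (used == NULL) {
        return 0;
    }

    for (size_t size = start; size <= max_size; size++) {
        int overflow = 0;

        for (size_t i = 0; i < size; i++) {
            used[i] = 0;
        }

        for (size_t k = 0; k < n && !overflow; k++) {
            size_t slot = hashes[k] % size;

            used[slot] += sizes[k];
            if (used[slot] > bucket_avail) {
                overflow = 1;       /* this bucket is full: try a larger size */
            }
        }

        if (!overflow) {
            free(used);
            return size;
        }
    }

    free(used);
    return 0;
}
```

With four keys whose hashes are 0 to 3, each 24 bytes, and bucket_avail = 24, sizes 1 to 3 all overflow some bucket and the function returns 4; with bucket_avail = 48 it returns 2. Where this sketch returns 0, ngx_hash_init() itself falls back to max_size and logs a warning.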

Using the hash table

2.6 ngx_hash_find(): look up an element

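/*
 * Parameters:
 * - hash: pointer to the hash table structure
 * - key: hash key computed by the hash function
 * - name and len: address and length of the actual key
 *
 * Purpose:
 * Returns the user data pointed to by the value member of the ngx_hash_elt_t whose
 * key is exactly the key given by name and len, or NULL if there is no such element.
 */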
void *ngx_hash_find(ngx_hash_t *hash, ngx_uint_t key, u_char *name, size_t len)
{
    ngx_uint_t       i;
    ngx_hash_elt_t  *elt;

   
    /* Debug logging, disabled: enabling it would log every lookup at alert level */
#if 0
    ngx_log_error(NGX_LOG_ALERT, ngx_cycle->log, 0, "hf:\"%*s\"", len, name);
#endif

    /* Take key modulo size to get the corresponding hash slot */
    elt = hash->buckets[key % hash->size];

    if (elt == NULL) 
    {
        return NULL;
    }

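    /* Then compare element names one by one within that slot's bucket (the bucket is laid
     * out like an array terminated by a sentinel) to find the single matching element, and
     * return its value (for example, when searching the addr->hash structure, the value
     * returned is the matching ngx_http_core_srv_conf_t configuration) */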
    while (elt->value) 
    {
        if (len != (size_t) elt->len) 
        {
            goto next;
        }

        for (i = 0; i < len; i++) 
        {
            if (name[i] != elt->name[i]) 
            {
                goto next;
            }
        }

        return elt->value;

    next:

        elt = (ngx_hash_elt_t *) ngx_align_ptr(&elt->name[0] + elt->len,
                                               sizeof(void *));
        continue;
    }

    return NULL;
}
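
The following C sketch builds one bucket by hand and walks it the way ngx_hash_find() does: compare the length, then the bytes, then step to the next pointer-aligned element until the sentinel whose value is NULL. elt_t is a hypothetical stand-in for ngx_hash_elt_t that uses a C99 flexible array member instead of name[1]; the other helper names are hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ngx_hash_elt_t. */
typedef struct {
    void           *value;   /* user data; NULL marks the end-of-bucket sentinel */
    unsigned short  len;     /* key length */
    unsigned char   name[];  /* key bytes (not NUL-terminated) */
} elt_t;

/* Round p up to a multiple of a (a power of two), like ngx_align_ptr(). */
static unsigned char *align_ptr(unsigned char *p, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (unsigned char *) (((uintptr_t) p + (a - 1)) & ~(uintptr_t) (a - 1));
}

/* Write one element at *cursor and advance *cursor to the next aligned element. */
static void bucket_append(unsigned char **cursor, void *value, const char *key)
{
    elt_t  *elt = (elt_t *) *cursor;
    size_t  len = strlen(key);

    elt->value = value;
    elt->len = (unsigned short) len;
    memcpy(elt->name, key, len);
    *cursor = align_ptr(elt->name + len, sizeof(void *));
}

/* Terminate the bucket with a sentinel whose value is NULL. */
static void bucket_terminate(unsigned char *cursor)
{
    ((elt_t *) cursor)->value = NULL;
}

/* Scan a bucket like ngx_hash_find(): length check, byte check, next element. */
static void *bucket_find(unsigned char *bucket, const char *name, size_t len)
{
    elt_t *elt = (elt_t *) bucket;

    while (elt->value != NULL) {
        if (elt->len == len && memcmp(elt->name, name, len) == 0) {
            return elt->value;
        }
        elt = (elt_t *) align_ptr(elt->name + elt->len, sizeof(void *));
    }

    return NULL;
}
```

For example, appending "www.test.com" and "blog.test.com" to a malloc'd buffer and calling bucket_terminate() yields a bucket in which bucket_find() returns the value for either key and NULL for any other key.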

2.7 Two hash functions provided by Nginx

/* Hash function 1: maps a string of any length to an integer with the BKDR algorithm */
ngx_uint_t ngx_hash_key(u_char *data, size_t len)
{
    ngx_uint_t  i, key;

    key = 0;

    for (i = 0; i < len; i++)
    {
        key = ngx_hash(key, data[i]);
    }

    return key;
}


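/* Hash function 2: maps the lowercase form of a string of any length to an integer with
 * the BKDR algorithm (the input buffer itself is not modified) */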
ngx_uint_t ngx_hash_key_lc(u_char *data, size_t len)
{
    ngx_uint_t  i, key;

    key = 0;

    for (i = 0; i < len; i++)
    {
        key = ngx_hash(key, ngx_tolower(data[i]));
    }

    return key;
}
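
Nginx defines ngx_hash(key, c) as ((ngx_uint_t) key * 31 + c), so both functions above compute a BKDR-style hash with multiplier 31. A standalone C version for experimenting outside Nginx follows; hash_key and hash_key_lc are hypothetical names, uintptr_t stands in for ngx_uint_t, and the standard tolower() stands in for Nginx's ASCII-only ngx_tolower(), which behaves the same in the default C locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* One BKDR step with multiplier 31, written like Nginx's ngx_hash() macro. */
#define HASH_STEP(key, c)  ((uintptr_t) (key) * 31 + (c))

/* Like ngx_hash_key(): hash the bytes as they are. */
static uintptr_t hash_key(const unsigned char *data, size_t len)
{
    uintptr_t key = 0;

    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        key = HASH_STEP(key, data[i]);
    }
    return key;
}

/* Like ngx_hash_key_lc(): hash the lowercase form of each byte;
 * the input buffer itself is not modified. */
static uintptr_t hash_key_lc(const unsigned char *data, size_t len)
{
    uintptr_t key = 0;

    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        key = HASH_STEP(key, (unsigned char) tolower(data[i]));
    }
    return key;
}
```

For instance, hash_key("ab", 2) is 97 * 31 + 98 = 3105, and hash_key_lc("WWW.Test.COM", 12) equals hash_key("www.test.com", 12), which suits case-insensitive keys such as server names.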

2.8 Example: using a basic hash table

Nginx uses hash tables to manage virtual hosts. For example, suppose the configuration file nginx.conf contains the following configuration:

server {
    listen       192.168.1.1:80;
    server_name  www.web_test2.com blog.web_test2.com;
    ...
}
server {
    listen       192.168.1.1:80;
    server_name  www.web_test1.com bbs.web_test1.com;
    ...
}

When Nginx is started with this configuration file and a client sends a request to port 80 of 192.168.1.1, Nginx has to look up which server block should handle the request. To make this lookup efficient, Nginx builds a hash table from these server_name values at startup.

In ngx_http_server_names() in ngx_http.c:

    ...
    hash.key = ngx_hash_key_lc;
    hash.max_size = cmcf->server_names_hash_max_size;
    hash.bucket_size = cmcf->server_names_hash_bucket_size;
    hash.name = "server_names_hash";
    hash.pool = cf->pool;

    if (ha.keys.nelts) 
    {
        hash.hash = &addr->hash;
        hash.temp_pool = NULL;

        if (ngx_hash_init(&hash, ha.keys.elts, ha.keys.nelts) != NGX_OK) 
        {
            goto failed;
        }
    }
    ...

[Figure: state of the hash structure before ngx_hash_init() is called]

[Figure: state of the hash structure after ngx_hash_init() returns]


In the figure, the buckets field points to the storage of the hash slots. Because buckets is a pointer to a pointer, it is used as an array whose entries each point to the storage (bucket) of one hash slot. Since several elements may map to the same slot (a collision), the elements of one slot are themselves laid out as an array inside the bucket. The array ends with a sentinel element whose value is NULL, and each preceding ngx_hash_elt_t stores one actual element.

3. Implementation of the Nginx wildcard hash table

3.1 Principle

A hash table that supports wildcards stores each wildcard key in a basic hash table, using the characters left after the wildcard is removed as the key. For example, for the trailing-wildcard key "www.test.*", a dedicated trailing-wildcard hash table is built, and the element is stored under the key "www.test". To check whether "www.test.cn" matches "www.test.*", you can use ngx_hash_find_wc_tail, a lookup method Nginx provides for this purpose; it reduces www.test.cn to the string www.test and then queries the table.

Similarly, for the leading-wildcard key "*.test.com", a dedicated leading-wildcard hash table is built, and the element is stored under the key "com.test.". To check whether smtp.test.com matches *.test.com, you can use ngx_hash_find_wc_head, which converts the queried smtp.test.com into the string com.test. and then queries the table.
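
A small C sketch of the key transformations just described. head_wildcard_key, tail_wildcard_key and reverse_labels are hypothetical names; in Nginx itself this conversion is done when keys are added with ngx_hash_add_key().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the dot-separated labels of src[0..len) to dst in reverse order,
 * e.g. "test.com" -> "com.test". dst needs room for len + 1 bytes. */
static void reverse_labels(const char *src, size_t len, char *dst)
{
    size_t out = 0;
    size_t end = len;

    assert(src != NULL && dst != NULL);

    while (end > 0) {
        size_t start = end;

        while (start > 0 && src[start - 1] != '.') {
            start--;                      /* find the start of the last label */
        }

        memcpy(dst + out, src + start, end - start);
        out += end - start;

        if (start == 0) {
            break;                        /* that was the first label */
        }

        dst[out++] = '.';
        end = start - 1;                  /* continue before the '.' */
    }

    dst[out] = '\0';
}

/* "*.test.com" -> "com.test." ; returns -1 if src is not "*.<name>".
 * dst needs at least strlen(src) bytes. */
static int head_wildcard_key(const char *src, char *dst)
{
    size_t len = strlen(src);

    if (len < 3 || src[0] != '*' || src[1] != '.') {
        return -1;
    }

    reverse_labels(src + 2, len - 2, dst);
    strcat(dst, ".");
    return 0;
}

/* "www.test.*" -> "www.test" ; returns -1 if src is not "<name>.*".
 * dst needs at least strlen(src) bytes. */
static int tail_wildcard_key(const char *src, char *dst)
{
    size_t len = strlen(src);

    if (len < 3 || src[len - 2] != '.' || src[len - 1] != '*') {
        return -1;
    }

    memcpy(dst, src, len - 2);
    dst[len - 2] = '\0';
    return 0;
}
```

For a query, reverse_labels("smtp.test.com", 13, buf) yields "com.test.smtp", whose prefix "com.test." is exactly the stored key of "*.test.com".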

3.2 Related structures

3.2.1 ngx_hash_wildcard_t structure

typedef struct {
    /* The basic hash table */
    ngx_hash_t        hash;
    /* When this ngx_hash_wildcard_t wildcard hash table is itself an element of some
     * container, this value pointer can point to user data */
    void             *value;
} ngx_hash_wildcard_t;

3.2.2 ngx_hash_combined_t structure

typedef struct {
    /* Basic hash table for exact matches */
    ngx_hash_t            hash;
    /* Hash table for leading-wildcard lookups */
    ngx_hash_wildcard_t  *wc_head;
    /* Hash table for trailing-wildcard lookups */
    ngx_hash_wildcard_t  *wc_tail;
} ngx_hash_combined_t;

Note: for keys in the leading-wildcard hash table, the * wildcard is removed, the rest of the key is split at the "." separators, and the labels are stored in reverse order as the element's key. Queries are processed the same way.

3.2.3 ngx_hash_keys_arrays_t structure

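typedef struct {
    /* keys_hash, dns_wc_head_hash and dns_wc_tail_hash below are all simple hash tables;
     * hsize is their number of slots, and their simple hash function also takes the
     * remainder modulo hsize */
    ngx_uint_t        hsize;

    /* Memory pool for permanent allocations */
    ngx_pool_t       *pool;
    /* Temporary memory pool; all memory for the dynamic arrays below comes from it */
    ngx_pool_t       *temp_pool;

    /* Dynamic array of ngx_hash_key_t structures holding the elements whose keys
     * contain no wildcard */
    ngx_array_t       keys;
    /* A very simple hash table: an array of hsize elements, each of which is an
     * ngx_array_t dynamic array. As the user adds elements, each element's ngx_str_t
     * key is appended to the ngx_array_t selected by its hash code. None of these
     * keys may contain a wildcard; they are for exact matching */
    ngx_array_t      *keys_hash;

    /* Dynamic array of ngx_hash_key_t structures holding the intermediate keys
     * generated from keys with a leading wildcard */
    ngx_array_t       dns_wc_head;
    /* A very simple hash table: an array of hsize elements, each of which is an
     * ngx_array_t dynamic array. As the user adds elements, each element's ngx_str_t
     * key is appended to the ngx_array_t selected by its hash code. All of these
     * keys have a leading wildcard */
    ngx_array_t      *dns_wc_head_hash;

    /* Dynamic array of ngx_hash_key_t structures holding the intermediate keys
     * generated from keys with a trailing wildcard */
    ngx_array_t       dns_wc_tail;
    /* A very simple hash table: an array of hsize elements, each of which is an
     * ngx_array_t dynamic array. As the user adds elements, each element's ngx_str_t
     * key is appended to the ngx_array_t selected by its hash code. All of these
     * keys have a trailing wildcard */
    ngx_array_t      *dns_wc_tail_hash;
} ngx_hash_keys_arrays_t;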

3.3 Wildcard hash table functions

3.3.1 ngx_hash_wildcard_init(): Initialize wildcard hash table

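/*
 * Parameters:
 * - hinit: pointer to the hash table initialization structure
 * - names: start address of an array whose elements are ngx_hash_key_t structures
 *          holding the elements to be added to the hash table (the keys of these
 *          elements contain either a leading or a trailing wildcard)
 * - nelts: number of elements in the names array
 *
 * Purpose:
 * Initializes a wildcard hash table (leading or trailing).
 */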
ngx_int_t ngx_hash_wildcard_init(ngx_hash_init_t *hinit, ngx_hash_key_t *names,
    ngx_uint_t nelts)
{
    size_t                len, dot_len;
    ngx_uint_t            i, n, dot;
    ngx_array_t           curr_names, next_names;
    ngx_hash_key_t       *name, *next_name;
    ngx_hash_init_t       h;
    ngx_hash_wildcard_t  *wdc;

    /* Allocate from temp_pool an array curr_names with room for nelts elements of
     * size sizeof(ngx_hash_key_t) */
    if (ngx_array_init(&curr_names, hinit->temp_pool, nelts,
                       sizeof(ngx_hash_key_t))
        != NGX_OK)
    {
        return NGX_ERROR;
    }

    /* Allocate from temp_pool an array next_names with room for nelts elements of
     * size sizeof(ngx_hash_key_t) */
    if (ngx_array_init(&next_names, hinit->temp_pool, nelts,
                       sizeof(ngx_hash_key_t))
        != NGX_OK)
    {
        return NGX_ERROR;
    }

    /* Go through all wildcard strings stored in the names array */
    for (n = 0; n < nelts; n = i) 
    {

#if 0
        ngx_log_error(NGX_LOG_ALERT, hinit->pool->log, 0,
                      "wc0: \"%V\"", &names[n].key);
#endif

        dot = 0;

        /* Scan the characters of this wildcard string until a '.' is found */
        for (len = 0; len < names[n].key.len; len++) 
        {
            if (names[n].key.data[len] == '.') 
            {
                /* Found: set the flag */
                dot = 1;
                break;
            }
        }

        /* Push a new ngx_hash_key_t entry onto the curr_names array */
        name = ngx_array_push(&curr_names);
        if (name == NULL) 
        {
            return NGX_ERROR;
        }

        /* If dot is 1, len is the offset of the '.' from the start of the wildcard string;
         * otherwise it is the length of the whole string */
        name->key.len  = len;
        /* Point name->key.data at the wildcard string (with key.len = len, the key is its first label) */
        name->key.data = names[n].key.data;
        /* Hash this key with the key hash function; the result determines its position
         * in the hash table */
        name->key_hash = hinit->key(name->key.data, name->key.len);
        /* Point to the data structure that is meaningful to the user */
        name->value    = names[n].value;

#if 0
        ngx_log_error(NGX_LOG_ALERT, hinit->pool->log, 0,
                      "wc1: \"%V\" %ui", &name->key, dot);
#endif

        dot_len = len + 1;

        /* If a '.' was found above, increment len so that it includes the '.' */
        if (dot) 
        {
            len++;
        }

        next_names.nelts = 0;

        /* len differs from the key length only when a '.' was found and more characters follow it */
        if (names[n].key.len != len) 
        {
            /* Push a new ngx_hash_key_t entry onto the next_names array */
            next_name = ngx_array_push(&next_names);
            if (next_name == NULL)
            {
                return NGX_ERROR;
            }
            
            /* Store the part of the wildcard string after its first '.' in next_name */
            next_name->key.len  = names[n].key.len - len;
            next_name->key.data = names[n].key.data + len;
            next_name->key_hash = 0;
            next_name->value    = names[n].value;

#if 0
            ngx_log_error(NGX_LOG_ALERT, hinit->pool->log, 0,
                          "wc2: \"%V\"", &next_name->key);
#endif
        }

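        /* Here n is the index, in names, of the first wildcard string not yet processed.
         * This loop is an optimization: it compares the current wildcard string with the
         * following ones in names, and while the part before the '.' is identical, it adds
         * the part after the '.' of each such string directly to next_names */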
        for (i = n + 1; i < nelts; i++) 
        {
            /* Compare the first len characters of the current string with the next one;
             * if they differ, leave the loop, otherwise continue */
            if (ngx_strncmp(names[n].key.data, names[i].key.data, len) != 0) 
            {
                break;
            }

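            /* If the current string has no '.', the next string only belongs to the same group
             * when it is exactly this label or has a '.' right after it; otherwise stop */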
            if (!dot
                && names[i].key.len > len
                && names[i].key.data[len] != '.')
            {
                break;
            }

            /* Push a new ngx_hash_key_t entry onto the next_names array */
            next_name = ngx_array_push(&next_names);
            if (next_name == NULL) 
            {
                return NGX_ERROR;
            }

            next_name->key.len  = names[i].key.len  - dot_len;
            next_name->key.data = names[i].key.data + dot_len;
            next_name->key_hash = 0;
            next_name->value    = names[i].value;

#if 0
            ngx_log_error(NGX_LOG_ALERT, hinit->pool->log, 0,
                          "wc3: \"%V\"", &next_name->key);
#endif
        }

        /* If next_names holds any elements, build a nested wildcard hash table from them */
        if (next_names.nelts)
        {

            h = *hinit;
            h.hash = NULL;

            if (ngx_hash_wildcard_init(&h, (ngx_hash_key_t *) next_names.elts,
                                       next_names.nelts)
                != NGX_OK)
            {
                return NGX_ERROR;
            }

            wdc = (ngx_hash_wildcard_t *) h.hash;

            if (names[n].key.len == len) 
            {
                wdc->value = names[n].value;
            }

            name->value = (void *) ((uintptr_t) wdc | (dot ? 3 : 2));

        } 
        else if (dot) 
        {
            name->value = (void *) ((uintptr_t) name->value | 1);
        }
    }

    if (ngx_hash_init(hinit, (ngx_hash_key_t *) curr_names.elts,
                      curr_names.nelts)
        != NGX_OK)
    {
        return NGX_ERROR;
    }

    return NGX_OK;
}
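
In the code above, ngx_hash_wildcard_init() stores flags in the two low bits of name->value: bit 1 (value 2) marks a pointer to a nested ngx_hash_wildcard_t, and bit 0 (value 1) records that the label was followed by a '.'. This only works because those pointers are at least 4-byte aligned, so their low two bits are always zero. A minimal C sketch of the tagging and untagging follows; tag_ptr, ptr_tag, untag_ptr and the TAG_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_DOT    1u   /* bit 0: the label was followed by '.' */
#define TAG_NESTED 2u   /* bit 1: the pointer refers to a nested wildcard table */
#define TAG_MASK   3u

/* Store a 2-bit tag in the low bits of an aligned pointer. */
static void *tag_ptr(void *p, uintptr_t tag)
{
    /* Only safe if p is at least 4-byte aligned, so its low two bits are 0. */
    assert(((uintptr_t) p & TAG_MASK) == 0);
    assert(tag <= TAG_MASK);
    return (void *) ((uintptr_t) p | tag);
}

/* Read the tag back. */
static uintptr_t ptr_tag(const void *p)
{
    return (uintptr_t) p & TAG_MASK;
}

/* Clear the tag to recover the original pointer. */
static void *untag_ptr(const void *p)
{
    return (void *) ((uintptr_t) p & ~(uintptr_t) TAG_MASK);
}
```

The lookup functions reverse this with the same mask: they test value & 2 to decide whether to descend into a nested table, and clear the tag with value & ~3 before using the pointer.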

[Figure: structure of the ngx_hash_combined_t wildcard hash table]

3.4 Example: using hash tables with wildcards

The value pointer of each hash table element (ngx_hash_elt_t) points to a TestWildcardHashNode structure, defined as follows:

typedef struct {
    /* Used as the key in the hash table */
    ngx_str_t servername;
    /* Used only to tell the elements apart */
    ngx_int_t seq;
} TestWildcardHashNode;

Each element's key is its servername string. First, define the ngx_hash_init_t and ngx_hash_keys_arrays_t variables needed to initialize the hash tables:

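/* Structure used to initialize the hash tables */
ngx_hash_init_t hash;
/* ngx_hash_keys_arrays_t collects the elements before they are added to the hash tables;
 * keys may contain wildcards */
ngx_hash_keys_arrays_t ha;
/* Hash table that supports wildcards */
ngx_hash_combined_t combinedHash;

ngx_memzero(&ha, sizeof(ngx_hash_keys_arrays_t));
/* Zero combinedHash so that any table left unbuilt below stays empty (NULL) */
ngx_memzero(&combinedHash, sizeof(ngx_hash_combined_t));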

combinedHash is the variable that will hold the result: it contains the three hash tables (one embedded, two by pointer), which are filled in one after another below.

/* The temporary pool is only needed to initialize the wildcard hash tables and can be destroyed afterwards */
ha.temp_pool = ngx_create_pool(16384, cf->log);
if (ha.temp_pool == NULL)
{
    return NGX_ERROR;
}

/* Assume this example lives in an ngx_http_xxx_postconf function, so the pool of the
 * ngx_conf_t cf is used as the hash tables' memory pool */
ha.pool = cf->pool;

/* Call ngx_hash_keys_array_init to initialize ha, preparing it for the elements added next */
if (ngx_hash_keys_array_init(&ha, NGX_HASH_LARGE) != NGX_OK)
{
    return NGX_ERROR;
}

The following code creates an array testHashNode[3] of three TestWildcardHashNode structures. They represent, respectively, an element matched by a leading wildcard, an element matched by a trailing wildcard, and an element that requires an exact match.

ngx_uint_t i;
TestWildcardHashNode testHashNode[3];

testHashNode[0].servername.len = ngx_strlen("*.test.com");
testHashNode[0].servername.data = ngx_pcalloc(cf->pool, ngx_strlen("*.test.com"));
ngx_memcpy(testHashNode[0].servername.data, "*.test.com", ngx_strlen("*.test.com"));

testHashNode[1].servername.len = ngx_strlen("www.test.*");
testHashNode[1].servername.data = ngx_pcalloc(cf->pool, ngx_strlen("www.test.*"));
ngx_memcpy(testHashNode[1].servername.data, "www.test.*", ngx_strlen("www.test.*"));

testHashNode[2].servername.len = ngx_strlen("www.test.com");
testHashNode[2].servername.data = ngx_pcalloc(cf->pool, ngx_strlen("www.test.com"));
ngx_memcpy(testHashNode[2].servername.data, "www.test.com", ngx_strlen("www.test.com"));

for (i = 0; i < 3; i++)
{
    testHashNode[i].seq = i;
    /* The flag must be NGX_HASH_WILDCARD_KEY for keys with wildcards to be processed */
    if (ngx_hash_add_key(&ha, &testHashNode[i].servername,
                         &testHashNode[i], NGX_HASH_WILDCARD_KEY) != NGX_OK)
    {
        return NGX_ERROR;
    }
}

Before calling the initialization functions, set the members of ngx_hash_init_t, such as the slot size and the hash function:

hash.key         = ngx_hash_key_lc;
hash.max_size    = 100;
hash.bucket_size = 48;
hash.name        = "test_server_name_hash";
hash.pool        = cf->pool;

The keys dynamic array of ha holds the keys that require an exact match. If it is not empty, initialize the first hash table, the exact-match one:

if (ha.keys.nelts)
{
    /* The hash pointer in ngx_hash_init_t must be set explicitly to the exact-match
     * table in combinedHash */
    hash.hash = &combinedHash.hash;
    /* Initializing the exact-match table does not use the temporary pool */
    hash.temp_pool = NULL;

    /* Pass the keys dynamic array straight to ngx_hash_init; on success, the hash
     * pointer in ngx_hash_init_t is the initialized hash table */
    if (ngx_hash_init(&hash, ha.keys.elts, ha.keys.nelts) != NGX_OK)
    {
        return NGX_ERROR;
    }
}

Next, initialize the leading-wildcard hash table:

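if (ha.dns_wc_head.nelts)
{
    /* ngx_hash_wildcard_init() groups adjacent keys that share a label, so with several
     * wildcard keys the array must be sorted first (Nginx's ngx_http_server_names() uses
     * ngx_qsort() for this); with a single key, as here, no sorting is needed */
    hash.hash = NULL;
    /* ngx_hash_wildcard_init needs the temporary memory pool */
    hash.temp_pool = ha.temp_pool;
    if (ngx_hash_wildcard_init(&hash, ha.dns_wc_head.elts, ha.dns_wc_head.nelts) != NGX_OK)
    {
        return NGX_ERROR;
    }

    /* The hash pointer in ngx_hash_init_t now points to the table built by
     * ngx_hash_wildcard_init; store it in combinedHash.wc_head, the leading-wildcard
     * table pointer */
    combinedHash.wc_head = (ngx_hash_wildcard_t *) hash.hash;
}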

Then initialize the trailing-wildcard hash table:

if (ha.dns_wc_tail.nelts)
{
    hash.hash = NULL;
    hash.temp_pool = ha.temp_pool;
    if (ngx_hash_wildcard_init(&hash, ha.dns_wc_tail.elts, ha.dns_wc_tail.nelts) != NGX_OK)
    {
        return NGX_ERROR;
    }

    /* The hash pointer in ngx_hash_init_t now points to the table built by
     * ngx_hash_wildcard_init; store it in combinedHash.wc_tail, the trailing-wildcard
     * table pointer */
    combinedHash.wc_tail = (ngx_hash_wildcard_t *) hash.hash;
}

At this point the temporary memory pool is no longer needed: the arrays and simple hash tables in ngx_hash_keys_arrays_t can be discarded, which only requires destroying the temp_pool pool:

ngx_destroy_pool(ha.temp_pool);

Now check that the hash tables work. First, look up the key www.test.org, which should match the element www.test.* in the trailing-wildcard hash table:

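/* First define the string to look up, findServer */
ngx_str_t findServer;
findServer.len = ngx_strlen("www.test.org");
/* The key is copied into pool memory here, although a lookup does not need writable
 * memory: ngx_hash_key_lc lowercases only while hashing and leaves the buffer unchanged.
 * (Writable memory matters when adding keys, because ngx_hash_add_key lowercases keys in
 * place unless NGX_HASH_READONLY_KEY is given.) Stored keys are lowercase and compared
 * byte by byte, so the query string itself must already be lowercase */
findServer.data = ngx_pcalloc(cf->pool, ngx_strlen("www.test.org"));
ngx_memcpy(findServer.data, "www.test.org", ngx_strlen("www.test.org"));

/* ngx_hash_find_combined finds the element for www.test.* and returns the user data
 * it points to, i.e. testHashNode[1] */
TestWildcardHashNode *findHashNode = ngx_hash_find_combined(&combinedHash,
        ngx_hash_key_lc(findServer.data, findServer.len), findServer.data, findServer.len);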

If no query is found, the value of findHashNode is NULL.

Next, look up www.test.com. All three nodes, testHashNode[0], testHashNode[1] and testHashNode[2], match it, because *.test.com, www.test.* and www.test.com all match www.test.com. However, since exact matches take priority, ngx_hash_find_combined returns the address of testHashNode[2], the element for www.test.com.
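
To illustrate the exact-match-first rule, here is a deliberately naive C model of the priority that ngx_hash_find_combined applies: exact match, then leading wildcard, then trailing wildcard. It scans a plain array with string comparisons instead of using Nginx's three hash tables, and within one kind it prefers the longest pattern, an assumption meant to approximate the longest-match behavior of the nested wildcard tables. server_entry, match_head, match_tail and find_combined are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *pattern;   /* "www.test.com", "*.test.com" or "www.test.*" */
    int         id;
} server_entry;

/* "*.test.com" matches any name ending in ".test.com" (but not "test.com"). */
static int match_head(const char *pattern, const char *name)
{
    size_t plen = strlen(pattern), nlen = strlen(name);

    if (plen < 3 || pattern[0] != '*' || pattern[1] != '.') {
        return 0;
    }
    /* compare the suffix including its leading dot, e.g. ".test.com" */
    return nlen > plen - 1 && strcmp(name + nlen - (plen - 1), pattern + 1) == 0;
}

/* "www.test.*" matches any name starting with "www.test." */
static int match_tail(const char *pattern, const char *name)
{
    size_t plen = strlen(pattern);

    if (plen < 3 || pattern[plen - 2] != '.' || pattern[plen - 1] != '*') {
        return 0;
    }
    return strlen(name) > plen - 1 && strncmp(name, pattern, plen - 1) == 0;
}

/* Id of the best match, or -1: pass 0 = exact, 1 = leading, 2 = trailing. */
static int find_combined(const server_entry *e, size_t n, const char *name)
{
    assert(name != NULL);

    for (int pass = 0; pass < 3; pass++) {
        int    best = -1;
        size_t best_len = 0;

        for (size_t i = 0; i < n; i++) {
            int hit = pass == 0 ? strcmp(e[i].pattern, name) == 0
                    : pass == 1 ? match_head(e[i].pattern, name)
                    : match_tail(e[i].pattern, name);
            size_t plen = strlen(e[i].pattern);

            if (hit && plen > best_len) {
                best = e[i].id;
                best_len = plen;
            }
        }
        if (best != -1) {
            return best;
        }
    }
    return -1;
}
```

With the three patterns from this example, www.test.com resolves to id 2 (the exact entry), smtp.test.com to id 0 (*.test.com) and www.test.org to id 1 (www.test.*).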

findServer.len = ngx_strlen("www.test.com");
findServer.data = ngx_pcalloc(cf->pool, ngx_strlen("www.test.com"));
ngx_memcpy(findServer.data, "www.test.com", ngx_strlen("www.test.com"));

findHashNode = ngx_hash_find_combined(&combinedHash, 
                    ngx_hash_key_lc(findServer.data, findServer.len),
                    findServer.data, findServer.len);


Origin http://43.154.161.224:23101/article/api/json?id=324814581&siteId=291194637