Hand-Writing a Hash Table (HashTable) in C

What Is a Hash Table

A hash table takes advantage of the array's support for random access by subscript. In that sense, a hash table is really an extension of an array: it evolved from the array, and without arrays there would be no hash tables.

Hash Function

A hash function maps the key of the node we want to insert to a value; we can write it as hash(key). It is practically impossible to find a hash function that maps every distinct key to a distinct value. Even well-known algorithms such as MD5, SHA, and CRC cannot completely avoid hash collisions. Moreover, because the array's storage space is limited, the probability of collision rises further.

It is therefore essentially impossible to find a perfect, collision-free hash function, and even if one existed, the time and computation cost would be prohibitive. Instead, we resolve hash collisions by other means.

Hash Collision

Open Addressing

The core idea of open addressing is that when a collision occurs, we probe for a free slot and insert the element there. How do we find that new slot? A simple scheme is linear probing: when inserting, if the slot computed by the hash function is already occupied, we scan forward from that position one slot at a time (wrapping around at the end of the array) until we find a free one.

Linked List Method (Chaining)

The linked list method (chaining) is the more commonly used way to resolve hash collisions, and it is much simpler than open addressing. In a chained hash table, each "bucket" (or "slot") holds a linked list, and all elements whose hashes map to the same slot are placed in that slot's list.

Java's HashMap uses chaining to resolve collisions. Even with a reasonable load factor and hash function, some chains will inevitably grow long, and once a chain is too long, it seriously hurts HashMap's performance.

Therefore, JDK 1.8 further optimized HashMap by introducing red-black trees: when a chain grows too long (by default, longer than 8 nodes), it is converted into a red-black tree, whose fast insert, delete, and lookup operations improve HashMap's performance. When the node count of such a tree drops low enough (below 6 by default), it is converted back into a linked list, because with few elements the cost of keeping the tree balanced outweighs any advantage over a plain list.

Load Factor

The larger the load factor, the more elements the hash table holds and the fewer free slots remain, so the probability of collision rises. Insertion then requires many probes (or a long chain traversal), and lookups slow down accordingly.

The default maximum load factor is 0.75. When the number of elements in a HashMap exceeds 0.75 * capacity (where capacity is the size of the bucket array), the table is resized, doubling in size each time.

The Code

Below is a hash table implementation in C. It resolves collisions with the linked list method; for simplicity, long chains are kept as plain linked lists rather than converted to red-black trees.

#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>

// A key/value node in a bucket's chain
typedef struct Node {
    int key;
    int val;
    struct Node *next;
} Node;

typedef struct HashMap {
    int size;              // number of buckets
    int count;             // number of stored elements
    struct Node **hashmap; // bucket array of chain heads
} HashMap;

// Hash function: for int keys we use the value itself;
// callers reduce it modulo the table size
unsigned int hash(int n) {
    return (unsigned int) n;
}

// Create a node
Node *creatNode(int key, int val) {
    Node *node = (Node *) malloc(sizeof(Node));
    node->key = key;
    node->val = val;
    node->next = NULL;
    return node;
}

// Create a hash table with 8 buckets
HashMap *createHashMap() {
    HashMap *H = (HashMap *) malloc(sizeof(HashMap));
    H->size = 8;
    H->count = 0;
    H->hashmap = (Node **) calloc(H->size, sizeof(Node *));
    return H;
}

// Double the table size and rehash. Because the size doubles, every node
// in old bucket i ends up either in bucket i or in bucket i + old size.
static void extend(HashMap *map) {
    int newsize = map->size * 2;
    Node **newhashmap = (Node **) calloc(newsize, sizeof(Node *));
    for (int i = 0; i < map->size; i++) {
        Node *node = map->hashmap[i];
        // Dummy heads for the two target chains
        Node *head1 = (Node *) malloc(sizeof(Node));
        Node *head2 = (Node *) malloc(sizeof(Node));
        Node *temp1 = head1;
        Node *temp2 = head2;
        while (node) {
            if (hash(node->key) % newsize != i) {
                // Moves to bucket i + map->size
                temp2->next = node;
                temp2 = temp2->next;
            } else {
                // Stays in bucket i
                temp1->next = node;
                temp1 = temp1->next;
            }
            node = node->next;
        }
        temp1->next = NULL;
        temp2->next = NULL;
        newhashmap[i] = head1->next;
        newhashmap[i + map->size] = head2->next;
        free(head1);
        free(head2);
    }
    map->size = newsize;
    free(map->hashmap);
    map->hashmap = newhashmap;
}

// Insert a key/value pair; returns false if the key already exists
bool insert(HashMap *map, int key, int val) {
    int hash_key = hash(key) % map->size;
    if (map->hashmap[hash_key] == NULL) {
        // No collision: the bucket is empty
        map->count++;
        map->hashmap[hash_key] = creatNode(key, val);
    } else {
        // Collision: walk the chain, rejecting duplicate keys
        Node *head = map->hashmap[hash_key];
        if (head->key == key) return false;
        while (head->next && head->next->key != key) head = head->next;
        if (head->next == NULL) {
            map->count++;
            head->next = creatNode(key, val);
        } else if (head->next->key == key) return false;
    }
    // Expand once the load factor exceeds 0.75. The cast to double matters:
    // with integer division, count / size stays 0 until count reaches size.
    if ((double) map->count / map->size > 0.75) extend(map);
    return true;
}

// Look up a key; on success, stores the value in *v and returns true
bool find(HashMap *map, int key, int *v) {
    int hash_key = hash(key) % map->size;
    if (map->hashmap[hash_key] == NULL) return false;
    Node *head = map->hashmap[hash_key];
    if (head->key == key) {
        *v = head->val;
        return true;
    }
    while (head->next && head->next->key != key) head = head->next;
    if (head->next == NULL) return false;
    *v = head->next->val;
    return true;
}

// Delete a node by key (no-op if the key is absent)
void delete(HashMap *map, int key) {
    int hash_key = hash(key) % map->size;
    if (map->hashmap[hash_key] == NULL) return;
    Node *head = map->hashmap[hash_key];
    if (head->key == key) {
        // The key is at the head of the chain
        map->hashmap[hash_key] = head->next;
        map->count--;
        free(head);
        return;
    }
    while (head->next && head->next->key != key) head = head->next;
    if (head->next == NULL) return;
    // head->next holds the key: unlink and free it
    Node *temp = head->next;
    head->next = head->next->next;
    map->count--;
    free(temp);
}


int main() {
    HashMap *H = createHashMap();
    for (int i = 0; i < 1024; i++) {
        insert(H, i, i);
    }
    printf("H size is %d\n", H->size);
    printf("H count is %d\n", H->count);
    delete(H, 0);
    int v = 0;
    if (find(H, 0, &v)) printf("%d\n", v);
    else printf("not found\n");
    if (find(H, 4, &v)) printf("%d\n", v);
    else printf("not found\n");
    return 0;
}

Origin blog.csdn.net/weixin_43903639/article/details/129720549