[Data structure] Hash table - C language

1. Introduction

A hash table (the Chinese term is a transliteration of "hash"; it is also known as a scatter table) is a data structure that extends the idea of array subscript indexing.

 We all know that fetching a value by array subscript takes O(1) time; a hash table gives the subscript a special meaning.
For example: allocate another array and swap the two roles, so that the target value becomes the subscript and the original subscript becomes the stored value.
 To find the subscript of a target value, just index with the target value, and the lookup is O(1). However, since extra space is allocated to hold this information, the space complexity is O(N).
 The value used for indexing is called the key, and the value reached through the key is called val.
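The swap of subscript and value described above can be sketched with a plain array. This is only an illustration: the names `build_index` and `MAX_VALUE` are made up for this sketch, and it assumes every stored value is a small non-negative integer no larger than `MAX_VALUE`.

```c
#include <stddef.h>

#define MAX_VALUE 100 /* assumed upper bound on the stored values */

/* Fill index_of so that index_of[v] is the subscript of v in src,
 * or -1 when v does not occur. Assumes 0 <= src[i] <= MAX_VALUE. */
void build_index(const int *src, size_t n, int index_of[MAX_VALUE + 1])
{
    for (size_t v = 0; v <= MAX_VALUE; v++)
        index_of[v] = -1;            /* mark every value as absent */
    for (size_t i = 0; i < n; i++)
        index_of[src[i]] = (int)i;   /* value becomes subscript, subscript becomes value */
}
```

After `build_index`, answering "where is 10?" is a single array access, at the cost of the extra `index_of` array.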

Here are the uses of hashing that the blogger has run into so far~

  1. Removing the extra time cost caused by character order: anagrams (may need to be combined with two pointers)
  2. Finding the position of a given value: a subscript (array) or an address (a structure such as a singly linked list); for example, a container whose insertion, deletion and random access are all O(1)
  3. Counting the occurrences of a given value: finding the length of a contiguous run of distinct XX (characters or integers)
  4. Treating a plain data type as a hash table: the maximum product of word lengths
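As a small illustration of uses 1 and 3, an array of 26 counters can act as a hash table keyed by letter. The function name `is_anagram` is hypothetical, and the sketch assumes both strings contain only lowercase letters 'a' to 'z'; it is not taken from the column mentioned below.

```c
#include <stdbool.h>

/* Returns true when a and b use exactly the same letters the same
 * number of times. Assumes lowercase 'a'..'z' only. */
bool is_anagram(const char *a, const char *b)
{
    int count[26] = {0};             /* key: letter, val: occurrences */
    for (; *a != '\0'; a++)
        count[*a - 'a']++;
    for (; *b != '\0'; b++)
        count[*b - 'a']--;
    for (int i = 0; i < 26; i++)
        if (count[i] != 0)           /* some letter occurs a different number of times */
            return false;
    return true;
}
```

Counting makes the order of the characters irrelevant, so no sorting is needed. Note that this sketch also reports two identical strings as anagrams; some problem statements exclude that case.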

Of course, if you are interested in these questions, you can check out my column: Jianzhi Offer Special Breakthrough Edition

To make them easier to find: item 4 is in the integer chapter, item 1 is in the string chapter, and item 2 is in the hash table chapter.

2. Storage method

 Why talk about storage methods? Mainly to deal with conflicts in the hash table. When do conflicts occur? That depends mostly on how the hash function is implemented.

For example: we want to store [10, 1], where 10 is the target value in the original array and 1 is its subscript. With the roles swapped in the hash table, 10 is the key and 1 is the value. So where is it stored? Generally we map the key to a position; for example, 10 % 10 is 0, so it is stored at subscript 0 of the hash table.

Next, what about storing a pair like [0, 1]? 0 % 10 is still 0. Does it also go at subscript 0 of the hash table?

That position already holds a value. When storage positions clash like this, we call it a hash conflict (hash collision). Of course, our example here is deliberately simple. So how do we resolve it? Let's get to the point.

We use this example in the detailed explanations below, together with this node type:

typedef struct Hash
{
	int key;
	int val;
}HNode;

1. Open addressing

Assumption: the hash table starts with 0 stored items. We use the conflict example above, and note that we store the key as well. When a conflict occurs, we keep searching until we find a free position for the value. It is like urgently needing the toilet: you start from the stall nearest the door and check the stalls one by one. If you are lucky, the first stall is free (no conflict). If you are unlucky, you may not find a free stall at all (not enough space).

By analogy: the hash table is a large restroom (which must have free stalls). You go to your assigned stall. If it is empty, you put (key and val) in. If it is occupied, you move on until you find an empty stall, and then put (key and val) in there.

Some students will ask: why do we need to store the key here? Can we skip it?

For example: when searching the hash table for [10, 1] above, we still compute 10 % 10 first, which gives 0. Searching the hash table with this result alone is not reliable, because we may land on an entry that conflicts with it. We also need to confirm that the key stored there is 10 to be sure we have found the right entry.
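The probing described above can be sketched as follows. This is a minimal illustration, not the blogger's code: `Slot`, `oa_put`, `oa_get`, `slot_of` and `TABLE_SIZE` are names invented here, the step of one slot at a time (linear probing) is an assumption about how "going down" the stalls works, and keys are assumed to be non-negative. Deletion is left out because it needs extra bookkeeping.

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 10

typedef struct Slot {
    int key;
    int val;
    bool used;   /* false means the stall is empty */
} Slot;

/* Hypothetical hash function from the example: key % 10 (key >= 0). */
static int slot_of(int key) { return key % TABLE_SIZE; }

/* Store (key, val); returns false when every slot is taken. */
bool oa_put(Slot table[TABLE_SIZE], int key, int val)
{
    int start = slot_of(key);
    for (int step = 0; step < TABLE_SIZE; step++) {
        int i = (start + step) % TABLE_SIZE;   /* wrap around at the end */
        if (!table[i].used || table[i].key == key) {
            table[i].key = key;
            table[i].val = val;
            table[i].used = true;
            return true;
        }
    }
    return false;   /* not enough space */
}

/* Returns a pointer to the stored val, or NULL when key is absent. */
const int *oa_get(const Slot table[TABLE_SIZE], int key)
{
    int start = slot_of(key);
    for (int step = 0; step < TABLE_SIZE; step++) {
        int i = (start + step) % TABLE_SIZE;
        if (!table[i].used)
            return NULL;                 /* an empty slot ends the search */
        if (table[i].key == key)         /* the stored key confirms the match */
            return &table[i].val;
    }
    return NULL;
}
```

With a zero-initialised table, storing [10, 1] and then [0, 1] puts key 10 in slot 0 and key 0 in slot 1. The comparison `table[i].key == key` in `oa_get` is exactly why the key has to be stored.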

2. Separate chaining (zipper method)

  We use the same example here. Each subscript of the hash table actually holds a linked list. When a conflict occurs, we allocate another node at that subscript to store the conflicting value. Conflicting values share the same subscript, so we only need to search within that linked list.

Some students will also ask: won't the linked list get too long, so that the time complexity is no longer O(1)? The blogger's view is that how often conflicts occur depends on the hash function: as long as the hash function is well chosen, conflicts are nothing to fear. So even allowing for conflicts, the number of nodes on each list is a constant on average, and the time complexity is still O(1).
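A minimal sketch of this idea, assuming non-negative keys and the same key % 10 mapping as the example. The names `ChainNode`, `chain_find`, `chain_put` and `BUCKETS` are invented for this illustration, and freeing the lists is left out for brevity.

```c
#include <stdlib.h>

#define BUCKETS 10

typedef struct ChainNode {
    int key;
    int val;
    struct ChainNode *next;   /* next node that landed on the same subscript */
} ChainNode;

/* Walk the list at subscript key % BUCKETS; NULL when key is absent. */
ChainNode *chain_find(ChainNode *buckets[BUCKETS], int key)
{
    for (ChainNode *n = buckets[key % BUCKETS]; n != NULL; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}

/* Insert or update (key, val); returns 0 on success, -1 if malloc fails. */
int chain_put(ChainNode *buckets[BUCKETS], int key, int val)
{
    ChainNode *n = chain_find(buckets, key);
    if (n != NULL) {              /* key already stored: just change its value */
        n->val = val;
        return 0;
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->val = val;
    n->next = buckets[key % BUCKETS];   /* push onto the head of the list */
    buckets[key % BUCKETS] = n;
    return 0;
}
```

Keys 10 and 0 both map to subscript 0 and simply end up on the same list. Lookup cost is the length of that list, which is why the quality of the hash function matters.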

3. Hash function - macro function

Here we introduce a hash table implemented by an expert (uthash), which we can use directly on LeetCode! However, we still need to write our own functions around it to meet our actual needs.

We mainly introduce a few macros and a handle, which are usually enough~

1. Process handle

UT_hash_handle hh;

When defining the structure, we just add this line. There is no need to assign a value to this handle.

Then why is it called a handle? The library uses this member to delete, add, find, and modify a node in the hash table; the bookkeeping is left to it, which is why it is called a handle.

For example:

#include "uthash.h" // provides UT_hash_handle and the HASH_* macros

typedef struct Hash
{
	int key;
	int val;
	UT_hash_handle hh;
}HNode;

We will also use this structure below~

2. Find nodes

int type

HNode* hash = NULL;
HNode* ret = NULL;
int key = 10;
HASH_FIND_INT(hash,&key,ret);
// Lowercase the name and it is obvious: hash_find_int, i.e. find a node by an int key.

The three parameters here:
①: The hash table, given as an HNode* (the head pointer).
②: A pointer to the key to look for in the hash table. As for why it is a pointer, the blogger's understanding is that a macro is essentially text substitution: HASH_FIND_INT expands to the general HASH_FIND with sizeof(int) as the key length, and that macro only needs the key's address and size to hash its bytes. Working through a pointer is therefore more flexible than copying the key into a new variable, and it also avoids the extra copy.
③: Receives the result, which is put in ret. A macro has no return value! This parameter plays the role of the return value.

What is the result?
There are two cases: found / not found

  1. Found: ret points to the node that was found. At this point we can also modify its value.
  2. Not found: ret is NULL.

So we can tell whether the key was found by checking whether ret is NULL.

string

Here we are going to change the structure type~

typedef struct Hash
{
	char key[10];
	int val;
	UT_hash_handle hh;
}HNode;
HNode* hash = NULL;
HNode* ret = NULL;
char key[10] = "shun_hua";

HASH_FIND_STR(hash,key,ret);
// Likewise, lowercase the name: str simply means string.

The purpose of the parameters is the same as above, but note that key is already a pointer here (the array name decays to a pointer to its first element), so there is no need to pass the key's address!

3. Add nodes

int type

// Here we use the structure above whose key is an int; don't mix them up~
HNode* hash = NULL;
HNode* ret = (HNode*)malloc(sizeof(HNode));
int key = 10;
int val = 1;
ret->key = key;
ret->val = val; 
HASH_ADD_INT(hash,key,ret);

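As for why the key here is not a pointer: the second argument of HASH_ADD_INT is not a variable at all but the name of the key field in the structure (here the member key). The macro reads the key from ret->key, so that field must be set before the call.

The third parameter: the node pointed to by ret, which is placed into the hash table.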

str type

Note: C has no string type; "string" here is just a convenient name for an array of char.

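HNode* hash = NULL;
HNode* ret = (HNode*)malloc(sizeof(HNode));
char key[10] = "shun_hua";
// For efficiency, a pointer could be used here instead, e.g. char *key = "shun_hua"
// A pointer can be passed as the string to look up, but a node whose key field is a pointer is added by pointer (in uthash, with HASH_ADD_KEYPTR).
int val = 1;
// For strings, use strcpy to copy the key into the node
strcpy(ret->key,key);
// This copies the '\0' terminator as well!
// memcpy would also work
ret->val = val;
HASH_ADD_STR(hash,key,ret);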

4. Delete a node

Here we need the find operation from before. After all, to delete a node you first need a node to delete~

HNode* hash = NULL;
HNode* ret = NULL;
int key = 10;
HASH_FIND_INT(hash,&key,ret);
if(ret!=NULL)
{
	HASH_DEL(hash,ret);
	free(ret);
}

Note the free call here: HASH_DEL only removes the node from the hash table and does not release the node's memory, so we need free to release it.

Note: deleting a node with a string key works the same way, so it is not listed here.

5. Delete all nodes

HNode* hash = NULL;
HNode* cur, *next;
HASH_ITER(hh,hash,cur,next) // hh here is the handle member
{
	HASH_DEL(hash,cur);
	free(cur);
}

We can think of this as deleting every node of a linked list. The macro looks a little strange because it expands to a for loop header. The code takes one element from the hash table while saving the next one, removes the element from the table and releases its memory, and repeats until there is no data left in the hash table!

In addition, we introduce two more general macros.

6.HASH_ADD

① Integer

HNode* hash = NULL;
HNode* ret = (HNode*)malloc(sizeof(HNode));
int key = 10;
int val = 1;
ret->key = key;
ret->val = val; 
HASH_ADD(hh,hash,key,sizeof(key),ret);

The fourth parameter uses sizeof

②String

HNode* hash = NULL;
HNode* ret = (HNode*)malloc(sizeof(HNode));
char key[10] = "key";
int val = 1;
strcpy(ret->key,key);
ret->val = val; 
HASH_ADD(hh,hash,key,strlen(key),ret);

The fourth parameter uses strlen

7.HASH_FIND

① Integer

HNode* hash = NULL;
HNode* ret = NULL;
int key = 10;
HASH_FIND(hh,hash,&key,sizeof(key),ret);

The fourth parameter uses sizeof

②String

HNode* hash = NULL;
HNode* ret = NULL;
char key[10] = "shun_hua"; 
HASH_FIND(hh,hash,key,strlen(key),ret);
// Here key is the address of the string.

The fourth parameter uses strlen


Supplement: If this does not feel detailed enough, you can read this article, a summary of the usage of the open source hash project uthash; it covers how to add pointer keys, with an example at the end of that article.

Summary

That wraps up this article. As the saying goes, ten thousand readers see ten thousand Hamlets; I wonder how much you have gained from this article? In short, I hope it is helpful to fellow C friends! See you in the next article!

Origin blog.csdn.net/Shun_Hua/article/details/131202537