Learning "Introduction to Algorithms" (14)----Hash table (C language)


Foreword

This article explains the hash table, focusing on hash functions and on a C implementation of a hash table that resolves collisions by chaining.


1. Hash table

1. What is a direct addressing table?

A direct-address table can simply be treated as an ordinary array.
We access an element directly through its array index. This way of locating an element is called direct addressing,
and storage used in this way is a direct-address table.
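
As a minimal sketch of the idea above, the key itself can serve as the array index. The names `DirectTable`, `dt_insert`, `dt_search`, `dt_delete` and the constant `UNIVERSE` are hypothetical and chosen only for this illustration; keys are assumed to be integers in the range 0 to UNIVERSE-1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UNIVERSE 16            /* assumed key range: 0 .. UNIVERSE-1 */

typedef struct {
    bool used[UNIVERSE];       /* used[k] is true when key k is stored */
    int  value[UNIVERSE];      /* data stored under key k              */
} DirectTable;

/* Store v under key k; the slot index is k itself. */
void dt_insert(DirectTable *t, int k, int v)
{
    assert(k >= 0 && k < UNIVERSE);
    t->used[k] = true;
    t->value[k] = v;
}

/* Return a pointer to the value stored under k, or NULL if k is absent. */
const int *dt_search(const DirectTable *t, int k)
{
    assert(k >= 0 && k < UNIVERSE);
    return t->used[k] ? &t->value[k] : NULL;
}

/* Remove key k; the slot simply becomes unused. */
void dt_delete(DirectTable *t, int k)
{
    assert(k >= 0 && k < UNIVERSE);
    t->used[k] = false;
}
```

Every operation touches exactly one array slot, which is why direct addressing is fast but needs one slot for every possible key.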

2. What is a hash table?

A hash table generalizes the idea of an ordinary array.
Like a direct-address table, a hash table is a dynamic-set structure that supports inserting, deleting, and searching for elements.
The difference between a hash table and a direct-address table comes down to this:

1. In a hash table, the storage position is computed by applying a hash function to the key k.
2. In a direct-address table, the storage position is the key k itself.

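Here is a vivid, though not entirely accurate, analogy:

Suppose a Wi-Fi password is chosen in one of two ways:

1. Direct addressing is like setting the password to 88888888, eight 8s. There is essentially no transformation at all.
2. A hash table is like setting the password to something like name initials + the last digits of an ID number: the result is produced by a "hash function" that takes personal information as its input.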

3. Why do you need a hash table?

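A hash table applies to far more situations than a direct-address table.
The reasons are:

1. The range of possible keys k may far exceed the computer's memory, so using k directly as the storage position is clearly impossible.
2. Using k directly as the storage position exposes the key itself. In fields with strict security requirements, stored information is then easy to extract, which is a serious risk.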

4. The core point of the hash table

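The key to a hash table comes down to solving three problems:

1. Designing a good hash function.
2. Resolving the collision when two elements must go into the same slot.
3. Handling keys k that are not natural numbers and so cannot be used directly as a storage position.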

2. The basic idea of the hash table

1. Hash function

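(1) Characteristics of a good hash function

A good hash function should have the following characteristics:

1. Each key should be equally likely to hash to any of the m slots.
2. Keys that are not natural numbers must first be converted to natural numbers (0, 1, 2, ...).
3. The set of keys must be mapped into the range 0 to m-1, where m is the number of slots.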
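
One common way to turn a non-numeric key into a natural number is to read a character string as a number in radix 128 and reduce it to the range 0 to m-1. The sketch below does this with Horner's rule; the function name `str_hash` and the bound on m are assumptions made for this example only.

```c
#include <assert.h>

/* Interpret a string as a radix-128 number and reduce it modulo m.
   Reducing after every character (Horner's rule) keeps the intermediate
   value small: with m <= 2^24, h * 128 + 255 always fits in 32 bits. */
unsigned str_hash(const char *s, unsigned m)
{
    assert(m > 0 && m <= (1u << 24));
    unsigned h = 0;
    for (; *s != '\0'; s++)
        h = (h * 128u + (unsigned char)*s) % m;
    return h;
}
```

For example, the string "pt" has character codes 112 and 116, so it corresponds to 112 * 128 + 116 = 14452, and `str_hash("pt", 701)` returns 14452 mod 701 = 432.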

(2) Three hash functions that work with chaining

Division method

Formula:
$$h(k) = k \bmod m$$
This method maps the key k to one of the m slots.
For the division method, m should be a prime that is not too close to an exact power of 2. The proof is omitted here, but a hash function with such an m usually performs well.
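
A minimal sketch of the formula above in C follows. The function name `div_hash` is hypothetical, and keys are assumed to be non-negative.

```c
#include <assert.h>

/* h(k) = k mod m, for a non-negative key k and m > 0 slots. */
unsigned div_hash(unsigned k, unsigned m)
{
    assert(m > 0);
    return k % m;
}
```

The choice of m matters. With m = 16, a power of 2, the keys 16, 32 and 48 all land in slot 0, because only the low 4 bits of the key are used. With the prime m = 13, the same keys go to slots 3, 6 and 9.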

Multiplication method

Formula:
$$h(k) = \lfloor m \,(kA \bmod 1) \rfloor$$
Here A is a constant with $0 < A < 1$, and $kA \bmod 1$ is the fractional part of $kA$.
An advantage of the multiplication method is that the choice of m is not critical. Typically $m = 2^x$ for some integer $x$.
Knuth suggests that $A = \frac{\sqrt{5}-1}{2} = 0.6180339887\ldots$ works well.
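
When m is a power of 2, the formula above can be computed with integer arithmetic only. The sketch below assumes 32-bit keys and approximates A by s / 2^32 with s = 2654435769, the integer part of A * 2^32 for the value of A given above. The names `mul_hash` and `GOLDEN_32` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* floor(A * 2^32) for A = (sqrt(5) - 1) / 2 */
#define GOLDEN_32 2654435769u

/* Multiplication method for a table of m = 2^p slots, 1 <= p <= 31. */
uint32_t mul_hash(uint32_t k, unsigned p)
{
    assert(p >= 1 && p <= 31);
    /* Unsigned multiplication wraps modulo 2^32, so frac holds the
       fractional part of k * A scaled by 2^32. */
    uint32_t frac = k * GOLDEN_32;
    /* The top p bits of frac equal floor(m * (kA mod 1)). */
    return frac >> (32 - p);
}
```

For k = 123456 and p = 14 (m = 16384), the product k * s is 327706022297664, its low 32 bits are 17612864, and the top 14 bits of that value give h(k) = 67.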

Universal hashing

Formula:
$$h_{ab}(k) = ((ak + b) \bmod p) \bmod m$$
where $a \in \{1, 2, \ldots, p-1\}$ and $b \in \{0, 1, \ldots, p-1\}$.
Note: $p$ is a prime large enough that every key satisfies $0 \leqslant k \leqslant p-1$.
Universal hashing defines a class of hash functions with $p(p-1)$ members, one for each pair $(a, b)$. The function actually used is chosen at random from this class.
It has been proved that, for a function chosen this way, any two distinct keys collide with probability at most $1/m$, so no fixed set of keys can consistently produce poor behavior.
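
The hash function class above can be sketched in C as follows. The type `UniversalHash` and the functions `uh_pick` and `uh_hash` are hypothetical names. The caller is assumed to pass a prime p below 2^31, and `rand()` stands in for a proper random source only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t a;   /* multiplier, 1 .. p-1     */
    uint64_t b;   /* offset, 0 .. p-1         */
    uint64_t p;   /* prime larger than any key */
    uint64_t m;   /* number of slots          */
} UniversalHash;

/* Choose one member of the class at random. p must be prime (not checked). */
UniversalHash uh_pick(uint64_t p, uint64_t m)
{
    assert(p >= 2 && p <= 2147483647ull && m > 0);
    UniversalHash h;
    h.a = 1 + (uint64_t)rand() % (p - 1);
    h.b = (uint64_t)rand() % p;
    h.p = p;
    h.m = m;
    return h;
}

/* h_ab(k) = ((a*k + b) mod p) mod m; a*k + b stays below 2^63 here. */
uint64_t uh_hash(const UniversalHash *h, uint64_t k)
{
    assert(k < h->p);
    return ((h->a * k + h->b) % h->p) % h->m;
}
```

With p = 17, m = 6, a = 3 and b = 4, the key 8 hashes to ((3 * 8 + 4) mod 17) mod 6 = 11 mod 6 = 5.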

2. Two ways to resolve collisions

(1) Chaining

Chaining turns each of the m slots into a linked list. A key k that collides is inserted into the list of its slot as a list node.
The structure is essentially an array of linked lists.
Note:

A doubly linked list makes deleting an element easier.
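
To illustrate the note about doubly linked lists, here is a sketch of one chain whose nodes carry both a `prev` and a `next` pointer. Given a pointer to a node, it can be unlinked without walking the chain to find its predecessor. The names `DNode`, `chain_push` and `chain_unlink` are hypothetical and are not part of the implementation later in this article, which uses singly linked lists.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DNode {
    int key;
    struct DNode *prev;
    struct DNode *next;
} DNode;

/* Insert node x at the head of the chain *head. */
void chain_push(DNode **head, DNode *x)
{
    assert(head != NULL && x != NULL);
    x->prev = NULL;
    x->next = *head;
    if (*head != NULL)
        (*head)->prev = x;
    *head = x;
}

/* Unlink node x from the chain *head in constant time. */
void chain_unlink(DNode **head, DNode *x)
{
    assert(head != NULL && x != NULL);
    if (x->prev != NULL)
        x->prev->next = x->next;
    else
        *head = x->next;          /* x was the first node */
    if (x->next != NULL)
        x->next->prev = x->prev;
    x->prev = NULL;
    x->next = NULL;
}
```

With a singly linked chain, the same deletion first has to search for the node before x, which costs time proportional to the chain length.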

(2) Open addressing

In open addressing, all elements are stored in the hash table itself.
Its advantage is that no extra storage is needed for pointers. The memory saved can provide more slots, which tends to reduce collisions and speed up retrieval.
Its disadvantage is that resolving collisions makes the hash function more complicated, and deleting elements is troublesome.
The difficulty lies in choosing the hash function.
Note:

None of the three probing schemes below fully meets the requirement of uniform hashing; double hashing comes closest.

Linear probing

Formula:
First take an ordinary hash function $h'$, called the auxiliary hash function. Then
$$h(k, i) = (h'(k) + i) \bmod m, \quad i = 0, 1, \ldots, m-1$$
where $i$ is the probe number.
This method first probes slot $T[h'(k)]$, then probes the following slots in order, wrapping around to $T[0]$ after $T[m-1]$.
This simple method easily causes keys to pile up in one contiguous range of slots (primary clustering), which makes probing more and more time-consuming.
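
The probe formula above, together with the earlier remark that deletion is troublesome in open addressing, can be sketched as follows. The table size `OA_M`, the markers `OA_EMPTY` and `OA_GONE`, and all function names are assumptions made for this example; keys are assumed to be non-negative integers and h'(k) = k mod m.

```c
#include <assert.h>

#define OA_M     11      /* number of slots, small for illustration */
#define OA_EMPTY (-1)    /* slot that has never held a key          */
#define OA_GONE  (-2)    /* slot whose key was deleted              */

/* Auxiliary hash h'(k). */
static int oa_aux(int k) { return k % OA_M; }

/* Linear probing: h(k, i) = (h'(k) + i) mod m. */
static int oa_probe(int k, int i) { return (oa_aux(k) + i) % OA_M; }

void oa_init(int t[OA_M])
{
    for (int j = 0; j < OA_M; j++)
        t[j] = OA_EMPTY;
}

/* Return the slot where k was stored, or -1 if the table is full. */
int oa_insert(int t[OA_M], int k)
{
    assert(k >= 0);
    for (int i = 0; i < OA_M; i++) {
        int j = oa_probe(k, i);
        if (t[j] == OA_EMPTY || t[j] == OA_GONE) {
            t[j] = k;
            return j;
        }
    }
    return -1;
}

/* Return the slot holding k, or -1 if k is absent. */
int oa_search(const int t[OA_M], int k)
{
    assert(k >= 0);
    for (int i = 0; i < OA_M; i++) {
        int j = oa_probe(k, i);
        if (t[j] == OA_EMPTY)
            return -1;            /* a never-used slot ends the search */
        if (t[j] == k)
            return j;
    }
    return -1;
}

/* Mark the slot as deleted rather than empty, so that keys stored
   further along the same probe sequence can still be found. */
void oa_delete(int t[OA_M], int k)
{
    int j = oa_search(t, k);
    if (j >= 0)
        t[j] = OA_GONE;
}
```

Inserting 5, 16 and 27 (all congruent to 5 modulo 11) fills slots 5, 6 and 7. After deleting 16, a search for 27 still succeeds because slot 6 is marked as deleted rather than empty.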

Quadratic probing

Formula:
Again choose an auxiliary hash function $h'$. Then
$$h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m, \quad i = 0, 1, \ldots, m-1$$
where $i$ is the probe number and $c_1$, $c_2$ are positive auxiliary constants.
Compared with linear probing, the offset also depends on $i^2$. This spreads keys out better, but two keys with the same initial slot still follow the same probe sequence, which causes a milder form of clustering (secondary clustering).
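
The probe formula above can be explored with a small sketch that computes the i-th probe and counts how many distinct slots a full probe sequence reaches. The function names `quad_probe` and `quad_coverage` are hypothetical, and the constants are assumed to be non-negative integers.

```c
#include <assert.h>

/* i-th probe: (hk + c1*i + c2*i*i) mod m, where hk = h'(k).
   64-bit arithmetic avoids overflow in c2 * i * i. */
unsigned quad_probe(unsigned hk, unsigned i, unsigned c1, unsigned c2, unsigned m)
{
    assert(m > 0);
    unsigned long long off = (unsigned long long)c1 * i
                           + (unsigned long long)c2 * i * i;
    return (unsigned)((hk + off) % m);
}

/* Number of distinct slots reached by probes i = 0 .. m-1 (m <= 64 here). */
unsigned quad_coverage(unsigned hk, unsigned c1, unsigned c2, unsigned m)
{
    assert(m > 0 && m <= 64);
    unsigned char seen[64] = {0};
    unsigned count = 0;
    for (unsigned i = 0; i < m; i++) {
        unsigned j = quad_probe(hk, i, c1, c2, m);
        if (!seen[j]) {
            seen[j] = 1;
            count++;
        }
    }
    return count;
}
```

With m = 11 and c1 = c2 = 1, `quad_coverage(0, 1, 1, 11)` returns 6: the probe sequence reaches only 6 of the 11 slots. This is why the values of c1, c2 and m have to be chosen together if the whole table is to be usable.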

Double hashing

Formula:
Choose two auxiliary hash functions $h_1$ and $h_2$. Then
$$h(k, i) = (h_1(k) + i\,h_2(k)) \bmod m, \quad i = 0, 1, \ldots, m-1$$
where $i$ is the probe number.
For the probe sequence to cover the entire hash table, $h_2(k)$ must be relatively prime to $m$. One way to ensure this:
take $m$ to be prime, and design $h_2$ so that it always returns a positive integer less than $m$.
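
A sketch of the probe formula above follows. The auxiliary functions h1(k) = k mod m and h2(k) = 1 + (k mod (m-1)) are one possible choice assumed for this example, not the only one; the function name `dh_probe` is hypothetical, and m is assumed to be prime.

```c
#include <assert.h>

/* i-th probe of double hashing for a prime table size m.
   h2 is always between 1 and m-1, so it is relatively prime to m. */
unsigned dh_probe(unsigned k, unsigned i, unsigned m)
{
    assert(m > 2);
    unsigned h1 = k % m;
    unsigned h2 = 1 + k % (m - 1);
    return (unsigned)((h1 + (unsigned long long)i * h2) % m);
}
```

For k = 14 and m = 13, h1 = 1 and h2 = 3, so the probe sequence starts 1, 4, 7, 10, 0, ... and reaches all 13 slots within 13 probes.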

3. C implementation of the hash table

The key to a hash table is a good hash function; the programming logic itself is not complicated.
So only a relatively simple hash table is implemented in C here.
It uses:

1. Chaining to resolve collisions
2. The division method as the hash function
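
#include <stdio.h>
#include <stdlib.h>
#include <time.h>



// Number of keys to store
#define SIZE 10
// Size of the hash table.
// For the division method, M should be a prime that is not too close to a power of 2.
// We also accept short chains of up to about three elements per slot,
// so a small prime is chosen here.
#define M 7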



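// Linked-list node. Each table slot is a dummy head node whose p points to the first node of its chain.
typedef struct list
{
	// Key
	int key;
	// Pointer to the next node
	struct list *p;
} list;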



// Division-method hash function (keys are assumed to be non-negative)
int HASH(int k)
{
	return k%M;
}



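// Get the length of chain i,
// that is, the number of keys stored in it
int list_length(list *a,int i)
{
	int length=0;// length counter
	// Cursor that walks the chain
	list *temp;
	temp=a[i].p;
	// Stop when the cursor runs off the end of the chain
	while(temp!=NULL)
	{
		length++;// one iteration per node
		temp=temp->p;// move the cursor to the next node
	}
	return length;// return the length
}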



// Free the dynamically allocated nodes of chain n
void destroy_list(list *a,int n)
{
	list *temp1;
	list *temp2;
	temp1=a[n].p;
	while(temp1!=NULL)
	{
		temp2=temp1;
		temp1=temp1->p;
		free(temp2);
	}
	a[n].p=NULL;// the chain is empty now
}



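// Insert a key into chain n.
// The new key is always inserted at the front of the chain.
void list_insert(list *a,int n,int num)
{
	list* temp;
	// The node must be allocated dynamically: a local variable would be
	// destroyed when this function returns, so the key would not survive.
	temp=(list *)malloc(sizeof(list));
	if(temp==NULL)
	{
		fprintf(stderr,"out of memory\n");
		exit(EXIT_FAILURE);
	}
	temp->key=num;
	// Link the new node in front of the old first node.
	// If the chain was empty, a[n].p is NULL and the new node becomes the only node.
	temp->p=a[n].p;
	a[n].p=temp;
}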



// Print chain n
void list_print(list *a,int n)
{
	list* temp;
	temp=a[n].p;
	printf("%d  |||",list_length(a,n));// print the chain length
	// then print the keys themselves
	while(temp!=NULL)
	{
		printf("%5d",temp->key);
		temp=temp->p;
	}
	printf("\n");
}



// Insert key k into the hash table
void LIST_HASH_INSERT(list *a,int k)
{
	int n=0;
	n=HASH(k);
	printf("%5d",n);// show the slot chosen for k
	list_insert(a,n,k);
}



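// Search for key k.
// Returns its 1-based position in its chain, or 0 if k is not in the table.
int LIST_HASH_SEARCH(list *a,int k)
{
	int n=0;
	int j=0;
	n=HASH(k);
	list *temp;
	temp=a[n].p;
	while(temp!=NULL)
	{
		j++;// position of the node being examined
		if(temp->key==k)
		{
			return j;
		}
		temp=temp->p;
	}
	return 0;
}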



// Print where k was found; n is the value returned by LIST_HASH_SEARCH
void HASH_PRINT(int n,int k)
{
	int num=0;
	num=HASH(k);
	if(n>0)
	{
		printf("%d is element %d of list %d\n",k,n,num);
	}
	else
	{
		printf("%d is not in LIST_HASH\n",k);
	}
}

 

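int main()
{
	int a[SIZE];
	list b[M];
	srand((unsigned)time(NULL));
	int ID;
	int i=0;
	// Every chain starts out empty; without this the head pointers are garbage
	for(i=0;i<M;i++)
	{
		b[i].key=0;
		b[i].p=NULL;
	}
	// Fill the array with random keys in 0..99
	for(i=0;i<SIZE;i++)
	{
		a[i]=rand()%100;
	}
	for(i=0;i<SIZE;i++)
	{
		printf("%5d",a[i]);
	}
	printf("\n");
	// Insert the keys into the hash table
	for(i=0;i<SIZE;i++)
	{
		LIST_HASH_INSERT(b,a[i]);
	}
	printf("\n");
	// Print every chain of the hash table
	for(i=0;i<M;i++)
	{
		list_print(b,i);
	}
	// Search for a key that is in the table (a[0] was inserted above)
	ID=LIST_HASH_SEARCH(b,a[0]);
	HASH_PRINT(ID,a[0]);
	// Search for a key that is not in the table (all keys are below 100)
	ID=LIST_HASH_SEARCH(b,5200);
	HASH_PRINT(ID,5200);
	// Free the chains
	for(i=0;i<M;i++)
	{
		destroy_list(b,i);
	}
	return 0;
}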

Summary

Readers are welcome to point out and correct any errors or shortcomings in this article.
For background on linked lists, please refer to the article:
"Introduction to Algorithms" Learning (12)----Linked List (C Language)

Origin blog.csdn.net/weixin_52042488/article/details/126884051