Implementation of union-find and hash tables


1. What union-find does

1. Merge two sets.

2. Query whether two elements belong to the same set.

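2. The basic principle of union-find

Basic principle: each set is represented as a tree, and the index of the tree's root serves as the identifier of the whole set. Every node stores its parent, so p[x] is the parent of x.

There are three questions to consider:
Question 1: how to tell whether x is a root: if (p[x] == x)

Question 2: how to find the identifier of the set containing x: while (p[x] != x) x = p[x];

Question 3: how to merge two sets: if px is the root of x's set and py is the root of y's set, then p[px] = py makes py the parent of px, which joins the two trees.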
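The three questions above can be sketched as follows. This is a minimal illustration without path compression; the helper names `makeSets`, `isRoot`, `findRoot`, and `merge` are assumptions made for this sketch and are not part of the original code.

```cpp
#include <cassert>
#include <vector>

// p[x] holds the parent of x; a root is its own parent.
// All function names here are hypothetical, chosen for this sketch.
std::vector<int> makeSets(int n) {
    std::vector<int> p(n + 1);
    for (int i = 0; i <= n; i++) p[i] = i;  // every node starts as its own set
    return p;
}

// Question 1: x is a root exactly when p[x] == x.
bool isRoot(const std::vector<int>& p, int x) {
    assert(x >= 0 && x < (int)p.size());
    return p[x] == x;
}

// Question 2: walk up the parent links until the root is reached.
int findRoot(const std::vector<int>& p, int x) {
    while (p[x] != x) x = p[x];
    return x;
}

// Question 3: attach the root of x's tree under the root of y's tree.
void merge(std::vector<int>& p, int x, int y) {
    p[findRoot(p, x)] = findRoot(p, y);
}
```

Without path compression, `findRoot` can take time proportional to the height of the tree, which is the cost the recursive version in the next section is meant to reduce.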

3. Implementation of union-find

A union-find structure supports two operations: checking whether two elements are in the same set, and merging two sets.

#include <cstdio>

using namespace std;

// Global variables
const int N = 100010;
int n, m;
int p[N]; // p[x] stores the parent of node x

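// Find operation: returns the root (ancestor) of x, with path compression
int find(int x)
{
    // If x is not a root, recursively find the root of its parent and store it in p[x].
    // When the recursion returns, every node on the path points directly at the root.
    if (p[x] != x) p[x] = find(p[x]);
    return p[x];
}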
// The operation "M a b" merges the sets containing a and b
int main()
{
    // n elements, m operations
    scanf("%d%d", &n, &m);
    for (int i = 1; i <= n; i++) {
        p[i] = i; // initially every node is its own parent, i.e. its own set
    }

    while (m--) {
        char op[2]; // operation character plus the terminating '\0'
        int a, b;   // the two nodes involved
        scanf("%s%d%d", op, &a, &b);

        if (op[0] == 'M') {
            // Merge: make the root of a's set a child of the root of b's set
            p[find(a)] = find(b);
        } else {
            // Any other operation asks whether a and b are in the same set
            if (find(a) == find(b)) {
                printf("Yes\n");
            } else {
                printf("No\n");
            }
        }
    }
    return 0;
}
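To see what path compression does, the sketch below wraps the same recursive find in a small struct so that the parent array can be inspected directly. The `Dsu` struct and its `merge` and `same` members are assumptions introduced for this example only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical wrapper around the same recursive find used above.
struct Dsu {
    std::vector<int> p;  // p[x] is the parent of x

    explicit Dsu(int n) : p(n + 1) {
        for (int i = 0; i <= n; i++) p[i] = i;
    }

    int find(int x) {
        assert(x >= 0 && x < (int)p.size());
        if (p[x] != x) p[x] = find(p[x]);  // re-point x directly at the root
        return p[x];
    }

    void merge(int a, int b) { p[find(a)] = find(b); }
    bool same(int a, int b) { return find(a) == find(b); }
};
```

If a chain is built by hand with p[1] = 2, p[2] = 3, p[3] = 4, a single call to find(1) returns 4 and leaves p[1], p[2], and p[3] all equal to 4, so later lookups on those nodes take one step.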

Hash table (hash)

Hash tables commonly resolve collisions in one of two ways: separate chaining (the "zipper" method) and open addressing.

The idea of a hash table is to map values drawn from a large range into a much smaller array.

It supports two operations: 1. insertion 2. lookup

For example, values in the range -10⁹ to 10⁹ are mapped into an array with indices 0 to 10⁵.

Procedure

1. First take the value modulo the table size to obtain an array index.

2. Different values can produce the same index, which is a collision. With separate chaining, each array slot heads a linked list, and all elements that hash to that slot are placed in its list. As long as the lists stay short, lookup takes approximately O(1) time.
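Step 1 can be illustrated with a small helper. This is a sketch only; the name `slotOf` is an assumption for this example, and the double-modulo form matches the expression the chaining code later in the article uses.

```cpp
#include <cassert>

// Maps any int into the slot range [0, mod). mod must be positive.
// `slotOf` is a hypothetical name used only in this sketch.
int slotOf(int x, int mod) {
    assert(mod > 0);
    // In C++ the sign of x % mod follows x, so -7 % 5 == -2.
    // Adding mod and reducing again yields a non-negative slot.
    return (x % mod + mod) % mod;
}
```

With mod = 5, both 7 and 12 land in slot 2, which is the kind of conflict that the linked list in step 2 resolves, and -7 lands in slot 3 rather than at a negative index.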

The modulus should be a prime number, a choice commonly made to reduce collisions. For data in the range 0 to 100000, use the smallest prime greater than 100000.

The modulus can be found as follows:

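// Returns the smallest prime greater than n, assuming n >= 1.
// Passing n = 100000 yields 100003.
int getInteger(int n) {
    for (int i = n + 1; ; i++) {
        bool isPrime = true;
        for (int j = 2; j * j <= i; j++) {
            if (i % j == 0) {
                isPrime = false; // i has a divisor, try the next candidate
                break;
            }
        }
        if (isPrime) return i;
    }
}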

1. Separate chaining (zipper method)

In C/C++ the linked lists are implemented by hand with arrays; in Java, LinkedList can be used instead.

We use C++ as the example.

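#include <string.h>
#include <stdio.h>
#include <iostream>

using namespace std;

const int N = 100003; // the smallest prime greater than the maximum range, found in advance
int h[N], e[N], ne[N], idx;
// h[k]: index of the head node of the list for slot k (-1 if the slot is empty)
// e[i]: value stored in node i
// ne[i]: index of the node after node i (-1 at the end of a list)
// idx: index of the next unused node

// Task: read n, the number of operations. "I x" inserts x; "Q x" looks up x
// and prints Yes if it is present, No otherwise.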

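void insert(int x) {
    // x % N can be negative in C++, so add N and reduce again
    int k = (x % N + N) % N; // k is the slot for x
    e[idx] = x;     // store x in the new node idx
    ne[idx] = h[k]; // the new node points at the current head (head insertion)
    h[k] = idx;     // the new node becomes the head of slot k
    idx++;          // advance to the next unused node
}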

bool find(int x) {
    int k = (x % N + N) % N;
    // start at the head of slot k and walk its list
    for (int i = h[k]; i != -1; i = ne[i]) {
        if (x == e[i]) {
            return true;
        }
    }
    return false;
}

int main()
{
    int n; // number of operations
    scanf("%d", &n);
    memset(h, -1, sizeof(h)); // set every slot of h to -1 (empty)

    while (n--) {
        char op[2];
        int x;
        scanf("%s%d", op, &x);
        if (op[0] == 'I') {
            // insert x
            insert(x);
        } else {
            // look up x
            if (find(x)) {
                printf("Yes\n");
            } else {
                printf("No\n");
            }
        }
    }
    return 0;
}

The key points of separate chaining are:

1. Compute the slot k, then attach x to the list headed by h[k]: store x in e[idx], perform head insertion with ne[idx] = h[k] and h[k] = idx, and finally increment idx.

// Insertion
void insert(int x) {
    int k = (x % N + N) % N;
    e[idx] = x;
    ne[idx] = h[k];
    h[k] = idx;
    idx++;
}

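2. Every entry of h is initialized to -1. Head insertion then stores node indices in h, so -1 always marks the end of a list, which is what the lookup loop tests for.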

// Lookup
bool find(int x) {
    int k = (x % N + N) % N;
    for (int i = h[k]; i != -1; i = ne[i]) {
        if (e[i] == x) {
            return true;
        }
    }
    return false;
}

Separate chaining comes down to the two functions insert and find.
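To make the head insertion easier to trace, here is a self-contained variant with a small table. The `ChainedSet` struct, its member names, and the use of `std::vector` in place of fixed-size global arrays are assumptions made for this sketch; the logic follows the insert and find functions above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical self-contained version of the array-based chaining above.
struct ChainedSet {
    int mod;                    // number of slots
    std::vector<int> h, e, ne;  // slot heads, stored values, next indices

    explicit ChainedSet(int buckets) : mod(buckets), h(buckets, -1) {
        assert(buckets > 0);
    }

    int slot(int x) const { return (x % mod + mod) % mod; }

    void insert(int x) {
        int k = slot(x);
        e.push_back(x);
        ne.push_back(h[k]);        // the new node points at the old head
        h[k] = (int)e.size() - 1;  // the new node becomes the head
    }

    bool find(int x) const {
        for (int i = h[slot(x)]; i != -1; i = ne[i])
            if (e[i] == x) return true;
        return false;
    }
};
```

With 7 slots, inserting 3, 10, and -4 puts all three values in slot 3. Because each insertion goes to the front, the list reads -4, 10, 3 from the head.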

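2. Open addressing method

With open addressing, the array is allocated at roughly 2 to 3 times the size of the input data.

Procedure

1. Find a suitable prime for the table size.

2. To insert, apply the hash function to obtain k, then store the value at the first free slot found from k onward.

3. To look up, probe from k in the same way.

Code example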

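#include <stdio.h>
#include <string.h>

using namespace std;

const int N = 100003;        // prime table size
const int null = 0x3f3f3f3f; // sentinel larger than any input value, marks an empty slot
int h[N];                    // the hash table itself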

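// Returns the slot that holds x, or the empty slot where x should be inserted
int find(int x) {
    int k = (x % N + N) % N;
    while (h[k] != null && h[k] != x) {
        // slot k is occupied by a different value, so probe the next slot
        k++;
        if (k == N) k = 0; // wrap around to the start of the table
    }
    return k;
}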

int main()
{
    int n; // number of operations
    scanf("%d", &n);
    memset(h, 0x3f, sizeof(h));
    // memset fills a block of memory byte by byte and takes three arguments:
    // 1. the start address (here h, the array);
    // 2. the byte value to write; each 4-byte int therefore becomes 0x3f3f3f3f;
    // 3. the number of bytes to fill, here the whole array.

    while (n--) {
        char op[2];
        int x;
        scanf("%s%d", op, &x);
        int k = find(x);
        if (op[0] == 'I') {
            h[k] = x; // insert x
        } else {
            // report whether x is in the table
            if (h[k] != null) printf("Yes\n");
            else printf("No\n");
        }
    }
    return 0;
}
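The probing behaviour is easier to follow on a tiny table. The sketch below mirrors the find function above with 7 slots; the `ProbingSet` struct, the `kEmpty` constant, and the `insert` and `contains` members are assumptions introduced for this example, and the table is assumed never to fill up completely.

```cpp
#include <cassert>
#include <vector>

// Sentinel for an empty slot; assumed never to be inserted as a value.
const int kEmpty = 0x3f3f3f3f;

// Hypothetical small-table version of the open addressing code above.
struct ProbingSet {
    std::vector<int> h;

    explicit ProbingSet(int size) : h(size, kEmpty) { assert(size > 0); }

    // Returns the slot holding x, or the first empty slot where x would go.
    int find(int x) const {
        int n = (int)h.size();
        int k = (x % n + n) % n;
        while (h[k] != kEmpty && h[k] != x) {
            if (++k == n) k = 0;  // wrap around to the start
        }
        return k;
    }

    void insert(int x) { h[find(x)] = x; }
    bool contains(int x) const { return h[find(x)] == x; }
};
```

Inserting 3, 10, and 17 into 7 slots sends all three to slot 3 first, so they end up in slots 3, 4, and 5. Inserting 6 and then 13 shows the wrap-around: 13 hashes to slot 6, finds it occupied, and is stored in slot 0.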

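3. String hashing

String hashing plays a role similar to KMP, but after preprocessing it can compare two substrings in O(1), which often makes it the more convenient tool.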

The steps for using a string hash are:

1. Treat the string as a number in base P, analogous to decimal. h[0], h[1], ... store the hash of each prefix: h[i] is the base-P value of the first i characters.

2. Convert the characters to a base-P number; empirically good choices for P are 131 or 13331. For the string abcd the value is (a*p[3] + b*p[2] + c*p[1] + d*p[0]) % Q, where p[i] is P to the power i. An empirically good choice for Q is 2⁶⁴: the h array holds unsigned long long values, which cover exactly 0 to 2⁶⁴ - 1, so any overflow wraps around and performs the modulo automatically.

That is, simply compute in unsigned long long: h[n] = a*p[3] + b*p[2] + c*p[1] + d*p[0], with n = 4 here.

3. To get the hash of the substring from l to r, note that earlier characters occupy the higher-order digits. Shift the prefix hash h[l-1] left by r-l+1 digits, giving h[l-1]*p[r-l+1], and subtract it from h[r]: the result is h[r] - h[l-1]*p[r-l+1].
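As a worked check of step 3, take the string abcd with 1-based positions and extract the substring cd, so l = 3 and r = 4. Writing P for the base, treating each letter as its character code, and leaving out the modulus for readability:

```latex
\begin{aligned}
h[2] &= aP + b \\
h[4] &= aP^{3} + bP^{2} + cP + d \\
h[4] - h[2]\cdot P^{\,r-l+1} &= \left(aP^{3} + bP^{2} + cP + d\right) - \left(aP + b\right)P^{2} \\
&= cP + d
\end{aligned}
```

The result cP + d is exactly the base-P value of cd on its own, which is why two equal substrings produce the same number wherever they occur in the string.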

The code is as follows:

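#include <stdio.h>

using namespace std;

typedef unsigned long long ULL;
const int N = 100010, P = 131; // P is the hash base

int n, m;
char str[N];
ULL h[N], p[N]; // h[i]: hash of the first i characters, p[i]: P to the power i

// Input: the first line contains n and m, followed by the string str,
// where n is the string length and m is the number of queries.
// Each of the next m lines contains l1 r1 l2 r2; report whether the
// substrings l1..r1 and l2..r2 are equal.

// Hash of the substring at positions l..r (1-based, inclusive)
ULL get(int l, int r) {
    return h[r] - h[l - 1] * p[r - l + 1];
}

int main()
{
    scanf("%d%d%s", &n, &m, str + 1); // str is stored starting at index 1
    p[0] = 1;
    // build the powers of P and the prefix hashes
    for (int i = 1; i <= n; i++)
    {
        p[i] = p[i - 1] * P;
        h[i] = h[i - 1] * P + str[i];
    }

    while (m--)
    {
        int l1, r1, l2, r2;
        scanf("%d%d%d%d", &l1, &r1, &l2, &r2);

        if (get(l1, r1) == get(l2, r2)) printf("Yes\n");
        else printf("No\n");
    }
    return 0;
}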
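For experimenting without standard input, the same prefix-hash idea can be packaged as a small struct. This is a sketch under stated assumptions: the `PrefixHash` struct and the `kBase` constant are names invented for this example, and the string is passed as a `std::string` while `get` keeps the 1-based positions used above.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef unsigned long long ULL;

const ULL kBase = 131;  // hash base, the same value as P above

// Hypothetical wrapper around the prefix-hash arrays used above.
struct PrefixHash {
    std::vector<ULL> h, p;  // h[i]: hash of the first i chars, p[i]: kBase^i

    explicit PrefixHash(const std::string& s)
        : h(s.size() + 1, 0), p(s.size() + 1, 1) {
        for (size_t i = 1; i <= s.size(); i++) {
            p[i] = p[i - 1] * kBase;  // unsigned overflow wraps mod 2^64
            h[i] = h[i - 1] * kBase + (unsigned char)s[i - 1];
        }
    }

    // Hash of the substring at 1-based positions l..r, inclusive.
    ULL get(int l, int r) const {
        assert(1 <= l && l <= r && r < (int)h.size());
        return h[r] - h[l - 1] * p[r - l + 1];
    }
};
```

For the string abcabd, get(1, 2) and get(4, 5) both hash the substring ab and compare equal, while get(1, 3) and get(4, 6) hash abc and abd and differ. Equal hashes indicate equal substrings only with high probability, since distinct strings can collide.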