Hashing

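Example problem:
Given N positive integers followed by M positive integers, determine for each of the M integers whether it appears among the N.
Brute-force traversal takes O(NM) time.
Hashing trades space for time and brings this down to O(N+M).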


#include <cstdio>
#include <vector>
#include <algorithm>
#include <string>
#include <cstring>
#include <iostream>
using namespace std;
const int maxn = 10010;
bool hashTable[maxn] = { false };

int main()
{
	int n, m, x;
	scanf("%d%d", &n, &m);
	for (int i = 0; i < n; i++)
	{
		scanf("%d", &x);
		hashTable[x] = true; // x appears in the N sequence
	}
	for (int i = 0; i < m; i++)
	{
		scanf("%d", &x);
		if (hashTable[x] == true) {
			printf("YES\n"); // x was seen before, so print YES (x comes from the M query sequence)
		}
		else {
			printf("NO\n");
		}
	}
	return 0;
}

Sample run (input followed by output):

5 3
8 3 7 6 2
7 4 2
YES
NO
YES

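Idea: use each input number directly as an array index and record that number's properties in the corresponding slot (important!).
This has a limitation: if an input is very large (beyond 10⁵, such as 10⁹) or is not a number at all (a string, for example), it cannot be used directly as an array index.
Hashing solves this problem:
A function converts each element into an integer that represents the element as uniquely as possible. This conversion function is called the hash function H: if an element is key before conversion, it becomes the integer H(key) afterwards.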

When the key is an integer

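Common hash functions: direct addressing, the mid-square method, and the division (remainder) method.
- Direct addressing: the identity map H(key)=key, the most common and practical use of hashing, or a linear map H(key)=a*key+b.
- Mid-square method: take several middle digits of the square of key as the hash value (rarely used).
- Division method: fairly practical. The remainder of key divided by a number mod is the hash value, i.e. H(key)=key%mod. This maps very large numbers to integers less than mod, which can then serve as valid array indices. Note: the table length TSize must be at least mod, otherwise indexing goes out of bounds. When mod is prime, H(key) tends to cover the values in [0,mod) as evenly as possible, so for convenience TSize is usually chosen to be a prime and mod is set equal to TSize.
The division method can give H(key1)=H(key2) with key1!=key2. key1 then already occupies the cell at position H(key1), and key2 cannot use it. This situation is called a collision.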
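The direct-addressing and division methods above can be sketched as follows. This is a minimal illustration: the table size `TSIZE = 10007` and the function names `hashDirect` and `hashMod` are assumptions made for the example, not part of the original notes.

```cpp
#include <cassert>

// Hypothetical table size for illustration: a prime, used as both TSize and mod.
const int TSIZE = 10007;

// Direct addressing: the key itself is the index (only valid for 0 <= key < TSIZE).
int hashDirect(int key) {
    assert(0 <= key && key < TSIZE);
    return key;
}

// Division method: maps any non-negative key into [0, TSIZE).
int hashMod(long long key) {
    assert(key >= 0);
    return (int)(key % TSIZE);
}
```

With these definitions, `hashMod(3)` and `hashMod(10010)` both give 3, which is exactly the kind of collision described above.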

Three ways to resolve collisions

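Open addressing includes linear probing and quadratic probing; the third method is separate chaining (the chaining method). In practice you can usually rely on the standard library instead: map provides the same key-to-value lookup (it is tree-based rather than hashed), and unordered_map (C++11) is a hash table.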
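As a rough sketch of the standard-library route, the membership query from the start of these notes can be written with `unordered_map`, which also copes with keys far too large to use as array indices. The helper names `buildCounts` and `appeared` are made up for this example, and C++11 is assumed.

```cpp
#include <unordered_map>
#include <vector>

// Counts how many times each value occurs; keys may be too large for array indices.
std::unordered_map<long long, int> buildCounts(const std::vector<long long>& values) {
    std::unordered_map<long long, int> counts;
    for (long long v : values) counts[v]++;
    return counts;
}

// True if x occurred at least once among the counted values.
bool appeared(const std::unordered_map<long long, int>& counts, long long x) {
    return counts.find(x) != counts.end();
}
```

Collision handling happens inside the container, so no probing or chaining code has to be written by hand.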

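Linear probing
If the position is taken, keep adding 1 to the hash value until a free position is found. This tends to cause clustering.
Quadratic probing
Probes H(key)+1², H(key)-1², H(key)+2², H(key)-2², ... instead, which avoids clustering, but it may fail to find a free position.
Separate chaining
No new hash value is computed. Instead, all keys with the same H(key) are linked into a singly linked list: declare an array Link[mod], where Link[h] holds the list of keys with H(key)=h.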
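A minimal sketch of linear probing and separate chaining, under these assumptions: keys are non-negative integers, the table size `MOD = 11` is a small prime picked for illustration, and `EMPTY = -1` marks a free slot. The struct names are hypothetical, and `std::forward_list` stands in for a hand-written singly linked list.

```cpp
#include <forward_list>
#include <vector>

const int MOD = 11;    // hypothetical small prime table size
const int EMPTY = -1;  // marks an unused slot; keys are assumed non-negative

// Open addressing with linear probing.
struct LinearProbeTable {
    std::vector<int> slot;
    LinearProbeTable() : slot(MOD, EMPTY) {}

    // Returns the index where key ends up, or -1 if the table is full.
    int insert(int key) {
        int h = key % MOD;
        for (int step = 0; step < MOD; step++) {
            int pos = (h + step) % MOD;  // hash value + 1, + 2, ... with wrap-around
            if (slot[pos] == EMPTY || slot[pos] == key) {
                slot[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    bool contains(int key) const {
        int h = key % MOD;
        for (int step = 0; step < MOD; step++) {
            int pos = (h + step) % MOD;
            if (slot[pos] == EMPTY) return false;  // an empty slot ends the probe
            if (slot[pos] == key) return true;
        }
        return false;
    }
};

// Separate chaining: Link[h] holds every key with H(key) == h.
struct ChainTable {
    std::vector<std::forward_list<int>> Link;
    ChainTable() : Link(MOD) {}

    void insert(int key) { Link[key % MOD].push_front(key); }

    bool contains(int key) const {
        for (int k : Link[key % MOD])
            if (k == key) return true;
        return false;
    }
};
```

Inserting 1, 12 and 23 (all with hash value 1) into `LinearProbeTable` places them in slots 1, 2 and 3, which shows the clustering; in `ChainTable` the same three keys share the single list `Link[1]`.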

When the key is not an integer (strings, coordinates, etc.)

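For example, for a point P=(x,y) with 0<=x,y<=Range, define H(P)=x*(Range+1)+y, which maps each point to a distinct integer. The multiplier has to be Range+1 rather than Range because y can take Range+1 different values; with x*Range+y, the points (0,Range) and (1,0) would collide.
For strings
More advanced techniques come later; for now, consider mapping a string S to an integer that represents S as uniquely as possible. For simplicity, assume strings consist only of A~Z, treated as the digits 0~25 of a base-26 number that is then converted to decimal. The largest resulting integer is 26^len-1.
Example: converting a string S to an integer
(all letters uppercase)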
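A small sketch of hashing a coordinate pair, assuming both coordinates lie in the closed interval [0, range]. The function name `hashPoint` is made up for the example. Because y can take range + 1 different values, the sketch multiplies x by range + 1 so that two different points never share a value.

```cpp
#include <cassert>

// Maps a grid point (x, y) with 0 <= x, y <= range to a unique integer.
int hashPoint(int x, int y, int range) {
    assert(0 <= x && x <= range && 0 <= y && y <= range);
    return x * (range + 1) + y;  // y occupies range + 1 values per x
}
```

For range = 10, the points (0,10) and (1,0) map to 10 and 11, so they stay distinct.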

int hashFunc(char S[], int len)
{
	int id = 0;
	for (int i = 0; i < len; i++)
	{
		id = id * 26 + (S[i] - 'A');
	}
	return id;
}

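Example: if letters may be upper or lower case, extend to base 52; digits work the same way (base 62). If the digits are known to occur at fixed positions, though, they can simply be represented by their own values.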

int hashFunc(char S[], int len)
{
	int id = 0;

	for (int i = 0; i < len; i++)
	{
		if (S[i] <= 'Z'&&S[i] >= 'A')
		{
			id = id * 52 + (S[i] - 'A');
		}
		if (S[i] <= 'z'&&S[i] >= 'a')
		{
			id = id * 52 + (S[i] - 'a'+26); // note: subtract 'a' here, not 'A'
		}
	
	}
	return id;
}
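The base-62 case mentioned above might look like the sketch below. The digit ordering (0~9, then A~Z, then a~z) is an arbitrary choice for illustration, and `hashAlnum` is a hypothetical name. It returns `long long` because 62^len grows quickly: an `int` overflows beyond 5 characters, while `long long` holds up to 10. As with the functions above, distinct values are only guaranteed among strings of the same length.

```cpp
// Sketch: '0'..'9' -> 0..9, 'A'..'Z' -> 10..35, 'a'..'z' -> 36..61.
long long hashAlnum(const char S[], int len) {
    long long id = 0;
    for (int i = 0; i < len; i++) {
        char c = S[i];
        int v;
        if (c >= '0' && c <= '9') v = c - '0';
        else if (c >= 'A' && c <= 'Z') v = c - 'A' + 10;
        else if (c >= 'a' && c <= 'z') v = c - 'a' + 36;
        else continue;  // characters outside the alphabet are skipped
        id = id * 62 + v;
    }
    return id;
}
```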

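Example problem
Given N strings (each exactly three uppercase letters), then M query strings, report how many times each query string occurs among the N strings.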

#include <cstdio>
#include <vector>
#include <algorithm>
#include <string>
#include <cstring>
#include <iostream>
using namespace std;
const int maxn = 100;

int hashFunc(char S[], int len)
{
	int id = 0;
	for (int i = 0; i < len; i++)
	{
		id = id * 26 + (S[i] - 'A');
	}
	return id;
}

int main()
{
	int N, M,len,x;
	char Sn[maxn][5],Sm[maxn][5];
	int hashTable[26 * 26 * 26 + 10] = {};
	scanf("%d%d", &N, &M);
	for (int i = 0; i < N; i++)
	{
		scanf("%s", Sn[i]); // pass the char array itself; &Sn[i] has the wrong type for %s
		len = strlen(Sn[i]);
		x = hashFunc(Sn[i], len);
		hashTable[x]++;
	}
	for (int i = 0; i < M; i++)
	{
		scanf("%s", Sm[i]); // pass the char array itself; &Sm[i] has the wrong type for %s
		len = strlen(Sm[i]);
		x = hashFunc(Sm[i], len);
		printf("%s,%d\n", Sm[i], hashTable[x]);
	}
	return 0;
}

Input:

5 2
ABC ABD ABC ABD KKK
ABC KKK

Output:

ABC,2
KKK,1

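Exercises

Who Are Your Potential Friends

Problem description
"Birds of a feather" is how we like to describe friends. Two people being friends usually means they share many interests. As a homebody, however, you find few chances to get to know other people. Luckily, you have come across a copy of the Peking University library's borrowing records, so you stay up late coding, hoping to discover potential friends in them.
First you tidy up the records, numbering the N readers 1,2,…,N and the M books 1,2,…,M. Following the "birds of a feather" principle, anyone who likes the same book as you is your potential friend. Your task is to compute from the records how many potential friends each person has.
Input
Each case begins with a line containing two integers N and M, 2 <= N, M <= 200. N lines follow; line i (i = 1,2,…,N) contains one number P (1<=P<=M), the number of reader i's favorite book.
Output
Each case produces N lines with one value each; line i gives the number of potential friends of reader i. If reader i shares a favorite book with nobody, print "BeiJu" (i.e. 悲剧, "tragedy", ^ ^).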

#include <cstdio>
#include <vector>
#include <algorithm>
#include <string>
#include <cstring>
#include <iostream>
using namespace std;
const int maxn = 100;

int main()
{
	int n, m,id[500];

	while (scanf("%d%d", &n, &m)!=EOF)
	{
		int S[500] = {};
		for (int i = 0; i < n; i++)
		{
			scanf("%d", &id[i]);
			S[id[i]-1]++;
		}
		for (int i = 0; i < n; i++)
		{
			if (S[id[i]-1]!=1)
				printf("%d\n", S[id[i]-1]-1);
			else
				printf("BeiJu\n");
		}
	}
	return 0;
}

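Group Statistics

Problem description
A list of numbers is given first, followed by the group of each number. Count the occurrences of each number within each group and print the result; see the sample.
Input
The first line gives the number of test cases m. For each case, the first line is the count n, followed by two lines of n numbers each:
the first line holds the n numbers, and the n numbers on the second line give the group of the corresponding number on the first line. n does not exceed 100.
Output
Output m lines in the format shown in the sample, sorted in ascending order.

My code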

#include <cstdio>
#include <vector>
#include <algorithm>
#include <string>
#include <cstring>
#include <iostream>
using namespace std;
int S[2000][2000];

int main()
{
	int n,id[2000];
	while (scanf("%d", &n) != EOF)
	{

		int num, temp;
		for (int i = 0; i < n; i++)
		{
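			// S[group][number]: the first index is the group (second input line), the second is the number (first input line)
			memset(S, 0, sizeof(S)); // the size is in bytes, so use sizeof(S) rather than 2000 * 2000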
			int idmax = 0;
			bool groupid[2000] = { false };
			bool idact[2000] = { false };
			scanf("%d", &num);
			for (int j = 0; j < num; j++)
			{
				scanf("%d", &id[j]);
				idact[id[j]] = true;
				if (id[j] > idmax)
					idmax = id[j];
			}
			for (int j = 0; j < num; j++)
			{
				scanf("%d", &temp);
				groupid[temp] = { true };
				S[temp][id[j]]++;
			}
			for (int j = 0; j < 2000; j++)
			{
				if (groupid[j] == true)
				{
					printf("%d={", j);//
					for (int k = 0; k < idmax+1; k++)
					//for (int k = 0; k < 2000; k++)
					{
						if (idact[k]==true)
						{
							printf("%d=%d", k, S[j][k]);
							if (k !=idmax)
								printf(",");
						}	
					}
					printf("}\n");
				}
			}
		}
	}
	return 0;
}

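A mistake I keep making: putting the array initialization outside the loop.
Also, a very important pitfall about assigning to arrays:

//S[2000][2000] = {0}; // first index: group (second input line); second index: number (first input line)
memset(S, 0, sizeof(S));

Once the array has been declared, the first line does not set every element of S to 0: it is an assignment to the single out-of-range element S[2000][2000], not an initializer. Use memset instead. The third argument of memset is a size in bytes, so pass sizeof(S); 2000 * 2000 would clear only a quarter of an array of 4-byte ints.
If the judge reports a runtime error, a possible cause is that the array is too large; move its declaration to the outermost (global) scope.
If the judge reports output limit exceeded, a possible cause (a mistake I make often) is that in

while (scanf("%d", &n) != EOF)

the != EOF was left out...
If the judge reports wrong answer 50, a possible cause is that the array is too small or that multiple test cases are not handled.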
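The memset pitfall can be checked on a small array. The sketch below uses a hypothetical 4x4 `grid` as a stand-in for the large table and assumes a 4-byte `int` (typical, but not guaranteed); `resetAll` and `resetPartial` are illustrative names.

```cpp
#include <cstring>

int grid[4][4];  // small stand-in for a large global table

// Clears the whole array: memset takes a size in bytes, so use sizeof.
void resetAll() { std::memset(grid, 0, sizeof(grid)); }

// Mistake shown for comparison: 4 * 4 is the element count, not the byte count,
// so with 4-byte ints only the first 16 of the 64 bytes are cleared.
void resetPartial() { std::memset(grid, 0, 4 * 4); }
```

After filling `grid` with a nonzero value, `resetPartial()` leaves the later rows untouched, while `resetAll()` zeroes every element.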

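Someone else's code
Core idea: row 0 and column 0 act as flags marking which columns/rows need to be printed (I used two separate bool arrays instead and left my row 0 and column 0 unused).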

#include <cstdio>
#include <cstring>
const int maxn = 2010;
int hashTable[maxn][maxn];

int main() {
	int m, n, a[110], t;
	scanf("%d", &m);
	while (m--) {
		scanf("%d", &n);
		for (int i = 0; i<n; i++) {
			scanf("%d ", &a[i]);
			hashTable[0][a[i]] = 1; // mark this column (number) as needing output
		}
		for (int i = 0; i<n; i++) {
			scanf("%d", &t);
			hashTable[t][a[i]]++;
			if (hashTable[t][0] == 0) hashTable[t][0] = 1; // mark this row (group) as needing output
		}
		for (int i = 1; i<maxn; i++) {
			if (hashTable[i][0] == 1) { // group i exists
				printf("%d={", i);
				bool flag = false; // controls the comma separator
				for (int j = 1; j<maxn; j++) {
					if (hashTable[0][j] == 1) { // this column needs output
						if (flag) printf(","); // print a comma before every entry except the first
						else flag = true;
						printf("%d=%d", j, hashTable[i][j]);
					}
				}
				printf("}\n");
			}
		}
		// reset the hash table for the next test case
		memset(hashTable, 0, sizeof(hashTable));
	}
	return 0;
}

Be Unique (20)

Problem description
Being unique is so important to people on Mars that even their lottery is designed in a unique way. The rule of winning is simple: one bets on a number chosen from [1, 10^4]. The first one who bets on a unique number wins. For example, if there are 7 people betting on 5 31 5 88 67 88 17, then the second one who bets on 31 wins.
Input
Each input file contains one test case. Each case contains a line which begins with a positive integer N (<=10^5) and then followed by N bets. The numbers are separated by a space.
Output
For each test case, print the winning number in a line. If there is no winner, print “None” instead.

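The code below gave a runtime error that I had not resolved. A likely cause is the array size: bets can be as large as 10^4, so with maxn = 10000 the access S[10000] was out of bounds. maxn is set to 10010 here.

#include <cstdio>
#include <vector>
#include <algorithm>
#include <string>
#include <cstring>
#include <iostream>
using namespace std;
const int maxn = 10010; // bets lie in [1, 10^4], so index 10000 must be valid
int S[maxn][3] = {};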

int compar(const void* a, const void* b)
{
	if (*(const int*)b> *(const int*)a) return -1;
	if (*(const int*)b< *(const int*)a) return +1;
	return 0;
}

int main()
{
	int n,id;
	bool flag;
	while (scanf("%d", &n) != EOF)
	{
		int count = 1, k = 0;
		int temp[maxn][2] = {};
		memset(S, 0, sizeof(S));
		flag = false;
		for (int i = 0; i < n; i++)
		{
			scanf("%d", &id);
			S[id][1]++;
			S[id][0] = count;
			S[id][2] = id;
			count++;
		}
//		sort(S[1], S[1] + 10000);
		qsort(S, sizeof(S)/sizeof(S[0]),sizeof(S[0]), &compar);
		for (int i = 0; i < maxn; i++)
		{
			if (S[i][1] == 1)
			{
				printf("%d\n", S[i][2]);
				flag = true;
				break;
			}
		}
		if (flag == false)
			printf("None\n");
	}
	return 0;
}
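For comparison, here is a shorter alternative that skips the sorting step. It is only a sketch, not the approach used above: count every bet, then scan the bets again in input order and return the first one whose count is 1. The function name `firstUnique` and the -1 return value for "no winner" are assumptions for the example.

```cpp
#include <vector>

// Returns the first bet that occurs exactly once, or -1 if there is none.
// Bets are assumed to lie in [1, 10^4], as in the problem statement.
int firstUnique(const std::vector<int>& bets) {
    const int MAXV = 10000;
    std::vector<int> cnt(MAXV + 1, 0);  // index 10000 is valid
    for (int b : bets) cnt[b]++;
    for (int b : bets)
        if (cnt[b] == 1) return b;      // scanning in input order finds the earliest unique bet
    return -1;
}
```

A caller would print the returned number, or "None" when the result is -1.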

String Subtraction (20)

Problem description
Given two strings S1 and S2, S = S1 - S2 is defined to be the remaining string after taking all the characters in S2 from S1. Your task is simply to calculate S1 - S2 for any given strings. However, it might not be that simple to do it fast.
Input
Each input file contains one test case. Each case consists of two lines which gives S1 and S2, respectively. The string lengths of both strings are no more than 10^4. It is guaranteed that all the characters are visible ASCII codes and white space, and a new line character signals the end of a string.
Output
For each test case, print S1 - S2 in one line.

#include <cstdio>
#include <vector>
#include <algorithm>
#include <string>
#include <cstring>
#include <iostream>
using namespace std;

int main()
{
	char S1[10010],S2[10010];
	while (fgets(S1, 10010, stdin) != NULL)
	{
		bool S[130] = { false };
		fgets(S2, 10010, stdin);
		int len2 = strlen(S2);
		for (int i = 0; i < len2; i++)
		{
			//S[(S2[i]-'NULL')] = true;
			int temp;
			temp = (int)S2[i];
			S[temp] = true;
			//S[(int)S2[i]] = true;
		}
		int len1 = strlen(S1);
		for (int i = 0; i < len1; i++)
		{
			if (S[(int)S1[i]] == false)
			{
				printf("%c", S1[i]);
			}
		}
		printf("\n");
	}
	return 0;
}

weixin_42176221


[Data Structures and Algorithms] Study Notes - 《算法笔记》 (Algorithm Notes) - 9
