
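Hash table study outline

1. Unordered associative containers

1. unordered_map
2. unordered_set

2. Hash tables

1. Hash functions
2. Hash collisions

3. Mock implementation

1. Template parameter list
2. Iterator and hash table implementation
3. Unordered map implementation
4. Unordered set implementation

4. Applications

1. Bitmap
2. Bloom filter
3. Hash partitioning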

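1. Unordered associative containers

1. unordered_map

unordered_map is an associative container that stores <key, value> pairs and allows fast lookup of a value by its key.
Accessing a single element by key is, on average, faster than with map, but range iteration over a subset of its elements is usually less efficient.
It supports operator[], which takes a key as its argument and gives direct access to the corresponding value.
Its interface is otherwise much the same as that of map.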

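#include <iostream>
#include <unordered_map>
using namespace std;

int main()
{
	unordered_map<int, int> um;
	um.insert(make_pair(1, 1));

	// operator[]: inserts the key when it is absent
	um[100] = 100;
	// operator[]: modifies the value when the key already exists
	um[1] = 15;

	// Iterator traversal: the order is unspecified
	cout << "unordered_map: " << endl;
	unordered_map<int, int>::iterator uit = um.begin();
	while (uit != um.end())
	{
		cout << uit->first << "-->" << uit->second << endl;
		++uit;
	}
	// find returns an iterator to the element, or end() if the key is absent
	uit = um.find(100);
	// count returns 1 if the key is present and 0 otherwise
	size_t cnt = um.count(20);
	cout << (uit != um.end()) << " " << cnt << endl;
	return 0;
}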

2. unordered_set

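Compared with unordered_map, the value of each element in an unordered_set is also its key. Its interface is largely the same as that of unordered_map, except that there is no operator[].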

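#include <iostream>
#include <unordered_set>
using namespace std;

int main()
{
	unordered_set<int> us;
	us.insert(10);

	// Iterator traversal: the order is unspecified
	unordered_set<int>::iterator uit = us.begin();
	while (uit != us.end())
	{
		cout << *uit << " ";
		++uit;
	}
	cout << endl;
	return 0;
}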


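2. Hash tables

The unordered associative containers are efficient because they are built on a hash structure.
Ideally, a search should obtain the target element from the table in a single step, without any key comparisons. If we build a storage structure in which some function (hashFunc) maps each element's key to its storage position, then a lookup can use the same function to locate the element quickly. This approach is called hashing, the conversion function is called the hash function, and the resulting structure is called a hash table.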

1. Hash functions
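The implementation later in this article uses the division (remainder) method as its hash function: the key is converted to an integer and taken modulo the table length. A minimal sketch of that method follows; the name divisionHash is an assumption made for illustration and does not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>

// Division (remainder) method: position = key % tableSize.
// divisionHash is a hypothetical helper name used only in this sketch.
std::size_t divisionHash(std::size_t key, std::size_t tableSize)
{
	assert(tableSize != 0); // the modulus must be non-zero
	return key % tableSize;
}
```

With a table length of 53, divisionHash(100, 53) and divisionHash(153, 53) both give 47, so these two keys land on the same position.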

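2. Hash collisions

A hash function can map different keys to the same position, which is called a hash collision. The two common ways to resolve collisions are closed hashing and open hashing.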

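Closed hashing: linear probing and quadratic probing

Linear probing

When a collision occurs, probe the following positions one at a time until an empty slot is found.
To keep the table from filling up, we define a load factor (number of elements stored / length of the table); once the load factor exceeds a chosen threshold, the table is expanded.
Quadratic probing (the offset from the original position on the i-th probe is i^2, that is 1, 4, 9, ...)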
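Linear probing and load-factor-driven expansion can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: the class name ClosedHashSet, the initial table length 11, and the 0.7 threshold are all chosen for the example, keys are assumed to be non-negative, and deletion is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Closed hashing for non-negative int keys, resolved by linear probing.
// ClosedHashSet, the initial length 11 and the 0.7 threshold are assumptions
// of this sketch. Deletion is omitted to keep it short.
class ClosedHashSet
{
	struct Slot
	{
		int key = 0;
		bool used = false;
	};

public:
	ClosedHashSet() : _table(11) {}

	bool insert(int key)
	{
		assert(key >= 0);
		if (contains(key))
			return false;
		// Expand before the load factor (elements / table length) exceeds 0.7.
		if ((_size + 1) * 10 > _table.size() * 7)
			rehash(_table.size() * 2 + 1);
		place(_table, key);
		++_size;
		return true;
	}

	bool contains(int key) const
	{
		std::size_t idx = static_cast<std::size_t>(key) % _table.size();
		// Follow the probe sequence; an unused slot means the key is absent.
		while (_table[idx].used)
		{
			if (_table[idx].key == key)
				return true;
			idx = (idx + 1) % _table.size();
		}
		return false;
	}

	std::size_t size() const { return _size; }
	std::size_t capacity() const { return _table.size(); }

private:
	// Linear probing: try idx, idx + 1, idx + 2, ... until a free slot appears.
	static void place(std::vector<Slot>& table, int key)
	{
		std::size_t idx = static_cast<std::size_t>(key) % table.size();
		while (table[idx].used)
			idx = (idx + 1) % table.size();
		table[idx].key = key;
		table[idx].used = true;
	}

	// Move every stored key into a larger table, recomputing its position.
	void rehash(std::size_t newLength)
	{
		std::vector<Slot> bigger(newLength);
		for (const Slot& s : _table)
			if (s.used)
				place(bigger, s.key);
		_table.swap(bigger);
	}

	std::vector<Slot> _table;
	std::size_t _size = 0;
};
```

Inserting 1, 12 and 23 shows the probing: all three hash to position 1 in a table of length 11, so they occupy positions 1, 2 and 3. Quadratic probing would differ only in the step, using offsets 1, 4, 9, ... from the original position instead of 1, 2, 3, ....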

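Open hashing (separate chaining)
Open hashing is also known as the chained-address method (separate chaining). A hash function first computes an address for every key in the set; keys with the same address belong to the same subset, and each subset is called a bucket. The elements of a bucket are linked into a singly linked list, and the head pointer of each list is stored in the hash table. Open addressing has to keep many slots empty to hold the load factor down, so separate chaining usually needs less storage overall despite the extra pointers.

Like closed hashing, open hashing must expand as the table fills. The difference is the trigger: closed hashing expands when the load factor threshold is reached, whereas open hashing here expands when the number of elements equals the number of buckets.

Traversal of the open hash table: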

template <class K>
struct keyOfValue
{
	const K& operator()(const K& key)
	{
		return key;
	}
};


// Open hashing: array of pointers + singly linked lists

template <class V>
struct HashNode
{
	V _value;
	HashNode<V>* _next;

	HashNode(const V& val = V())
		:_value(val)
		, _next(nullptr)
	{}
};

// Forward declaration
template <class K, class V, class KeyOfValue, class HF>
class HashTable;

template <class K, class V, class KeyOfValue, class HF>
struct HashIterator
{
	typedef HashNode<V> Node;
	typedef HashIterator<K, V, KeyOfValue, HF> Self;

	typedef HashTable<K, V, KeyOfValue, HF> HT;

	Node* _node;
	HT* _ht;

	HashIterator(Node* node, HT* ht)
		:_node(node)
		, _ht(ht)
	{}

	V& operator*()
	{
		return _node->_value;
	}

	V* operator->()
	{
		return &_node->_value;
	}

	bool operator!=(const Self& it) const
	{
		return _node != it._node;
	}

	Self& operator++()
	{
		if (_node->_next)
			_node = _node->_next;
		else
		{
			// Find the head node of the next non-empty list
			// 1. Locate the bucket of the current node
			// kov: extracts the key from a value
			KeyOfValue kov;
			// hf: converts the key to an integer
			HF hf;
			size_t idx = hf(kov(_node->_value)) % _ht->_table.size();

			// 2. Starting from the next bucket, look for the first non-empty list.
			//    The index is checked against size() before every access.
			++idx;
			for (; idx < _ht->_table.size(); ++idx)
			{
				if (_ht->_table[idx])
				{
					_node = _ht->_table[idx];
					break;
				}
			}
			// 3. No non-empty bucket remains: this is the end iterator
			if (idx == _ht->_table.size())
				_node = nullptr;
		}
		return *this;
	}
};


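Prime table
When expanding, the new table size is taken from a list of primes. With the division method, a prime table length tends to spread keys more evenly and so helps reduce collisions.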

const int PRIMECOUNT = 28;
const size_t primeList[PRIMECOUNT] =
{
	53ul, 97ul, 193ul, 389ul, 769ul,
	1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
	49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
	1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,
	50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,
	1610612741ul, 3221225473ul, 4294967291ul
};

// Returns the first prime in the list that is greater than the argument
size_t GetNextPrime(size_t prime)
{
	for (int i = 0; i < PRIMECOUNT; ++i)
	{
		if (primeList[i] > prime)
			return primeList[i];
	}
	// Already at or beyond the largest entry: stay at the last prime
	return primeList[PRIMECOUNT - 1];
}

Converting strings to integers
To store characters or strings in the hash table, the key must first be converted to an integer.

struct strToInt
{
	size_t operator()(const string& str)
	{
		size_t hash = 0;
		for (const auto& ch : str)
		{
			hash = hash * 131 + ch;
		}
		return hash;
	}
};
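The conversion feeds straight into the division method: string to integer, then integer to bucket index. The sketch below restates the same multiply-by-131 scheme; StrHash and bucketOf are names invented for this example, and operator() is marked const here (an adjustment for the sketch) so that the functor can be called on const objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same multiply-by-131 scheme as strToInt above; operator() is marked const
// here (an adjustment for this sketch) so it can be called on const objects.
struct StrHash
{
	std::size_t operator()(const std::string& str) const
	{
		std::size_t hash = 0;
		// Read each character as unsigned so non-ASCII bytes stay positive.
		for (unsigned char ch : str)
			hash = hash * 131 + ch;
		return hash;
	}
};

// bucketOf is a hypothetical helper: string -> integer -> bucket index.
std::size_t bucketOf(const std::string& key, std::size_t bucketCount)
{
	assert(bucketCount != 0);
	return StrHash()(key) % bucketCount;
}
```

For the key "ab" the hash is 97 * 131 + 98 = 12805, and with 53 buckets the index is 12805 % 53 = 32. A functor with a const operator() like this one can also be supplied as the hash argument of std::unordered_map.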


3. Mock implementation

All of the code below forms a single unit: the pieces are meant to be compiled and used together.

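1. Template parameter list

// K: type of the key
// V: type of the stored value, which differs by container: for unordered_map it is a key-value pair, for unordered_set it is K
// KeyOfValue: because V differs, the way to extract the key from a value differs too; this functor does the extraction
// HF: type of the hash functor; the hash function uses the division method, so the key must be converted to an integer before taking the modulus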
template <class K, class V, class KeyOfValue, class HF>
class HashTable;

template <class K>
struct keyOfValue
{
	const K& operator()(const K& key)
	{
		return key;
	}
};

template <class V>
struct HashNode
{
	V _value;
	HashNode<V>* _next;

	HashNode(const V& val = V())
		:_value(val)
		, _next(nullptr)
	{}
};

2. Iterator and hash table implementation

template <class K, class V, class KeyOfValue, class HF>
class HashTable
{
public:

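	// Declare the iterator as a friend class. The friend template uses its own
	// parameter names because it may not reuse those of the enclosing template.
	template <class K2, class V2, class KeyOfValue2, class HF2>
	friend struct HashIterator;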

	typedef HashNode<V> Node;
	typedef HashIterator<K, V, KeyOfValue, HF> iterator;

	iterator begin()
	{
		// Head node of the first non-empty list
		for (size_t i = 0; i < _table.size(); ++i)
		{
			Node* cur = _table[i];
			if (cur)
				return iterator(cur, this);
		}
		return iterator(nullptr, this);
	}

	iterator end()
	{
		return iterator(nullptr, this);
	}

	pair<iterator,bool> insert(const V& value)
	{
		checkCapacity();

		// 1. Compute the bucket index
		KeyOfValue kov;
		HF hf;
		size_t idx = hf(kov(value)) % _table.size();

		// 2. Check whether the key already exists
		Node* cur = _table[idx];
		while (cur)
		{
			if (kov(cur->_value) == kov(value))
				return make_pair(iterator(cur, this), false);
			cur = cur->_next;
		}

		// 3. Insert at the head of the list
		cur = new Node(value);

		cur->_next = _table[idx];
		_table[idx] = cur;

		++_size;
		return make_pair(iterator(cur, this), true);
	}

	size_t getNextSize(size_t n)
	{
		const int PRIMECOUNT = 28;
		const size_t primeList[PRIMECOUNT] =
		{
			53ul, 97ul, 193ul, 389ul, 769ul,
			1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
			49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
			1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,
			50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,
			1610612741ul, 3221225473ul, 4294967291ul
		};

		for (int i = 0; i < PRIMECOUNT; ++i)
		{
			if (primeList[i] > n)
				return primeList[i];
		}

		// n is already at or beyond the largest entry: return the last prime
		return primeList[PRIMECOUNT - 1];
	}

	void checkCapacity()
	{
		if (_size == _table.size())
		{
			//size_t newSize = _size == 0 ? 5 : 2 * _size;
			size_t newSize = getNextSize(_size);
			vector<Node*> newHt;
			newHt.resize(newSize);
			KeyOfValue kov;
			HF hf;
			// Walk every non-empty list in the old table
			for (size_t i = 0; i < _table.size(); ++i)
			{
				Node* cur = _table[i];
				// Walk the current list
				while (cur)
				{
					// 0. Remember the next node in the old list
					Node* next = cur->_next;

					// 1. Compute the position in the new table
					size_t idx = hf(kov(cur->_value)) % newHt.size();

					// 2. Insert at the head of the new bucket
					cur->_next = newHt[idx];
					newHt[idx] = cur;

					// 3. Move on to the next node
					cur = next;
				}
				_table[i] = nullptr;
			}

			_table.swap(newHt);
		}
	}

	Node* find(const K& key)
	{
		if (_table.size() == 0)
			return nullptr;

		HF hf;

		size_t idx = hf(key) % _table.size();

		Node* cur = _table[idx];
		KeyOfValue kov;
		while (cur)
		{
			if (kov(cur->_value) == key)
				return cur;
			cur = cur->_next;
		}
		return nullptr;
	}

	bool erase(const K& key)
	{
		// An empty table has no buckets; this also avoids a modulo by zero
		if (_table.size() == 0)
			return false;

		HF hf;
		size_t idx = hf(key) % _table.size();
		Node* cur = _table[idx];
		// Standard singly linked list deletion
		Node* prev = nullptr;

		KeyOfValue kov;
		while (cur)
		{
			if (kov(cur->_value) == key)
			{
				// Unlink the node; check whether it is the head of the list
				if (prev == nullptr)
				{
					_table[idx] = cur->_next;
				}
				else
				{
					prev->_next = cur->_next;
				}

				delete cur;
				--_size;
				return true;
			}
			else
			{
				prev = cur;
				cur = cur->_next;
			}
		}
		return false;
	}

private:
	vector<Node*> _table;
	size_t _size = 0;
};

3. Unordered map implementation

unordered_map stores pair<K, V> key-value pairs, where K is the key type, V is the value type, and HF is the hash function type.
template <class K>
struct hashFun
{
	size_t operator()(const K& key)
	{
		return key;
	}
};


template <class K, class V, class HF = hashFun<K>>
class UnorderedMap
{
	struct MapKeyOfValue
	{
		const K& operator()(const pair<K, V>& value)
		{
			return value.first;
		}
	};
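public:
	typedef typename HashTable<K, pair<K, V>, MapKeyOfValue, HF>::iterator iterator;

	// The hash table's insert returns pair<iterator, bool>, so forward that type
	pair<iterator, bool> insert(const pair<K, V>& value)
	{
		return _ht.insert(value);
	}
private:
	HashTable<K, pair<K, V>, MapKeyOfValue, HF> _ht;
};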

4. Unordered set implementation

// hashFun<K> is already defined in the unordered map implementation above and is reused here.

template <class K, class HF = hashFun<K>>
class UnorderedSet
{
	struct SetKeyOfValue
	{
		const K& operator()(const K& value)
		{
			return value;
		}
	};
public:
	typedef typename HashTable<K, K, SetKeyOfValue, HF>::iterator iterator;

	iterator begin()
	{
		return _ht.begin();
	}

	iterator end()
	{
		return _ht.end();
	}

	// The hash table's insert returns pair<iterator, bool>, so forward that type
	pair<iterator, bool> insert(const K& value)
	{
		return _ht.insert(value);
	}
private:
	HashTable<K, K, SetKeyOfValue, HF> _ht;
};

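4. Applications

1. Bitmap

How a bitmap works: each integer is mapped to a single bit, and that bit records whether the integer is present.
Bitmap implementation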

class BitMap
{
public:
	BitMap(size_t range)
	{
		// One 32-bit word holds 32 values
		_bit.resize(range / 32 + 1);
	}

	// Query: Test
	bool Test(size_t n)
	{
		// Index of the word that holds bit n
		size_t idx = n / 32;
		size_t bitIdx = n % 32;
		// Read the value of the corresponding bit
		return (_bit[idx] >> bitIdx) & 1u;
	}
	// Store: Set
	void Set(size_t n)
	{
		size_t idx = n / 32;
		size_t bitIdx = n % 32;

		// 1u keeps the shift unsigned, so bit 31 is handled safely
		_bit[idx] |= (1u << bitIdx);
	}

	// Delete: Reset
	void Reset(size_t n)
	{
		size_t idx = n / 32;
		size_t bitIdx = n % 32;

		_bit[idx] &= ~(1u << bitIdx);
	}
private:
	vector<unsigned int> _bit;
};
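A typical use of a bitmap is recording which values from a known range have been seen, at a cost of one bit per possible value. The sketch below restates the bitmap in compact form so that it is self-contained; SimpleBitMap and missingValues are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compact restatement of the bitmap idea for this sketch: one bit per value.
class SimpleBitMap
{
public:
	explicit SimpleBitMap(std::size_t range) : _bit(range / 32 + 1, 0u) {}

	void Set(std::size_t n)   { _bit[n / 32] |= (std::uint32_t(1) << (n % 32)); }
	void Reset(std::size_t n) { _bit[n / 32] &= ~(std::uint32_t(1) << (n % 32)); }
	bool Test(std::size_t n) const { return (_bit[n / 32] >> (n % 32)) & 1u; }

private:
	std::vector<std::uint32_t> _bit;
};

// missingValues is a hypothetical helper: it reports which values in
// [0, range) never occur in the input, using about `range` bits of storage.
std::vector<std::size_t> missingValues(const std::vector<std::size_t>& input,
                                       std::size_t range)
{
	SimpleBitMap seen(range);
	for (std::size_t v : input)
	{
		assert(v < range); // values outside the range cannot be recorded
		seen.Set(v);
	}
	std::vector<std::size_t> missing;
	for (std::size_t v = 0; v < range; ++v)
		if (!seen.Test(v))
			missing.push_back(v);
	return missing;
}
```

For the input {3, 0, 1, 3, 5} with range 6, the values 2 and 4 are reported as missing. Duplicates in the input are harmless because setting a bit twice has no further effect.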

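2. Bloom filter

A Bloom filter combines the ideas of the hash table and the bitmap: it keeps the bitmap's compact storage while using several hash functions to reduce the impact of hash collisions.

Each element is mapped by three hash functions to three bit positions, and all three bits are set.

If a Bloom filter reports that an element is absent, it is definitely absent. If it reports that an element is present, the element may still be absent, because other elements may have set the same bits (a false positive).
A Bloom filter does not support direct deletion, because clearing the bits of one element may affect other elements that share them.

Bloom filter implementation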

template <class T, class HF1, class HF2, class HF3>
class BloomFilter
{
public:
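	// number of bits = number of hash functions * number of elements / ln2;
	// with 3 hash functions that is about 4.33 bits per element, rounded up to 5 here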
	BloomFilter(size_t num)
		:_bit(5 * num)
		, _bitCount(5 * num)
	{}

	// Set
	void Set(const T& value)
	{
		HF1 hf1;
		HF2 hf2;
		HF3 hf3;
		// Compute the hash values
		size_t hashCode1 = hf1(value);
		size_t hashCode2 = hf2(value);
		size_t hashCode3 = hf3(value);

		_bit.Set(hashCode1 % _bitCount);
		_bit.Set(hashCode2 % _bitCount);
		_bit.Set(hashCode3 % _bitCount);
	}

	// Test (lookup): check the bit at each hash position; if any one of them is zero, the element is not present
	bool Test(const T& value)
	{
		HF1 hf1;
		size_t hashCode1 = hf1(value);
		if (!_bit.Test(hashCode1 % _bitCount))
			return false;
		HF2 hf2;
		size_t hashCode2 = hf2(value);
		if (!_bit.Test(hashCode2 % _bitCount))
			return false;
		HF3 hf3;
		size_t hashCode3 = hf3(value);
		if (!_bit.Test(hashCode3 % _bitCount))
			return false;

		// A true result is not guaranteed to be correct (false positive)
		return true;
	}

private:
	BitMap _bit;
	size_t _bitCount;

};

struct strToInt1
{
	size_t operator()(const string& str)
	{
		size_t hash = 0;
		for (auto& ch : str)
		{
			hash = hash * 131 + ch;
		}
		return hash;
	}
};

struct strToInt2
{
	size_t operator()(const string& str)
	{
		size_t hash = 0;
		for (auto& ch : str)
		{
			hash = hash * 65599 + ch;
		}
		return hash;
	}
};

struct strToInt3
{
	size_t operator()(const string& str)
	{
		size_t hash = 0;
		size_t magic = 63689;
		for (auto& ch : str)
		{
			hash = hash * magic + ch;
			magic *= 378551;
		}
		return hash;
	}
};
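The sizing rule quoted in the constructor comment (bits = hash functions * elements / ln2) can be checked numerically, together with the standard approximation of the false-positive rate, p = (1 - e^(-k*n/m))^k for k hash functions, n elements and m bits. The function names bloomBits and falsePositiveRate are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Bits suggested by the rule quoted in the constructor comment:
// m = k * n / ln2, rounded up. bloomBits is a name chosen for this sketch.
std::size_t bloomBits(std::size_t k, std::size_t n)
{
	return static_cast<std::size_t>(std::ceil(k * n / std::log(2.0)));
}

// Standard approximation of the false-positive rate after n insertions
// into m bits with k hash functions: p = (1 - e^(-k*n/m))^k.
double falsePositiveRate(std::size_t k, std::size_t n, std::size_t m)
{
	assert(m != 0);
	double exponent = -static_cast<double>(k) * static_cast<double>(n)
	                  / static_cast<double>(m);
	return std::pow(1.0 - std::exp(exponent), static_cast<double>(k));
}
```

For 3 hash functions and 100 elements the rule asks for 433 bits; the constructor rounds this up to 5 bits per element, that is 500 bits. With those 500 bits the approximation gives a false-positive rate of roughly 9%.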

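3. Hash partitioning

Given a log file larger than 100 GB that stores IP addresses, design an algorithm to find the IP address that occurs most often. Under the same conditions,
how would you find the top K IPs? How could this be done directly with Linux commands?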
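One common approach is to split the log by hash value so that every occurrence of an IP lands in the same partition, then count each partition on its own. The sketch below illustrates the idea in memory and is not a definitive solution: a std::vector of strings stands in for each on-disk partition file, std::hash stands in for the hash function, and mostFrequentIp and partitionCount are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Sketch of hash partitioning. In the real problem each partition would be a
// small file on disk; here a std::vector<std::string> stands in for a file.
// mostFrequentIp and partitionCount are names invented for this sketch.
std::pair<std::string, std::size_t> mostFrequentIp(
	const std::vector<std::string>& log, std::size_t partitionCount)
{
	assert(partitionCount != 0);

	// 1. Split: identical IPs always hash to the same partition.
	std::vector<std::vector<std::string>> parts(partitionCount);
	std::hash<std::string> hf;
	for (const std::string& ip : log)
		parts[hf(ip) % partitionCount].push_back(ip);

	// 2. Count each partition separately, so only one partition's
	//    counters need to be in memory at a time.
	std::pair<std::string, std::size_t> best("", 0);
	for (const std::vector<std::string>& part : parts)
	{
		std::unordered_map<std::string, std::size_t> counts;
		for (const std::string& ip : part)
			++counts[ip];

		// 3. Keep the overall maximum across partitions.
		for (const auto& kv : counts)
			if (kv.second > best.second)
				best = std::make_pair(kv.first, kv.second);
	}
	return best;
}
```

For the top K variant, step 3 can keep a min-heap of size K (for example a std::priority_queue) instead of a single maximum. On Linux, assuming the IPs have been extracted one per line into a hypothetical file ips.txt, a pipeline such as `sort ips.txt | uniq -c | sort -nr | head -n K` lists the K most frequent entries.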
