Implementation of C++ hash table

1. Introduction to the unordered family of containers


2. Introduction to hashing

1. Hash concept


2. Common designs of hash functions


3. Hash collision


4. Design principles of hash functions


3. Resolving hash collisions

Two common ways to resolve hash collisions are closed hashing and open hashing.

1. Closed hashing (open addressing method)

Because linear probing and quadratic probing are very similar, only linear probing is implemented here.
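
Since only linear probing is implemented in this article, the following minimal sketch shows how the two probe sequences differ. The helper names `linear_probe`, `quadratic_probe`, and `probe_sequence` are made up for illustration and do not appear in the article's code, and the quadratic form used here is the common textbook variant `(h + i*i) % m`, which is an assumption about what the original figure described.

```cpp
#include <cstddef>
#include <vector>

// i-th slot examined by linear probing: h, h+1, h+2, ... (wrapping around)
std::size_t linear_probe(std::size_t h, std::size_t i, std::size_t m)
{
	return (h + i) % m;
}

// i-th slot examined by quadratic probing: h, h+1, h+4, h+9, ... (wrapping around)
std::size_t quadratic_probe(std::size_t h, std::size_t i, std::size_t m)
{
	return (h + i * i) % m;
}

// First `count` slots visited from home slot h in a table of m slots
std::vector<std::size_t> probe_sequence(bool quadratic, std::size_t h,
                                        std::size_t count, std::size_t m)
{
	std::vector<std::size_t> seq;
	for (std::size_t i = 0; i < count; ++i)
		seq.push_back(quadratic ? quadratic_probe(h, i, m) : linear_probe(h, i, m));
	return seq;
}
```

With home slot 7 in a 10-slot table, linear probing visits slots 7, 8, 9, 0, while quadratic probing visits 7, 8, 1, 6. Only the step between probes changes; the surrounding insert and find logic stays the same.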

1. Linear probing

1. Animation demonstration


2. Precautions


3. Notes on code

1. Notes on functors:
(1) A string cannot be used in a modulo operation, so a functor is added for the string type. The functor converts the string into an integer, and the integer can then be reduced modulo the table size. This amounts to a two-level mapping: string -> int -> subscript.
(2) The order of the characters has to be taken into account, as in "abc" and "acb", and so do strings whose ASCII codes add up to the same value, as in "aad" and "abc". For this reason many string hash algorithms and string hash functions have been designed. If you are interested, you can read the introduction in this blog.
(3) Because hashing string keys is so common, template specialization is used here so that the hash function does not have to be passed explicitly every time a string is stored.

The hash function here only returns an integer value. When calculating the subscript, do not forget to take it modulo the size of the hash table.
Otherwise the vector is indexed out of bounds, and a checked operator[] (for example in a debug build) reports the error bluntly through a failed assertion.

The hash table is a Key-Value model,
and the hash subscript is calculated based on the Key.

// Functor
// Hash function for integer keys
template<class K>
struct HashFunc
{
	size_t operator()(const K& key) const
	{
		return (size_t)key;
	}
};
// Template specialization
// Hash function for string keys
template<>
struct HashFunc<string>
{
	size_t operator()(const string& key) const
	{
		// BKDR string hash function
		size_t hash = 0;
		for (auto e : key)
		{
			hash *= 131;
			hash += e;
		}
		return hash;
	}
};

template<class K, class V, class Hash = HashFunc<K>>
class HashTable
....
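
To make point (2) concrete, here is a small sketch comparing a naive "sum of character codes" hash with the BKDR recurrence used in `HashFunc<string>`. The free functions `sum_hash` and `bkdr_hash` are illustrative names introduced here, not part of the article's code, and they iterate over `unsigned char` (an assumption made to keep the result independent of whether `char` is signed).

```cpp
#include <cstddef>
#include <string>

// Naive hash, for comparison only: the sum of the character codes.
// Strings made of the same characters in a different order always collide.
std::size_t sum_hash(const std::string& key)
{
	std::size_t hash = 0;
	for (unsigned char c : key)
		hash += c;
	return hash;
}

// BKDR-style hash: multiply the running value by 131, then add the next
// character, so the position of each character affects the result.
std::size_t bkdr_hash(const std::string& key)
{
	std::size_t hash = 0;
	for (unsigned char c : key)
	{
		hash *= 131;
		hash += c;
	}
	return hash;
}
```

Here `sum_hash` gives 294 for all of "abc", "acb", and "aad", while `bkdr_hash` gives three different values for them. Either result still has to be taken modulo the table size before it is used as a subscript.
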
4. Code implementation

namespace open_address
{
	enum Status
	{
		EMPTY,
		EXIST,
		DELETE
	};

	template<class K, class V>
	struct HashData
	{
		pair<K, V> _kv;
		Status _s = EMPTY; // slot state
	};

	// Functor
	// Hash function for integer keys
	template<class K>
	struct HashFunc
	{
		size_t operator()(const K& key) const
		{
			return (size_t)key;
		}
	};
	// Template specialization
	// Hash function for string keys
	template<>
	struct HashFunc<string>
	{
		size_t operator()(const string& key) const
		{
			// BKDR string hash function
			size_t hash = 0;
			for (auto e : key)
			{
				hash *= 131;
				hash += e;
			}
			return hash;
		}
	};

	template<class K, class V, class Hash = HashFunc<K>>
	class HashTable
	{
	public:
		HashTable()
		{
			_tables.resize(10);
		}

		bool Insert(const pair<K, V>& kv)
		{
			if (Find(kv.first))
				return false;

			// Expand once the load factor reaches 0.7
			if (_n * 10 / _tables.size() >= 7)
			{
				size_t newSize = _tables.size() * 2;
				HashTable<K, V, Hash> newHT;
				newHT._tables.resize(newSize);
				// Traverse the old table and reinsert every live element
				for (size_t i = 0; i < _tables.size(); i++)
				{
					if (_tables[i]._s == EXIST)
					{
						newHT.Insert(_tables[i]._kv);
					}
				}

				_tables.swap(newHT._tables);
			}

			Hash hf;
			// Linear probing
			size_t hashi = hf(kv.first) % _tables.size();
			while (_tables[hashi]._s == EXIST)
			{
				hashi++;

				hashi %= _tables.size();
			}

			_tables[hashi]._kv = kv;
			_tables[hashi]._s = EXIST;
			++_n;

			return true;
		}

		HashData<K, V>* Find(const K& key)
		{
			Hash hf;

			size_t hashi = hf(key) % _tables.size();
			// Probe at most _tables.size() slots. Without a rehash, a DELETE slot
			// never becomes EMPTY again, so a table with no EMPTY slot left would
			// otherwise loop forever on a missing key.
			for (size_t i = 0; i < _tables.size() && _tables[hashi]._s != EMPTY; i++)
			{
				if (_tables[hashi]._s == EXIST
					&& _tables[hashi]._kv.first == key)
				{
					return &_tables[hashi];
				}

				hashi++;
				hashi %= _tables.size();
			}

			return nullptr;
		}

		// Pseudo-deletion: mark the slot as DELETE instead of clearing it
		bool Erase(const K& key)
		{
			HashData<K, V>* ret = Find(key);
			if (ret)
			{
				ret->_s = DELETE;
				--_n;
				return true;
			}
			else
			{
				return false;
			}
		}

	private:
		vector<HashData<K, V>> _tables;
		size_t _n = 0; // number of keys currently stored
	};
}
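
The three-state `Status` enum matters mostly for deletion. The sketch below is a deliberately tiny, self-contained linear-probing set of non-negative ints (the `ProbeSet` and `Slot` names are hypothetical, there is no expansion, and duplicates are not checked) that shows what goes wrong if an erased slot is reset to empty rather than marked as deleted.

```cpp
#include <cstddef>
#include <vector>

enum class Slot { Empty, Exist, Deleted };

// Tiny fixed-size linear-probing set of non-negative ints, for illustration only
struct ProbeSet
{
	std::vector<int> keys;
	std::vector<Slot> state;

	explicit ProbeSet(std::size_t m) : keys(m, 0), state(m, Slot::Empty) {}

	// Assumes the table never becomes completely full
	void insert(int key)
	{
		std::size_t i = static_cast<std::size_t>(key) % keys.size();
		while (state[i] == Slot::Exist)
			i = (i + 1) % keys.size();
		keys[i] = key;
		state[i] = Slot::Exist;
	}

	// Stops at the first Empty slot; Deleted slots are skipped, not treated as the end
	bool contains(int key) const
	{
		std::size_t i = static_cast<std::size_t>(key) % keys.size();
		for (std::size_t n = 0; n < keys.size() && state[i] != Slot::Empty; ++n)
		{
			if (state[i] == Slot::Exist && keys[i] == key)
				return true;
			i = (i + 1) % keys.size();
		}
		return false;
	}

	// use_tombstone = true marks the slot Deleted (pseudo-deletion);
	// use_tombstone = false resets it to Empty, which breaks later lookups
	void erase(int key, bool use_tombstone)
	{
		std::size_t i = static_cast<std::size_t>(key) % keys.size();
		for (std::size_t n = 0; n < keys.size() && state[i] != Slot::Empty; ++n)
		{
			if (state[i] == Slot::Exist && keys[i] == key)
			{
				state[i] = use_tombstone ? Slot::Deleted : Slot::Empty;
				return;
			}
			i = (i + 1) % keys.size();
		}
	}
};
```

In a 10-slot table, keys 3 and 13 both map to slot 3, so 13 ends up in slot 4. If 3 is erased by resetting slot 3 to empty, a later lookup of 13 stops at slot 3 and wrongly reports that 13 is missing. With the deleted marker, the probe continues and finds it.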

2. Open hashing (hash buckets, separate chaining)

Closed hashing as implemented above is awkward to use in practice, so we focus on introducing and implementing open hashing.

1. Concept


2. Animation demonstration

[Figures omitted: the hash buckets before insertion, during insertion, and after insertion.]

3. Capacity expansion problem

1. Load factor of separate chaining

Note: Expansion remaps every subscript against the new table size, so the data in one bucket tends to be scattered into different buckets. This makes the extreme case, in which one bucket holds a very long chain, unlikely to occur.
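
A short sketch of that remapping, using integer keys and the same `key % bucket_count` rule as the table in this article. The function names `bucket_of` and `bucket_sizes` are illustrative assumptions, not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Bucket index of an integer key: the key modulo the number of buckets
std::size_t bucket_of(std::size_t key, std::size_t bucket_count)
{
	return key % bucket_count;
}

// Number of keys that land in each bucket
std::vector<std::size_t> bucket_sizes(const std::vector<std::size_t>& keys,
                                      std::size_t bucket_count)
{
	std::vector<std::size_t> sizes(bucket_count, 0);
	for (std::size_t key : keys)
		++sizes[bucket_of(key, bucket_count)];
	return sizes;
}
```

With 10 buckets, the keys 0, 10, 20, 30 all land in bucket 0. After doubling to 20 buckets, 0 and 20 stay in bucket 0 while 10 and 30 move to bucket 10, so the chain is split in half. The scattering is not guaranteed for every key set: keys that are all multiples of the new size would still share a bucket.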

2. Description

1. Each hash bucket here is a singly linked list.
2. Because the open-hashing table will later be used to wrap unordered_set and unordered_map, we do not use forward_list from the STL (the singly linked list container added in C++11). Instead, we hand-write the singly linked list ourselves.
3. Because the singly linked list is hand-written, we have to write the destructor ourselves and cannot rely on the one the compiler generates by default.
4. For efficiency, when the hash table expands we move the existing nodes directly into the new table rather than deep-copying them, which would waste space.
5. An open-hashing hash table is nothing more than an array of pointers. We have already implemented AVL trees and red-black trees, so there is nothing to be afraid of in a hash table.

3. Comparison of open hashing and closed hashing


4. Implementation of the open-hashing hash table

1. The parts shared with the closed-hashing hash table

namespace wzs
{
	// HashFunc<int>
	// Hash function for integer keys
	template<class K>
	struct HashFunc
	{
		size_t operator()(const K& key) const
		{
			return (size_t)key;
		}
	};
	// HashFunc<string>
	// Hash function for string keys
	template<>
	struct HashFunc<string>
	{
		size_t operator()(const string& key) const
		{
			// BKDR
			size_t hash = 0;
			for (auto e : key)
			{
				hash *= 131;
				hash += e;
			}
			return hash;
		}
	};

	template<class K, class V>
	struct HashNode
	{
		HashNode* _next;
		pair<K, V> _kv;

		// Initializers are listed in declaration order
		HashNode(const pair<K, V>& kv)
			:_next(nullptr)
			, _kv(kv)
		{}
	};

	template<class K, class V, class Hash = HashFunc<K>>
	class HashTable
	{
		typedef HashNode<K, V> Node;
	public:
		HashTable()
		{
			_tables.resize(10);
		}

		~HashTable();

		bool Insert(const pair<K, V>& kv);

		Node* Find(const K& key);

		bool Erase(const K& key);

	private:
		// The hash table is an array of pointers
		vector<Node*> _tables;
		size_t _n = 0;
		Hash hash;
	};
}

2. Destruction, search, delete

1. Destruction

Destruction is to traverse the hash table and destroy each singly linked list.

~HashTable()
{
	for (size_t i = 0; i < _tables.size(); i++)
	{
		// Free every node in this bucket's singly linked list
		Node* cur = _tables[i];
		while (cur)
		{
			Node* next = cur->_next;
			delete cur;
			cur = next;
		}
		_tables[i] = nullptr;
	}
}

2. Find

1. Calculate the subscript according to the hash function and find the corresponding hash bucket
2. Traverse the hash bucket and find the data
3. If found, return the node, if not found, return a null pointer

Node* Find(const K& key)
{
	// Locate the bucket, then walk its list
	size_t hashi = hash(key) % _tables.size();
	Node* cur = _tables[hashi];
	while (cur)
	{
		if (cur->_kv.first == key)
		{
			return cur;
		}
		cur = cur->_next;
	}
	return nullptr;
}

3. Delete

To delete, find the node, make the node's predecessor point to the node's successor, and then delete the node.
Note:
If the node is the head of the hash bucket, make the bucket point directly to the node's successor, and then delete the node.

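bool Erase(const K& key)
{
	size_t hashi = hash(key) % _tables.size();
	Node* cur = _tables[hashi], * prev = nullptr;
	while (cur)
	{
		if (cur->_kv.first == key)
		{
			if (cur == _tables[hashi])
			{
				// The node is the bucket head: the bucket now starts at its successor
				_tables[hashi] = cur->_next;
			}
			else
			{
				// Otherwise the predecessor skips over the node
				prev->_next = cur->_next;
			}
			// Free the unlinked node and update the element count
			delete cur;
			--_n;
			return true;
		}
		prev = cur;
		cur = cur->_next;
	}
	return false;
}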
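
The two unlinking cases can also be tried in isolation on a single bucket. The following is a minimal standalone sketch with int keys. The names `ListNode`, `erase_from_bucket`, `push_front`, and `length` are hypothetical helpers for this illustration and are not part of the hash table class.

```cpp
#include <cstddef>

struct ListNode
{
	int key;
	ListNode* next;
};

// Unlinks and frees the first node holding `key` in one bucket.
// `head` is the bucket's entry in the pointer array, taken by reference
// so that the head case can update it.
bool erase_from_bucket(ListNode*& head, int key)
{
	ListNode* prev = nullptr;
	for (ListNode* cur = head; cur; prev = cur, cur = cur->next)
	{
		if (cur->key == key)
		{
			if (prev == nullptr)
				head = cur->next;        // head case: bucket points at the successor
			else
				prev->next = cur->next;  // general case: predecessor skips the node
			delete cur;                  // release the node once it is unlinked
			return true;
		}
	}
	return false;
}

// Demo helper: insert a key at the front of a bucket
void push_front(ListNode*& head, int key)
{
	head = new ListNode{ key, head };
}

// Demo helper: number of nodes in a bucket
std::size_t length(const ListNode* head)
{
	std::size_t n = 0;
	for (; head; head = head->next)
		++n;
	return n;
}
```

Pushing 1, 2, 3 gives the bucket 3 -> 2 -> 1. Erasing 3 exercises the head case and erasing 1 exercises the predecessor case. In both cases the node is freed, so nothing leaks.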

3. Insert

Because our hash table does not support storing duplicate keys, insertion proceeds as follows:
1. First check whether the key is already present. If it is, return false to indicate that the insertion failed.
2. If it is not present, determine whether expansion is needed, and expand if so.
3. To insert, first calculate the subscript with the hash function, then insert the new node at the head of that hash bucket.

1. Code without expansion

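bool Insert(const pair<K, V>& kv)
{
	// First check whether the key is already present.
	// If it is, return false: the insertion fails.
	if (Find(kv.first))
	{
		return false;
	}
	// Expansion goes here....
	// 1. Use the hash function to work out which bucket the node belongs in
	size_t hashi = hash(kv.first) % _tables.size();
	// Head insertion
	Node* newnode = new Node(kv);
	newnode->_next = _tables[hashi];
	_tables[hashi] = newnode;
	++_n;
	return true;
}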

2. Expansion code

1. Create a new hash table.
2. Resize the new hash table to twice the old capacity (this must be done first, because the subscripts are remapped against the size of the new table when the data is transferred).
3. Transfer the data:
(1) Traverse the old table and take out each node.
(2) Use the hash function to calculate the node's subscript in the new table.
(3) Head-insert the node into that bucket.
4. After transferring the data, do not forget to set each hash bucket in the old table to a null pointer; otherwise the old table is left holding dangling (wild) pointers.

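bool Insert(const pair<K, V>& kv)
{
	// Expand when the load factor reaches 1
	if (_n == _tables.size())
	{
		// Create a new hash table
		HashTable newtable;
		size_t newcapacity = _tables.size() * 2;
		// Double the capacity
		newtable._tables.resize(newcapacity);
		// Transfer the data
		for (size_t i = 0; i < _tables.size(); i++)
		{
			Node* cur = _tables[i];
			while (cur)
			{
				Node* next = cur->_next;
				size_t hashi = hash(cur->_kv.first) % newtable._tables.size();
				cur->_next = newtable._tables[hashi];
				newtable._tables[hashi] = cur;
				cur = next;
			}
			// Clear the old bucket so that no dangling pointer is left behind
			// to cause a double delete...
			_tables[i] = nullptr;
		}
	}
	// ...this fragment shows only the expansion step; the swap and the
	// insertion itself appear in the complete Insert code
}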

3. Complete Insert code

bool Insert(const pair<K, V>& kv)
{
	// First check whether the key is already present.
	// If it is, return false: the insertion fails.
	if (Find(kv.first))
	{
		return false;
	}
	// Expand when the load factor reaches 1
	if (_n == _tables.size())
	{
		// Create a new hash table
		HashTable newtable;
		size_t newcapacity = _tables.size() * 2;
		// Double the capacity
		newtable._tables.resize(newcapacity);
		// Transfer the data
		for (size_t i = 0; i < _tables.size(); i++)
		{
			Node* cur = _tables[i];
			while (cur)
			{
				Node* next = cur->_next;
				size_t hashi = hash(cur->_kv.first) % newtable._tables.size();
				cur->_next = newtable._tables[hashi];
				newtable._tables[hashi] = cur;
				cur = next;
			}
			// Clear the old bucket so that no dangling pointer is left behind
			// to cause a double delete...
			_tables[i] = nullptr;
		}
		// Swapping the two vectors swaps the two hash tables.
		// From our own vector implementation we know that swapping vectors
		// only exchanges first, finish, and end_of_storage.
		_tables.swap(newtable._tables);
	}
	// 1. Use the hash function to work out which bucket the node belongs in
	size_t hashi = hash(kv.first) % _tables.size();
	// Head insertion
	Node* newnode = new Node(kv);
	newnode->_next = _tables[hashi];
	_tables[hashi] = newnode;
	++_n;
	return true;
}
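
The node transfer can be checked on its own. The sketch below is a standalone approximation with int keys, assumed non-negative. The names `ChainNode` and `rehash` are invented for this example. It relinks existing nodes into a larger bucket array and never calls `new` or `delete`, which is the point of transferring nodes instead of copying them.

```cpp
#include <cstddef>
#include <vector>

struct ChainNode
{
	int key;
	ChainNode* next;
};

// Moves every node of `old_table` into a new bucket array with `new_size`
// buckets by relinking it; no node is allocated, copied, or freed.
std::vector<ChainNode*> rehash(std::vector<ChainNode*>& old_table, std::size_t new_size)
{
	std::vector<ChainNode*> new_table(new_size, nullptr);
	for (ChainNode*& bucket : old_table)
	{
		ChainNode* cur = bucket;
		while (cur)
		{
			ChainNode* next = cur->next;  // save the successor before relinking
			std::size_t i = static_cast<std::size_t>(cur->key) % new_size;
			cur->next = new_table[i];     // head-insert into the new bucket
			new_table[i] = cur;
			cur = next;
		}
		bucket = nullptr;                 // the old table no longer refers to the nodes
	}
	return new_table;
}
```

For example, keys 5 and 15 share bucket 5 in a 10-bucket table. After `rehash(table, 20)` the very same two node objects sit in buckets 5 and 15 of the new array, and every bucket of the old array is null, so destroying the old table cannot free them a second time.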

4. Complete code for the open-hashing hash table

namespace wzs
{
	// HashFunc<int>
	// Hash function for integer keys
	template<class K>
	struct HashFunc
	{
		size_t operator()(const K& key) const
		{
			return (size_t)key;
		}
	};

	// HashFunc<string>
	// Hash function for string keys
	template<>
	struct HashFunc<string>
	{
		size_t operator()(const string& key) const
		{
			// BKDR
			size_t hash = 0;
			for (auto e : key)
			{
				hash *= 131;
				hash += e;
			}
			return hash;
		}
	};

	template<class K, class V>
	struct HashNode
	{
		HashNode* _next;
		pair<K, V> _kv;

		// Initializers are listed in declaration order
		HashNode(const pair<K, V>& kv)
			:_next(nullptr)
			, _kv(kv)
		{}
	};

	template<class K, class V, class Hash = HashFunc<K>>
	class HashTable
	{
		typedef HashNode<K, V> Node;
	public:
		HashTable()
		{
			_tables.resize(10);
		}

		~HashTable()
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				// Free every node in this bucket's singly linked list
				Node* cur = _tables[i];
				while (cur)
				{
					Node* next = cur->_next;
					delete cur;
					cur = next;
				}
				_tables[i] = nullptr;
			}
		}

		bool Insert(const pair<K, V>& kv)
		{
			// First check whether the key is already present.
			// If it is, return false: the insertion fails.
			if (Find(kv.first))
			{
				return false;
			}
			// Expand when the load factor reaches 1
			if (_n == _tables.size())
			{
				// Create a new hash table
				HashTable newtable;
				size_t newcapacity = _tables.size() * 2;
				// Double the capacity
				newtable._tables.resize(newcapacity);
				// Transfer the data
				for (size_t i = 0; i < _tables.size(); i++)
				{
					Node* cur = _tables[i];
					while (cur)
					{
						Node* next = cur->_next;
						size_t hashi = hash(cur->_kv.first) % newtable._tables.size();
						cur->_next = newtable._tables[hashi];
						newtable._tables[hashi] = cur;
						cur = next;
					}
					// Clear the old bucket so that no dangling pointer is left
					// behind to cause a double delete...
					_tables[i] = nullptr;
				}
				// Swapping the two vectors swaps the two hash tables.
				// From our own vector implementation we know that swapping
				// vectors only exchanges first, finish, and end_of_storage.
				_tables.swap(newtable._tables);
			}
			// 1. Use the hash function to work out which bucket the node belongs in
			size_t hashi = hash(kv.first) % _tables.size();
			// Head insertion
			Node* newnode = new Node(kv);
			newnode->_next = _tables[hashi];
			_tables[hashi] = newnode;
			++_n;
			return true;
		}

		Node* Find(const K& key)
		{
			size_t hashi = hash(key) % _tables.size();
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					return cur;
				}
				cur = cur->_next;
			}
			return nullptr;
		}

		bool Erase(const K& key)
		{
			size_t hashi = hash(key) % _tables.size();
			Node* cur = _tables[hashi], * prev = nullptr;
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					if (cur == _tables[hashi])
					{
						_tables[hashi] = cur->_next;
					}
					else
					{
						prev->_next = cur->_next;
					}
					// Free the unlinked node and update the element count
					delete cur;
					--_n;
					return true;
				}
				prev = cur;
				cur = cur->_next;
			}
			return false;
		}

	private:
		// The hash table is an array of pointers
		vector<Node*> _tables;
		size_t _n = 0;
		Hash hash;
	};
}
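
For comparison, the standard container that this kind of table is meant to back behaves the same way on duplicate keys. The sketch below uses only `std::unordered_map` from the standard library. The wrapper name `insert_unique` is an illustrative assumption, chosen to mirror the `bool` returned by `Insert` above.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Inserts (key, value) and reports whether the key was newly added.
// std::unordered_map::insert returns a pair whose second member is false
// when the key already exists, in which case the stored value is unchanged.
bool insert_unique(std::unordered_map<std::string, int>& m,
                   const std::string& key, int value)
{
	return m.insert(std::make_pair(key, value)).second;
}
```

Calling `insert_unique(m, "apple", 1)` on an empty map returns true. A second call with `("apple", 2)` returns false and leaves the stored value at 1, which matches the "check first, fail on duplicates" behaviour of the hand-written table.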

The above is the entire content of the implementation of C++ hash table. I hope it can be helpful to everyone!

Origin blog.csdn.net/Wzs040810/article/details/135164264