foreword

This post picks up where the previous one left off and continues with the hash table.

zipper method

The zipper method (also known as separate chaining) is another way to resolve hash collisions. As the name implies, when two keys hash to the same position, both values are hung on a linked list at that position, so one key never takes over a slot that belongs to another. It is a common way to resolve hash collisions.
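
To make the idea concrete, here is a minimal sketch of chaining with integer keys. The `Buckets` alias and the helper names `chain_insert` and `chain_length` are assumptions chosen for illustration; they are not part of the implementation developed in this post.

```cpp
#include <cstddef>
#include <forward_list>
#include <vector>

// Hypothetical illustration: each bucket owns a singly linked list of keys.
using Buckets = std::vector<std::forward_list<int>>;

// Put the key at the head of the chain selected by key % bucket count.
// Assumes non-negative keys and a non-empty bucket vector.
void chain_insert(Buckets& buckets, int key)
{
	std::size_t index = static_cast<std::size_t>(key) % buckets.size();
	buckets[index].push_front(key);
}

// Count how many keys currently share one bucket.
std::size_t chain_length(const Buckets& buckets, std::size_t index)
{
	std::size_t n = 0;
	for (int key : buckets[index])
	{
		(void)key;   // only the number of nodes matters here
		++n;
	}
	return n;
}
```

With five buckets, the keys 3, 8 and 13 all map to bucket 3 and end up on the same chain, while a key such as 4 sits alone in bucket 4.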

Idea analysis

To implement such a hash table, every slot of the table must be able to hold a linked list, and insertion, lookup, and deletion must all include the corresponding linked-list operations. The three node states used by the open addressing version are no longer needed, because a removed node is simply unlinked from its chain.

structure realization

Node: following the analysis above, each node stores its data plus a pointer to the next node in the linked list.

template<class K, class V>
struct HashNode
{
	HashNode<K, V>* _next;   // next node in the same bucket's chain
	pair<K, V> _kv;          // stored key-value pair

	HashNode(const pair<K, V>& kv)
		: _next(nullptr)
		, _kv(kv)
	{}
};

Hash table structure:

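template<class K, class V, class hash = HashFunc<K>>
class HashTable
{
	typedef HashNode<K, V> Node;
public:
	// insert, Erase and find are implemented in the sections below
private:
	vector<Node*> _tables;   // one chain head per bucket
	size_t _n = 0;           // number of stored elements
};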

function implementation

gethash

In the class template we introduced a template parameter named hash. It is a functor type used to obtain the hash value of a key, because not every key type can be converted directly into an integer for hashing. The default functor simply returns the key, which works for integer-like keys; other key types are handled by specializing the functor or by passing a different one.

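// Default functor: works for keys that convert implicitly to size_t
template<class K>
struct HashFunc
{
	size_t operator()(const K& key) const
	{
		return key;
	}
};

// Specialization for string keys: fold every character into the hash
template<>
struct HashFunc<string>
{
	size_t operator()(const string& s) const
	{
		size_t hash = 0;
		for (auto c : s)
		{
			hash += c;
			hash *= 31;
		}
		return hash;
	}
};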

Here a class template specialization is used to handle the string class.
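
The same mechanism extends to user-defined key types. The sketch below is only an illustration: the `Point` struct and the way its two fields are combined are assumptions, not something from the original implementation. The primary template is repeated so that the block compiles on its own.

```cpp
#include <cstddef>

// Primary template, repeated here so the sketch is self-contained.
template<class K>
struct HashFunc
{
	std::size_t operator()(const K& key) const
	{
		return static_cast<std::size_t>(key);
	}
};

// Hypothetical key type used only for this illustration.
struct Point
{
	int x;
	int y;

	// Lookup compares keys with ==, so a key type has to provide it.
	bool operator==(const Point& other) const
	{
		return x == other.x && y == other.y;
	}
};

// Full specialization: combine both fields into a single size_t.
template<>
struct HashFunc<Point>
{
	std::size_t operator()(const Point& p) const
	{
		std::size_t hash = static_cast<std::size_t>(p.x);
		hash *= 31;
		hash += static_cast<std::size_t>(p.y);
		return hash;
	}
};
```

For example, `HashFunc<Point>()(Point{2, 3})` evaluates to 2 * 31 + 3 = 65, and the table then reduces that value modulo its bucket count.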

GetNextPrime

The table could simply be doubled on every expansion, but following the approach used in the SGI STL, growing the table through a fixed list of prime sizes tends to reduce the chance of hash collisions:

size_t GetNextPrime(size_t prime)
{
	// Prime list from the SGI STL
	static const int __stl_num_primes = 28;
	static const unsigned long __stl_prime_list[__stl_num_primes] =
	{
		53, 97, 193, 389, 769,
		1543, 3079, 6151, 12289, 24593,
		49157, 98317, 196613, 393241, 786433,
		1572869, 3145739, 6291469, 12582917, 25165843,
		50331653, 100663319, 201326611, 402653189, 805306457,
		1610612741, 3221225473, 4294967291
	};

	// Return the first prime in the list that is larger than the current size
	for (size_t i = 0; i < __stl_num_primes; ++i)
	{
		if (__stl_prime_list[i] > prime)
			return __stl_prime_list[i];
	}

	// Already at or beyond the largest entry: stay at the last prime
	// instead of reading past the end of the array
	return __stl_prime_list[__stl_num_primes - 1];
}
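
One way to see why a prime bucket count can help is to hash keys that share a common stride. The helper below is a hypothetical experiment, not part of the original code: it counts how many distinct buckets are hit when the keys 0, step, 2 * step, ... are reduced modulo the bucket count.

```cpp
#include <cstddef>
#include <set>

// Count how many distinct buckets are used by the keys
// 0, step, 2 * step, ... (count keys in total) in a table
// with bucket_count buckets. Assumes bucket_count > 0.
std::size_t buckets_used(std::size_t bucket_count, std::size_t step, std::size_t count)
{
	std::set<std::size_t> used;
	for (std::size_t i = 0; i < count; ++i)
	{
		used.insert((i * step) % bucket_count);
	}
	return used.size();
}
```

With 50 keys that are multiples of 8, a table of 64 buckets uses only 8 of them, because 8 divides 64, while a table of 53 buckets spreads the same keys over 50 different buckets. Real key distributions vary, so this is an illustration of the tendency rather than a guarantee.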

insert

Most of the insertion logic is the same as in the open addressing version, except that the linear probing for a free slot is replaced by a head insertion into the bucket's linked list. During expansion the existing nodes are relinked into the new table rather than reallocated.

bool insert(const pair<K, V>& kv)
{
	// Reject keys that are already present
	if (find(kv.first)) return false;

	hash gethash;

	// Expand when the load factor reaches 1 (this also covers the empty table)
	if (_n == _tables.size())
	{
		size_t newsize = GetNextPrime(_tables.size());
		vector<Node*> newtables(newsize, nullptr);

		for (Node*& cur : _tables)
		{
			while (cur)
			{
				Node* next = cur->_next;

				size_t hashi = gethash(cur->_kv.first) % newtables.size();

				// Head insertion into the new table; the node itself is reused
				cur->_next = newtables[hashi];
				newtables[hashi] = cur;

				cur = next;
			}
		}
		_tables.swap(newtables);
	}

	// Insert the new node
	size_t hashi = gethash(kv.first) % _tables.size();
	Node* newnode = new Node(kv);

	// Head insertion
	newnode->_next = _tables[hashi];
	_tables[hashi] = newnode;

	// One more stored element
	_n++;

	return true;
}
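
The expansion step is worth looking at in isolation: nodes are moved by relinking pointers, so nothing is allocated or freed. The following sketch isolates that step under simplifying assumptions; the `IntNode` struct and the `relink` function are hypothetical names, and the keys are plain non-negative integers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stripped-down node that holds only an integer key.
struct IntNode
{
	int key;
	IntNode* next;
};

// Move every node of old_table into a table with new_size buckets by
// relinking pointers. No node is allocated, copied, or freed.
// Assumes non-negative keys and new_size > 0.
std::vector<IntNode*> relink(std::vector<IntNode*>& old_table, std::size_t new_size)
{
	std::vector<IntNode*> new_table(new_size, nullptr);
	for (IntNode*& head : old_table)
	{
		while (head)
		{
			IntNode* next = head->next;
			std::size_t index = static_cast<std::size_t>(head->key) % new_size;
			head->next = new_table[index];   // head insertion into the new chain
			new_table[index] = head;
			head = next;                     // the old bucket ends up as nullptr
		}
	}
	return new_table;
}
```

Because the loop variable is a reference to the old bucket, every old bucket is left as nullptr once its chain has been moved, so the old table no longer refers to any node.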

delete

Deletion finds the node and unlinks it from its chain. Note that when the head of a chain is removed, the bucket must be updated to point at the removed node's successor (nullptr if it was the only node). Otherwise the bucket would keep a pointer to freed memory, or the rest of the chain would be lost. The element count is decremented as well.

bool Erase(const K& key)
{
	Node* ret = find(key);
	if (!ret) return false;

	hash gethash;
	size_t hashi = gethash(key) % _tables.size();

	Node* cur = _tables[hashi];

	if (ret == cur)
	{
		// Head deletion: the bucket now starts at the successor
		_tables[hashi] = ret->_next;
	}
	else
	{
		// Walk to the predecessor of ret, then unlink ret
		while (cur->_next != ret)
		{
			cur = cur->_next;
		}

		cur->_next = ret->_next;
	}

	delete ret;
	--_n;
	return true;
}
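
The special case for the head node can be avoided with a pointer-to-pointer walk, a common idiom for singly linked lists. This is an alternative sketch, not the approach used in this post; `ChainNode` and `erase_from_chain` are hypothetical names, and the nodes are assumed to have been allocated with new.

```cpp
#include <cstddef>

// Hypothetical stripped-down node that holds only an integer key.
struct ChainNode
{
	int key;
	ChainNode* next;
};

// Unlink and delete the first node whose key matches.
// `link` always points at the pointer that refers to the current node:
// first the bucket head itself, then the `next` field of the previous node.
// Returns true if a node was removed.
bool erase_from_chain(ChainNode*& head, int key)
{
	ChainNode** link = &head;
	while (*link)
	{
		if ((*link)->key == key)
		{
			ChainNode* victim = *link;
			*link = victim->next;   // works for head and non-head nodes alike
			delete victim;
			return true;
		}
		link = &(*link)->next;
	}
	return false;
}
```

Since the bucket head and the `next` fields are updated through the same pointer, removing the only node in a chain leaves the head as nullptr without any extra branch.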

look up

Node* find(const K& key)
{
	// An empty table has no buckets to index into
	if (!_tables.size())
		return nullptr;

	hash gethash;
	size_t hashi = gethash(key) % _tables.size();

	// Walk the chain of the bucket the key maps to
	Node* cur = _tables[hashi];
	while (cur)
	{
		if (cur->_kv.first == key)
		{
			return cur;
		}
		cur = cur->_next;
	}
	return nullptr;
}

epilogue

That wraps up the second way of implementing a hash table. Next, I will explain how to wrap the hash table into the unordered family of map and set containers, and how the structure of the hash table has to be modified for that. See you next time~

[Data structure] in the hash table - zipper method