foreword

It's been a long time, everyone. Today, I will explain the basic principles of the hash table and use the open addressing method to implement a simple hash table.

mapping

The idea of a hash table is to map a set of data onto positions that can be accessed directly. Suppose we have the following set of data:

10, 11, 17, 13, 18

We can map these numbers into an array through a mapping rule (the hash function). If the array has only five slots, we can use key % 5 as the mapping, which sends 10, 11, 17, 13 and 18 to slots 0, 1, 2, 3 and 3 respectively.

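When we later look for a number, we apply the same rule and index into the array directly instead of scanning it. This is the idea of hashing: as long as collisions stay rare, a lookup takes constant time on average, which is much faster than searching the whole array.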

hash collision

In the example above, even though the five-slot array is not full (slot 4 is still free), 18 and 13 both map to slot 3. This is called a hash collision. With a rule like key % 5, collisions are unavoidable: there are far more possible keys than slots, so different keys must sometimes share a slot. We therefore need a way to resolve them.

open addressing

The first way to resolve collisions is open addressing, and its simplest form is linear probing: if the slot a key maps to is occupied, the key is placed in the next free slot after it. This is easy to implement, but it is prone to a "stampede" (usually called primary clustering), where displaced keys occupy other keys' slots and push them further along, as in the following example:

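Suppose 18 collides with 13 (both map to slot 3); linear probing then puts 18 in slot 4. Now we insert 16, whose home slot is 16 % 5 = 1. That slot is occupied, so 16 probes forward. Without 18 it would have ended up in slot 4, but 18 has "trampled" that slot, so 16 runs past the end of the array and wraps around to slot 0 (assuming slot 0 is still free, as in this example). Each displaced key pushes the next one further away, so clusters grow and searches get longer, which is why this approach is not ideal.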

A simple implementation follows. It covers only the hash table itself; in a later article, the chain address method (separate chaining) will be used to build the unordered_map and unordered_set wrappers.

Idea analysis

With open addressing, if the slot a key maps to already holds an element, we search forward for the first free slot. Lookup relies on the same forward search, which keeps going until it reaches an empty slot. So when deleting an element we cannot simply mark its slot empty: a later search would stop there and fail to find elements that were placed beyond it during probing.

structural analysis

From the analysis above, each node needs one of three states, which we represent with an enumeration:

enum States
{
	EXIST,
	DELETE,
	EMPTY
};

Each hash node must therefore contain at least two members:
1. Data (the key/value pair)
2. Status

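#include <utility>  // std::pair

template<class K,class V>
struct HashDate
{
	std::pair<K, V> _kv;
	States _st = EMPTY;  // start EMPTY: otherwise resize() would zero-initialize _st to EXIST (0)
};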

The hash table itself is an array (vector) of these nodes. We also need _n, the number of elements currently stored; _n divided by the array size is the load factor, which indicates how full the table is.

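#include <cstddef>
#include <vector>

template<class K,class V>
class HashTable
{
public:
	// insert, erase and find (below) are member functions

private:
	std::vector<HashDate<K, V>> _tables;
	size_t _n = 0;  // number of elements in EXIST state
};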

function implementation

insert

In the insert operation, there are a few details to pay attention to:
1. Expansion. When the table grows, its size changes, and so does every key's mapping (key % size), so all existing elements must be remapped. Remapping is exactly what insertion does, so we create a new table with a larger vector, insert every existing element into it (reusing insert), and finally swap the two vectors.
2. Linear probing. If probing runs past the end of the array, it must wrap around to the start.

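bool insert(const std::pair<K, V>& kv)
{
	// Reject duplicate keys
	if (find(kv.first)) return false;

	// Expansion: grow when the table is unallocated or _n*10/size > 7
	// (integer division, so this fires once the load factor reaches 0.8)
	if (_tables.size() == 0 || _n * 10 / _tables.size() > 7)
	{
		size_t newsize = _tables.size() == 0 ? 10 : 2 * _tables.size();

		// Build a larger table and reuse insert to remap every live element
		HashTable<K, V> newht;
		newht._tables.resize(newsize);

		for (const auto& data : _tables)
		{
			if (data._st == EXIST)
			{
				newht.insert(data._kv);
			}
		}
		// Take over the new array; _n is unchanged because the same elements moved
		_tables.swap(newht._tables);
	}

	// Insert: home slot (K is assumed to be an integer type so % works)
	size_t hashi = kv.first % _tables.size();

	// Linear probing: skip occupied slots, wrapping past the end;
	// EMPTY and DELETE slots can both be reused
	size_t i = 1;
	size_t index = hashi;
	while (_tables[index]._st == EXIST)
	{
		index = hashi + i;
		++i;

		index %= _tables.size();
	}

	_tables[index]._kv = kv;
	_tables[index]._st = EXIST;
	_n++;

	return true;
}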

delete

Using the key, find gives us a pointer to the node to be deleted, and we change its status to DELETE to mark it as deleted. As noted earlier, setting it to EMPTY instead would make linear probing stop early, so elements placed after it could no longer be found. That is why each node needs three states rather than two.

bool erase(const K& key)
{
	HashDate<K, V>* ret = find(key);
	if (ret)
	{
		// Mark as a tombstone instead of EMPTY so later probe chains stay intact
		ret->_st = DELETE;
		_n--;
		return true;
	}

	return false;
}

find

Both operations above rely on find, which is straightforward in this implementation: compute the key's home slot, then probe forward until we find the key, reach an EMPTY slot, or wrap all the way around. DELETE slots are skipped rather than treated as the end of the search.

HashDate<K, V>* find(const K& key)
{
	if (_tables.size() == 0) return nullptr;

	size_t hashi = key % _tables.size();

	// Linear probing: stop at an EMPTY slot; DELETE slots are skipped, not treated as the end
	size_t i = 1;
	size_t index = hashi;
	while (_tables[index]._st != EMPTY)
	{
		if (_tables[index]._st == EXIST &&
			_tables[index]._kv.first == key)
		{
			return &(_tables[index]);
		}

		index = hashi + i;
		i++;

		index %= _tables.size();

		// Wrapped all the way around without finding the key
		if (index == hashi)
		{
			break;
		}
	}

	return nullptr;
}

epilogue

Open addressing is simple to code, but it is prone to the stampede (clustering) problem, which is one reason the other method, the chain address method (separate chaining), is more commonly used. In the next article, I will explain the chain address method and use it to wrap unordered_map and unordered_set.
See you next time~

[Data structure] on the hash table - open addressing method
