Simulation implementation of unordered_map and unordered_set

The simulated implementation of unordered_map and unordered_set is similar to that of map and set. There, the underlying data structure was a red-black tree; here, it is a hash table. You should therefore be familiar with hash tables before implementing unordered_map and unordered_set. If you are not, you can refer to the two articles I wrote earlier on the topic. What we implement here uses a hash table built with open hashing (separate chaining, one linked list per bucket) as the underlying data structure of unordered_map and unordered_set.

Modify stored data

From using unordered_map and unordered_set, we know that they store different types of data. For one hash table to serve both unordered_map and unordered_set, HashNode has to be modified a little. The idea is simple: since the stored types differ, we make the stored type a template parameter of HashNode. Passing in different types then instantiates hash tables that store different elements.

  • When this template parameter is a pair, the table backs an unordered_map.
  • When this template parameter is not a pair, the table backs an unordered_set.
template<class T>
struct HashNode
{
    T _data;
    HashNode<T>* _next;

    HashNode(const T& data)
        :_data(data)
        , _next(nullptr)
    {}
};

We replaced the original two template parameters with one. Whatever type is passed in is the type that HashNode stores.
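As a minimal sketch of what this buys us, the same HashNode template can be instantiated once with a plain key and once with a pair. The aliases SetNode and MapNode and the function node_demo are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Same shape as the article's HashNode: one template parameter decides the payload.
template <class T>
struct HashNode {
    T _data;
    HashNode<T>* _next;

    explicit HashNode(const T& data) : _data(data), _next(nullptr) {}
};

// Hypothetical aliases, for illustration only.
using SetNode = HashNode<int>;                                // payload is the key itself
using MapNode = HashNode<std::pair<const std::string, int>>;  // payload is a key/value pair

// Builds one node of each kind and reads the payloads back.
inline int node_demo() {
    SetNode s(42);
    MapNode m(std::make_pair(std::string("apple"), 3));
    assert(m._data.first == "apple");
    return s._data + m._data.second;  // 42 + 3
}
```

Calling node_demo() returns 45: the set-style node contributes its key and the map-style node contributes the value half of its pair.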

Obtaining the key

For unordered_set, the key is the template parameter T itself. For unordered_map, the key is T.first, because unordered_map stores a pair.
How do we handle this? The idea is the same as in the simulated implementation of map and set: use a functor.
HashTable therefore needs one more template parameter, which we will call KeyOfT. This functor extracts the key from whatever data type is passed in.

  • For unordered_map, the functor returns the first member of its argument, because the table stores pairs and the key is the pair's first.
  • For unordered_set, the functor returns its argument unchanged, because the stored data is the key itself.
// Code in unordered_set.h
#pragma once
#include "Hash.h"

template<class K>
class unordered_set
{
public:
    struct SetKeyOfT
    {
        // The stored element is the key
        const K& operator()(const K& key)
        {
            return key;
        }
    };
private:
    HashTable<K, K, SetKeyOfT> _ht;
};

In the above code we define a struct, SetKeyOfT, that overloads the function-call operator. We then pass this type to HashTable, which uses it to obtain the key of an unordered_set element. You may ask: why do we also pass K when instantiating the hash table? Don't worry, I'll answer that shortly.

// Code in unordered_map.h
#pragma once
#include "Hash.h"

template<class K, class V>
class unordered_map
{
public:
    struct MapKeyOfT
    {
        // The key is the first member of the stored pair
        const K& operator()(const pair<K, V>& kv)
        {
            return kv.first;
        }
    };
private:
    HashTable<K, pair<K, V>, MapKeyOfT> _ht;
};

Similarly, we define a struct, MapKeyOfT, that overloads the function-call operator, and pass it to HashTable to obtain the key of an unordered_map element.
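To see the two functors at work outside the containers, here is a small sketch. The functors are written as standalone templates, and same_key and key_demo are hypothetical helpers standing in for hash table code that only knows T and KeyOfT.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Key extractor for set-like storage: the element is the key.
template <class K>
struct SetKeyOfT {
    const K& operator()(const K& key) const { return key; }
};

// Key extractor for map-like storage: the key is the pair's first member.
template <class K, class V>
struct MapKeyOfT {
    const K& operator()(const std::pair<const K, V>& kv) const { return kv.first; }
};

// Hypothetical helper: compares two stored elements by key only,
// without knowing whether T is a key or a pair.
template <class T, class KeyOfT>
bool same_key(const T& a, const T& b) {
    KeyOfT kot;
    return kot(a) == kot(b);
}

inline bool key_demo() {
    using Entry = std::pair<const std::string, int>;
    Entry a("x", 1), b("x", 2);
    assert((same_key<int, SetKeyOfT<int>>(7, 7)));
    return same_key<Entry, MapKeyOfT<std::string, int>>(a, b);  // same key, different values
}
```

The point of the design is that same_key never mentions first: all knowledge about where the key lives is confined to the functor.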


We modified the data stored in HashNode and the template parameters of HashTable, so the code in HashTable needs to change accordingly.
Let's first look at the template parameters of HashTable, as declared in the complete code at the end:
template <class K, class T, class KeyOfT, class HashFunc = DefaultHashFunc<K>>

  • K: With this template parameter, the parameter types of HashTable's member functions are easy to write. Find and Erase, for example, both take the key as const K& key. If HashTable had no K parameter, it would be awkward to recover the key type of unordered_map or unordered_set from T alone. This is why the key type is passed when the hash table is instantiated in unordered_map.h and unordered_set.h. Subtle, isn't it?

  • T: The stored data type. For unordered_map it is pair<K, V>; for unordered_set it is K.

  • KeyOfT: Instantiate an object of this type and call its operator() to get the key.

  • HashFunc: This template parameter converts a key into an unsigned integer, which is what lets strings be used as keys. The detailed logic is discussed in the previous article: Hash table closed hash implementation.


Let's look at the changes to the HashTable code:

  • The parameter of the insert function is no longer const pair<K, V>& kv but const T& data.
  • Every place that needs the key must obtain it through the functor.
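The following sketch shows how a key becomes a bucket index with the division-and-remainder method. DefaultHashFunc follows the shape used in the complete code; bucket_index is a hypothetical helper, and iterating over unsigned char is a small variation chosen here so that characters outside ASCII do not contribute negative values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Default case: the key is already convertible to an integer.
template <class K>
struct DefaultHashFunc {
    std::size_t operator()(const K& key) const { return static_cast<std::size_t>(key); }
};

// Specialization for strings, using the BKDR scheme (multiply by 131, add the character).
template <>
struct DefaultHashFunc<std::string> {
    std::size_t operator()(const std::string& s) const {
        std::size_t hash = 0;
        for (unsigned char ch : s) {
            hash = hash * 131 + ch;
        }
        return hash;
    }
};

// Hypothetical helper: hash the key, then reduce it modulo the number of buckets.
template <class K, class HashFunc = DefaultHashFunc<K>>
std::size_t bucket_index(const K& key, std::size_t bucket_count) {
    assert(bucket_count > 0);
    return HashFunc()(key) % bucket_count;
}
```

With 10 buckets, the integer key 44 lands in bucket 4, and the string "ab" hashes to 97 * 131 + 98 = 12805 and lands in bucket 5.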

Ordinary iterator

Needless to say, a hash table iterator cannot be a raw pointer, because the nodes of a hash table do not occupy contiguous memory.
We have to think about how to encapsulate the hash table iterator:

  • First, the class must contain a pointer to a HashNode.
  • Second, incrementing or decrementing the iterator may move it into a different hash bucket, which is harder to handle. Let's think about how to solve this first.
    • Suppose the iterator currently points to the HashNode holding 44. After incrementing, it has to leave the current hash bucket and find the next valid HashNode.
    • Suppose the iterator currently points to the HashNode holding 7. After decrementing, it likewise has to leave the current hash bucket and search backward for a valid HashNode.
  • As just mentioned, the iterator may cross into another hash bucket when incremented or decremented. Does the iterator then need to store the index of the current bucket in the array? On reflection it does not: the iterator can read the key of the data it points to and recompute the bucket index with the division-and-remainder method.
    This exposes another problem, though: when crossing buckets, the iterator has no access to the _table of the hash table, so it cannot search forward or backward for a valid HashNode. We therefore store a pointer to the HashTable in the iterator, which makes the search across buckets straightforward. (There are other ways to implement this, for example passing _table itself to the iterator.)
template <class K, class T, class KeyOfT, class HashFunc = DefaultHashFunc<K>>
struct __HashIterator
{
	typedef HashNode<T> Node;
	typedef __HashIterator<K, T, KeyOfT, HashFunc> self;

	// Constructor
	__HashIterator(Node* node, HashTable<K, T, KeyOfT, HashFunc>* pht)
	{
		_node = node;
		_pht = pht;
	}

	HashTable<K, T, KeyOfT, HashFunc>* _pht; // the table this iterator walks
	Node* _node;                             // the current node
};

operator++()

  • If _node->_next is not nullptr, the current node's _next is a valid node, and we simply set _node to _node->_next.
  • If _node->_next is nullptr, we need to find the next position.
    • First, compute the index of the bucket in HashTable that the current _node belongs to.
    • Then, starting from the position after that index, search for the next hash bucket that is not empty.
    • If no non-empty bucket is found, the current node was the last valid element of the hash table. We set _node to nullptr, which serves as our end iterator.
self& operator++()
{
	if(_node->_next) // the current bucket still has nodes
	{
		_node = _node->_next;
		return *this;
	}

	// Otherwise, look for the next non-empty bucket
	KeyOfT kot;
	HashFunc hf;
	size_t hashi = hf(kot(_node->_data)) % _pht->_table.size();
	++hashi;
	while(hashi < _pht->_table.size())
	{
		if(_pht->_table[hashi])
		{
			_node = _pht->_table[hashi];
			return *this;
		}
		++hashi;
	}

	_node = nullptr; // no more elements: this is end()
	return *this;
}
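The traversal rule can be tried in isolation with a stripped-down table. The sketch below is not the article's HashTable: MiniTable, first, next and walk are hypothetical names, keys are assumed to be non-negative ints, and the table never grows. The function next applies the same two cases as operator++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int _data;
    Node* _next;
};

// Hypothetical minimal table: fixed bucket count, non-negative int keys, head insertion.
struct MiniTable {
    std::vector<Node*> _table;

    explicit MiniTable(std::size_t buckets) : _table(buckets, nullptr) {}
    MiniTable(const MiniTable&) = delete;
    MiniTable& operator=(const MiniTable&) = delete;
    ~MiniTable() {
        for (Node* head : _table) {
            while (head) {
                Node* next = head->_next;
                delete head;
                head = next;
            }
        }
    }

    void insert(int key) {
        std::size_t i = static_cast<std::size_t>(key) % _table.size();
        _table[i] = new Node{key, _table[i]};
    }

    // First node of the first non-empty bucket, or nullptr if the table is empty.
    Node* first() const {
        for (Node* head : _table) {
            if (head) return head;
        }
        return nullptr;
    }

    // Case 1: stay in the bucket. Case 2: scan forward for the next non-empty bucket.
    Node* next(const Node* cur) const {
        if (cur->_next) return cur->_next;
        std::size_t i = static_cast<std::size_t>(cur->_data) % _table.size() + 1;
        for (; i < _table.size(); ++i) {
            if (_table[i]) return _table[i];
        }
        return nullptr;  // plays the role of end()
    }
};

// Collects the keys in traversal order.
inline std::vector<int> walk(const MiniTable& t) {
    std::vector<int> out;
    for (Node* n = t.first(); n != nullptr; n = t.next(n)) out.push_back(n->_data);
    return out;
}
```

With 10 buckets and the insertions 4, 44, 7, 1, the walk visits 1, 44, 4, 7: bucket 1 first, then both nodes of bucket 4 (44 was head-inserted in front of 4), then bucket 7.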

operator!=()

This function is simple: just compare the node pointers directly.

bool operator!=(const self& s) const
{
	return _node != s._node;
}

operator*() and operator->()

We have written these two functions many times. They return the data field and a pointer to the data field, respectively.

T& operator*()
{
	return _node->_data;
}

T* operator->()
{
	return &_node->_data;
}

begin() and end()

These two functions are written in the HashTable class, so don't put them in the wrong place.

  • begin(): Traverse _table to find the first hash bucket that is not empty and return an iterator to its first node. If there is none, pass nullptr as the first argument of the iterator constructor.
  • end(): Pass nullptr as the first argument of the iterator constructor.
    In __HashIterator, the second parameter of the constructor is a pointer to the hash table. What should begin and end pass for it? Simply the this pointer. This is why we chose a hash table pointer as the iterator member: it is simple. Passing _table would also work, but it would be a little more troublesome.
iterator begin()
{
	for(size_t i = 0; i < _table.size(); i++)
	{
		Node* cur = _table[i];
		if(cur) return iterator(cur, this);
	}
	return iterator(nullptr, this);
}

iterator end()
{
	return iterator(nullptr, this);
}

If you compile the code now, compilation fails: the iterator uses the hash table, and the hash table uses the iterator, so whichever class is defined first refers to a name that is not yet known. A forward declaration is therefore required.

template <class K, class T, class KeyOfT, class HashFunc>
class HashTable;
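The same circular dependency can be reproduced with two small templates. In this sketch, Table, Cursor and forward_decl_demo are hypothetical names; the forward declaration is what allows Cursor to hold a pointer to a Table that has not been defined yet.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward declaration: lets Cursor mention Table<T> before Table is defined.
template <class T>
class Table;

template <class T>
struct Cursor {
    const Table<T>* _owner;  // a pointer to an incomplete type is fine here
    std::size_t _pos;

    // Member bodies of a class template are instantiated on use,
    // by which point Table<T> is complete.
    const T& get() const { return _owner->at(_pos); }
};

template <class T>
class Table {
public:
    void push(const T& v) { _items.push_back(v); }
    const T& at(std::size_t i) const { return _items[i]; }
    Cursor<T> begin() const { return Cursor<T>{this, 0}; }

private:
    std::vector<T> _items;
};

inline int forward_decl_demo() {
    Table<int> t;
    t.push(5);
    t.push(9);
    Cursor<int> c = t.begin();
    assert(c._pos == 0);
    return c.get();  // the first element pushed
}
```

Removing the forward declaration makes the Cursor definition fail to compile, which mirrors the error described above.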

Add begin() and end() for unordered_map and unordered_set

Both functions are simple. We have already implemented begin and end in HashTable, so each container only needs to forward to them.

iterator begin()
{
	return _ht.begin();
}

iterator end()
{
	return _ht.end();
}

Just copy the above code into both unordered_map and unordered_set!

With the code written this far, we can use range-based for to traverse our own unordered_map and unordered_set. But when we compile, another error is reported: _table is not accessible. The iterator holds a hash table pointer, but _table is a private member of the hash table, so of course it cannot be accessed from outside the class.
Here are two reliable solutions:

  • Write a member function that returns the _table member of HashTable.
  • Use a friend declaration. Who is whose friend? __HashIterator must be a friend of HashTable, because it is the iterator that needs access to the table's private members.
template <class K, class T, class KeyOfT, class HashFunc>
friend struct __HashIterator;

The above code compiles in Visual Studio, but not in my VS Code setup. The error message says that a nested scope cannot reuse the template parameter names of the enclosing template. Standard C++ does not allow a template parameter name to be redeclared inside its own scope, so if you encounter this error, just rename the template parameters of the friend declaration!
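A reduced version of the portable form looks like this. Box and Peeker are hypothetical stand-ins for HashTable and __HashIterator; the friend declaration uses U instead of repeating T.

```cpp
#include <cassert>

template <class T>
class Box;

template <class T>
struct Peeker {
    // Reads a private member of Box<T>; allowed only because Peeker is a friend.
    static const T& peek(const Box<T>& b) { return b._value; }
};

template <class T>
class Box {
public:
    explicit Box(const T& v) : _value(v) {}

private:
    // The friend declaration is itself a template, so it needs its own
    // parameter name. Reusing T here is what the error message complains about.
    template <class U>
    friend struct Peeker;

    T _value;
};

inline int friend_demo() {
    Box<int> b(7);
    int seen = Peeker<int>::peek(b);
    assert(seen == 7);
    return seen;
}
```

This form befriends every instantiation of Peeker, which is broader than strictly needed but matches the approach taken in the complete code.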

const iterator

First, we need to be clear about the difference between ordinary iterators and const iterators. We already saw it when implementing list, map, and set: what differs is the return types of operator* and operator->. The approach is the same as before: turn whatever differs into template parameters. So we add template parameters to __HashIterator once more.

template <class K, class T, class Ref, class Ptr, class KeyOfT, class HashFunc>
struct __HashIterator
  • Ref: the type returned by operator*. An ordinary iterator passes T& and therefore returns T&; a const iterator passes const T& and therefore returns const T&.
  • Ptr: the type returned by operator->. An ordinary iterator passes T* and therefore returns T*; a const iterator passes const T* and therefore returns const T*.

Therefore, HashTable passes different template arguments to define the ordinary iterator and the const iterator.

typedef __HashIterator<K, T, T&, T*, KeyOfT, HashFunc> iterator;
typedef __HashIterator<K, T, const T&, const T*, KeyOfT, HashFunc> const_iterator;

What comes after implementing const_iterator? We add const versions of begin and end to HashTable, which is simple, and then add const iterators to unordered_map and unordered_set. The key of unordered_set must not be modified, so both iterator and const_iterator of unordered_set are defined as the table's const_iterator.
unordered_map solves the same problem differently: it stores pair<const K, V>, so the key cannot be modified while the value still can.
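The standard containers make the same two choices, which can be checked at compile time. This sketch inspects std::unordered_map and std::unordered_set rather than the classes written in this article; bump_value_not_key is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// unordered_map: the key half of each stored pair is const.
static_assert(
    std::is_same<std::unordered_map<int, int>::value_type, std::pair<const int, int>>::value,
    "map elements are pair<const K, V>");

// unordered_set: dereferencing even the plain iterator gives read-only access.
typedef std::unordered_set<int>::iterator SetIt;
static_assert(
    std::is_const<std::remove_reference<decltype(*std::declval<SetIt&>())>::type>::value,
    "set elements cannot be modified through an iterator");

inline int bump_value_not_key() {
    std::unordered_map<int, int> m;
    m.insert(std::make_pair(1, 10));
    m.begin()->second += 5;   // fine: the mapped value is mutable
    // m.begin()->first = 2;  // would not compile: the key is const
    return m.begin()->second;
}
```

Modifying a key in place would leave the element in the wrong bucket, which is why both containers forbid it.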
After switching unordered_set to const iterators, however, compilation failed with this error:

invalid conversion from ‘const HashTable<int, int, unordered_set::SetKeyOfT, DefaultHashFunc >’ to ‘HashTable<int, int, unordered_set::SetKeyOfT, DefaultHashFunc >’ [-fpermissive]

What is the reason? The const in begin() const applies to the object that this points to, so the full type of this here is const HashTable<K, T, KeyOfT, HashFunc>*. The second parameter of the iterator's constructor, however, is HashTable<K, T, KeyOfT, HashFunc>* pht.
A pointer to const cannot be converted to a pointer to non-const, so an error is reported. The solution is to add const to that constructor parameter and to make the hash table pointer member of __HashIterator a pointer to const as well. The iterator never modifies the hash table through this pointer, so this is perfectly safe.
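The rule can be seen in a few lines. In this sketch, Table and ConstCursor are hypothetical stand-ins for HashTable and __HashIterator; the cursor stores a pointer to const, so it can be built from both the const and the non-const begin.

```cpp
#include <cassert>

struct Table;  // hypothetical stand-in for HashTable

struct ConstCursor {
    const Table* _owner;  // pointer to const: the cursor only reads through it
    explicit ConstCursor(const Table* owner) : _owner(owner) {}
};

struct Table {
    int _size = 3;

    // Inside a const member function, `this` has type `const Table*`.
    // It can only be handed to a parameter that is also pointer-to-const.
    ConstCursor begin() const { return ConstCursor(this); }

    // From a non-const member, `Table*` converts to `const Table*` implicitly.
    ConstCursor begin() { return ConstCursor(this); }
};

// Takes the table by const reference, so the const overload of begin is chosen.
inline int size_via_cursor(const Table& t) {
    return t.begin()._owner->_size;
}
```

If the constructor parameter were declared as Table* instead, the const overload of begin would fail to compile with the same kind of invalid-conversion error shown above.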

Modify Find return value

The find function in the standard library returns an iterator, so ours needs to change too. The change is simple: construct an iterator at each return statement.

iterator Find(const K &key)
{
	HashFunc hf;
	KeyOfT kot;
	size_t hashi = hf(key) % _table.size();
	Node *cur = _table[hashi];
	while (cur)
	{
		if (kot(cur->_data) == key)
		{
			return iterator(cur, this);
		}

		cur = cur->_next;
	}

	return iterator(nullptr, this);
}

Modify the return value of the insert function

First we need to know what insert returns in the standard library: pair<iterator, bool>

  • To determine whether the element already exists in the hash table, store the return value of Find in an iterator and compare it with end(). If it is not equal to end(), the element already exists, and we return a pair made of this iterator and false.
  • If the element is newly inserted, construct an iterator from the new node, pack it into a pair with true, and return it.
    With that, the Insert function in HashTable is modified. We then updated the insert functions of unordered_map and unordered_set in the same way, but compiling showed that something had gone wrong again.
    In unordered_set, the return type of insert is written as pair<iterator, bool>, but both iterator and const_iterator of unordered_set are the table's const_iterator. The actual return type of unordered_set's insert is therefore pair<const_iterator, bool>, while HashTable's Insert returns a pair holding an ordinary iterator. The two types are incompatible and there is no conversion between them.
    The solution was already explained in the simulated implementation of map and set: add a constructor to __HashIterator that looks very much like a copy constructor:
typedef __HashIterator<K, T, T&, T*, KeyOfT, HashFunc> Iterator;
__HashIterator(const Iterator& it)
	:_node(it._node)
	,_pht(it._pht)
{}

Iterator is defined with T& and T* rather than Ref and Ptr. So when the enclosing iterator is an ordinary iterator, this constructor is its copy constructor; when it is a const iterator, this constructor is a conversion from an ordinary iterator to a const iterator. Very clever.
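Here is the trick reduced to its essentials. Iter, MutIter, ConstIter and read_through_const are hypothetical names; the iterator wraps a bare pointer instead of a hash node, but the constructor has the same dual role.

```cpp
#include <cassert>

template <class T, class Ref, class Ptr>
struct Iter {
    typedef Iter<T, T&, T*> Iterator;  // always the non-const flavour

    T* _p;

    explicit Iter(T* p) : _p(p) {}

    // For Iter<T, T&, T*> this is the copy constructor.
    // For Iter<T, const T&, const T*> it converts from the non-const flavour.
    Iter(const Iterator& it) : _p(it._p) {}

    Ref operator*() const { return *_p; }
};

typedef Iter<int, int&, int*> MutIter;
typedef Iter<int, const int&, const int*> ConstIter;

inline int read_through_const(int& x) {
    MutIter it(&x);
    *it += 1;            // the ordinary iterator may modify the element
    ConstIter cit = it;  // iterator -> const_iterator, via the converting constructor
    return *cit;         // read-only view of the same element
}
```

The conversion only goes one way: there is no constructor that turns a ConstIter back into a MutIter, which is exactly the behavior the standard containers have.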

operator[] of unordered_map

The principle is simple: call the insert function. Whether the insertion succeeds or fails, return the second member of the element that the returned iterator refers to.

V& operator[](const K& key)
{
    pair<iterator, bool> ret = _ht.Insert(make_pair(key, V()));
    return ret.first->second;
}
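The same idea works against the standard container, whose insert also returns pair<iterator, bool>. In this sketch, subscript and count_demo are hypothetical free functions written only to mirror the member function above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical free function mirroring the member operator[] above.
template <class K, class V>
V& subscript(std::unordered_map<K, V>& m, const K& key) {
    // insert returns pair<iterator, bool>; the iterator is valid whether or not
    // a new element was created.
    std::pair<typename std::unordered_map<K, V>::iterator, bool> ret =
        m.insert(std::make_pair(key, V()));
    return ret.first->second;
}

inline int count_demo() {
    std::unordered_map<std::string, int> counts;
    const std::string word = "apple";
    ++subscript(counts, word);  // inserts {"apple", 0}, then increments
    ++subscript(counts, word);  // finds the existing entry and increments it
    assert(counts.size() == 1);
    return counts[word];
}
```

Because a failed insertion still returns an iterator to the existing element, one code path covers both lookup and creation, and count_demo() returns 2.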

Complete code

Hash.h

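#pragma once
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// The listings below use unqualified names such as pair, vector, string and cout
using namespace std;

// Default hash: the key is already convertible to an integer
template <class K>
struct DefaultHashFunc
{
	size_t operator()(const K &key)
	{
		return (size_t)key;
	}
};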

// Specialization for string keys
template <>
struct DefaultHashFunc<string>
{
	size_t operator()(const string &str)
	{
		// BKDR
		size_t hash = 0;
		for (auto ch : str)
		{
			hash *= 131;
			hash += ch;
		}

		return hash;
	}
};

template <class T>
struct HashNode
{
	T _data;
	HashNode<T> *_next;

	HashNode(const T &data)
		: _data(data), _next(nullptr)
	{
	}
};

template <class K, class T, class KeyOfT, class HashFunc>
class HashTable;

template <class K, class T, class Ref, class Ptr, class KeyOfT, class HashFunc>
struct __HashIterator
{
	typedef HashNode<T> Node;
	typedef __HashIterator<K, T, Ref, Ptr, KeyOfT, HashFunc> self;
	typedef __HashIterator<K, T, T&, T*, KeyOfT, HashFunc> Iterator;

	// Constructor
	__HashIterator(Node* node, const HashTable<K, T, KeyOfT, HashFunc>* pht)
		:_pht(pht)
		,_node(node)
	{}

	// Copy constructor for the ordinary iterator;
	// conversion from iterator to const_iterator for the const iterator
	__HashIterator(const Iterator& it)
		:_pht(it._pht)
		,_node(it._node)
	{}

	self& operator++()
	{
		if(_node->_next) // the current bucket still has nodes
		{
			_node = _node->_next;
			return *this;
		}

		// Otherwise, look for the next non-empty bucket
		KeyOfT kot;
		HashFunc hf;
		size_t hashi = hf(kot(_node->_data)) % _pht->_table.size();
		++hashi;
		while(hashi < _pht->_table.size())
		{
			if(_pht->_table[hashi])
			{
				_node = _pht->_table[hashi];
				return *this;
			}
			++hashi;
		}

		_node = nullptr; // no more elements: this is end()
		return *this;
	}

	bool operator!=(const self& s) const
	{
		return _node != s._node;
	}

	Ref operator*()
	{
		return _node->_data;
	}

	Ptr operator->()
	{
		return &_node->_data;
	}

	const HashTable<K, T, KeyOfT, HashFunc>* _pht;
	Node* _node;
};

template <class K, class T, class KeyOfT, class HashFunc = DefaultHashFunc<K>>
class HashTable
{
public:
	typedef HashNode<T> Node;
	typedef __HashIterator<K, T, T&, T*, KeyOfT, HashFunc> iterator;
	typedef __HashIterator<K, T, const T&, const T*, KeyOfT, HashFunc> const_iterator;

private:
	// The iterator reads _table, so it is declared a friend.
	// The parameter names must differ from those of HashTable.
	template <class U, class Q, class W, class E, class Y, class I>
	friend struct __HashIterator;

public:
	iterator begin()
	{
		for(size_t i = 0; i < _table.size(); i++)
		{
			Node* cur = _table[i];
			if(cur) return iterator(cur, this);
		}
		return iterator(nullptr, this);
	}

	iterator end()
	{
		return iterator(nullptr, this);
	}

	const_iterator begin() const
	{
		for(size_t i = 0; i < _table.size(); i++)
		{
			Node* cur = _table[i];
			if(cur) return const_iterator(cur, this);
		}
		return const_iterator(nullptr, this);
	}

	const_iterator end() const
	{
		return const_iterator(nullptr, this);
	}

	HashTable()
	{
		_table.resize(10, nullptr);
	}

	~HashTable()
	{
		for (size_t i = 0; i < _table.size(); i++)
		{
			Node *cur = _table[i];
			while (cur)
			{
				Node *next = cur->_next;
				delete cur;
				cur = next;
			}

			_table[i] = nullptr;
		}
	}

	pair<iterator, bool> Insert(const T &data)
	{
		KeyOfT kot;

		// The key already exists: return its iterator and false
		iterator it = Find(kot(data));
		if(it != end())
		{
			return make_pair(it, false);
		}

		HashFunc hf;

		// Grow when the load factor reaches 1
		if (_n == _table.size())
		{
			size_t newSize = _table.size() * 2;
			vector<Node *> newTable;
			newTable.resize(newSize, nullptr);

			// Walk the old table and relink each existing node into the new table
			for (size_t i = 0; i < _table.size(); i++)
			{
				Node *cur = _table[i];
				while (cur)
				{
					Node *next = cur->_next;

					// Head-insert into the new table
					size_t hashi = hf(kot(cur->_data)) % newSize;
					cur->_next = newTable[hashi];
					newTable[hashi] = cur;

					cur = next;
				}

				_table[i] = nullptr;
			}

			_table.swap(newTable);
		}

		size_t hashi = hf(kot(data)) % _table.size();
		// Head insertion
		Node *newnode = new Node(data);
		newnode->_next = _table[hashi];
		_table[hashi] = newnode;
		++_n;
		return make_pair(iterator(newnode, this), true);
	}

	iterator Find(const K &key)
	{
		HashFunc hf;
		KeyOfT kot;
		size_t hashi = hf(key) % _table.size();
		Node *cur = _table[hashi];
		while (cur)
		{
			if (kot(cur->_data) == key)
			{
				return iterator(cur, this);
			}

			cur = cur->_next;
		}

		return iterator(nullptr, this);
	}

	bool Erase(const K &key)
	{
		HashFunc hf;
		KeyOfT kot;
		size_t hashi = hf(key) % _table.size();
		Node *prev = nullptr;
		Node *cur = _table[hashi];
		while (cur)
		{
			if (kot(cur->_data) == key)
			{
				if (prev == nullptr)
				{
					// Removing the head of the bucket
					_table[hashi] = cur->_next;
				}
				else
				{
					prev->_next = cur->_next;
				}

				delete cur;
				--_n; // keep the element count in sync
				return true;
			}

			prev = cur;
			cur = cur->_next;
		}

		return false;
	}

	// Debug helper: prints the key of every node, bucket by bucket
	void Print()
	{
		KeyOfT kot;
		for (size_t i = 0; i < _table.size(); i++)
		{
			cout << "[" << i << "]->";
			Node *cur = _table[i];
			while (cur)
			{
				cout << kot(cur->_data) << "->";
				cur = cur->_next;
			}
			cout << "NULL" << endl;
		}
		cout << endl;
	}

private:
	vector<Node *> _table; // array of bucket head pointers
	size_t _n = 0;		   // number of stored elements
};

unordered_map.h

#pragma once

#include "Hash.h"

template<class K, class V>
class unordered_map
{
public:
    struct MapKeyOfT
    {
        // Must take the stored type, pair<const K, V>. Taking pair<K, V> would
        // bind to a temporary copy and return a dangling reference.
        const K& operator()(const pair<const K, V>& kv)
        {
            return kv.first;
        }
    };
    typedef typename HashTable<K, pair<const K, V>, MapKeyOfT>::iterator iterator;
    typedef typename HashTable<K, pair<const K, V>, MapKeyOfT>::const_iterator const_iterator;

    pair<iterator, bool> insert(const pair<K, V>& kv)
    {
        return _ht.Insert(kv);
    }

    iterator begin()
    {
        return _ht.begin();
    }

    iterator end()
    {
        return _ht.end();
    }

    V& operator[](const K& key)
    {
        pair<iterator, bool> ret = _ht.Insert(make_pair(key, V()));
        return ret.first->second;
    }

private:
    HashTable<K, pair<const K, V>, MapKeyOfT> _ht;
};

unordered_set.h

#pragma once
#include "Hash.h"

template<class K>
class unordered_set
{
public:
    struct SetKeyOfT
    {
        const K& operator()(const K& key)
        {
            return key;
        }
    };
    // Keys must not be modified, so both iterator types are const_iterator
    typedef typename HashTable<K, K, SetKeyOfT>::const_iterator iterator;
    typedef typename HashTable<K, K, SetKeyOfT>::const_iterator const_iterator;

    pair<iterator, bool> insert(const K& key)
    {
        // Insert returns an ordinary iterator; convert it to const_iterator
        pair<typename HashTable<K, K, SetKeyOfT>::iterator, bool> ret = _ht.Insert(key);
        return pair<iterator, bool>(ret.first, ret.second);
    }

    iterator begin() const
    {
        return _ht.begin();
    }

    iterator end() const
    {
        return _ht.end();
    }

private:
    HashTable<K, K, SetKeyOfT> _ht;
};


Origin blog.csdn.net/m0_73096566/article/details/134605021