Simulating unordered_map and unordered_set is much like simulating map and set. The difference is the underlying data structure: map and set sit on a red-black tree, while unordered_map and unordered_set sit on a hash table. You therefore need to be familiar with hash tables before simulating unordered_map and unordered_set. If you are not, you can refer to the two articles I wrote earlier on the subject.
What we implement here: unordered_map and unordered_set, with the hash table built on open hashing (separate chaining) as the underlying data structure.
Modify stored data
From using unordered_map and unordered_set, we know that they store different types of data. For one hash table to serve both unordered_map and unordered_set, HashNode has to be modified. The idea is simple: since only the stored type differs, we make the stored type a template parameter. Passing a different type to HashNode then instantiates a hash table that stores different elements.
- When the template argument is a pair, the table backs an unordered_map.
- When the template argument is not a pair, the table backs an unordered_set.
template<class T>
struct HashNode
{
    T _data;
    HashNode<T>* _next;
    HashNode(const T& data)
        : _data(data)
        , _next(nullptr)
    {
    }
};
We reduced the original two template parameters to one. Whatever type is passed in is the type that HashNode stores.
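To see the single template parameter at work, here is a minimal sketch that instantiates the same node shape for a set-like and a map-like element. The aliases SetNode and MapNode and the helpers push_front and destroy_list are hypothetical names introduced only for this illustration; they are not part of the article's hash table.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Same shape as the HashNode above: one template parameter decides what is stored.
template <class T>
struct HashNode
{
    T _data;
    HashNode<T>* _next;

    HashNode(const T& data)
        : _data(data), _next(nullptr)
    {
    }
};

// Hypothetical aliases, introduced only for this illustration.
using SetNode = HashNode<int>;                          // set-like: stores the key itself
using MapNode = HashNode<std::pair<std::string, int>>;  // map-like: stores a key-value pair

// Hypothetical helper: pushes a new node onto the front of a bucket's list.
template <class T>
HashNode<T>* push_front(HashNode<T>* head, const T& data)
{
    HashNode<T>* node = new HashNode<T>(data);
    node->_next = head;
    return node;
}

// Hypothetical helper: frees every node of a list and returns how many were freed.
template <class T>
int destroy_list(HashNode<T>* head)
{
    int freed = 0;
    while (head)
    {
        HashNode<T>* next = head->_next;
        delete head;
        head = next;
        ++freed;
    }
    return freed;
}
```

The list-handling code is written once and works for both node types; only the stored type changes.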
Obtain key
For unordered_set, the key is the stored value of type T itself. For unordered_map, the key is the first member of the stored value, because unordered_map stores a pair.
How do we handle both? The approach is the same as in the simulation of map and set: use a functor.
HashTable therefore needs one more template parameter, which we call KeyOfT. This functor extracts the key from the stored data, whatever its type.
- For unordered_map, the functor returns the first member of its argument, because the stored data is a pair and the key is its first.
- For unordered_set, the functor returns its argument unchanged, because the stored data is the key itself.
// Code in unordered_set.h
#pragma once
#include "Hash.h"
template<class K>
class unordered_set
{
public:
    struct SetKeyOfT
    {
        const K& operator()(const K& key)
        {
            return key;
        }
    };
private:
    HashTable<K, K, SetKeyOfT> _ht;
};
In the code above, we define a struct, SetKeyOfT, and overload operator() in it. We then pass this type to HashTable, which uses it to obtain the key of an unordered_set element. You may ask: why do we also pass K when instantiating the hash table? Don't worry, the answer follows shortly.
// Code in unordered_map.h
#pragma once
#include "Hash.h"
template<class K, class V>
class unordered_map
{
public:
    struct MapKeyOfT
    {
        const K& operator()(const pair<K, V>& kv)
        {
            return kv.first;
        }
    };
private:
    HashTable<K, pair<K, V>, MapKeyOfT> _ht;
};
Similarly, we define a struct, MapKeyOfT, overload operator() in it, and pass the type to HashTable so that it can obtain the key of an unordered_map element.
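The sketch below shows why the functor makes the table code generic: one function computes the bucket index of a stored value without knowing whether that value is a bare key or a pair. The helper bucket_index is a hypothetical name, std::hash stands in for the article's HashFunc, and the functors are written as standalone templates here only so that the block is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Set-like functor: the stored value is the key.
template <class K>
struct SetKeyOfT
{
    const K& operator()(const K& key) const { return key; }
};

// Map-like functor: the key is the first member of the stored pair.
template <class K, class V>
struct MapKeyOfT
{
    const K& operator()(const std::pair<const K, V>& kv) const { return kv.first; }
};

// Hypothetical helper: computes the bucket index of any stored value.
// std::hash is an assumption standing in for the article's HashFunc parameter.
template <class K, class T, class KeyOfT>
std::size_t bucket_index(const T& data, std::size_t bucket_count)
{
    assert(bucket_count > 0);   // modulo by zero is undefined
    KeyOfT kot;
    return std::hash<K>()(kot(data)) % bucket_count;
}
```

Because both calls hash the same key, a set element 44 and a map element whose first is 44 land in the same bucket index for the same table size.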
We have changed the data stored in HashNode and the template parameters of HashTable, so the code inside HashTable must change accordingly.
Let's first look at the template parameters of HashTable:
- K: the key type. With this parameter, the member functions of HashTable that take a key, such as Find and Erase, can declare their parameter as const K&. Without K among the template parameters, HashTable would have no easy way to name the key type of unordered_map or unordered_set. This is why unordered_map.h and unordered_set.h pass the key type when instantiating the hash table.
- T: the stored data type. For unordered_map it is pair<K, V>; for unordered_set it is K.
- KeyOfT: instantiate an object of this type and call its operator() to obtain the key.
- HashFunc: lets types such as strings serve as keys. The detailed logic is discussed in the previous article: Hash table closed hash implementation.
Now let's look at the changes to the HashTable code:
- The parameter of the insert function changes from const pair<K, V>& kv to const T& data.
- Every place that needs a key must obtain it through the functor.
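The two changes can be sketched on a deliberately tiny table. MiniTable is a hypothetical, fixed-size stand-in for HashTable (no rehashing, no erase, no iterators), and std::hash is assumed as the default hash; the point is only that Insert takes const T& and that every key is obtained through KeyOfT.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

template <class T>
struct Node
{
    T _data;
    Node* _next;
    explicit Node(const T& data) : _data(data), _next(nullptr) {}
};

struct IntKeyOfT    // set-like: the value is the key
{
    const int& operator()(const int& key) const { return key; }
};

struct PairKeyOfT   // map-like: the key is pair.first
{
    const int& operator()(const std::pair<const int, std::string>& kv) const { return kv.first; }
};

// Hypothetical fixed-size table used only for this illustration.
template <class K, class T, class KeyOfT, class Hash = std::hash<K>>
class MiniTable
{
public:
    MiniTable() : _table(10, nullptr) {}
    MiniTable(const MiniTable&) = delete;
    MiniTable& operator=(const MiniTable&) = delete;
    ~MiniTable()
    {
        for (Node<T>* head : _table)
            while (head) { Node<T>* next = head->_next; delete head; head = next; }
    }

    // The parameter is const T&, not const pair<K, V>&.
    bool Insert(const T& data)
    {
        KeyOfT kot;
        if (Find(kot(data))) return false;             // key already present
        std::size_t hashi = Hash()(kot(data)) % _table.size();
        Node<T>* node = new Node<T>(data);
        node->_next = _table[hashi];                   // head insertion
        _table[hashi] = node;
        return true;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const T* Find(const K& key) const
    {
        KeyOfT kot;
        for (Node<T>* cur = _table[Hash()(key) % _table.size()]; cur; cur = cur->_next)
            if (kot(cur->_data) == key) return &cur->_data;   // key obtained via the functor
        return nullptr;
    }

private:
    std::vector<Node<T>*> _table;
};
```

The same Insert and Find bodies serve MiniTable<int, int, IntKeyOfT> and MiniTable<int, std::pair<const int, std::string>, PairKeyOfT> without any change.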
Ordinary iterator
A hash table iterator cannot be a raw pointer, because the nodes of a hash table do not occupy contiguous memory.
We have to think about how to encapsulate the iterator of the hash table:
- First, the class must contain a pointer to a HashNode.
- Second, incrementing or decrementing the iterator may cross from one hash bucket to another, which is the harder part. Consider two cases:
  - Suppose the iterator points to the HashNode holding 44 and that node is the last one in its bucket. After the increment, we have to leave the current bucket and find the next valid HashNode.
  - Suppose the iterator points to the HashNode holding 7 and that node is the first one in its bucket. After the decrement, we likewise have to leave the current bucket and search backward for a valid HashNode.
- Since the iterator may cross buckets, does it need to store the index of the current bucket? On reflection, no: the iterator can read the key of the data it points to and compute the bucket index with the division-remainder method (hash value modulo table size).
This exposes another problem: when crossing buckets, the iterator has no access to the hash table's _table, so it cannot search forward or backward for a valid HashNode. We therefore store a pointer to the HashTable in the iterator, which lets it move across buckets. (There are other ways to implement this; for example, you could pass _table itself.)
template <class K, class T, class KeyOfT, class HashFunc = DefaultHashFunc<K>>
struct __HashIterator
{
    typedef HashNode<T> Node;
    typedef __HashIterator<K, T, KeyOfT, HashFunc> self;
    // Constructor
    __HashIterator(Node* node, HashTable<K, T, KeyOfT, HashFunc>* pht)
    {
        _node = node;
        _pht = pht;
    }
    HashTable<K, T, KeyOfT, HashFunc>* _pht;
    Node* _node;
};
operator++()
- If the current iterator's _node->_next is not nullptr, the next node in the same bucket is valid, and we simply set _node to _node->_next.
- If _node->_next is nullptr, we need to find the next position:
  - First, compute the index of the bucket in HashTable that holds the current _node.
  - Then, starting from the position after that index, search for the next non-empty hash bucket.
  - If no non-empty bucket is found, the current node is already the last valid element of the hash table. We set _node to nullptr, which serves as our end iterator.
// Pre-increment: returns a reference to the advanced iterator.
self& operator++()
{
    if (_node->_next) // The current hash bucket still has nodes
    {
        _node = _node->_next;
        return *this;
    }
    else
    {
        KeyOfT kot;
        HashFunc hf;
        size_t hashi = hf(kot(_node->_data)) % _pht->_table.size();
        ++hashi;
        while (hashi < _pht->_table.size())
        {
            if (_pht->_table[hashi])
            {
                _node = _pht->_table[hashi];
                return *this;
            }
            else
            {
                ++hashi;
            }
        }
        _node = nullptr; // No bucket left: this is end()
        return *this;
    }
}
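The same walk can be tried outside the class. This is a minimal sketch under simplifying assumptions: the table is a plain std::vector of bucket heads, keys are non-negative ints hashed by the division-remainder method, and next_node and collect are hypothetical free functions, not members of the article's iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node
{
    int _data;
    Node* _next;
};

// Hypothetical free function mirroring operator++: returns the node after `node`,
// or nullptr when `node` is the last element of the table.
inline Node* next_node(const std::vector<Node*>& table, Node* node)
{
    assert(node != nullptr && !table.empty());
    if (node->_next)                                  // the current bucket still has nodes
        return node->_next;

    std::size_t hashi = static_cast<std::size_t>(node->_data) % table.size();
    for (++hashi; hashi < table.size(); ++hashi)      // look for the next non-empty bucket
        if (table[hashi])
            return table[hashi];
    return nullptr;                                   // plays the role of end()
}

// Hypothetical helper: walks the table from its first node and collects the values.
inline std::vector<int> collect(const std::vector<Node*>& table)
{
    std::vector<int> out;
    Node* cur = nullptr;
    for (Node* head : table)                          // same search as begin()
        if (head) { cur = head; break; }
    for (; cur; cur = next_node(table, cur))
        out.push_back(cur->_data);
    return out;
}
```

With 4 and 14 chained in bucket 4 and 7 alone in bucket 7 of a ten-bucket table, the walk visits 4, 14, then crosses buckets 5 and 6 to reach 7.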
operator!=()
This function is simple: compare the node pointers directly.
bool operator!=(const self& s) const
{
    return _node != s._node;
}
operator*() and operator->()
We have written these two functions many times. They return the data field and a pointer to the data field, respectively.
T& operator*()
{
    return _node->_data;
}
T* operator->()
{
    return &_node->_data;
}
begin() and end()
Both functions are written inside the HashTable class, so don't put them in the wrong place.
- begin(): traverse _table to find the first non-empty hash bucket and return an iterator to the first node of that bucket. If there is none, pass nullptr as the first argument of the iterator constructor.
- end(): pass nullptr directly as the first argument of the iterator constructor.
In __HashIterator, the second parameter of the constructor is a pointer to the hash table. What should begin and end pass for it? Simply the this pointer. This is why we chose a hash table pointer as the iterator member: it is simple. Passing _table would also work, but it would be a little more troublesome.
iterator begin()
{
    for (size_t i = 0; i < _table.size(); i++)
    {
        Node* cur = _table[i];
        if (cur) return iterator(cur, this);
    }
    return iterator(nullptr, this);
}
iterator end()
{
    return iterator(nullptr, this);
}
If you compile the code now, it fails: the iterator needs the hash table, and the hash table needs the iterator. Whichever class comes first, the other is not yet declared. A forward declaration is therefore required.
template <class K, class T, class KeyOfT, class HashFunc>
class HashTable;
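The circular dependency and its fix can be reproduced in a few lines. Cursor and Table are hypothetical names standing in for __HashIterator and HashTable; the sketch only shows the declaration order that lets each template mention the other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward declaration: tells the compiler that Table is a class template,
// so Cursor can hold a pointer to it before Table is defined.
template <class T>
class Table;

// Cursor and Table are hypothetical names used only for this sketch.
template <class T>
struct Cursor
{
    const Table<T>* _owner;   // a pointer to a not-yet-defined type is allowed
    std::size_t _pos;

    // The body uses Table<T> members. That is fine: a member of a class template
    // is only instantiated when it is used, after Table has been defined.
    const T& value() const { return _owner->_items[_pos]; }
};

template <class T>
class Table
{
public:
    void add(const T& item) { _items.push_back(item); }

    Cursor<T> first() const
    {
        assert(!_items.empty());
        return Cursor<T>{this, 0};
    }

    std::vector<T> _items;    // public only to keep the sketch short
};
```

Removing the forward declaration makes the first mention of Table<T> inside Cursor an error, which is the failure described above.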
Add begin() and end() to unordered_map and unordered_set
Both functions are simple. HashTable already implements begin and end, so each container only needs to forward to them.
iterator begin()
{
    return _ht.begin();
}
iterator end()
{
    return _ht.end();
}
Copy the code above into both unordered_map and unordered_set.
With this in place, we can traverse our own unordered_map and unordered_set with a range-based for loop. But when we compile and run the code, another error is reported: _table is not accessible. The iterator holds a pointer to the hash table, but _table is a private member of HashTable, so it cannot be accessed from outside the class.
Here are two reliable solutions:
- Write a member function that returns the _table member of HashTable.
- Use a friend declaration. Who is whose friend? __HashIterator must be declared a friend of HashTable:
template <class K, class T, class KeyOfT, class HashFunc>
friend struct __HashIterator;
The above code compiles in Visual Studio but not in VS Code. The error message says that a nested scope cannot reuse the same template parameter names. If you run into this error, just rename the template parameters in the friend declaration.
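Here is a minimal sketch of a friend template declared with renamed parameters. Box and Peeker are hypothetical names standing in for HashTable and __HashIterator; the only point is that the friend declaration uses U where the enclosing class already uses T.

```cpp
#include <cassert>

// Forward declaration of the template that will be granted friendship.
template <class T>
struct Peeker;

// Box and Peeker are hypothetical names used only for this sketch.
template <class T>
class Box
{
    // U instead of T: reusing T here would redeclare Box's own template
    // parameter inside its scope, which GCC and Clang reject.
    template <class U>
    friend struct Peeker;

public:
    explicit Box(const T& value) : _value(value) {}

private:
    T _value;   // private, like _table in HashTable
};

template <class T>
struct Peeker
{
    // Allowed only because every Peeker<U> is a friend of every Box<T>.
    static const T& peek(const Box<T>& box)
    {
        assert(&box != nullptr);
        return box._value;
    }
};
```

The friend declaration grants access to all specializations of Peeker, which matches what the hash table needs for both iterator and const_iterator.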
const iterator
First, let's be clear about the difference between an ordinary iterator and a const iterator. We already saw it when simulating list, map, and set: the only difference is the return types of operator* and operator->. The treatment is the same as before: turn whatever differs into a template parameter. So we add template parameters to __HashIterator once more.
template <class K, class T, class Ref, class Ptr, class KeyOfT, class HashFunc>
struct __HashIterator
- Ref: the reference type returned by operator*. An ordinary iterator passes T& and gets T& back; a const iterator passes const T& and gets const T& back.
- Ptr: the pointer type returned by operator->. An ordinary iterator passes T* and gets T* back; a const iterator passes const T* and gets const T* back.
HashTable therefore passes different template arguments to define the ordinary iterator and the const iterator:
typedef __HashIterator<K, T, T&, T*, KeyOfT, HashFunc> iterator;
typedef __HashIterator<K, T, const T&, const T*, KeyOfT, HashFunc> const_iterator;
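The effect of Ref and Ptr can be checked on a stripped-down iterator. NodeIter is a hypothetical type that wraps a single pointer; it is a sketch of the parameterization only, not of the hash table iterator itself.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// NodeIter is a hypothetical, stripped-down iterator over a single value.
template <class T, class Ref, class Ptr>
struct NodeIter
{
    T* _p;

    explicit NodeIter(T* p) : _p(p) {}
    Ref operator*() const { return *_p; }
    Ptr operator->() const { return _p; }
};

using Value = std::pair<int, int>;
using Iter = NodeIter<Value, Value&, Value*>;
using ConstIter = NodeIter<Value, const Value&, const Value*>;

// The two iterators differ only in what operator* and operator-> return.
static_assert(std::is_same<decltype(*std::declval<Iter>()), Value&>::value,
              "ordinary iterator gives writable access");
static_assert(std::is_same<decltype(*std::declval<ConstIter>()), const Value&>::value,
              "const iterator gives read-only access");
static_assert(std::is_same<decltype(std::declval<ConstIter>().operator->()), const Value*>::value,
              "const iterator gives a read-only pointer");
```

Writing through Iter compiles, while the same write through ConstIter is rejected by the compiler because the pointee is const.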
With const_iterator implemented, we add the const versions of begin and end to HashTable, which is simple. After that, we add const iterators to unordered_map and unordered_set.
The key of unordered_set must not be modified, so both iterator and const_iterator of unordered_set are defined as the hash table's const_iterator.
unordered_map solves the same problem differently: it adds const to K in the stored pair, so the key cannot be modified while the value can.
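The effect of putting const on K inside the pair can be verified directly. Entry and bump are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// The element type a map-like container stores when the key must stay fixed.
using Entry = std::pair<const std::string, int>;

static_assert(std::is_const<Entry::first_type>::value, "the key is const");
static_assert(!std::is_const<Entry::second_type>::value, "the value is not");

// Hypothetical helper: updates the mapped value in place.
inline void bump(Entry& e)
{
    assert(e.second >= 0);   // precondition chosen only for this sketch
    ++e.second;              // fine: the value stays writable
    // e.first = "x";        // would not compile: the key is const
}
```

This is why unordered_map can keep an ordinary iterator: even through a writable reference to the element, only the value can change.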
But after making this modification, an error occurred:
invalid conversion from ‘const HashTable<int, int, unordered_set::SetKeyOfT, DefaultHashFunc >’ to ‘HashTable<int, int, unordered_set::SetKeyOfT, DefaultHashFunc >’ [-fpermissive]
What is the reason? The const in begin() const applies to the object that this points to, so the full type of this here is const HashTable<K, T, KeyOfT, HashFunc>*.
However, the second parameter of the iterator's constructor is HashTable<K, T, KeyOfT, HashFunc>* pht. A pointer to const cannot be converted to a pointer to non-const, so the compiler reports an error. The solution is to add const to that constructor parameter and to make the hash table pointer member of __HashIterator a pointer to const. The iterator never modifies the hash table through this pointer, so this change is safe.
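The type of this in the two kinds of member function can be checked at compile time. Owner and View are hypothetical names standing in for HashTable and __HashIterator; the sketch shows that a constructor taking a pointer to const accepts this from both const and non-const members.

```cpp
#include <cassert>
#include <type_traits>

// Owner and View are hypothetical names used only for this sketch.
struct Owner;

struct View
{
    const Owner* _owner;   // pointer-to-const member, like _pht after the fix

    explicit View(const Owner* owner) : _owner(owner)
    {
        assert(owner != nullptr);
    }
};

struct Owner
{
    View view()            // non-const member: `this` is Owner*
    {
        static_assert(std::is_same<decltype(this), Owner*>::value, "");
        return View(this); // Owner* converts implicitly to const Owner*
    }

    View view() const      // const member: `this` is const Owner*
    {
        static_assert(std::is_same<decltype(this), const Owner*>::value, "");
        return View(this); // compiles only because View takes const Owner*
    }
};
```

Changing the View constructor parameter to Owner* would break the const overload in the same way the iterator constructor broke begin() const.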
Modify the return value of Find
The find function of the library containers returns an iterator, so ours must change as well. The change is simple: at each return statement, construct an iterator and return it.
iterator Find(const K &key)
{
    HashFunc hf;
    KeyOfT kot;
    size_t hashi = hf(key) % _table.size();
    Node *cur = _table[hashi];
    while (cur)
    {
        if (kot(cur->_data) == key)
        {
            return iterator(cur, this);
        }
        cur = cur->_next;
    }
    return iterator(nullptr, this);
}
Modify the return value of the insert function
First, we need to know what insert returns in the library: pair<iterator, bool>.
- To check whether the element already exists in the hash table, store the return value of Find in an iterator and compare it with end(). If it differs from end(), the element already exists, and we return a pair built from that iterator and false.
- If the element is new, construct an iterator from the newly inserted node and return it in a pair together with true.
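The behaviour our Insert imitates can be observed on the standard container. The helper insert_word is a hypothetical name; the call to std::unordered_set::insert and its pair<iterator, bool> result are the standard library's.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical helper: inserts a word and reports whether it was new.
// It uses the standard container to show the behaviour our Insert imitates.
inline bool insert_word(std::unordered_set<std::string>& words, const std::string& w)
{
    std::pair<std::unordered_set<std::string>::iterator, bool> ret = words.insert(w);
    assert(*ret.first == w);   // the iterator always refers to the element with this key
    return ret.second;         // true: newly inserted; false: already present
}
```

Inserting the same word twice returns true and then false, and the container still holds one element.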
We have now modified the Insert function in HashTable, and after that the insert functions in unordered_map and unordered_set. But when we compile, something goes wrong again.
In unordered_set, the return type of insert is written as pair<iterator, bool>, but both iterator and const_iterator of unordered_set are the hash table's const_iterator. The actual return type of unordered_set's insert is therefore pair<const_iterator, bool>, while HashTable's Insert returns a pair holding an ordinary iterator. The types are incompatible and there is no conversion between them, hence the error.
The solution was explained in the simulation of map and set. We can add a function to __HashIterator that looks very much like a copy constructor:
typedef __HashIterator<K, T, T&, T*, KeyOfT, HashFunc> Iterator;
__HashIterator(const Iterator& it)
    : _node(it._node)
    , _pht(it._pht)
{
}
Iterator is defined with T& and T*, not with Ref and Ptr. So when the enclosing iterator is an ordinary iterator, this function is its copy constructor; when the enclosing iterator is a const iterator, it is a converting constructor from the ordinary iterator. Very clever.
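The double role of that constructor can be demonstrated on a small iterator. PtrIter is a hypothetical, stripped-down type that wraps one pointer; the sketch checks at compile time that the conversion works in one direction only.

```cpp
#include <cassert>
#include <type_traits>

// PtrIter is a hypothetical, stripped-down iterator used only for this sketch.
template <class T, class Ref, class Ptr>
struct PtrIter
{
    typedef PtrIter<T, T&, T*> Iterator;   // always the non-const flavour

    T* _p;

    explicit PtrIter(T* p) : _p(p) {}

    // For PtrIter<T, T&, T*> this has the signature of a copy constructor.
    // For PtrIter<T, const T&, const T*> it converts iterator -> const_iterator.
    PtrIter(const Iterator& it) : _p(it._p) {}

    Ref operator*() const { return *_p; }
};

typedef PtrIter<int, int&, int*> Iter;
typedef PtrIter<int, const int&, const int*> ConstIter;

static_assert(std::is_convertible<Iter, ConstIter>::value,
              "iterator -> const_iterator is allowed");
static_assert(!std::is_convertible<ConstIter, Iter>::value,
              "const_iterator -> iterator is not");
```

The one-way conversion is what lets unordered_set build its pair<const_iterator, bool> from the pair<iterator, bool> that the hash table returns.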
operator[] of unordered_map
The principle is simple: call the insert function. Whether or not the insertion takes place, the iterator in the returned pair points to the element with that key, so we return the second member of that element.
V& operator[](const K& key)
{
    pair<iterator, bool> ret = _ht.Insert(make_pair(key, V()));
    return ret.first->second;
}
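The same idea can be tried against the standard container. The helper get_or_insert is a hypothetical name that rebuilds operator[] on top of std::unordered_map::insert, so the sketch runs without the article's hash table.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper: reproduces operator[] on top of insert, using the
// standard container so that the sketch is runnable on its own.
template <class K, class V>
V& get_or_insert(std::unordered_map<K, V>& m, const K& key)
{
    // insert leaves an existing element untouched, so ret.first refers to the
    // stored element whether or not the insertion took place.
    std::pair<typename std::unordered_map<K, V>::iterator, bool> ret =
        m.insert(std::make_pair(key, V()));
    assert(ret.first != m.end());
    return ret.first->second;
}
```

Counting words with ++get_or_insert(counts, word) behaves like ++counts[word]: the first call creates the entry with a value-initialized count, later calls update it.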
Complete code
Hash.h
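#pragma once
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
// The code below uses these names unqualified.
using std::cout;
using std::endl;
using std::make_pair;
using std::pair;
using std::size_t;
using std::string;
using std::vector;
template <class K>
struct DefaultHashFunc
{
    size_t operator()(const K &key)
    {
        return (size_t)key;
    }
};
template <>
struct DefaultHashFunc<string>
{
    size_t operator()(const string &str)
    {
        // BKDR
        size_t hash = 0;
        for (auto ch : str)
        {
            hash *= 131;
            hash += ch;
        }
        return hash;
    }
};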
template <class T>
struct HashNode
{
    T _data;
    HashNode<T> *_next;
    HashNode(const T &data)
        : _data(data), _next(nullptr)
    {
    }
};
template <class K, class T, class KeyOfT, class HashFunc>
class HashTable;
template <class K, class T, class Ref, class Ptr, class KeyOfT, class HashFunc>
struct __HashIterator
{
    typedef HashNode<T> Node;
    typedef __HashIterator<K, T, Ref, Ptr, KeyOfT, HashFunc> self;
    typedef __HashIterator<K, T, T&, T*, KeyOfT, HashFunc> Iterator;
    // Constructor
    __HashIterator(Node* node, const HashTable<K, T, KeyOfT, HashFunc>* pht)
    {
        _node = node;
        _pht = pht;
    }
    // Copy constructor for iterator; iterator -> const_iterator conversion otherwise
    __HashIterator(const Iterator& it)
        : _node(it._node)
        , _pht(it._pht)
    {
    }
    self& operator++()
    {
        if (_node->_next) // The current hash bucket still has nodes
        {
            _node = _node->_next;
            return *this;
        }
        else
        {
            KeyOfT kot;
            HashFunc hf;
            size_t hashi = hf(kot(_node->_data)) % _pht->_table.size();
            ++hashi;
            while (hashi < _pht->_table.size())
            {
                if (_pht->_table[hashi])
                {
                    _node = _pht->_table[hashi];
                    return *this;
                }
                else
                {
                    ++hashi;
                }
            }
            _node = nullptr;
            return *this;
        }
    }
    bool operator!=(const self& s) const
    {
        return _node != s._node;
    }
    Ref operator*()
    {
        return _node->_data;
    }
    Ptr operator->()
    {
        return &_node->_data;
    }
    const HashTable<K, T, KeyOfT, HashFunc>* _pht;
    Node* _node;
};
template <class K, class T, class KeyOfT, class HashFunc = DefaultHashFunc<K>>
class HashTable
{
public:
    typedef HashNode<T> Node;
    typedef __HashIterator<K, T, T&, T*, KeyOfT, HashFunc> iterator;
    typedef __HashIterator<K, T, const T&, const T*, KeyOfT, HashFunc> const_iterator;
private:
    template <class U, class Q, class W, class E, class Y, class I>
    friend struct __HashIterator;
public:
    iterator begin()
    {
        for (size_t i = 0; i < _table.size(); i++)
        {
            Node* cur = _table[i];
            if (cur) return iterator(cur, this);
        }
        return iterator(nullptr, this);
    }
    iterator end()
    {
        return iterator(nullptr, this);
    }
    const_iterator begin() const
    {
        for (size_t i = 0; i < _table.size(); i++)
        {
            Node* cur = _table[i];
            if (cur) return const_iterator(cur, this);
        }
        return const_iterator(nullptr, this);
    }
    const_iterator end() const
    {
        return const_iterator(nullptr, this);
    }
    HashTable()
    {
        _table.resize(10, nullptr);
    }
    ~HashTable()
    {
        for (size_t i = 0; i < _table.size(); i++)
        {
            Node *cur = _table[i];
            while (cur)
            {
                Node *next = cur->_next;
                delete cur;
                cur = next;
            }
            _table[i] = nullptr;
        }
    }
    pair<iterator, bool> Insert(const T &data)
    {
        KeyOfT kot;
        iterator it = Find(kot(data));
        if (it != end())
        {
            return make_pair(it, false);
        }
        HashFunc hf;
        // Grow the table when the load factor reaches 1
        if (_n == _table.size())
        {
            size_t newSize = _table.size() * 2;
            vector<Node *> newTable;
            newTable.resize(newSize, nullptr);
            // Walk the old table and relink every node into the new table (no node is reallocated)
            for (size_t i = 0; i < _table.size(); i++)
            {
                Node *cur = _table[i];
                while (cur)
                {
                    Node *next = cur->_next;
                    // Head-insert into the new table
                    size_t hashi = hf(kot(cur->_data)) % newSize;
                    cur->_next = newTable[hashi];
                    newTable[hashi] = cur;
                    cur = next;
                }
                _table[i] = nullptr;
            }
            _table.swap(newTable);
        }
        size_t hashi = hf(kot(data)) % _table.size();
        // Head insertion
        Node *newnode = new Node(data);
        newnode->_next = _table[hashi];
        _table[hashi] = newnode;
        ++_n;
        return make_pair(iterator(newnode, this), true);
    }
    iterator Find(const K &key)
    {
        HashFunc hf;
        KeyOfT kot;
        size_t hashi = hf(key) % _table.size();
        Node *cur = _table[hashi];
        while (cur)
        {
            if (kot(cur->_data) == key)
            {
                return iterator(cur, this);
            }
            cur = cur->_next;
        }
        return iterator(nullptr, this);
    }
    bool Erase(const K &key)
    {
        HashFunc hf;
        KeyOfT kot;
        size_t hashi = hf(key) % _table.size();
        Node *prev = nullptr;
        Node *cur = _table[hashi];
        while (cur)
        {
            if (kot(cur->_data) == key)
            {
                if (prev == nullptr)
                {
                    _table[hashi] = cur->_next;
                }
                else
                {
                    prev->_next = cur->_next;
                }
                delete cur;
                --_n; // Keep the element count in sync with Insert
                return true;
            }
            prev = cur;
            cur = cur->_next;
        }
        return false;
    }
    // Debugging aid: prints the key of every node, bucket by bucket
    void Print()
    {
        KeyOfT kot;
        for (size_t i = 0; i < _table.size(); i++)
        {
            cout << "[" << i << "]->";
            Node *cur = _table[i];
            while (cur)
            {
                cout << kot(cur->_data) << "->";
                cur = cur->_next;
            }
            cout << "NULL" << endl;
        }
        cout << endl;
    }
private:
    vector<Node *> _table; // Array of bucket head pointers
    size_t _n = 0;         // Number of stored elements
};
unordered_map.h
#pragma once
#include "Hash.h"
template<class K, class V>
class unordered_map
{
public:
    struct MapKeyOfT
    {
        // Takes the stored type pair<const K, V> so that no temporary pair is created
        const K& operator()(const pair<const K, V>& kv)
        {
            return kv.first;
        }
    };
    typedef typename HashTable<K, pair<const K, V>, MapKeyOfT>::iterator iterator;
    typedef typename HashTable<K, pair<const K, V>, MapKeyOfT>::const_iterator const_iterator;
    pair<iterator, bool> insert(const pair<K, V>& kv)
    {
        return _ht.Insert(kv);
    }
    iterator begin()
    {
        return _ht.begin();
    }
    iterator end()
    {
        return _ht.end();
    }
    V& operator[](const K& key)
    {
        pair<iterator, bool> ret = _ht.Insert(make_pair(key, V()));
        return ret.first->second;
    }
private:
    HashTable<K, pair<const K, V>, MapKeyOfT> _ht;
};
unordered_set.h
#pragma once
#include "Hash.h"
template<class K>
class unordered_set
{
public:
    struct SetKeyOfT
    {
        const K& operator()(const K& key)
        {
            return key;
        }
    };
    typedef typename HashTable<K, K, SetKeyOfT>::const_iterator iterator;
    typedef typename HashTable<K, K, SetKeyOfT>::const_iterator const_iterator;
    pair<iterator, bool> insert(const K& key)
    {
        pair<typename HashTable<K, K, SetKeyOfT>::iterator, bool> ret = _ht.Insert(key);
        return pair<iterator, bool>(ret.first, ret.second);
    }
    iterator begin() const
    {
        return _ht.begin();
    }
    iterator end() const
    {
        return _ht.end();
    }
private:
    HashTable<K, K, SetKeyOfT> _ht;
};