Summary of hash map implementations in common languages

CSharp

In C# the hash map is called Dictionary.
It is implemented with an int array of buckets plus an array of Entry structs.
A free list links the removed entries together so their slots can be reused.

size: the table length is the first prime number greater than or equal to the requested length
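
The size rule can be sketched as below. This is only an illustration, assuming "first prime at or above the requested length"; `is_prime` and `get_prime` are hypothetical helper names, and a real runtime may consult a precomputed prime table instead, so the exact capacities can differ.

```python
def is_prime(n: int) -> bool:
    """Trial division; good enough for table-sized numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def get_prime(minimum: int) -> int:
    """Return the smallest prime that is >= minimum."""
    n = max(minimum, 2)
    while not is_prime(n):
        n += 1
    return n
```

For example, a requested length of 10 gives a table of 11 buckets, and a requested length of 7 stays at 7.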

memory usage

buckets: an int array of length size
entries: an Entry array of length size
Entry: with int keys and int values, 8 (hashCode + next) + 4 (key) + 4 (value) = 16 B

private struct Entry
{
    public int hashCode;    // Lower 31 bits of hash code, -1 if unused
    public int next;        // Index of next entry, -1 if last
    public TKey key;        // Key of entry
    public TValue value;    // Value of entry
}

The highest bit of hashCode marks whether the entry is in use.
count: int, the number of entry slots handed out so far
version: int, the number of modifications, used to detect changes to the map during traversal
freeList: int, the head of the free list; deleted entries are linked into this list
freeCount: int, the number of free entries

Total size: size * 4 + size * 16 + 4 + 4 + 4 + 4 = 16 + size * 20
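
The arithmetic above can be checked with a small sketch. The function names are hypothetical, and the estimate assumes 4-byte ints and ignores object headers, array headers, and padding, just as the formula does.

```python
INT_BYTES = 4  # assumed size of an int field


def entry_bytes(key_bytes: int = 4, value_bytes: int = 4) -> int:
    """hashCode + next + key + value."""
    return 2 * INT_BYTES + key_bytes + value_bytes


def dictionary_bytes(size: int, key_bytes: int = 4, value_bytes: int = 4) -> int:
    """Buckets array + entries array + the four int fields."""
    buckets = size * INT_BYTES
    entries = size * entry_bytes(key_bytes, value_bytes)
    scalars = 4 * INT_BYTES  # count, version, freeList, freeCount
    return buckets + entries + scalars
```

With int keys and values, a table of size 11 comes to 16 + 11 * 20 = 236 bytes under these assumptions.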

insert

1. Compute the hash of the key and keep the lower 31 bits: hash(key) & 0x7FFFFFFF.
2. Take the hash modulo size to get the index, which says which bucket the key belongs to. Each bucket is a linked list of entries with the same index.
3. Walk the linked list, comparing hashCode and key (hashCode first, since an integer comparison is faster). If a match is found, overwrite the value, increment version, and return without inserting. Otherwise, record the length of the linked list in collisionCount.
4. If there is a free entry, take the index to allocate from the head of the free list. Otherwise the index to allocate is count. If count has reached the array length, resize first; the new length is the first prime that is at least twice the current count.
5. Set the hashCode, key, next, and value of entries[index], and insert the entry at the head of its bucket.
6. collisionCount is the number of collisions. When the length of the linked list exceeds 100, a rehash is performed: the table length stays the same and the hash function is replaced.
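
The insert path described above can be sketched as follows. This is a minimal sketch, not the actual .NET code: `MiniDict`, `next_prime`, and the list-based entry layout are assumptions made for illustration, and the free list and the collision-count rehash are left out to keep it short.

```python
def next_prime(n: int) -> int:
    """Smallest prime >= n, by trial division (hypothetical helper)."""
    n = max(n, 2)
    while any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
        n += 1
    return n


class MiniDict:
    """Buckets + entries hash map showing only insert and lookup."""

    def __init__(self, size: int = 3):
        self.buckets = [-1] * size  # bucket -> index of first entry, -1 if empty
        self.entries = []           # each entry is [hash_code, next, key, value]
        self.version = 0

    def insert(self, key, value) -> None:
        hash_code = hash(key) & 0x7FFFFFFF           # keep the lower 31 bits
        bucket = hash_code % len(self.buckets)
        i = self.buckets[bucket]
        while i >= 0:                                # walk the chain
            e = self.entries[i]
            if e[0] == hash_code and e[2] == key:    # cheap int compare first
                e[3] = value                         # existing key: overwrite
                self.version += 1
                return
            i = e[1]
        if len(self.entries) == len(self.buckets):   # full: grow, then re-bucket
            self._resize(next_prime(2 * len(self.entries)))
            bucket = hash_code % len(self.buckets)
        # The new entry becomes the head of its bucket's chain.
        self.entries.append([hash_code, self.buckets[bucket], key, value])
        self.buckets[bucket] = len(self.entries) - 1
        self.version += 1

    def _resize(self, new_size: int) -> None:
        self.buckets = [-1] * new_size
        for i, e in enumerate(self.entries):         # relink every entry
            b = e[0] % new_size
            e[1] = self.buckets[b]
            self.buckets[b] = i

    def get(self, key, default=None):
        hash_code = hash(key) & 0x7FFFFFFF
        i = self.buckets[hash_code % len(self.buckets)]
        while i >= 0:
            e = self.entries[i]
            if e[0] == hash_code and e[2] == key:
                return e[3]
            i = e[1]
        return default
```

Starting from 3 buckets, inserting 20 keys grows the table through 7 and 17 to 37 buckets, while inserting an existing key only overwrites its value.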

remove (delete)

1. Sanity checks: the key must not be null and the buckets must not be empty.
2. As in insert, compute which bucket the key falls in.
3. Walk the bucket, find the entry with the same hashCode and key, unlink it from the bucket, and add its index to the free list.
4. Reset the removed entry: hashCode is set to -1, and key and value are set to their default values.
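
The removal steps, together with slot reuse through the free list, can be sketched like this. It is an illustrative sketch under stated assumptions: `FixedDict` is a hypothetical fixed-capacity table with no resize, parallel lists stand in for the Entry fields, and `add` assumes the key is not already present.

```python
class FixedDict:
    """Fixed-capacity buckets + entries table showing remove and slot reuse."""

    def __init__(self, size: int = 7):
        self.buckets = [-1] * size
        self.hash_codes = [-1] * size   # -1 marks an unused slot
        self.next = [-1] * size
        self.keys = [None] * size
        self.values = [None] * size
        self.count = 0                  # slots handed out so far
        self.free_list = -1             # head of the free list
        self.free_count = 0

    def add(self, key, value) -> int:
        """Insert a key that is not present yet; returns the slot used."""
        h = hash(key) & 0x7FFFFFFF
        b = h % len(self.buckets)
        if self.free_count > 0:         # reuse a removed slot first
            i = self.free_list
            self.free_list = self.next[i]
            self.free_count -= 1
        else:
            if self.count == len(self.keys):
                raise OverflowError("this sketch does not resize")
            i = self.count
            self.count += 1
        self.hash_codes[i], self.keys[i], self.values[i] = h, key, value
        self.next[i] = self.buckets[b]  # link in at the bucket head
        self.buckets[b] = i
        return i

    def remove(self, key) -> bool:
        h = hash(key) & 0x7FFFFFFF
        b = h % len(self.buckets)
        prev, i = -1, self.buckets[b]
        while i >= 0:
            if self.hash_codes[i] == h and self.keys[i] == key:
                if prev < 0:            # unlink from the bucket chain
                    self.buckets[b] = self.next[i]
                else:
                    self.next[prev] = self.next[i]
                self.hash_codes[i] = -1                 # mark unused
                self.keys[i] = self.values[i] = None    # reset to defaults
                self.next[i] = self.free_list           # push onto free list
                self.free_list = i
                self.free_count += 1
                return True
            prev, i = i, self.next[i]
        return False

    def items(self):
        """Scan slots in order, skipping the ones marked unused."""
        return [(self.keys[i], self.values[i])
                for i in range(self.count) if self.hash_codes[i] >= 0]
```

After removing a key, the next `add` lands in the freed slot instead of consuming a new one, which is why `count` alone does not tell how many live entries there are.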

resize

Allocate new arrays of the new size, recompute the hash codes (optional), and put each entry into its new bucket

traverse

Scan entries starting from index = 0; entries[index] with hashCode >= 0 holds a value

C++

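In C++ the hash map is called unordered_map. The scheme described here is the same as in C#, with two differences:

When the linked list for an index grows longer than 8, it is converted into a red-black tree to keep lookups efficient
The hash index is computed with a bit operation and the bucket count is a power of 2, which speeds up the computation

These two points match Java's HashMap (treeification threshold of 8, power-of-2 table). The C++ standard requires neither, and libstdc++'s unordered_map, for example, uses prime bucket counts with plain linked chains.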

Prime numbers vs. powers of 2

On whether to use a prime number or a power of 2 as the modulus, here are some thoughts:

If the data is evenly distributed, any modulus works, and a power of 2 is the better choice because the index can be computed with a bit mask
If the data is unevenly distributed, a prime modulus spreads it more evenly than a non-prime one

If the hash function is good enough or the data is evenly distributed, a power of 2 is the better solution.
The choice depends heavily on the data distribution. For example, suppose the keys are integers that represent memory addresses. Because memory addresses are 4-byte aligned, the keys may all be multiples of 4, so with a power-of-2 modulus the data piles up in the buckets whose index is a multiple of 4. In that case a prime number is the better choice.
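
The aligned-address example can be checked with a short sketch. The address values and the `buckets_used` helper are made up for illustration, and the keys are used directly as their own hash.

```python
def buckets_used(keys, bucket_count: int) -> int:
    """Number of distinct buckets hit when indexing with key % bucket_count."""
    return len({k % bucket_count for k in keys})


# 1000 hypothetical 4-byte-aligned addresses, used as their own hash.
addresses = [0x1000 + 4 * i for i in range(1000)]

power_of_two = buckets_used(addresses, 16)   # only every 4th bucket is hit
prime = buckets_used(addresses, 17)          # every bucket is hit

# With a power-of-2 bucket count the modulo is just a bit mask.
mask_matches_mod = all((k & 15) == (k % 16) for k in addresses)
```

With 16 buckets the aligned keys land in only 4 of them, while with 17 buckets all 17 are used. A hash function that mixes the low bits first would remove this difference, which is the "good enough hash function" case.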

Lua

Lua does not use a separate bucket array; the hash indexes directly into the entries array

insert

1. Compute the index for the hash; the calculation is the same as in C++.
2. If entries[index] is empty, insert directly.
3. Otherwise, check whether the key already stored in entries[index] hashes to this same index. If it does, the new key belongs to the same collision chain: take a node from freenode, move freenode toward the start of the array, and link the new node into the chain.
4. If it does not, the occupant is only a guest in this slot: take a node from freenode, move the occupant of entries[index] into it, and put the data to be inserted into entries[index].
5. When freenode reaches the starting position, the table is resized. freenode only decreases and never increases.
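
The Lua-style insert can be sketched as below. This is a simplified illustration rather than Lua's actual C code: `LuaLikeTable` is a hypothetical class, `lastfree` plays the role of freenode, the table size is assumed to be a power of 2, and instead of rehashing when no free node is left, `insert` simply returns False.

```python
class LuaLikeTable:
    """Single node array with chained collisions and a descending free pointer."""

    def __init__(self, size: int = 8):       # size is assumed to be a power of 2
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size
        self.next = [-1] * size              # next node in the chain, -1 if last
        self.lastfree = size                 # only ever moves downward

    def _main_position(self, key) -> int:
        return hash(key) & (self.size - 1)

    def _get_free(self) -> int:
        while self.lastfree > 0:
            self.lastfree -= 1
            if self.keys[self.lastfree] is None:
                return self.lastfree
        return -1                            # a real table would rehash here

    def insert(self, key, value) -> bool:
        mp = self._main_position(key)
        i = mp
        while i >= 0:                        # update in place if already present
            if self.keys[i] == key:
                self.values[i] = value
                return True
            i = self.next[i]
        if self.keys[mp] is None:            # empty main position: store directly
            self.keys[mp], self.values[mp] = key, value
            return True
        free = self._get_free()
        if free < 0:
            return False
        other = self._main_position(self.keys[mp])
        if other == mp:
            # Occupant is at home: the new key goes to the free node, chained in.
            self.keys[free], self.values[free] = key, value
            self.next[free] = self.next[mp]
            self.next[mp] = free
        else:
            # Occupant is a guest: move it to the free node, fix its chain,
            # and give the new key its own main position.
            prev = other
            while self.next[prev] != mp:
                prev = self.next[prev]
            self.next[prev] = free
            self.keys[free], self.values[free] = self.keys[mp], self.values[mp]
            self.next[free] = self.next[mp]
            self.keys[mp], self.values[mp], self.next[mp] = key, value, -1
        return True

    def get(self, key, default=None):
        i = self._main_position(key)
        while i >= 0:
            if self.keys[i] == key:
                return self.values[i]
            i = self.next[i]
        return default
```

In a table of size 8, inserting 1 and then 9 puts 9 in slot 7 as a guest; inserting 7 afterwards evicts 9 to slot 6 and stores 7 in its own main position. Because `lastfree` never moves back up, slots that become empty above it are not handed out again until a rehash.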

The other operations are much the same

Summary

The Lua scheme saves more space, but freenode only ever decreases: even if there are empty nodes behind it, they are not allocated, and they can only be reclaimed through the rehash logic. In addition, once the fill rate is high, step 4 of insert becomes time-consuming.
The scheme described in the C++ section looks like the best one. Converting a long linked list into a red-black tree keeps lookups efficient, although the conversion has overhead and is more troublesome to implement. If the hash function is good, a power-of-2 table size is also more efficient.
