How HashMap Works Under the Hood (1)

————————————

As is well known, HashMap is a collection that stores key-value pairs; each pair is also called an Entry. These Entries are scattered across an array, which forms the backbone of HashMap.

Each element of the HashMap array is initially null.
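The layout described above can be sketched in a few lines of Java. This is a simplified illustration, not the JDK source: the names SimpleEntry, TableDemo, and newTable are assumptions made for this example.

```java
// Simplified sketch of the storage layout; names are illustrative, not the JDK source.
class SimpleEntry<K, V> {
    final K key;
    V value;
    SimpleEntry<K, V> next; // next Entry stored at the same array position, or null

    SimpleEntry(K key, V value, SimpleEntry<K, V> next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

class TableDemo {
    // Creates the backing array; Java initializes every slot to null.
    @SuppressWarnings("unchecked")
    static SimpleEntry<String, Integer>[] newTable(int length) {
        return (SimpleEntry<String, Integer>[]) new SimpleEntry[length];
    }

    public static void main(String[] args) {
        SimpleEntry<String, Integer>[] table = newTable(16);
        System.out.println(table[2] == null); // true: nothing has been stored yet
    }
}
```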

The two HashMap methods used most often are get and put.

 

 

1. How the put method works

What happens when put is called?

For example, calling hashMap.put("apple", 0) inserts an Entry whose key is "apple". A hash function determines the insertion position (index) of the Entry:

index = Hash("apple")

Suppose the computed index is 2. The Entry is then stored at position 2 of the array.

However, because the length of the array is limited, index collisions become inevitable as more and more Entries are inserted, even with an ideal hash function. Sooner or later a new key maps to a position that is already occupied.

How is this handled? With linked lists.

 

Each element of the HashMap array is not only an Entry object but also the head node of a linked list. Each Entry points to the next Entry through its next pointer. When a new Entry maps to an occupied array position, it simply needs to be inserted into the linked list at that position.

Note that a new Entry is inserted at the head of the linked list (the "head insertion method"). Why it is not appended to the tail is explained below. This is the behavior of the classic implementation up to JDK 7; JDK 8 appends new nodes to the tail instead.
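A minimal sketch of put with head insertion might look like the following. The class HeadInsertMap and its members are hypothetical names for this example, and the slot is computed with the hashCode & (length - 1) formula that the article derives further down. The table length of 1 in main is chosen only so that every key lands in slot 0 and the collision is easy to see.

```java
// Sketch only: class and method names are illustrative, not the JDK source.
class HeadInsertMap {
    // One key-value pair plus a link to the next pair in the same slot.
    static final class Entry {
        final String key;
        int value;
        Entry next;

        Entry(String key, int value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    final Entry[] table;

    HeadInsertMap(int length) {
        table = new Entry[length]; // length is assumed to be a power of 2
    }

    int indexFor(String key) {
        return key.hashCode() & (table.length - 1);
    }

    void put(String key, int value) {
        int index = indexFor(key);
        for (Entry e = table[index]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                e.value = value; // key already present: overwrite its value
                return;
            }
        }
        // Head insertion: the new Entry points at the old head and takes its place.
        table[index] = new Entry(key, value, table[index]);
    }

    public static void main(String[] args) {
        HeadInsertMap map = new HeadInsertMap(1); // length 1 forces every key into slot 0
        map.put("apple", 0);
        map.put("banana", 5); // the value 5 is arbitrary
        System.out.println(map.table[0].key);      // banana
        System.out.println(map.table[0].next.key); // apple
    }
}
```

The Entry inserted last ("banana") ends up at the head of the list, and the older Entry ("apple") becomes its next node.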

 

 

2. How the get method works

What happens when get is used to look up a value by its key?

First, the input key is hashed to obtain the corresponding index:

index = Hash("apple")

Because of the hash collisions described above, several Entries may share the same position. The linked list at that position must then be searched node by node, starting from the head. Suppose the key we are looking for is "apple" and the list at that position is Entry6 -> Entry1:

In the first step, we look at the head node, Entry6. Its key is "banana", which is not the one we are looking for.

In the second step, we look at the next node, Entry1. Its key is "apple", which is the one we are looking for.

Entry6 sits at the head because the designers of HashMap assumed that an Entry inserted later is more likely to be looked up.
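The two lookup steps can be sketched as a simple walk along the chain. The class ChainLookup and the method find are hypothetical names for this example, and the value stored for "banana" is made up; only the keys and the order Entry6 -> Entry1 come from the text.

```java
// Sketch only: names are illustrative, not the JDK source.
class ChainLookup {
    static final class Entry {
        final String key;
        final int value;
        final Entry next;

        Entry(String key, int value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walks the chain from the head; returns the value, or null if the key is absent.
    static Integer find(Entry head, String key) {
        for (Entry e = head; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Entry entry1 = new Entry("apple", 0, null);
        Entry entry6 = new Entry("banana", 5, entry1); // inserted later, so it is the head
        System.out.println(find(entry6, "apple"));  // 0
        System.out.println(find(entry6, "cherry")); // null
    }
}
```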

————————————

As mentioned before, a hash function maps a key to a position in the HashMap array:

index = Hash("apple")

How can a hash function be implemented so that its results are distributed as evenly as possible? By performing some operation on the hashCode value of the key.

The most obvious candidate is to take the hashCode modulo the array length:

index = HashCode(Key) % Length

Modulo works, but it is comparatively slow. For efficiency, HashMap uses a bit operation instead (Length is the length of the HashMap array):

index = HashCode(Key) & (Length - 1)

 

The following walks through the whole process with the key "book":

1. Compute the hashCode of "book". The result is 3029737 in decimal, or 1011100011101011101001 in binary.

2. Assuming the HashMap has the default length of 16, Length - 1 is 15 in decimal, or 1111 in binary.

3. AND the two results: 1011100011101011101001 & 1111 = 1001, which is 9 in decimal, so index = 9.

In other words, the index produced by the hash algorithm depends entirely on the last few bits of the key's hashCode.
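The three steps can be checked directly in Java. The sketch below follows the simplified formula from the text; the helper name indexFor is an assumption for this example. The real java.util.HashMap additionally mixes the hash code before masking, so the bucket it actually chooses for "book" may differ.

```java
// Sketch of the simplified formula from the text; indexFor is an illustrative name.
class IndexDemo {
    // index = hashCode & (length - 1); length is assumed to be a power of 2
    static int indexFor(int hashCode, int length) {
        return hashCode & (length - 1);
    }

    public static void main(String[] args) {
        int h = "book".hashCode();
        System.out.println(h);                          // 3029737
        System.out.println(Integer.toBinaryString(h));  // 1011100011101011101001
        System.out.println(Integer.toBinaryString(15)); // 1111
        System.out.println(indexFor(h, 16));            // 9
    }
}
```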

What if the length were not a power of 2? Suppose the length of the HashMap is 10 and repeat the steps above. Length - 1 is now 9 in decimal, or 1001 in binary:

1011100011101011101001 & 1001 = 1001

Taken alone, this result looks fine. Now try a new hashCode, 1011100011101011101011:

1011100011101011101011 & 1001 = 1001

And another hashCode, 1011100011101011101111:

1011100011101011101111 & 1001 = 1001

Although the second-to-last and third-to-last bits of the hashCode have changed from 0 to 1, the result is 1001 every time. In other words, when the HashMap length is 10, some index values are more likely to appear, and others (such as 0111) can never appear!
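A short experiment makes the problem with length 10 visible. The class LengthTenDemo and the helper reachable are hypothetical names for this sketch; it masks the three hash codes from the text with Length - 1 = 9 and then collects every index that this mask can produce.

```java
import java.util.TreeSet;

// Sketch only: names are illustrative.
class LengthTenDemo {
    static int indexFor(int hashCode, int length) {
        return hashCode & (length - 1);
    }

    // Collects every index that the mask (length - 1) can produce.
    static TreeSet<Integer> reachable(int length) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int h = 0; h < 1024; h++) {
            seen.add(indexFor(h, length));
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] hashCodes = {
            0b1011100011101011101001,
            0b1011100011101011101011,
            0b1011100011101011101111
        };
        for (int h : hashCodes) {
            System.out.println(Integer.toBinaryString(indexFor(h, 10))); // 1001 each time
        }
        System.out.println(reachable(10)); // [0, 1, 8, 9]
    }
}
```

With a length of 10, only four of the ten positions are ever used, so the remaining six stay empty no matter which keys are inserted.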

 

This clearly violates the principle that a hash algorithm should distribute its results uniformly.

In contrast, when the length is 16 or another power of 2, every binary bit of Length - 1 is 1. The index is then simply the last few bits of the hashCode, so as long as the input hashCodes are uniformly distributed, the resulting indexes are uniformly distributed as well.
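The following sketch illustrates the power-of-2 case under the same simplified formula. The names PowerOfTwoDemo, histogram, and maskEqualsMod are assumptions for this example. It counts how many of a run of consecutive hash codes land in each slot of a length-16 table, and also checks that masking with Length - 1 gives the same result as a non-negative modulo (Math.floorMod) when the length is a power of 2.

```java
import java.util.Arrays;

// Sketch only: names are illustrative.
class PowerOfTwoDemo {
    // Counts how many hash codes in [0, samples) land in each slot.
    static int[] histogram(int length, int samples) {
        int[] counts = new int[length];
        for (int h = 0; h < samples; h++) {
            counts[h & (length - 1)]++;
        }
        return counts;
    }

    // True when masking and non-negative modulo agree for this hash code.
    static boolean maskEqualsMod(int h, int length) {
        return (h & (length - 1)) == Math.floorMod(h, length);
    }

    public static void main(String[] args) {
        // Every one of the 16 slots receives exactly 100 of the 1600 hash codes.
        System.out.println(Arrays.toString(histogram(16, 1600)));
        System.out.println(maskEqualsMod(3029737, 16)); // true
        System.out.println(maskEqualsMod(-7, 16));      // true
    }
}
```

So for power-of-2 lengths the bit operation yields the same index as the modulo approach, without the cost of a division.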

 

 

 

Origin http://43.154.161.224:23101/article/api/json?id=325302904&siteId=291194637