Analysis of the underlying implementation principle of HashMap

1. Implementation principles of put() and get() in HashMap:

1. Implementation principle of map.put(k,v)

  • First, k and v are wrapped in a Node object (a node).
  • Then HashMap calls k's hashCode() method to obtain the hash value (a null key is given the hash value 0).
  • Finally, the hash value is converted into an array subscript by the hash function. In jdk1.8 the high 16 bits of the hash are XORed into the low 16 bits, and the subscript is (n - 1) & hash, where n is the array length. If there is no element at that subscript, the Node is placed there. If there is already a linked list at that subscript, k is compared with the k of each node on the list using equals. If every equals call returns false, the new node is appended to the end of the linked list (jdk1.8; jdk1.7 inserted it at the head). If one of the equals calls returns true, the value of that node is overwritten.
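
The step from hash value to array subscript can be sketched as follows. This is a minimal illustration that mirrors the computation used by jdk1.8's HashMap; the class name BucketIndexSketch and its method names are made up for this example and are not part of the JDK API.

```java
// Illustrative helper, not a JDK class: shows how a key becomes a bucket index.
final class BucketIndexSketch {
    // XOR the high 16 bits into the low 16 bits so that high bits
    // also influence the index when the table is small.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // tableLength is assumed to be a power of two, so (tableLength - 1)
    // is an all-ones bit mask and the AND acts like a modulo.
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        System.out.println(indexFor("Aa", 16));   // "Aa".hashCode() is 2112
        System.out.println(indexFor(65536, 16));  // only a high bit is set in the raw hash
        System.out.println(indexFor(null, 16));   // null key
    }
}
```

Without the XOR step, the key 65536 would land in bucket 0 of a 16-slot table together with every other multiple of 16; with it, the high bit moves the key to bucket 1.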

2. Implementation principle of map.get(k)

  • First, the hashCode() method of k is called to get the hash value, which is converted into an array subscript by the hash function.
  • The array subscript locates the bucket directly. If there is nothing at that position, null is returned. If there is a one-way linked list at that position, k is compared with the k of each node on the list using equals. If every equals call returns false, the get method returns null. If the k of one node equals the parameter k, the value of that node is the value we are looking for, and the get method returns it.
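
The put and get steps described above can be sketched together in a deliberately simplified map. This is not the JDK source: it assumes a fixed 16-slot table, never resizes, never converts a list into a tree, and ChainedMapSketch is a name invented for this example.

```java
import java.util.Objects;

// Simplified illustration only: fixed-size table, no resizing, no red-black trees.
final class ChainedMapSketch<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value for the key, or null if the key was absent.
    V put(K key, V value) {
        int h = hash(key);
        int i = (table.length - 1) & h;
        if (table[i] == null) {              // empty bucket: place the node directly
            table[i] = new Node<>(h, key, value);
            return null;
        }
        Node<K, V> p = table[i];
        while (true) {
            if (p.hash == h && Objects.equals(p.key, key)) {
                V old = p.value;             // equal key found: overwrite the value
                p.value = value;
                return old;
            }
            if (p.next == null) {            // end of chain: append at the tail
                p.next = new Node<>(h, key, value);
                return null;
            }
            p = p.next;
        }
    }

    // Returns the value for the key, or null if no node in the bucket has an equal key.
    V get(Object key) {
        int h = hash(key);
        for (Node<K, V> p = table[(table.length - 1) & h]; p != null; p = p.next) {
            if (p.hash == h && Objects.equals(p.key, key)) {
                return p.value;
            }
        }
        return null;
    }
}
```

Comparing the stored hash before calling equals is a cheap filter: equals is only called on nodes whose hash already matches.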

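Common questions:

  • Within a bucket, additions and deletions are performed on a linked list, and a query only has to scan the one bucket the key hashes to rather than the whole collection, so random additions, deletions and queries are all very efficient (O(1) on average when the keys are spread evenly).
  • The key of a HashMap has two methods called in succession, hashCode and then equals. Do these two methods need to be overridden? Yes. By default the equals method compares the memory addresses of two objects, and the default hashCode is likewise tied to object identity, so two keys with the same content are treated as different keys unless both methods are overridden consistently.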
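
A small example makes the second point concrete. The two key classes below are hypothetical and exist only for this illustration: RawPoint inherits equals and hashCode from Object, while Point overrides both.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class KeyContractSketch {
    // Inherits Object.equals and Object.hashCode, which are identity-based.
    static final class RawPoint {
        final int x, y;

        RawPoint(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Overrides both methods so that equal coordinates mean equal keys.
    static final class Point {
        final int x, y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Looks up with a second RawPoint instance: equals is identity-based, so the result is null.
    static String lookupWithRawKey() {
        Map<RawPoint, String> map = new HashMap<>();
        map.put(new RawPoint(1, 2), "found");
        return map.get(new RawPoint(1, 2));
    }

    // Looks up with a second Point instance: same hash, equals returns true, so the value is found.
    static String lookupWithValueKey() {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "found");
        return map.get(new Point(1, 2));
    }
}
```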

2. Analysis of HashMap red-black tree principle

Compared with the HashMap of jdk1.7, the most important change in jdk1.8 is the introduction of the red-black tree. Lookups in a red-black tree are faster than in a long linked list; the price is that insertions cost more, because the tree has to be rebalanced. When the linked list in a single bucket grows beyond 8 nodes and the array length is at least 64, the linked list is converted into a red-black tree. If the array is still shorter than 64, the array is expanded instead. When the number of nodes in a red-black tree drops to 6 or fewer (this is checked when the bucket is split during expansion), the tree is converted back into a one-way linked list.
Why is it designed this way? The advantage is that it keeps the linked list in a bucket from becoming very long in the most extreme case, which would make queries very slow.
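
The conversion rule can be written down as a small decision function. The three constants carry the same names and values as those in the jdk1.8 HashMap source, but the class, the enum and the method names are invented for this sketch, and the real logic is spread across several methods.

```java
// Illustrative only: condenses the list/tree conversion rule into two functions.
final class TreeifyRuleSketch {
    static final int TREEIFY_THRESHOLD = 8;      // a chain longer than this may become a tree
    static final int UNTREEIFY_THRESHOLD = 6;    // a tree this small reverts to a list
    static final int MIN_TREEIFY_CAPACITY = 64;  // smaller tables are resized instead

    enum Action { KEEP_LIST, RESIZE_TABLE, TREEIFY }

    // chainLength is the number of nodes in the bucket after the insertion.
    static Action afterInsert(int chainLength, int tableLength) {
        if (chainLength <= TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;
        }
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE_TABLE : Action.TREEIFY;
    }

    // Applied to each half of a tree bucket when it is split during expansion.
    static boolean shouldUntreeify(int treeNodeCount) {
        return treeNodeCount <= UNTREEIFY_THRESHOLD;
    }
}
```

The gap between 8 and 6 works as a buffer: a bucket that hovers around one size does not flip back and forth between the two structures on every insertion and removal.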

Red-black tree query: its access performance is similar to binary search, and the time complexity is O(log n);
Linked list query: in the worst case all elements need to be traversed, and the time complexity is O(n).
Simply put, a red-black tree is an approximately balanced binary search tree. Its main advantage is "balance": the longest path from the root to a leaf is at most twice as long as the shortest one, which prevents the tree from degenerating into a linked list. In this way, the time complexity of a search is guaranteed to be O(log n).

 

Several main characteristics of red-black trees:

  • Each node is either red or black, and the root node is always black;
  • The two child nodes of each red node must be black, so red nodes are never adjacent (neither a child nor the parent of a red node can be red);
  • Every path from a given node to the leaf nodes in its subtree contains the same number of black nodes;
  • All leaf nodes are black (note that the leaf nodes here are the empty NIL nodes, not nodes that hold data).

When the structure of the tree changes (an insertion or deletion), the rule against adjacent red nodes or the rule on equal black-node counts is often broken, and the tree has to be adjusted by recoloring and rotating nodes so that it meets the conditions of a red-black tree again.
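
These properties can be verified mechanically. The checker below is a minimal sketch with a hypothetical Node type in which null stands for a black NIL leaf. It checks only the coloring rules and does not check the ordering of keys.

```java
// Illustrative checker for the red-black coloring rules (key ordering is not checked).
final class RedBlackCheckSketch {
    static final class Node {
        final boolean red;
        final Node left, right;   // null stands for a black NIL leaf

        Node(boolean red, Node left, Node right) {
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isRed(Node n) {
        return n != null && n.red;
    }

    // Returns the number of black nodes on every path from n down to a NIL leaf,
    // or -1 if a red node has a red child or two paths disagree on that number.
    static int blackHeight(Node n) {
        if (n == null) {
            return 1;                                  // the NIL leaf itself is black
        }
        if (n.red && (isRed(n.left) || isRed(n.right))) {
            return -1;                                 // two adjacent red nodes
        }
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l == -1 || r == -1 || l != r) {
            return -1;                                 // unequal black counts
        }
        return l + (n.red ? 0 : 1);
    }

    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) != -1; // the root must be black
    }
}
```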

3. The difference between HashMap principles 1.7 and 1.8

The bottom layer in jdk1.7 is implemented by array + linked list; the bottom layer in jdk1.8 is implemented by array + linked list/red-black tree.
Null keys and null values can be stored. HashMap is not thread-safe.
The initial size is 16. Expansion: newsize = oldsize*2, so the size is always a power of 2.
The expansion applies to the entire Map. In jdk1.7, during each expansion the storage locations of the elements in the original array are recalculated in turn and the elements are re-inserted. In jdk1.8, because the size doubles, each element either stays at its original subscript or moves to original subscript + oldsize, depending on one bit of its hash.
When the total number of elements in the Map exceeds 75% of the array length (the default load factor of 0.75), the expansion operation is triggered, in order to reduce the length of the linked lists and distribute the elements more evenly.
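
The arithmetic behind expansion can be sketched in a few lines. The class and method names are made up for this illustration; the 0.75 load factor and the "stay or move by the old size" rule are the ones used by jdk1.8's HashMap.

```java
// Illustrative arithmetic only, not the JDK resize code.
final class ResizeSketch {
    static final float LOAD_FACTOR = 0.75f;

    // Number of elements above which a table of this capacity is expanded.
    static int threshold(int capacity) {
        return (int) (capacity * LOAD_FACTOR);
    }

    // capacity is assumed to be a power of two.
    static int indexFor(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    // After doubling, the new index is decided by the single hash bit that the
    // larger mask newly exposes: 0 means stay, 1 means move by oldCapacity.
    static int indexAfterDoubling(int hash, int oldCapacity) {
        int oldIndex = indexFor(hash, oldCapacity);
        return (hash & oldCapacity) == 0 ? oldIndex : oldIndex + oldCapacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16));               // 12
        System.out.println(indexAfterDoubling(5, 16));   // 5: bit 16 is not set
        System.out.println(indexAfterDoubling(21, 16));  // 21: bit 16 is set, 5 + 16
    }
}
```

This is also why the size must be a power of 2: only then is size - 1 an all-ones mask, and only then does doubling expose exactly one extra hash bit.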
 

4. Hash conflict

When two different keys produce the same hashCode (hashCode values are not guaranteed to be unique, so different keys can share one), or when different hash values map to the same array subscript, a hash conflict occurs.

Methods to solve hash conflicts include: the open addressing method, the re-hash method, the chain address method, and establishing a public overflow area.
HashMap solves hash conflicts with the chain address method, that is, with a linked list. In jdk1.7, when a hash conflict occurs, the Entry already stored in the array becomes the next of the new Entry (head insertion). For example, A and B both hash to the subscript i, and A is already there. On map.put(B), B is placed at subscript i and A becomes the next of B, so the new value is stored in the array and the old value hangs on the linked list behind it. In jdk1.8 the new node is appended to the end of the linked list instead (tail insertion), so A stays in the array and B becomes the next of A.
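
A conflict is easy to reproduce with java.util.HashMap itself. The strings "Aa" and "BB" have the same hashCode (2112), so they always land in the same bucket, yet both entries remain retrievable because the bucket holds a chain. CollisionDemo is just a name chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class CollisionDemo {
    // Stores two keys whose hashCode values are identical.
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        Map<String, Integer> map = build();
        System.out.println(map.get("Aa") + " " + map.get("BB")); // 1 2
    }
}
```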

Open addressing method: When the hash address p=H(key) of the keyword key conflicts, another hash address p1 is generated based on p. If p1 still conflicts, another hash address p2 is generated based on p, and so on, until a non-conflicting hash address pi is found, and the corresponding element is stored there.
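
Linear probing is the simplest form of open addressing: the i-th candidate address is (p + i) mod m. The sketch below is a minimal illustration with invented names. It supports no deletion and no resizing, and it is not how HashMap works, since HashMap uses the chain address method.

```java
// Illustrative open-addressing table with linear probing (no deletion, no resizing).
final class LinearProbingSketch {
    private final String[] keys;
    private final int[] values;
    private int size;

    LinearProbingSketch(int capacity) {
        keys = new String[capacity];
        values = new int[capacity];
    }

    // Home address p = H(key); the sign bit is cleared so the remainder is non-negative.
    private int home(String key) {
        return (key.hashCode() & 0x7fffffff) % keys.length;
    }

    // Stores the pair and returns the slot that was used.
    // Simplification: refuses every put once the table is full.
    int put(String key, int value) {
        if (size == keys.length) {
            throw new IllegalStateException("table full");
        }
        int i = home(key);
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;   // conflict: try the next address
        }
        if (keys[i] == null) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
        return i;
    }

    // Follows the same probe sequence; an empty slot means the key is absent.
    Integer get(String key) {
        int i = home(key);
        for (int probes = 0; probes < keys.length && keys[i] != null; probes++) {
            if (keys[i].equals(key)) {
                return values[i];
            }
            i = (i + 1) % keys.length;
        }
        return null;
    }
}
```

With a capacity of 8, "Aa" and "BB" both have home address 0, so the first one stored takes slot 0 and the second is pushed to slot 1.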

Re-hash method: Construct multiple different hash functions at the same time. When the hash address Hi=RH1 (key) conflicts, then calculate Hi=RH2 (key)... until conflicts no longer occur.

Chain address method: The basic idea of this method is to form a singly linked list, called a synonym chain, from all elements with hash address i, and to store the head pointer of the singly linked list in the i-th unit of the hash table, so search, insertion and deletion are mainly performed in synonym chains. The chain address method is suitable for situations where insertions and deletions are frequent.

Establish a public overflow area: Divide the hash table into two parts: the basic table and the overflow table. All elements that conflict with the basic table will be filled in the overflow table.
 


Origin blog.csdn.net/ddwangbin520/article/details/131531226