Thoroughly understand the underlying principles of HashMap

HashMap is probably the most frequently used collection class, and one of the most frequently asked about in interviews. Only by understanding every detail thoroughly can you walk into a big-company interview with confidence. So let's lift the veil on HashMap, step by step.

1. Start with questions

The best way to learn a topic is to learn with questions in mind. So let's first raise a few questions that come up often in interviews, and then analyze the principles behind HashMap one by one.

  • What is the underlying data structure of HashMap?
  • What is the difference between Java 7 and Java 8?
  • Why is HashMap not thread-safe?
  • Are there thread-safe classes to use instead?
  • What is the default initial capacity? Why that value? Why must the capacity be a power of 2?
  • How does HashMap resize? What is the load factor? Why is it 0.75?
  • How does HashMap handle hash collisions?
  • How is the hash value calculated?

2. The underlying data structure of HashMap

The underlying data structure of HashMap is mainly an array plus linked lists; JDK 8 additionally introduces the red-black tree. The structure looks like the following: [Figure: an array of buckets, each holding either a linked list or a red-black tree]

Why does HashMap use an array plus linked lists as its data structure? And why did JDK 8 introduce the red-black tree?

1. Why use an array?

We know that the advantage of an array is that an element can be located quickly by its index. HashMap computes an array index from the key's hashCode, so it can jump straight to the bucket where a node lives.

2. Why is a linked list needed?

In Java, hashCode is of type int, whose range is -2^31 ~ 2^31 - 1 (-2147483648 ~ 2147483647). A range that large cannot be used directly as an array index, so the hashCode is ANDed with (array length - 1) to get an index that falls inside the array. If two elements end up with the same index, both values must be stored at that position, because different values in the same array slot must not overwrite each other. Since a linked list is fast to insert into and delete from, the entries sharing an index are chained into a linked list.
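As a small sketch of that index computation (our own code, not the JDK's, assuming the table length n is a power of 2, which HashMap guarantees):

    int n = 16;                   // table length, always a power of 2
    int h = "someKey".hashCode(); // any int, positive or negative
    int index = h & (n - 1);      // a cheap, always-in-range substitute for h % n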

This combination of an array with linked lists keeps lookups fast while also keeping additions and deletions fast.
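The shape of a bucket entry can be pictured like this (modeled on JDK 8's HashMap.Node, but simplified; the real class also implements Map.Entry and defines equals, hashCode and so on):

    class Node<K, V> {
        final int hash;  // cached hash of the key
        final K key;
        V value;
        Node<K, V> next; // next node at the same array index: the chain

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }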

3. Why was the red-black tree introduced in JDK 8?

First, let's compare the performance of a linked list and a red-black tree:

  • Linked list: insertion complexity O(1), search complexity O(n)
  • Red-black tree: insertion complexity O(logn), search complexity O(logn)
  • When a HashMap bucket holds a linked list, inserting at the head of the list (as JDK 7 does) costs O(1). A short list barely affects lookup performance, but a long list degrades lookups badly.
  • In Java 8, once the array length and a chain's length reach certain thresholds, the chain is converted into a red-black tree, which improves lookup performance; the price is that every insertion must maintain the red-black tree structure, at O(log n). This is a deliberate trade-off between lookup and insertion cost, and a map is, after all, stored mainly to be searched.

4. When is a linked list converted to a red-black tree?

Many blog posts say that once a linked list's length reaches 8, it is converted into a red-black tree. That statement is not entirely correct. To be precise, the conversion happens only when the array length has reached 64 and the chain has grown past the treeify threshold of 8. The trigger sits in HashMap's putVal() method: when appending to a chain pushes it past TREEIFY_THRESHOLD, treeifyBin() is called. Most people see only that check and conclude that a chain longer than 8 always becomes a tree. But reading treeifyBin() itself shows that when the array length (tab.length) is less than MIN_TREEIFY_CAPACITY (64), it calls resize() to expand the table instead. Both pieces of code are shown, abridged, below.
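Abridged from the JDK 8 source (the ellipses and some comments are ours), the two checks look like this:

    // Inside putVal(): appending to a bucket's chain.
    for (int binCount = 0; ; ++binCount) {
        if ((e = p.next) == null) {
            p.next = newNode(hash, key, value, null);
            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                treeifyBin(tab, hash);             // chain passed the threshold
            break;
        }
        // ... duplicate-key check and traversal omitted
    }

    // treeifyBin(): only build a tree when the table itself is big enough.
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY) // 64
            resize(); // too small: expand the table instead of treeifying
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            // ... replace the chain at tab[index] with a red-black tree
        }
    }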

3. Initialization capacity and load factor

When we use HashMap, we habitually create it with new HashMap(). In that case the default capacity is DEFAULT_INITIAL_CAPACITY = 1 << 4, i.e. 16. So what does HashMap do if we pass in a capacity of 17 when creating it (i.e. new HashMap(17))?

3.1. Finding the smallest sufficient power of 2

HashMap's initialization includes this constructor:

    public HashMap(int initialCapacity, float loadFactor) {
        ...
        this.loadFactor = loadFactor; // load factor
        // Key point:
        this.threshold = tableSizeFor(initialCapacity);
    }
  • The threshold is calculated by the tableSizeFor() method from the initial capacity that was passed in.
  • tableSizeFor() finds the smallest power of 2 that is not less than the given value. For example, passing 17 yields 32.

The method that computes the threshold:

    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
  • MAXIMUM_CAPACITY = 1 << 30 is the upper bound: the largest capacity a HashMap table can reach.
  • At first glance, the right shifts by 1, 2, 4, 8 and 16 may look dizzying. Their job is to smear the highest 1-bit downward until every lower bit is also 1. A binary number whose low bits are all 1 is exactly one less than a power of 2, so adding 1 at the end returns the desired power of 2. (The initial cap - 1 ensures that a value that is already a power of 2 does not get doubled.)

We can trace the number 17 through the method step by step:
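The binary value after each step is shown in the comments:

    int n = 17 - 1;     // 16 = 10000 (binary)
    n |= n >>> 1;       // 24 = 11000
    n |= n >>> 2;       // 30 = 11110
    n |= n >>> 4;       // 31 = 11111
    n |= n >>> 8;       // 31, no change: all low bits are already 1
    n |= n >>> 16;      // 31, no change
    int result = n + 1; // 32 = 100000, the smallest power of 2 >= 17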

Why must the capacity be a power of 2? In short, when the length n is a power of 2, (n - 1) & hash selects a bucket just like a modulo would, but with a single AND instruction; for a fuller discussion, see: Why is HashMap's initial capacity a power of 2?

3.2. Load factor

static final float DEFAULT_LOAD_FACTOR = 0.75f;

The load factor is tied to resizing: when the number of elements in the HashMap reaches a certain threshold (capacity × load factor), the current table needs to be expanded.

So why is it set this way? As mentioned above, HashMap stores its data in an array whose slots hold linked lists or red-black trees. When we store data, the array index is computed from the hash value, so more than one element may land in the same slot. Whether that slot holds a linked list or a red-black tree, search, insertion and deletion all slow down as the number of elements in it grows. HashMap's job is hashing, so expanding the table increases the spread of the hashes, reduces the number of elements per linked list or red-black tree, and thereby restores performance.

  • Therefore a reasonable trigger point for expansion must be chosen. The default 0.75 means: once the map is 3/4 full, expand promptly to reduce hash collisions.
  • At the same time, 0.75 is only the default; it can be adjusted when the HashMap is created. For example, to trade space for time, set a smaller load factor so the table expands earlier and collisions become rarer, as sketched below.
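A small usage sketch (the values are illustrative, not a recommendation):

    import java.util.HashMap;
    import java.util.Map;

    public class LoadFactorDemo {
        public static void main(String[] args) {
            // Default: capacity 16, load factor 0.75, so the table resizes
            // when the 13th entry is added (threshold = 16 * 0.75 = 12).
            Map<String, Integer> byDefault = new HashMap<>();

            // Space-for-time: a lower load factor resizes earlier, spreading
            // entries over more buckets and reducing collisions.
            Map<String, Integer> sparse = new HashMap<>(16, 0.5f);

            byDefault.put("a", 1);
            sparse.put("a", 1);
        }
    }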

4. Hash value calculation rules

Let's first look at the source of HashMap's hash() computation:

    static final int hash(Object key) {
        int h;
        // The unsigned right shift by 16 bits lets the high bits take part
        // in the computation, reducing hash collisions.
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

We can see that the key part is (h = key.hashCode()) ^ (h >>> 16): the hash value is shifted unsigned-right by 16 bits, exactly half its width, and then XORed with the original, mixing the high and low bits of the hash and increasing the randomness of the low bits that actually select the bucket.
For example:
Suppose two keys have hashCodes 7C3B0000 and 1C3C0000 (hexadecimal), and the HashMap array length is 16. The two values are obviously different, yet both give 0 after the modulo step, a collision. If the high bits also participate in the calculation, the results diverge, as shown below:
[Figure: mixing in the high bits, 7C3B0000 ^ (7C3B0000 >>> 16) = 7C3B7C3B → index 11; 1C3C0000 ^ (1C3C0000 >>> 16) = 1C3C1C3C → index 12]
As the calculation shows, once the high bits participate, the indexes become 11 and 12 respectively, and the collision is gone.
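A runnable sketch (our own code, not the JDK's) that reproduces the example:

    public class HashMixDemo {
        // The same mixing step HashMap.hash() applies.
        static int mix(int h) {
            return h ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int n = 16; // table length from the example
            for (int h : new int[]{0x7C3B0000, 0x1C3C0000}) {
                System.out.printf("h=%08X  raw index=%d  mixed index=%d%n",
                        h, h & (n - 1), mix(h) & (n - 1));
            }
            // Prints raw index 0 for both, mixed indexes 11 and 12.
        }
    }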

5. Why HashMap is not thread-safe

The lack of thread safety shows up mainly in the following ways:

  1. In JDK 7, a concurrent resize can easily produce a circular chain, causing an infinite loop.
  2. Concurrent writes can cause data loss.
  3. The HashMap clearly contains a value, yet get returns null.

For details, please refer to: Why is HashMap not thread-safe?
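As a minimal sketch of the data-loss symptom (results vary from run to run; this is illustrative, not deterministic):

    import java.util.HashMap;
    import java.util.Map;

    public class UnsafeHashMapDemo {
        public static void main(String[] args) throws InterruptedException {
            Map<Integer, Integer> map = new HashMap<>();
            Runnable writer = () -> {
                for (int i = 0; i < 10_000; i++) map.put(i, i);
            };
            Thread t1 = new Thread(writer);
            Thread t2 = new Thread(writer);
            t1.start(); t2.start();
            t1.join(); t2.join();
            // Both threads wrote the same 10_000 keys, so a thread-safe map
            // would always end with size 10000; HashMap often ends smaller
            // (lost updates), and on JDK 7 a concurrent resize could even hang.
            System.out.println(map.size());
        }
    }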

6. Thread-safe alternatives

1. Hashtable
Hashtable is a thread-safe Map, but internally every method is synchronized, so all threads contend for a single mutex lock. Performance is low.

2. Collections.synchronizedMap
Collections provides the synchronizedMap() method to wrap a map in a thread-safe view; it too achieves mutual exclusion with synchronized internally. Performance is likewise low.

3. ConcurrentHashMap
ConcurrentHashMap also relies on locking, but at a much finer granularity (segment locks in JDK 7; CAS plus per-bucket synchronized in JDK 8), so its performance is far higher than the previous two.
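Side by side, a usage sketch (variable names are ours):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Hashtable;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class SafeMapsDemo {
        public static void main(String[] args) {
            // One lock for the whole table.
            Map<String, Integer> a = new Hashtable<>();
            // A synchronized wrapper around a plain HashMap; also one lock.
            Map<String, Integer> b = Collections.synchronizedMap(new HashMap<>());
            // Fine-grained locking; the usual choice for concurrent code.
            Map<String, Integer> c = new ConcurrentHashMap<>();
            a.put("k", 1); b.put("k", 1); c.put("k", 1);
        }
    }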

7. The differences between Java 7 and Java 8

The main differences between Java 7 and Java 8 (and later versions) are:

  1. Data structure: Java 7 uses array + linked list, while Java 8 uses array + linked list + red-black tree.
  2. Insertion method: JDK 1.7 uses head insertion, while JDK 1.8 and later use tail insertion. Why the change? JDK 1.7 grows a bucket only as a singly linked list, and head insertion during a concurrent resize tends to reverse the chain's order and can produce a circular list, hence an infinite loop. From JDK 1.8 on, partly because the red-black tree was added, tail insertion is used, which avoids the reversal and the infinite loop.
  3. After a resize, the new storage position is computed differently. JDK 1.7 recomputes the index from scratch by ANDing the hash with the new (capacity - 1). JDK 1.8 exploits the fact that the capacity doubles: it only checks whether the one extra hash bit (the bit whose value equals the old capacity) is 0 or 1. If it is 0, the entry stays at its old index; if it is 1, it moves to old index + old capacity. The result is the same as JDK 1.7's recomputation, just obtained faster, as sketched below.
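A sketch of the JDK 8 relocation rule (variable names and the sample values are ours):

    public class ResizeIndexDemo {
        public static void main(String[] args) {
            int oldCap = 16;                    // capacity before the resize
            int hash = 0x1C3C1C3C;              // a sample (already mixed) hash
            int oldIndex = hash & (oldCap - 1); // 12 in the old table
            int newIndex = (hash & oldCap) == 0
                    ? oldIndex                  // extra bit is 0: index unchanged
                    : oldIndex + oldCap;        // extra bit is 1: old index + oldCap
            System.out.println(oldIndex + " -> " + newIndex); // prints 12 -> 28
        }
    }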

Reference:
https://blog.csdn.net/qq_36520235/article/details/82417949
https://aobing.blog.csdn.net/article/details/103467732
https://bugstack.blog.csdn.net/article/details/107903915
