(1) HashMap design and problems


Java's HashMap and its optimizations come up in interviews all the time. Interviews exist to select candidates, so the questions that require step-by-step reasoning tend to be the hard ones, and they are where candidates separate from one another. This post reasons through those key points so that the long-standing HashMap questions stop being difficult.

How is HashMap implemented?

The basic principle is simple:

  • A hash table built as an array of buckets, where each bucket is a singly linked list
  • The table array maps a key's hash to its bucket

The classic diagram:

hashmap1.png
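
The layout in the diagram can be sketched in a few lines. This is a minimal illustration, not the JDK source: `BucketLayout`, `Node` and `chainLength` are hypothetical names, and `indexFor` assumes the table length is a power of two.

```java
// Illustrative sketch only; class and method names are hypothetical, not JDK source.
class BucketLayout {
    /** One node of a bucket's singly linked chain. */
    static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next; // next node in the same bucket, or null

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    /** Maps a hash to a bucket index; assumes length is a power of two. */
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    /** Counts the nodes chained in one bucket. */
    static int chainLength(Node<?, ?> head) {
        int n = 0;
        for (Node<?, ?> e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }
}
```

With 16 buckets, hashes 17 and 33 both land in bucket 1, which is exactly the collision case that the linked list exists to handle.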

put process

  • put first computes the hash of the key and locates the bucket
  • If the key already exists, its value is overwritten
  • If the key is absent, addEntry is called
  • addEntry first decides whether the table must be expanded
Deciding whether to expand

  • Expansion is triggered by the load factor, once the size reaches capacity × load factor
  • The number of buckets is doubled
  • Existing entries are rehashed into the new table

put into an occupied bucket

  • The new Entry becomes the head of the bucket
  • The original linked list is attached as its next
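
The put steps above can be sketched as a small map. This is a simplified illustration under stated assumptions, not the JDK code: `HeadInsertMap` and `bucketKeys` are hypothetical names, keys are restricted to `Integer` so bucket positions are easy to predict, null keys are not handled, and the extra hash-spreading step of the real implementation is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a Java 7-style put path; not the JDK source.
class HeadInsertMap {
    static final class Entry {
        final Integer key;
        String value;
        Entry next;

        Entry(Integer key, String value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    static final float LOAD_FACTOR = 0.75f;
    Entry[] table = new Entry[4]; // capacity stays a power of two
    int size;
    int threshold = (int) (table.length * LOAD_FACTOR);

    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    /** Returns the previous value, or null if the key was absent. */
    String put(Integer key, String value) {
        int i = indexFor(key.hashCode(), table.length);
        for (Entry e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) { // key exists: overwrite
                String old = e.value;
                e.value = value;
                return old;
            }
        }
        addEntry(key, value);
        return null;
    }

    void addEntry(Integer key, String value) {
        if (size >= threshold) { // the load-factor check comes first
            resize(table.length * 2);
        }
        int i = indexFor(key.hashCode(), table.length);
        table[i] = new Entry(key, value, table[i]); // new entry becomes the head
        size++;
    }

    void resize(int newCapacity) {
        Entry[] newTable = new Entry[newCapacity];
        for (Entry head : table) { // rehash every existing entry
            for (Entry e = head; e != null; ) {
                Entry next = e.next;
                int i = indexFor(e.key.hashCode(), newCapacity);
                e.next = newTable[i]; // head insertion again
                newTable[i] = e;
                e = next;
            }
        }
        table = newTable;
        threshold = (int) (newCapacity * LOAD_FACTOR);
    }

    /** Keys of one bucket, from head to tail. */
    List<Integer> bucketKeys(int index) {
        List<Integer> keys = new ArrayList<>();
        for (Entry e = table[index]; e != null; e = e.next) {
            keys.add(e.key);
        }
        return keys;
    }
}
```

Inserting keys 0, 4 and 8 into the initial four buckets leaves bucket 0 holding 8, 4, 0 in that order, newest first. That ordering is the head insertion the later sections depend on.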

null key handling

Both put and get handle a null key as a special case: the hash of null is treated as 0.
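
A minimal sketch of that rule follows. `NullKeyHash`, `hashOf` and `roundTripNullKey` are hypothetical names used for illustration; only `java.util.HashMap` itself is real, and it does accept one null key.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; hashOf is a hypothetical helper, not a JDK method.
class NullKeyHash {
    /** Treats null as hash 0, so a null key always maps to bucket 0. */
    static int hashOf(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    /** Maps a hash to a bucket index; assumes length is a power of two. */
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    /** Stores and reads back a null key through the real java.util.HashMap. */
    static Integer roundTripNullKey() {
        Map<String, Integer> map = new HashMap<>();
        map.put(null, 1);
        map.put(null, 2); // overwrites the previous null-key mapping
        return map.get(null);
    }
}
```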

get process

get first computes the hash of the key, locates the bucket, and then traverses the singly linked list until it finds an equal key. If the traversal reaches the end of the list without a match, the key does not exist and null is returned.

  • get first computes the hash of the key
  • It locates the bucket and walks the chain comparing keys
  • It returns as soon as an equal key is found
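
The three steps can be sketched as follows. This is an illustration with hypothetical names (`ChainLookup`, `Entry`), restricted to `String` keys and values, and it assumes the table length is a power of two.

```java
// Illustrative sketch of the lookup path; ChainLookup is a hypothetical class, not JDK source.
class ChainLookup {
    static final class Entry {
        final String key;
        final String value;
        final Entry next;

        Entry(String key, String value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    /** Returns the value mapped to key, or null if the chain holds no equal key. */
    static String get(Entry[] table, String key) {
        int hash = (key == null) ? 0 : key.hashCode(); // step 1: hash the key
        int index = hash & (table.length - 1);         // step 2: locate the bucket
        for (Entry e = table[index]; e != null; e = e.next) {
            boolean sameKey = (key == null) ? e.key == null : key.equals(e.key);
            if (sameKey) {
                return e.value;                        // step 3: equal key found
            }
        }
        return null;                                   // end of chain: key is absent
    }
}
```

Note that the loop ends only when it finds the key or reaches a null next pointer. That detail matters for the infinite-loop problem discussed later.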

Red-black trees optimize lookup efficiency

  • Java 1.8 adds red-black trees
  • Lookups stay efficient even when a bucket's linked list grows long
  • Lookup cost is O(log N)
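
To put rough numbers on the difference, here is a back-of-the-envelope sketch. `LookupCost` and its methods are hypothetical names. The tree figure is the height of a perfectly balanced binary tree; a red-black tree is not perfectly balanced, but its height is bounded by about twice that, so it remains O(log N).

```java
// Back-of-the-envelope comparison; LookupCost and its methods are hypothetical names.
class LookupCost {
    /** Worst case for a linked bucket: every one of the n nodes is compared. */
    static int chainWorstCase(int n) {
        return n;
    }

    /** Height of a perfectly balanced binary tree with n >= 1 nodes: floor(log2 n) + 1. */
    static int balancedTreeWorstCase(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }
}
```

For a bucket holding 1024 colliding keys, the chain may need 1024 comparisons, while a balanced tree needs 11.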

What is the problem? An infinite loop

Used from a single thread, HashMap causes no trouble. Once several threads modify it concurrently, you may see CPU usage climb to 100% and stay there. A stack dump then shows, surprisingly, threads stuck in HashMap's get method, and the problem disappears after the service is restarted.

Why is that?

That is the question this section reasons through.

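As described above, HashMap uses an Entry array to store key-value pairs. When hashes collide, the entries are chained together in a linked list. On expansion, a larger array is created and the transfer method moves the elements over. The move logic is straightforward: walk the linked list of every bucket in the old table, rehash each element, and place it in its slot in newTable.

This is part of the Java 1.7 implementation: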

   void resize(int newCapacity) { // the new capacity is passed in
        Entry[] oldTable = table;  // reference to the Entry array before expansion
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) { // the old array has already reached the maximum size (2^30)
            threshold = Integer.MAX_VALUE;  // raise the threshold to the int maximum (2^31-1) so no further expansion happens
            return;
        }

        Entry[] newTable = new Entry[newCapacity];  // allocate a new Entry array
        transfer(newTable); // move every element of the old table into newTable
        table = newTable;  // then point table at newTable
        threshold = (int)(newCapacity * loadFactor); // update the threshold
    }

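So why does this implementation go wrong?

Suppose thread 1 and thread 2 both trigger an expansion during put. At this moment, thread 1's local variables e and next have the following values.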

hashmap2.png

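If the CPU time slice now switches to thread 2 and thread 2 completes its move, and we assume a, b and c still hash to the same bucket, the result is as follows.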

hashmap3.png

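a, b and c are still in the same bucket, but because transfer uses head insertion, these entries now appear in the linked list in the reverse of their original order.

When execution switches back to thread 1, thread 1's newTable ends up containing a cycle, as shown below.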

hashmap4.png

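In thread 1, e is a and next is b, but thread 2's move has already made b's next point to a. Thread 1 therefore places a, then b, then reads b's next and reaches a a second time, and inserting a at the head again sets a's next to b. The linked list of this bucket in thread 1 now contains a cycle in which a and b point to each other.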

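Suppose thread 1's expansion eventually succeeds and thread 1's newTable is assigned to table. A later get whose hash hits this bucket will loop forever if the key being looked up is neither a nor b.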
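
The interleaving can be replayed deterministically in a single thread, which makes the cycle easy to inspect. The sketch below is an illustration under the same assumption as the text (a, b and c stay in one bucket, so a single head reference stands in for the bucket). `ResizeCycleDemo`, `transfer`, `buildCycle` and `hasCycle` are hypothetical names, not the JDK code.

```java
// Deterministic, single-threaded replay of the interleaving; not JDK source.
class ResizeCycleDemo {
    static final class Entry {
        final String key;
        Entry next;

        Entry(String key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    /** Head-insertion move of the chain starting at e onto newHead; returns the new bucket head. */
    static Entry transfer(Entry e, Entry newHead) {
        while (e != null) {
            Entry next = e.next;
            e.next = newHead; // the moved entry points at the current head
            newHead = e;      // and becomes the new head
            e = next;
        }
        return newHead;
    }

    /** Replays the schedule from the text and returns the head of thread 1's bucket. */
    static Entry buildCycle() {
        Entry c = new Entry("c", null);
        Entry b = new Entry("b", c);
        Entry a = new Entry("a", b); // old bucket: a -> b -> c

        // Thread 1 reads e = a and next = b, then is suspended.
        Entry e1 = a;
        Entry next1 = e1.next;

        // Thread 2 runs its whole move; its bucket becomes c -> b -> a.
        transfer(a, null);

        // Thread 1 resumes and finishes the interrupted iteration with its stale next.
        e1.next = null;   // thread 1's new bucket is still empty
        Entry head1 = e1; // bucket: a
        // It continues from the stale next (b), whose next is now a.
        return transfer(next1, head1);
    }

    /** Floyd's cycle detection, so the check itself cannot loop forever. */
    static boolean hasCycle(Entry head) {
        Entry slow = head;
        Entry fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }
}
```

After `buildCycle()`, the bucket head is a, a's next is b, and b's next is a again. c is no longer reachable from thread 1's table, and any traversal that waits for a null next pointer never terminates.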

To sum up, the infinite loop drives the CPU to 100%. It is a flaw in how HashMap expansion is implemented. In short:

  • Moving entries concurrently is risky
  • A cycle appears in the new bucket's chain
  • Looking up a key that is not in the chain loops forever

Solving the problem - the 1.8 implementation

The root cause is that every newly inserted element is placed at the head of the linked list, while the move walks each list from its head, so the order is reversed on every move and a cycle can form. Java 1.8 changes this: each newly inserted element is placed at the tail of the linked list.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null); // tail insertion
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

p.next = newNode(hash, key, value, null); // tail insertion
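
To see why the insertion end matters, the sketch below builds the same three keys into a chain both ways. It is an illustration only: `TailInsertDemo`, `insertAtHead`, `insertAtTail` and `keys` are hypothetical names, not JDK methods.

```java
// Illustrative comparison of head and tail insertion; not JDK source.
class TailInsertDemo {
    static final class Node {
        final String key;
        Node next;

        Node(String key) {
            this.key = key;
        }
    }

    /** Java 7 style: the new node becomes the head, so the order is reversed. */
    static Node insertAtHead(Node head, String key) {
        Node node = new Node(key);
        node.next = head;
        return node;
    }

    /** Java 8 style: walk to the last node and append, so the order is preserved. */
    static Node insertAtTail(Node head, String key) {
        Node node = new Node(key);
        if (head == null) {
            return node;
        }
        Node p = head;
        while (p.next != null) {
            p = p.next;
        }
        p.next = node;
        return head;
    }

    /** Renders a chain as "a->b->c". */
    static String keys(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append("->");
            }
            sb.append(e.key);
        }
        return sb.toString();
    }
}
```

Inserting a, b, c at the head yields c->b->a, while inserting at the tail yields a->b->c. Because tail insertion never reverses the order, the stale-pointer scenario described earlier can no longer link two nodes back to each other.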


Origin juejin.im/post/7086484021873475615