HashMap Internals and Interview Answers

Characteristics: insertion, deletion, and lookup all run in O(1) time on average.

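Keys are unique and at most one key may be null; values may be null. Resizing: the default initial capacity is 16 (a power of two), the load factor is 0.75, and each resize doubles the capacity.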
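The null-key behaviour and the resize arithmetic above can be checked with a few lines of Java. This is a minimal sketch: the class name ResizeSketch and the helpers threshold and capacityAfter are made up for illustration and are not part of the JDK; only java.util.HashMap itself is real.

```java
import java.util.HashMap;
import java.util.Map;

public class ResizeSketch {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Hypothetical helper: number of entries a table of this capacity
    // holds before a resize is triggered (capacity * load factor).
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Hypothetical helper: capacity after the given number of doublings.
    static int capacityAfter(int resizes) {
        return DEFAULT_INITIAL_CAPACITY << resizes;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // same (null) key: the value is overwritten
        map.put("k", null);      // null values are allowed
        System.out.println(map.size());    // 2
        System.out.println(map.get(null)); // second

        for (int resizes = 0; resizes < 3; resizes++) {
            int capacity = capacityAfter(resizes);
            System.out.println("capacity " + capacity + ", resize threshold "
                    + threshold(capacity, DEFAULT_LOAD_FACTOR));
        }
    }
}
```

With the default settings the thresholds come out as 12, 24, and 48 for capacities 16, 32, and 64.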

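Concept: perturbation function (the hash function). HashMap runs each key's hashCode() through its own hash function so that a poorly distributed hashCode() does less harm, which reduces hash collisions.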

JDK1.7

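Data structure: array plus singly linked list. Not thread-safe; a concurrent resize can leave a cycle in a bucket's list and cause an infinite loop.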

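Array-plus-linked-list structure: the table is an array of Entry, an inner class of HashMap. To compute the array index, HashMap applies its own hash() function (a series of shifts and XORs over the key's hashCode) and then reduces the result to an index with h & (length - 1), which is equivalent to a modulo because the table length is always a power of two. The benefit of the custom hash function is fewer hash collisions.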
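To see why a bit mask can stand in for a modulo when the table length is a power of two, here is a small sketch. The class name IndexForSketch is made up; indexFor mirrors the shape of the JDK 1.7 method of the same name but is reimplemented here for illustration.

```java
public class IndexForSketch {
    // Same shape as JDK 1.7's indexFor; only valid when length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int[] hashes = {0, 5, 17, 1023, -1, Integer.MIN_VALUE, 123456789};
        for (int h : hashes) {
            int masked = indexFor(h, length);
            int modded = Math.floorMod(h, length); // non-negative remainder
            System.out.println(h + " -> mask " + masked + ", floorMod " + modded);
        }
    }
}
```

The mask and Math.floorMod agree for every input, including negative hashes, while the mask avoids a division.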

The code is as follows. The key's hashCode value goes through 9 perturbation steps = 4 unsigned right shifts + 5 XORs.

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

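Linked lists use head insertion. During a resize under multithreaded concurrency, transfer() walks each old list and re-links every node at the head of its new bucket. If one thread is suspended partway through while another thread runs to completion, the two threads can end up with nodes whose next pointers point at each other, forming a circular linked list. A later get() on that bucket then loops forever.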

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
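            // If one thread is suspended right after the next line while another thread
            // finishes the resize, the circular linked list described above can form.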
            Entry<K,V> next = e.next; 
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
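The interleaving described above can be replayed deterministically in a single thread. The sketch below is not JDK code: HeadInsertCycleDemo, Node, transferFrom, simulateInterleaving, and hasCycle are hypothetical names, and the table is reduced to one bucket holding A -> B, under the assumption that both nodes land in the same bucket of the new table.

```java
public class HeadInsertCycleDemo {
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Head-inserts the chain starting at e into a fresh bucket and returns the
    // new head. 'next' is the successor the caller has already read for e,
    // which models a thread suspended right after `next = e.next`.
    static Node transferFrom(Node e, Node next) {
        Node newHead = null;
        while (e != null) {
            e.next = newHead;              // e.next = newTable[i]
            newHead = e;                   // newTable[i] = e
            e = next;                      // e = next
            if (e != null) next = e.next;  // start of the next iteration
        }
        return newHead;
    }

    static Node simulateInterleaving() {
        Node b = new Node("B", null);
        Node a = new Node("A", b);
        // Thread 1 reads e = A, next = B, then is suspended.
        Node t1e = a, t1next = a.next;
        // Thread 2 runs the whole transfer: the bucket becomes B -> A.
        transferFrom(a, a.next);
        // Thread 1 resumes with its stale locals and repeats the transfer.
        return transferFrom(t1e, t1next);
    }

    // Floyd's tortoise-and-hare cycle check.
    static boolean hasCycle(Node head) {
        Node slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasCycle(simulateInterleaving())); // true
    }
}
```

After thread 1 finishes, A.next is B and B.next is A, so any traversal of that bucket never reaches null.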

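Besides the infinite loop caused by head insertion, the linked-list structure has another serious problem: when many hash collisions occur, a single bucket ends up holding a long chain of entries, and because a singly linked list must be searched sequentially, lookups in that bucket degrade toward O(n).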

JDK1.8

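Data structure: array plus linked list plus red-black tree. Still not thread-safe, but tail insertion means a resize no longer produces the infinite loop.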

The key's hashCode value goes through 2 perturbation steps = 1 unsigned right shift + 1 XOR.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
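A short sketch of what the XOR with the upper 16 bits buys. The class name SpreadSketch and the sample hash codes are made up for illustration; spread repeats the expression from hash() above, and index is the usual h & (length - 1) mask.

```java
public class SpreadSketch {
    // Same expression as the JDK 1.8 hash() above, applied to a raw hashCode.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int index(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        // Hash codes that differ only in their upper 16 bits.
        int[] hashCodes = {0x10000, 0x20000, 0x30000};
        for (int h : hashCodes) {
            System.out.println("0x" + Integer.toHexString(h)
                    + ": raw index " + index(h, length)
                    + ", spread index " + index(spread(h), length));
        }
    }
}
```

Without spreading, all three hash codes fall into bucket 0 of a 16-slot table, because the mask only looks at the low 4 bits; after spreading they land in buckets 1, 2, and 3.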

Linked lists use tail insertion, which avoids the infinite loop, but under multithreading some entries may still be lost.

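The threshold for converting a linked list into a red-black tree is 8 (treeifyBin only builds a tree once the table has at least MIN_TREEIFY_CAPACITY = 64 slots; below that it resizes the table instead):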

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

...

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
  ...
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash); // convert the bin to a red-black tree
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        ...
}

Question: why does a bin revert to a linked list at 6 nodes but convert to a red-black tree at 8?

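First, a red-black tree is not always faster to search than a linked list; trees pay off only when a bin holds many nodes. The thresholds 6 and 8 are chosen (a tree bin that shrinks to 6 or fewer nodes is converted back to a list, and a list that grows to 8 or more nodes becomes a tree) so that the gap at 7 keeps a bin from flipping back and forth between list and tree. In addition, with well-distributed hash codes the number of nodes per bucket follows a Poisson distribution, so the probability of a bin reaching length 8 is extremely small.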
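To put a number on "extremely small", the Poisson probabilities can be computed directly. This sketch assumes a Poisson parameter of 0.5, the average that the implementation notes in the JDK HashMap source give for the default load factor of 0.75; the class name BinSizePoisson and the method poisson are made up for illustration.

```java
public class BinSizePoisson {
    // P(X = k) for a Poisson distribution: exp(-lambda) * lambda^k / k!
    static double poisson(double lambda, int k) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) {
            p *= lambda / i;
        }
        return p;
    }

    public static void main(String[] args) {
        double lambda = 0.5; // assumed average, see the note above
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(lambda, k));
        }
    }
}
```

Under that assumption, the probability of a bin holding exactly 8 nodes is about 0.00000006, roughly 6 in 100 million.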

浮云浮影
