The Road of Exploration: HashMap

HashMap is essential knowledge for mid-level and senior developers. It shows up everywhere, both in job interviews and in day-to-day work.

HashMap's Data Structure

First, let's review two basic data structures in Java: arrays and linked lists.

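Array: an ordered sequence of elements, made up of indices and elements, with each index mapping to one element. Access by index is fast, with O(1) time complexity. Searching for a given value requires comparing elements one by one, which is O(n); if the array is sorted, a method such as binary search brings this down to O(log n). Deleting or inserting an element, however, requires shifting the elements that follow it, which is O(n).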
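The costs listed above can be made concrete with a small sketch. The class name ArrayLookupDemo and the helper methods are hypothetical, written only for illustration; Arrays.binarySearch is the standard library method.

```java
import java.util.Arrays;

class ArrayLookupDemo {
    // O(n): compare each element in turn until the target is found.
    static int linearSearch(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) return i;
        }
        return -1;
    }

    // O(n): every element after the removed index is copied one slot to the left.
    static int[] removeAt(int[] a, int index) {
        int[] result = new int[a.length - 1];
        System.arraycopy(a, 0, result, 0, index);
        System.arraycopy(a, index + 1, result, index, a.length - index - 1);
        return result;
    }

    public static void main(String[] args) {
        int[] sorted = {3, 8, 15, 23, 42};
        System.out.println(sorted[2]);                        // O(1) access by index
        System.out.println(linearSearch(sorted, 23));         // O(n) search by value
        System.out.println(Arrays.binarySearch(sorted, 23));  // O(log n), requires a sorted array
        System.out.println(Arrays.toString(removeAt(sorted, 1)));
    }
}
```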

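Linked list: a storage structure whose physical storage units are non-contiguous and non-sequential; the logical order of elements is established by the pointers that link the nodes. Finding an element requires traversing the list, so lookup is slow, at O(n). Insertion and deletion are fast, though: once the node's position is known, deleting it only requires redirecting a pointer to the next node, which is O(1).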

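HashMap is mainly an array plus linked lists. This structure is designed to handle collisions, where different keys are hashed to the same location. The common ways to resolve hash collisions are open addressing, separate chaining, rehashing, and a shared overflow area. HashMap uses separate chaining: all records that map to the same hash address are linked into the same list.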
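To illustrate separate chaining in isolation, here is a minimal sketch of a chained hash table. It is not the JDK implementation: the class ChainedMapSketch, its fixed capacity, and the use of Math.floorMod for indexing are all assumptions made to keep the example short.

```java
import java.util.Objects;

class ChainedMapSketch<K, V> {
    // One singly linked node per key-value pair.
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedMapSketch(int capacity) {
        buckets = (Node<K, V>[]) new Node[capacity];
    }

    // Assumption: plain floor-modulo indexing, not the JDK's bit mask.
    private int indexOf(Object key) {
        return Math.floorMod(Objects.hashCode(key), buckets.length);
    }

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int i = indexOf(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        // Colliding keys are chained: the new node becomes the head of the bucket's list.
        buckets[i] = new Node<>(key, value, buckets[i]);
        size++;
        return null;
    }

    V get(Object key) {
        for (Node<K, V> n = buckets[indexOf(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        // Capacity 1 forces every key into the same bucket, so both keys share one chain.
        ChainedMapSketch<String, Integer> map = new ChainedMapSketch<>(1);
        map.put("x", 1);
        map.put("y", 2);
        System.out.println(map.get("x") + " " + map.get("y") + " " + map.size());
    }
}
```

With a capacity of 1 every key collides, yet both values remain retrievable because lookups walk the chain and compare keys with equals.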

How HashMap Works

The backbone of HashMap is Entry[]; the contents of the map are all stored in Entry objects.

static class Entry<K,V> implements Map.Entry<K,V> {
      final K key;// the key of the key-value pair
      V value;// the stored value
      Entry<K,V> next;// reference to the next node in the chain
      final int hash;// the hash value
      /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        } 
}

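Note: JDK 1.8 optimized HashMap. Entry was renamed to Node, which is still an ordinary linked-list node; in addition, when the chain in a bucket grows too long it is converted into a red-black tree (a kind of self-balancing binary search tree) built from TreeNode objects. The source of Node is as follows: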

/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

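[Figure: HashMap structure diagram]
HashMap is based on hashing; the figure above shows its structure. The array is the main body of the HashMap, and the linked lists exist to resolve collisions. If the located array slot has no chain, that is, next is null, then lookup and insertion are fast: O(1), with a single addressing step. If the slot does hold a chain, linking in a new node is still O(1), because in JDK 1.7 the newest Entry is inserted at the head of the list and only a reference changes (JDK 1.8 appends at the tail instead); note that put first walks the chain to check whether the key is already present. For lookup, the chain has to be traversed and each key compared with the key object's equals method, so the time complexity is O(n) in the length of the chain.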

Now let's look at several of HashMap's default values.

[Figure: HashMap default values]
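For reference, the defaults are declared as constants in the HashMap source. The sketch below mirrors the values as they appear in JDK 1.8's java.util.HashMap. The class name HashMapDefaults and the defaultThreshold helper are hypothetical; they exist only because the real constants are not accessible from application code.

```java
class HashMapDefaults {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;  // 16 buckets by default
    static final int MAXIMUM_CAPACITY = 1 << 30;         // upper bound for the table length
    static final float DEFAULT_LOAD_FACTOR = 0.75f;      // resize once size exceeds capacity * 0.75
    static final int TREEIFY_THRESHOLD = 8;              // bin size at which a list may become a tree
    static final int UNTREEIFY_THRESHOLD = 6;            // bin size at which a tree reverts to a list
    static final int MIN_TREEIFY_CAPACITY = 64;          // smallest table length that allows treeification

    // Hypothetical helper: the resize threshold of a map created with all defaults.
    static int defaultThreshold() {
        return (int) (DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
    }

    public static void main(String[] args) {
        System.out.println(DEFAULT_INITIAL_CAPACITY);
        System.out.println(defaultThreshold());
    }
}
```

With the defaults, the first resize is triggered when the 13th entry is added, since the threshold is 16 * 0.75 = 12.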

HashMap's constructors (the following is taken from JDK 1.8):

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity); // new in 1.8; 1.7 uses the two statements below instead
    /*threshold = initialCapacity;
        // init() has no real implementation in HashMap itself, but subclasses such as LinkedHashMap provide one
        init();
        */
}
/**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param  initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

/**
 * Constructs a new <tt>HashMap</tt> with the same mappings as the
 * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
 * default load factor (0.75) and an initial capacity sufficient to
 * hold the mappings in the specified <tt>Map</tt>.
 *
 * @param   m the map whose mappings are to be placed in this map
 * @throws  NullPointerException if the specified map is null
 */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

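As the first constructor above (A) shows, memory for the table array is not allocated in the constructor. The fourth constructor (D), which takes a Map object, is the exception: it does allocate space when the given map is non-empty, because it inserts the entries right away. In the other cases the space is allocated in the put method.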
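To see what tableSizeFor does with a requested capacity, here is a small sketch that copies the method into a standalone class. The class name TableSizeForDemo and the local MAXIMUM_CAPACITY constant are additions of mine so that the snippet compiles on its own.

```java
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same body as the JDK 1.8 method quoted above: smears the highest set bit
    // of (cap - 1) into every lower position, then adds 1.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[]{0, 1, 10, 16, 17}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

A requested capacity of 10 is rounded up to 16, while 16 stays at 16; subtracting 1 first is what keeps an exact power of two from being doubled.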

The following is the JDK 1.7 put method:

public V put(K key, V value) {
    // If table is still the empty array {}, inflate it (allocate the actual memory for table). The argument is threshold, which at this point holds initialCapacity; the default is 1 << 4 (2^4 = 16)
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // A null key is stored in table[0] or on the collision chain of table[0]
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);// further mix the key's hashCode so that entries spread evenly
    int i = indexFor(hash, table.length);// compute the actual index in table
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        // If the key already exists, overwrite: replace the old value with the new one and return the old value
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;// counts structural modifications so that iterators can fail fast if the map changes during iteration
    addEntry(hash, key, value, i);// add a new entry
    return null;
}
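The return value and the null-key handling described in the comments above can be observed from outside through the public java.util.HashMap API. A small sketch (the class name PutSemanticsDemo is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

class PutSemanticsDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        // First insertion of a key: there is no previous mapping, so put returns null.
        System.out.println(map.put("a", 1));

        // Same key again: the value is overwritten and the old value is returned.
        System.out.println(map.put("a", 2));

        // A null key is allowed and behaves like any other key.
        System.out.println(map.put(null, 0));
        System.out.println(map.get(null));

        System.out.println(map.size());
    }
}
```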

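Below is the JDK 1.8 source, which differs considerably from 1.7. In 1.7, when a rehash migrates an old chain to the new table, entries that land on the same new array index end up in reversed order, because they are inserted at the head. In 1.8, the relative order is preserved when bins are split during a resize, and long chains are converted into red-black trees.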

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
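    if ((tab = table) == null || (n = tab.length) == 0)// if the table is null or empty, initialize it via resize()
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)// compute array index i from the key's hash; if that bucket is empty, create a new Node there
        tab[i] = newNode(hash, key, value, null);
    else {// otherwise the bucket is already occupied
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))// same hash and equal key: the first node matches, and its value is overwritten further down
            e = p;
        else if (p instanceof TreeNode)// if the bucket holds a red-black tree, insert into the tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {// otherwise walk the list; if the key is not found, append at the tail, and when the bin already held at least TREEIFY_THRESHOLD (8) nodes, call treeifyBin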
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;// counts structural modifications so that iterators can fail fast if the map changes during iteration
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
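The index expression (n - 1) & hash in putVal operates on the result of hash(key), which in JDK 1.8 XORs the upper 16 bits of hashCode into the lower 16 bits. The sketch below reproduces that step in a standalone class so its effect on the bucket index can be seen; the names BucketIndexDemo, spread, and indexFor are hypothetical.

```java
class BucketIndexDemo {
    // Same expression as JDK 1.8's HashMap.hash(Object): fold the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        // These Integer keys differ only in bits above the low 16,
        // so masking the raw hashCode with 15 would put all of them in bucket 0.
        for (Integer key : new Integer[]{0, 65536, 131072}) {
            int raw = key.hashCode() & 15;
            System.out.println(key + ": raw=" + raw + ", spread=" + indexFor(key, 16));
        }
    }
}
```

Without the spreading step, keys whose hash codes differ only in their high bits would always collide in a small table, since the mask keeps only the low bits.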

HashMap's Array Length

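The length is always a power of two. Both the inflateTable method in 1.7 and the resize method in 1.8 achieve this with bit-shift operations.

In addition, threshold is set to the smaller of capacity * loadFactor and MAXIMUM_CAPACITY + 1. Since capacity never exceeds MAXIMUM_CAPACITY, capacity * loadFactor can exceed it only when loadFactor is greater than 1.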

The following is the JDK 1.7 source:

private void inflateTable(int toSize) {
   int capacity = roundUpToPowerOf2(toSize);// capacity is always a power of two
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}

private static int roundUpToPowerOf2(int number) {
        // assert number >= 0 : "number must be non-negative";
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

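This shows that HashMap uses bit shifts when sizing the table. Integer.highestOneBit returns the value of the highest set bit, so the result is always a power of two. Take 5 as an example. number - 1 is 4, which in binary is:

0000 0100

Shifting left by one bit gives:

0000 1000

Taking the highest one bit yields 2 to the 3rd power, that is, 8.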
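The rounding can be checked for a few inputs with a standalone copy of the method. The class name RoundUpDemo and the local MAXIMUM_CAPACITY constant are assumptions added so the snippet runs by itself; Integer.highestOneBit is the standard library method.

```java
class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same expression as the JDK 1.7 method quoted above.
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        for (int number : new int[]{0, 1, 2, 5, 16, 17}) {
            System.out.println(number + " -> " + roundUpToPowerOf2(number)
                    + " (binary " + Integer.toBinaryString(roundUpToPowerOf2(number)) + ")");
        }
    }
}
```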

So why must the capacity of a HashMap be a power of two?

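The goal is to distribute HashMap's elements more evenly. In the ideal case every slot of the Entry array holds exactly one element, that is, next is null and there is no linked list. Lookups are then efficient: there is no chain to traverse, and no need to compare keys with equals. The usual way to get an even distribution is % (modulo): hash % capacity = bucketIndex. The approach taken by the engineers at Sun can be seen in the following code: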

JDK1.7

/**
 * Returns index for hash code h. 
 */
static int indexFor(int h, int length) {
    return h & (length-1);// h is the hash value ultimately derived from K's hashCode, not the hashCode itself; length is the current capacity.
}

JDK1.8

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

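As this source shows, both 1.7 and 1.8 rely on bit operations once again; in 1.8 the index itself is computed inline in putVal as (n - 1) & hash, while tableSizeFor guarantees that the capacity is a power of two. h is the hash value ultimately derived from K's hashCode, not the hashCode itself, and length is the current capacity. When the capacity is 2^n, h & (length - 1) == h % length for non-negative h. The two expressions give the same result but not at the same cost: bit operations are what the machine computes natively, so they are very efficient. They are less natural for people, who think in decimal, and can be hard to follow without some practice. First, a quick review of binary arithmetic: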

In binary, 2^n is a 1 followed by n zeros; subtracting 1 gives a 0 followed by n ones. For example, 16 = 2^4 = 10000, and 15 = 16 - 1 = 2^4 - 1 = 01111.

The & operation: a result bit is 1 only when both input bits are 1; otherwise it is 0.

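Back to HashMap: if length is 16, the result of h & (16 - 1) is always greater than or equal to 0 and less than or equal to 15. If h <= 15, ANDing with 01111 yields h itself. If h > 15, the result depends only on the low four bits of h, and that result is exactly h % length.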

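Because of how & works, a bit ANDed with 1 yields that bit unchanged, while a bit ANDed with 0 always yields 0. In terms of probability, a mask bit of 1 lets the result be 0 or 1 with a 50% chance each, whereas a mask bit of 0 produces 0 every time. That is why length - 1 should have the form 2^n - 1, all ones in its low bits, which means a capacity that is a power of two is the best fit.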
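The argument can be checked numerically. The following sketch counts how many buckets h & (length - 1) can actually reach for different table lengths, and compares the mask against a modulo. The class and method names are hypothetical, and Math.floorMod is used for the comparison so that negative hashes are covered as well.

```java
class MaskVersusModDemo {
    // Counts the distinct indices produced by h & (length - 1) for h in 0..1023.
    static int reachableBuckets(int length) {
        boolean[] used = new boolean[length];
        for (int h = 0; h < 1024; h++) {
            used[h & (length - 1)] = true;
        }
        int count = 0;
        for (boolean b : used) {
            if (b) count++;
        }
        return count;
    }

    // True when the mask and the floor-modulo agree for every h in -1024..1023.
    static boolean maskEqualsMod(int length) {
        for (int h = -1024; h < 1024; h++) {
            if ((h & (length - 1)) != Math.floorMod(h, length)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        for (int length : new int[]{16, 15, 10}) {
            System.out.println("length " + length
                    + ": reachable buckets = " + reachableBuckets(length)
                    + ", mask equals mod = " + maskEqualsMod(length));
        }
    }
}
```

With a length of 16 all 16 buckets are reachable. With a length of 15 the mask is 1110, so the lowest bit of every index is 0 and only 8 of the 15 buckets can ever be used; with a length of 10 the mask is 1001 and only 4 buckets are reachable.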