JDK 8 HashMap Source Code Analysis

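HashMap is one of the collection classes we use most often in projects, yet many of us know little about how it is implemented underneath. For example:

1. What is HashMap's internal data structure? What is its initial capacity?

2. What happens during a call to put?

3. How is HashMap's hash function implemented? How are hash collisions resolved?

4. When does HashMap resize, and how?

......

This article walks through the JDK 8 HashMap source code and explains how it works under the hood. By the end you should be able to answer the questions above. Without further ado, let's dive into the source.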

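I. The HashMap data structure

The data structure behind HashMap changed significantly in JDK 8:

JDK 7 and earlier: array + linked list

JDK 8 and later: array + linked list, where a slot's list is converted to a red-black tree when it grows too long, and a tree is converted back to a list when it holds too few nodes

Source: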

 1 public class HashMap<K,V> extends AbstractMap<K,V>
 2     implements Map<K,V>, Cloneable, Serializable {
 3     /**
 4      * Basic hash bin node, used for most entries.  (See below for
 5      * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 6      */
 7     static class Node<K,V> implements Map.Entry<K,V> {
 8         final int hash;
 9         final K key;
10         V value;
11         Node<K,V> next; // next node in this slot's singly linked list
12 
13         Node(int hash, K key, V value, Node<K,V> next) {
14             this.hash = hash;
15             this.key = key;
16             this.value = value;
17             this.next = next;
18         }
19 
20         //....
21     }
22 
23     /**
24      * The table, initialized on first use, and resized as
25      * necessary. When allocated, length is always a power of two.
26      * (We also tolerate length zero in some operations to allow
27      * bootstrapping mechanics that are currently not needed.)
28      */
29     transient Node<K,V>[] table;
30 
31 }

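Source analysis:

Underneath, HashMap is backed by an array of Node<K,V>.

Each Node has four fields: hash, key, value, and next. key and value hold the key-value pair itself; hash caches the key's hash, which HashMap uses to compute the array index where the node is stored; next points to another Node<K,V>, which shows that each node belongs to a singly linked list.

This confirms that HashMap's structure is an array plus linked lists. The red-black tree is an optimization JDK 8 introduced to speed up lookups: when a slot's list reaches a certain number of nodes (TREEIFY_THRESHOLD, currently 8) and the table has at least 64 slots (MIN_TREEIFY_CAPACITY), the list is converted to a red-black tree; with a smaller table, HashMap resizes instead. The put section below looks at this in more detail.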
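To make the "array + linked list" layout concrete, here is a minimal sketch. It is a toy model, not the JDK code: the class name BucketLayoutSketch, the simplified Node type, and the hand-picked hash values are all assumptions made for illustration.

```java
// Toy model of the "array + singly linked list" layout; these are not the JDK classes.
class BucketLayoutSketch {
    // Hypothetical stand-in for HashMap.Node with the same four fields.
    static final class Node {
        final int hash;
        final String key;
        String value;
        Node next;

        Node(int hash, String key, String value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Builds a 4-slot table by hand; the hash values are made up for illustration.
    static Node[] build() {
        Node[] table = new Node[4];
        // Hashes 1 and 5 both map to slot 1 (1 & 3 == 5 & 3 == 1), so they are chained.
        table[1] = new Node(1, "a", "x", new Node(5, "b", "y", null));
        table[3] = new Node(3, "c", "z", null);
        return table;
    }

    // Counts the nodes in one slot by following the next pointers.
    static int chainLength(Node head) {
        int count = 0;
        for (Node e = head; e != null; e = e.next) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Node[] table = build();
        for (int i = 0; i < table.length; i++) {
            System.out.println("slot " + i + ": " + chainLength(table[i]) + " node(s)");
        }
    }
}
```

Slots 0 and 2 stay null, slot 1 holds a two-node chain, and slot 3 holds a single node, which is the same shape the real table takes before any slot is treeified.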

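II. The put process

We now have a basic picture of HashMap's internal structure. So when a new element arrives, how is it stored, and at which position?

Let's look at how the put method is implemented:

Source: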

  1 public class HashMap<K,V> extends AbstractMap<K,V>
  2     implements Map<K,V>, Cloneable, Serializable {
  3     
  4     /**
  5      * The bin count threshold for using a tree rather than list for a
  6      * bin.  Bins are converted to trees when adding an element to a
  7      * bin with at least this many nodes. The value must be greater
  8      * than 2 and should be at least 8 to mesh with assumptions in
  9      * tree removal about conversion back to plain bins upon
 10      * shrinkage.
 11      */
 12     static final int TREEIFY_THRESHOLD = 8; // treeify threshold: a slot's list is converted to a red-black tree once its node count reaches this value
 13 
 14     transient Node<K,V>[] table;
 15 
 16     /**
 17      * The number of key-value mappings contained in this map.
 18      */
 19     transient int size; // number of Nodes (key-value mappings) stored in the map
 20 
 21     /**
 22      * The number of times this HashMap has been structurally modified
 23      * Structural modifications are those that change the number of mappings in
 24      * the HashMap or otherwise modify its internal structure (e.g.,
 25      * rehash).  This field is used to make iterators on Collection-views of
 26      * the HashMap fail-fast.  (See ConcurrentModificationException).
 27      */
 28     transient int modCount; // number of structural modifications made to the map
 29 
 30     /**
 31      * The next size value at which to resize (capacity * load factor).
 32      * @serial
 33      */
 34     int threshold; // resize threshold: the map resizes once its Node count exceeds threshold
 35 
 36     public V put(K key, V value) {
 37         return putVal(hash(key), key, value, false, true);
 38     }
 39 
 40     /**
 41      * Implements Map.put and related methods
 42      *
 43      * @param hash hash for key
 44      * @param key the key
 45      * @param value the value to put
 46      * @param onlyIfAbsent if true, don't change existing value
 47      * @param evict if false, the table is in creation mode.
 48      * @return previous value, or null if none
 49      */
 50     final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
 51                     boolean evict) {
 52         Node<K,V>[] tab; Node<K,V> p; int n, i;
 53         // Check whether the array has been initialized
 54         if ((tab = table) == null || (n = tab.length) == 0)
 55             // Initialize the Node<K,V>[] array
 56             n = (tab = resize()).length;
 57         // Compute the new element's array index; if that slot is null, create a new Node there
 58         if ((p = tab[i = (n - 1) & hash]) == null)
 59             tab[i] = newNode(hash, key, value, null);
 60         else {
 61             Node<K,V> e; K k;
 62             // If the slot's first Node has the same key as the new element, remember it in e; its value is replaced further down
 63             if (p.hash == hash &&
 64                 ((k = p.key) == key || (key != null && key.equals(k))))
 65                 e = p;
 66             // If the slot's Node is a tree node, hand over to the red-black tree; not expanded here...
 67             else if (p instanceof TreeNode)
 68                 e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
 69             else {
 70                 // Loop down the linked list
 71                 for (int binCount = 0; ; ++binCount) {
 72                     // Reached the tail of the list: append the new Node
 73                     if ((e = p.next) == null) {
 74                         p.next = newNode(hash, key, value, null);
 75                         // If the list already held >= 8 Nodes before this append, call treeifyBin (binCount
 76                         // starts at 0 and does not count the head, hence TREEIFY_THRESHOLD - 1)
 77                         if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
 78                             treeifyBin(tab, hash);
 79                         break;
 80                     }
 81                     // Found a Node with the same key: leave the loop
 82                     if (e.hash == hash &&
 83                         ((k = e.key) == key || (key != null && key.equals(k))))
 84                         break;
 85                     p = e;
 86                 }
 87             }
 88             // A Node with the same key already exists in this slot
 89             if (e != null) { // existing mapping for key
 90                 V oldValue = e.value;
 91                 if (!onlyIfAbsent || oldValue == null)
 92                     e.value = value;
 93                 afterNodeAccess(e);
 94                 return oldValue;
 95             }
 96         }
 97         // Increment the map's modification count
 98         ++modCount;
 99         // size is the number of Nodes; if size > threshold [threshold = array length * load factor,
100         // 0.75 by default], resize
101         if (++size > threshold)
102             resize();
103         afterNodeInsertion(evict);
104         return null;
105     }
106 }

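Source analysis:

The put method proceeds as follows:

1. Compute the hash of the key (line 37);

2. Check whether the array has been initialized, and if not, call resize to create it (line 54);

3. Compute the index at which the element should be placed:

a. If the slot is empty, there is no hash collision and the new node goes straight into the array;

b. If the slot is already occupied, there is a hash collision:

  - if the slot holds a linked list, walk to the tail and append the node; if the list already held at least TREEIFY_THRESHOLD (8) nodes, convert the list to a red-black tree
  - if the slot holds a red-black tree, insert the node into the tree

In both cases, if a node with the same key already exists, its value is replaced instead of adding a node;

4. Check whether the element count now exceeds threshold (the resize threshold), and if so, resize (line 101).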
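The steps above can be sketched as a deliberately reduced map. This is a sketch under stated assumptions, not the JDK implementation: TinyChainedMap is a hypothetical class with a fixed 16-slot table, String keys and values, no resizing, and no treeification. It only shows the index computation, the chain walk, and the replace-or-append decision.

```java
import java.util.Objects;

// Hypothetical, heavily simplified chained map: fixed capacity, no resize, no trees.
class TinyChainedMap {
    static final class Node {
        final int hash;
        final String key;
        String value;
        Node next;

        Node(int hash, String key, String value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    final Node[] table = new Node[16]; // length must stay a power of two
    int size;

    // Same spreading step as HashMap.hash(Object).
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value for key, or null if the key was absent.
    String put(String key, String value) {
        int hash = spread(key);
        int i = (table.length - 1) & hash;      // step 3: compute the slot index
        Node p = table[i];
        if (p == null) {                        // 3a: empty slot, no collision
            table[i] = new Node(hash, key, value, null);
            size++;
            return null;
        }
        for (;;) {                              // 3b: walk the chain
            if (p.hash == hash && Objects.equals(p.key, key)) {
                String old = p.value;           // same key: replace the value
                p.value = value;
                return old;
            }
            if (p.next == null) {               // reached the tail: append
                p.next = new Node(hash, key, value, null);
                size++;
                return null;
            }
            p = p.next;
        }
    }

    public static void main(String[] args) {
        TinyChainedMap map = new TinyChainedMap();
        // "Aa" and "BB" have the same hashCode, so they collide in one slot.
        map.put("Aa", "first");
        map.put("BB", "second");
        System.out.println(map.put("Aa", "updated")); // previous value for "Aa"
        System.out.println(map.size);
    }
}
```

The strings "Aa" and "BB" are a convenient test pair because String.hashCode gives both the value 2112, which forces the chained branch without any custom key class.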

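III. The hash function

HashMap has a built-in hash function that it applies to the key of each key-value pair, and it uses the result to decide where the pair is stored.

Source: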

1 public class HashMap<K,V> extends AbstractMap<K,V>
2     implements Map<K,V>, Cloneable, Serializable {
3 
4     static final int hash(Object key) {
5         int h;
6         return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
7     }
8 
9 }

源码分析：

　　HashMap 中 hash 函数实现不单是使用了 key.hashCode() ，还将 hash 值的高 16 bit 和低 16 bit 做了一个异或，这是 JDK 8 版本的一个优化，使得元素在数组中的落点更加均匀；

　　元素存放下标位置：hash(key) & (n - 1)，其中 n 为 HashMap 的容量，见上文 put 源码 58 行。
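A small experiment shows why the XOR matters. The sketch below is illustrative only: HashSpreadDemo and its two helper methods are hypothetical names, and the two input values were chosen by hand so that they differ only in their high 16 bits.

```java
// Illustrative only: shows how mixing in the high bits changes the slot index.
class HashSpreadDemo {
    // The same mixing step used by HashMap.hash(Object), applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Slot index for a table of length n, where n is a power of two.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x00010000; // these two hash codes differ only in the high 16 bits
        int b = 0x00020000;
        int n = 16;

        // Without spreading, only the low 4 bits count, so both land in slot 0.
        System.out.println(index(a, n) + " " + index(b, n));                 // prints "0 0"

        // With spreading, the high bits reach the low bits and the slots differ.
        System.out.println(index(spread(a), n) + " " + index(spread(b), n)); // prints "1 2"
    }
}
```

With a 16-slot table only the lowest 4 bits select the slot, so hash codes that differ only above bit 15 would always collide without the spreading step.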

IV. The resize process

When the number of elements in a HashMap exceeds the resize threshold, the HashMap resizes.

Source:

  1 public class HashMap<K,V> extends AbstractMap<K,V>
  2     implements Map<K,V>, Cloneable, Serializable {
  3 
  4     transient Node<K,V>[] table;
  5 
  6     static final int hash(Object key) {
  7         int h;
  8         return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
  9     }
 10 
 11     /**
 12      * The default initial capacity - MUST be a power of two.
 13      */
 14     static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
 15 
 16     /**
 17      * The maximum capacity, used if a higher value is implicitly specified
 18      * by either of the constructors with arguments.
 19      * MUST be a power of two <= 1<<30.
 20      */
 21     static final int MAXIMUM_CAPACITY = 1 << 30;
 22 
 23     /**
 24      * Initializes or doubles table size.  If null, allocates in
 25      * accord with initial capacity target held in field threshold.
 26      * Otherwise, because we are using power-of-two expansion, the
 27      * elements from each bin must either stay at same index, or move
 28      * with a power of two offset in the new table.
 29      *
 30      * @return the table
 31      */
 32     final Node<K,V>[] resize() {
 33         Node<K,V>[] oldTab = table;
 34         int oldCap = (oldTab == null) ? 0 : oldTab.length;
 35         int oldThr = threshold;
 36         int newCap, newThr = 0;
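 37         // The array is already initialized
 38         if (oldCap > 0) {
 39             // The current capacity has already reached the maximum
 40             if (oldCap >= MAXIMUM_CAPACITY) {
 41                 threshold = Integer.MAX_VALUE;
 42                 return oldTab;
 43             } // Double the capacity, and double the resize threshold as well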
 44             else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
 45                      oldCap >= DEFAULT_INITIAL_CAPACITY)
 46                 newThr = oldThr << 1; // double threshold
 47         }
 48         else if (oldThr > 0) // initial capacity was placed in threshold
 49             newCap = oldThr;
 50         else {               
 51             // zero initial threshold signifies using defaults
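 52             // Reached when the array is uninitialized and no initial capacity was requested
 53             newCap = DEFAULT_INITIAL_CAPACITY; // capacity defaults to 16
 54             // The resize threshold defaults to 12 (0.75 * 16)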
 55             newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
 56         }
 57         if (newThr == 0) {
 58             float ft = (float)newCap * loadFactor;
 59             newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
 60                       (int)ft : Integer.MAX_VALUE);
 61         }
 62         // Store the new resize threshold
 63         threshold = newThr;
 64         @SuppressWarnings({"rawtypes","unchecked"})
 65         // Allocate the new Node array
 66             Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
 67         table = newTab;
 68         // Migrate the existing entries into the new array
 69         if (oldTab != null) {
 70             for (int j = 0; j < oldCap; ++j) {
 71                 Node<K,V> e;
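 72                 // This slot of the old array is not empty
 73                 if ((e = oldTab[j]) != null) {
 74                     oldTab[j] = null;
 75                     // The slot holds only one element
 76                     if (e.next == null)
 77                         // Place that element at its new index in the new array
 78                         newTab[e.hash & (newCap - 1)] = e;
 79                     // The slot holds a red-black tree
 80                     else if (e instanceof TreeNode)
 81                         // Split the tree's elements across the new array
 82                         ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
 83                     // The slot holds a linked list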
 84                     else { // preserve order
 85                         Node<K,V> loHead = null, loTail = null;
 86                         Node<K,V> hiHead = null, hiTail = null;
 87                         Node<K,V> next;
 88                         // Walk the list and migrate its elements to the new array
 89                         do {
 90                             next = e.next;
 91                             // Decide the element's index in the new array; this is the clever part of HashMap's algorithm
 92                             if ((e.hash & oldCap) == 0) {
 93                                 if (loTail == null)
 94                                     loHead = e;
 95                                 else
 96                                     loTail.next = e;
 97                                 loTail = e;
 98                             }
 99                             else {
100                                 if (hiTail == null)
101                                     hiHead = e;
102                                 else
103                                     hiTail.next = e;
104                                 hiTail = e;
105                             }
106                         } while ((e = next) != null);
107                         if (loTail != null) {
108                             loTail.next = null;
109                             newTab[j] = loHead; // same index as in the old array
110                         }
111                         if (hiTail != null) {
112                             hiTail.next = null;
113                             newTab[j + oldCap] = hiHead; // old index + old capacity
114                         }
115                     }
116                 }
117             }
118         }
119         return newTab;
120     }
121 
122 }

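Source analysis:

HashMap's default initial capacity is 16 and its default resize threshold is 12 (lines 53-55);

After a resize, the capacity is twice what it was before (line 44);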
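The relationship between capacity and threshold is plain arithmetic, sketched below. ThresholdDemo and its threshold helper are hypothetical names, and the sketch assumes the default load factor of 0.75 and ignores the MAXIMUM_CAPACITY edge cases that resize handles.

```java
// Illustrative arithmetic only; ignores the MAXIMUM_CAPACITY edge cases in resize().
class ThresholdDemo {
    // threshold = capacity * load factor, truncated to an int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f; // HashMap's default load factor
        int capacity = 16;        // HashMap's default initial capacity
        for (int i = 0; i < 3; i++) {
            System.out.println("capacity " + capacity
                    + " -> threshold " + threshold(capacity, loadFactor));
            capacity <<= 1;       // each resize doubles the capacity
        }
    }
}
```

Starting from the defaults, the first three capacities are 16, 32, and 64, with thresholds 12, 24, and 48, so doubling the capacity doubles the threshold too.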

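After resizing, HashMap has to migrate its elements to the new array. It first iterates over the old array (line 70); for each non-empty slot there are three cases:

- a single element;
- a red-black tree;
- a linked list

See lines 76-84 for details.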
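The linked-list case relies on the test (e.hash & oldCap) == 0 at line 92. Because the capacity doubles, the new index mask gains exactly one bit, and that bit of the hash decides whether a node stays at index j or moves to j + oldCap. The sketch below checks this against a direct recomputation; ResizeSplitDemo and newIndex are hypothetical names and the hash values are hand-picked.

```java
// Illustrative only: reproduces the lo/hi split rule used when a list is migrated.
class ResizeSplitDemo {
    // Index of a node in the doubled table, using the same bit test as resize().
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                 // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        int[] hashes = {5, 21, 37, 53};              // all sit in slot 5 of the old table
        for (int h : hashes) {
            int direct = h & (newCap - 1);           // full recomputation for comparison
            System.out.println("hash " + h + ": split rule -> " + newIndex(h, oldCap)
                    + ", recomputed -> " + direct);
        }
    }
}
```

Hashes 5 and 37 have the oldCap bit clear and stay in slot 5, while 21 and 53 have it set and move to slot 21, matching the recomputed index in every case. This is why resize can split a list into a "lo" and a "hi" list without rehashing any key.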

That concludes this analysis of the HashMap source for now. If anything is unclear, please leave a comment below; I will keep updating this article.