In-depth analysis of JDK8 HashMap: principles, implementation and optimization

HashMap is arguably the most frequently used data structure for key-value mappings. It does not guarantee insertion order, and it permits null keys and null values. This article works through the JDK8 source to analyze the principles, implementation and optimizations of HashMap. First published on the WeChat public account "epiphany source code".

1. Basic Structure

HashMap is implemented on top of a hash table and resolves collisions by separate chaining. In JDK8, a chain that grows longer than 8 nodes is converted to a red-black tree. The basic structure is as follows:

[Figure: the basic structure of HashMap]

HashMap has a Node<K,V>[] table field, the hash bucket array, whose elements are Node objects. Node is defined as follows:

static class Node<K,V> implements Map.Entry<K,V> {
  final int hash; // used to compute the array index
  final K key;
  V value;
  Node<K,V> next; // successor, i.e. the next Node in the chain

  Node(int hash, K key, V value, Node<K,V> next) { ... }
  ...
}

The hash bucket array is initialized lazily on first use, with a default size of 16, and is resized as needed; its length is always a power of two. If the initial capacity passed to the constructor is not a power of two, the following method returns the smallest power of two greater than it:

static final int tableSizeFor(int cap) {
  int n = cap - 1;
  n |= n >>> 1;
  n |= n >>> 2;
  n |= n >>> 4;
  n |= n >>> 8;
  n |= n >>> 16;
  return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

The idea is to set every bit to the right of the highest 1 bit to 1 and then add 1: the addition carries into the next higher position and clears everything below it, which yields a power of two. The initial cap - 1 ensures that a value that is already a power of two is returned unchanged. JDK7 used Integer.highestOneBit(int i) instead, whose final step n - (n >>> 1) returns the largest power of two that is less than or equal to its argument.
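As a quick check of the rounding behaviour described above, the following sketch copies tableSizeFor into a stand-alone class. The class name TableSizeDemo and the sample inputs are illustrative only, and MAXIMUM_CAPACITY is assumed to be 1 << 30 as in the JDK8 source.

```java
final class TableSizeDemo {
    // Assumed to match HashMap.MAXIMUM_CAPACITY in JDK8.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit-smearing logic as the method shown above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that an exact power of two maps to itself
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // every bit below the highest 1 bit is now 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {0, 1, 10, 16, 17, 1000}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

With these inputs, 10 rounds up to 16, 16 stays at 16, and 17 rounds up to 32.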

Other internal fields of HashMap:

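// number of key-value pairs
transient int size;
// number of structural modifications, used for fail-fast behaviour during iteration
transient int modCount;
// load factor, 0.75f by default
final float loadFactor;
// the size at which the next resize happens, i.e. the maximum number of
// key-value pairs before resizing; equals (capacity * loadFactor)
int threshold;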
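The modCount field is what lets HashMap iterators fail fast. A minimal single-threaded sketch, with the class name FailFastDemo and the sample keys chosen purely for illustration: a structural modification made during iteration is noticed on the next call into the iterator, which throws ConcurrentModificationException. The JDK documents this check as best-effort, so it should not be relied on for correctness.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class FailFastDemo {
    // Returns true if adding a new key while iterating triggers the fail-fast check.
    static boolean modificationDetected() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                // Inserting a new key is a structural modification, so modCount changes.
                map.put("new-" + key, 0);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("fail-fast triggered: " + modificationDetected());
    }
}
```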

Two parameters mainly affect HashMap performance: the initial capacity and the load factor. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the table is resized to twice its original capacity and the keys are rehashed.

  • If the initial capacity is too small, resizing and rehashing are triggered many times, so pre-allocating a sufficiently large capacity is more efficient
  • The default load factor is 0.75f, a good trade-off between time and space costs that generally needs no change; a higher value reduces space overhead but increases lookup cost
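One common way to act on the first point is to derive the initial capacity from the expected number of entries. The helper below is a hypothetical sketch (capacityFor is not a JDK method), and it assumes the default load factor of 0.75f.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizeDemo {
    // Hypothetical helper: a capacity request large enough that `expected`
    // entries stay below the resize threshold, assuming load factor 0.75f.
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        // HashMap rounds the requested capacity up to a power of two internally.
        Map<Integer, String> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(map.size());
    }
}
```

For 1000 expected entries the helper requests 1334, which HashMap rounds up to 2048, so no resize happens while the map is filled.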

No matter how well designed the hash function is, long chains will inevitably occur and degrade HashMap performance. For this reason, when a chain grows longer than 8, JDK8 converts it to a red-black tree to take advantage of the tree's fast insert, delete, update and lookup operations.

2. Hash Function

The most common way to hash an integer is the division (remainder) method. To spread the hash values of keys uniformly, the array size is usually chosen to be a prime number (the initial size of Hashtable, for example, is 11): a prime has few factors, so the probability that different keys leave the same remainder is low and collisions are less likely.

The capacity of HashMap is always a power of two, which is a composite number. It is designed this way so that the modulo operation can be replaced by a bitwise operation, which improves performance. The identity h % length = h & (length-1) holds (for non-negative h) for the following reason:

2^1 = 10                         2^1 - 1 = 01
2^2 = 100                        2^2 - 1 = 011
2^3 = 1000                       2^3 - 1 = 0111
2^n = 1 followed by n zeros      2^n - 1 = 0 followed by n ones

The left column shows the binary form of 2^n and the right column that of 2^n - 1. When length = 2^n, the result of h & (length-1) falls exactly in the range 0 to length-1, which is equivalent to a modulo operation.
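The equivalence can be checked with a few values. This is a small sketch in which the class and method names are illustrative; it also shows that for a negative h the Java % operator gives a negative remainder, whereas the mask agrees with Math.floorMod.

```java
final class MaskVsModulo {
    // Index computed with the bit mask, as HashMap does.
    static int maskIndex(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of two
        for (int h : new int[] {0, 5, 16, 37, 1_000_003}) {
            System.out.println(h + ": " + (h % length) + " == " + maskIndex(h, length));
        }
        // For a negative h, % yields a negative remainder, while the mask
        // agrees with Math.floorMod.
        int neg = -7;
        System.out.println((neg % length) + " vs " + maskIndex(neg, length)
                + " vs " + Math.floorMod(neg, length));
    }
}
```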

Once the modulo is replaced by a bitwise operation, length-1 acts as a low-bit mask: the bitwise AND sets the high bits of the original hash value to 0, so only the small range of bits covered by the mask affects the index, which clearly increases the chance of collisions. To reduce collisions, the HashMap hash algorithm XORs the high bits into the low bits, so that the high bits of the key's hash code also take part in the index computation. The code is as follows:

static final int hash(Object key) { // JDK8
  int h;
  // h = key.hashCode()  1. take the hashCode value
  // h ^ (h >>> 16)      2. XOR the high 16 bits into the low 16 bits, so the high bits still count
  return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// JDK7 source; JDK8 has no such method, but the principle is the same
static int indexFor(int h, int length) {
   return h & (length-1); // 3. modulo operation
}

Shifting the high bits down and XORing them in makes effective use of both the high and low bits of the key's hash code while keeping overhead low; the design is a trade-off between speed, utility and quality of bit spreading.
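The effect of the spreading step shows up with two hash codes that differ only in their high bits. The sketch below uses made-up hash codes and illustrative names (SpreadDemo, spread, index) with a table of length 16.

```java
final class SpreadDemo {
    // Same spreading step as HashMap.hash(), applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int index(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int h1 = 0x10000; // the two hash codes differ only above bit 15
        int h2 = 0x20000;
        // Without spreading, both land in bucket 0.
        System.out.println(index(h1, length) + " " + index(h2, length));
        // With spreading, they land in buckets 1 and 2.
        System.out.println(index(spread(h1), length) + " " + index(spread(h2), length));
    }
}
```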

3. The put Operation

The put operation mainly does the following:

  1. If the hash bucket array is empty, initialize it via resize()
  2. If the key to be inserted already exists, overwrite its value
  3. Otherwise, insert the key-value pair into the corresponding linked list or red-black tree
  4. After inserting into a linked list, determine whether the list should be converted to a red-black tree
  5. Determine whether a resize is needed

The core code is as follows:

public V put(K key, V value) {
  // Spread the key's hashCode
  return putVal(hash(key), key, value, false, true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
  Node<K,V>[] tab; Node<K,V> p; int n, i;
  // 1. table is null: initialize the hash bucket array
  if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;
  // 2. Compute the array index: (n - 1) & hash
  if ((p = tab[i = (n - 1) & hash]) == null)
    // 3. This slot is still empty: insert directly
    tab[i] = newNode(hash, key, value, null);
  else {
    Node<K,V> e; K k;
    // 4. The first node already holds this key: its value is overwritten below
    if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
      e = p;
    // 5. The bin has already been converted to a red-black tree
    else if (p instanceof TreeNode) // insert into the tree
      e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
    // 6. The bin is a linked list
    else {
      for (int binCount = 0; ; ++binCount) {
        // Traverse to the tail node and append there
        if ((e = p.next) == null) {
          p.next = newNode(hash, key, value, null);
          // Convert to a red-black tree once the chain is longer than 8
          if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
            treeifyBin(tab, hash);
          break;
        }
        // The same key was found during traversal: its value is overwritten below
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
          break;
        p = e;
      }
    }
    if (e != null) { // existing mapping for key
      V oldValue = e.value;
      if (!onlyIfAbsent || oldValue == null)
        e.value = value;
      afterNodeAccess(e);
      return oldValue;
    }
  }
  ++modCount;
  // 7. size exceeds the threshold: resize
  if (++size > threshold)
    resize();
  afterNodeInsertion(evict);
  return null;
}

When inserting into a linked list, JDK8 uses tail insertion, so nodes are kept in insertion order, whereas JDK7 uses head insertion, which stores them in reverse order.
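The overwrite branch of putVal is visible through the public API: put returns the previous value (or null), while putIfAbsent, which calls putVal with onlyIfAbsent set to true, leaves an existing non-null value in place. A short sketch, with the class name PutDemo and the sample keys chosen for illustration:

```java
import java.util.HashMap;
import java.util.Map;

final class PutDemo {
    // Returns the results of a few put-style calls, in order.
    static Integer[] demo() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);          // no previous mapping
        Integer second = map.put("a", 2);         // overwrites, returns the old value
        Integer third = map.putIfAbsent("a", 3);  // keeps the existing value
        Integer current = map.get("a");
        map.put(null, 0);                         // a null key is allowed and hashes to 0
        Integer forNull = map.get(null);
        return new Integer[] {first, second, third, current, forNull};
    }

    public static void main(String[] args) {
        for (Integer result : demo()) {
            System.out.println(result);
        }
    }
}
```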

6. Resize Mechanism

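By default the initial capacity is 16 and the load factor is 0.75f, so the threshold is 12: the table is resized once the number of key-value pairs exceeds 12, that is, when the 13th entry is inserted (putVal checks ++size > threshold).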
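The growth rule can be traced with a small simulation. This sketch only replays the rule stated above for a default-constructed map; it does not inspect a real HashMap, it ignores the maximum-capacity cap, and the names ResizeSimulation and capacityAfter are hypothetical.

```java
final class ResizeSimulation {
    // Capacity a default HashMap would have after `entries` distinct keys
    // were inserted, according to the doubling rule described in the text.
    static int capacityAfter(int entries) {
        int capacity = 16;                          // default initial capacity
        int threshold = (int) (capacity * 0.75f);   // 12
        for (int size = 1; size <= entries; size++) {
            if (size > threshold) {                 // mirrors `++size > threshold`
                capacity <<= 1;
                threshold <<= 1;
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int n : new int[] {12, 13, 24, 25}) {
            System.out.println(n + " entries -> capacity " + capacityAfter(n));
        }
    }
}
```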

On a resize the table grows to twice its original capacity. Because the capacity grows by powers of two, each element either stays at its original index or moves by a power-of-two offset (the old capacity) from its original index.

[Figure: resize-1]

As the figure shows, doubling the capacity shifts n left by one bit, so n-1 gains one more 1 bit at the high end. When the index is computed against the original hash value, one additional bit of the hash takes part, and that bit is either 0 or 1:

  • If it is 0, the index is unchanged
  • If it is 1, the index becomes "original index + oldCap"

So how is it determined whether this bit is 0 or 1? If "original hash value & oldCap" is 0, the bit is 0; otherwise it is 1. The resize code is as follows:

final Node<K,V>[] resize() {
  Node<K,V>[] oldTab = table;
  int oldCap = (oldTab == null) ? 0 : oldTab.length;
  int oldThr = threshold;
  int newCap, newThr = 0;
  if (oldCap > 0) {
    // Already at the maximum capacity: stop growing
    if (oldCap >= MAXIMUM_CAPACITY) {
      threshold = Integer.MAX_VALUE;
      return oldTab;
    } // Otherwise double the capacity
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
           oldCap >= DEFAULT_INITIAL_CAPACITY)
      newThr = oldThr << 1; // double threshold
  }
  else if (oldThr > 0) // initial capacity was placed in threshold
    // During initialization, threshold temporarily holds the initial capacity derived from the initialCapacity argument
    newCap = oldThr;
  else {               // zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
  }
  // Compute the new resize threshold
  if (newThr == 0) {
      float ft = (float)newCap * loadFactor;
      newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                (int)ft : Integer.MAX_VALUE);
  }
  threshold = newThr;
  @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
  table = newTab;
  // Move the existing key-value pairs into the new hash bucket array
  if (oldTab != null) {
    for (int j = 0; j < oldCap; ++j) {
      Node<K,V> e;
      if ((e = oldTab[j]) != null) {
        oldTab[j] = null;
        if (e.next == null) // single node, no chain
          newTab[e.hash & (newCap - 1)] = e;
        else if (e instanceof TreeNode)
          // Split the red-black tree into two sub-lists, then treeify each as needed
          ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
        else { // preserve order
          // Split the linked list into two sub-lists, keeping the original order
          Node<K,V> loHead = null, loTail = null;
          Node<K,V> hiHead = null, hiTail = null;
          Node<K,V> next;
          do {
            next = e.next;
            // Sub-list whose index stays the same
            if ((e.hash & oldCap) == 0) {
              if (loTail == null)
                loHead = e;
              else
                loTail.next = e;
              loTail = e;
            }
            // Sub-list whose index is offset by oldCap
            else {
              if (hiTail == null)
                hiHead = e;
              else
                hiTail.next = e;
              hiTail = e;
            }
          } while ((e = next) != null);
          // Place the sub-lists into the new hash buckets
          if (loTail != null) {
            loTail.next = null;
            newTab[j] = loHead;
          }
          if (hiTail != null) {
            hiTail.next = null;
            newTab[j + oldCap] = hiHead;
          }
        }
      }
    }
  }
  return newTab;
}

When the positions of a list's elements are recalculated, only two sub-lists can result: one of elements whose index is unchanged, and one of elements that all move by the same offset. While each sub-list is built, a head node and a tail node are tracked so that the original order is preserved after the split:

[Figure: resize-2]
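The split can be reproduced on plain integers. In the following sketch the hash values are made up so that all of them fall into bucket 5 of a 16-slot table, and SplitDemo and split are illustrative names; lists of integers stand in for the node chains that the real code relinks in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitDemo {
    // Splits the hashes of one bucket into the "stay" (lo) and "move" (hi)
    // groups, preserving their original order, as resize() does for a list.
    static List<List<Integer>> split(List<Integer> hashes, int oldCap) {
        List<Integer> lo = new ArrayList<>();
        List<Integer> hi = new ArrayList<>();
        for (int h : hashes) {
            if ((h & oldCap) == 0) {
                lo.add(h);   // extra bit is 0: index unchanged
            } else {
                hi.add(h);   // extra bit is 1: index becomes index + oldCap
            }
        }
        return Arrays.asList(lo, hi);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        List<Integer> bucket5 = Arrays.asList(5, 21, 37, 53, 69); // all map to index 5
        List<List<Integer>> parts = split(bucket5, oldCap);
        System.out.println("lo (index 5):  " + parts.get(0));
        System.out.println("hi (index 21): " + parts.get(1));
    }
}
```

Here 5, 37 and 69 stay at index 5, while 21 and 53 move to index 5 + 16 = 21, which matches computing hash & 31 directly.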

The TreeNode.split() method shows that a red-black tree is split with the same logic as a linked list, except that once the split is done, each sub-list is handled according to its length:

  • If the sub-list holds at most 6 nodes, it is converted back to a plain linked list that contains no TreeNode
  • Otherwise, the sub-list is converted into a red-black tree

A red-black tree can be split with linked-list logic because, when a linked list is converted to a red-black tree, the original chain of list references is retained, which also makes traversal easier.

7. Converting a Linked List to a Red-Black Tree

Converting a linked list to a red-black tree mainly involves the following:

  1. Check whether the bucket array capacity meets the minimum required for treeification; if not, resize instead
  2. Convert the original list into a doubly linked list made of TreeNode objects
  3. Convert the new list into a red-black tree

The code is as follows:

final void treeifyBin(Node<K,V>[] tab, int hash) {
  int n, index; Node<K,V> e;
  // If the bucket array capacity is below the minimum treeify capacity, resize instead
  if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
    resize();
  else if ((e = tab[index = (n - 1) & hash]) != null) {
    TreeNode<K,V> hd = null, tl = null;
    do { // Convert plain nodes to tree nodes
      TreeNode<K,V> p = replacementTreeNode(e, null);
      if (tl == null)
        hd = p;
      else {
        p.prev = tl;
        tl.next = p;
      }
      tl = p;
      // The original singly linked list is now a doubly linked list
    } while ((e = e.next) != null);
    if ((tab[index] = hd) != null)
      hd.treeify(tab); // Convert the list into a red-black tree
  }
}
TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
  return new TreeNode<>(p.hash, p.key, p.value, next);
}
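Taken together, putVal and treeifyBin amount to a three-way decision after a node is appended to a chain. The sketch below condenses that decision; the constants are assumed to match JDK8 (TREEIFY_THRESHOLD = 8, MIN_TREEIFY_CAPACITY = 64), while the Action enum and the afterInsert method are hypothetical names introduced only for illustration.

```java
final class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;      // assumed, as in JDK8 HashMap
    static final int MIN_TREEIFY_CAPACITY = 64;  // assumed, as in JDK8 HashMap

    enum Action { NONE, RESIZE, TREEIFY }

    // chainLength is the number of nodes in the bin after the new node was appended.
    static Action afterInsert(int chainLength, int tableLength) {
        if (chainLength <= TREEIFY_THRESHOLD) {
            return Action.NONE;      // chain is still short enough
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE;    // small table: resizing spreads the chain instead
        }
        return Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(afterInsert(8, 64));
        System.out.println(afterInsert(9, 16));
        System.out.println(afterInsert(9, 64));
    }
}
```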

HashMap's original design presumably did not anticipate the later introduction of red-black trees, so it neither provides a key comparator nor requires keys to implement the Comparable interface. To order two keys, HashMap proceeds as follows:

  1. If the hash values of the two keys differ, compare the hash values
  2. If they are equal and the keys implement the Comparable interface, compare them with compareTo
  3. If the result is still equal, use the custom tieBreakOrder method, whose logic is as follows:
static int tieBreakOrder(Object a, Object b) {
  int d;
  if (a == null || b == null || // compare the class names
      (d = a.getClass().getName().compareTo(b.getClass().getName())) == 0)
    // Compare the identity hash codes generated by the native method; a tie is
    // still possible but very unlikely, and is treated as "less than"
    d = (System.identityHashCode(a) <= System.identityHashCode(b) ? -1 : 1);
  return d;
}
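The three steps can be condensed into a single comparison function. This is a simplified sketch, not the JDK code: compareKeys is a hypothetical helper, and it only checks that both keys share the same class and implement Comparable, whereas the real implementation performs a stricter check on the key's generic type.

```java
final class KeyOrderSketch {
    // Copy of the tie-break logic shown above.
    static int tieBreakOrder(Object a, Object b) {
        int d;
        if (a == null || b == null ||
            (d = a.getClass().getName().compareTo(b.getClass().getName())) == 0)
            d = (System.identityHashCode(a) <= System.identityHashCode(b) ? -1 : 1);
        return d;
    }

    // Hypothetical helper combining the three steps from the text.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int compareKeys(int h1, Object k1, int h2, Object k2) {
        if (h1 != h2) {
            return Integer.compare(h1, h2);            // step 1: compare hash values
        }
        if (k1 instanceof Comparable && k2 != null && k1.getClass() == k2.getClass()) {
            int c = ((Comparable) k1).compareTo(k2);   // step 2: use compareTo
            if (c != 0) {
                return c;
            }
        }
        return tieBreakOrder(k1, k2);                  // step 3: tie-break
    }

    public static void main(String[] args) {
        System.out.println(compareKeys(1, "x", 2, "y") < 0);
        System.out.println(compareKeys(5, "a", 5, "b") < 0);
        System.out.println(compareKeys(5, "a", 5, Integer.valueOf(1)) > 0);
    }
}
```

In the last call the keys are of different classes, so the order falls back to comparing the class names java.lang.String and java.lang.Integer.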

8. Summary

The HashMap code in JDK8 is fairly complex. Its optimizations come down to three main points:

  • The hash algorithm is simplified to a single shift-and-XOR step
  • Red-black trees are introduced, so that under heavy collisions the time complexity of get drops from O(n) to O(log n)
  • Resizing exploits the binary properties of powers of two, which not only saves the time of recomputing hashes but also spreads previously colliding nodes to other positions

In addition, HashMap is not thread-safe. Race conditions between threads arise mainly during collisions or resizes, when list links are being broken and rejoined. A resize also means copying memory, which is costly, so pre-allocating a sufficiently large initial capacity to reduce the number of resizes lets HashMap perform better.


Origin juejin.im/post/5d7ec6d4f265da03b76b50ff