Java source code parsing | HashMap Past and Present

HashMap Past and Present

Java 8 builds on Java 7 with a number of improvements and optimizations.
The underlying data structure and implementation of HashMap were almost completely rewritten,
and nearly all collection classes gained functional methods such as forEach, along with a number of other useful methods.

Past: Java 1.7

Underlying data structure

Array + linked list

In Java 1.7, HashMap uses an array plus linked lists as its storage structure.
The array acts as a container of buckets, and the linked lists resolve collisions: when a collision occurs, the bucket (array index) where the data belongs is located, and a new node is inserted into the linked list of that bucket.

As shown below:
Alt text

Each node in the linked list stores a (key, value) pair:
Alt text
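The array + linked list structure described above can be sketched roughly as follows. This is a minimal illustration, not the JDK source: the class name ChainedMapSketch, the fixed table of 16 buckets and the missing resize logic are all simplifying assumptions. New nodes are placed at the head of the bucket's list, as Java 1.7 does.

```java
import java.util.Objects;

// Hypothetical, simplified hash table: an array of buckets, each a singly linked list.
final class ChainedMapSketch<K, V> {
    // One node of a bucket's list: stores the key, the value and a link to the next node.
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // 16 buckets (a power of 2); this sketch never resizes.
    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    private int indexFor(Object key) {
        int h = Objects.hashCode(key); // 0 for a null key
        return h & (table.length - 1);
    }

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int i = indexFor(key);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Collision or empty bucket: the new node becomes the head of the list.
        table[i] = new Entry<>(key, value, table[i]);
        return null;
    }

    V get(Object key) {
        for (Entry<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

The strings "Aa" and "BB" have the same hashCode, so putting both exercises the collision path: they end up as two nodes in the same bucket.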

Expansion and initialization

Initialization: the HashMap array size (the number of buckets) defaults to 16, and the array size must be a power of 2.
See the source comments shown in the figures:
Alt text
Alt text

Expansion with the resize method
When does expansion happen?
Expansion starts when the number of entries stored in the map reaches the threshold variable, which is capacity * load factor = 16 * 0.75 = 12 by default.
Alt text
Once the capacity has reached its maximum, the threshold is set to Integer.MAX_VALUE and the table is no longer expanded:
Alt text

  • Expansion doubles the capacity:
    Alt text

  • How expansion works: a new array (of buckets) is allocated, and the transfer method copies the data from the old array into the new one, recomputing the hash of some elements along the way (the rehash).
    Alt text
    Alt text

  • The transfer function moves the buckets of the old table into the new table:
    it traverses the linked list in each bucket, rehashes if needed, uses indexFor to get the index in the new table, and puts the node into the new table.
    Alt text
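The threshold arithmetic and the effect of doubling on bucket indexes can be illustrated with a small sketch. It is not the JDK code: the class ResizeSketch and its helper signatures are hypothetical, and only the index calculation hash & (length - 1) follows the indexFor idea described above.

```java
// Hypothetical helper class illustrating the threshold and the index change on doubling.
final class ResizeSketch {
    // Bucket index of a hash in a table whose length is a power of 2.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    // Expansion threshold for a given capacity and load factor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap * 2; // expansion doubles the capacity
        System.out.println("threshold = " + threshold(oldCap, 0.75f));
        // All three hashes share bucket 5 in the old table; after doubling, one of them moves.
        for (int hash : new int[] {5, 21, 37}) {
            System.out.println(hash + ": " + indexFor(hash, oldCap)
                    + " -> " + indexFor(hash, newCap));
        }
    }
}
```

Because the index depends on the table length, every node has to be visited during expansion, which is why transfer walks each bucket's list.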

hash algorithm

  • Why must the array size be a power of 2?
    Looking at the method that maps a key's hash value to an array index, we find that when the array length is a power of 2, the modulo operation can be replaced by a bit operation that yields the array index directly.
    Alt text

For example, assume the array length is 2 to the 5th power, i.e. 32. Taking the hash value of the key (a 32-bit int) and ANDing it (&) with the array length minus 1 yields an index within the bounds of the array; that index is the position in the table where the current key belongs.
The figure demonstrates it:
Alt text

Therefore, the array size must be a power of 2 so that the hash algorithm can later compute a key's array index with a single bit operation.
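A short sketch can make the equivalence concrete. The class and method names below are made up for illustration; Math.floorMod is used for the modulo side so that negative hash values are handled the same way as the bit mask.

```java
// Hypothetical demo: masking with (length - 1) equals modulo only for power-of-2 lengths.
final class PowerOfTwoIndex {
    static int byMask(int hash, int length) {
        return hash & (length - 1);
    }

    static int byModulo(int hash, int length) {
        return Math.floorMod(hash, length);
    }

    public static void main(String[] args) {
        // Length 32 (2^5): both calculations agree.
        System.out.println(byMask(45, 32) + " vs " + byModulo(45, 32));
        // Length 10 (not a power of 2): the mask no longer matches the modulo.
        System.out.println(byMask(6, 10) + " vs " + byModulo(6, 10));
    }
}
```

With length 10 the mask is binary 1001, so many indexes can never be produced at all; this is one way to see why a non-power-of-2 length would break the bit trick.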

The algorithm that computes the key's hash code is more complex in 1.7 and is not covered in detail here.

put method

  • Comparing keys uses the key's equals method, so a custom class used as a key needs to override equals (together with hashCode, so that equal keys land in the same bucket).
  • For this reason, String, which already overrides equals, is a recommended key type.
    Alt text
    Alt text
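As an illustration of the point about custom key classes, here is a minimal sketch of a key type that overrides both equals and hashCode. The class PointKey is invented for this example and does not appear in the original article.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class: two instances with the same coordinates count as the same key.
final class PointKey {
    final int x;
    final int y;

    PointKey(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof PointKey)) {
            return false;
        }
        PointKey other = (PointKey) o;
        return x == other.x && y == other.y;
    }

    // Equal keys must produce equal hash codes, otherwise they may land in different buckets.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Map<PointKey, String> map = new HashMap<>();
        map.put(new PointKey(1, 2), "a");
        // A different instance with the same coordinates finds the entry.
        System.out.println(map.get(new PointKey(1, 2)));
    }
}
```

Without the two overrides, the lookup with a fresh PointKey(1, 2) would fall back to identity comparison and would not find the entry.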

Remaining problems: security, deadlock

1. HashMap in 1.7 is not thread-safe
Alt text
Under concurrent use, expansion has to copy the linked lists with the transfer function. This is a pitfall: it can produce a circular linked list, which leads to an infinite loop (often described as a deadlock).

View Reference Link

2. Hash collision security issue
In Java 1.7, hash collisions can be exploited by malicious requests to cause a denial of service (DoS).
For example, the following keys all have the same hash value:
Alt text

Solution: use a different hash calculation method in put
Alt text
Alt text

Present: Java 1.8

Underlying data structure

  • The underlying data structure of HashMap is: array + linked list + red-black tree.
  • When the length of a linked list reaches 8 (and the array capacity is at least 64), the list is converted into a red-black tree;
  • When the size of a red-black tree drops to 6 or less, the tree is converted back into a linked list;
    the entire data structure is as follows:
    Alt text

Expansion and initialization

Common attributes:

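// Default initial capacity is 16
 static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

 // Maximum capacity
 static final int MAXIMUM_CAPACITY = 1 << 30;

 // Default load factor
 static final float DEFAULT_LOAD_FACTOR = 0.75f;
 
 // When the linked list in a bucket reaches length 8, it is converted into a red-black tree
 static final int TREEIFY_THRESHOLD = 8;

 // When the red-black tree in a bucket shrinks to size 6 or less, it is converted back into a linked list
 static final int UNTREEIFY_THRESHOLD = 6;

 // A linked list is only converted into a red-black tree when the array capacity is at least 64
 static final int MIN_TREEIFY_CAPACITY = 64;

 // Counts structural changes of the HashMap; if it changes during iteration, the iterator fails fast
 transient int modCount;

 // Actual number of entries in the HashMap; may be stale (it can change again right after you read it)
 transient int size;

 // The array that stores the data
 transient Node<K,V>[] table;

 // Expansion threshold; there are two cases
 // If an array size is given at initialization, it is computed by tableSizeFor and is always rounded up to a power of 2; for example, a given initial size of 19 results in an actual size of 32, i.e. 2 to the 5th power.
 // If the table is expanded by resize, threshold = array capacity * 0.75
 int threshold;

 // Linked-list node
 static class Node<K,V> implements Map.Entry<K,V> {
 
 // Red-black tree node
 static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V>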
  • As can be seen, the initial capacity is 16 and the maximum capacity is 2 to the 30th power.
  • When the array capacity is at least 64 and a linked list has >= 8 nodes, the list is converted into a red-black tree.
    The probability of a conversion into a red-black tree is very small, because with a suitable hash function that many collisions in a single bucket rarely occur.
    The threshold of >= 8 list nodes was chosen with reference to the Poisson distribution, which gives the following probabilities for each list length:
* 0:    0.60653066
* 1:    0.30326533
* 2:    0.07581633
* 3:    0.01263606
* 4:    0.00157952
* 5:    0.00015795
* 6:    0.00001316
* 7:    0.00000094
* 8:    0.00000006

This means that the probability of a linked list reaching length 8 is 0.00000006, less than one in ten million. Under normal circumstances a list will therefore not reach length 8; if it does, something is most likely wrong with the hash algorithm. In that case the list is converted into a red-black tree so that HashMap keeps its high query performance. In everyday code that uses HashMap, we will almost never hit a case where a list is converted into a red-black tree.
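The table above can be reproduced with a few lines of code. The sketch below assumes a Poisson parameter of 0.5, which is the value named in the JDK 8 HashMap source comment for the default load factor of 0.75; the class and method names are hypothetical.

```java
// Hypothetical helper: Poisson probability P(k) = exp(-lambda) * lambda^k / k!
final class PoissonBins {
    static double probability(int k, double lambda) {
        double p = Math.exp(-lambda);
        // Multiply in lambda / i step by step to build lambda^k / k! without overflow.
        for (int i = 1; i <= k; i++) {
            p *= lambda / i;
        }
        return p;
    }

    public static void main(String[] args) {
        // Assumption: lambda = 0.5, as stated in the JDK 8 source comment.
        for (int k = 0; k <= 8; k++) {
            System.out.println(String.format("%d: %.8f", k, probability(k, 0.5)));
        }
    }
}
```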

  • Expansion
    There are two cases:
  1. If a size is given at initialization, the capacity is computed by the tableSizeFor method, which rounds it up so that the array size is always a power of 2. For example, if you give an initial size of 19, the actual initial size is 32, i.e. 2 to the 5th power.

  2. If the expansion is carried out by the resize method, it is triggered when size > array capacity * 0.75.
    Alt text
    Alt text
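The rounding done by tableSizeFor can be sketched as below. The method body follows the bit-smearing approach of the JDK 8 implementation; the wrapper class TableSizeSketch is only there to make the snippet self-contained.

```java
// Hypothetical wrapper class around a tableSizeFor-style calculation.
final class TableSizeSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of 2 that is >= cap (capped at MAXIMUM_CAPACITY).
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that an exact power of 2 maps to itself
        n |= n >>> 1;      // copy the highest set bit into every lower position...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // ...until n has the form 0b0...01...1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(19)); // rounded up to the next power of 2
        System.out.println(tableSizeFor(16)); // already a power of 2
    }
}
```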

  • After expansion, the original table is copied into the new one, much like the transfer function in Java 1.7; in Java 1.8 this copy is still not thread-safe.
    Alt text
  • I do not yet fully understand the low and high linked lists used during expansion.
    Alt text

  • Resize is inefficient because it needs to copy the data, so it is best to specify a suitable capacity at initialization to avoid performance problems caused by frequent expansion.
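One way to read the low and high lists mentioned in the list above: because the capacity doubles, a node at old index j can only end up at index j (the low list) or at index j + oldCap (the high list), and a single bit of the hash decides which. The sketch below illustrates that rule; it is not the JDK resize code, and the class and method names are assumptions.

```java
// Hypothetical illustration of the low/high split performed during a Java 8 resize.
final class SplitSketch {
    // New bucket index after doubling a table of length oldCap (a power of 2).
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);       // index in the old table
        // The bit at position oldCap decides: 0 -> stays at j, 1 -> moves to j + oldCap.
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        for (int hash : new int[] {5, 21, 37, 53}) {
            System.out.println(hash + ": " + (hash & (oldCap - 1))
                    + " -> " + newIndex(hash, oldCap));
        }
    }
}
```

The result is the same as recomputing hash & (newCap - 1), but each node only needs one bit test, and the relative order of nodes inside each of the two lists can be kept.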

put insertion method

  • HashMap adds a new node in the following steps:

Alt text
Alt text

  • Part of the put code is as follows:
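// Parameter hash: the value computed by the hash algorithm.
// Parameter onlyIfAbsent: false means that even if the key already exists, the old value is overwritten with the new one; put passes false
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // n is the array length, i is the array index, p is the Node at index i
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the array is empty, initialize it with resize
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the current index position is empty, create a new node directly at that position
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // The current index position already holds a node: this is what we usually call resolving a hash collision
    else {
        // e is a temporary variable for the current node
        Node<K,V> e; K k;
        // If the hash and the key are both equal, assign the Node at the current index to the temporary variable
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If it is a red-black tree, insert the red-black tree way
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // It is a linked list: append the new node to the tail of the list
        else {
            // Loop without an exit condition (left via break)
            for (int binCount = 0; ; ++binCount) {
                // e = p.next walks the list starting from the head
                // p.next == null means p is the tail node of the list
                if ((e = p.next) == null) {
                    // Append the new node to the tail of the list
                    p.next = newNode(hash, key, value, null);
                    // When the list length reaches 8, convert the list into a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash); // treeify
                    break;
                }
                // While walking the list, an element equal to the new one was found: end the loop
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // Advance the current element so that p keeps moving forward during the traversal
                p = e;
            }
        }
        // e != null means a node for this key already exists
        if (e != null) {
            V oldValue = e.value;
            // The value is overwritten only when onlyIfAbsent is false (or the old value is null)
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            // Return the old value
            return oldValue;
        }
    }
    // Record that the structure of the HashMap has changed
    ++modCount;
    // If the actual size of the HashMap exceeds the expansion threshold, expand
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}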

If the key already exists in the map and you do not want to overwrite its value, you can use the putIfAbsent method: it passes onlyIfAbsent as true, so the existing value is not overwritten. The put method we normally use passes onlyIfAbsent as false, which allows overwriting.
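The difference between the two calls can be seen in a small usage sketch. Only the standard java.util API is used; the class PutIfAbsentDemo and its demo method are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class comparing put and putIfAbsent.
final class PutIfAbsentDemo {
    static Map<String, Integer> demo() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("a", 2);          // put overwrites: "a" now maps to 2
        map.putIfAbsent("a", 3);  // key exists: the value stays 2
        map.putIfAbsent("b", 4);  // key is absent: "b" is added with value 4
        return map;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```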

  • Adding a node to a linked list: simply append the new node to the tail of the list.
  • Adding a node to a red-black tree takes the following steps:
    Alt text
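// Parameter h: the hash value of the key
final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab,
                               int h, K k, V v) {
    Class<?> kc = null;
    boolean searched = false;
    // Find the root node
    TreeNode<K,V> root = (parent != null) ? root() : this;
    // Loop until the insertion point is found
    for (TreeNode<K,V> p = root;;) {
        int dir, ph; K pk;
        // p's hash is greater than h: continue in p's left subtree
        if ((ph = p.hash) > h)
            dir = -1;
        // p's hash is less than h: continue in p's right subtree
        else if (ph < h)
            dir = 1;
        // The key to insert already exists in the tree (decided by equals)
        else if ((pk = p.key) == k || (k != null && k.equals(pk)))
            return p;
        // The hashes are equal but the keys are not: if the key implements Comparable, compareTo is used instead of the hash
        else if ((kc == null &&
                  // Get the key's Class; null if the key does not implement Comparable
                  (kc = comparableClassFor(k)) == null) ||
                  // compareTo reports a tie between the current node's key pk and the argument k
                 (dir = compareComparables(kc, k, pk)) == 0) {
            if (!searched) {
                TreeNode<K,V> q, ch;
                searched = true;
                if (((ch = p.left) != null &&
                     (q = ch.find(h, k, kc)) != null) ||
                    ((ch = p.right) != null &&
                     (q = ch.find(h, k, kc)) != null))
                    return q;
            }
            dir = tieBreakOrder(k, pk);
        }

        TreeNode<K,V> xp = p;
        // Descend until the matching child slot is empty (the left or right child of the current node is null)
        if ((p = (dir <= 0) ? p.left : p.right) == null) {
            Node<K,V> xpn = xp.next;
            // Create the new node
            TreeNode<K,V> x = map.newTreeNode(h, k, v, xpn);
            // Put the new node into the empty child slot
            if (dir <= 0)
                xp.left = x;
            else
                xp.right = x;
            // Link the current node and the new node as parent/child and as prev/next
            xp.next = x;
            x.parent = x.prev = xp;
            if (xpn != null)
                ((TreeNode<K,V>)xpn).prev = x;
            // balanceInsertion recolors or rotates the red-black tree to keep lookups efficient; the cases are:
            // Coloring: a new node is always red; if its parent is black, no recoloring is needed; if its parent is red, recoloring or rotation is required to restore the 5 red-black tree constraints
            // Rotation: performed when the parent is red and the uncle is black
            // If the current node is its parent's right child, rotate left
            // If the current node is its parent's left child, rotate right

            // moveRootToFront puts the computed root at the head of the bucket
            moveRootToFront(tab, balanceInsertion(root, x));
            return null;
        }
    }
}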
  • More background on red-black trees still needs to be added here (placeholder).

get lookup method

The query time complexity of a linked list is O(n), while that of a red-black tree is O(log(n)). When a list holds only a few elements, traversing it is relatively fast; only when the list holds more data is it converted into a red-black tree. A red-black tree, however, needs about twice the space of a list, and the conversion itself costs time and space, so a boundary value for the conversion has to be defined: a list is only treeified when it has >= 8 nodes.

  • Steps of a HashMap lookup:
    Alt text

  • To look up a key in a linked list, a custom key class needs to override the equals method (the list nodes are compared one by one for equality)
// Look for the key by walking the linked list in a loop; e starts as the head node of the list
do {
    // If the current node's hash equals the key's hash and equals returns true, the current node is the one we want
    // On a hash collision, when one hash value maps to a linked list, the keys are compared with the equals method
    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        return e;
    // Otherwise, take the next node and keep looking
} while ((e = e.next) != null);
  • To look up a key in a red-black tree, a custom key class should implement Comparable with a compareTo method (the red-black tree uses it to decide whether to continue to the left or the right child)
    Alt text
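To illustrate the role of compareTo, here is a sketch of a key class whose hashCode deliberately collides for every instance, so that all entries share one bucket. The class CollidingKey is hypothetical and exists only for this demonstration; lookups stay correct either way, and implementing Comparable gives the tree an ordering to work with once the bucket is treeified.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class with a deliberately bad hashCode.
final class CollidingKey implements Comparable<CollidingKey> {
    final int id;

    CollidingKey(int id) {
        this.id = id;
    }

    // Every key collides, so all entries end up in the same bucket.
    @Override
    public int hashCode() {
        return 42;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    // Gives a red-black tree bucket an order to decide between left and right child.
    @Override
    public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);
    }

    static Map<CollidingKey, Integer> fill(int n) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new CollidingKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = fill(1000);
        System.out.println(map.get(new CollidingKey(500)));
    }
}
```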

Streamlined hash algorithm

The hash uses XOR: the high 16 bits of the hash code are XORed with its low 16 bits.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Formula for the position of a key in the array: tab[(n - 1) & hash]
  • h ^ (h >>> 16): the benefit is that in most scenarios the computed hash values are more evenly dispersed.
    After the hash value is calculated, the index of the key in the array could be computed by taking the hash modulo the array length, but the modulo operation is relatively slow on the processor. There is a mathematical identity: when b is a power of 2 (and a is non-negative), a % b = a & (b - 1). So the index calculation can be replaced with (n - 1) & hash.

Alt text
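The effect of mixing in the high 16 bits can be shown with two hash codes that differ only in their upper half. The sketch below uses made-up names (HashSpread, spread, index); the spread expression itself is the h ^ (h >>> 16) from the hash method above.

```java
// Hypothetical demo of why the high 16 bits are XORed into the low 16 bits.
final class HashSpread {
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;                 // table length: only the low 4 bits select the bucket
        int h1 = 0x10000;           // these two hash codes differ only in the high bits
        int h2 = 0x20000;
        // Without spreading, both land in the same bucket.
        System.out.println(index(h1, n) + " " + index(h2, n));
        // With spreading, the high bits influence the index and the keys separate.
        System.out.println(index(spread(h1), n) + " " + index(spread(h2), n));
    }
}
```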

Because an ordinary binary search tree can degenerate into a linked list, a red-black tree is used instead: it is a balanced binary tree whose height is adjusted by rotations.
Alt text

New Methods

  • getOrDefault: if the key has no corresponding value, the given defaultValue is returned
public V getOrDefault(Object key, V defaultValue) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? defaultValue : e.value;
}
  • putIfAbsent(K key, V value): if the key already exists in the map, its value is not overwritten; if the key does not exist, the entry is added.
  • compute: lets us calculate a new value from the key and its current value and then put it into the map; take care when the key does not exist, because the value is then null and can cause unexpected errors.
  • computeIfPresent: performs the calculation only when the key is present
 // Note: Maps is presumably Guava's com.google.common.collect.Maps, and log a logger field of the enclosing class
 public void compute(){
    HashMap<Integer,Integer> map = Maps.newHashMap();
    map.put(10,10);
    log.info("value before compute: {}",map.get(10));
    map.compute(10,(key,value) -> key * value);
    log.info("value after compute: {}",map.get(10));
    // Restore the test value
    map.put(10,10);

    // If key 11 does not exist, watch out for the null value: the line below would throw a NullPointerException
    //  map.compute(11,(key,value) -> key * value);
    
    // To prevent unexpected exceptions when the key does not exist, there are generally two options
    // 1: check for null ourselves
    map.compute(11,(key,value) -> null == value ? null : key * value);
    // 2: let computeIfPresent do the check
    map.computeIfPresent(11,(key,value) -> key * value);
    log.info("value after computeIfPresent: {}",map.get(11));
  }
The result is:
value before compute: 10
value after compute: 100
value after computeIfPresent: null (this shows that computeIfPresent avoids the null pointer exception)
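For readers who want to try these methods without a logging framework or third-party collection helpers, the following variant uses only the JDK. It is a sketch written for this purpose: the class NewMethodsDemo and its two methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, JDK-only demo of getOrDefault, putIfAbsent, compute and computeIfPresent.
final class NewMethodsDemo {
    static Map<Integer, Integer> run() {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(10, 10);
        map.compute(10, (key, value) -> key * value);          // 10 * 10, stored under key 10
        map.computeIfPresent(11, (key, value) -> key * value); // key 11 is absent: nothing happens
        map.putIfAbsent(12, 1);                                // key 12 is absent: entry is added
        return map;
    }

    // compute on a missing key passes value == null, so unboxing it throws.
    static boolean computeOnMissingKeyThrows() {
        Map<Integer, Integer> map = new HashMap<>();
        try {
            map.compute(11, (key, value) -> key * value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = run();
        System.out.println(map.get(10));
        System.out.println(map.getOrDefault(11, -1)); // default is returned for the missing key
        System.out.println(computeOnMissingKeyThrows());
    }
}
```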

From past to present: the default keyword

  • Java 8 added many methods to the collection interfaces, so why are Java 7 classes that implement these interfaces not forced to implement the new methods?
    Mainly because the new methods are declared with the default keyword. Once a method in an interface is marked default, its default implementation must be written in the interface itself, and implementing classes are not forced to implement it. Java 7 implementations of those interfaces therefore do not notice the change.
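The mechanism can be sketched with a tiny interface. Everything in the example is invented for illustration (Greeter, OldGreeter, greet): the implementing class only provides the abstract method, yet it still gets the method that was added later with a default body.

```java
// Hypothetical demo of a default method added to an existing interface.
final class DefaultMethodDemo {
    interface Greeter {
        String name();

        // Added "later" with a default body: existing implementations keep compiling.
        default String greet() {
            return "Hello, " + name();
        }
    }

    // Written against the "old" interface: it implements only name().
    static final class OldGreeter implements Greeter {
        @Override
        public String name() {
            return "Java 7";
        }
    }

    public static void main(String[] args) {
        System.out.println(new OldGreeter().greet());
    }
}
```

This is the same pattern that lets Map gain methods such as getOrDefault or forEach without breaking older Map implementations.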

Summary: HashMap Past and Present

Java 8 builds on Java 7 with a number of improvements and optimizations, and bridges the two generations with the default keyword. HashMap was almost completely rewritten, and nearly all collection classes gained functional methods such as forEach, along with many other useful methods.

This article was written after a week of reading the source code and related material. If anything in it about HashMap is wrong, please correct me, thanks!


Origin www.cnblogs.com/fisherss/p/11701244.html