HashMap: storage structure, expansion mechanism, and solutions for thread-unsafety

1. What is HashMap

1.1 What is a Map?

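1. A collection type that stores data as key-value pairs.  
2. Keys must be unique; duplicate keys are not allowed.  
3. Map is an interface at the same level as Collection (it does not extend Collection).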
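
A minimal sketch of the points above, using only java.util. The class name MapBasics and the sample keys are made up for illustration; the point is only that a second put with the same key replaces the value instead of storing a duplicate key.

```java
import java.util.HashMap;
import java.util.Map;

class MapBasics {
    // Builds a small map; the second put reuses the key "yww".
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("yww", 222);
        map.put("yww", 333); // same key: the value is replaced, no duplicate key is stored
        map.put("abc", 1);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.size());     // 2
        System.out.println(map.get("yww")); // 333
    }
}
```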

1.2 What is a hash?

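1. A function that turns input of arbitrary length into a fixed-length hash value (in Java, through the hashCode() method) is called a **hash function**, and **the process of computing the hash value is called hashing**.  
2. The main applications of hashing are hash tables and distributed caches.  
3. A hash function is one implementation of a hash algorithm.  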
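
As a rough illustration of the definition above, the following sketch calls String.hashCode() on inputs of different lengths. The class HashDemo and its helper collide are hypothetical names; the strings "Aa" and "BB" are a well-known pair that share one hash value.

```java
class HashDemo {
    // True when two different strings share one hash value (a collision).
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Inputs of any length map to a 32-bit int.
        System.out.println("yww".hashCode());
        System.out.println("a much longer key than the first one".hashCode());
        // Equal inputs always give equal hash values.
        System.out.println("yww".hashCode() == new String("yww").hashCode()); // true
        // Different inputs can share a hash value.
        System.out.println(collide("Aa", "BB")); // true
    }
}
```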

Reference: blog.csdn.net/qq_36711757...

1.3 What is a HashMap?

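1. An implementation of the Map interface based on a hash table.   
2. A HashMap has both the key-value characteristics of a Map and the characteristics of a hash table.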

2. The nature of HashMap

It is used for storage.  

example:

    Map<String, Integer> map = new HashMap<>();
    map.put("yww", 222);
    System.out.println(map);

put function:

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
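
put hands hash(key) to putVal, which then picks the slot with (n - 1) & hash. The sketch below shows those two steps in isolation. The spread method mirrors what JDK 8's HashMap.hash does to the best of my knowledge (XOR the high 16 bits into the low 16 bits); treat it as an assumption rather than a copy of the source. IndexDemo, spread and indexFor are illustrative names.

```java
class IndexDemo {
    // Mixes the high 16 bits of hashCode() into the low 16 bits; a null key hashes to 0.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Slot index for a table of length n; n must be a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(spread("yww"), 16)); // 8
        System.out.println(indexFor(spread(null), 16));  // 0
    }
}
```

Because the mask keeps only the low bits, the spreading step lets the high bits of hashCode() influence the slot even in a small table.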

putVal function:

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // create the table if it is null or empty
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // compute the index; if the slot is empty, put the first node there
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // the first node in the bin has the same key: remember it so its value is overwritten below
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // the bin is a red-black tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // the bin is a linked list
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
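                        // the list now holds more than 8 nodes: call treeifyBin, which converts the bin
                        // to a red-black tree (or resizes instead while the table is shorter than 64)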
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // the key already exists in the list: stop here, its value is overwritten below
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
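        // modCount records how many times the internal structure of the HashMap has changed; it is used for fail-fast iteration
        ++modCount;
        // size is the number of key-value pairs actually stored; threshold is the largest element count allowed before the map must expand (resize)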
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

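
The return value and the onlyIfAbsent flag of putVal can be observed through the public API. In this sketch, put corresponds to onlyIfAbsent = false as shown above; putIfAbsent is, as far as I know, the public method that passes onlyIfAbsent = true in JDK 8. PutDemo is a made-up class name.

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    static String run() {
        Map<String, Integer> map = new HashMap<>();
        Integer r1 = map.put("yww", 222);         // null: there was no previous mapping
        Integer r2 = map.put("yww", 333);         // 222: the existing value is replaced
        Integer r3 = map.putIfAbsent("yww", 444); // 333: the existing value is kept
        return r1 + "," + r2 + "," + r3 + "," + map.get("yww");
    }

    public static void main(String[] args) {
        System.out.println(run()); // null,222,333,333
    }
}
```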

3. HashMap storage structure

  • Array: stored contiguously; lookup O(1); insertion and deletion O(n);
  • Linked list: not stored contiguously; lookup O(n); insertion and deletion O(1);
  • It is called a HashMap because put(String, Object) hashes the key object (every class inherits from Object, and the hashCode method comes from Object) to obtain its hash value, and the map then uses that hash value to decide where the element is stored.
  • When two keys map to the same location, a hash collision occurs. Java's HashMap combines an array with linked lists: with separate chaining (the chain-address method), elements that map to the same index form a linked list whose head is stored in the Node[] slot, which resolves the collision (see the HashMap figure above).
  • After a hash collision, a slot no longer holds a single Entry but several, kept in a linked list (see the HashMap data structure). A lookup traverses the entries in order until it finds the matching one (a linked-list search).
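
A small sketch of the collision case described above. "Aa" and "BB" have the same hashCode (2112), so they map to the same slot, yet both entries remain retrievable because lookups compare keys with equals while walking the bin. CollisionDemo is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1); // "Aa".hashCode() == 2112
        map.put("BB", 2); // "BB".hashCode() == 2112 as well, so both keys share one slot
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```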

3.1 Before JDK 1.8: array + linked list

Array + linked list: separate chaining (the chain-address method)

3.2 JDK 1.8 and later: array + linked list + red-black tree

Array + linked list + red-black tree

  • When a HashMap stores a key-value pair, the key alone determines the slot of each entry (the index into Node[]). The value is attached to that slot as part of a linked list (from 1.8 on, a list longer than 8 is converted into a red-black tree).
  • The putVal function shows that storage uses an array + linked list + red-black tree.
  • When a hash collision occurs, 1.8 does not use head insertion; the new node is appended to the tail of the linked list, see the putVal function.
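
To make the array + linked list layout and tail insertion concrete, here is a deliberately simplified chained table. It is a toy under stated assumptions (fixed 16 slots, String keys, no resizing, no trees, no hash spreading), not the JDK implementation; ChainedTable and its methods are hypothetical names.

```java
class ChainedTable {
    static final class Node {
        final String key;
        int value;
        Node next;
        Node(String key, int value) { this.key = key; this.value = value; }
    }

    private final Node[] slots = new Node[16]; // length is a power of two

    private int indexOf(String key) {
        return (slots.length - 1) & key.hashCode();
    }

    void put(String key, int value) {
        int i = indexOf(key);
        if (slots[i] == null) { slots[i] = new Node(key, value); return; }
        Node p = slots[i];
        while (true) {
            if (p.key.equals(key)) { p.value = value; return; }            // existing key: overwrite
            if (p.next == null) { p.next = new Node(key, value); return; } // tail insertion
            p = p.next;
        }
    }

    Integer get(String key) {
        for (Node p = slots[indexOf(key)]; p != null; p = p.next)
            if (p.key.equals(key)) return p.value;
        return null;
    }

    // Keys stored in the slot that `key` maps to, in list order.
    String chainOf(String key) {
        StringBuilder sb = new StringBuilder();
        for (Node p = slots[indexOf(key)]; p != null; p = p.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(p.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable();
        t.put("Aa", 1);
        t.put("BB", 2); // collides with "Aa"
        t.put("C#", 3); // collides as well
        System.out.println(t.chainOf("Aa")); // Aa->BB->C#
    }
}
```

The chain prints in insertion order because each colliding key is appended at the tail rather than pushed onto the head.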

Reference: blog.csdn.net/maohoo/arti...

The difference is:

  • In 1.7, a bucket entry is an Entry<K,V>.

  • In 1.8, it is a Node<K,V>, or a TreeNode<K,V> inside a tree bin.

  • All of them implement the Map.Entry interface.

In 1.8, TreeNode and LinkedHashMap.Entry are subclasses of Node, so a Node can be part of a linked list or part of a red-black tree. Keys with the same hashCode are directed to the same slot of the Node array. While a slot holds no more than eight keys, they are stored as a linked list. Once it holds more than eight, treeifyBin is called to convert the list into a red-black tree (if the array is still shorter than 64 slots, treeifyBin resizes the array instead).
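
The list-to-tree conversion is not visible through the public API, but its effect can be exercised with a key type whose hashCode is constant, so that every key lands in one slot. The sketch below only checks that such a map stays correct; BadKey and TreeBinDemo are hypothetical names. Implementing Comparable is a choice made here because, to my understanding, tree bins can use compareTo to order keys with equal hashes.

```java
import java.util.HashMap;
import java.util.Map;

class TreeBinDemo {
    // Hypothetical key type: every instance reports the same hash, so all keys collide.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
    }

    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) map.put(new BadKey(i), i);
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(1000);
        System.out.println(map.size());               // 1000
        System.out.println(map.get(new BadKey(500))); // 500
    }
}
```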

3.3 Why add a red-black tree to the storage structure?

  • What is a red-black tree: a red-black tree is a self-balancing binary search tree.

  • How a red-black tree stays balanced: it maintains its rules through "recoloring" and "rotation". Recoloring turns a black node red or a red node black; rotation is either a "left rotation" or a "right rotation".

  • Advantage: when the amount of data is large, lookup is more efficient than in a linked list, improving from O(n) to O(log n).

  • Why use a red-black tree only for bins with more than eight nodes:
    a red-black tree needs left and right rotations on insertion, while a singly linked list does not. Comparing the singly linked list with the red-black tree:

    Fewer than 8 elements (linked list): lookups cost more, insertions cost less
    More than 8 elements (red-black tree): lookups cost less, insertions cost more

4. The expansion mechanism

4.1 Conditions that trigger expansion

In JDK 1.7, both of the following must hold:

1. When a new value is stored, the current number of elements is greater than or equal to the threshold.

2. The array slot that the new key's hash maps to already holds data, that is, a hash collision occurs at that index.

In JDK 1.8 the second condition is gone. putVal decides whether to expand from the element count alone:

    if (++size > threshold)
        resize();

Because JDK 1.7 requires both conditions, the following can happen there (in JDK 1.8 a map with the default settings expands no later than when the 13th element is added):

(1) With the defaults (capacity 16, load factor 0.75, threshold 12), the map can hold 16 values without expanding and expand only when the 17th value is stored, because each of the first 16 values occupies its own slot in the underlying array and no hash collision occurs.

(2) It is also possible to store more than 16 values, for example 26, without expansion. The first 11 values all collide and land in the same array slot (the element count is below the threshold of 12, so no expansion). The next 15 values are spread over the 15 remaining slots (the element count is no longer below the threshold, but none of these insertions collides, so still no expansion). That makes 11 + 15 = 26 values. Storing the 27th satisfies both conditions, and only then does the map expand.
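
The `++size > threshold` check quoted from putVal can be simulated without touching HashMap internals. The sketch below assumes the default capacity of 16 and load factor of 0.75, applies only that one rule, and ignores everything else (for instance resizes triggered from treeifyBin). ResizeSim and capacityAfter are hypothetical names.

```java
class ResizeSim {
    // Table capacity after inserting n distinct keys, assuming the only
    // expansion rule is "++size > threshold" with default settings.
    static int capacityAfter(int n) {
        int cap = 16;
        int threshold = (int) (cap * 0.75f); // 12
        for (int size = 1; size <= n; size++) {
            if (size > threshold) {
                cap <<= 1;                        // double the capacity
                threshold = (int) (cap * 0.75f);  // 24, 48, ...
            }
        }
        return cap;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12)); // 16
        System.out.println(capacityAfter(13)); // 32
        System.out.println(capacityAfter(24)); // 32
        System.out.println(capacityAfter(25)); // 64
    }
}
```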

4.2 Source code walkthrough

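final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table; // the array currently holding all elements, called the old array
    int oldCap = (oldTab == null) ? 0 : oldTab.length; // length of the old array
    int oldThr = threshold; // the old resize threshold
    int newCap, newThr = 0; // new capacity and new threshold; only newThr is initialized to 0 here
    if (oldCap > 0) {   // the old array has length > 0, so the table has already been allocated
        // PS1
        if (oldCap >= MAXIMUM_CAPACITY) { // the old capacity has reached the maximum capacity (2^30)
            // set the threshold to Integer.MAX_VALUE (2^31 - 1), because doubling oldCap would overflow
            threshold = Integer.MAX_VALUE;
            return oldTab;  // return the old array unchanged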
        }

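       /*
        * Otherwise the new capacity is twice the old capacity (a left shift by 1 multiplies by 2).
        * If the new capacity is below the maximum capacity and the old capacity is at least the
        * default initial capacity (16), the new threshold is also twice the old threshold.
        */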
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }

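    // PS2
    // reaching this else-if means the table has not been allocated yet
    // if the old threshold is greater than 0, use it as the new capacity
    // this happens when an initial capacity was passed to the constructor
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // reached when the map was created with the no-arg constructor and this is the first insertion
        newCap = DEFAULT_INITIAL_CAPACITY;  // the new capacity is 16
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); // the new threshold is 16 * 0.75 = 12; 0.75 is the load factor (the map expands once the element count exceeds three quarters of the capacity)
    }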

    // the new threshold is still 0 (for example in the PS2 case)
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);  // see PS2
    }
    threshold = newThr; // store the new threshold in the map
    @SuppressWarnings({"rawtypes","unchecked"})
        // create the new array (on the first insertion this is the initial array;
        // when oldTab exists, this is the larger array being expanded into)
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab; // point the map's table field at the new array
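    if (oldTab != null) {   // the old array exists, so this is an expansion and the elements must be moved
        for (int j = 0; j < oldCap; ++j) { // walk the old array
            Node<K,V> e;
            if ((e = oldTab[j]) != null) { // the slot is not empty, so its elements move to the new array
                oldTab[j] = null; // drop the old array's reference to the moved elements (so the old array can be garbage collected)
                if (e.next == null) // no next node: the slot holds a single element, no hash collision
                    // PS3
                    // store the element in the new array; its slot is the hash value modulo the array length
                    // [hash % array length] == [hash & (array length - 1)] for non-negative hash values
                    // this bitwise form of modulo requires the array length to be a power of two, yet the
                    // constructor accepts any initial capacity. What about 17 or 15? No problem: tableSizeFor
                    // rounds the requested capacity up to the nearest power of two that is not smaller
                    // than it. 15 -> 16, 17 -> 32
                    newTab[e.hash & (newCap - 1)] = e;

                    // if the element has a next node, the slot holds a linked list (several elements whose
                    // hashes map to the same index are chained in this slot of the old array)
                    // example: with array length 16, elements with hash 1 (1 % 16 = 1) and hash 17 (17 % 16 = 1)
                    // both sit in the 2nd position (index 1). After expanding to 32, hash 1 (1 % 32 = 1) stays
                    // in the 2nd position, but hash 17 (17 % 32 = 17) belongs in the 18th position.
                    // so after an expansion the position of every element has to be recomputed.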


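                else if (e instanceof TreeNode)  // the node is a TreeNode
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);  // the tree case is a separate topic
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;  // head and tail of the "low" list
                    Node<K,V> hiHead = null, hiTail = null;  // head and tail of the "high" list
                    // "low" means indices 0 to oldCap - 1 of the new array; "high" means oldCap to newCap - 1
                    Node<K,V> next;
                    // walk the linked list
                    do {
                        next = e.next;
                        // a neat trick: AND the element's hash with the length of the old array
                        // PS3 noted that the array length is always a power of two (for example 16), so
                        // oldCap has exactly one bit set. If hash & oldCap is 0, that bit of the hash is 0
                        // (for example hash 1 or hash 33), so hash & (newCap - 1) equals hash & (oldCap - 1):
                        // the element keeps its old index in the new array and goes into the low list.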
                        if ((e.hash & oldCap) == 0) {  
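                            // PS4
                            if (loTail == null) // no tail yet, so the list is empty
                                loHead = e; // the list is empty: the head points at this element
                            else
                                loTail.next = e; // there is a tail, so the list is not empty: append the element
                            loTail = e; // the current element becomes the tail
                        }

                        // if the AND result is not 0, the oldCap bit of the hash is set (for example hash 17)
                        // the element then belongs in the high half of the new array
                        // example: old length 16, new length 32; hash 17 goes to index 17 (the 18th position),
                        // which is in the high half: low is [0-15], high is [16-31]
                        else {  // same logic as PS4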
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) { // the low list stays at the original index
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {  // the high list moves by exactly oldCap positions from the original index
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead; // example: hash 17 is at index 1 in the old array and index 17 in the new one;
                        // hash 18 is at index 2 in the old array and index 18 in the new one
                    }
                }
            }
        }
    }
    return newTab; // return the new array
}

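
Two ideas from the resize code can be checked in isolation: rounding a requested capacity up to a power of two, and the low/high split decided by `hash & oldCap`. The tableSizeFor below follows the bit-smearing form I recall from JDK 8 and should be read as a sketch, not as the authoritative source; SplitDemo and newIndex are hypothetical names.

```java
class SplitDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two that is >= cap (capped at MAXIMUM_CAPACITY).
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Index of a node after the table doubles from oldCap to 2 * oldCap,
    // computed the way resize() does: stay at j, or move to j + oldCap.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(15)); // 16
        System.out.println(tableSizeFor(17)); // 32
        for (int hash : new int[] {1, 17, 33, 49}) {
            // the split result always matches a full recomputation with the new mask
            System.out.println(hash + ": " + newIndex(hash, 16) + " == " + (hash & 31));
        }
    }
}
```

Hashes 1 and 33 stay at index 1, while 17 and 49 move to index 17, which is why resize() can split one list into two without recomputing each index from scratch.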

Reference: blog.csdn.net/maohoo/arti...

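5. Thread-unsafety

HashMap is not synchronized. If multiple threads access it at the same time, the data may become inconsistent.
This shows up mainly as:

In JDK 1.7, expanding in a multithreaded environment can create a circular linked list or lose data.
In JDK 1.8, concurrent writes in a multithreaded environment can overwrite each other's data.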

5.1 Collections.synchronizedMap

Map<String, String> synchronizedHashMap = Collections.synchronizedMap(new HashMap<String, String>());
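
A hedged usage sketch for the wrapper: single calls such as put are synchronized by the wrapper, but the Collections.synchronizedMap documentation asks callers to synchronize on the map themselves while iterating over its views. SyncMapDemo, the thread count and the key format are choices made for this illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SyncMapDemo {
    // Iteration spans many calls, so the whole loop is guarded by the map's own lock.
    static int sum(Map<String, Integer> syncMap) {
        int total = 0;
        synchronized (syncMap) {
            for (int v : syncMap.values()) total += v;
        }
        return total;
    }

    static int run() {
        Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) map.put(id + "-" + i, 1); // distinct keys per thread
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sum(map);
    }

    public static void main(String[] args) {
        System.out.println(run()); // 4000
    }
}
```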

5.2 ConcurrentHashMap

Map<String, String> concurrentHashMap = new ConcurrentHashMap<>();
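
As a sketch of why ConcurrentHashMap is often preferred for shared counters: its merge method performs the read-modify-write as one atomic step, so concurrent increments are not lost. CounterDemo, the key "hits" and the thread count are assumptions made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;

class CounterDemo {
    static int run() {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    counts.merge("hits", 1, Integer::sum); // atomic increment of the shared counter
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(run()); // 4000
    }
}
```

A plain get followed by put would not give the same guarantee, even on a ConcurrentHashMap, because the two calls are separate steps.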

5.3 Hashtable

Map<String, String> hashtable = new Hashtable<>();
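
One practical difference to keep in mind when swapping implementations, shown as a small sketch: HashMap accepts a null key, while Hashtable and ConcurrentHashMap throw NullPointerException. NullKeyDemo and rejectsNullKey are hypothetical names.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NullKeyDemo {
    // True when the given map refuses a null key.
    static boolean rejectsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsNullKey(new HashMap<>()));           // false
        System.out.println(rejectsNullKey(new Hashtable<>()));         // true
        System.out.println(rejectsNullKey(new ConcurrentHashMap<>())); // true
    }
}
```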

Reference: www.cnblogs.com/supiaopiao/...


Origin juejin.im/post/5d464477f265da03b31bb0ee