[Java]-HashMap source code analysis

Preface

The underlying structure of HashMap is an array plus linked lists (and, since JDK 1.8, red-black trees); in other words, separate chaining is used to resolve hash collisions. Each array slot holds a linked list of the elements whose hashes map to that slot. The most commonly used methods of this structure are put() and get().

some static constants

//Default initial table size, used when the user constructs a HashMap without specifying one; the capacity must be a power of 2
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
//Maximum table size; if the user requests a larger capacity, this value is used instead
//(it must be a power of 2, and 1 << 31 would already be negative)
static final int MAXIMUM_CAPACITY = 1 << 30;
//Default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;
//A bucket's linked list is converted into a red-black tree once it holds 8 nodes
static final int TREEIFY_THRESHOLD = 8;
//A treeified bucket is converted back into a linked list when it shrinks to 6 nodes during resize
static final int UNTREEIFY_THRESHOLD = 6;
//Treeification only happens when the table has at least 64 buckets; smaller tables are resized instead
static final int MIN_TREEIFY_CAPACITY = 64;

Why the load factor is 0.75 by default

According to the official comment, 0.75 is a good trade-off between space and time. If the load factor is too high, say 1.0, then because of hash collisions some buckets will already hold many elements by the time the resize threshold is reached, so the linked lists grow long or the red-black trees grow tall. Space utilization improves, but queries and insertions into those crowded buckets become slower.

If the load factor is too low, say 0.5, the table is resized when only half of it is occupied. Each bucket then holds fewer elements and lookups are faster, but storage space is wasted, space utilization drops, and resizing happens more often.

With 0.75, time efficiency and space efficiency are both reasonable and a good balance is struck, which is why 0.75 was chosen as the default load factor. For example, with the default capacity of 16 the resize threshold is 16 * 0.75 = 12, so the table doubles when the 13th entry is inserted.

The Node class (Entry)

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

The Node class overrides the hashCode method: the hash code is computed by XOR-ing the hash code of the key with the hash code of the value.

Member variables

/*
   The array that holds the entries; it is initialized the first time it is used
   and resized when necessary. Its length must be a power of 2.
 */
transient Node<K,V>[] table;
/*
   The cached set view of the key-value entries.
 */
transient Set<Map.Entry<K,V>> entrySet;
/*
   The number of key-value mappings stored in the map.
 */
transient int size;
/*
   The number of times the map has been structurally modified.
 */
transient int modCount;
/*
   The threshold for the next resize: the table grows once size reaches this value.
   It equals the table length times the load factor (capacity * load factor).
*/
int threshold;

/*
   The load factor of the hash table.
 */
final float loadFactor;

For more on the modCount field, see the ArrayList source code analysis.

hash method

First, hashCode and hash need to be distinguished. hashCode() is a method declared on Object that returns the object's hash code; if it is not overridden, the implementation in Object is used. The hash value that HashMap actually uses for a key is derived from its hashCode() by the hash method, implemented as follows:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

This produces the hash value used for the key inside the map. The index of the bucket where the entry is stored in the table is then obtained by bitwise ANDing this hash with (table capacity - 1); see the code later on.

Note that when the key is null the hash method returns 0, which means HashMap allows a null key, stored with hash 0. When the key is not null, its hashCode is XORed with the same hashCode shifted right (unsigned) by 16 bits. If the hashCode lies between 0 and 65535, its upper 16 bits are all 0, so the shifted value is 0 and the XOR leaves the hashCode unchanged; only when the hashCode is greater than 65535 does the result differ from the original hashCode.

A good hash function should spread its results as widely and evenly as possible. By folding the high 16 bits of the hashCode into the low 16 bits, this step lets the high bits influence the bucket index even though only the low bits take part in (n - 1) & hash, which keeps the results well dispersed.
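
To see why the spread matters, here is a minimal sketch (the class name and the two sample hash codes are made up for illustration): with a table of 16 buckets only the low 4 bits of the hash choose the bucket, so two hash codes that differ only in their high bits would collide without the XOR step.

public class HashSpreadDemo {
    // The same spreading step HashMap applies to a key's hashCode
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int capacity = 16;            // table size, a power of 2
        int h1 = 0x0001_0001;         // two hypothetical hashCodes that share
        int h2 = 0x0FFF_0001;         // the same low 16 bits

        // Without spreading, both land in bucket 1
        System.out.println((h1 & (capacity - 1)) + " and " + (h2 & (capacity - 1)));

        // With spreading, the high bits influence the index and the keys separate
        System.out.println((spread(h1) & (capacity - 1)) + " and "
                         + (spread(h2) & (capacity - 1)));
    }
}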

Construction method

Constructor specifying initial capacity and load factor

Create an empty HashMap and specify the initial capacity and load factor:

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

Since the table capacity must be a power of 2, the constructor needs to turn the user-specified initial capacity into a power of 2. The method tableSizeFor computes the smallest power of 2 that is greater than or equal to that capacity, and the result is stored in threshold for the time being (it becomes the actual table capacity at the first resize):

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;   //1
    n |= n >>> 2;   //2
    n |= n >>> 4;   //3
    n |= n >>> 8;   //4
    n |= n >>> 16;  //5
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

cap lies in the range [0, MAXIMUM_CAPACITY]. First consider statements 1 to 5 for n > 0 (the cases n = -1 and n = 0 are handled separately below). n has at least one 1 bit; look only at the highest one and suppose n is 0001xxxx in binary. n >>> 1 shifts that highest 1 right by one position, giving 00001xxx, and the bitwise OR then sets both the highest 1 and the bit to its right, giving 00011xxx. Shifting right by 2 and OR-ing again gives 0001111x, and continuing this way eventually sets every bit below the highest 1, so n becomes 00011111.... After statement 5, therefore, n is the original n with all bits below its highest 1 set to 1, and the method returns n + 1, which is the smallest power of 2 strictly greater than the original value of n, i.e. greater than cap - 1. Hence the method returns cap itself when cap is a power of 2, and in general it returns the smallest power of 2 that is greater than or equal to cap.

When cap is 0, n = -1; its two's-complement sign bit is 1 and stays 1 through the shifts and ORs, so n is still negative at the end and the return statement yields 1.

When cap is 1, n = 0; the shifts and ORs leave it at 0, and the return value is again 1.

To summarize: when the user specifies an initial capacity, the constructor computes the smallest power of 2 greater than or equal to it and assigns that number to threshold, the threshold for the next resize, but it does not create the table array. The array is only allocated on the first put, when a table of that capacity is created. This is lazy initialization. The benefit: if the constructor allocated the backing array immediately but the user never stored anything in the map (for example, no reference ever keeps the HashMap alive), the array would later be garbage collected without ever being used, wasting both the memory and the work spent allocating it. With lazy allocation, that useless overhead is avoided because the space is only allocated when the user actually needs it.
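
As a quick sanity check of tableSizeFor, here is a standalone re-implementation of the same algorithm (the demo class itself is made up; the bit tricks are copied from the JDK 8 source above):

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same algorithm as HashMap.tableSizeFor in JDK 8
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] inputs = {0, 1, 2, 3, 15, 16, 17, 1000};
        for (int cap : inputs) {
            // Prints 1, 1, 2, 4, 16, 16, 32, 1024
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}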

Constructor specifying initial capacity

The user specifies the initial capacity, and the load factor uses the default 0.75

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

No-argument constructor

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

get method

public V get(Object key) {
    Node<K,V> e;
    //1. Delegate to getNode
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    //2. Use the hash value to find the array index (bucket) where the key would be stored
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        //3. Check whether the first node in the bucket (head of the list, or root of the tree) is the key we want; if so, return it
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            //4. Otherwise, if the bucket holds a red-black tree, continue the search with getTreeNode
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                //5. Otherwise walk the rest of the linked list looking for the key
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

put method

The put method adds a key-value pair. If the key already exists in the map, the value mapped to it is replaced and the old value is returned; if the key does not exist, null is returned. Of course, a return value of null may also mean that the key existed and was mapped to null.

Because HashMap allows null values, this creates an ambiguity: when get returns null for a key, you cannot tell whether the key is absent or mapped to null. A workaround is to call containsKey first and call get only if the key exists. But this workaround is itself not thread-safe: between the containsKey check and the get, another thread might remove the key, so get returns null even though the key was mapped to a non-null value just before it was removed.
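
A small single-threaded illustration of the ambiguity (the keys are arbitrary examples):

import java.util.HashMap;
import java.util.Map;

public class NullValueAmbiguityDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", null);                       // key present, value is null

        // Both calls print null, for different reasons
        System.out.println(map.get("a"));         // null: key exists, value is null
        System.out.println(map.get("b"));         // null: key does not exist

        // containsKey disambiguates, but only safely without concurrent writers
        System.out.println(map.containsKey("a")); // true
        System.out.println(map.containsKey("b")); // false
    }
}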

public V put(K key, V value) {
    //1. Delegate to putVal
    return putVal(hash(key), key, value, false, true);
}

/**
 * hash: the hash value of the key
 * key: the target key
 * value: the value to associate with the key
 * onlyIfAbsent: if true, an existing value is not replaced
 * evict: if false, the table is in creation mode
 *
 * Returns the value previously associated with the key, or null if the key was absent.
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    //2. If the table is null or has length 0, call resize to allocate it
    //   (the constructors do not initialize the table; allocation is deferred to the first put)
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    //3. If the bucket for this hash is empty, simply create a new node there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        //4. Otherwise, check whether the first node in the bucket already holds the key
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k)))) //note that null keys are allowed
            e = p;
        //5. If not, and the bucket holds a red-black tree, delegate the insertion to putTreeVal
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        //6. Otherwise it is a linked list: walk it looking for the key; if it is absent,
        //   append a new node at the tail, else remember the node that was found
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    //Reaching null means the list now holds binCount + 1 nodes, so the
                    //treeify check binCount + 1 >= TREEIFY_THRESHOLD (8) becomes:
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        //7. If e is not null, the key already existed; when onlyIfAbsent is false or the old
        //   value is null, replace the value, then return the old value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    //8. A new node was added, so check whether the new size exceeds the resize threshold,
    //   and resize if it does
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

putIfAbsent

@Override
public V putIfAbsent(K key, V value) {
    return putVal(hash(key), key, value, true, true);
}

The only difference from the put method is that the fourth argument passed to putVal is true, which means an existing value is not replaced (if the key is already present).
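
A brief usage sketch (the keys and values are arbitrary examples):

import java.util.HashMap;
import java.util.Map;

public class PutIfAbsentDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        map.put("a", 1);
        map.putIfAbsent("a", 2);            // "a" is already present, value stays 1
        map.putIfAbsent("b", 3);            // "b" is absent, so it is inserted

        System.out.println(map.get("a"));   // 1
        System.out.println(map.get("b"));   // 3

        // Matching putVal's oldValue == null branch, a key mapped to null IS replaced
        map.put("c", null);
        map.putIfAbsent("c", 4);
        System.out.println(map.get("c"));   // 4
    }
}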

resize method

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    //Length of the old table
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    //Old resize threshold
    int oldThr = threshold;
    int newCap, newThr = 0;
    //Not the first resize
    if (oldCap > 0) {
        //If the table has already reached MAXIMUM_CAPACITY, it must not grow any further
        if (oldCap >= MAXIMUM_CAPACITY) {
            //Set the threshold to Integer.MAX_VALUE, which size can never exceed,
            //so no further resize will ever happen
            threshold = Integer.MAX_VALUE;
            //No resize needed, return the old table directly
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    //oldCap is 0, so this is the first resize
    else if (oldThr > 0) //oldThr (threshold) > 0 holds the capacity computed from the user-specified initial capacity
        newCap = oldThr;
    else {
        //oldThr is 0: the map was built with the no-argument constructor,
        //so use the default capacity DEFAULT_INITIAL_CAPACITY
        newCap = DEFAULT_INITIAL_CAPACITY;
        //The next resize threshold is the current capacity times the load factor
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    //If newThr has not been set yet, compute it as newCap * loadFactor
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    //Allocate the new table
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    //Migrate the old entries
    if (oldTab != null) {
        //Iterate over every bucket (table[j]) of the old table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        //Decide which bucket the node goes to in the new table
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

Redistribution of elements

When a bucket holds a linked list, each node's hash value is ANDed with the old capacity during redistribution. The capacity is a power of 2, so exactly one bit of it is 1. If the node's hash also has a 1 in that bit, the AND result equals the old capacity; otherwise it is 0. Nodes for which the result is 0 stay at the same index j in the new table; the others move to bucket j + oldCap. Because the table doubles in size, each old bucket j maps to exactly two new buckets, j and j + oldCap, and this completes the redistribution of the old bucket's elements.
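
A small numeric illustration of the split rule (the class and the hash values are made up for the example):

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;                 // old table size
        int newCap = oldCap << 1;        // new table size, 32

        int[] hashes = {5, 21, 37, 53};  // all map to bucket 5 while the capacity is 16

        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            // The rule used by resize(): the bit with value oldCap decides stay vs. move
            int newIndex = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            // It matches a full recomputation against the new capacity
            System.out.println(h + ": old bucket " + oldIndex
                               + ", new bucket " + newIndex
                               + " (recomputed: " + (h & (newCap - 1)) + ")");
        }
    }
}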

Why the table array size must be a power of 2

The source code uses (n - 1) & hash to compute the index in the table at which an element is placed. When n is a power of 2, n - 1 has the binary form 0000...0111111; suppose its lowest k bits are all 1. By the rules of bitwise AND, (n - 1) & hash is then simply the lowest k bits of hash. Provided the hash values themselves are well dispersed, the indices computed for different elements are evenly distributed across the table, which is exactly what a good hash function should deliver.

We also know that bit operations are among the fastest operations a computer can perform. The usual way to map a hash value into a table of size n is hash % n, which guarantees that the index falls within the array bounds. When n is a power of 2 (and the hash is non-negative), hash % n = hash & (n - 1), so the bitwise AND keeps the index within range, computes faster, and still distributes the results evenly.

Proof: when n is a power of 2, hash % n = hash & (n - 1)

I searched but could not find a proof that hash % n = hash & (n - 1) when n is a power of 2, so I tried to prove it myself.

Write hash (assumed non-negative) as an m-bit binary number:

hash = a_0·2^0 + a_1·2^1 + a_2·2^2 + … + a_k·2^k + … + a_(m-1)·2^(m-1)

where each a_i is 0 or 1. Since n is a power of 2, let n = 2^k with k a non-negative integer and k < m. Regrouping the terms of hash gives:

hash = a_0·2^0 + a_1·2^1 + … + a_(k-1)·2^(k-1) + 2^k·(a_k + a_(k+1)·2 + … + a_(m-1)·2^(m-k-1))

Even if a_0 through a_(k-1) are all 1, the sum of the first k terms is at most 2^k - 1 < 2^k, so hash % 2^k, i.e. hash % n, equals the sum of those first k terms, which is exactly the lowest k bits of hash.

Now look at hash & (n - 1). Since n = 2^k, its binary form is a single 1 in bit k with all lower bits 0, so n - 1 is 0111...111 with the lowest k bits set and every higher bit 0. By the rules of bitwise AND, hash & (n - 1) is therefore also the lowest k bits of hash.

In summary, hash % n = hash & (n - 1): both equal the lowest k bits of hash.

So the requirement that the table size n be a power of 2 can also be understood this way: the natural way to compute an element's slot is hash % n, and because hash % n = hash & (n - 1) when n is a power of 2, requiring n to be a power of 2 allows the map to use the faster bitwise AND, hash & (n - 1), to compute the slot.
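
A quick check of the identity for a handful of non-negative hash values (the numbers are arbitrary):

public class ModVsMaskDemo {
    public static void main(String[] args) {
        int n = 16;                      // a power of 2
        int[] hashes = {0, 7, 16, 23, 123456, Integer.MAX_VALUE};

        for (int h : hashes) {
            // For non-negative h and n a power of 2, the two expressions agree
            System.out.println(h + ": " + (h % n) + " == " + (h & (n - 1)));
        }
    }
}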

About overriding hashCode() and equals()

  1. Object's equals compares two objects with the "==" operator, i.e. by reference address: it returns true only if they are literally the same object, and false otherwise. In String, by contrast, two objects are considered equal when their character contents are the same, which can certainly be the case for two objects at different addresses. If String kept the inherited reference comparison, equals would always return false for such objects and would not express the notion of equality the class needs, so the method has to be overridden.
  2. As to whether these two methods need to be overridden: simply put, override them when you need to use them and the intended meaning differs from the default implementation in Object. If you need your own hashCode semantics, override hashCode; if you need your own equals semantics, override equals.
  3. hashCode returns the object's hash code, and in general, if two objects are equal (equals returns true), we also want their hash codes to be equal. Hash-table structures such as HashMap first use hashCode (more precisely, the hash value derived from it) to compute the index in the table, and only call equals on the elements that collide at that index to weed out duplicates. If only equals is overridden and hashCode is not, two objects may be equal according to equals yet have different hash codes; the collision that "should" have happened never occurs, and two equal objects can end up in the collection (see the sketch after this list).
  4. The two methods are almost always used together with collections, which is why the standard advice is to override hashCode whenever equals is overridden.
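
A minimal sketch of what goes wrong when only equals is overridden (the Point class is invented for illustration):

import java.util.HashMap;
import java.util.Map;

public class EqualsWithoutHashCodeDemo {
    // Overrides equals but NOT hashCode (the classic mistake)
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }
        // hashCode() is inherited from Object, so two equal Points
        // usually end up in different buckets
    }

    public static void main(String[] args) {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "first");
        map.put(new Point(1, 2), "second");    // an "equal" key, but very likely a different bucket

        System.out.println(map.size());               // very likely 2, not 1
        System.out.println(map.get(new Point(1, 2))); // very likely null

        // The fix: also override hashCode, e.g. return Objects.hash(x, y)
    }
}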

Why both hashCode and equals

It is true that equals alone can determine whether two elements are the same, and that is ultimately what Map, Set and other collections that must reject duplicates need.

However, using only equals to check whether an element is among n elements requires up to n calls to equals. With hashCode, the element's bucket is located first, which narrows the possible conflicting elements down to that one bucket; only the elements in the bucket are then compared with equals. This greatly reduces the number of equals calls, and since equals is usually more expensive than hashCode, it noticeably improves the performance of the whole collection.

Why is a linked list converted into a red-black tree only when it reaches 8 nodes?

According to the developers' comments in the source code, a red-black tree node is roughly twice the size of an ordinary node, so a bucket is converted into a tree structure only once it reaches the treeify threshold TREEIFY_THRESHOLD.

With a hash function that distributes keys well, trees are very rarely needed: ideally, with random hash codes, the number of nodes in a bucket follows a Poisson distribution, and the probability of a single bucket holding 8 nodes is only about 0.00000006, which is already extremely small. That is why 8 was chosen as the treeify threshold.

When a bucket holds only a few nodes, a linked list is fast enough for lookups and there is no need for a red-black tree, whose nodes take up far more memory than ordinary nodes, so the linked list is preferred.

Why not use a red-black tree directly?

Why use a linked list at all when a bucket holds few entries, rather than a red-black tree, a binary search tree, or an AVL tree that would speed up lookups from the start? This comes down to the trade-off between time and space.

When hash collisions are rare, or a bucket holds only a few entries, a red-black tree saves very little lookup time, while put may actually become slower because it has to perform recoloring, rotations and other rebalancing work; in addition, each tree node takes up more memory. The cost outweighs the time saved.

Thread safety issues

In JDK 1.7 and earlier, HashMap's thread unsafety shows up in two ways: infinite loops in a bucket's linked list during resizing, and lost updates (data overwriting). Since JDK 1.8 the infinite-loop problem has been fixed, but lost updates are still possible.

infinite loop

The infinite loop originates in the transfer() method of JDK 1.7 and earlier, which migrates the linked list of a bucket during resizing:

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i]; //1
            newTable[i] = e;      //2
            e = next;
        }
    }
}

Suppose two threads both reach statement 1 at the same time. One of them is suspended for some reason, while the other continues normally through statement 2 and is then suspended as well. At this point the node currently being moved already has its next field pointing at the head of the new bucket, and that head is the very node being moved. When the first thread resumes, it again points the moved node's next field at the head of the new bucket, i.e. at the node itself, creating a cycle at that node. Any thread that later traverses that bucket's linked list spins forever, which amounts to a hang.

Since JDK 1.8, nodes are migrated to the new bucket with tail insertion instead of head insertion, which eliminates the infinite-loop problem.

data coverage

Because put is not thread-safe, two threads may put the same key at the same time, and one thread's write can be overwritten by the other's.

Another case is the check-then-act pattern: a thread first checks with containsKey that the key is absent and then puts it, but another thread may put the same key in between, which also leads to a logic error (see the sketch below).
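
A sketch of the check-then-act race on a plain HashMap (the class and method names are invented; the interleaving in the comments is one possible schedule, not a guaranteed outcome):

import java.util.HashMap;
import java.util.Map;

public class CheckThenActRace {
    private final Map<String, Integer> map = new HashMap<>();

    // NOT thread-safe: another thread may put the same key between the
    // containsKey check and the put, so one of the two writes is lost
    public void putOnce(String key, int value) {
        if (!map.containsKey(key)) {   // threads A and B may both see "absent" here
            map.put(key, value);       // both then put, and one value overwrites the other
        }
    }
}

On a shared map, the whole check-then-act sequence would need external synchronization, or a concurrent map as discussed next.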

Thread-safe ConcurrentHashMap

HashMap was designed for single-threaded use, so it is unsafe under concurrent access from multiple threads. When thread safety is required, ConcurrentHashMap should be used instead.

ConcurrentHashMap needs to be discussed in two stages: JDK 1.7 and earlier, and JDK 1.8 and later.

In JDK 1.7 and earlier, ConcurrentHashMap is based on an array + linked list structure and guarantees thread safety with segment locks: the array is divided into segments (16 by default), each with its own lock, so at most 16 threads can write to the same ConcurrentHashMap concurrently by default.

In JDK 1.8 and later, ConcurrentHashMap also introduced red-black trees and dropped segment locks in favor of CAS plus the synchronized keyword, giving finer-grained locking at the level of individual hash buckets.
Specifically, when putting a new key-value pair, the target bucket is computed first. If the bucket is empty, the node is installed with CAS; if it is not, the head node is locked with synchronized and the rest of the operation is completed under that lock. The whole logic runs in a loop until the value is stored successfully, so a failed CAS eventually falls through to the locking path. In short: an empty bucket is modified with CAS, a non-empty bucket by locking its head node, which gives bucket-level concurrency.
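
A short usage sketch for the same put-once scenario as above (the class is invented; putIfAbsent and computeIfAbsent on ConcurrentHashMap are atomic per key):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutOnce {
    private final Map<String, Integer> map = new ConcurrentHashMap<>();

    // Thread-safe: putIfAbsent performs the check and the insert atomically
    public void putOnce(String key, int value) {
        map.putIfAbsent(key, value);
    }

    // Thread-safe lazy initialization: the mapping function runs at most
    // once per key even under contention (null keys/values are not allowed)
    public int getOrCreate(String key) {
        return map.computeIfAbsent(key, k -> k.length());
    }
}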

Origin blog.csdn.net/Pacifica_/article/details/123450370