A detailed explanation of the HashMap infinite-loop problem under multithreading

Hello everyone. While reading a technical blog today I came across a very interesting question, the one in the title: the infinite loop of HashMap under multi-threading. I remember seeing this problem when I first learned JavaSE; back then my knowledge was not enough and I did not study it deeply. Today I will go through it in detail, and I hope it helps you so we can make progress together.

Start of text:

Java's HashMap is not thread-safe. ConcurrentHashMap should be used in multithreading.

Problems with HashMap under multi-threading (here we mainly talk about the infinite loop problem):

1. After multi-threaded put operations, a get operation can fall into an infinite loop.
2. After multi-threaded puts of non-null elements, a get operation may return null.
3. Multi-threaded put operations can cause elements to be lost.

1. Why does an infinite loop appear in JDK 7?

(This only happens when a non-thread-safe HashMap is used by multiple threads; it can never happen in single-threaded code.)

HashMap uses a linked list to resolve hash conflicts. Because it is a linked-list structure, it is possible for the list to form a cycle; once that happens, any thread whose get operation lands on that bucket will loop forever.

In the single-threaded case, only one thread ever operates on the HashMap's data structure, so a closed loop cannot be produced.

A cycle can only appear under multi-threaded concurrency, specifically during put operations. If size exceeds capacity * loadFactor (the threshold), the HashMap performs a rehash, during which its internal structure changes drastically. If two threads trigger the rehash at the same time, a cycle in a bucket's linked list can be created.
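As a rough illustration, here is a minimal stress test of my own (the class name, key counts and sleep time are arbitrary, not from the original post). Run on JDK 7 or earlier, the concurrent puts occasionally leave a bucket's linked list cyclic, and either the puts themselves or the final get() may spin forever; on JDK 8+ it will not loop, but it can still lose elements.

	import java.util.HashMap;
	import java.util.Map;

	public class HashMapInfiniteLoopDemo {
		// A plain, non-thread-safe HashMap shared by all threads.
		private static final Map<Integer, Integer> map = new HashMap<Integer, Integer>(2);

		public static void main(String[] args) throws InterruptedException {
			// Several threads insert disjoint key ranges concurrently,
			// repeatedly forcing resize()/transfer() to run in parallel.
			for (int t = 0; t < 4; t++) {
				final int offset = t * 100000;
				new Thread(new Runnable() {
					public void run() {
						for (int i = 0; i < 100000; i++) {
							map.put(offset + i, offset + i);
						}
					}
				}).start();
			}
			Thread.sleep(2000);
			// On JDK 7 this lookup (or the puts above) may never return
			// if a bucket's linked list has become cyclic.
			System.out.println(map.get(11));
		}
	}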

2. How does it happen?

Storing data with put():

	public V put(K key, V value)
	{
		......
		// Compute the hash value of the key
		int hash = hash(key.hashCode());
		int i = indexFor(hash, table.length);
		// If the key is already present, replace the old value (walk the chain)
		for (Entry<K,V> e = table[i]; e != null; e = e.next) {
			Object k;
			if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
				V oldValue = e.value;
				e.value = value;
				e.recordAccess(this);
				return oldValue;
			}
		}
		modCount++;
		// The key is not present, so add a new entry
		addEntry(hash, key, value, i);
		return null;
	}

When we put an element into the HashMap, we first compute its position (the array index) from the hash of the key, and then place the element at that slot.
If other elements are already stored at that slot, the elements sharing the slot are kept in a linked list, with the newly added element at the head of the chain and the previously added elements behind it.

Checking whether the size exceeds the threshold with addEntry():

	void addEntry(int hash, K key, V value, int bucketIndex)
	{
		Entry<K,V> e = table[bucketIndex];
		table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
		// Check whether the current size exceeds the configured threshold; if so, resize
		if (size++ >= threshold)
			resize(2 * table.length);
	}
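For reference, the JDK 7 Entry constructor looks roughly like the sketch below: it stores the old bucket head e as its next pointer, which is exactly the head-insertion behaviour described above.

	Entry(int h, K k, V v, Entry<K,V> n)
	{
		value = v;
		next = n;   // the previous head of the bucket becomes this node's successor
		key = k;
		hash = h;
	}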

If the size now exceeds the threshold, then a resize operation is required, a new hash table with a larger size is created, and the data is migrated from the old Hash table to the new Hash table.

Adjusting the hash table size with resize():

	void resize(int newCapacity)
	{
		Entry[] oldTable = table;
		int oldCapacity = oldTable.length;
		......
		// Create a new hash table
		Entry[] newTable = new Entry[newCapacity];
		// Migrate the data from the old hash table to the new one
		transfer(newTable);
		table = newTable;
		threshold = (int)(newCapacity * loadFactor);
	}


When the capacity of the table[] array is small, hash collisions are more likely, so the size and capacity of the hash table matter a great deal.

Whenever data is inserted into the hash table, the map checks whether its size exceeds the configured threshold. If it does, the table has to be enlarged; this process is called resize.

When multiple threads add elements to the HashMap at the same time and several of them trigger a resize, there is a certain probability of producing an infinite loop, because every resize has to remap the old data onto the new hash table. That remapping is done in HashMap#transfer(), shown below:

	void transfer(Entry[] newTable)
	{
		Entry[] src = table;
		int newCapacity = newTable.length;
		// The loop below takes each element off the old table
		// and re-inserts it into the new table.
		for (int j = 0; j < src.length; j++) {
			Entry<K,V> e = src[j];
			if (e != null) {
				src[j] = null;
				// The following loop is the culprit behind the infinite loop.
				do {
					Entry<K,V> next = e.next; // remember the next node
					int i = indexFor(e.hash, newCapacity);
					e.next = newTable[i];     // head-insert e into the new bucket
					newTable[i] = e;
					e = next;
				} while (e != null);
			}
		}
	}

3. The HashMap infinite loop, illustrated:

Normal rehash process (single thread):

Assume that our hash algorithm is simply key mod table size (that is, key mod the length of the array).

At the top is the old hash table, whose size is 2, so keys 3, 7 and 5 all collide in table[1] after mod 2.

The next three steps show resizing the hash table to 4 and then rehashing every <key, value> pair.

[Figure: the hash table of size 2 holding keys 3, 7 and 5 in table[1], resized to size 4 and rehashed]
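As a quick check of that arithmetic (my own snippet, not part of the original figures), the simplified hash of key mod table length maps the three keys as follows:

	for (int key : new int[]{3, 7, 5}) {
		System.out.println(key + " -> old bucket " + (key % 2) + ", new bucket " + (key % 4));
	}
	// prints: 3 -> old bucket 1, new bucket 3
	//         7 -> old bucket 1, new bucket 3
	//         5 -> old bucket 1, new bucket 1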

Rehash under concurrency (multithreaded)

1) Suppose we have two threads, both executing the resize at the same time.

	do {
		Entry<K,V> next = e.next; // <-- suppose thread one is suspended by the scheduler right here and the CPU does other work
		int i = indexFor(e.hash, newCapacity);
		e.next = newTable[i];
		newTable[i] = e;
		e = next;
	} while (e != null);

Meanwhile, thread two runs its transfer() to completion, so the structures now look like this:

[Figure: thread two has finished its rehash; its new bucket holds key(7) -> key(3), while thread one's e still points to key(3) and next to key(7)]
Note that thread one's e still points to key(3) and its next points to key(7). After thread two's rehash, those nodes belong to the linked list that thread two has already reorganized, and the order of that list has been reversed. From here on, thread one keeps working on the HashMap as thread two has left it.

2) Then thread one is scheduled back in and resumes execution.

It first finishes its interrupted iteration: it executes newTable[i] = e and then e = next, so e now points to key(7); in the next iteration, next = e.next makes next point to key(3), because thread two has already set key(7).next to key(3).

[Figure: thread one resumes; key(3) is in its newTable[i], e points to key(7) and next points to key(3)]
3) So far everything still looks fine.

Thread one keeps working: it picks off key(7), head-inserts it at the front of newTable[i], and moves e and next forward, so e now points to key(3); at the start of the next iteration next = e.next becomes null, because thread two had set key(3).next to null. Remember the rule from earlier: elements that land in the same slot are kept in a linked list, a newly added element goes to the head of the chain, and the previously added elements follow it.

[Figure: thread one's newTable[i] now holds key(7) -> key(3); e points to key(3) and next is null]

4) The circular link appears.

In the last iteration, e.next = newTable[i] makes key(3).next point to key(7).
Note that at this moment key(7).next already points to key(3), so a circular linked list has appeared.

[Figure: key(3).next points to key(7) and key(7).next points to key(3): a circular linked list]
So, when a thread later calls HashMap.get(11) (or get() for any other absent key that hashes to this bucket), the tragedy strikes: an infinite loop.
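To see why get() hangs, recall JDK 7's lookup loop, sketched roughly below from HashMap.getEntry(): it walks the bucket's chain until it either finds the key or reaches null, and on a cyclic chain it never reaches null for a key that is not present.

	// Simplified sketch of JDK 7's HashMap.getEntry(): with a cyclic bucket,
	// e never becomes null for an absent key, so this loop never terminates.
	for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
		Object k;
		if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
			return e;
	}
	return null;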

After JDK 8: how expansion (resize) avoids the infinite loop

JDK 8 changed the structure of HashMap: a bucket still starts out as a linked list when it holds few entries, but once it exceeds a certain size it is converted into a red-black tree. Here we only discuss how the linked-list handling differs from before.

1. Suppose the size of oldTab is 2 and it contains the two nodes 7 and 3. Because 2 > 2 * 0.75, we now need to expand to a newTab of size 4 and move the nodes from oldTab to newTab. Assume again that there are two threads, each with its own e and next recording the current node and the next node respectively; if the two threads expand at the same time, the following can happen.

[Figures: the two threads interleaving while migrating nodes 7 and 3 during the JDK 8 style resize]

From the analysis above it is not hard to see that the cycle arises because the new linked list ends up in exactly the opposite order of the old one; as long as the new chain is built in the original order, no cycle can form.

JDK 8 keeps head and tail pointers while splitting a bucket and appends each node at the tail, so the list keeps its original order and no circular reference can be produced.
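For reference, the list-splitting part of JDK 8's HashMap.resize() looks roughly like the abridged sketch below (variable names as in OpenJDK). Each node is appended at the tail of either the "low" list (same index j) or the "high" list (index j + oldCap), so the relative order of the nodes is preserved and no reversal, and hence no cycle, can occur.

	Node<K,V> loHead = null, loTail = null;   // nodes that stay at index j
	Node<K,V> hiHead = null, hiTail = null;   // nodes that move to index j + oldCap
	Node<K,V> next;
	do {
		next = e.next;
		if ((e.hash & oldCap) == 0) {
			if (loTail == null)
				loHead = e;
			else
				loTail.next = e;              // append at the tail: order preserved
			loTail = e;
		}
		else {
			if (hiTail == null)
				hiHead = e;
			else
				hiTail.next = e;
			hiTail = e;
		}
	} while ((e = next) != null);
	if (loTail != null) {
		loTail.next = null;
		newTab[j] = loHead;
	}
	if (hiTail != null) {
		hiTail.next = null;
		newTab[j + oldCap] = hiHead;
	}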

Summary:

The difference before and after JDK 1.8 is that since JDK 1.8 a migrated node is appended at the tail of newTab[j], whereas before JDK 1.8 it was placed at the head. Although this fixes the infinite loop, HashMap still has plenty of problems under multi-threaded use (for example, the lost elements mentioned at the start), so in multi-threaded code it is better to use ConcurrentHashMap.
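As a closing illustration (my own example, not from the original post), switching to ConcurrentHashMap is usually a one-line change, and it also provides atomic compound operations such as merge() that a plain HashMap guarded by ad-hoc checks cannot offer:

	import java.util.Map;
	import java.util.concurrent.ConcurrentHashMap;

	public class SafeCounter {
		// Thread-safe map; no external synchronization needed for these calls.
		private final Map<String, Integer> hits = new ConcurrentHashMap<String, Integer>();

		public void record(String page) {
			// merge() is atomic on ConcurrentHashMap, so no updates are lost under contention.
			hits.merge(page, 1, Integer::sum);
		}

		public int count(String page) {
			return hits.getOrDefault(page, 0);
		}
	}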

Origin blog.csdn.net/m0_46405589/article/details/109206432