HashMap thread insecurity problem and solution


We all know that HashMap is thread-unsafe and that ConcurrentHashMap should be used instead in concurrent code. But why is HashMap thread-unsafe?

Let me state it up front: HashMap's thread unsafety causes problems such as infinite loops, data loss, and data overwriting. The infinite loop and data loss appeared in JDK1.7 and were fixed in JDK1.8; however, data overwriting can still occur in 1.8.

Analysis of Infinite Loop and Data Loss Caused by JDK1.7 Expansion

HashMap's thread unsafety mainly shows up in the expansion (resize) path; the root cause lies in the transfer method. The transfer method of HashMap in JDK1.7 is as follows:

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;              // save the successor
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity); // bucket index in the new table
            e.next = newTable[i];                  // head insertion into the new bucket
            newTable[i] = e;
            e = next;
        }
    }
}

This code performs HashMap's expansion: it recomputes the bucket index of each entry and migrates the elements to the new array using head insertion. Head insertion reverses the order of each linked list, which is the key to forming the infinite loop. With head insertion understood, read on to see how the infinite loop and data loss arise.
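The order reversal caused by head insertion can be sketched with a simplified node chain. This is a toy model, not the real HashMap internals: the Node class and transferHead method below are illustrative stand-ins that mimic only the loop body of transfer() for a single bucket.

```java
import java.util.ArrayList;
import java.util.List;

public class HeadInsertDemo {
    static class Node {
        int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Migrate one bucket into a new (initially empty) bucket using
    // head insertion, mirroring the loop body of transfer().
    static Node transferHead(Node e) {
        Node newHead = null;
        while (e != null) {
            Node next = e.next; // save the successor
            e.next = newHead;   // link in front of the new chain
            newHead = e;        // this node becomes the new head
            e = next;
        }
        return newHead;
    }

    static List<Integer> keys(Node e) {
        List<Integer> out = new ArrayList<>();
        for (; e != null; e = e.next) out.add(e.key);
        return out;
    }

    public static void main(String[] args) {
        // Bucket before resize: 3 -> 7 -> 5
        Node bucket = new Node(3, new Node(7, new Node(5, null)));
        Node migrated = transferHead(bucket);
        // Head insertion reverses the order: 5 -> 7 -> 3
        System.out.println(keys(migrated)); // [5, 7, 3]
    }
}
```

Running this shows the reversal directly: a bucket 3 → 7 → 5 comes out as 5 → 7 → 3, which is exactly the property the race below exploits.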

Assume that two threads A and B are performing the expansion operation on the following HashMap at the same time:

[Figure: image-20220909190803488]

The result after normal expansion is as follows:

[Figure: image-20220909190812666]

But suppose thread A has just executed `Entry<K,V> next = e.next;` in the transfer function above when its CPU time slice runs out, and thread A is suspended, as shown in the figure below:

[Figure: image-20220909190831389]

At this point in thread A: e = 3 and next = 7 (once thread B's migration completes, 3.next will be null in main memory).

[Figure: image-20220909190843876]

With thread A's time slice exhausted, the CPU switches to thread B, which completes the data migration successfully:

[Figure: image-20220909190855912]

According to the Java memory model, after thread B finishes the migration, the newTable and table in main memory hold the latest values, that is: 7.next = 3 and 3.next = null.

Thread A then regains the CPU and continues with newTable[i] = e, placing 3 into its slot in the new array. After this iteration, thread A's situation is as follows:

[Figure: image-20220909191034760]

The next iteration then runs with e = 7. Reading e.next from main memory finds 7.next = 3, so next = 3; 7 is head-inserted into the new array, and the loop continues with this result:

[Figure: image-20220909191337807]

In the next iteration, next = e.next = null, so this will be the last one. Executing e.next = newTable[i] sets 3.next = 7, so 3 and 7 now point at each other; executing newTable[i] = e then reinserts 3 at the head of the list. The result is shown in the figure below:

[Figure: image-20220909191412668]

As noted above, e.next = null means next = null here, so after e = next the loop exits. At this point the expansion operations of both threads A and B are complete, but thread A has produced a ring structure in the HashMap: any later traversal of that bucket will spin forever, i.e. an infinite loop.

The figure above also shows that element 5 was silently dropped during the expansion, which is the data loss problem.

Thread unsafety in JDK1.8

The JDK1.7 problems above were solved in JDK1.8. If you read the 1.8 source code, you will find that there is no transfer function: JDK1.8 completes the data migration directly inside the resize function. In addition, JDK1.8 inserts elements with tail insertion, so the list order cannot get scrambled.

But data overwriting can still occur in JDK1.8. Here is the 1.8 putVal method:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null) // no hash collision: insert directly
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

The following analyzes two scenarios:

  • Suppose threads A and B both perform put operations and the hash function gives them the same insertion index. Thread A executes the collision check `if ((p = tab[i = (n - 1) & hash]) == null)` in putVal and is then suspended because its time slice runs out. Thread B obtains the time slice, inserts its element at that index, and completes a normal insertion. When thread A regains the time slice, it has already passed the collision check, so it does not re-check but inserts directly, overwriting the data thread B just inserted. Hence the class is thread-unsafe.
  • Moreover, putVal maintains the size field that counts the elements in the HashMap. Under multi-threading, suppose threads A and B both perform put operations while the current size is 10. Thread A reaches **size++**, reads 10 from main memory, and is about to add 1 when its time slice runs out and it must yield the CPU. Thread B then happily gets the CPU, also reads 10 from main memory, adds 1, completes its put, and writes size = 11 back to main memory. Thread A then regains the CPU and continues with its stale value of 10; when its put completes, it also writes size = 11 back. Both threads performed a put, yet size only increased by 1. This lost update is another form of data overwriting that makes the class thread-unsafe.
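For contrast, a quick sketch showing that ConcurrentHashMap does not lose such count updates. The class name SizeRaceDemo and the 1000-key workload are made up for illustration; the guarantee relied on is only that concurrent puts of distinct keys are all retained and counted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SizeRaceDemo {
    // Two threads insert 1000 distinct keys each. ConcurrentHashMap applies
    // its internal count updates atomically, so no insertion is "lost".
    static int run() throws InterruptedException {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) map.put("a" + i, i); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) map.put("b" + i, i); });
        a.start(); b.start();
        a.join(); b.join();
        return map.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // With a plain HashMap this count can come up short (or the map can
        // be corrupted outright); here it is reliably:
        System.out.println(run()); // 2000
    }
}
```

Swapping `new ConcurrentHashMap<>()` for `new HashMap<>()` in this sketch may intermittently print a number below 2000, which is exactly the ++size race described above (it may also throw, since HashMap is not safe for concurrent mutation at all).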

Summary

HashMap's thread unsafety is mainly reflected in the following two aspects:
1. In JDK1.7, concurrent expansion can produce a ring in a bucket's linked list (causing an infinite loop) and data loss.
2. In JDK1.8, concurrent put operations can cause data overwriting.

Solutions to the thread unsafety

Hashtable (obsolete)

To achieve thread safety, Hashtable adds a synchronized lock to almost every method (the lock is the map instance itself, i.e. the entire map structure). While one thread is inside a synchronized method of a Hashtable, any other thread calling a synchronized method on it will block.

This solution is rarely used nowadays, so it is not covered further here.

Collections.synchronizedMap (generally not used)

Collections.synchronizedMap() returns a thread-safe Map wrapper around the map you pass in:

Map<String,String> map = Collections.synchronizedMap(new HashMap<>());

When we call this method, we pass in a Map. As the figure below shows, there are two constructors: if you pass in the mutex parameter, that object is used as the lock; otherwise the lock is this, i.e. the wrapper object returned by synchronizedMap.

[Figure: image-20220909193330432]

Collections.synchronizedMap() then wraps all of HashMap's unsafe methods:

[Figure: image-20220909193412113]

There are two key points to the wrapper:
1) it uses the classic synchronized keyword for mutual exclusion;
2) it uses a wrapper (proxy) pattern: a new class that also implements the Map interface and delegates to the underlying HashMap. Since synchronized locks a single object, the first thread to acquire the lock proceeds while other threads block, waiting to be woken.

Advantage: the implementation is very simple and easy to understand at a glance.

Disadvantage: from a locking perspective, it locks essentially the largest possible code block, so performance is relatively poor.
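A minimal usage sketch follows, including one important caveat: individual operations are synchronized by the wrapper, but iteration is not atomic, and the Collections.synchronizedMap javadoc requires holding the wrapper's lock manually for the whole traversal. The class name and keys below are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedMapDemo {
    static int countKeys() {
        Map<String, String> map = Collections.synchronizedMap(new HashMap<>());
        map.put("k1", "v1");
        map.put("k2", "v2");
        // put/get above are each synchronized internally by the wrapper,
        // but a traversal spans many calls, so the javadoc requires
        // locking the wrapper object for the duration of the iteration:
        int n = 0;
        synchronized (map) {
            for (String key : map.keySet()) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countKeys()); // 2
    }
}
```

Forgetting the synchronized block around the loop can yield a ConcurrentModificationException (or worse) if another thread mutates the map mid-iteration, even though every single method call is locked.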

ConcurrentHashMap (commonly used)

In JDK 1.7, a segment-lock mechanism implements concurrent updates. The underlying storage structure is array + linked list, built on two core static inner classes: Segment and HashEntry.

①. Segment extends ReentrantLock (a reentrant lock) and acts as the lock; each Segment object guards several buckets of the hash table.
②. HashEntry encapsulates the key-value pairs of the table.
③. Each bucket is a linked list of HashEntry objects.

Segment lock: each Segment object in the Segment array is one lock corresponding to one HashEntry array. Synchronization of the data within that array relies on that one lock, while reads and writes to different HashEntry arrays do not interfere with each other.

In JDK 1.8, the Segment lock was abandoned in favor of Node + CAS + synchronized to guarantee concurrency safety. The Segment class is gone: key-value pairs are stored directly in the table array, and when the linked list of Nodes in a bucket exceeds TREEIFY_THRESHOLD, it is converted into a red-black tree to improve performance. The underlying structure becomes array + linked list + red-black tree.

CAS performance is very high, whereas synchronized was traditionally a heavyweight lock; the JVM now optimizes synchronized using a lock-upgrade strategy.

For synchronized lock acquisition, the JVM applies a lock-upgrade optimization: a biased lock first lets the same thread reacquire the lock cheaply; if that fails, the lock is upgraded to a CAS-based lightweight lock; if that fails, the thread spins briefly to avoid being suspended by the operating system; finally, if all of the above fail, the lock is upgraded to a heavyweight lock.

Biased lock: avoids unnecessary lightweight-lock operations when there is no multi-thread contention; a single thread keeps executing the synchronized block until it releases the lock.
Lightweight lock: when a second thread starts competing, the lock upgrades to lightweight; the point is to avoid the cost of traditional heavyweight locks, which use operating-system mutexes, when contention is low.
Heavyweight lock: when multiple threads compete for the same lock at the same time, losing threads repeatedly switch between blocked and awakened states under this pessimistic mode, which is relatively expensive.
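Returning to ConcurrentHashMap in practice, here is a small sketch of performing a read-modify-write atomically with merge(), which avoids exactly the lost-update pattern described for ++size earlier. The class name, key, and iteration counts are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

public class MergeCounterDemo {
    // merge() performs the read-modify-write atomically, so two threads
    // bumping the same key cannot lose an update the way ++size does
    // on a plain HashMap.
    static int run() throws InterruptedException {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Runnable bump = () -> {
            for (int i = 0; i < 1000; i++) counts.merge("hits", 1, Integer::sum);
        };
        Thread a = new Thread(bump), b = new Thread(bump);
        a.start(); b.start();
        a.join(); b.join();
        return counts.get("hits");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 2000
    }
}
```

Note that per-call atomicity is the guarantee here: compound sequences like a get followed by a put are still racy even on ConcurrentHashMap, which is why single atomic operations such as merge, compute, and putIfAbsent exist.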

Due to limited space, a source-code analysis of ConcurrentHashMap is not shown here; a detailed post may be added later when time permits.


Origin blog.csdn.net/m0_61820867/article/details/126827803