HashMap Multithreading Problem Analysis: Normal and Abnormal Rehash (an Alibaba interview topic)

 

Concurrent put calls can send get into an infinite loop

Our Java code used a HashMap for a certain feature. At the time the program was single-threaded and everything worked fine. Later the application ran into performance problems and had to become multi-threaded. Once it did, the process frequently pegged the CPU at 100%; looking at the stack, we found the program hanging inside HashMap.get(). Restarting made the problem go away, but it always came back after a while. Worse, it was hard to reproduce in the test environment.

Looking at our own code, we knew the HashMap was being operated on by multiple threads. The documentation says HashMap is not thread-safe and that ConcurrentHashMap should be used instead. But it is worth studying why. A simple reproduction:

package com.king.hashmap;

import java.util.HashMap;

public class TestLock {

    private HashMap map = new HashMap();

    public TestLock() {
        // t1..t5 put, t6..t10 get -- the same workload as writing out ten threads by hand
        for (int n = 1; n <= 10; n++) {
            final int id = n;
            Thread t = new Thread() {
                public void run() {
                    for (int i = 0; i < 50000; i++) {
                        if (id <= 5) {
                            map.put(new Integer(i), i);
                        } else {
                            map.get(new Integer(i));
                        }
                    }
                    System.out.println("t" + id + " over");
                }
            };
            t.start();
        }
    }

    public static void main(String[] args) {
        new TestLock();
    }
}

 

The test simply starts ten threads that continuously put and get on a shared, non-thread-safe HashMap. The content is trivial: keys and values are both integers counting up from 0 (this put workload was not well chosen and later muddied my analysis of the problem). I had expected that concurrent writes to a HashMap would at worst produce dirty data. But running this program repeatedly, threads such as t1 or t2 hang: in most runs one thread hangs while the rest finish, and occasionally all ten threads hang.

The root cause of the endless loop is an unprotected shared variable: the HashMap's internal data structure. After adding synchronized around all operations on the map, everything returned to normal. Is this a JVM bug? No: the phenomenon was reported long ago, and Sun's engineers do not consider it a bug; they recommend using ConcurrentHashMap in such scenarios.

The high CPU usage comes from an infinite loop: some threads keep running forever and monopolize CPU time. The underlying reason is that HashMap is not thread-safe; when multiple threads put entries, the linked list of Entry objects in a bucket can become circular, and that is where the problem starts.

Once the Entry list for a bucket is circular, any thread that calls get on that bucket loops forever. The end result is more and more threads stuck in the infinite loop until the server dies. We usually assume that inserting the same key twice simply overwrites the old value, and in single-threaded code that is correct. But under unsynchronized multi-threaded access, because of HashMap's internal implementation, two or more threads doing put on the same HashMap can trigger rehash at the same time; that can produce a circular linked list in the table, and once a thread walks onto it, it never terminates and keeps burning CPU, driving utilization up.

Using jstack to dump the stacks of the affected server, the infinite loop shows up first as RUNNABLE threads; the problem frames in our code looked like this:

java.lang.Thread.State:RUNNABLE
at java.util.HashMap.get(HashMap.java:303)
at com.sohu.twap.service.logic.TransformTweeter.doTransformTweetT5(TransformTweeter.java:183)
This appeared 23 times.
java.lang.Thread.State:RUNNABLE
at java.util.HashMap.put(HashMap.java:374)
at com.sohu.twap.service.logic.TransformTweeter.transformT5(TransformTweeter.java:816)
This appeared 3 times.

 

Note: incorrect use of HashMap leads to an infinite loop, not a deadlock.

Concurrent put calls may lose elements

The main problem is in addEntry's new Entry(hash, key, value, e): if two threads read the same head e at the same time, both new entries take e as their next, and when they are assigned back into the table one of them wins and the other element is lost.
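This interleaving can be replayed sequentially on a toy bucket to show the lost element. A minimal sketch with made-up names (Node, key1Survives), not JDK code:

```java
public class LostUpdateSketch {
    static class Node {
        final int key;
        final Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Replays the race step by step: both "threads" read the bucket head
    // before either writes its new entry back.
    static boolean key1Survives() {
        Node head = null;             // empty bucket
        Node seenByT1 = head;         // thread 1 reads the head ...
        Node seenByT2 = head;         // ... and so does thread 2, before any write
        head = new Node(1, seenByT1); // thread 1 links its entry in front
        head = new Node(2, seenByT2); // thread 2 overwrites with its *stale* head
        for (Node e = head; e != null; e = e.next) {
            if (e.key == 1) return true;
        }
        return false;                 // key 1 is unreachable: an element was lost
    }

    public static void main(String[] args) {
        System.out.println("key 1 reachable? " + key1Survives()); // prints false
    }
}
```

Run single-threaded, the same two statements would chain the entries; only the stale read makes the overwrite possible.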

A non-null element that was put may come back as null from get

Consider the following code in the transfer method:

void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}

 

This method assigns the old array to src and then traverses it; whenever it finds a non-null element of src, it nulls out that slot of the old array, namely this line:

if (e != null) {
    src[j] = null;

If a get runs at exactly this moment, it still reads from the old table, finds the slot already nulled, and returns null for an element that was genuinely put.

HashMap's data structure

We first need a brief word about this classic data structure.

HashMap uses an array of pointers (call it table[]) to disperse all the keys. When a key is added, a hash algorithm computes an index i from the key and the entry is put into table[i]. If two different keys land on the same i, that is called a conflict, also known as a collision, and the entries at table[i] form a linked list.

If table[] is small, say of size 2, and you want to put 10 keys, collisions become very frequent: the O(1) lookup degenerates into a linked-list traversal with O(n) performance. This is the weakness of a hash table.

So the size and capacity of the hash table matter a great deal. Generally, whenever data is inserted, the container checks whether the element count exceeds the configured threshold; if it does, the hash table must grow. But then every element of the table has to be recomputed and reinserted. This is called rehash, and it is quite expensive.
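As a concrete illustration (a sketch; the indexFor helper mirrors the JDK's h & (length - 1) mask, which is equivalent to modulo because table sizes are powers of two): the default HashMap resizes at capacity * loadFactor = 16 * 0.75 = 12 entries, and keys 3, 7, 5 all collide in a table of size 2 but spread out at size 4.

```java
public class BucketSketch {
    // mirrors the JDK's bit-mask trick; valid because length is a power of two
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int capacity = 16;            // default initial capacity
        float loadFactor = 0.75f;     // default load factor
        int threshold = (int) (capacity * loadFactor);
        System.out.println("resize when size reaches " + threshold); // prints 12

        // keys 3, 7, 5 all collide in a table of size 2 ...
        System.out.println(indexFor(3, 2) + " " + indexFor(7, 2) + " " + indexFor(5, 2)); // 1 1 1
        // ... but spread out after resizing to 4
        System.out.println(indexFor(3, 4) + " " + indexFor(7, 4) + " " + indexFor(5, 4)); // 3 3 1
    }
}
```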

The HashMap rehash source code

Below we look at the Java source of HashMap. Putting a key/value pair into the hash table:

public V put(K key, V value)
{
    ......
    // compute the hash value
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    // if this key has already been inserted, replace the old value
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    // the key does not exist, so a new node needs to be added
    addEntry(hash, key, value, i);
    return null;
}

 

Check whether the capacity is exceeded:

void addEntry(int hash, K key, V value, int bucketIndex)
{
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // check whether the current size exceeds the threshold we set; if so, resize
    if (size++ >= threshold)
        resize(2 * table.length);
}

 

Create a larger hash table, then migrate the data from the old hash table to the new one:

void resize(int newCapacity)
{
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    ......
    // create a new hash table
    Entry[] newTable = new Entry[newCapacity];
    // migrate the data from the old hash table to the new one
    transfer(newTable);
    table = newTable;
    threshold = (int) (newCapacity * loadFactor);
}

 

The migration source code; note the highlighted part:

void transfer(Entry[] newTable)
{
    Entry[] src = table;
    int newCapacity = newTable.length;
    // the following code means:
    // pick an element off the old table, then place it into the new table
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}

 

On its own this code is perfectly normal; there is no problem with it.

The normal course of rehash

I drew some diagrams to illustrate the process.

  1. Suppose our hash algorithm is simply key mod the size of the table (that is, the length of the array).

At the top is the old hash table, with size = 2. The keys 3, 7, and 5 all collide in table[1] after mod 2.

The next three steps show the table being resized to 4 and all entries being rehashed.
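These resize steps can be replayed on a toy chain. A minimal sketch (class and method names are mine; keys double as hash values for simplicity), assuming the old bucket chain is 3 -> 7 -> 5 as in the diagram:

```java
public class RehashSketch {
    static class Node {
        final int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Runs the same head-insertion loop as transfer() on one bucket,
    // moving it from a table of size 2 into a table of size 4.
    static Node[] rehash() {
        Node bucket = new Node(3, new Node(7, new Node(5, null)));
        Node[] newTable = new Node[4];
        Node e = bucket;
        while (e != null) {
            Node next = e.next;
            int i = e.key & (newTable.length - 1); // indexFor, with key as hash
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
        return newTable;
    }

    public static void main(String[] args) {
        Node[] t = rehash();
        // bucket 3 now holds 7 -> 3 (order reversed by head insertion), bucket 1 holds 5
        for (int i = 0; i < t.length; i++) {
            StringBuilder sb = new StringBuilder("newTable[" + i + "]:");
            for (Node n = t[i]; n != null; n = n.next) sb.append(" ").append(n.key);
            System.out.println(sb);
        }
    }
}
```

Note how head insertion reverses the relative order of entries that stay in the same bucket; that reversal is exactly what makes the concurrent case dangerous.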

 

The concurrent rehash process

(1) Suppose we have two threads, marked red and light blue in the diagrams. Look again at this detail of our transfer code:

do {
    Entry<K,V> next = e.next; // <-- assume thread one is scheduled out and suspended right here
    int i = indexFor(e.hash, newCapacity);
    e.next = newTable[i];
    newTable[i] = e;
    e = next;
} while (e != null);

 

Meanwhile thread two completes its entire transfer. We then have a situation like the following.

Note: thread one's e points to key(3) and its next points to key(7). After thread two's rehash, those two nodes sit on thread two's relinked list, and we can see the list order has been reversed.

(2) Thread one is scheduled back in and resumes execution.

  1. First it executes newTable[i] = e.

Then e = next, which makes e point to key(7).

On the next loop iteration, next = e.next makes next point to key(3).

(3) So far, everything still looks fine.

Thread one keeps working: it takes key(7) off the chain, puts it at the head of newTable[i], then moves e and next forward.

(4) A circular link appears.

e.next = newTable[i] makes key(3).next point to key(7). Note: key(7).next already points to key(3), so the circular linked list is now complete.

So when some thread then calls HashMap.get(11), the tragedy strikes: an infinite loop.
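The four steps above can be replayed deterministically on a toy chain: thread one is "suspended" right after reading next, thread two's transfer runs to completion, then thread one finishes its loop, and Floyd's cycle detection confirms the circular list. A sketch under those assumptions (names are mine; keys double as hashes):

```java
public class CycleSketch {
    static class Node {
        final int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    static boolean cyclicAfterRace() {
        Node n7 = new Node(7, null);
        Node n3 = new Node(3, n7);          // old bucket (table size 2): 3 -> 7

        // thread one reads e and next, then is suspended
        Node e1 = n3, next1 = n7;
        Node[] newTable1 = new Node[4];

        // thread two runs its whole transfer (its own table, but shared nodes)
        Node[] newTable2 = new Node[4];
        Node e = n3;
        while (e != null) {
            Node next = e.next;
            int i = e.key & (newTable2.length - 1);
            e.next = newTable2[i];
            newTable2[i] = e;
            e = next;
        }
        // at this point key(7).next == key(3) and key(3).next == null

        // thread one resumes its loop with the stale e1/next1
        e = e1;
        Node next = next1;
        while (e != null) {
            int i = e.key & (newTable1.length - 1);
            e.next = newTable1[i];
            newTable1[i] = e;
            e = next;
            next = (e == null) ? null : e.next;
        }

        // Floyd's cycle detection on thread one's bucket 3
        Node slow = newTable1[3], fast = newTable1[3];
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) return true;  // circular list: a real get() would spin forever
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("circular list after the race? " + cyclicAfterRace()); // prints true
    }
}
```

A real get() on that bucket would walk the cycle forever; here Floyd's algorithm merely proves the cycle exists without hanging.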

 

Three solutions

Replace HashMap with Hashtable

Hashtable is synchronized, but the Iterators returned by its iterators and the listIterators of the Collections returned by all of Hashtable's "collection view methods" are fail-fast: if the Hashtable is structurally modified at any time after the Iterator is created, in any way except through the Iterator's own remove or add methods, the Iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the Iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future. The Enumerations returned by Hashtable's keys and values methods are not fail-fast.

Note that the fail-fast behavior of an iterator cannot be guaranteed, because, generally speaking, it is impossible to make any hard guarantee about whether unsynchronized concurrent modification has occurred. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. It is therefore wrong to write a program that depends on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

Wrap the HashMap with Collections.synchronizedMap

This returns a synchronized (thread-safe) map backed by the specified map. To guarantee serial access, all access to the backing map must go through the returned map. The user must manually synchronize on the returned map when iterating over it or any of its collection views:

Map m = Collections.synchronizedMap(new HashMap());
...
Set s = m.keySet();  // Needn't be in synchronized block
...
synchronized(m) {  // Synchronizing on m, not s!
    Iterator i = s.iterator(); // Must be in synchronized block
    while (i.hasNext())
        foo(i.next());
}

 

Failure to follow this advice may result in non-deterministic behavior. The returned map is serializable if the specified map is serializable.

Replace HashMap with ConcurrentHashMap

ConcurrentHashMap is a hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates. It obeys the same functional specification as Hashtable and includes versions of the methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is no support for locking the entire table in a way that prevents all access. The class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so they may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations. For aggregate operations such as putAll and clear, concurrent retrievals may reflect the insertion or removal of only some entries. Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, each iterator is designed to be used by only one thread at a time.
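For comparison, here is the same ten-thread stress pattern as TestLock, but run against a ConcurrentHashMap. This is a sketch of the recommended fix (the class and method names are mine): the join calls always return, with no infinite loop, and the final size is exactly what the writers put.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TestConcurrent {

    // Five writer threads and five reader threads over one shared map.
    static int run(final int count) throws InterruptedException {
        final Map<Integer, Integer> map = new ConcurrentHashMap<Integer, Integer>();
        Thread[] threads = new Thread[10];
        for (int t = 0; t < 10; t++) {
            final boolean writer = t < 5;
            threads[t] = new Thread() {
                public void run() {
                    for (int i = 0; i < count; i++) {
                        if (writer) map.put(i, i); else map.get(i);
                    }
                }
            };
        }
        for (Thread th : threads) th.start();
        for (Thread th : threads) th.join(); // always terminates; no infinite loop
        return map.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("size = " + run(50000)); // prints size = 50000
    }
}
```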

Reference: HashMap multithreading Analysis


Origin www.cnblogs.com/aspirant/p/11504389.html