Java Core Technology Interview Essentials (Lecture 10) | How to ensure that collections are thread-safe? How does ConcurrentHashMap achieve efficient thread-safety?

In the previous two lectures, I introduced the typical container classes of the Java collection framework. Most of them are not thread-safe, and the few thread-safe implementations, such as Vector and Stack, perform far from satisfactorily. Fortunately, the Java language provides the concurrent package (java.util.concurrent), which offers much more comprehensive tool support for high-concurrency needs.

The question I want to ask you today is: how do you make a container thread-safe, and how does ConcurrentHashMap achieve efficient thread safety?


Typical answer

Java provides different levels of thread-safety support. Within the traditional collection framework, besides synchronized containers such as Hashtable, there are also so-called synchronized wrappers: we can call the wrapper methods of the Collections utility class (such as Collections.synchronizedMap) to obtain a synchronized container. However, they all use very coarse-grained synchronization, and their performance under high concurrency is poor.

In addition, a more common choice is to use the thread-safe container class provided by the concurrent package, which provides:

  • Various concurrent containers, such as ConcurrentHashMap, CopyOnWriteArrayList.
  • Various thread-safe queues (Queue/Deque), such as ArrayBlockingQueue, SynchronousQueue.
  • Thread-safe versions of various ordered containers, etc.

Specific ways to ensure thread safety range from simple synchronized methods to more refined approaches, such as the ConcurrentHashMap implementation based on lock striping (separate locks). The right choice depends on the requirements of the scenario; in general, for common scenarios the containers provided in the concurrent package far outperform the early, naively synchronized implementations.
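To make the options concrete, here is a minimal sketch (the class and variable names are mine, not from the JDK) that places the three choices side by side:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeMapChoices {
    public static void main(String[] args) {
        // Legacy synchronized container: every public method locks the whole table
        Map<String, Integer> table = new Hashtable<>();

        // Synchronized wrapper: same coarse-grained locking, applied to any Map
        Map<String, Integer> wrapped = Collections.synchronizedMap(new HashMap<>());

        // Concurrent container: fine-grained internal synchronization
        Map<String, Integer> concurrent = new ConcurrentHashMap<>();

        table.put("k", 1);
        wrapped.put("k", 2);
        concurrent.put("k", 3);
        System.out.println(table.get("k") + " " + wrapped.get("k") + " " + concurrent.get("k"));
    }
}
```

All three are drop-in thread-safe Maps; they differ only in how much of the map one writer blocks.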

Key point analysis

Thread safety and concurrency are must-ask topics in Java interviews. The answer I gave above is a fairly broad summary; besides, the implementations of concurrent containers such as ConcurrentHashMap keep evolving, so they should not be described in sweeping terms.

If you want to think through this question and its extensions in depth, you need at least to:

  • Understand basic thread safety tools.
  • Understand the concurrency problems of Map in the traditional collection framework, and be aware of the shortcomings of simple synchronization.
  • Survey the concurrent package, in particular the measures ConcurrentHashMap takes to improve concurrent performance.
  • Ideally, grasp the evolution of ConcurrentHashMap itself, since much of the analysis in circulation is still based on its earlier versions.

Today I will mainly continue the thread of the previous two lectures in the column, focusing on HashMap and ConcurrentHashMap, which are often examined together. This lecture is not a comprehensive review of concurrency; the column cannot introduce that completely, so treat today as an appetizer. Lower-level mechanisms such as CAS, and the topic of concurrency in general, will receive a more systematic introduction in the Java advanced module later.

Knowledge expansion

1. Why do we need ConcurrentHashMap?

Hashtable itself is inefficient because its implementation basically adds synchronized to put, get, size, and the other methods. Simply put, this makes all concurrent operations compete for the same lock: while one thread is inside a synchronized operation, the others can only wait, which drastically reduces concurrent throughput.

As mentioned earlier, HashMap is not thread-safe, and concurrent access can cause problems such as 100% CPU usage. So, can the synchronization wrapper provided by Collections solve the problem?

Looking at the code snippet below, we find that the synchronization wrapper simply wraps the input Map in another, synchronized version. Although the operations are no longer declared as synchronized methods, they still all synchronize on a single mutex (by default, the wrapper itself), so there is no real improvement!

private static class SynchronizedMap<K,V>
    implements Map<K,V>, Serializable {
    private final Map<K,V> m;     // Backing Map
    final Object      mutex;        // Object on which to synchronize
    // …
    public int size() {
        synchronized (mutex) {return m.size();}
    }
 // … 
}

Therefore, Hashtable and the synchronized wrappers are only suitable for scenarios that are not highly concurrent.
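A further practical drawback: with a wrapper, compound check-then-act operations still need client-side locking on the wrapper object itself, whereas ConcurrentHashMap exposes them atomically. A minimal sketch (CompoundOps and putIfAbsentManually are made-up names):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CompoundOps {
    // With a synchronized wrapper, each single call is atomic, but a
    // check-then-act sequence is not: the caller must hold the wrapper's
    // lock (the wrapper object itself) across both calls.
    static void putIfAbsentManually(Map<String, Integer> syncMap, String key, int value) {
        synchronized (syncMap) {
            if (!syncMap.containsKey(key)) {
                syncMap.put(key, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> syncMap = Collections.synchronizedMap(new HashMap<>());
        putIfAbsentManually(syncMap, "a", 1);
        putIfAbsentManually(syncMap, "a", 2); // no effect: key already present

        // ConcurrentHashMap provides the same compound operation atomically
        ConcurrentMap<String, Integer> chm = new ConcurrentHashMap<>();
        chm.putIfAbsent("a", 1);
        chm.putIfAbsent("a", 2); // no effect

        System.out.println(syncMap.get("a") + " " + chm.get("a")); // prints "1 1"
    }
}
```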

2. ConcurrentHashMap analysis

Let's take a look at how ConcurrentHashMap is designed and implemented, and why it can greatly improve concurrency efficiency.

First, let me emphasize that the design and implementation of ConcurrentHashMap has kept evolving; for example, Java 8 brought very big changes (and Java 7 actually had quite a few updates too). So I will compare the structures and implementation mechanisms, and point out the main differences between versions.

The early ConcurrentHashMap implementation was based on:

  • Lock striping (separate locks): the map is internally divided into segments, each containing an array of HashEntry; as in HashMap, entries with the same hash are stored as a linked list.
  • HashEntry uses a volatile value field internally to guarantee visibility, and also takes advantage of immutability. The implementation uses low-level capabilities provided by Unsafe, such as volatile access, to perform certain operations directly, since many Unsafe operations are optimized as JVM intrinsics.

You can refer to the diagram below of the internal structure of the early ConcurrentHashMap. Its core is the segmented design: a concurrent operation only needs to lock the segment involved, which avoids Hashtable's whole-map synchronization and greatly improves performance.

At construction time, the number of segments is determined by the so-called concurrencyLevel, which defaults to 16 and can also be specified directly in the corresponding constructor. Note that Java requires it to be a power of 2; if a non-power value such as 15 is passed in, it is automatically rounded up to a power of 2, here 16.
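As a sketch of that rounding rule (segmentCount here is a hypothetical helper mirroring the loop in the JDK 7 constructor, not an actual API):

```java
public class SegmentSizing {
    // Find the smallest power of two >= concurrencyLevel, as the
    // JDK 7 ConcurrentHashMap constructor does when sizing segments.
    static int segmentCount(int concurrencyLevel) {
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ssize <<= 1; // double until we reach a power of two
        }
        return ssize;
    }

    public static void main(String[] args) {
        System.out.println(segmentCount(15)); // prints 16
        System.out.println(segmentCount(16)); // prints 16
        System.out.println(segmentCount(17)); // prints 32
    }
}
```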

For the details, let's look at the source of some basic Map operations. Below is the get code from a relatively late JDK 7 release; to make the optimizations easier to follow, I have commented directly in the code. Note that get only needs to guarantee visibility, so it contains no synchronization logic.

public V get(Object key) {
    Segment<K,V> s; // manually integrate access methods to reduce overhead
    HashEntry<K,V>[] tab;
    int h = hash(key.hashCode());
    // use bit operations instead of ordinary arithmetic
    long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
    // locate the entry at Segment granularity
    // use Unsafe to perform the volatile access directly
    if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
        (tab = s.table) != null) {
        // ... omitted
    }
    return null;
}

For the put operation, the implementation first applies a second hash to reduce hash collisions, then uses an Unsafe call to obtain the corresponding segment directly, and finally performs the thread-safe put inside that segment:

public V put(K key, V value) {
    Segment<K,V> s;
    if (value == null)
        throw new NullPointerException();
    // second hash, to spread the keys and reduce collisions
    int hash = hash(key.hashCode());
    int j = (hash >>> segmentShift) & segmentMask;
    if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
         (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
        s = ensureSegment(j);
    return s.put(key, hash, value, false);
}

The core logic is implemented in the following internal methods:

final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    // scanAndLockForPut looks for a Node with the same key;
    // either way, it makes sure the lock is acquired
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;
        int index = (tab.length - 1) & hash;
        HashEntry<K,V> first = entryAt(tab, index);
        for (HashEntry<K,V> e = first;;) {
            if (e != null) {
                K k;
                // update the existing value...
            }
            else {
                // place the HashEntry at its slot; rehash if over the threshold
                // ...
            }
        }
    } finally {
        unlock();
    }
    return oldValue;
}

So it is clear from the source above how a concurrent write operation proceeds:

  • ConcurrentHashMap acquires a reentrant lock to ensure data consistency. Segment itself is an extension of ReentrantLock, so during a concurrent modification the corresponding segment is locked.
  • In the initial stage, a repeated scan determines whether the key is already in the array, and hence whether to update an existing entry or insert a new one; see the comments in the code. Repeated scanning and conflict detection are recurring techniques in ConcurrentHashMap.
  • When I introduced HashMap earlier in the column, I mentioned the potential resizing problem, which exists in ConcurrentHashMap as well. There is an obvious difference, though: it does not resize the map as a whole, but resizes each segment separately. I won't go into the details here.

The Map's size method also deserves attention; its implementation involves a side effect of lock striping.

Just imagine: if you simply sum the counts of all segments without synchronization, concurrent puts may make the result inaccurate, yet directly locking all segments for the calculation is very expensive. In fact, lock striping constrains other operations too, such as Map initialization.

Therefore, the ConcurrentHashMap implementation tries to obtain a reliable value through a retry mechanism (RETRIES_BEFORE_LOCK specifies the retry count, 2): if no change is detected across passes (by comparing Segment.modCount), it returns the sum directly; otherwise it acquires all the segment locks and recomputes.
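The retry-then-lock idea can be sketched with a toy model. This is not the JDK source: Seg, size, and the fields here are simplified stand-ins that only mimic the strategy (sum without locking, retry while the modification counts keep changing, then fall back to locking every segment):

```java
import java.util.concurrent.locks.ReentrantLock;

public class SizeWithRetry {
    static final int RETRIES_BEFORE_LOCK = 2;

    // Toy "segment": an element count plus a modification counter,
    // extending ReentrantLock just as JDK 7's Segment does.
    static class Seg extends ReentrantLock {
        volatile int count;
        volatile int modCount;
    }

    static long size(Seg[] segments) {
        long lastMods = -1L;
        for (int retries = 0; ; retries++) {
            if (retries > RETRIES_BEFORE_LOCK) {
                // Too many unstable passes: lock everything and count once
                for (Seg s : segments) s.lock();
                try {
                    long sum = 0;
                    for (Seg s : segments) sum += s.count;
                    return sum;
                } finally {
                    for (Seg s : segments) s.unlock();
                }
            }
            long sum = 0, mods = 0;
            for (Seg s : segments) {
                sum += s.count;
                mods += s.modCount;
            }
            if (mods == lastMods) return sum; // stable across two passes
            lastMods = mods;
        }
    }

    public static void main(String[] args) {
        Seg[] segs = { new Seg(), new Seg() };
        segs[0].count = 3;
        segs[1].count = 4;
        System.out.println(size(segs)); // prints 7
    }
}
```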

Now let me compare: in Java 8 and later versions, what has changed in ConcurrentHashMap?

  • In overall structure, the internal storage has become very similar to the HashMap I introduced earlier in the column: one large bucket array, with a linked-list structure (the bin) inside each bucket, and synchronization at a much finer granularity.
  • A Segment definition is still present internally, but only to preserve serialization compatibility; it no longer plays any structural role.
  • Since segments are no longer used, initialization is greatly simplified into a lazy-load form, which avoids the up-front overhead and addresses a long-standing complaint about the old version. Data storage uses volatile to guarantee visibility.
  • Use CAS and other operations to perform lock-free concurrent operations in specific scenarios.
  • Use low-level methods such as Unsafe and LongAdder to optimize extreme situations.

Looking at the current internal data storage, we can see that key is declared final, because a key cannot change during an entry's life cycle, while val is declared volatile to guarantee visibility.

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K,V> next;
        // … 
    }

I won't go over the get method and the constructors here, as they are relatively simple. Let's look directly at how a concurrent put is implemented.

final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh; K fk; V fv;
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            // if the bin is empty, use CAS for a lock-free, thread-safe insertion
            if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value)))
                break;
        }
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        else if (onlyIfAbsent // check without locking
                 && fh == hash
                 && ((fk = f.key) == key || (fk != null && key.equals(fk)))
                 && (fv = f.val) != null)
            return fv;
        else {
            V oldVal = null;
            synchronized (f) {
                // fine-grained, synchronized modification...
            }
            // if the bin exceeds the threshold, treeify it
            if (binCount != 0) {
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    addCount(1L, binCount);
    return null;
}

The initialization operation is implemented in initTable, a typical CAS usage scenario that uses the volatile sizeCtl field as the mutual-exclusion flag: if a competing initialization is detected, the thread spins there, waiting for the condition to change; otherwise it uses CAS to set the exclusive flag, initializing on success and retrying on failure.

Please refer to the following code: 

private final Node<K,V>[] initTable() {
    Node<K,V>[] tab; int sc;
    while ((tab = table) == null || tab.length == 0) {
        // if a conflict is detected, spin and wait
        if ((sc = sizeCtl) < 0)
            Thread.yield(); 
        // if CAS succeeds (returns true), enter the real initialization logic
        else if (U.compareAndSetInt(this, SIZECTL, sc, -1)) {
            try {
                if ((tab = table) == null || tab.length == 0) {
                    int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                    @SuppressWarnings("unchecked")
                    Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                    table = tab = nt;
                    sc = n - (n >>> 2);
                }
            } finally {
                sizeCtl = sc;
            }
            break;
        }
    }
    return tab;
}

When a bin is empty, no locking is needed at all; the node is placed with a CAS operation.

Have you noticed that, for the synchronization logic, it uses synchronized rather than the often-recommended ReentrantLock? Why? In modern JDKs, synchronized has been continuously optimized, so you no longer need to worry much about the performance difference; in addition, compared with ReentrantLock, it reduces memory consumption, which is a very significant advantage.

At the same time, finer-grained details are optimized with Unsafe. For example, tabAt uses getObjectAcquire directly, avoiding the overhead of indirect calls.

static final <K,V> Node<K,V> tabAt(Node<K,V>[] tab, int i) {
    return (Node<K,V>)U.getObjectAcquire(tab, ((long)i << ASHIFT) + ABASE);
}

Now let's look at how the size operation is implemented. Reading the code, you will find that the real logic is in the sumCount method. So what does sumCount do?

final long sumCount() {
    CounterCell[] as = counterCells; CounterCell a;
    long sum = baseCount;
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            if ((a = as[i]) != null)
                sum += a.value;
        }
    }
    return sum;
}

We find that although the idea is still similar to before, that is, divide-and-conquer counting followed by a final summation, the implementation is based on a curious CounterCell. Is its value accurate, and how is data consistency guaranteed?

static final class CounterCell {
    volatile long value;
    CounterCell(long x) { value = x; }
}

In fact, the operation of CounterCell is based on java.util.concurrent.atomic.LongAdder, a way for the JVM to trade space for higher efficiency, built on the complex logic inside Striped64. LongAdder is fairly niche; in most cases AtomicLong is recommended and is sufficient for the performance needs of most applications.
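To see the LongAdder idea in action, here is a small sketch (AdderDemo and countWithThreads are hypothetical names): several threads increment concurrently, contended updates spread across striped cells, and sum() adds the base and all cells together, much like sumCount() above.

```java
import java.util.concurrent.atomic.LongAdder;

public class AdderDemo {
    // Increment a shared LongAdder from several threads; contended updates
    // land in separate striped cells, and sum() totals them at the end.
    static long countWithThreads(int threads, int perThread) throws InterruptedException {
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    adder.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join(); // wait for all increments before summing
        }
        return adder.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countWithThreads(4, 1000)); // prints 4000
    }
}
```

Note that sum() is not a snapshot under concurrent updates; like ConcurrentHashMap's size, it is only exact once the writers have quiesced.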

Today I started from thread-safety issues, conceptually summarized the basic container tools, analyzed the synchronization problems of the early containers, and then examined how ConcurrentHashMap is designed and implemented in Java 7 and Java 8. I hope ConcurrentHashMap's concurrency techniques can help you in your daily development.

Practice for this lesson

Do you understand what we discussed today? Here is a question for you: in production code, what is a typical scenario that calls for a concurrent container such as ConcurrentHashMap?
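One common answer, sketched below under made-up names (QueryCache, lookup, and expensiveLoad are all hypothetical), is a local cache of expensive lookups: ConcurrentHashMap.computeIfAbsent runs the loader at most once per key even under contention.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class QueryCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // computeIfAbsent is atomic per key: concurrent callers for the same
    // key block until one of them has loaded and stored the value.
    String lookup(String key) {
        return cache.computeIfAbsent(key, k -> expensiveLoad(k));
    }

    private String expensiveLoad(String key) {
        // stand-in for a database query or remote call
        return "value-for-" + key;
    }

    public static void main(String[] args) {
        QueryCache cache = new QueryCache();
        System.out.println(cache.lookup("user:42")); // prints value-for-user:42
        System.out.println(cache.lookup("user:42")); // served from the cache
    }
}
```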


Other classic answers

The following is the answer from the netizen Cai Guangming:

1.7
put: locks by segment.
A ConcurrentHashMap holds several segments, and each segment holds several buckets; each bucket stores key-value pairs as a linked list. On put, the key's hash first locates the segment the element belongs to, that segment is locked, the hash then locates the bucket within it, and finally the bucket's linked list is traversed to replace or add the node.

size
Sums the segments twice; if the two results are the same, it returns, otherwise it locks all the segments and recalculates.

1.8
put: CAS plus lock.
1.8 no longer relies on segment locks; the number of "segments" equals the number of buckets. First, check whether the table is empty; if it is, initialize it, using the volatile sizeCtl as the mutual-exclusion flag: if a competing initialization is detected, pause and wait for the condition to change, otherwise set the exclusive flag via CAS (U.compareAndSwapInt(this, SIZECTL, sc, -1)), retrying on failure. Then compute the key's hash to locate the bucket. If the bucket is empty, set the new node with CAS; otherwise lock the bucket head with synchronized, traverse the data in the bucket, and replace or add the node. Finally, decide whether the list needs to be converted to a red-black tree; before converting, decide whether the table needs to be resized.

size
Accumulates with LongAdder.

The following is the answer from the netizen Sean:

A recent scenario where I used ConcurrentHashMap: the system is a public service and the whole flow is processed asynchronously. The last step needs to respond to the calling system over HTTP REST, so to meet that custom requirement I wrote an asynchronous HTTP client with Netty, and ConcurrentHashMap is used to cache the TCP connections.
I saw a friend below mention spin locks and biased locks.
A spin lock, as I understand it, is an application of CAS; the atomic classes in the concurrent package are typical applications.
A biased lock, as I understand it, is an optimization of lock acquisition; it is used in ReentrantLock to handle re-entry after a thread has already acquired the lock.
I don't know if there is an error in my understanding. Corrections and discussion are welcome. Thank you.

The following is the answer from the netizen QQ Guai:

I remember that the size method of concurrentHashMap is a nested loop:
1: traverse all the segments;
2: add up the element counts of all the segments;
3: add up the modification counts of all the segments;
4: check whether the total modification count equals the total from the previous pass. If it changed, there were concurrent modifications during the pass, so recount and retry; if not, nothing was modified and the count ends;
5: if the number of attempts exceeds the threshold, lock every segment and recount, repeat step 4 until the modification totals agree, then release the locks and the count ends.

Origin blog.csdn.net/qq_39331713/article/details/114151433