Concurrent Containers and Frameworks - Concurrent Containers (1)

There are many concurrent containers and frameworks in Java; only a few of them are introduced here.

1. Implementation principle and use of ConcurrentHashMap

    1.1 Why use ConcurrentHashMap

    Using HashMap in concurrent code can send the program into an infinite loop, while the thread-safe Hashtable is very inefficient. ConcurrentHashMap exists to address both problems.

  1. Thread-unsafe HashMap: In a multi-threaded environment, concurrent put operations can leave the map in a state where a later get spins forever, driving CPU utilization close to 100%, so HashMap must not be used concurrently. The infinite loop arises because concurrent puts (specifically, the resize they may trigger) can turn the Entry linked list into a ring; once a ring forms, the next pointer of an Entry is never null, and traversing the list to find an Entry never terminates.
  2. Inefficient Hashtable: The Hashtable container uses synchronized to ensure thread safety, which makes it very slow under heavy contention. While one thread is inside a synchronized method of Hashtable, every other thread that calls any of its synchronized methods blocks or spins. For example, while thread 1 is adding an element with put, thread 2 can neither add an element with put nor read one with get; the fiercer the competition, the lower the throughput.
  3. Lock segmentation in ConcurrentHashMap: Hashtable is slow under contention because every thread accessing it competes for the same single lock. If the container instead holds multiple locks, each guarding one portion of its data, then threads accessing different portions never contend with one another, which greatly improves concurrent throughput. This is the lock segmentation (lock striping) technique used by ConcurrentHashMap: the data is split into segments, each segment is assigned its own lock, and while one thread holds the lock on one segment, the data of the other segments remains accessible to other threads.
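The striping idea above can be illustrated with a minimal sketch. StripedMap is a hypothetical class written for this example, not the real ConcurrentHashMap implementation: it simply keeps 16 locks, each guarding its own bucket of data, so that operations on keys in different stripes never contend.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of lock striping: N locks, each guarding one stripe of data.
// Hypothetical illustration class, not the real ConcurrentHashMap.
class StripedMap<K, V> {
    private static final int STRIPES = 16;
    private final ReentrantLock[] locks = new ReentrantLock[STRIPES];
    private final Map<K, V>[] stripes;

    @SuppressWarnings("unchecked")
    StripedMap() {
        stripes = new Map[STRIPES];
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new ReentrantLock();
            stripes[i] = new HashMap<>();
        }
    }

    // First hash: pick the stripe (segment) the key belongs to.
    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % STRIPES;
    }

    V put(K key, V value) {
        int i = stripeFor(key);
        locks[i].lock();      // only this stripe is locked;
        try {                 // the other 15 stripes stay accessible
            return stripes[i].put(key, value);
        } finally {
            locks[i].unlock();
        }
    }

    V get(K key) {
        int i = stripeFor(key);
        locks[i].lock();
        try {
            return stripes[i].get(key);
        } finally {
            locks[i].unlock();
        }
    }
}
```

Two threads writing keys that hash to different stripes proceed in parallel; with a single global lock they would serialize.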

    1.2 ConcurrentHashMap structure

    ConcurrentHashMap is composed of a Segment array and HashEntry arrays. Segment is a reentrant lock (it extends ReentrantLock) and plays the role of the lock in ConcurrentHashMap; HashEntry stores the key-value pairs. A ConcurrentHashMap contains one array of Segments. Each Segment is structured much like a HashMap: an array of buckets, each holding a linked list. A Segment contains (it may be more accurate to say "guards") one HashEntry array, and each element of that array is the head of a linked list. Before the data in a HashEntry array can be modified, the corresponding Segment lock must first be acquired. In other words, a ConcurrentHashMap holds a Segment array, each element of the Segment array locks its own HashEntry array, and each slot of a HashEntry array stores a linked list.

    1.3 Simple analysis of the access process of ConcurrentHashMap

  1. put(K key, V value): To store a key-value pair, the key is first hashed to locate a position in the Segment array, and once the position is confirmed, that Segment is locked. After locking, the insertion proceeds: a second hash on the key determines the index in the Segment's HashEntry array, and the Segment checks whether the HashEntry array needs to be expanded, which would require recomputing the position of all of its elements. Once the slot is determined, the key-value pair is added to that slot's linked list. The Segment's expansion check is better placed than HashMap's: the Segment checks before inserting whether the array needs to grow, whereas HashMap checks after inserting and expands once the threshold is reached, so HashMap may perform an expansion even when no further element is ever inserted, making that expansion wasted work.
  2. get(Object key): The key is hashed once, the hash is used to locate the Segment, and the same hash is then used to locate the element in the Segment's HashEntry array. get is efficient because it does not lock at all; only if it reads a null value does it lock and re-read. (Hashtable's get is synchronized, which is why it is so slow.) How does ConcurrentHashMap get away without locking? Inspecting the source shows that every shared variable used in get is declared volatile, such as the count field that records the current Segment's size and the value field of HashEntry that stores the element. Variables declared volatile maintain visibility between threads: they can be read by many threads at once and the reader is guaranteed not to see a stale value, but they should be written by only one thread at a time (multiple writers are safe only in the special case where the written value does not depend on the previous value). Since get only reads the shared variables count and value and never writes them, it needs no lock. The reason no stale value can be read is that, by the happens-before rule of the Java memory model, a write to a volatile field happens-before any subsequent read of it; even if one thread modifies the variable while another reads it, the get operation still observes the latest value. This is a classic use case for replacing a lock with volatile.
  3. size(): To count the elements in the whole ConcurrentHashMap, the sizes of all Segments must be summed. Each Segment's count field is volatile, so in a multi-threaded scenario, can the total size be obtained by simply adding up every Segment's count? Not reliably: although each read of count sees its latest value, the counts may change while the values are being added up (summation is not atomic), yielding an inaccurate total. Locking the put, remove and clean methods of every Segment for the duration of the count would be correct but very inefficient. Instead, ConcurrentHashMap uses the modCount variable: put, remove and clean each increment modCount before touching the elements, and size() compares the modCounts before and after summing to detect whether the container changed during the count. If the counts keep changing across a few retries, size() falls back to locking all Segments.
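The access pattern described above can be exercised with the standard java.util.concurrent API. The sketch below fills a ConcurrentHashMap from several threads without any external locking (the helper method concurrentFill is invented for this example); the distinct keys guarantee a deterministic final size.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CHMDemo {
    // Fills the map from several threads at once; returns the final size.
    // Each thread writes its own key range, so no puts collide.
    static int concurrentFill(int threads, int perThread) throws InterruptedException {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] writers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            writers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put("key-" + (base + i), base + i); // no external lock needed
                }
            });
            writers[t].start();
        }
        for (Thread w : writers) {
            w.join();
        }
        return map.size(); // sums the element counts across all segments
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(concurrentFill(4, 100)); // prints 400
    }
}
```

With a plain HashMap the same workload could lose entries or, in the worst case, loop forever during a concurrent resize.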

2. Implementation principle and use of ConcurrentLinkedQueue

    ConcurrentLinkedQueue is an unbounded thread-safe queue based on linked nodes. It orders elements first-in, first-out: an added element is appended to the tail of the queue, and a retrieval returns the element at the head. Its non-blocking implementation is based on CAS. CAS is not an algorithm; it is a hardware instruction supported directly by the CPU, which makes it platform-dependent to some degree.

    The currently commonly used multi-thread synchronization mechanisms can be divided into the following three types:

  1. Volatile variables: lightweight multi-threaded synchronization mechanism that does not cause context switching and thread scheduling. Provides only memory visibility guarantees, not atomicity.
  2. CAS atomic instruction: A lightweight multi-thread synchronization mechanism that does not cause context switching and thread scheduling. It provides both memory visibility and atomic update guarantees.
  3. Mutex: A heavyweight multi-threaded synchronization mechanism that may cause context switching and thread scheduling, it provides both memory visibility and atomicity.
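The three mechanisms can be shown side by side in one small sketch (SyncDemo and its members are invented names for this illustration). The CAS loop below is the idiomatic retry pattern that ConcurrentLinkedQueue also relies on: read, attempt compareAndSet, retry on failure.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SyncDemo {
    // 1. volatile: guarantees visibility only. A plain volatile counter++
    //    would NOT be atomic; volatile suits one-writer flags like this.
    static volatile boolean done = false;

    // 2. CAS: lock-free atomic update, no context switch on contention.
    static final AtomicInteger casCounter = new AtomicInteger();

    static void casIncrement() {
        for (;;) {
            int current = casCounter.get();
            if (casCounter.compareAndSet(current, current + 1)) {
                return; // CAS succeeded: value advanced without any lock
            }
            // CAS lost the race: another thread updated first; retry
        }
    }

    // 3. Mutex: synchronized blocks contending threads and may trigger
    //    context switches; it provides both visibility and atomicity.
    static int lockedCounter = 0;

    static synchronized void lockedIncrement() {
        lockedCounter++;
    }
}
```

(AtomicInteger already offers incrementAndGet; the explicit loop is written out here only to make the CAS retry visible.)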

    The non-blocking algorithm implementation of ConcurrentLinkedQueue can be summarized as follows:

  1. The use of CAS atomic instructions to handle concurrent access to data is the basis for implementing non-blocking algorithms.
  2. head/tail does not always point to the head/tail node of the queue, which means the queue is allowed to be in an inconsistent state. This feature separates the two steps that need to be atomically executed together when enqueuing/dequeuing, thereby narrowing the range of values ​​that need to be atomically updated when enqueuing/dequeuing to a unique variable. This is the key to the realization of non-blocking algorithms.
  3. Updating head/tail in batches reduces overall enqueue/dequeue overhead.

    2.1 Data Structure

    Its data structure is no different from that of an ordinary linked queue, and under a single thread it behaves exactly like one. But if multiple threads insert into and delete from an ordinary queue concurrently, the links can be corrupted, for example a newly inserted node can be lost; ConcurrentLinkedQueue avoids this.

    2.2 Enqueue implementation

    The enqueue process does two main things: first, locate the tail node; second, use CAS to set the new node as the next node of the tail node, retrying if the CAS fails.

// Implementation source (JDK)
public boolean offer(E e) {
    checkNotNull(e);
    final Node<E> newNode = new Node<E>(e);

    for (Node<E> t = tail, p = t;;) {
        Node<E> q = p.next;
        if (q == null) {
            // p is last node
            if (p.casNext(null, newNode)) {
                // Successful CAS is the linearization point
                // for e to become an element of this queue,
                // and for newNode to become "live".
                if (p != t) // hop two nodes at a time
                    casTail(t, newNode);  // Failure is OK.
                return true;
            }
            // Lost CAS race to another thread; re-read next
        }
        else if (p == q)
            // We have fallen off list.  If tail is unchanged, it
            // will also be off-list, in which case we need to
            // jump to head, from which all live nodes are always
            // reachable.  Else the new tail is a better bet.
            p = (t != (t = tail)) ? t : head;
        else
            // Check for tail updates after two hops.
            p = (p != t && t != (t = tail)) ? t : q;
    }
}
  1. Locate the tail node: In the source, head refers to the head node and tail to the tail node, but under concurrent enqueues tail does not necessarily point to the last node; the last node may instead be a successor of tail. The first if in the loop checks whether p has a next node: if next is null, p really is the last node and the insert can be attempted. The p == q branch handles the case where p's next pointer points back to p itself, which means the node has been removed from the list (dequeued nodes are linked to themselves); in that case, if tail has changed in the meantime the loop restarts from the new tail, and otherwise it jumps to head, from which all live nodes are always reachable. In every other case p simply advances toward the tail, re-checking for a tail update after every two hops.
  2. Set the enqueue node as the tail node: p.casNext(null, newNode) sets the new node as the successor of p; the CAS succeeds only if p.next is still null, i.e. p is still the last node of the queue. If the CAS fails, another thread has appended a node in the meantime, so the loop re-reads next and locates the new last node before retrying. Note that tail itself is only moved by casTail every two hops, which batches tail updates and reduces the overall enqueue overhead.

    2.3 Dequeue implementation

  1. First read the head node's element and check whether it is null. If it is null, another thread has already removed that node's element in a dequeue operation. If it is not null, use CAS to set the node's element reference to null: if the CAS succeeds, the element is returned directly (the head pointer itself, like tail, is only updated in batches); if it fails, another thread has dequeued concurrently and changed the element, so the head node must be re-read and the operation retried.
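The enqueue and dequeue paths above surface through the standard offer/poll/peek API. A minimal single-threaded usage sketch:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class CLQDemo {
    public static void main(String[] args) {
        Queue<Integer> q = new ConcurrentLinkedQueue<>();
        q.offer(1);                   // enqueue at the tail (the offer shown above)
        q.offer(2);
        q.offer(3);
        System.out.println(q.poll()); // 1 -- FIFO: the head leaves first
        System.out.println(q.poll()); // 2
        System.out.println(q.peek()); // 3 -- peek reads the head without removing it
        System.out.println(q.poll()); // 3
        System.out.println(q.poll()); // null -- an empty queue returns null, never blocks
    }
}
```

Because poll returns null rather than blocking on an empty queue, ConcurrentLinkedQueue suits busy-polling producers/consumers; for blocking hand-off, a BlockingQueue implementation is the usual choice instead.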
