A business system is, at its core, a pipeline for user data, so the central task of development is to keep that data consistent. In real scenarios, many users read and write the same data concurrently (a flash sale, for example); left uncontrolled, updates get lost, while heavy-handed concurrency control hurts performance and user experience.
How do we control concurrent access to data elegantly? In essence, two problems must be solved:
- Read-write conflicts
- Write-write conflicts
Let's look at how two classic Java concurrent containers, CopyOnWriteArrayList and ConcurrentHashMap, handle these two problems.
CopyOnWriteArrayList
Read/write strategy
As the name suggests, CopyOnWriteArrayList uses a copy-on-write strategy.
A write first acquires a ReentrantLock, then copies the underlying array, applies the modification to the copy, swaps the array reference to point at the copy, and finally releases the lock.
A read relies on the visibility guarantee of volatile: each read sees the latest array reference, with no locking.
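The write path above can be sketched as a minimal copy-on-write container (a simplified model, not the JDK source; the class and field names are invented for illustration):

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Minimal copy-on-write list: writes lock and copy, reads are lock-free.
class MiniCopyOnWriteList<E> {
    private final ReentrantLock lock = new ReentrantLock();
    // volatile guarantees readers always see the latest array reference
    private volatile Object[] array = new Object[0];

    public void add(E e) {
        lock.lock();                       // serialize writers
        try {
            Object[] old = array;
            Object[] copy = Arrays.copyOf(old, old.length + 1); // copy
            copy[old.length] = e;          // modify the copy
            array = copy;                  // swap in the new reference
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {
        return (E) array[index];           // lock-free read
    }

    public int size() {
        return array.length;
    }

    public static void main(String[] args) {
        MiniCopyOnWriteList<String> list = new MiniCopyOnWriteList<>();
        list.add("a");
        list.add("b");
        System.out.println(list.size());   // 2
        System.out.println(list.get(1));   // b
    }
}
```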
Read-write conflicts
Clearly, CopyOnWriteArrayList resolves concurrent read-write conflicts by separating reads from writes.
When a read and a write happen at the same time:
- If the write has not yet swapped in the new reference, the read operates on the original array while the write operates on the copy; they do not interfere
- If the write has already swapped the reference, the read and the write operate on the same array
With this read/write separation, a concurrent read may not see the latest data in real time: so-called weak consistency.
Strong consistency is sacrificed precisely so that reads can be lock-free and support high read concurrency.
Write-write conflicts
When multiple writes happen at once, the first thread to acquire the lock executes; the other threads block until the lock is released.
Crude but effective, at the cost of poor write concurrency.
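A quick illustration with the real java.util.concurrent.CopyOnWriteArrayList: concurrent writers are serialized by the internal lock, so no additions are lost (the thread and iteration counts here are arbitrary):

```java
import java.util.concurrent.CopyOnWriteArrayList;

public class CowWriteDemo {
    public static void main(String[] args) throws InterruptedException {
        CopyOnWriteArrayList<Integer> list = new CopyOnWriteArrayList<>();
        int threads = 4, perThread = 100;
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    list.add(j);           // writers contend for the internal lock
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(list.size());   // 400: no lost updates
    }
}
```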
ConcurrentHashMap (JDK7)
Read/write strategy
The main idea is segmented locking, which reduces the probability that concurrent operations touch the same data.
For read operations:
- Locate the Segment in the Segment array, reading it with the volatile-read semantics of UNSAFE.getObjectVolatile
- Locate the HashEntry in the Segment's HashEntry array, again with UNSAFE.getObjectVolatile
- Traverse the linked list via the final next pointers
- Read the corresponding volatile value
For write operations:
- Locate the Segment in the Segment array with UNSAFE.getObjectVolatile
- Acquire the Segment's ReentrantLock
- Locate the head node of the list in the HashEntry array, again with UNSAFE.getObjectVolatile
- Traverse the list; if the key already exists, write the new value with UNSAFE.putOrderedObject; if not, create a new node, insert it at the head of the list, and publish the new head with UNSAFE.putOrderedObject
- Release the lock when done
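The lock-then-modify write path can be modeled with a toy "segment" (a hypothetical simplification: one list under a ReentrantLock, no hashing; the real implementation uses UNSAFE intrinsics and a HashEntry array):

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy model of one JDK7-style segment: a lock guarding a linked list
// whose nodes have final structure. Names are invented for illustration.
class ToySegment {
    static final class Node {
        final String key;       // final fields: a published node's structure never changes
        final Node next;
        volatile Object value;  // the value itself stays volatile-writable
        Node(String key, Object value, Node next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private volatile Node head;  // stands in for the HashEntry array element

    public void put(String key, Object value) {
        lock.lock();                           // writers serialize per segment
        try {
            for (Node n = head; n != null; n = n.next) {
                if (n.key.equals(key)) {       // key exists: update in place
                    n.value = value;
                    return;
                }
            }
            head = new Node(key, value, head); // new key: publish new head
        } finally {
            lock.unlock();
        }
    }

    public Object get(String key) {            // lock-free read
        for (Node n = head; n != null; n = n.next)
            if (n.key.equals(key)) return n.value;
        return null;
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment();
        seg.put("a", 1);
        seg.put("b", 2);
        seg.put("a", 3);                       // update an existing key
        System.out.println(seg.get("a"));      // 3
        System.out.println(seg.get("b"));      // 2
    }
}
```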
Read-write conflicts
If the concurrent read and write are not in the same Segment, they proceed independently of each other.
If they are in the same Segment, ConcurrentHashMap uses many of Java's concurrency features to resolve the conflict, so that most reads need no lock.
When a read and a write happen at the same time:
- If the PUT's key already exists, the existing node's value is updated in place; since value is volatile, a concurrent read sees the latest value without any locking
- Adding or removing a node changes the list structure. Note that each HashEntry's next pointer is final: adding a new key inserts at the head (the new node's next points at the old head), while removing a node requires copying the portion of the list in front of it; in both cases the HashEntry array element (the list head) is updated with the ordered-write semantics provided by UNSAFE. A read that starts before the new list is published still walks the old list: no lock, but the data may be stale
So reads are lock-free and concurrent, at the cost of weak consistency.
Write-write conflicts
If the concurrent writes are not in the same Segment, they proceed independently of each other.
If they are in the same Segment, the threads contend for the Segment's ReentrantLock, and all but the holder block.
ConcurrentHashMap (JDK8)
Read/write strategy
Compared with JDK7, the Segment layer of locking is gone; operations work directly on the Node array (the array of list heads), hereafter called buckets.
For read operations, UNSAFE.getObjectVolatile provides volatile-read semantics, so the latest value is seen.
For write operations, initialization is lazy: at construction only the number of buckets is determined, and the table is not allocated by default. On put, locate the bucket at the key's index; if the bucket is null, install the value with UNSAFE.compareAndSwapObject (CAS); if not null, lock the bucket with synchronized, find the corresponding node in the linked list / red-black tree and modify its value, then release the lock.
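The CAS-or-lock decision can be sketched with AtomicReferenceArray standing in for the UNSAFE bucket access (a simplified model of the JDK8 put path; no treeification, no resizing; names are invented):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Simplified JDK8-style put: CAS into an empty bucket, otherwise
// synchronize on the bucket head. Illustrative only.
class MiniCHM {
    static final class Node {
        final String key;
        volatile Object value;
        volatile Node next;
        Node(String key, Object value) { this.key = key; this.value = value; }
    }

    private final AtomicReferenceArray<Node> table = new AtomicReferenceArray<>(16);

    private int index(String key) {
        return (key.hashCode() & 0x7fffffff) % table.length();
    }

    public void put(String key, Object value) {
        int i = index(key);
        while (true) {
            Node head = table.get(i);            // volatile read of the bucket
            if (head == null) {
                // empty bucket: try to install the node with CAS, no lock
                if (table.compareAndSet(i, null, new Node(key, value))) return;
                // CAS lost the race: loop again and take the locked path
            } else {
                synchronized (head) {            // lock the bucket head
                    if (table.get(i) != head) continue; // head changed, retry
                    for (Node n = head; ; n = n.next) {
                        if (n.key.equals(key)) { n.value = value; return; }
                        if (n.next == null) {    // append at the tail
                            n.next = new Node(key, value);
                            return;
                        }
                    }
                }
            }
        }
    }

    public Object get(String key) {              // lock-free read
        for (Node n = table.get(index(key)); n != null; n = n.next)
            if (n.key.equals(key)) return n.value;
        return null;
    }

    public static void main(String[] args) {
        MiniCHM m = new MiniCHM();
        m.put("x", 1);
        m.put("x", 2);                           // overwrite via the locked path
        System.out.println(m.get("x"));          // 2
    }
}
```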
Read-write conflicts
If the concurrent read and write are not in the same bucket, they proceed independently of each other.
If they are in the same bucket, things are much simpler than in JDK7, but the same Java features still keep reads lock-free.
When a read and a write happen at the same time:
- If the PUT's key already exists, the value is updated in place, and a concurrent read gets the latest value under the volatile guarantee
- If the PUT's key does not exist, a new node is created (or, on removal, a node is unlinked), changing the original structure; since next is volatile and insertion appends the node directly to the tail of the list (the list becomes a red-black tree beyond a certain length), a concurrent read also sees the latest next pointer
So as long as a write happens-before a read, volatile semantics guarantee the read sees the latest data; in this sense the JDK8 ConcurrentHashMap can be called strongly consistent. (Only the basic GET/PUT paths are considered here; there may be weakly consistent scenarios I have missed, such as resizing, although that should be coordinated globally. Please point out any errors so we can learn together.)
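A small demonstration of that happens-before reasoning with the real java.util.concurrent.ConcurrentHashMap: the writer thread finishes (join) before the read, so the read is guaranteed to see the value:

```java
import java.util.concurrent.ConcurrentHashMap;

public class HappensBeforeDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread writer = new Thread(() -> map.put("k", 42));
        writer.start();
        writer.join();                    // the write happens-before this point
        System.out.println(map.get("k")); // guaranteed to print 42
    }
}
```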
Write-write conflicts
If the concurrent writes are not in the same bucket, they proceed independently of each other.
If they are in the same bucket, note that writes use different strategies in different scenarios: CAS or synchronized.
When multiple writes arrive at once and the bucket is null, CAS handles the race: the first write to CAS successfully wins, and the writers whose CAS fails fall back to competing for the synchronized lock and block.
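The "first CAS wins" behavior in isolation (AtomicReference stands in for the null bucket slot; a toy illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class CasRaceDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicReference<String> bucket = new AtomicReference<>(null);
        AtomicInteger wins = new AtomicInteger();
        Runnable writer = () -> {
            // only a CAS from null can succeed, so exactly one thread wins
            if (bucket.compareAndSet(null, Thread.currentThread().getName())) {
                wins.incrementAndGet();
            }
        };
        Thread t1 = new Thread(writer, "t1");
        Thread t2 = new Thread(writer, "t2");
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(wins.get());   // 1: exactly one CAS succeeded
    }
}
```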
Summary
Why this design (personal opinion)
Storing data necessarily involves a data structure, and every operation on the data goes through that structure.
The straightforward approach is to lock the whole structure, but locking badly hurts performance, so the next task is to find which operations can go without a lock.
Operations fall into two categories: reads and writes.
Look at writes first. Because a write modifies the original data, uncontrolled writes will lose updates. How do we control them?
Writes split into two kinds: those that change the structure and those that do not.
For structure-changing writes, whether the underlying store is an array or a linked list, the modification has to be serialized under a lock to stay atomic. The optimizations are in the lock granularity: from the coarse synchronized of the early HashTable, to ReentrantLock per Segment in ConcurrentHashMap (JDK7), to the improved synchronized per bucket in the JDK8 version. Another lever is spreading the data out: a hash-based structure like ConcurrentHashMap has the advantage over CopyOnWriteArrayList of dispersing writes across many buckets.
For writes that do not change the structure, or whose structural changes are rare (bucket resizing is infrequent), locking is too expensive and CAS is a good fit. Why doesn't CopyOnWriteArrayList use CAS to control concurrent writes? My personal take: because its structure changes on nearly every write. Compare CAS-on-array containers such as AtomicReferenceArray, which do not allow the structure to change after creation.
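For reference, AtomicReferenceArray gives per-slot CAS over a fixed-size array, i.e. lock-free non-structural writes:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class AtomicArrayDemo {
    public static void main(String[] args) {
        // fixed-size array: slots can be CASed, but the structure never changes
        AtomicReferenceArray<String> slots = new AtomicReferenceArray<>(4);
        boolean first = slots.compareAndSet(0, null, "a");  // succeeds
        boolean second = slots.compareAndSet(0, null, "b"); // fails: slot taken
        System.out.println(first + " " + second + " " + slots.get(0)); // true false a
    }
}
```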
Once writes can no longer corrupt the data, reads are comparatively easy to handle.
The main question is whether a read must see the latest data in real time (i.e. wait for the write to complete): the strong-versus-weak-consistency problem.
With strong consistency, a read has to wait for the write to finish; reads and writes compete for the same lock, which hurts the efficiency of both.
In most scenarios, reads do not demand the consistency that writes do: a read may be stale, but a write must never be wrong. If the data is still mid-change at the moment of the read, getting the old value does not matter, as long as the final result is eventually visible to reads.
Fortunately, the JMM (Java Memory Model) gives us the visibility semantics of volatile, which guarantee that, without any locking, reads can see writes to the data. The various direct-memory operations on UNSAFE likewise provide visibility semantics at relatively high performance.
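A classic illustration of volatile visibility: the reader spins on a volatile flag, and the payload written before the flag is guaranteed visible once the flag is seen (without volatile the loop could, in principle, never observe the write):

```java
public class VolatileVisibilityDemo {
    static volatile boolean ready = false;  // volatile: the write is visible to the reader
    static int payload = 0;                 // published safely via the ready flag

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the writer publishes */ }
            System.out.println(payload);    // 42: visible once ready is seen as true
        });
        reader.start();
        payload = 42;                       // ordinary write, ordered before the volatile write
        ready = true;                       // volatile write publishes payload
        reader.join();
    }
}
```

The happens-before rule at work: the write to payload precedes the volatile write to ready in program order, and the reader's volatile read of ready synchronizes with that write, so the reader cannot see ready as true yet payload as 0.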
For reads, the ideal data is immutable data: no need to worry about problems caused by modification. But the only constant is change, and some data will keep changing. To give reads this immutability, or at least to minimize how often what they see changes, the changes have to happen somewhere else, which is exactly the read/write separation above.
All of the above is purely my personal understanding, limited by my own level, and not necessarily correct; discussion and corrections are welcome.
Recommended Reading
The CopyOnWriteArrayList concurrent container
ConcurrentHashMap: principle and source-code interpretation
Advanced Java (6): Java multithreading core techniques through the evolution of ConcurrentHashMap