A business system is, at its core, a pipeline for user data, so the central task of development is to keep that data consistent. In real scenarios, many users read and write the same data concurrently (a flash sale, for example); left uncontrolled, updates get lost, while heavy-handed concurrency control hurts performance and user experience.
How do we control concurrent access to data elegantly? In essence, two problems must be solved:
- Read-write conflicts
- Write-write conflicts
Let's look at how two classic Java concurrent containers, CopyOnWriteArrayList and ConcurrentHashMap, handle these two problems.
CopyOnWriteArrayList
Read/write strategy
As the name suggests, CopyOnWriteArrayList uses a copy-on-write strategy.
A write first acquires a ReentrantLock, then copies the underlying array, applies the modification to the copy, swaps the array reference to point at the copy, and finally releases the lock.
A read relies on the visibility guarantee of volatile: each read sees the latest array reference, with no locking.
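The write path above can be sketched as a minimal copy-on-write container (a simplified model, not the JDK source; the class and field names are invented for illustration):

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Minimal copy-on-write list: writes lock and copy, reads are lock-free.
class MiniCopyOnWriteList<E> {
    private final ReentrantLock lock = new ReentrantLock();
    // volatile guarantees readers always see the latest array reference
    private volatile Object[] array = new Object[0];

    public void add(E e) {
        lock.lock();                       // serialize writers
        try {
            Object[] old = array;
            Object[] copy = Arrays.copyOf(old, old.length + 1); // copy
            copy[old.length] = e;          // modify the copy
            array = copy;                  // swap in the new reference
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {
        return (E) array[index];           // lock-free read
    }

    public int size() {
        return array.length;
    }

    public static void main(String[] args) {
        MiniCopyOnWriteList<String> list = new MiniCopyOnWriteList<>();
        list.add("a");
        list.add("b");
        System.out.println(list.size());   // 2
        System.out.println(list.get(1));   // b
    }
}
```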
Read-write conflicts
Clearly, CopyOnWriteArrayList resolves concurrent read-write conflicts by separating reads from writes.
When a read and a write happen at the same time:
- If the write has not yet swapped in the new reference, the read operates on the original array while the write operates on the copy; they do not interfere
- If the write has already swapped the reference, the read and the write operate on the same array
With this read/write separation, a concurrent read may not see the latest data in real time: so-called weak consistency.
Strong consistency is sacrificed precisely so that reads can be lock-free and support high read concurrency.
Write-write conflicts
When multiple writes happen at once, the first thread to acquire the lock executes; the other threads block until the lock is released.
Crude but effective, at the cost of poor write concurrency.
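A quick illustration with the real java.util.concurrent.CopyOnWriteArrayList: concurrent writers are serialized by the internal lock, so no additions are lost (the thread and iteration counts here are arbitrary):

```java
import java.util.concurrent.CopyOnWriteArrayList;

public class CowWriteDemo {
    public static void main(String[] args) throws InterruptedException {
        CopyOnWriteArrayList<Integer> list = new CopyOnWriteArrayList<>();
        int threads = 4, perThread = 100;
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    list.add(j);           // writers contend for the internal lock
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(list.size());   // 400: no lost updates
    }
}
```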
ConcurrentHashMap (JDK7)
Read/write strategy
The main idea is segmented locking, which reduces the probability that concurrent operations touch the same data.
For read operations:
- Locate the Segment in the Segment array, reading it with the volatile-read semantics of UNSAFE.getObjectVolatile
- Locate the HashEntry in the Segment's HashEntry array, again with UNSAFE.getObjectVolatile
- Traverse the linked list via the final next pointers
- Read the corresponding volatile value
For write operations:
- Locate the Segment in the Segment array with UNSAFE.getObjectVolatile
- Acquire the Segment's ReentrantLock
- Locate the head node of the list in the HashEntry array, again with UNSAFE.getObjectVolatile
- Traverse the list; if the key already exists, write the new value with UNSAFE.putOrderedObject; if not, create a new node, insert it at the head of the list, and publish the new head with UNSAFE.putOrderedObject
- Release the lock when done
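The lock-then-modify write path can be modeled with a toy "segment" (a hypothetical simplification: one list under a ReentrantLock, no hashing; the real implementation uses UNSAFE intrinsics and a HashEntry array):

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy model of one JDK7-style segment: a lock guarding a linked list
// whose nodes have final structure. Names are invented for illustration.
class ToySegment {
    static final class Node {
        final String key;       // final fields: a published node's structure never changes
        final Node next;
        volatile Object value;  // the value itself stays volatile-writable
        Node(String key, Object value, Node next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private volatile Node head;  // stands in for the HashEntry array element

    public void put(String key, Object value) {
        lock.lock();                           // writers serialize per segment
        try {
            for (Node n = head; n != null; n = n.next) {
                if (n.key.equals(key)) {       // key exists: update in place
                    n.value = value;
                    return;
                }
            }
            head = new Node(key, value, head); // new key: publish new head
        } finally {
            lock.unlock();
        }
    }

    public Object get(String key) {            // lock-free read
        for (Node n = head; n != null; n = n.next)
            if (n.key.equals(key)) return n.value;
        return null;
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment();
        seg.put("a", 1);
        seg.put("b", 2);
        seg.put("a", 3);                       // update an existing key
        System.out.println(seg.get("a"));      // 3
        System.out.println(seg.get("b"));      // 2
    }
}
```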
Read-write conflicts
If the concurrent read and write are not in the same Segment, they proceed independently of each other.
If they are in the same Segment, ConcurrentHashMap uses many of Java's concurrency features to resolve the conflict, so that most reads need no lock.
When a read and a write happen at the same time:
- If the PUT's key already exists, the existing node's value is updated in place; since value is volatile, a concurrent read sees the latest value without any locking
- Adding or removing a node changes the list structure. Note that each HashEntry's next pointer is final: adding a new key inserts at the head (the new node's next points at the old head), while removing a node requires copying the portion of the list in front of it; in both cases the HashEntry array element (the list head) is updated with the ordered-write semantics provided by UNSAFE. A read that starts before the new list is published still walks the old list: no lock, but the data may be stale
So reads are lock-free and concurrent, at the cost of weak consistency.
Write-write conflicts
If the concurrent writes are not in the same Segment, they proceed independently of each other.
If they are in the same Segment, the threads contend for the Segment's ReentrantLock, and all but the holder block.
ConcurrentHashMap (JDK8)
Read/write strategy
Compared with JDK7, the Segment layer of locking is gone; operations work directly on the Node array (the array of list heads), hereafter called buckets.
For read operations, UNSAFE.getObjectVolatile provides volatile-read semantics, so the latest value is seen.
For write operations, initialization is lazy: at construction only the number of buckets is determined, and the table is not allocated by default. On put, locate the bucket at the key's index; if the bucket is null, install the value with UNSAFE.compareAndSwapObject (CAS); if not null, lock the bucket with synchronized, find the corresponding node in the linked list / red-black tree and modify its value, then release the lock.
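The CAS-or-lock decision can be sketched with AtomicReferenceArray standing in for the UNSAFE bucket access (a simplified model of the JDK8 put path; no treeification, no resizing; names are invented):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Simplified JDK8-style put: CAS into an empty bucket, otherwise
// synchronize on the bucket head. Illustrative only.
class MiniCHM {
    static final class Node {
        final String key;
        volatile Object value;
        volatile Node next;
        Node(String key, Object value) { this.key = key; this.value = value; }
    }

    private final AtomicReferenceArray<Node> table = new AtomicReferenceArray<>(16);

    private int index(String key) {
        return (key.hashCode() & 0x7fffffff) % table.length();
    }

    public void put(String key, Object value) {
        int i = index(key);
        while (true) {
            Node head = table.get(i);            // volatile read of the bucket
            if (head == null) {
                // empty bucket: try to install the node with CAS, no lock
                if (table.compareAndSet(i, null, new Node(key, value))) return;
                // CAS lost the race: loop again and take the locked path
            } else {
                synchronized (head) {            // lock the bucket head
                    if (table.get(i) != head) continue; // head changed, retry
                    for (Node n = head; ; n = n.next) {
                        if (n.key.equals(key)) { n.value = value; return; }
                        if (n.next == null) {    // append at the tail
                            n.next = new Node(key, value);
                            return;
                        }
                    }
                }
            }
        }
    }

    public Object get(String key) {              // lock-free read
        for (Node n = table.get(index(key)); n != null; n = n.next)
            if (n.key.equals(key)) return n.value;
        return null;
    }

    public static void main(String[] args) {
        MiniCHM m = new MiniCHM();
        m.put("x", 1);
        m.put("x", 2);                           // overwrite via the locked path
        System.out.println(m.get("x"));          // 2
    }
}
```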
Read-write conflicts
If the concurrent read and write are not in the same bucket, they proceed independently of each other.
If they are in the same bucket, things are much simpler than in JDK7, but the same Java features still keep reads lock-free.
When a read and a write happen at the same time:
- If the PUT's key already exists, the value is updated in place, and a concurrent read gets the latest value under the volatile guarantee
- If the PUT's key does not exist, a new node is created (or, on removal, a node is unlinked), changing the original structure; since next is volatile and insertion appends the node directly to the tail of the list (the list becomes a red-black tree beyond a certain length), a concurrent read also sees the latest next pointer
So as long as a write happens-before a read, volatile semantics guarantee the read sees the latest data; in this sense the JDK8 ConcurrentHashMap can be called strongly consistent. (Only the basic GET/PUT paths are considered here; there may be weakly consistent scenarios I have missed, such as resizing, although that should be coordinated globally. Please point out any errors so we can learn together.)
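A small demonstration of that happens-before reasoning with the real java.util.concurrent.ConcurrentHashMap: the writer thread finishes (join) before the read, so the read is guaranteed to see the value:

```java
import java.util.concurrent.ConcurrentHashMap;

public class HappensBeforeDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread writer = new Thread(() -> map.put("k", 42));
        writer.start();
        writer.join();                    // the write happens-before this point
        System.out.println(map.get("k")); // guaranteed to print 42
    }
}
```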
Write-write conflicts
If the concurrent writes are not in the same bucket, they proceed independently of each other.
If they are in the same bucket, note that writes use different strategies in different scenarios: CAS or synchronized.
When multiple writes arrive at once and the bucket is null, CAS handles the race: the first write to CAS successfully wins, and the writers whose CAS fails fall back to competing for the synchronized lock and block.
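The "first CAS wins" behavior in isolation (AtomicReference stands in for the null bucket slot; a toy illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class CasRaceDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicReference<String> bucket = new AtomicReference<>(null);
        AtomicInteger wins = new AtomicInteger();
        Runnable writer = () -> {
            // only a CAS from null can succeed, so exactly one thread wins
            if (bucket.compareAndSet(null, Thread.currentThread().getName())) {
                wins.incrementAndGet();
            }
        };
        Thread t1 = new Thread(writer, "t1");
        Thread t2 = new Thread(writer, "t2");
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(wins.get());   // 1: exactly one CAS succeeded
    }
}
```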
Summary
Why this design (personal opinion)
Storing data necessarily involves a data structure, and every operation on the data goes through that structure.
The straightforward approach is to lock the whole structure, but locking badly hurts performance, so the next task is to find which operations can go without a lock.
Operations fall into two categories: reads and writes.
Look at writes first. Because a write modifies the original data, uncontrolled writes will lose updates. How do we control them?
Writes split into two kinds: those that change the structure and those that do not.
For structure-changing writes, whether the underlying store is an array or a linked list, the modification has to be serialized under a lock to stay atomic. The optimizations are in the lock granularity: from the coarse synchronized of the early HashTable, to ReentrantLock per Segment in ConcurrentHashMap (JDK7), to the improved synchronized per bucket in the JDK8 version. Another lever is spreading the data out: a hash-based structure like ConcurrentHashMap has the advantage over CopyOnWriteArrayList of dispersing writes across many buckets.
For writes that do not change the structure, or whose structural changes are rare (bucket resizing is infrequent), locking is too expensive and CAS is a good fit. Why doesn't CopyOnWriteArrayList use CAS to control concurrent writes? My personal take: because its structure changes on nearly every write. Compare CAS-on-array containers such as AtomicReferenceArray, which do not allow the structure to change after creation.
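For reference, AtomicReferenceArray gives per-slot CAS over a fixed-size array, i.e. lock-free non-structural writes:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class AtomicArrayDemo {
    public static void main(String[] args) {
        // fixed-size array: slots can be CASed, but the structure never changes
        AtomicReferenceArray<String> slots = new AtomicReferenceArray<>(4);
        boolean first = slots.compareAndSet(0, null, "a");  // succeeds
        boolean second = slots.compareAndSet(0, null, "b"); // fails: slot taken
        System.out.println(first + " " + second + " " + slots.get(0)); // true false a
    }
}
```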
Once writes can no longer corrupt the data, reads are comparatively easy to handle.
The main question is whether a read must see the latest data in real time (i.e. wait for the write to complete): the strong-versus-weak-consistency problem.
With strong consistency, a read has to wait for the write to finish; reads and writes compete for the same lock, which hurts the efficiency of both.
In most scenarios, reads do not demand the consistency that writes do: a read may be stale, but a write must never be wrong. If the data is still mid-change at the moment of the read, getting the old value does not matter, as long as the final result is eventually visible to reads.
Fortunately, the JMM (Java Memory Model) gives us the visibility semantics of volatile, which guarantee that, without any locking, reads can see writes to the data. The various direct-memory operations on UNSAFE likewise provide visibility semantics at relatively high performance.
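A classic illustration of volatile visibility: the reader spins on a volatile flag, and the payload written before the flag is guaranteed visible once the flag is seen (without volatile the loop could, in principle, never observe the write):

```java
public class VolatileVisibilityDemo {
    static volatile boolean ready = false;  // volatile: the write is visible to the reader
    static int payload = 0;                 // published safely via the ready flag

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the writer publishes */ }
            System.out.println(payload);    // 42: visible once ready is seen as true
        });
        reader.start();
        payload = 42;                       // ordinary write, ordered before the volatile write
        ready = true;                       // volatile write publishes payload
        reader.join();
    }
}
```

The happens-before rule at work: the write to payload precedes the volatile write to ready in program order, and the reader's volatile read of ready synchronizes with that write, so the reader cannot see ready as true yet payload as 0.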
For reads, the ideal data is immutable data: no need to worry about problems caused by modification. But the only constant is change, and some data will keep changing. To give reads this immutability, or at least to minimize how often what they see changes, the changes have to happen somewhere else, which is exactly the read/write separation above.
All of the above is purely my personal understanding, limited by my own level, and not necessarily correct; discussion and corrections are welcome.
Recommended Reading
The CopyOnWriteArrayList concurrent container
ConcurrentHashMap: principle and source-code interpretation
Advanced Java (6): Java multithreading core techniques through the evolution of ConcurrentHashMap