Recommended reading first:
HashMap source code analysis
Hashtable class comment translation and source code analysis
I. Introduction
Let's first review HashMap. It is based on a hash table: each element is a key-value pair, collisions are resolved internally with singly linked lists, and when the number of elements exceeds the threshold the table grows automatically. The data structure can be represented as follows:

HashTable is a thread-safe version of HashMap, but it relies on synchronized to achieve thread safety, so under heavy thread contention its efficiency is very low. While one thread is inside a synchronized method of a HashTable, other threads calling its synchronized methods may enter a blocked or spinning state, because every thread accessing the HashTable competes for the same lock.

The data structure can be represented as follows:
Then the more efficient ConcurrentHashMap comes along. Its idea is that each lock guards only part of the data in the container, so when multiple threads access data in different segments there is no lock contention between them, which effectively improves the efficiency of concurrent access.

The data structure diagram can be represented as follows:
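The lock-per-segment idea described above is often called lock striping. The following is a minimal, self-contained sketch of that idea (not JDK code; the class name `StripedCounter` and the stripe count are illustrative): each stripe guards a disjoint slice of the data, so threads touching different stripes never wait for each other.

```java
import java.util.concurrent.locks.ReentrantLock;

// Minimal illustration of lock striping: one lock per data slice,
// so only operations on the SAME stripe contend.
public class StripedCounter {
    private final ReentrantLock[] locks;
    private final long[] counts;

    StripedCounter(int stripes) {
        locks = new ReentrantLock[stripes];
        counts = new long[stripes];
        for (int i = 0; i < stripes; i++) locks[i] = new ReentrantLock();
    }

    void increment(Object key) {
        int i = (key.hashCode() & 0x7fffffff) % locks.length; // pick a stripe
        locks[i].lock();   // locks only this stripe, not the whole structure
        try {
            counts[i]++;
        } finally {
            locks[i].unlock();
        }
    }

    long total() {
        long sum = 0;
        for (int i = 0; i < locks.length; i++) {
            locks[i].lock();
            try { sum += counts[i]; } finally { locks[i].unlock(); }
        }
        return sum;
    }

    public static void main(String[] args) {
        StripedCounter c = new StripedCounter(16);
        for (int i = 0; i < 100; i++) c.increment("key" + i);
        System.out.println(c.total()); // 100
    }
}
```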
II. Source code analysis
1. The structure of ConcurrentHashMap

ConcurrentHashMap is composed of a Segment array and HashEntry arrays. Segment is a reentrant lock (it extends ReentrantLock) and plays the role of the lock in ConcurrentHashMap; HashEntry stores the key-value pairs. A ConcurrentHashMap contains one Segment array. The structure of a Segment is similar to that of a HashMap: an array plus linked lists. Each Segment contains a HashEntry array, each HashEntry is a node of a singly linked list, and each Segment guards the elements of its own HashEntry array: to modify data in that array, a thread must first acquire the corresponding Segment's lock.

A class diagram can be represented as follows:
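The nesting described above can be sketched as a simplified skeleton (field names follow the JDK 1.7 source, but method bodies and most details are omitted; the outer class name `SketchMap` is illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

// Simplified skeleton of the JDK 1.7 layout: a map holds a Segment[],
// each Segment extends ReentrantLock and holds a HashEntry[], and each
// HashEntry is a node of a singly linked list.
class SketchMap<K, V> {
    static final class HashEntry<K, V> {
        final int hash;
        final K key;
        volatile V value;
        volatile HashEntry<K, V> next;    // collision chain
        HashEntry(int hash, K key, V value, HashEntry<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    static final class Segment<K, V> extends ReentrantLock {
        volatile HashEntry<K, V>[] table; // one bucket array per segment
        int count;                        // entries in this segment
        int modCount;                     // structural modifications, read by size()
        int threshold;                    // count above which table is rehashed
        float loadFactor;
    }

    final Segment<K, V>[] segments;       // lock-striped top level

    @SuppressWarnings("unchecked")
    SketchMap(int ssize) {
        segments = (Segment<K, V>[]) new Segment[ssize];
    }

    public static void main(String[] args) {
        System.out.println(new SketchMap<String, String>(16).segments.length); // 16
    }
}
```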
2. Constructor

From the constructor we can see:

- ssize: the length of the Segment array
- cap: the length of the HashEntry array inside each Segment
- loadFactor: the fill ratio of the HashEntry array, used to compute the resize threshold (cap * loadFactor)
- segmentShift, segmentMask: used mainly to locate a Segment

Then the segments array is created and its first Segment is initialized; the remaining Segments are lazily initialized. (Some of these details are not studied closely here.)
@SuppressWarnings("unchecked")
public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    if (concurrencyLevel > MAX_SEGMENTS)
        concurrencyLevel = MAX_SEGMENTS;
    // Find power-of-two sizes best matching arguments
    int sshift = 0;
    int ssize = 1;
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1;
    }
    this.segmentShift = 32 - sshift;
    this.segmentMask = ssize - 1;
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    int c = initialCapacity / ssize;
    if (c * ssize < initialCapacity)
        ++c;
    int cap = MIN_SEGMENT_TABLE_CAPACITY;
    while (cap < c)
        cap <<= 1;
    // create segments and segments[0]
    Segment<K,V> s0 =
        new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
                         (HashEntry<K,V>[])new HashEntry[cap]);
    Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
    UNSAFE.putOrderedObject(ss, SBASE, s0); // ordered write of segments[0]
    this.segments = ss;
}
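The power-of-two rounding in the constructor can be checked with a small sketch (helper names `ssize`/`sshift` mirror the constructor's local variables; the class name is illustrative). For concurrencyLevel = 17, ssize is rounded up to 32, sshift is 5, so segmentShift = 32 - 5 = 27 and segmentMask = 31:

```java
// Sketch of how the constructor arguments map to internal sizes.
public class SegmentSizing {
    // Round concurrencyLevel up to a power of two, as the constructor's loop does.
    static int ssize(int concurrencyLevel) {
        int s = 1;
        while (s < concurrencyLevel) s <<= 1;
        return s;
    }

    // Number of doublings performed, i.e. log2(ssize).
    static int sshift(int concurrencyLevel) {
        int shift = 0, s = 1;
        while (s < concurrencyLevel) { shift++; s <<= 1; }
        return shift;
    }

    public static void main(String[] args) {
        int concurrencyLevel = 17;                        // requested segment count
        int ssize = ssize(concurrencyLevel);              // 32: next power of two
        int segmentShift = 32 - sshift(concurrencyLevel); // 27
        int segmentMask = ssize - 1;                      // 31 (binary 11111)
        System.out.println(ssize + " " + segmentShift + " " + segmentMask); // 32 27 31
    }
}
```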
3. put() method

Before looking at the code, think for yourself about what the put method has to do. It should follow this order:

① Compute the hash value
② Compute the position in the Segment array
③ Acquire the segment lock
④ Compute the position in the HashEntry array
⑤ Decide whether the HashEntry array needs to be expanded
⑥ Insert into the HashEntry array or linked list
⑦ Release the segment lock
Take a look at the code below:
public V put(K key, V value) {
    Segment<K,V> s;
    if (value == null)
        throw new NullPointerException();
    int hash = hash(key);
    int j = (hash >>> segmentShift) & segmentMask;
    if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
         (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
        s = ensureSegment(j);
    return s.put(key, hash, value, false);
}
final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;
        int index = (tab.length - 1) & hash;
        HashEntry<K,V> first = entryAt(tab, index);
        for (HashEntry<K,V> e = first;;) {
            if (e != null) {
                K k;
                if ((k = e.key) == key ||
                    (e.hash == hash && key.equals(k))) {
                    oldValue = e.value;
                    if (!onlyIfAbsent) {
                        e.value = value;
                        ++modCount;
                    }
                    break;
                }
                e = e.next;
            }
            else {
                if (node != null)
                    node.setNext(first);
                else
                    node = new HashEntry<K,V>(hash, key, value, first);
                int c = count + 1;
                if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                    rehash(node);
                else
                    setEntryAt(tab, index, node);
                ++modCount;
                count = c;
                oldValue = null;
                break;
            }
        }
    } finally {
        unlock();
    }
    return oldValue;
}
The flow of the put method is basically consistent with what we expected. After locating the position in the Segment array, the outer put calls that Segment's put method, which is similar to HashMap's put.

It first tries to acquire the lock; if that fails, it spins for a while (and blocks the current thread if the spin count exceeds a limit); this is done in the scanAndLockForPut() method. Once the lock is held, it computes which linked list the entry belongs to and fetches the first node of that list (the first variable), then traverses the list. If the new node's key already exists in the list, the value is replaced, the modification count modCount is incremented, and the lock is released.

If the new key is not in the list, it checks whether the number of elements stored in the Segment exceeds the threshold while the table is still smaller than the maximum capacity; if so, the table is expanded (rehash(node) also inserts the new node). Otherwise the new node is inserted at the head of the linked list, completing the put operation.
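The two index computations in the put path are worth isolating: the segment index uses the high bits of the hash (shifted down by segmentShift), while the bucket index inside the segment uses the low bits, so the two choices stay independent. A small sketch (class and method names are illustrative, not JDK API):

```java
// Sketch of the two index computations seen in put().
public class Indexing {
    // Segment index: high bits of the hash, masked to the segment count.
    static int segmentIndex(int hash, int segmentShift, int segmentMask) {
        return (hash >>> segmentShift) & segmentMask;
    }

    // Bucket index inside a segment: low bits of the hash.
    static int bucketIndex(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int segmentShift = 28, segmentMask = 15; // 16 segments
        int hash = 0xABCD1234;
        System.out.println(segmentIndex(hash, segmentShift, segmentMask)); // 10 (0xA)
        System.out.println(bucketIndex(hash, 16));                         // 4  (0x4)
    }
}
```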
Doubt

Line 1 --> HashEntry<K,V>[] tab = table;
Line 2 --> int index = (tab.length - 1) & hash;
Line 3 --> HashEntry<K,V> first = entryAt(tab, index);

For the code above: why not just use the table variable directly? According to some sources, because table is a volatile variable, reading it repeatedly costs more (a volatile write is flushed to main memory immediately; a volatile read is fetched from main memory). But then how is consistency between the tab local and the table field guaranteed across lines 2 and 3?
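One plausible answer, sketched below under my own reading of the source (treat it as an assumption, not the author's statement): inside Segment.put the segment lock is already held, and table is only replaced by rehash(), which runs under that same lock, so table cannot change between lines 1 and 3 anyway; the local copy simply avoids repeated volatile reads. The class and method names here are illustrative:

```java
// Illustration of the "copy a volatile field to a local" idiom.
// Each read of a volatile field is a volatile read; copying it once
// into a local lets later uses hit a plain stack slot.
public class VolatileLocalCopy {
    static class Holder {
        volatile int[] table = new int[8];

        int readTwiceNaive(int i) {
            // two separate volatile reads of 'table'; in general another
            // thread could swap in a different array between them
            return table.length + table[i];
        }

        int readOnce(int i) {
            int[] tab = table;           // single volatile read
            return tab.length + tab[i];  // plain reads of the local copy
        }
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        h.table[3] = 42;
        System.out.println(h.readOnce(3)); // 8 + 42 = 50
    }
}
```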
4. rehash() method
private void rehash(HashEntry<K,V> node) {
    HashEntry<K,V>[] oldTable = table;
    int oldCapacity = oldTable.length;
    int newCapacity = oldCapacity << 1;
    threshold = (int)(newCapacity * loadFactor);
    HashEntry<K,V>[] newTable =
        (HashEntry<K,V>[]) new HashEntry[newCapacity];
    int sizeMask = newCapacity - 1;
    for (int i = 0; i < oldCapacity ; i++) {
        HashEntry<K,V> e = oldTable[i];
        if (e != null) {
            HashEntry<K,V> next = e.next;
            int idx = e.hash & sizeMask;
            if (next == null)   //  Single node on list
                newTable[idx] = e;
            else {              // Reuse consecutive sequence at same slot
                HashEntry<K,V> lastRun = e;
                int lastIdx = idx;
                for (HashEntry<K,V> last = next;
                     last != null;
                     last = last.next) {
                    int k = last.hash & sizeMask;
                    if (k != lastIdx) {
                        lastIdx = k;
                        lastRun = last;
                    }
                }
                newTable[lastIdx] = lastRun;
                // Clone remaining nodes
                for (HashEntry<K,V> p = e; p != lastRun; p = p.next) {
                    V v = p.value;
                    int h = p.hash;
                    int k = h & sizeMask;
                    HashEntry<K,V> n = newTable[k];
                    newTable[k] = new HashEntry<K,V>(h, p.key, v, n);
                }
            }
        }
    }
    int nodeIndex = node.hash & sizeMask; // add the new node
    node.setNext(newTable[nodeIndex]);
    newTable[nodeIndex] = node;
    table = newTable;
}
This is ConcurrentHashMap's expansion method: it doubles the size of the HashEntry array from oldCapacity to newCapacity and moves the data from the old array to the new array.

We know that each slot of the array stores the head node of a linked list. After the array is enlarged, every node's position must be recomputed. If a list has only one node, it is placed into the new array directly; for a list with multiple nodes, ConcurrentHashMap's handling is special. Take a concrete list as an example: suppose the recomputed new-array indices of the nodes on some list are 3, 4, 3, 3. The first two nodes are cloned into the lists at index 3 and index 4 respectively. But the last two nodes map to the same index, so the trailing run starting at the third node (lastRun) is relinked into the new array as a whole, without cloning.
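The lastRun scan on the 3, 4, 3, 3 example can be simulated in isolation (a toy sketch, not JDK code; names are illustrative). It finds the start of the longest trailing run of nodes that all land in the same new bucket; only nodes before that point are cloned:

```java
// Toy simulation of rehash()'s lastRun scan.
public class LastRun {
    // newIndices[i] is the recomputed bucket of the i-th node in one old chain.
    // Returns the position where the trailing same-bucket run begins.
    static int lastRunStart(int[] newIndices) {
        int lastRun = 0;
        int lastIdx = newIndices[0];
        for (int i = 1; i < newIndices.length; i++) {
            if (newIndices[i] != lastIdx) { // bucket changed: run restarts here
                lastIdx = newIndices[i];
                lastRun = i;
            }
        }
        return lastRun;
    }

    public static void main(String[] args) {
        int[] idx = {3, 4, 3, 3};          // example from the text
        System.out.println(lastRunStart(idx)); // 2: nodes 2..3 reused, 0..1 cloned
    }
}
```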
5. size() method
public int size() {
    // Try a few times to get accurate count. On failure due to
    // continuous async changes in table, resort to locking.
    final Segment<K,V>[] segments = this.segments;
    int size;
    boolean overflow; // true if size overflows 32 bits
    long sum;         // sum of modCounts
    long last = 0L;   // previous sum
    int retries = -1; // first iteration isn't retry
    try {
        for (;;) {
            if (retries++ == RETRIES_BEFORE_LOCK) {
                for (int j = 0; j < segments.length; ++j)
                    ensureSegment(j).lock(); // force creation
            }
            sum = 0L;
            size = 0;
            overflow = false;
            for (int j = 0; j < segments.length; ++j) {
                Segment<K,V> seg = segmentAt(segments, j);
                if (seg != null) {
                    sum += seg.modCount;
                    int c = seg.count;
                    if (c < 0 || (size += c) < 0)
                        overflow = true;
                }
            }
            if (sum == last)
                break;
            last = sum;
        }
    } finally {
        if (retries > RETRIES_BEFORE_LOCK) {
            for (int j = 0; j < segments.length; ++j)
                segmentAt(segments, j).unlock();
        }
    }
    return overflow ? Integer.MAX_VALUE : size;
}
Computing the element count of a ConcurrentHashMap is an interesting problem, because the map is operated on concurrently: while you are computing the size, other threads may still be inserting data, so the computed size may differ from the actual size (more data may be inserted between computing and returning size). To solve this, JDK 1.7 uses two schemes:

- In the first scheme, it tries to compute the size without locking, up to three times, comparing the modCount sums of two consecutive passes; if they are equal, the map did not change between the passes and that size is returned.
- In the second scheme, used when the first fails, it locks every Segment, computes the size of the ConcurrentHashMap, and returns it.
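The unlocked-retry part of that strategy can be simulated over plain arrays (a toy sketch, not JDK code; names like `trySizeWithoutLock` and the snapshot arrays are illustrative, and the fall-back locking step is only signalled, not performed):

```java
// Toy sketch of size()'s retry strategy: sum per-segment counts without
// locking, and trust the result only when the sum of modCounts is unchanged
// between two consecutive passes.
public class SizeRetry {
    static final int RETRIES_BEFORE_LOCK = 2; // same constant as the JDK source

    // countSnapshots[a][j] / modCountSnapshots[a][j]: what pass 'a' observes
    // in segment j. Returns the size once two passes agree, or -1 to signal
    // "give up and lock all segments".
    static int trySizeWithoutLock(int[][] countSnapshots, int[][] modCountSnapshots) {
        long last = -1L;
        for (int attempt = 0; attempt <= RETRIES_BEFORE_LOCK; attempt++) {
            long sum = 0;
            int size = 0;
            for (int j = 0; j < countSnapshots[attempt].length; j++) {
                size += countSnapshots[attempt][j];
                sum += modCountSnapshots[attempt][j];
            }
            if (sum == last) return size; // map was quiescent between passes
            last = sum;
        }
        return -1; // still changing: caller must lock every segment
    }

    public static void main(String[] args) {
        // two passes observe identical modCounts -> the second pass is trusted
        int[][] counts = {{2, 3}, {2, 3}, {2, 3}};
        int[][] mods   = {{5, 7}, {5, 7}, {5, 7}};
        System.out.println(trySizeWithoutLock(counts, mods)); // 5
    }
}
```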
References:
Brian Goetz, Doug Lea, et al.: "Java Concurrency in Practice"
Fang Tengfei, et al.: "The Art of Java Concurrent Programming"