ConcurrentHashMap source code understanding (1.7)

Please read first:
- HashMap source code analysis
- Hashtable class comment translation and source code analysis

I. Introduction

Let's first review HashMap. It is based on a hash table: each element is a key-value pair, hash collisions are resolved internally with a singly linked list, and the table grows automatically when the number of elements exceeds the threshold. The data structure can be represented as follows:

[figure: HashMap data structure]

Hashtable is the thread-safe counterpart of HashMap, but it uses synchronized to ensure thread safety, which performs poorly under heavy contention: while one thread is executing a synchronized method of the Hashtable, any other thread calling a synchronized method may block or spin, because every thread accessing the Hashtable competes for the same lock.
The data structure can be represented as follows:
[figure: Hashtable data structure]

Then the more efficient ConcurrentHashMap comes along. Its idea is lock striping: each lock guards only part of the data in the container, so when multiple threads access data in different segments there is no lock contention between them, which effectively improves the efficiency of concurrent access.
The data structure diagram can be represented as follows:
[figure: ConcurrentHashMap data structure]

II. Source code analysis

1. The structure of ConcurrentHashMap

ConcurrentHashMap is composed of an array of Segments, each of which holds an array of HashEntrys. A Segment is a reentrant lock (it extends ReentrantLock) and acts as the lock in ConcurrentHashMap; a HashEntry stores the key-value pair data. One ConcurrentHashMap contains an array of Segments, and a Segment is structurally similar to a HashMap: an array plus linked lists. Each Segment contains a HashEntry array, each HashEntry is a node of a linked list, and each Segment guards the elements of its own HashEntry array: to modify data in that array, a thread must first acquire the corresponding Segment's lock.
A class diagram can be represented as follows:
[figure: ConcurrentHashMap class diagram]

2. Constructor

It can be seen from the constructor:
- ssize is the length of the Segment array
- cap is the length of the HashEntry array within each Segment
- loadFactor is the fill ratio of the HashEntry array; cap * loadFactor gives the threshold used when deciding to expand it
- segmentShift and segmentMask are mainly used to locate a Segment

Then the segments array is created and only the first Segment is initialized; the remaining Segments are lazily initialized. (Some of these details are not examined closely here.)

@SuppressWarnings("unchecked")
    public ConcurrentHashMap(int initialCapacity,
                             float loadFactor, int concurrencyLevel) {
        if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
            throw new IllegalArgumentException();
        if (concurrencyLevel > MAX_SEGMENTS)
            concurrencyLevel = MAX_SEGMENTS;
        // Find power-of-two sizes best matching arguments
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        this.segmentShift = 32 - sshift;
        this.segmentMask = ssize - 1;
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        int c = initialCapacity / ssize;
        if (c * ssize < initialCapacity)
            ++c;
        int cap = MIN_SEGMENT_TABLE_CAPACITY;
        while (cap < c)
            cap <<= 1;
        // create segments and segments[0]
        Segment<K,V> s0 =
            new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
                             (HashEntry<K,V>[])new HashEntry[cap]);
        Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
        UNSAFE.putOrderedObject(ss, SBASE, s0); // ordered write of segments[0]
        this.segments = ss;
    }
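To make the sizing arithmetic concrete, here is a small standalone sketch (not JDK code) that replays the constructor's computation for the default arguments initialCapacity = 16, loadFactor = 0.75f, concurrencyLevel = 16; the constant MIN_SEGMENT_TABLE_CAPACITY = 2 mirrors the JDK 1.7 source:

```java
public class SegmentSizing {
    static final int MIN_SEGMENT_TABLE_CAPACITY = 2; // as in JDK 1.7

    public static void main(String[] args) {
        int initialCapacity = 16, concurrencyLevel = 16;

        // Round concurrencyLevel up to a power of two -> ssize
        int sshift = 0, ssize = 1;
        while (ssize < concurrencyLevel) { ++sshift; ssize <<= 1; }
        int segmentShift = 32 - sshift;   // 28
        int segmentMask  = ssize - 1;     // 15

        // Per-segment table capacity, rounded up to a power of two
        int c = initialCapacity / ssize;
        if (c * ssize < initialCapacity) ++c;
        int cap = MIN_SEGMENT_TABLE_CAPACITY;
        while (cap < c) cap <<= 1;

        System.out.println(ssize + " " + segmentShift + " " + segmentMask + " " + cap);
        // prints "16 28 15 2"
    }
}
```

So with the defaults there are 16 segments, each starting with a 2-slot HashEntry table, and the top 4 bits of the hash (shifted down by segmentShift = 28, masked by segmentMask = 15) select the segment.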

3. The put() Method

Before looking at the code, think about what the put method must do; it should proceed in this order:

① Compute the hash value of the key
② Compute the position in the Segment array
③ Acquire the segment lock
④ Compute the position in the HashEntry array
⑤ Determine whether the HashEntry array needs to be expanded
⑥ Insert into the HashEntry array or its linked list
⑦ Release the segment lock

Take a look at the code below:

public V put(K key, V value) {
        Segment<K,V> s;
        if (value == null)
            throw new NullPointerException();
        int hash = hash(key);
        int j = (hash >>> segmentShift) & segmentMask;
        if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
             (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
            s = ensureSegment(j);
        return s.put(key, hash, value, false);
    }
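The expression `(hash >>> segmentShift) & segmentMask` in the outer put() is just a bit trick for picking a segment. A hypothetical demo (not JDK code), using the default segmentShift = 28 and segmentMask = 15, shows that the segment index is simply the top four bits of the hash:

```java
public class SegmentIndexDemo {
    public static void main(String[] args) {
        int segmentShift = 28;   // 32 - sshift for 16 segments
        int segmentMask  = 15;   // ssize - 1

        int hash = 0xABCD1234;   // an arbitrary example hash value
        int j = (hash >>> segmentShift) & segmentMask;

        System.out.println("segment index = " + j);  // prints "segment index = 10" (0xA)
    }
}
```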

final V put(K key, int hash, V value, boolean onlyIfAbsent) {
            HashEntry<K,V> node = tryLock() ? null :
                scanAndLockForPut(key, hash, value);
            V oldValue;
            try {
                HashEntry<K,V>[] tab = table;
                int index = (tab.length - 1) & hash;
                HashEntry<K,V> first = entryAt(tab, index);
                for (HashEntry<K,V> e = first;;) {
                    if (e != null) {
                        K k;
                        if ((k = e.key) == key ||
                            (e.hash == hash && key.equals(k))) {
                            oldValue = e.value;
                            if (!onlyIfAbsent) {
                                e.value = value;
                                ++modCount;
                            }
                            break;
                        }
                        e = e.next;
                    }
                    else {
                        if (node != null)
                            node.setNext(first);
                        else
                            node = new HashEntry<K,V>(hash, key, value, first);
                        int c = count + 1;
                        if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                            rehash(node);
                        else
                            setEntryAt(tab, index, node);
                        ++modCount;
                        count = c;
                        oldValue = null;
                        break;
                    }
                }
            } finally {
                unlock();
            }
            return oldValue;
        }

The flow of the put method is basically consistent with what was expected. After locating the position in the Segment array, it delegates to the Segment's put method, which is similar to HashMap's put.

Segment.put first tries to acquire the lock; if that fails, it spins for a while in scanAndLockForPut() (and blocks the current thread after a certain number of attempts). Once the lock is held, it computes which linked list the new entry belongs to, fetches the head of that list into the first variable, and traverses the list. If the new key already exists in the list, its value is replaced, the modification counter modCount is incremented, and the lock is released.

If the new key is not in the list, it checks whether the segment's element count would exceed the threshold while the table is still below the maximum capacity; if so, the table is expanded (rehash). Otherwise the new node is inserted at the head of the list, completing the put operation.
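These semantics can be checked with a short usage sketch against the public ConcurrentHashMap API (available in every JDK version): put() returns the previous value or null, null values are rejected up front, and putIfAbsent() exercises the onlyIfAbsent branch seen in the source above:

```java
import java.util.concurrent.ConcurrentHashMap;

public class PutDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

        // First insert: no previous mapping, so put() returns null
        System.out.println(map.put("a", 1));         // prints "null"

        // Same key again: the old value is replaced and returned
        System.out.println(map.put("a", 2));         // prints "1"

        // putIfAbsent() takes the onlyIfAbsent branch: the existing value wins
        System.out.println(map.putIfAbsent("a", 3)); // prints "2"
        System.out.println(map.get("a"));            // prints "2"

        // Null values are rejected up front (the NullPointerException check)
        try {
            map.put("b", null);
        } catch (NullPointerException e) {
            System.out.println("null value rejected");
        }
    }
}
```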


A question

Line 1 → HashEntry<K,V>[] tab = table;
Line 2 → int index = (tab.length - 1) & hash;
Line 3 → HashEntry<K,V> first = entryAt(tab, index);

For the code above: why not just use the table field directly?

According to some sources, because table is a volatile field, reading it repeatedly is more expensive (writes are flushed to main memory immediately; reads always go to main memory), so it is copied once into the local variable tab.

However, how is consistency guaranteed between the local tab used in lines 2 and 3 and the volatile field table?


4. The rehash() Method

private void rehash(HashEntry<K,V> node) {
            HashEntry<K,V>[] oldTable = table;
            int oldCapacity = oldTable.length;
            int newCapacity = oldCapacity << 1;
            threshold = (int)(newCapacity * loadFactor);
            HashEntry<K,V>[] newTable =
                (HashEntry<K,V>[]) new HashEntry[newCapacity];
            int sizeMask = newCapacity - 1;
            for (int i = 0; i < oldCapacity ; i++) {
                HashEntry<K,V> e = oldTable[i];
                if (e != null) {
                    HashEntry<K,V> next = e.next;
                    int idx = e.hash & sizeMask;
                    if (next == null)   //  Single node on list
                        newTable[idx] = e;
                    else { // Reuse consecutive sequence at same slot
                        HashEntry<K,V> lastRun = e;
                        int lastIdx = idx;
                        for (HashEntry<K,V> last = next;
                             last != null;
                             last = last.next) {
                            int k = last.hash & sizeMask;
                            if (k != lastIdx) {
                                lastIdx = k;
                                lastRun = last;
                            }
                        }
                        newTable[lastIdx] = lastRun;
                        // Clone remaining nodes
                        for (HashEntry<K,V> p = e; p != lastRun; p = p.next) {
                            V v = p.value;
                            int h = p.hash;
                            int k = h & sizeMask;
                            HashEntry<K,V> n = newTable[k];
                            newTable[k] = new HashEntry<K,V>(h, p.key, v, n);
                        }
                    }
                }
            }
            int nodeIndex = node.hash & sizeMask; // add the new node
            node.setNext(newTable[nodeIndex]);
            newTable[nodeIndex] = node;
            table = newTable;
        }

This is the expansion method of a Segment: it doubles the HashEntry array from oldCapacity to newCapacity and moves the data from the old array into the new one.

We know that the old array stores the head node of each linked list. After the array is enlarged, every node's position must be recomputed. If a list has only one node, it is placed directly into the new array; for lists with multiple nodes, ConcurrentHashMap handles things specially. Take a concrete list as an example:
[figure: example linked list during rehash]

Suppose the new-array index positions of the nodes on the list are 3, 4, 3, 3. The first two nodes are cloned into slot 3 and slot 4 of the new array respectively. But the last two nodes map to the same slot, so the chain starting at the third node is linked into the new array directly, without cloning.
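The lastRun scan can be illustrated with a small standalone sketch (hypothetical: it operates on a plain array of new-table indexes rather than real HashEntry nodes, but the loop mirrors the one in rehash()):

```java
public class LastRunDemo {
    public static void main(String[] args) {
        int[] newIdx = {3, 4, 3, 3};  // new-table indexes from the example above

        // Find the start of the longest suffix whose nodes all share one slot
        int lastRun = 0;              // position of the reusable suffix head
        int lastIdx = newIdx[0];
        for (int i = 1; i < newIdx.length; i++) {
            if (newIdx[i] != lastIdx) {
                lastIdx = newIdx[i];
                lastRun = i;
            }
        }
        // Nodes before lastRun are cloned; from lastRun onward the chain is reused.
        System.out.println("reuse from node " + (lastRun + 1) + " at slot " + lastIdx);
        // prints "reuse from node 3 at slot 3"
    }
}
```

This works because HashEntry.next is final in JDK 1.7: an existing chain suffix can be shared as-is, while the nodes in front of it must be copied into their new slots.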

5. The size() Method

public int  size() {
         // Try a few times to get accurate count. On failure due to
        // continuous async changes in table, resort to locking.
         final Segment<K,V>[] segments = this.segments;
         int size;
         boolean overflow; // true if size overflows 32 bits
         long sum;         // sum of modCounts
         long last = 0L;   // previous sum
         int retries = -1; // first iteration isn't retry
         try {
             for (;;) {
                 if (retries++ == RETRIES_BEFORE_LOCK) {
                     for (int j = 0; j < segments.length; ++j)
                         ensureSegment(j).lock(); // force creation
                 }
                 sum = 0L;
                 size = 0;
                 overflow = false;
                 for (int j = 0; j < segments.length; ++j) {
                     Segment<K,V> seg = segmentAt(segments, j);
                     if (seg != null) {
                         sum += seg.modCount;
                         int c = seg.count;
                         if (c < 0 || (size += c) < 0)
                             overflow = true;
                     }
                 }
                 if (sum == last)
                     break;
                 last = sum;
             }
         } finally {
             if (retries > RETRIES_BEFORE_LOCK) {
                 for (int j = 0; j < segments.length; ++j)
                     segmentAt(segments, j).unlock();
             }
         }
         return overflow ? Integer.MAX_VALUE : size;
     }

Computing the element size of a ConcurrentHashMap is an interesting problem: because operations are concurrent, data may still be inserted while you are computing the size, so the computed size can differ from the actual size (more elements may be inserted before size() returns). To solve this, the JDK 1.7 version uses two schemes:

  1. In the first scheme, it tries to compute the size without locking, up to three times, comparing the sums of the segments' modCounts between consecutive passes; if two passes agree, that size is returned.
  2. In the second scheme, if the first does not stabilize, it locks every Segment, then computes and returns the size.
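The two schemes can be sketched as a standalone simulation (hypothetical: the segments are simulated with plain count/modCount arrays, and since nothing mutates them concurrently the sum stabilizes on the second pass, so the lock fallback is never taken; RETRIES_BEFORE_LOCK = 2 matches the JDK 1.7 constant):

```java
public class SizeSketch {
    static final int RETRIES_BEFORE_LOCK = 2; // value used in JDK 1.7

    public static void main(String[] args) {
        int[] counts    = {3, 5, 0, 2};   // per-segment element counts (simulated)
        int[] modCounts = {7, 9, 0, 4};   // per-segment modification counters (simulated)

        long last = 0L;
        int size = 0;
        boolean locked = false;
        for (int retries = -1; ; ) {
            if (retries++ == RETRIES_BEFORE_LOCK) {
                locked = true;            // scheme 2: fall back to locking every segment
            }
            long sum = 0L;
            size = 0;
            for (int j = 0; j < counts.length; j++) {
                sum += modCounts[j];      // detect concurrent modification via modCounts
                size += counts[j];
            }
            if (sum == last) break;       // scheme 1: two consecutive passes agree
            last = sum;
        }
        System.out.println(size + " " + locked);  // prints "10 false"
    }
}
```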

References:
Doug Lea: "Java Concurrent Programming Practice"
Fang Tengfei: "The Art of Java Concurrent Programming"
