HashMap underlying principles and interview answers

 

Features: insertion, deletion, and lookup all run in O(1) average time.

Storage: one key may be null and keys must be unique; values may be null. Capacity: the initial size is 16 (always a power of 2), the load factor is 0.75, and each resize doubles the capacity.
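These rules can be checked with a short sketch (the class and method names below are ours, not from the JDK):

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    // Builds a map that exercises HashMap's null-key and null-value rules.
    static Map<String, String> demo() {
        Map<String, String> map = new HashMap<>(); // default capacity 16, load factor 0.75
        map.put(null, "first");        // one null key is allowed
        map.put("k1", null);           // null values are allowed
        map.put(null, "overwritten");  // keys are unique, so the null key's value is replaced
        return map;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // size is 2, not 3
    }
}
```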

Concept: the perturbation function (hash function) mixes the bits of a key's hashCode() so that even a poorly distributed hashCode() spreads across buckets, reducing hash collisions.

 

JDK1.7

Data structure: array plus singly linked list. Not thread-safe; concurrent resizing can produce an infinite loop.

Array-plus-linked-list structure: the array is made up of HashMap's internal Entry nodes. The array index is computed by HashMap's own hash() method, which perturbs the key's hashCode with shifts and XORs and then takes the result modulo the table length. Implementing its own hash algorithm reduces the chance of hash collisions.

The code is as follows: the key's hashCode undergoes 9 perturbation steps in total, 4 shift operations plus 5 XOR operations.

 

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
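After perturbation, JDK 1.7 maps the hash to a bucket with indexFor(), which substitutes a bitwise AND for the modulus; this works only because the capacity is always a power of two (the wrapper class below is ours):

```java
public class IndexForDemo {
    // JDK 1.7's index computation: since length is a power of two,
    // (length - 1) is an all-ones bit mask, and h & (length - 1)
    // equals h % length for any non-negative h.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(42, 16)); // 10, same as 42 % 16
    }
}
```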

The linked list uses head insertion. During resizing under concurrent access, the transfer() method walks each old bucket's list and re-links every node at the head of its new bucket. If one thread is suspended mid-loop while another thread completes the transfer, the two threads can leave nodes pointing at each other, forming a circular linked list; a later get() on that bucket then loops forever.

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            // If one thread is suspended at this step while another thread
            // resizes concurrently, the circular linked list described above can form.
            Entry<K,V> next = e.next; 
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

Besides the dead loop caused by head insertion, the linked-list structure has another serious problem: when many hash collisions occur, a large number of nodes pile up in a single Entry chain, and because lookup in a singly linked list is sequential (O(n)), search efficiency degrades badly.

 

JDK1.8

 

Data structure: array plus linked list plus red-black tree. Still not thread-safe, but tail insertion no longer produces an infinite loop.

The key's hashCode undergoes 2 perturbation steps: 1 shift plus 1 XOR.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
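Why XOR in the high 16 bits? For a small table only the low bits select the bucket, so two hashCodes that differ only in their high bits would otherwise always collide. A sketch of the effect (the demo class and sample values are ours):

```java
public class Jdk8HashDemo {
    // JDK 1.8's perturbation: fold the high 16 bits into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                   // table size; bucket = hash & (n - 1)
        int a = 0x10000, b = 0x20000; // differ only above bit 15
        // Without perturbation both land in bucket 0:
        System.out.println((a & (n - 1)) + ", " + (b & (n - 1)));
        // With perturbation they land in different buckets (1 and 2):
        System.out.println((hash(a) & (n - 1)) + ", " + (hash(b) & (n - 1)));
    }
}
```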

The linked list now uses tail insertion, which cannot create a cycle; however, under multi-threaded access, concurrent puts may still overwrite each other and silently lose data.

The threshold for converting a linked list into a red-black tree is 8:

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

...

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
  ...
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash); // convert the bin to a red-black tree
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        ...
}

A common follow-up question: why does a bin revert to a linked list below the threshold and become a red-black tree above it, i.e. why the pair 6 and 8?

First, a red-black tree is not always more efficient than a linked list; it only wins when a bucket holds many nodes. The values 6 and 8 are chosen (untreeify at 6 or fewer nodes, treeify at 8 or more) with a gap of 7 in between, which effectively prevents a bucket hovering around a single threshold from flipping back and forth between list and tree. Moreover, the number of nodes per bucket approximately follows a Poisson distribution, under which the probability of a bucket reaching length 8 is vanishingly small.
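The JDK source comment models bucket occupancy as Poisson with parameter 0.5 at the default load factor, and states that the probability of a bucket holding 8 nodes is about 0.00000006. That figure can be reproduced from the Poisson pmf P(k) = e^(-λ)·λ^k / k! (the helper class below is ours):

```java
public class PoissonBins {
    // Poisson probability mass function: P(k) = exp(-lambda) * lambda^k / k!
    static double poisson(double lambda, int k) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) {
            p *= lambda / i; // accumulate lambda^k / k! one factor at a time
        }
        return p;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("P(bucket holds %d) = %.8f%n", k, poisson(0.5, k));
        }
        // P(8) prints 0.00000006, matching the JDK source comment
    }
}
```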


Source: blog.csdn.net/weixin_38246518/article/details/105519206