HashMap underlying implementation (to be continued)

Notes taken while reading and debugging the HashMap source code.

I spent roughly a day and a half going through the HashMap source, and there are many details in it. This article pulls out the details that interviewers often ask about and gives my own understanding of them; links to some good blog posts will be added at the end. The source code with detailed comments is on my GitHub, so this article itself will not contain much source code.

Data structures

In HashMap 1.8:

  • Underlying data structure: array + linked list + red-black tree

In HashMap 1.7:

  • Underlying data structure: array + linked list

Benefits of array + linked list: the two structures balance each other — the linked list keeps insertion and deletion cheap, while the array keeps lookup time low.

Benefits of adding the red-black tree: it balances space against time. A red-black tree brings bucket lookup down to O(log n), but each tree node must store left and right child pointers in addition to the next pointer that HashMap nodes already keep for the list, so it costs more space. When a tree shrinks to 6 or fewer nodes it is converted back into a linked list.

Head insertion vs. tail insertion (JDK 1.7 inserts at the head of the list, JDK 1.8 at the tail)

Why HashMap can fall into an infinite loop (often described as a deadlock) under high concurrency

Why is the initial array capacity set to 16?

HashMap's initial array capacity is set to 16 so that when a node's array index (also called its hash bucket) is computed from its hash value, the nodes end up evenly distributed across the array.

Let's walk through an example.

The bucket index is obtained by taking the hash value modulo the array length, which guarantees the index stays within bounds. The source code replaces the modulo with a bitwise operation, which is cheaper:

first = tab[(n - 1) & hash]
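As a quick sanity check (my own sketch, not from the HashMap source), for a power-of-two length n the bitwise AND above computes exactly hash mod n, even for negative hashes:

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // a power-of-two table length
        int[] hashes = {0b1010100111, 0b1010100011, 12345, -98765};
        for (int hash : hashes) {
            int byMask = (n - 1) & hash;          // what the HashMap source does
            int byMod  = Math.floorMod(hash, n);  // the plain modulo it replaces
            System.out.println(byMask + " == " + byMod);
        }
    }
}
```

Note that this equivalence only holds when n is a power of two, which is exactly why HashMap forces its capacity to be one.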

First, keep in mind that a hash value is an int, i.e. 32 bits. Suppose the low bits of a computed hash are 1010100111, and let's see which bucket it lands in when the array length is 10 versus 16.

Decimal to binary: 10 -> 1010B, 16 -> 10000B

1010B - 1 = 1001B        10000B - 1 = 1111B

    1010100111                1010100111
  &       1001              &       1111
  ------------              ------------
          0001                      0111

The two indexes computed above differ, so no problem yet. But suppose another element is put into the HashMap whose hash code ends in 1010100011:

    1010100011                1010100011
  &       1001              &       1111
  ------------              ------------
          0001                      0011

We can see at a glance that with length 10 the two hash codes, even though their last four bits differ, are placed at the same index, because the mask 1001B ignores every bit position where it is 0. The downside is that the list at certain indexes grows very long and lookups become slow. With length 16 the mask is 1111B, so every low bit participates, and two nodes share a bucket only if their low four bits are identical. This is a classic example of trading space for time, and the same reasoning applies to 32, 64, 128, and so on. That is why the source comments on the default array capacity say:

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
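The collision above can be reproduced in runnable form (a small sketch of the example, not JDK code) — the two hash codes collide under length 10 but not under length 16:

```java
public class BucketDemo {
    public static void main(String[] args) {
        int h1 = 0b1010100111;
        int h2 = 0b1010100011;
        // length 10: mask 1001B zeroes out the bit positions where the mask is 0
        System.out.println(((10 - 1) & h1) + " and " + ((10 - 1) & h2)); // both 1: collision
        // length 16: mask 1111B keeps all four low bits
        System.out.println(((16 - 1) & h1) + " and " + ((16 - 1) & h2)); // 7 and 3: no collision
    }
}
```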

So the initial size of 16 reveals the secret behind HashMap's array capacity: it must be a power of two.

If you want to know how HashMap guarantees that the array capacity is a power of two, look at the tableSizeFor() function:

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
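The shifts smear the highest set bit of cap - 1 into every lower position, producing a run of ones, so n + 1 is the next power of two. A runnable sketch (the function body copied from the source above, with MAXIMUM_CAPACITY filled in from the JDK's 1 << 30):

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied from the JDK 8 source above: rounds cap up to the next power of two
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

The initial cap - 1 is what makes an exact power of two map to itself rather than the next one up.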

HashMap optimization in Java 8 -> the perturbation function

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

To fetch an element from the underlying array, HashMap's get() delegates to the getNode() method, passing the key through one extra hash() function first. Stepping into hash(), we can see its internal code is also very simple:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Here the key's own hashCode() is called, and the result is XORed with itself shifted right by sixteen bits. The purpose of this operation is to make the final hash depend not only on the low sixteen bits of the hashCode but also on the high sixteen bits. It mixes in extra variation so that a poorly written hashCode() whose values differ only in the high bits does not cause excessive collisions, and the computed hashes are spread more evenly.
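A small demonstration (my own sketch) of why this matters when the table is small: two hash codes that differ only above bit 15 would otherwise land in the same bucket, since the index mask only ever sees the low bits:

```java
public class PerturbDemo {
    // the same spreading step HashMap.hash() applies (null handling omitted)
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x10005; // differs from b only in bits 16 and up
        int b = 0x20005;
        // without the XOR, only the low four bits decide the bucket: both get 5
        System.out.println(((n - 1) & a) + " and " + ((n - 1) & b));
        // with the XOR, the high bits influence the index: 4 and 7
        System.out.println(((n - 1) & spread(a)) + " and " + ((n - 1) & spread(b)));
    }
}
```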

Comparing with JDK 1.7, we find the idea is the same in both, but the implementation differs:

h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);

How HashMap resolves hash collisions

Simply put, when a value is placed into the array: if the bucket is empty, it is stored there directly; if the bucket already holds an element, the new one is appended after it. Whether it lands at the end of a linked list or in a red-black tree is a detail that is not the focus of this article.

Note that the function converting a list into a red-black tree is called only when the chain length reaches 8 and the array size is at least 64; if the table is still smaller than 64, HashMap resizes instead.
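The "empty bucket: place directly; otherwise append" rule can be sketched as a minimal separate-chaining map (my own illustration — no resizing, treeification, or hash perturbation, unlike the real HashMap; tail insertion as in JDK 8):

```java
import java.util.Objects;

public class ChainDemo<K, V> {
    static class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = new Node[16];

    private int indexOf(K key) {
        return (table.length - 1) & (key == null ? 0 : key.hashCode());
    }

    public void put(K key, V value) {
        int i = indexOf(key);
        if (table[i] == null) {               // empty bucket: place the node directly
            table[i] = new Node<>(key, value);
            return;
        }
        Node<K, V> e = table[i];
        while (true) {                        // otherwise walk the chain
            if (Objects.equals(e.key, key)) { e.value = value; return; }           // same key: overwrite
            if (e.next == null) { e.next = new Node<>(key, value); return; }       // tail insertion
            e = e.next;
        }
    }

    public V get(K key) {
        for (Node<K, V> e = table[indexOf(key)]; e != null; e = e.next)
            if (Objects.equals(e.key, key)) return e.value;
        return null;
    }
}
```

Swapping the tail insertion for head insertion would give the JDK 1.7 behavior mentioned earlier.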

Why does the String class's hashCode() use 31 as the multiplier?

int h = hash;
if (h == 0 && value.length > 0) {
    char val[] = value;

    for (int i = 0; i < value.length; i++) {
        h = 31 * h + val[i];
    }
    hash = h;
}
return h;

Recall that HashMap first uses the key's own hashCode() method (shown above for String) and then applies the h >>> 16 perturbation with one more bit operation.

The String class's hashCode() method uses the prime 31 as its multiplier for two reasons:

  • 31 * i == (i << 5) - i, so the JVM can optimize the multiplication into a shift and a subtraction
  • 31 is a prime of just the right size: a small prime such as 2 causes too high a collision rate, while a large prime such as 101 makes the intermediate hash value overflow too quickly. Compared with other candidates, computing hashCode with 31 yields a more uniform distribution of hash values.

Those two reasons are why the String class settled on 31 for its hashCode() calculation.
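Both points can be checked directly (a quick sketch of my own):

```java
public class HashMultiplier {
    public static void main(String[] args) {
        // 31 * h can be rewritten as (h << 5) - h, which lets the JVM
        // replace the multiplication with a shift and a subtraction
        int h = 12345;
        System.out.println(31 * h == (h << 5) - h); // true

        // recomputing String.hashCode() by hand with the 31 multiplier
        String s = "abc";
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = 31 * hash + s.charAt(i);
        }
        System.out.println(hash == s.hashCode()); // true
    }
}
```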

The annotated source code and the following topics will be supplemented over the next two days:

  • Uses of equals() and hashCode(), and their importance in HashMap
  • The benefits of immutable objects
  • HashMap under multi-threaded race conditions
  • How HashMap decides to resize, and how rehashing recomputes each index


Origin juejin.im/post/5e61047e6fb9a07ca24f5c2a