Handwriting a Redis cache framework from scratch (13): HashMap source code principles explained

Why learn HashMap source code?

As a Java developer, the data structures you use most are almost certainly HashMap and List. The design of the JDK's HashMap is well worth studying in depth.

Whether in an interview or at work, knowing the principles will help us a lot.

This article is fairly long; you may want to bookmark it first and then read through it carefully.

Unlike the simple source-code walkthroughs found elsewhere online, it focuses on the design ideas behind the implementation.

The material covered is quite broad: the Poisson distribution from statistics, low-level bit operations, and classic data structures such as red-black trees, linked lists, and arrays. It also discusses why the hash (perturbation) function was introduced, and at the end it includes Meituan's analysis of the HashMap source code. Overall, the depth and breadth are considerable.

The mind map is as follows:

(mind map image)

This article was compiled two years ago, so some parts are inevitably incomplete or outdated. Your comments and corrections are welcome.

I am revisiting it here for the following reasons:

(1) To help readers understand the design ideas behind HashMap and the rehash process. In the next section we will implement a HashMap ourselves.

(2) Why implement a HashMap yourself?

I have recently been handwriting a Redis-like framework. Redis is often described as a Map with more powerful features, so HashMap is the natural starting point. One of the standout designs behind Redis's high performance is progressive rehash; by implementing a progressive-rehash map together, we can appreciate the ingenuity of the original design.

I also want to extract these common data structures into a standalone open-source utility for later reuse. While handwriting Redis this time, the circular linked list, the LRU map, and so on were all written from scratch, which is bad for reuse and prone to bugs.

Okay, let's start the journey of HashMap's source code together~

HashMap source code

HashMap is one of the most heavily used collection classes, so it is worth learning in depth.

First try to read the source code yourself.

java version

$ java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

Data structure

In terms of structure, HashMap is implemented as an array + linked list + red-black tree (the red-black tree part was added in JDK 1.8).

Official description of the current class

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations and permits null values and null keys. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.)

This class does not guarantee the order of mapping; in particular, it does not guarantee that the order will remain the same over time.

This implementation provides constant-time performance for basic operations (get and put), assuming that the hash function appropriately distributes the elements in each bucket. The iteration of the collection view requires time proportional to the "capacity" (number of buckets) of the HashMap instance and its size (number of key-value mappings). Therefore, if iterative performance is important, it is very important not to set the initial capacity too high (or the load factor is too low).

A HashMap instance has two parameters that affect its performance: the initial capacity and the load factor.

The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, the internal data structures are rebuilt) so that it has approximately twice the number of buckets.

Generally speaking, the default load factor (0.75) provides a good trade-off between time and space costs.

A higher value reduces the space overhead, but increases the search cost (reflected in most operations of the HashMap class, including get and put). When setting the initial capacity of the map, the expected number of entries in the map and its load factor should be considered to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehashing operation will occur.

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table.
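
As a concrete illustration, here is a small sketch of my own, assuming we expect roughly 1,000 entries (the class name CapacityDemo is mine):

import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    public static void main(String[] args) {
        // We expect about 1,000 entries. With the default load factor 0.75,
        // an initial capacity of at least 1000 / 0.75 ≈ 1334 (rounded up
        // internally to 2048, giving a threshold of 1536) means the threshold
        // is never exceeded, so no rehash happens while filling the map.
        Map<String, Integer> sized = new HashMap<>(1334);
        for (int i = 0; i < 1000; i++) {
            sized.put("key-" + i, i);
        }
        System.out.println(sized.size()); // 1000, reached without any resize
    }
}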

Note that using many keys with the same hashCode() is a sure way to slow down the performance of any hash table. To ameliorate the impact, when keys are Comparable, this class may use the comparison order among keys to help break ties.

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key the instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.

If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

Map m = Collections.synchronizedMap(new HashMap(...));

The iterators returned by this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed; generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depends on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

Other basic information

  1. This class is a member of the Java collection framework.

  2. @since 1.2

  3. java.util package

Initial exploration of source code

Interfaces

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {}

The class implements three interfaces; of these we really only need to care about the Map interface.

It also extends the abstract class AbstractMap, which we will set aside for now and study later.

Constant definition

Default initial capacity

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
  • Why not use 16 directly?

After reading Stack Overflow, the more convincing explanations are:

  1. In order to avoid the use of magic numbers, the constant definition itself has a self-explanatory meaning.

  2. Emphasize that this number must be a power of 2.
  • Why is it a power of 2?

It is designed this way because it allows fast bit-masking (a bitwise AND) to fold each key's hash code into the table's capacity, as you can see in the method that accesses the table:

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) { /// <-- bitwise 'AND' here
        ...
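
A small sketch of my own showing why this matters: when n is a power of two, (n - 1) & hash produces the same bucket index as the slower hash % n for any non-negative hash value.

public class CapacityMaskDemo {
    public static void main(String[] args) {
        int n = 16; // a power-of-two capacity, as HashMap requires
        int[] hashes = {15, 16, 17, 12345, 0x7FFFABCD};
        for (int h : hashes) {
            // For non-negative h and power-of-two n, the bit mask and the
            // modulo operation give the same bucket index, but the mask is cheaper.
            System.out.println(((n - 1) & h) + " == " + (h % n));
        }
    }
}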

Maximum capacity

The maximum capacity, used if a higher value is implicitly specified by either of the constructors with arguments.

It must be a power of two and at most 1 << 30.

/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
  • Why is 1 << 30?

Of course, the maximum value of an int is 2^31 - 1, so the largest power of two that fits in an int is 2^30.

Besides, 2^31 is about 2 billion, and each hash entry needs an object for the entry itself, an object for the key, and an object for the value.

With a minimum object size of usually around 24 bytes, that alone would be roughly 144 billion bytes, before allocating anything else in the application.

It is safe to say that the maximum capacity limit is only theoretical.

Real machines simply do not have that much memory!

Load factor

When the load factor is large, the table array is expanded less often, so it occupies less memory (less space), but each entry chain holds relatively more elements, so lookup time increases (more time).

Conversely, when the load factor is small, the table array is expanded more often and occupies more memory, but each entry chain holds relatively few elements, so lookup time decreases.

That is why the load factor is a trade-off between time and space.

So when setting the load factor, consider whether you care more about time or about space.

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;
  • Why 0.75, and not 0.8 or 0.6?

In fact, there is an explanation in the HashMap source code comments:

Because TreeNodes are about twice the size of regular nodes, we
use them only when bins contain enough nodes to warrant use
(see TREEIFY_THRESHOLD). And when they become too small (due to
removal or resizing) they are converted back to plain bins.  In
usages with well-distributed user hashCodes, tree bins are
rarely used.  Ideally, under random hashCodes, the frequency of
nodes in bins follows a Poisson distribution
(http://en.wikipedia.org/wiki/Poisson_distribution) with a
parameter of about 0.5 on average for the default resizing
threshold of 0.75, although with a large variance because of
resizing granularity. Ignoring variance, the expected
occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
factorial(k)). The first values are:

0:    0.60653066
1:    0.30326533
2:    0.07581633
3:    0.01263606
4:    0.00157952
5:    0.00015795
6:    0.00001316
7:    0.00000094
8:    0.00000006
more: less than 1 in ten million

Roughly translated: under ideal conditions, with random hash codes, the number of nodes in a bucket follows a Poisson distribution, and the comment gives a table of bucket sizes and their probabilities.

From the table above, by the time a bucket holds 8 elements the probability is already tiny; in other words, with 0.75 as the load factor it is almost impossible for the linked list at any collision position to grow longer than 8.

Reference: Poisson distribution.
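
The numbers in that table can be reproduced with a few lines of Java; here is a small sketch of my own that evaluates exp(-0.5) * pow(0.5, k) / factorial(k) for k = 0..8:

public class PoissonDemo {
    public static void main(String[] args) {
        double lambda = 0.5; // average bin load at the default 0.75 resize threshold
        double factorial = 1.0;
        for (int k = 0; k <= 8; k++) {
            if (k > 0) factorial *= k;
            double p = Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
            System.out.printf("%d: %.8f%n", k, p); // matches the table: 0.60653066, 0.30326533, ...
        }
    }
}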

Threshold

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

TREEIFY_THRESHOLD

The bin count threshold for using a red-black tree rather than a list for a bin.

A bin is converted into a tree when an element is added to a bin that already holds at least this many nodes. The value must be greater than 2 and should be at least 8 to mesh with the assumptions in tree removal about converting back to plain bins upon shrinkage.

UNTREEIFY_THRESHOLD

The bin count threshold for untreeifying a (split) bin during a resize operation.

It should be less than TREEIFY_THRESHOLD, and at most 6 to mesh with shrinkage detection under removal.

MIN_TREEIFY_CAPACITY

The smallest table capacity at which bins may be treeified. (Otherwise, if a bin has too many nodes, the table is resized instead.)

It should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between the resizing and treeification thresholds.
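
To see these thresholds at work, here is a runnable sketch of my own (BadKey is a deliberately bad, hypothetical key whose hashCode() always collides). Once a bin exceeds TREEIFY_THRESHOLD nodes and the table has at least MIN_TREEIFY_CAPACITY buckets, the bin becomes a red-black tree, and because BadKey is Comparable, lookups stay logarithmic instead of linear:

import java.util.HashMap;
import java.util.Map;

public class TreeifyDemo {

    // A deliberately bad key: every instance has the same hash code.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }   // constant hash -> all keys collide
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        // All 10,000 entries land in the same bin. While the table is still small
        // (< MIN_TREEIFY_CAPACITY buckets) HashMap resizes instead of treeifying;
        // once the bin exceeds TREEIFY_THRESHOLD on a large enough table, it becomes
        // a red-black tree, so this get() is O(log n) rather than O(n).
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), i);
        }
        System.out.println(map.get(new BadKey(9_999)));   // prints 9999
    }
}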

Node

Source code

  • Node.java

Definition of the basic hash node.

/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }
    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }
    public final boolean equals(Object o) {
        // fast path: same object reference
        if (o == this)
            return true;

        // type check
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

Personal understanding

Four core elements:

final int hash; // the hash value
final K key;    // the key
V value;        // the value
Node<K,V> next; // the next node in the chain

Algorithm of hash value

The hash algorithm is as follows.

It simply XORs (^) the hash codes of the key and the value.

Objects.hashCode(key) ^ Objects.hashCode(value);

The hashCode() method is as follows:

public static int hashCode(Object o) {
    return o != null ? o.hashCode() : 0;
}

Ultimately it falls back to the object's own hashCode() implementation, which we usually define ourselves.

Static tools

hash

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Why is it so designed?

  • The JDK 8 javadoc explanation

Computes key.hashCode() and spreads (XORs) the higher bits of the hash down to the lower bits.

Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide.

(Among known examples are sets of Float keys holding consecutive whole numbers in small tables.)

So we apply a transform that spreads the impact of the higher bits downward.

There is a trade-off between the speed, utility, and quality of this bit-spreading.

Because many common sets of hash codes are already reasonably distributed (and so don't benefit from spreading), and because we use trees to handle large sets of collisions within bins, we simply XOR some shifted bits in the cheapest possible way to reduce systematic loss, and to incorporate the impact of the highest bits, which would otherwise never be used in index calculations because of the table bounds.

  • Zhihu's explanation

This code is called the "disturbance function" (perturbation function).

Before any expansion, the HashMap array is only 16 slots long, so the raw hash value obviously cannot be used as an index directly.

It first has to be reduced modulo the array length (done here with a bit mask), and the resulting remainder is used as the array index.

putVal Function source code

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        //...    
}

Look at this single expression: tab[i = (n - 1) & hash]

This step is the bucket lookup, i.e., locating the slot in the table array shown in the figure above. If the capacity is 16, only the low 4 bits of the hash value are kept, so the resulting index always falls within the capacity range.

This also explains why the capacity of a HashMap must be a power of 2: (array length - 1) then acts exactly as a "low-bit mask".

For example, the size is 16, then (16-1) = 15 = 00000000 00000000 00001111 (binary);

    10100101 11000100 00100101
&   00000000 00000000 00001111
-------------------------------
    00000000 00000000 00000101    // high bits all zeroed; only the last four bits are kept

The problem is that no matter how well distributed the hash values are, if only the last few bits are taken, collisions can still be severe.

This is where the disturbance function shows its value:

(diagram: the hash code's high 16 bits are XORed into its low 16 bits)

The right shift is by 16 bits, exactly half of the 32 bits, and XORing the high half with the low half mixes the high and low bits of the original hash code, increasing the randomness of the low bits.

Moreover, the mixed low bits now carry information from the high bits, so the high-bit information is preserved in a disguised form as well.
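
Here is a small sketch of my own illustrating the effect: two hash codes that differ only in bits 16-17 land in the same bucket if the raw hash is masked directly, but are separated once the high 16 bits are XORed down, exactly the spreading step HashMap.hash() performs:

public class HashSpreadDemo {

    // Same spreading step as HashMap.hash(): XOR the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                  // table capacity, mask is n - 1 = 0b1111
        int h1 = 0x00010001;         // differs from h2 only in bits 16-17
        int h2 = 0x00020001;

        // Raw hash codes: the difference sits above the mask, so both land in bucket 1.
        System.out.println(((n - 1) & h1) + " vs " + ((n - 1) & h2));                 // 1 vs 1

        // Spread hash codes: the high-bit difference now reaches the low bits.
        System.out.println(((n - 1) & spread(h1)) + " vs " + ((n - 1) & spread(h2))); // 0 vs 3
    }
}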

Introduction to the principle of optimized hashing

comparable class

  • comparableClassFor()

Given an object x, returns x's class if that class is of the form "class C implements Comparable<C>"; otherwise returns null.

PS: this method is a very useful reference and is easy to adapt: the same trick lets us extract the type argument of any generic interface (see the sketch after the code below).

static Class<?> comparableClassFor(Object x) {
    if (x instanceof Comparable) {
        Class<?> c; Type[] ts, as; Type t; ParameterizedType p;
        if ((c = x.getClass()) == String.class) // bypass checks
            return c;
        if ((ts = c.getGenericInterfaces()) != null) {
            for (int i = 0; i < ts.length; ++i) {
                if (((t = ts[i]) instanceof ParameterizedType) &&
                    ((p = (ParameterizedType)t).getRawType() ==
                     Comparable.class) &&
                    (as = p.getActualTypeArguments()) != null &&
                    as.length == 1 && as[0] == c) // type arg is c
                    return c;
            }
        }
    }
    return null;
}
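
As a follow-up to the PS above, here is a standalone sketch of my own (class and method names are mine, not from the JDK) that reuses the same reflection idea to pull the type argument out of a class's Comparable declaration:

import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class GenericTypeDemo {

    // A sample class that declares "implements Comparable<Money>".
    static class Money implements Comparable<Money> {
        final long cents;
        Money(long cents) { this.cents = cents; }
        @Override public int compareTo(Money o) { return Long.compare(cents, o.cents); }
    }

    // Same reflection idea as comparableClassFor(): walk the generic interfaces
    // and return the type argument declared for Comparable, if any.
    static Class<?> comparableTypeArgOf(Class<?> c) {
        for (Type t : c.getGenericInterfaces()) {
            if (t instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) t;
                if (p.getRawType() == Comparable.class
                        && p.getActualTypeArguments().length == 1) {
                    return (Class<?>) p.getActualTypeArguments()[0];
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(comparableTypeArgOf(Money.class)); // prints class ...$Money
    }
}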

compareComparables()

Returns the result of comparing two comparable objects (or 0 if x is null or not of class kc).

@SuppressWarnings({"rawtypes","unchecked"}) // for cast to Comparable
static int compareComparables(Class<?> kc, Object k, Object x) {
    return (x == null || x.getClass() != kc ? 0 :
            ((Comparable)k).compareTo(x));
}

tableSizeFor

Returns the smallest power of two greater than or equal to the given capacity.

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
  • Where it is called
public HashMap(int initialCapacity, float loadFactor) {
    // check...
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
  • Impressions

Hmm... why is it written this way? For performance?

Simple Analysis

When a HashMap is instantiated with a given initialCapacity, since the capacity of a HashMap must be a power of 2, this method finds the smallest power of 2 that is greater than or equal to initialCapacity (if initialCapacity is already a power of 2, that same number is returned).

  • Why -1

int n = cap - 1;

First of all, why subtract 1 from cap (int n = cap - 1)?
This handles the case where cap is already a power of 2. If cap were already a power of 2 and this minus-one step were skipped, then after the unsigned right shifts below, the returned capacity would be twice cap. If that is not clear yet, read through the shift steps below and then come back.

Let's take a look at these unsigned right shift operations:

If n is 0 at this point (after cap - 1, i.e., cap was 1), it remains 0 through the following unsigned right shifts, and the final capacity returned is 1 (there is an n + 1 at the end).

Only the case where n is not equal to 0 is discussed here.

  • First bit operation

n |= n >>> 1;

Since n is not 0, its binary representation always contains at least one 1 bit; consider the highest such 1 bit.

By unsigned right shifting by 1 bit, the most significant 1 is shifted to the right by 1 bit, and then the OR operation is performed so that the right bit next to the most significant 1 in the binary representation of n is also 1, such as 000011xxxxxx.

And so on: after all the shifts, every bit below the highest 1 is also set to 1.

Example

For example, initialCapacity = 10;

Expression                  Binary
------------------------------------------------------

initialCapacity = 10;
int n = 9;                  0000 1001
------------------------------------------------------

n |= n >>> 1;               0000 1001
                            0000 0100   (shift right by 1, then OR)
                          = 0000 1101
------------------------------------------------------

n |= n >>> 2;               0000 1101
                            0000 0011   (shift right by 2, then OR)
                          = 0000 1111
------------------------------------------------------

n |= n >>> 4;               0000 1111
                            0000 0000   (shift right by 4, then OR)
                          = 0000 1111
------------------------------------------------------

n |= n >>> 8;               0000 1111
                            0000 0000   (shift right by 8, then OR)
                          = 0000 1111
------------------------------------------------------

n |= n >>> 16;              0000 1111
                            0000 0000   (shift right by 16, then OR)
                          = 0000 1111
------------------------------------------------------

n = n + 1;                  0001 0000    Result: 2^4 = 16
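
To double-check the walkthrough, here is a small driver of my own that copies tableSizeFor and prints its result for a few inputs:

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of HashMap.tableSizeFor(): smallest power of two >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] inputs = {1, 10, 16, 17, 1000};
        for (int cap : inputs) {
            System.out.println(cap + " -> " + tableSizeFor(cap)); // 1, 16, 16, 32, 1024
        }
    }
}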

put() explained

The following content is adapted from the Meituan tech blog article "Java 8 series: Re-understanding HashMap".

Since it is very well written, I copied it here directly.

Flow chart explanation

The execution flow of HashMap's put method can be understood from the figure below; if you are interested, compare it against the source code for a clearer picture.

(put() execution flow chart)

①. Check whether the key-value pair array table is null or empty; if so, call resize() to create/expand it;

②. Compute the array index i from the key's hash; if table[i] == null, directly create a new node there and go to ⑥; if table[i] is not null, go to ③;

③. Check whether the first element of table[i] has the same key as the one being inserted (same hashCode and equals); if so, overwrite the value directly, otherwise go to ④;

④. Check whether table[i] is a TreeNode, i.e., whether the bin is a red-black tree; if so, insert the key-value pair into the tree, otherwise go to ⑤;

⑤. Traverse the list at table[i]; if the list length exceeds 8, convert it to a red-black tree and insert there, otherwise insert into the linked list; if the key is found to already exist during the traversal, just overwrite its value;

⑥. After a successful insertion, check whether the actual number of key-value pairs (size) exceeds the threshold; if so, expand the table.

Method source code

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
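
A quick usage sketch of my own showing the return values described in the javadoc above, and the effect of onlyIfAbsent (which, in JDK 8, putIfAbsent passes as true):

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        System.out.println(map.put("a", 1));           // null: no previous mapping
        System.out.println(map.put("a", 2));           // 1: old value returned, value overwritten

        // putIfAbsent() goes through putVal() with onlyIfAbsent = true,
        // so an existing non-null value is left untouched.
        System.out.println(map.putIfAbsent("a", 3));   // 2: existing value kept
        System.out.println(map.get("a"));              // 2
    }
}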

Expansion mechanism

Introduction

Resizing means recalculating the capacity: as elements keep being added to a HashMap and the internal array can no longer hold more of them, the object has to enlarge the array so that more elements can fit.

Of course, a Java array cannot grow automatically, so the approach is to replace the existing small-capacity array with a new, larger one, just as when a small bucket of water is not enough, we switch to a bigger bucket.

JDK7 source code

Let's analyze the resize() source code. Since JDK 1.8 integrates the red-black tree and is more complicated, we use the JDK 1.7 code here to make it easier to understand.

void resize(int newCapacity) {   // the new capacity is passed in
    Entry[] oldTable = table;    // reference to the Entry array before expansion
    int oldCapacity = oldTable.length;         
    if (oldCapacity == MAXIMUM_CAPACITY) {  // if the array is already at the maximum size (2^30)
        threshold = Integer.MAX_VALUE; // set the threshold to Integer.MAX_VALUE (2^31-1) so it never expands again
        return;
    }

    Entry[] newTable = new Entry[newCapacity];  // initialize a new Entry array
    transfer(newTable);                         // !! move the data into the new Entry array
    table = newTable;                           // point HashMap's table field at the new Entry array
    threshold = (int)(newCapacity * loadFactor);// update the threshold
}

Here a larger array replaces the existing smaller one, and the transfer() method copies the elements of the original Entry array into the new Entry array.

void transfer(Entry[] newTable) {
    Entry[] src = table;                   // src references the old Entry array
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) { // iterate over the old Entry array
        Entry<K,V> e = src[j];             // take each element of the old Entry array
        if (e != null) {
            src[j] = null;// drop the old array's reference (after the loop, the old Entry array no longer references anything)
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity); // !! recompute each element's position in the new array
                e.next = newTable[i]; // mark [1]
                newTable[i] = e;      // place the element into the array
                e = next;             // move on to the next element on the Entry chain
            } while (e != null);
        }
    }
}

The reference newTable[i] is assigned to e.next, i.e., head insertion into a singly linked list is used, so a newly transferred element at the same position always ends up at the head of the list.

As a result, the element that was placed at an index first ends up at the tail of the Entry chain (if hash collisions occur). This is different from JDK 1.8, as explained in detail below.

Elements on the same Entry chain in the old array may be placed in different positions in the new array after recalculating the index position.

Case

Here is an example to illustrate the expansion process. Suppose our hash algorithm is simply key mod the table size (i.e., the array length).

The hash bucket array table has size 2, the keys are 3, 7 and 5, and the put order is 5, 7, 3.

After mod 2, they all collide in table[1].

Assume the load factor loadFactor = 1, i.e., expansion happens when the actual number of key-value pairs exceeds the table size.

The next three steps show the hash bucket array being resized to 4 and all Nodes being rehashed.

(JDK 1.7 resize example diagram)

JDK 8 optimization

Observe that since we expand by powers of 2 (the length is doubled each time), an element either stays at its original position or moves to "original position + old capacity".

The figure below illustrates this: n is the table length. Figure (a) shows how the index positions of two keys, key1 and key2, are determined before expansion.

Figure (b) shows how key1 and key2 determine their index positions after expansion, where hash1 is the result of the hash and high-bit operation for key1.

Bit operation

After the element's hash is re-masked, because n has doubled, the n - 1 mask covers one extra (higher) bit, so the new index changes like this:

(index calculation diagram)

Therefore, when expanding a HashMap, we do not need to recompute each element's index from scratch as in JDK 1.7. We only need to look at whether the newly exposed bit of the original hash value is 0 or 1: if it is 0, the index stays the same; if it is 1, the index becomes "original index + oldCap". See the resize diagram below, where a table of 16 is expanded to 32:

(resize rehash diagram: capacity 16 expanded to 32)

This design is indeed very clever: it saves the time of recomputing hash values, and because the newly used bit can be considered randomly 0 or 1, the resize process evenly spreads the previously colliding nodes across the new buckets.

This is the new optimization point of JDK1.8.

One difference worth noting: when JDK 1.7 rehashes, elements that migrate to the same index in the new table end up in reverse order, whereas, as the figure above shows, JDK 1.8 preserves the original order.
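
A small sketch of my own demonstrating this bit check: when the table doubles from 16 to 32, whether an element stays at its old index j or moves to j + oldCap is decided solely by the bit (hash & oldCap), and the result matches recomputing the mask against the new capacity:

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 37, 53};   // all land in bucket 5 when the capacity is 16
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            // After doubling, the new index is decided by the single extra bit:
            int newIndex = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            // Same result as masking against the new capacity directly:
            System.out.println(newIndex + " == " + (h & (newCap - 1)));
        }
    }
}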

JDK8 source code

Interested readers can study the JDK 1.8 resize source code below; it is very well written:

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

Summary

If you have read this far, well done.

It does not matter if you do not fully understand everything on the first read; just remember that HashMap has a rehash process, similar to ArrayList's resize.

In the next section, we will hand-write a HashMap with progressive rehash. If you are interested, follow the series to get the latest content.

If you think this article is helpful to you, please like, comment, favorite and forward a wave. Your encouragement is my biggest motivation~

What did you take away from this article? If you have other ideas, feel free to discuss them with me in the comments; I look forward to hearing your thoughts.


Origin: blog.51cto.com/9250070/2541097