Java 8 - Re-examining HashMap

Summary

HashMap is the map (key-value pair) implementation that Java programmers use most often. As the JDK (Java Development Kit) evolved, JDK1.8 optimized the underlying implementation of HashMap, for example by introducing the red-black tree data structure and improving the expansion (resize) logic. This article compares JDK1.7 with JDK1.8 and discusses the structure and working principles of HashMap in depth.
 

Introduction

Java defines an interface java.util.Map for the mapping in the data structure. This interface mainly has four commonly used implementation classes, namely HashMap, Hashtable, LinkedHashMap and TreeMap. The class inheritance relationship is shown in the following figure:

java.util.Map class diagram

The following is a description of the characteristics of each implementation class:

(1) HashMap: stores data according to the hashCode of the key. In most cases a value can be located directly, so access is fast, but the traversal order is not deterministic. HashMap allows at most one record with a null key and any number of records with null values. HashMap is not thread-safe: multiple threads writing to a HashMap at the same time may leave it in an inconsistent state. If thread safety is required, wrap the HashMap with Collections.synchronizedMap, or use ConcurrentHashMap.

(2) Hashtable: Hashtable is a legacy class. Much of its mapping functionality is similar to HashMap, but it inherits from the Dictionary class and is thread-safe: only one thread can write to a Hashtable at any time. Its concurrency is worse than ConcurrentHashMap, which introduces segment locks. Hashtable is not recommended in new code: use HashMap when thread safety is not required, and ConcurrentHashMap when it is.

(3) LinkedHashMap: LinkedHashMap is a subclass of HashMap that preserves insertion order: when traversed with an Iterator, records come back in the order they were inserted. It can also be constructed with parameters so that entries are ordered by access order instead.

(4) TreeMap: TreeMap implements the SortedMap interface and keeps its records sorted by key, by default in ascending key order; a custom comparator can also be specified. When a TreeMap is traversed with an Iterator, the records come back sorted. TreeMap is recommended when a sorted map is needed. The keys must implement the Comparable interface, or a custom Comparator must be passed to the TreeMap constructor; otherwise a java.lang.ClassCastException is thrown at runtime.

For all four Map implementations above, keys are required to be immutable objects, that is, objects whose hash value cannot change after they are created. If a key's hash value changes, the Map may no longer be able to locate the mapping.
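To see why a mutable key is dangerous, here is a minimal sketch; the MutableKey class and the values used are invented purely for illustration:

import java.util.HashMap;
import java.util.Map;

class MutableKey {
    int id; // mutable field that participates in hashCode/equals

    MutableKey(int id) { this.id = id; }

    @Override public int hashCode() { return id; }
    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == id;
    }
}

public class MutableKeyDemo {
    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");

        key.id = 2; // the hash value changes after insertion
        System.out.println(map.get(key));               // likely null: the lookup probes the wrong bucket
        System.out.println(map.get(new MutableKey(1))); // also null: right bucket, but equals now fails
    }
}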

From the comparison above, HashMap is the member of Java's Map family that is used most frequently, because it meets the needs of most scenarios. In the following, we combine the source code to explain the working principle of HashMap in depth, covering its storage structure, common methods, expansion, and thread safety.

Internal implementation

To understand HashMap, we first need to know what it is, that is, its storage structure (fields); and then what it can do, that is, its functional implementation (methods). We explain both aspects in detail below.

Storage Structure - Fields

In terms of structural implementation, HashMap is implemented by array + linked list + red-black tree (JDK1.8 adds the red-black tree part), as shown below.

hashMap memory structure diagram

Two questions need to be clarified here: what is the underlying storage of the data? What are the advantages of this storage method?

(1) From the source code, HashMap has a very important field, Node[] table, the hash bucket array, which is clearly an array of Node objects. Let's look at what Node is in JDK1.8.

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;    // used to locate the index in the array
        final K key;
        V value;
        Node<K,V> next;   // the next node in the linked list

        Node(int hash, K key, V value, Node<K,V> next) { ... }
        public final K getKey(){ ... }
        public final V getValue() { ... }
        public final String toString() { ... }
        public final int hashCode() { ... }
        public final V setValue(V newValue) { ... }
        public final boolean equals(Object o) { ... }
}

Node is an inner class of HashMap that implements the Map.Entry interface; it is essentially a key-value pair. Each black dot in the figure above is a Node object.

(2) HashMap uses a hash table for storage. To resolve collisions in a hash table, either open addressing or separate chaining can be used; HashMap in Java uses separate chaining. Separate chaining, simply put, combines an array with linked lists: each array element holds a linked list. When data is hashed, the array index is obtained and the data is placed in the linked list at that index. For example, the program executes the following code:

    map.put("美团","小美");

The system calls the hashCode() method of the key "美团" to get its hashCode value (every Java object has this method), and then applies the last two steps of the hash algorithm (a high-bit operation and a modulo operation, described below) to locate the storage position of the key-value pair. Sometimes two keys are located at the same position, which means a hash collision has occurred. The more uniform the hash algorithm's results, the lower the probability of collisions and the better the map's access performance.

If the hash bucket array is large, even a poor hash algorithm will spread the elements out; if the hash bucket array is small, even a good hash algorithm will produce many collisions. So there is a trade-off between space cost and time cost: determine the size of the hash bucket array according to the actual situation, and design the hash algorithm on that basis to reduce collisions. How, then, can the map keep the probability of hash collisions small while the hash bucket array (Node[] table) occupies little space? The answer is a good hash algorithm plus a good expansion mechanism.

Before looking at the hash and expansion process, we first need to understand several fields of HashMap. From the source code of HashMap's default constructor, the constructor initializes the following fields:

     int threshold;             // the maximum number of key-value pairs the map can hold before resizing
     final float loadFactor;    // the load factor
     int modCount;
     int size;

First, the length of the Node[] table is the capacity (the default value is 16), loadFactor is the load factor (the default value is 0.75), and threshold is the maximum number of Nodes (key-value pairs) the HashMap can hold before resizing: threshold = length * loadFactor. In other words, for a given array length, a larger load factor means more key-value pairs can be held.

Combined with the definition of the load factor, the threshold is the maximum number of elements allowed for a given load factor and array length; when the number of elements exceeds it, the map is resized (expanded), and the expanded HashMap has twice its previous capacity. The default load factor of 0.75 is a balanced choice between space and time efficiency. It is recommended not to modify it except in special cases: if memory is plentiful and time efficiency matters most, the load factor can be reduced; conversely, if memory is tight and time efficiency is less important, the load factor can be increased, and it can even be greater than 1.
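As an illustration of the threshold = length * loadFactor relationship, the following sketch (the expectedEntries value is an arbitrary example) pre-sizes a HashMap so that filling it does not trigger a resize:

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedEntries = 1000;            // arbitrary example value
        float loadFactor = 0.75f;              // the default load factor

        // Choose an initial capacity whose threshold (capacity * loadFactor)
        // is at least expectedEntries, so no resize happens while filling the map.
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);

        Map<String, Integer> map = new HashMap<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            map.put("key-" + i, i);            // no resize expected during these puts
        }
        System.out.println(map.size());        // 1000
    }
}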

The size field is easy to understand: it is the number of key-value pairs actually present in the HashMap. Note the difference between the length of the table, the maximum number of key-value pairs (threshold), and size. The modCount field records the number of times the internal structure of the HashMap has changed and is mainly used for fail-fast iteration. Note that a change of the internal structure means a structural change, such as putting a new key-value pair; overwriting the value of an existing key is not a structural change.
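A small sketch of the fail-fast behavior backed by modCount (the keys and values are arbitrary): putting a new key during iteration triggers a ConcurrentModificationException, while overwriting an existing key's value does not:

import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        try {
            for (String key : map.keySet()) {
                // A structural change (new mapping) bumps modCount,
                // so the next iterator step fails fast.
                map.put("c", 3);
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast: " + e);
        }

        // Overwriting an existing key's value is NOT a structural change
        // and does not trigger the exception.
        for (String key : map.keySet()) {
            map.put("a", 100);
        }
    }
}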

In HashMap, the length of the hash bucket array table must be a power of 2 (and therefore a composite number). This is an unconventional design; the conventional design is to make the bucket size a prime number, because, relatively speaking, prime numbers cause fewer collisions than composite numbers (for a proof, see http://blog.csdn.net/liuqiyao_01/article/details/14475159 ). Hashtable's initial bucket size of 11 is an example of the prime-size design (although Hashtable cannot guarantee the size stays prime after expansion). HashMap adopts the unconventional power-of-two design mainly to optimize the modulo operation and expansion; to reduce collisions, HashMap also lets the high bits participate in the calculation when locating the hash bucket index.

There is still a problem: even with a reasonable load factor and hash algorithm, the chains can become too long, and once a chain is too long it seriously hurts HashMap's performance. Therefore, JDK1.8 further optimized the data structure by introducing the red-black tree. When a linked list grows too long (by default, longer than 8), it is converted into a red-black tree, whose fast insertion, deletion, and lookup improve HashMap's performance. This article does not discuss the red-black tree further; to learn how the red-black tree data structure works, see http://blog.csdn.net/v_july_v/article/details/6105630 .

Function implementation - method

HashMap has many internal functions. This article focuses on three representative ones: determining the hash bucket array index from a key, the detailed execution of the put method, and the expansion process.

1. Determine the index position of the hash bucket array

Whether adding, deleting, or finding a key-value pair, locating the position in the hash bucket array is the critical first step. As mentioned earlier, HashMap's data structure is a combination of an array and linked lists, so we naturally want the elements to be distributed as evenly as possible, ideally with only one element at each position. Then, when we use the hash algorithm to obtain a position, we immediately know that the element there is the one we want, without traversing a linked list, which greatly improves query efficiency. How HashMap locates the array index directly determines how well the hash method disperses elements. First look at the source code implementation (method one + method two):

Method one:
static final int hash(Object key) {   // jdk1.8 & jdk1.7
     int h;
     // h = key.hashCode()  -- step 1: take the hashCode value
     // h ^ (h >>> 16)      -- step 2: let the high bits participate
     return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Method two:
static int indexFor(int h, int length) {  // jdk1.7 source; jdk1.8 has no such method, but the principle is the same
     return h & (length-1);  // step 3: the modulo operation
}

The hash algorithm here is essentially three steps: take the key's hashCode value, perform the high-bit operation, and perform the modulo operation.

For any given object, as long as its hashCode() returns the same value, the hash value computed by method one is always the same. The first idea is to take the hash value modulo the array length, which distributes elements relatively evenly. However, the modulo operation is relatively expensive, so HashMap instead calls method two to compute which index of the table array the object should be stored at.

This method is very clever: it uses h & (table.length - 1) to get the object's storage slot, and the underlying array length of HashMap is always a power of 2, which is how HashMap optimizes for speed. When length is a power of 2, h & (length - 1) is equivalent to taking the modulo of length, i.e. h % length, but & is more efficient than %.

In the JDK1.8 implementation, the high-bit operation is optimized: the high 16 bits of hashCode() are XORed with the low 16 bits, (h = k.hashCode()) ^ (h >>> 16). This design mainly considers speed, effectiveness, and quality: it ensures that even when the table array is relatively small, both the high and the low bits participate in the hash calculation, without adding much overhead.

In the following example, n is the length of the table.

hashMap hash algorithm example
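The steps in the figure can be reproduced with a short sketch; the sample key and the table length n are arbitrary, and hash() below simply mirrors method one:

public class IndexDemo {
    // Same perturbation as JDK1.8's hash(): XOR the high 16 bits into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        String key = "meituan";      // arbitrary sample key
        int n = 16;                  // table length, always a power of two

        int h = hash(key);
        int index = h & (n - 1);     // equivalent to h % n when n is a power of two

        System.out.printf("hashCode=%08x  perturbed=%08x  index=%d%n",
                key.hashCode(), h, index);
    }
}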

2. Analyze the put method of HashMap

The execution process of HashMap's put method can be understood from the following figure; if you are interested, you can compare it against the source code to study it more precisely.

The execution flow chart of the hashMap put method

①. Check whether the key-value pair array table is empty or null; if so, execute resize() to create it;

②. Compute the hash value from the key to get the insertion index i; if table[i] == null, directly create a new node and add it, then go to ⑥; if table[i] is not empty, go to ③;

③. Check whether the first element of table[i] has the same key as the incoming key (same here means both hashCode and equals); if so, directly overwrite the value, otherwise go to ④;

④. Check whether table[i] is a TreeNode, that is, whether table[i] is a red-black tree; if so, insert the key-value pair directly into the tree, otherwise go to ⑤;

⑤. Traverse table[i] and check whether the linked list length is greater than 8; if so, convert the linked list into a red-black tree and perform the insertion in the tree, otherwise perform the insertion in the linked list; if the key is found to already exist during the traversal, directly overwrite its value;

⑥. After the insertion succeeds, check whether the actual number of key-value pairs (size) exceeds the maximum capacity (threshold); if so, expand.

The source code of the put method in the JDK1.8 HashMap is as follows:

public V put(K key, V value) {
    // hash the key's hashCode()
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Step ①: if tab is empty, create it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Step ②: compute the index and handle null
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // Step ③: the key already exists, directly overwrite the value
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Step ④: the chain is a red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Step ⑤: the chain is a linked list
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // if the linked list is longer than 8, convert it to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // the key already exists, directly overwrite the value
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }

        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }

    ++modCount;
    // Step ⑥: if size exceeds the maximum capacity, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

3. Expansion mechanism

Resize means recomputing the capacity. As elements are continuously added to a HashMap and the internal array can no longer hold them, the object needs to enlarge the array so it can hold more elements. Of course, an array in Java cannot grow automatically; instead, a new, larger array replaces the existing small one, just as we switch to a bigger bucket when a small bucket can no longer hold all the water.

Let's analyze the resize source code. Because JDK1.8 integrates the red-black tree and is more complicated, for ease of understanding we still use the JDK1.7 code, which is easier to follow; there is little essential difference, and the specific differences are discussed later.

void resize(int newCapacity) {   // pass in the new capacity
    Entry[] oldTable = table;    // reference the Entry array before expansion
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {  // if the array size before expansion has already reached the maximum (2^30)
        threshold = Integer.MAX_VALUE;      // set the threshold to the maximum int value (2^31-1), so it will never expand again
        return;
    }

    Entry[] newTable = new Entry[newCapacity];  // initialize a new Entry array
    transfer(newTable);                         // !! migrate the data into the new Entry array
    table = newTable;                           // point HashMap's table field at the new Entry array
    threshold = (int)(newCapacity * loadFactor);// update the threshold
}

Here an array with a larger capacity replaces the existing small-capacity array, and the transfer() method copies the elements of the original Entry array into the new Entry array.

void transfer(Entry[] newTable) {
    Entry[] src = table;                   // src references the old Entry array
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) { // iterate over the old Entry array
        Entry<K,V> e = src[j];             // take each element of the old Entry array
        if (e != null) {
            src[j] = null; // release the old array's reference (after the loop, the old Entry array no longer references any object)
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity); // !! recompute each element's position in the new array
                e.next = newTable[i]; // mark [1]
                newTable[i] = e;      // place the element into the array
                e = next;             // move to the next element on the Entry chain
            } while (e != null);
        }
    }
}

The reference newTable[i] is assigned to e.next, that is, head insertion into the singly linked list is used: a new element at the same position is always placed at the head of the list. As a result, the element inserted first at an index ends up at the tail of the Entry chain (if hash collisions occur), which differs from JDK1.8, as explained in detail below. Elements on the same Entry chain in the old array may end up at different positions in the new array after their indices are recomputed, as the sketch below shows.
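To make the head-insertion behavior concrete, here is a minimal sketch using plain singly linked nodes (not the JDK source) that rebuilds a chain the way transfer() does and shows that the order ends up reversed:

public class HeadInsertDemo {
    static class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    public static void main(String[] args) {
        // Old chain: A -> B -> C (A was put first in this bucket)
        Node oldChain = new Node("A", new Node("B", new Node("C", null)));

        // Rebuild it with head insertion, as JDK1.7's transfer() does
        // when elements land in the same bucket of the new table.
        Node newChain = null;
        for (Node e = oldChain; e != null; ) {
            Node next = e.next;
            e.next = newChain;   // link the node in front of the current head
            newChain = e;        // the node becomes the new head
            e = next;
        }

        // Prints C B A: the order is reversed relative to the old chain.
        for (Node e = newChain; e != null; e = e.next) {
            System.out.print(e.key + " ");
        }
        System.out.println();
    }
}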

The following example illustrates the expansion process. Suppose our hash algorithm is simply the key mod the table size (the array length). The hash bucket array table has size 2, the keys are 3, 7 and 5, and the put order is 5, 7, 3. After mod 2 they all collide in table[1]. Assume the load factor loadFactor = 1, i.e. expansion happens when the number of key-value pairs exceeds the table size. The next three steps resize the hash bucket array to 4 and then rehash all Nodes.

jdk1.7 expansion example

Below we explain the optimizations made in JDK1.8. Observe that we expand by powers of 2 (the length is doubled), so an element's position is either unchanged or moved by the old capacity from its original position. The figure below makes this clear: n is the table length; figure (a) shows how the index positions of two keys, key1 and key2, are determined before expansion, and figure (b) shows how they are determined after expansion, where hash1 is the result of the hash and high-bit operation for key1.

hashMap 1.8 Hash algorithm example Figure 1

After the elements' hashes are recomputed, because n is doubled, the mask n-1 has one more high bit (shown in red), so the new index changes like this:

hashMap 1.8 Hash algorithm example Figure 2

Therefore, when we expand a HashMap, we do not need to recompute the hash as the JDK1.7 implementation does; we only need to check whether the new bit of the original hash value is 1 or 0. If it is 0, the index is unchanged; if it is 1, the index becomes "original index + oldCap". You can see this in the following resize diagram of a table expanding from 16 to 32:

jdk1.8 hashMap expansion example
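A small sketch of that bit check, with invented sample hash values: whether (hash & oldCap) is zero decides if a node stays at its index or moves to index + oldCap:

public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;               // capacity doubles on resize

        int[] sampleHashes = {5, 21, 38, 54};   // arbitrary example hash values

        for (int hash : sampleHashes) {
            int oldIndex = hash & (oldCap - 1);
            int newIndex = hash & (newCap - 1);
            // The extra mask bit equals (hash & oldCap): 0 keeps the index, non-zero adds oldCap.
            String where = ((hash & oldCap) == 0) ? "stays at " + oldIndex
                                                  : "moves to " + (oldIndex + oldCap);
            System.out.printf("hash=%2d old=%2d new=%2d -> %s%n", hash, oldIndex, newIndex, where);
        }
    }
}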

This design is really ingenious: it saves the time of recomputing hash values, and since the new bit can be regarded as randomly 0 or 1, the resize process spreads the previously colliding nodes evenly across the new buckets. This is a new optimization in JDK1.8. There is one small difference: when rehashing in JDK1.7, elements migrating to the same index in the new array end up with their linked list order inverted, but as the figure above shows, JDK1.8 does not invert the order. Interested readers can study JDK1.8's resize source code, which is very well written:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // if the maximum capacity has already been reached, stop expanding and just let collisions happen
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // otherwise, double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // compute the new resize threshold
    if (newThr == 0) {

        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // move every bucket into the new buckets
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // the optimized rehash block for linked lists
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // put the "original index" chain into the bucket
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // put the "original index + oldCap" chain into the bucket
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

Thread safety

In multi-threaded scenarios, the thread-unsafe HashMap should be avoided as much as possible, and the thread-safe ConcurrentHashMap should be used instead. So why is HashMap thread-unsafe? The following example shows that using HashMap concurrently from multiple threads may cause an infinite loop. The code is as follows (for ease of understanding, the JDK1.7 environment is still used):

public class HashMapInfiniteLoop {  

    private static HashMap<Integer,String> map = new HashMap<Integer,String>(2,0.75f);  
    public static void main(String[] args) {  
        map.put(5, "C");  

        new Thread("Thread1") {  
            public void run() {  
                map.put(7, "B");  
                System.out.println(map);  
            };  
        }.start();  
        new Thread("Thread2") {  
            public void run() {  
                map.put(3, "A);  
                System.out.println(map);  
            };  
        }.start();        
    }  
}

The map is initialized as an array of length 2, loadFactor = 0.75, so threshold = 2 * 0.75 = 1; that is, the map must be resized when the second key is put.

Using breakpoints, let thread 1 and thread 2 both stop at the first line of the transfer method (the code shown in the expansion section above). Note that at this point both threads have successfully added their data. Then let thread 1 run to the line "Entry<K,V> next = e.next;" of the transfer method and pause; release thread 2's breakpoint and let thread 2 complete its resize. The result is shown below.

jdk1.7 hashMap infinite loop example Figure 1

Note that thread 1's e points to key(3) and next points to key(7); after thread 2's rehash, these now point into the linked list that thread 2 has rearranged.

As soon as thread 1 is scheduled back to execute, it first executes newTable[i] = e, then e = next, which makes e point to key(7); the next loop's next = e.next then makes next point to key(3).

jdk1.7 hashMap infinite loop example Figure 2

jdk1.7 hashMap infinite loop example Figure 3

e.next = newTable[i] makes key(3).next point to key(7). Note: at this time key(7).next already points to key(3), and thus a circular linked list appears.

jdk1.7 hashMap infinite loop example Figure 4

So, when that thread later calls map.get(11), the tragedy occurs: an infinite loop.
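If concurrent writes are really needed, a minimal sketch of the alternatives mentioned earlier (Collections.synchronizedMap or ConcurrentHashMap) might look like this; the key counts and thread names are arbitrary:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapDemo {
    public static void main(String[] args) throws InterruptedException {
        // Option 1: wrap a HashMap so every method synchronizes on a single lock.
        Map<Integer, String> syncMap = Collections.synchronizedMap(new HashMap<>());

        // Option 2: ConcurrentHashMap, which allows concurrent reads and fine-grained locking on writes.
        Map<Integer, String> concurrentMap = new ConcurrentHashMap<>();

        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) {
                concurrentMap.put(i, Thread.currentThread().getName());
            }
        };
        Thread t1 = new Thread(writer, "t1");
        Thread t2 = new Thread(writer, "t2");
        t1.start(); t2.start();
        t1.join(); t2.join();

        System.out.println(concurrentMap.size()); // always 1000: no lost structure, no infinite loop
        System.out.println(syncMap.isEmpty());
    }
}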

Performance comparison between JDK1.8 and JDK1.7

In HashMap, if the hash algorithm gives every key a different array index, i.e. the hash algorithm is very good, then the getKey method has time complexity O(1). If the hash algorithm is extremely poor and every key hashes to the same index, all key-value pairs pile up in one bucket, either one linked list or one red-black tree, with time complexity O(n) and O(log n) respectively. Given the many optimizations in JDK1.8, its overall performance is better than JDK1.7. Let's demonstrate this with examples from two angles.

Hash is more uniform

In order to facilitate testing, we first write a class Key, as follows:

class Key implements Comparable<Key> {

    private final int value;

    Key(int value) {
        this.value = value;
    }

    @Override
    public int compareTo(Key o) {
        return Integer.compare(this.value, o.value);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass())
            return false;
        Key key = (Key) o;
        return value == key.value;
    }

    @Override
    public int hashCode() {
        return value;
    }
}

This class overrides the equals method and provides a very good hashCode function: no two distinct values share the same hashCode, because the value itself is used as the hashCode. To avoid frequent GC, I cache immutable Key instances instead of creating them over and over again. The code is as follows:

public class Keys {

    public static final int MAX_KEY = 10_000_000;
    private static final Key[] KEYS_CACHE = new Key[MAX_KEY];

    static {
        for (int i = 0; i < MAX_KEY; ++i) {
            KEYS_CACHE[i] = new Key(i);
        }
    }

    public static Key of(int value) {
        return KEYS_CACHE[value];
    }
}

Now we start our test. All we need to do is create HashMaps of different sizes (1, 10, 100, ..., 10,000,000), sized up front to rule out resizing. The code is as follows:

    static void test(int mapSize) {

        HashMap<Key, Integer> map = new HashMap<Key, Integer>(mapSize);
        for (int i = 0; i < mapSize; ++i) {
            map.put(Keys.of(i), i);
        }

        long beginTime = System.nanoTime(); // nanosecond precision
        for (int i = 0; i < mapSize; i++) {
            map.get(Keys.of(i));
        }
        long endTime = System.nanoTime();
        System.out.println(endTime - beginTime);
    }

    public static void main(String[] args) {
        for (int i = 10; i <= 10_000_000; i *= 10) {
            test(i);
        }
    }

The test looks up different keys and measures the time spent. To compute the average time of getKey, we iterate over all the get calls, compute the total time, divide by the number of keys, and obtain an average, which is mainly used for comparison; absolute values can be affected by many environmental factors. The result is as follows:

Performance comparison table 1.png

From the test results, the performance of JDK1.8 is more than 15% higher than that of JDK1.7, and even 100% higher in certain size ranges. Since the hash algorithm here is fairly uniform, the effect of the red-black tree introduced in JDK1.8 is not obvious. Let's now look at an extremely uneven hash.

Hash is extremely uneven

Suppose we have another, very bad Key whose instances all return the same hashCode value. This is the worst case for a HashMap. The code is modified as follows:

class Key implements Comparable<Key> {

    //...

    @Override
    public int hashCode() {
        return 1;
    }
}

The main method is still executed, and the results are shown in the following table:

Performance comparison table 2.png

From the results in the table, as the size increases, the time spent in JDK1.7 grows rapidly, while in JDK1.8 it is clearly lower and grows in a stable, logarithmic fashion. When a linked list becomes too long, HashMap dynamically replaces it with a red-black tree, reducing the time complexity from O(n) to O(log n). The time taken with a uniform hash versus a non-uniform hash differs markedly, and comparing the two cases shows the importance of a good hash algorithm.

      Test environment: processor is 2.2 GHz Intel Core i7, memory is 16 GB 1600 MHz DDR3, SSD hard disk, using default JVM parameters, running on 64-bit OS X 10.10.1.

Summary

(1) Expansion is a particularly performance-intensive operation, so when using a HashMap, programmers should estimate the size of the map and give a rough initial capacity to avoid frequent expansion.

(2) The load factor can be modified and can also be greater than 1, but it is recommended not to modify it easily unless the situation is very special.

(3) HashMap is thread-unsafe. Do not operate HashMap at the same time in a concurrent environment. It is recommended to use ConcurrentHashMap.

(4) JDK1.8 introduced the red-black tree to greatly optimize the performance of HashMap.

(5) If you haven't upgraded to JDK1.8 yet, start now: the performance improvement of HashMap is only the tip of the iceberg of JDK1.8.

References

  1. JDK1.7 & JDK1.8 source code.
  2. CSDN blog channel, HashMap multi-threaded infinite loop problem, 2014.
  3. Red and Black Alliance, Source Analysis of HashMap (JDK1.8) of the Java Collections Framework, 2015.
  4. CSDN blog channel, A preliminary introduction to red-black trees, 2010.
  5. Java Code Geeks, HashMap performance improvements in Java 8, 2014.
  6. ImportNew, Dangerous! Using Mutable Objects as Keys in HashMaps, 2014.
  7. CSDN blog channel, Why the number of buckets in a hashtable is usually a prime number, 2013.
