What is the difference between the collection classes HashMap, Hashtable, and ConcurrentHashMap?

1. HashMap

Simply put, HashMap is composed of an array plus linked lists. The array is the main body of the HashMap, and the linked lists exist mainly to resolve hash collisions. If the located array slot holds no linked list (the current entry's next points to null), then operations such as lookup and insertion are very fast, requiring only a single addressing step. If the located slot does hold a linked list, insertion is still O(1) (before JDK 1.8, the new Entry was inserted at the head of the list, which only requires changing a few references), but lookup must traverse the list and compare candidates one by one through the key's equals method. For performance, therefore, the fewer and shorter the linked lists in a HashMap, the better.

Hash function (it further mixes the key's hashCode, folding the high-order bits into the low-order bits, so that the final storage locations are distributed as evenly as possible):

static final int hash(Object key) {
    int h;
    // XOR the high 16 bits into the low 16 bits so that the high bits
    // also influence the bucket index
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
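
To see why this mixing matters, here is a small sketch (not from the original article; the class name is illustrative): on JDK 8, HashMap selects a bucket with (table.length - 1) & hash rather than a modulo, so without the XOR above only the low bits of hashCode() would ever influence the bucket choice.

public class HashIndexDemo {
    // Same mixing step as JDK 8's HashMap.hash()
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int capacity = 16; // HashMap table lengths are always powers of two
        // The bucket index is (n - 1) & hash, a cheap substitute for hash % n
        int index = (capacity - 1) & hash("name");
        System.out.println("bucket index for \"name\": " + index);
    }
}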

Lookup

HashMap's implementation was revised in JDK 1.8, as described below.

Before JDK 1.8, HashMap was implemented as an array plus linked lists. Even with a good hash function, it is difficult to achieve a perfectly uniform distribution of elements.

When a large number of elements in the HashMap land in the same bucket, a long linked list hangs under that bucket, and the HashMap degenerates into a singly linked list. If that list has n elements, traversal is O(n), and the HashMap completely loses its advantage.

To address this, JDK 1.8 introduced red-black trees (lookup time complexity O(log n)): once a bucket's chain grows past a threshold (8 by default), it is converted into a tree.
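
The effect of treeification can be observed with a deliberately colliding key type. The demo below is not from the original article; BadKey is a hypothetical class whose hashCode always returns the same value, so every entry lands in one bucket (it implements Comparable so the red-black tree can order the keys).

import java.util.HashMap;

public class CollisionDemo {
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey o) { return Integer.compare(id, o.id); }
    }

    public static void main(String[] args) {
        HashMap<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) map.put(new BadKey(i), i);
        // All 10,000 entries share one bucket; on JDK 8+ lookups stay
        // O(log n) thanks to the red-black tree instead of an O(n) chain walk
        System.out.println(map.get(new BadKey(9_999)));
    }
}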

2. Hashtable

Hashtable includes several important member variables: table, count, threshold, loadFactor, modCount.

  • table is an Entry[] array; Entry (explained under HashMap) is actually a singly-linked-list node. All of the hash table's key-value pairs are stored in this Entry array.

  • count is the size of the Hashtable, which is the number of key-value pairs stored in the Hashtable.

  • threshold is the threshold used to decide whether the Hashtable's capacity needs to be adjusted: threshold = capacity * load factor.

  • loadFactor is the load factor.

  • modCount is used to implement the fail-fast mechanism.

put method

The overall flow of the put method is:

  1. Check whether the value is null, and throw a NullPointerException if it is;

  2. Compute the key's hash and derive from it the key's index in the table array. If the element at table[index] is not empty, iterate over its chain; if an entry with the same key is found, replace its value directly and return the old value;

  3. Otherwise, insert a new entry at the head of the chain at table[index].

public synchronized V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {
            throw new NullPointerException();
        }

        // Makes sure the key is not already in the hashtable:
        // first, compute the key's hash via hash() and derive the index
        // that locates it in table[];
        // then iterate the chain at that index; if an entry with the
        // same key exists, replace its value and return the old value
        Entry tab[] = table;
        int hash = hash(key);
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry<K,V> e = tab[index] ; e != null ; e = e.next) {
            if ((e.hash == hash) && e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }

        modCount++;
        if (count >= threshold) {
            // Rehash the table if the threshold is exceeded
            rehash();

            tab = table;
            hash = hash(key);
            index = (hash & 0x7FFFFFFF) % tab.length;
        }

        // Creates the new entry: insert it at the head of the chain at
        // table[index], setting the old head as its next element; return null
        Entry<K,V> e = tab[index];
        tab[index] = new Entry<>(hash, key, value, e);
        count++;
        return null;
    }

get method

Compared with put, the get method is much simpler. It first computes the key's hash via hash(), then derives the index from the hash value (both steps use the same algorithm as put). It then iterates the chain at that index and returns the value of the matching key; if none is found, it returns null.

public synchronized V get(Object key) {
        Entry tab[] = table;
        int hash = hash(key);
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry<K,V> e = tab[index] ; e != null ; e = e.next) {
            if ((e.hash == hash) && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
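
One consequence of this implementation (put throws on a null value, and hash(key) calls key.hashCode()) is a well-known behavioral difference: Hashtable rejects null keys and values, while HashMap allows one null key and any number of null values. A small demonstration, not from the original article:

import java.util.HashMap;
import java.util.Hashtable;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> hm = new HashMap<>();
        hm.put(null, "ok");               // HashMap maps the null key to bucket 0
        hm.put("k", null);                // null values are allowed too
        System.out.println(hm.get(null)); // prints "ok"

        Hashtable<String, String> ht = new Hashtable<>();
        try {
            ht.put(null, "boom");         // hash(key) dereferences key -> NPE
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null keys");
        }
        try {
            ht.put("k", null);            // put() explicitly throws for null values
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null values");
        }
    }
}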

3. ConcurrentHashMap

In JDK 1.7, ConcurrentHashMap's data structure consists of a Segment array and multiple HashEntry chains.

The point of the Segment array is to split one large table into multiple small tables that are locked independently, i.e. the lock-striping (lock separation) technique mentioned above. Each Segment element stores a HashEntry array plus linked lists, the same data storage structure as HashMap.

put operation

For insertion into a ConcurrentHashMap, hashing is performed twice to locate where the data is stored:

static class Segment<K,V> extends ReentrantLock implements Serializable {

From the inheritance shown above, you can see that Segment extends ReentrantLock and therefore carries locking capability. When a put executes, the key is hashed a first time to locate the Segment; if that Segment has not been initialized yet, it is assigned via a CAS operation. A second hash then locates the target HashEntry slot. When inserting the data at that slot (at the head of the chain), the thread tries to acquire the lock via the inherited tryLock() method; if it succeeds, it inserts directly at the located position. If another thread already holds the Segment's lock, the current thread keeps spinning on tryLock(); after a specified number of attempts it is suspended and waits to be woken up.
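
A minimal sketch of this lock-striping idea, assuming nothing beyond what is described above (class and field names are illustrative, and this is not the actual JDK source; for simplicity it blocks with lock() where the real code first spins with tryLock()):

import java.util.concurrent.locks.ReentrantLock;

public class StripedMap<K, V> {

    static final class Node<K, V> {
        final K key; V value; final Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    // Each segment is itself a ReentrantLock guarding one small table,
    // so writers that hash to different segments never contend
    static final class Segment<K, V> extends ReentrantLock {
        @SuppressWarnings("unchecked")
        final Node<K, V>[] table = new Node[16];

        V put(K key, V value, int hash) {
            lock();
            try {
                int index = hash & (table.length - 1);
                for (Node<K, V> e = table[index]; e != null; e = e.next) {
                    if (e.key.equals(key)) {
                        V old = e.value; e.value = value; return old;
                    }
                }
                table[index] = new Node<>(key, value, table[index]);
                return null;
            } finally {
                unlock();
            }
        }
    }

    final Segment<K, V>[] segments;

    @SuppressWarnings("unchecked")
    public StripedMap(int concurrencyLevel) { // assumed to be a power of two
        segments = new Segment[concurrencyLevel];
        for (int i = 0; i < concurrencyLevel; i++) segments[i] = new Segment<>();
    }

    public V put(K key, V value) {
        int hash = key.hashCode();
        // The first hash picks the segment; the second (inside Segment.put)
        // picks the bucket within that segment
        return segments[(hash >>> 16) & (segments.length - 1)].put(key, value, hash);
    }
}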

get operation

ConcurrentHashMap's get operation is similar to HashMap's, except that it first hashes to locate the Segment, then hashes again to locate the target HashEntry, and finally walks the chain under that HashEntry comparing keys; it returns the value on success, or null otherwise.

Computing the number of elements in a ConcurrentHashMap is an interesting problem, because the map is operated on concurrently: while you compute the size, other threads may still be inserting, so the computed size can differ from the actual size (data may be inserted while size is returning). JDK 1.7 uses two schemes to solve this (a simplified sketch follows the list):

  1. The first scheme tries to compute the size of the ConcurrentHashMap without locking, at most three times; if two consecutive computations produce the same result, it assumes no elements were added in the meantime and the result is accurate;

  2. The second scheme applies if the first fails: it locks every Segment, computes the size of the ConcurrentHashMap, and returns it.
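
A simplified sketch of this retry-then-lock strategy, again with illustrative names rather than the real JDK source (the real code compares sums of per-segment modification counts between passes, as approximated here):

import java.util.concurrent.locks.ReentrantLock;

public class SizeDemo {
    static final class Seg extends ReentrantLock {
        volatile int count;    // entries currently in this segment
        volatile int modCount; // structural modifications so far
    }

    static long size(Seg[] segments) {
        long lastMods = -1;
        for (int attempt = 0; attempt < 3; attempt++) { // unlocked tries
            long sum = 0, mods = 0;
            for (Seg s : segments) { sum += s.count; mods += s.modCount; }
            if (mods == lastMods) return sum; // two consecutive passes agree
            lastMods = mods;
        }
        // Still unstable: lock every segment and count exactly
        for (Seg s : segments) s.lock();
        try {
            long sum = 0;
            for (Seg s : segments) sum += s.count;
            return sum;
        } finally {
            for (Seg s : segments) s.unlock();
        }
    }
}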

The JDK 1.8 implementation

The JDK 1.8 implementation has abandoned the Segment concept and is built directly on a Node array + linked lists + red-black trees, with concurrency controlled through synchronized and CAS. The whole thing looks like an optimized, thread-safe HashMap. The Segment data structure can still be seen in JDK 1.8, but its fields have been simplified and it exists only for compatibility with old versions.


put operation

Adding an entry calls the put method; let's take a look. Its flow is as follows (a small concurrency demo follows the list):

  1. If the table has not been initialized, first call initTable() to initialize it

  2. If there is no hash collision at the target bucket, insert the node directly with CAS

  3. If a resize is in progress, help with the expansion first

  4. If there is a hash collision, lock to guarantee thread safety. There are two cases: for a linked list, traverse to the tail and insert; for a red-black tree, insert according to the tree structure

  5. Finally, if the chain's length exceeds the threshold of 8, first convert it to a red-black tree, then break and re-enter the loop

  6. If the addition succeeds, call addCount() to update the size and check whether expansion is needed
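
A small runnable demo of these guarantees from the caller's side (not from the original article): several threads put disjoint keys concurrently, and because each bucket conflict is resolved under CAS or the bucket lock, no updates are lost.

import java.util.concurrent.ConcurrentHashMap;

public class PutDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            final int base = t * 10_000; // each thread writes a disjoint key range
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) map.put(base + i, i);
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(map.size()); // always 80000: no lost updates
    }
}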

get operation

After adding data, we retrieve it with, e.g., String name = map.get("name"). Let's analyze ConcurrentHashMap's get() method:

public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    int h = spread(key.hashCode()); // rehash (spread) the key's hashCode
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) { // read the bucket's first node
        if ((eh = e.hash) == h) { // if the first node is the one, return it
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        // A negative hash means a resize is in progress; ForwardingNode's
        // find method is then used to locate the key in nextTable, and the
        // value is returned if found
        else if (eh < 0)
            return (p = e.find(h, key)) != null ? p.val : null;
        // Neither the first node nor a ForwardingNode: traverse the chain
        while ((e = e.next) != null) {
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}

  1. Compute the hash and locate the index in the table; if the first node matches, return it

  2. If a resize is encountered, call the find method of the ForwardingNode that marks the resize in progress to look up the node in nextTable; return it on a match

  3. If neither applies, traverse down the chain and return the value on a match; otherwise return null at the end

In fact, you can see that the data structure of the JDK 1.8 ConcurrentHashMap is already close to HashMap's; relatively speaking, ConcurrentHashMap only adds synchronization to control concurrency, moving from ReentrantLock + Segment + HashEntry in JDK 1.7 to synchronized + CAS + Node (the 1.8 counterpart of HashEntry) + red-black tree in JDK 1.8. Some concluding observations:

  1. The JDK 1.8 implementation reduces the granularity of the lock. In JDK 1.7 the lock granularity is the Segment, which contains multiple HashEntry chains, while in JDK 1.8 the lock granularity is a single bucket's first node

  2. The JDK 1.8 data structure has become simpler, which makes the operations clearer and smoother. Because synchronized is used for synchronization, the concept of segment locks is no longer needed, and neither is the Segment data structure. Because the granularity is reduced, the implementation complexity has also increased

  3. JDK 1.8 uses red-black trees to optimize linked lists. Traversing a long linked list is a lengthy process, whereas red-black tree traversal is fast; replacing a linked list once it passes a certain threshold thus forms an ideal partnership

  4. Why does JDK 1.8 use the built-in lock synchronized instead of the reentrant lock ReentrantLock? I think there are the following reasons:

  • Because the granularity is reduced, synchronized is no worse than ReentrantLock for relatively low-granularity locking. With coarse-grained locking, ReentrantLock can use Condition to control each boundary more flexibly, but at low granularity that advantage of Condition disappears

  • The JVM development team has never given up on synchronized, and there is more room for JVM-based optimization of synchronized. Using a built-in keyword is more natural than using an API

  • Under heavy data operations, the API-based ReentrantLock puts more memory pressure on the JVM. Although this is not a bottleneck, it is also a basis for the choice

4. Summary

The main differences between Hashtable and HashMap are thread safety and speed. Use Hashtable only when you need full thread safety, and if you are on Java 5 or later, use ConcurrentHashMap instead.


Origin: blog.51cto.com/15082402/2644374