Java Collections Interview: HashMap and ConcurrentHashMap

Preface

Friendly reminder: this article covers a lot of ground and takes some concentration. If you are worried that you will not find it again later, bookmark it now.
If you do not need the review, you can jump directly to the section "Interview begins".
All of the code below is from JDK 8.

Before the interview, let’s review

The put method of HashMap

public V put(K key, V value) {
    // The key is hashed once here
    return putVal(hash(key), key, value, false, true);
}

// Perturbation function: its main job is to reduce hash collisions (details not covered here)
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; // bucket array
    Node<K,V> p;     // current node
    int n, i;
    // If the table is null (not initialized yet) or its length is 0
    if ((tab = table) == null || (n = tab.length) == 0)
        // resize() allocates the table; the default initial capacity is 16
        n = (tab = resize()).length;
    // If bucket i holds no node
    if ((p = tab[i = (n - 1) & hash]) == null)
        // create a node and put it directly into bucket i
        tab[i] = newNode(hash, key, value, null);
    // Bucket i already holds a node; p refers to its head node
    else {
        Node<K,V> e;
        K k;
        // Same hash, and the key is the same reference or equal by equals(): the head node is the match
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            // p is a red-black tree node: insert into (or find the key in) the tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Traverse the linked list
            for (int binCount = 0; ; ++binCount) {
                // Reached the tail of the list
                if ((e = p.next) == null) {
                    // append the new node after p
                    p.next = newNode(hash, key, value, null);
                    // binCount >= 7 means the bin already held 8 nodes before this append: try to treeify
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found a node with the same hash and key; its value is replaced below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // Replace the value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // When onlyIfAbsent is true, an existing non-null value is kept
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // Hook that does nothing in HashMap; LinkedHashMap overrides it
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Resize once the number of entries exceeds threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

To sum up, the general flow of the put method is as follows:

  1. Hash the key to obtain the hash value
  2. Compute the index as (n - 1) & hash; because the array length n is always a power of two, this is equivalent to taking the hash modulo n
  3. If the array slot at that index is null, insert the new node directly
  4. If the slot already holds a node, there are three cases:
  • Case 1: If the key of the head node equals the key to be inserted, replace the value directly
  • Case 2: If the head node is a red-black tree node, follow the update-or-insert method of the red-black tree
  • Case 3: Otherwise the bin is a linked list: traverse it and, if a node with the same key is found, replace its value; if no such node is found, insert a new node at the tail of the list (and treeify the bin if the list has grown long enough)
  5. After a new node has been inserted, increment size and resize the table if size exceeds threshold

This is just a general description. For the specific implementation details, read the source code directly.
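Steps 1 and 2 can be tried out in isolation. The sketch below copies the perturbation function from the listing above into a stand-alone class; the class name HashIndexDemo and the helper indexFor are made up for this example (HashMap's own hash method is not public), and the table length 16 is simply the default initial capacity.

```java
// Illustrative sketch only: HashIndexDemo and indexFor are hypothetical names.
final class HashIndexDemo {
    // Same expression as HashMap.hash in JDK 8: XOR the high 16 bits into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // default initial capacity
        for (String key : new String[] {"a", "interview", "ConcurrentHashMap"}) {
            int h = hash(key);
            // The bit mask and the mathematical modulo give the same index.
            System.out.println(key + " -> hash=" + h
                    + ", index=" + indexFor(h, n)
                    + ", floorMod=" + Math.floorMod(h, n));
        }
    }
}
```

The mask only works as a modulo because n is a power of two, which is why HashMap always rounds its capacity up to one.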

The get method of HashMap

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

// Perturbation function: the same hash method that put uses
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab;
    Node<K,V> first, e;
    int n;
    K k;
    // Proceed only if the table is non-null, non-empty, and the bucket has a head node; otherwise return null
    if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {
        // If the head node has the same hash and the same key (by reference or by equals), return it
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // The head node did not match: continue only if there is a next node, otherwise return null
        if ((e = first.next) != null) {
            // Red-black tree bin: look the node up in the tree and return it
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                // Linked list: return the first node whose hash and key match;
                // if no node matches, fall through and return null
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
    

To sum up, the general flow of the get method of HashMap is as follows:

  1. Compute the hash value of the key
  2. Compute the index as (n - 1) & hash, which is equivalent to taking the hash modulo the array length n
  3. Read the head node at that index and compare its hash and key with the lookup key
  4. If they are equal, return that node's value; otherwise continue through the red-black tree or the linked list; if the end is reached without a match, return null

This is just a general description. For the specific implementation details, read the source code directly.
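One consequence of this flow is easy to miss: because HashMap accepts null values, a null result from get does not tell you whether the key is missing. The following minimal example uses only the public java.util.HashMap API; the class name NullLookupDemo and the sample keys are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: NullLookupDemo and its keys are hypothetical.
final class NullLookupDemo {
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("present", null);   // a null value is allowed
        map.put(null, "null-key");  // one null key is allowed; hash(null) is 0
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        System.out.println(map.get("present"));         // null: the key exists, its value is null
        System.out.println(map.get("absent"));          // null: the key does not exist
        System.out.println(map.containsKey("present")); // true
        System.out.println(map.containsKey("absent"));  // false
        System.out.println(map.get(null));              // null-key
    }
}
```

When the distinction matters, containsKey is the reliable check.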

The put method of ConcurrentHashMap

public V put(K key, V value) {
    return putVal(key, value, false);
}

final V putVal(K key, V value, boolean onlyIfAbsent) {
    // Null check: neither key nor value may be null
    if (key == null || value == null) throw new NullPointerException();
    // Compute the hash
    int hash = spread(key.hashCode());
    int binCount = 0;
    // Loop over the table, retrying until the insert succeeds
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        // The table is null or empty: initialize it
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        // Bin i is empty: insert directly, no lock needed
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            // CAS the new node into the empty bin
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }
        // Another thread is resizing the table: help with the transfer first
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        else {
            V oldVal = null;
            // Lock the head node of the bin (this has some performance cost).
            // Note that f is the head node, so the lock granularity is a single bin.
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    // fh >= 0 means a linked list: replace the value or append at the tail
                    if (fh >= 0) {
                        binCount = 1;
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            // Same hash and same key: replace the value
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                // putIfAbsent() passes onlyIfAbsent = true, so the old value is kept
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            // Reached the tail of the list: append directly
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
                    // Tree bin: insert according to the tree's insert operation
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                              value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            if (binCount != 0) {
                // The list has reached the threshold of 8: convert it to a tree
                // (treeifyBin resizes instead while the table is still small)
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }

    // size + 1
    addCount(1L, binCount);
    return null;
}

To sum up, the general flow of the put method of ConcurrentHashMap is as follows:

  1. First, check for null: neither the key nor the value may be null. (See Supplementary Knowledge 1)
  2. Then compute the hash value with spread(). (See Supplementary Knowledge 4)
  3. Then loop over the table and perform the node insertion. The specific process is as follows:
  • If the table is null or empty, the ConcurrentHashMap has not been initialized yet, so initialize it with initTable().
  • Compute the bin index i = (n - 1) & hash and read its head node f. If the bin is empty, insert the new node with a CAS; this step needs no lock. (See Supplementary Knowledge 2)
  • If fh = f.hash equals MOVED (-1), f is a ForwardingNode, which means another thread is resizing the table; the current thread helps with the transfer through helpTransfer(). (See Supplementary Knowledge 3)
  • Otherwise lock the head node f with synchronized. If fh >= 0 the bin is a linked list: traverse it, replace the value if the key already exists, otherwise append a new node at the tail. If f is a TreeBin, update or insert the node according to the red-black tree method.
  • If binCount >= TREEIFY_THRESHOLD (8 by default), call treeifyBin(), which converts the linked list into a red-black tree, or resizes the table instead while its length is still below MIN_TREEIFY_CAPACITY (64).
  4. Finally, if a new node was added, call addCount() to update the element count.
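The core of this flow, "CAS into an empty bin, otherwise lock only the head node of that bin", can be sketched in a few dozen lines. The class below is not the JDK implementation: TinyBinTable is a made-up name, the table has a fixed length of 16, and resizing, treeification, removal, and element counting are all left out. It uses AtomicReferenceArray in place of the Unsafe-based tabAt and casTabAt.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical, heavily simplified sketch: fixed size, no resize, no trees, no removal.
final class TinyBinTable {
    static final class Node {
        final int hash;
        final String key;
        volatile Integer val;
        volatile Node next;

        Node(int hash, String key, Integer val) {
            this.hash = hash;
            this.key = key;
            this.val = val;
        }
    }

    private final AtomicReferenceArray<Node> table = new AtomicReferenceArray<>(16);

    // Same expression as ConcurrentHashMap.spread in JDK 8.
    static int spread(int h) {
        return (h ^ (h >>> 16)) & 0x7fffffff;
    }

    /** Returns the previous value, or null if the key was absent. */
    Integer put(String key, Integer value) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int i = (table.length() - 1) & hash;
        for (;;) {
            Node f = table.get(i); // volatile read of the bin head
            if (f == null) {
                // Empty bin: publish the node with a single CAS, no lock taken.
                if (table.compareAndSet(i, null, new Node(hash, key, value)))
                    return null;
                // The CAS lost a race with another writer: loop and re-read the head.
            } else {
                synchronized (f) { // lock only this bin's head node
                    if (table.get(i) == f) { // head unchanged since it was read
                        for (Node e = f;;) {
                            if (e.hash == hash && e.key.equals(key)) {
                                Integer old = e.val;
                                e.val = value; // existing key: replace the value
                                return old;
                            }
                            Node pred = e;
                            if ((e = e.next) == null) {
                                pred.next = new Node(hash, key, value); // append at the tail
                                return null;
                            }
                        }
                    }
                }
            }
        }
    }

    /** Lock-free read: relies on the volatile head, next, and val fields. */
    Integer get(String key) {
        int hash = spread(key.hashCode());
        for (Node e = table.get((table.length() - 1) & hash); e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) return e.val;
        }
        return null;
    }
}
```

Two writers that hit different bins never contend with each other, and two writers that hit the same empty bin are arbitrated by the CAS alone; the monitor is only taken once a bin already has a head node.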

The get method of ConcurrentHashMap

public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    // Compute the hash first
    int h = spread(key.hashCode());
    // Proceed only if the table is non-null, non-empty, and the bin has a head node
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        // The head node has the same hash and the same non-null key: return its value directly
        if ((eh = e.hash) == h) {
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        // Negative hash: a special node (a TreeBin, or a ForwardingNode during a resize); delegate to its find()
        else if (eh < 0)
            return (p = e.find(h, key)) != null ? p.val : null;
        // Linked list: traverse
        while ((e = e.next) != null) {
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}

To sum up, the general flow of the get method of ConcurrentHashMap is as follows:

  1. First compute the hash value h
  2. Get the index with (n - 1) & h
  3. If the head node of that bin matches the key, return its value directly
  4. If the head node has a negative hash (a TreeBin, or a ForwardingNode during a resize), look the key up through that node's find method, for a tree according to the read operation of the red-black tree
  5. If it is a linked list, traverse it, match hash and key, and return the corresponding value
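Seen from the caller's side, this means that a null result from ConcurrentHashMap.get always means "no mapping", since null values can never be stored, and that a null key fails immediately in key.hashCode(). A small check using only the public API follows; the class name ChmGetDemo and the helper throwsNpe are made up for this example.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: ChmGetDemo and throwsNpe are hypothetical names.
final class ChmGetDemo {
    // Runs the action and reports whether it threw a NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        System.out.println(map.get("a"));                        // 1
        System.out.println(map.get("missing"));                  // null: the key is absent
        System.out.println(throwsNpe(() -> map.get(null)));      // true
        System.out.println(throwsNpe(() -> map.put("b", null))); // true
    }
}
```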

Supplementary knowledge

  • 1. HashMap allows both the key and the value to be null; ConcurrentHashMap allows neither.
  • 2. This step uses CAS (compare-and-swap). You can think of it as a kind of lock, a spin lock, because the caller retries until it succeeds. You can also think of it as lock-free, because CAS is an atomic hardware instruction rather than a monitor lock like synchronized. Whether it counts as a lock is a matter of opinion; what matters is that you understand the mechanism.
  • 3. ForwardingNode: a special Node with a hash value of -1 (MOVED) that stores a reference to nextTable. It only plays a role while the table is being resized, where it is placed in the old table as a placeholder to indicate that the bin is empty or has already been moved.
  • 4. Compared with the hash function of HashMap, spread() additionally masks the result with HASH_BITS (0x7fffffff), so the hash of an ordinary node is never negative (negative hash values are reserved for special nodes such as ForwardingNode). The difference is small and the purpose is the same: to reduce hash collisions.
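Point 4 is easiest to see side by side. The sketch below copies the two expressions from the JDK 8 listings into a stand-alone class; the class name SpreadDemo and the method name hashMapHash are made up for this example, while HASH_BITS carries the same value as the constant in ConcurrentHashMap.

```java
// Illustrative sketch only: SpreadDemo and hashMapHash are hypothetical names.
final class SpreadDemo {
    // Usable bits of a normal node hash; negative hashes mark special nodes
    // (MOVED = -1, TREEBIN = -2, RESERVED = -3 in JDK 8).
    static final int HASH_BITS = 0x7fffffff;

    // The expression HashMap.hash applies to key.hashCode().
    static int hashMapHash(int h) {
        return h ^ (h >>> 16);
    }

    // The expression ConcurrentHashMap.spread applies to key.hashCode().
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    public static void main(String[] args) {
        int h = -1; // a hashCode with every bit set
        System.out.println(Integer.toHexString(hashMapHash(h))); // ffff0000 (negative)
        System.out.println(Integer.toHexString(spread(h)));      // 7fff0000 (sign bit cleared)
    }
}
```

For non-negative inputs such as 97 the two functions return the same value; they only differ in the sign bit.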

Well, after reading the get and put methods of HashMap and ConcurrentHashMap you are probably exhausted. So am I, from writing all of this. Feel free to leave a like to reward the hard work, and bookmark the article so that you can review it later. If you are tired, take a break, drink some water, and come back after a meal. If you think the writing is good, share it with your friends.

The advertisement is over! Let's continue! The interview officially begins!

Interview begins

Interviewer: Can you talk about the process of the put method of HashMap?

  • Blah blah blah: just walk through the general flow written above. I will not repeat it here.

Interviewer: Under what conditions will the linked list in HashMap become a tree?

  • Two conditions must be met: the linked list in a bin must reach the treeify threshold (TREEIFY_THRESHOLD = 8), and the capacity of the HashMap must be >= 64 (MIN_TREEIFY_CAPACITY). If the list is long enough but the capacity is < 64, the table is resized instead. After the resize the nodes are redistributed, so the linked list usually becomes shorter. (The relevant source code is as follows:)
final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
            
            // ... (the rest of the method is omitted)
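The two conditions can be condensed into a small decision function. This is a sketch, not JDK code: the class TreeifyDecision, the enum Action, and the method afterAppend are made up for this example, and only the two constants carry the same values as in HashMap. The parameter binCount is meant to be the loop counter from putVal at the moment the new node is appended.

```java
// Illustrative sketch only: TreeifyDecision, Action and afterAppend are hypothetical names.
final class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;     // same value as HashMap.TREEIFY_THRESHOLD
    static final int MIN_TREEIFY_CAPACITY = 64; // same value as HashMap.MIN_TREEIFY_CAPACITY

    enum Action { NONE, RESIZE, TREEIFY }

    // binCount: value of the putVal loop counter when the new node is appended.
    // tableLength: current length of the bucket array.
    static Action afterAppend(int binCount, int tableLength) {
        if (binCount < TREEIFY_THRESHOLD - 1)
            return Action.NONE;                 // list still short: nothing to do
        return tableLength < MIN_TREEIFY_CAPACITY
                ? Action.RESIZE                 // small table: grow it instead of treeifying
                : Action.TREEIFY;               // large table: convert the bin to a tree
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(6, 64)); // NONE
        System.out.println(afterAppend(7, 16)); // RESIZE
        System.out.println(afterAppend(7, 64)); // TREEIFY
    }
}
```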

Interviewer: Why does the linked list in HashMap turn into a tree at 8 nodes?

  • You can tell the interviewer directly that this is a statistical question. According to the comment in the HashMap source, with well-distributed hash codes the number of nodes in a bin follows a Poisson distribution with a parameter of about 0.5 at the default load factor of 0.75. The probability that a bin holds 8 nodes is about 0.00000006, and the probability of more than 8 is less than one in ten million, so a tree is rarely needed in practice. (For reference, here is the table from the source comment.)
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
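The table can be reproduced with a few lines of arithmetic. The sketch below evaluates the Poisson probability P(k) = exp(-λ) · λ^k / k! with λ = 0.5, the parameter stated in the JDK source comment; the class name PoissonBinSizes and the method name probability are made up for this example.

```java
// Illustrative sketch only: PoissonBinSizes and probability are hypothetical names.
final class PoissonBinSizes {
    // P(k) = exp(-lambda) * lambda^k / k!, built up factor by factor.
    static double probability(int k, double lambda) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) {
            p *= lambda / i;
        }
        return p;
    }

    public static void main(String[] args) {
        // Prints one line per bin size k = 0..8, in the same layout as the table above.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k, 0.5));
        }
    }
}
```

Each additional node multiplies the probability by 0.5 / k, which is why the numbers fall off so quickly.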

Interviewer: What is the difference between HashMap in JDK 7 and JDK 8?

  • In JDK 7, when hash collisions occur, the linked list just grows longer and longer, and the lookup time complexity degrades to O(N). In JDK 8, the linked list is converted to a red-black tree under certain conditions, and the time complexity becomes O(log N).
  • HashMap is not thread-safe in either version. In JDK 7, when put is called concurrently, a resize combined with head insertion can turn a linked list into a cycle, and a later get then loops forever and drives the CPU to 100%. JDK 8 is still not thread-safe, but it changed head insertion to tail insertion, so a resize no longer turns the linked list into a ring.
  • To sum up, the more important difference is the introduction of red-black trees. The interviewer may ask why red-black trees were chosen instead of other trees, such as plain binary search trees. This is mainly a test of your understanding of data structures. Before the interview, it is best to review the read and write time complexity, advantages, and disadvantages of balanced binary trees, red-black trees, and binary search trees.

Interviewer: What is the difference between SynchronizedMap and ConcurrentHashMap?

  • SynchronizedMap
    locks the entire map on every operation to ensure thread safety, so only one thread can access the map at a time.
  • ConcurrentHashMap
    uses CAS + synchronized to ensure thread safety. Compared with SynchronizedMap, ConcurrentHashMap locks a single bin (its head node), while SynchronizedMap locks the whole table. You can compare this to row locks and table locks in MySQL: the lock granularity of ConcurrentHashMap is much smaller.
    In addition, ConcurrentHashMap iterates differently. Its iterators are weakly consistent: if the map changes after the iterator has been created, no ConcurrentModificationException is thrown. The iterator keeps traversing and may or may not reflect changes made after it was created, so the iterating thread and the writer threads can proceed concurrently.
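The difference between the iterators can be observed even in a single thread. The sketch below modifies a map while iterating over its key set; the class name IteratorDemo and the method name failsFast are made up for this example. Note that the fail-fast check of HashMap is documented as best-effort; in this simple single-threaded case it triggers reliably.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: IteratorDemo and failsFast are hypothetical names.
final class IteratorDemo {
    // Adds a key while iterating and reports whether the iterator
    // threw a ConcurrentModificationException.
    static boolean failsFast(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put("c", 3); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsFast(new HashMap<>()));           // true
        System.out.println(failsFast(new ConcurrentHashMap<>())); // false
    }
}
```

A map wrapped with Collections.synchronizedMap behaves like the underlying HashMap here, because its iterators are not synchronized for you.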

Interviewer: Why does ConcurrentHashMap not need to be locked for reading?

  • To answer this, it helps to compare the ConcurrentHashMap of JDK 7 and JDK 8.

  • In JDK 7 and before (the segment-based implementation; the details below match the JDK 6 source, and JDK 7 changed some of them, for example next became volatile)

    • The key, hash, and next fields of HashEntry are all final, so a list can only be changed by inserting a node at its head or by deleting a node.
    • The value field of HashEntry is volatile.
    • null is not allowed as a key or a value. If a reader thread reads null from the value field of a HashEntry, it knows that it hit a race: the instructions of the put method that set the new value were reordered with the publication of the entry. In that case the reader re-reads the value while holding the lock.
    • The volatile variable count coordinates memory visibility between reader and writer threads. A write operation modifies count at the end, and a read operation reads count first. By the transitivity of the happens-before relation, the read operation then sees the changes made by the write operation.
  • In JDK 8

    • Both val and next of Node are volatile.
    • The Unsafe operations behind the tabAt() and casTabAt() methods have volatile semantics, which forbids the relevant instruction reordering, so a reader does not need to worry about seeing a half-published node or a null value.
    static class Node<K,V> implements Map.Entry<K,V> {
          final int hash;
          final K key;
          volatile V val;
          volatile Node<K,V> next;

          Node(int hash, K key, V val, Node<K,V> next) {
              this.hash = hash;
              this.key = key;
              this.val = val;
              this.next = next;
          }
          // ... (remaining methods omitted)
    }
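To make the role of the volatile val field concrete, here is a minimal sketch of a reader that takes no lock and still observes a concurrent write. It is not JDK code: VolatileValueDemo, its nested Node class, and readAfterConcurrentWrite are made-up names, and the Node is reduced to the two fields that matter for the example.

```java
// Illustrative sketch only: all names in this class are hypothetical.
final class VolatileValueDemo {
    // Stand-in for ConcurrentHashMap.Node, reduced to key and val.
    static final class Node {
        final String key;
        volatile String val; // volatile: writes become visible to other threads

        Node(String key, String val) {
            this.key = key;
            this.val = val;
        }
    }

    static String readAfterConcurrentWrite() {
        Node node = new Node("k", "old");
        Thread writer = new Thread(() -> node.val = "new");
        writer.start();
        // The reader takes no lock; it simply re-reads the volatile field
        // until the writer's update becomes visible.
        while (!"new".equals(node.val)) {
            Thread.yield();
        }
        return node.val;
    }

    public static void main(String[] args) {
        System.out.println(readAfterConcurrentWrite()); // new
    }
}
```

Without volatile on val, the Java memory model would not guarantee that the loop ever sees the update.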
    

Interviewer: The interview is over. Congratulations on entering the next round of interviews!

Summary

In fact, there are many more interview questions about HashMap and ConcurrentHashMap that are not expanded here, for example:

  • Is the iterator of ConcurrentHashMap strongly consistent or weakly consistent? What about HashMap?
  • When does HashMap start to resize? How does it resize?
  • What is the difference between ConcurrentHashMap in JDK 7 and JDK 8?

Closing remarks

Thank you very much for reading this far. If you think the article is well written, please follow and share it (it really helps me a lot).
If you think the article needs improvement, I look forward to your suggestions; please leave a comment.
If there is a topic you would like to see, leave a comment as well.
Your support is the greatest motivation for my writing!

Origin blog.csdn.net/Aaron_Tang_/article/details/114703724