A new colleague asked: why can't you delete from a HashMap while traversing it? I was suddenly stumped!

Source: juejin.cn/post/7114669787870920734

foreach loop?
Removing, putting, and adding elements while traversing a HashMap
- 1. Phenomenon
- 2. Study the underlying principles in detail

Some time ago, when a colleague was scanning the code with KW, the following issue was reported:

[Image: KW scan result]

The issue is that calling put on a HashMap while traversing it with foreach is problematic and can throw ConcurrentModificationException.

I took a brief look and recalled that you need to be careful when deleting or adding elements in a collection class while traversing it, and that an iterator is generally used for this.

So I told my colleague to use an iterator to operate on the collection elements. My colleague asked: why? That stumped me. I only remembered that you are not supposed to do it this way, but I had never looked into the reason.

So today I decided to study this HashMap traversal operation carefully to prevent pitfalls!

foreach loop?

The Java foreach syntax was added in JDK 1.5 as an enhancement of the for statement. So how is it implemented underneath? Let's take a closer look:

For collections, foreach is implemented with an iterator; for arrays, it is implemented with index-based traversal. Compilers for Java 5 and above hide both the iterator calls and the index-based loop.

Note: it is the Java compiler (that is, the Java language) that hides the implementation, not some piece of Java code. You will not find the implementation in any JDK source file, because the compiler generates it. To see how foreach works, look at the bytecode compiled from a foreach loop.

Let's write an example to study it:

public class HashMapIteratorDemo {
    String[] arr = {
        "aa",
        "bb",
        "cc"
    };

    public void test1() {
        for (String str: arr) {}
    }
}

Compile the example above and disassemble the bytecode (relevant method only):

[Image: bytecode of the foreach loop over the array]

Maybe we don't know exactly what these instructions do, but we can compare the bytecode instructions generated by the following code:

public class HashMapIteratorDemo2 {
    String[] arr = {
        "aa",
        "bb",
        "cc"
    };

    public void test1() {
        for (int i = 0; i < arr.length; i++) {
            String str = arr[i];
        }
    }
}

[Image: bytecode of the index-based for loop over the array]

Take a look at the two bytecode files. Do you find that the instructions are almost the same? If you still have questions, let’s look at the foreach operation on the collection:

Iterate through the collection with foreach:

import java.util.ArrayList;
import java.util.List;

public class HashMapIteratorDemo3 {
    List<Integer> list = new ArrayList<>();

    public void test1() {
        list.add(1);
        list.add(2);
        list.add(3);

        // foreach over a collection
        for (Integer value : list) {}
    }
}

Traverse the collection via Iterator:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class HashMapIteratorDemo4 {
    List<Integer> list = new ArrayList<>();

    public void test1() {
        list.add(1);
        list.add(2);
        list.add(3);

        // explicit iterator over the same collection
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            Integer value = it.next();
        }
    }
}

Compare the bytecodes of the two methods as follows:

[Image: bytecode of the foreach version and the Iterator version, side by side]

We found that the bytecode instructions of the two methods are almost identical.

From this we can draw the following conclusions:

For collections, since they all implement Iterable, the compiler converts the foreach syntax into a call to iterator() followed by repeated calls to Iterator.hasNext() and Iterator.next().

For arrays, it is converted into an index-based loop that reads each element of the array in turn.
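To make the two conclusions concrete, here is a hand-written sketch of roughly what the compiler produces in each case. The class and method names (ForeachDesugarSketch, sumList, joinArray) are made up for illustration, and the real generated bytecode uses synthetic local variables rather than named ones.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ForeachDesugarSketch {
    // Roughly equivalent to: for (Integer n : list) { sum += n; }
    static int sumList(List<Integer> list) {
        int sum = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            Integer n = it.next();
            sum += n;
        }
        return sum;
    }

    // Roughly equivalent to: for (String s : arr) { sb.append(s); }
    static String joinArray(String[] arr) {
        StringBuilder sb = new StringBuilder();
        String[] copy = arr;      // the array expression is evaluated once
        int len = copy.length;    // the length is read once
        for (int i = 0; i < len; i++) {
            String s = copy[i];
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sumList(Arrays.asList(1, 2, 3)));
        System.out.println(joinArray(new String[] {"aa", "bb", "cc"}));
    }
}
```

The important point for the rest of this article is the first method: a foreach over a collection always goes through an Iterator, even though no iterator appears in the source.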

Removing, putting, and adding elements while traversing a HashMap

1. Phenomenon

Based on the above analysis, and since HashMap's collection views (entrySet(), keySet(), values()) can all be iterated, we can traverse a HashMap with foreach, which uses an iterator underneath. For example:

import java.util.HashMap;
import java.util.Map;

public class HashMapIteratorDemo5 {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "aa");
        map.put(2, "bb");
        map.put(3, "cc");

        // read-only traversal of the entry view
        for (Map.Entry<Integer, String> entry : map.entrySet()) {
            int k = entry.getKey();
            String v = entry.getValue();
            System.out.println(k + " = " + v);
        }
    }
}

Output:

1 = aa
2 = bb
3 = cc

OK, traversal works without problems. But what happens if we remove, put, or add elements during the traversal?

import java.util.HashMap;
import java.util.Map;

public class HashMapIteratorDemo5 {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "aa");
        map.put(2, "bb");
        map.put(3, "cc");

        for (Map.Entry<Integer, String> entry : map.entrySet()) {
            int k = entry.getKey();
            if (k == 1) {
                // key 1 already exists, so this only replaces its value
                map.put(1, "AA");
            }
            String v = entry.getValue();
            System.out.println(k + " = " + v);
        }
    }
}

Result:

There is no problem with execution, and the put operation is successful.

But, but, but! Here comes the problem!!!

HashMap is not a thread-safe collection class, and its iterators are fail-fast. If you add or remove elements directly on the map while traversing it with foreach, a java.util.ConcurrentModificationException is thrown, even in a single thread. The put operation may throw this exception. (Why only "may" is explained later.)

Why is this exception thrown?

Let's first take a look at what the Java API documentation says about HashMap.

[Image: Java API documentation for the key set view of HashMap]

Roughly translated: this method returns a Set view of the keys contained in this map.

The set is backed by the map. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined. The set supports element removal, which removes the corresponding mapping from the map, via the Iterator.remove, Set.remove, removeAll, retainAll, and clear operations. To put it simply, while traversing a view such as map.keySet() or map.entrySet(), you cannot add or remove elements through the map itself. You need to go through the iterator.

For the put operation, if it only replaces the value of an existing key, as in the above example (which overwrites the first element), no exception is thrown. However, if put adds a new key, the exception is thrown as soon as the iterator moves on to the next element. Let's modify the above example:

import java.util.HashMap;
import java.util.Map;

public class HashMapIteratorDemo5 {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "aa");
        map.put(2, "bb");
        map.put(3, "cc");

        for (Map.Entry<Integer, String> entry : map.entrySet()) {
            int k = entry.getKey();
            if (k == 1) {
                // key 4 does not exist yet, so this adds a new mapping
                map.put(4, "AA");
            }
            String v = entry.getValue();
            System.out.println(k + " = " + v);
        }
    }
}

An exception occurred during execution:

[Image: console output showing java.util.ConcurrentModificationException]

This verifies that the put operation mentioned above may throw a java.util.ConcurrentModificationException exception.
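The two put experiments can be folded into one small helper that reports whether the loop failed. This is only a sketch: the class name PutDuringIterationSketch and the method throwsOnPut are invented here, and the result relies on key 1 being visited first, which is what happens with these small Integer keys in OpenJDK's HashMap.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class PutDuringIterationSketch {
    // Traverses the map and calls map.put(keyToPut, "AA") when key 1 is visited.
    // Returns true if the traversal failed with ConcurrentModificationException.
    static boolean throwsOnPut(int keyToPut) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "aa");
        map.put(2, "bb");
        map.put(3, "cc");
        try {
            for (Map.Entry<Integer, String> entry : map.entrySet()) {
                if (entry.getKey() == 1) {
                    map.put(keyToPut, "AA");
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("replace existing key 1: " + throwsOnPut(1));
        System.out.println("add new key 4:          " + throwsOnPut(4));
    }
}
```

Replacing the value of key 1 should complete normally, while adding key 4 should fail on the next step of the loop, which is why put only "may" throw.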

But I have a question. We said above that the foreach loop traverses through an iterator, so why doesn't it work here?

This is actually very simple. The traversal is indeed performed by an iterator underneath, but operations such as put and remove are performed directly on the map, as in the above example: map.put(4, "AA"); // this operates on the map directly, not through the iterator. The iterator is therefore unaware of the change, and ConcurrentModificationException is still thrown.

2. Study the underlying principles in detail

Let's take a look at the source code of HashMap. There we find that the iterator calls this method every time it advances to the next element:

final Node<K,V> nextNode() {
    Node<K,V>[] t;
    Node<K,V> e = next;
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    if (e == null)
        throw new NoSuchElementException();
    if ((next = (current = e).next) == null && (t = table) != null) {
        do {} while (index < t.length && (next = t[index++]) == null);
    }
    return e;
}

Here modCount records how many times the map has been structurally modified (it is incremented whenever a mapping is added or removed), and expectedModCount is the modification count the iterator expects. The two values are equal when the iterator is constructed. If they differ during the traversal, nextNode throws ConcurrentModificationException. This also explains the put behavior seen earlier: replacing the value of an existing key does not change modCount, whereas adding a new key does.
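The same check can be reproduced outside HashMap. Below is a minimal sketch of the fail-fast idea using a toy container; TinyBag is a made-up class with a fixed capacity of 16 and no removal support, so it is an illustration of the counter comparison only, not how HashMap stores its data.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

class TinyBag implements Iterable<String> {
    private final String[] items = new String[16]; // fixed capacity, for brevity
    private int size;
    private int modCount; // incremented on every structural change

    void add(String s) {
        items[size++] = s;
        modCount++;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int cursor;
            // Snapshot of the counter taken when the iterator is created.
            private final int expectedModCount = modCount;

            @Override
            public boolean hasNext() {
                return cursor < size;
            }

            @Override
            public String next() {
                // Same comparison as in HashMap's nextNode().
                if (modCount != expectedModCount)
                    throw new ConcurrentModificationException();
                if (cursor >= size)
                    throw new NoSuchElementException();
                return items[cursor++];
            }
        };
    }
}
```

If add is called after an iterator has been created, the iterator's next call to next() sees a counter that no longer matches its snapshot and throws.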

Now let's look at the collection remove operation:

(1) The remove implementation of HashMap itself:


public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

(2) Remove implementation of HashMap.KeySet

public final boolean remove(Object key) {
    return removeNode(hash(key), key, null, false, true) != null;
}

(3) Remove implementation of HashMap.EntrySet

public final boolean remove(Object o) {
    if (o instanceof Map.Entry) {
        Map.Entry<?,?> e = (Map.Entry<?,?>) o;
        Object key = e.getKey();
        Object value = e.getValue();
        return removeNode(hash(key), key, value, true, true) != null;
    }
    return false;
}

(4) Implementation of the remove method of HashMap.HashIterator

public final void remove() {
    Node<K,V> p = current;
    if (p == null)
        throw new IllegalStateException();
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    current = null;
    K key = p.key;
    removeNode(hash(key), key, null, false, false);
    expectedModCount = modCount; // synchronize expectedModCount with modCount here
}

All four methods above delete the key by calling HashMap.removeNode. Whenever removeNode actually removes a node, it increments modCount, which at that point no longer matches the iterator's expectedModCount:

final Node<K,V> removeNode(int hash, Object key, Object value,
    boolean matchValue, boolean movable) {
    Node<K,V>[] tab;
    Node<K,V> p;
    int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        ...
        if (node != null && (!matchValue || (v = node.value) == value ||
                (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>) node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount; // modCount is incremented here, which may leave it out of sync with expectedModCount
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

Of the four remove implementations above, only the fourth, the iterator's own remove method, synchronizes expectedModCount with modCount after calling removeNode. So when the iterator moves to the next element and calls nextNode, the two values are equal and no exception is thrown.
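The difference between removing through the map and removing through the iterator can be checked with a short experiment. This is a sketch with invented names (RemoveDuringIterationSketch, removeViaMapThrows, removeViaIterator), and it again relies on key 1 being visited first in OpenJDK's HashMap.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class RemoveDuringIterationSketch {
    static Map<Integer, String> newMap() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "aa");
        map.put(2, "bb");
        map.put(3, "cc");
        return map;
    }

    // Removes key 1 through the map itself while a foreach loop is running.
    // Returns true if the loop failed with ConcurrentModificationException.
    static boolean removeViaMapThrows() {
        Map<Integer, String> map = newMap();
        try {
            for (Integer key : map.keySet()) {
                if (key == 1) {
                    map.remove(key); // modCount changes, expectedModCount does not
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes key 1 through the iterator and returns the remaining size.
    static int removeViaIterator() {
        Map<Integer, String> map = newMap();
        Iterator<Integer> it = map.keySet().iterator();
        while (it.hasNext()) {
            if (it.next() == 1) {
                it.remove(); // resynchronizes expectedModCount with modCount
            }
        }
        return map.size();
    }
}
```

The first method should fail on the step after the removal, while the second should finish and leave two mappings in the map.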

When you get here, do you feel like you suddenly understand?

Therefore, if you need to perform element operations when traversing a collection, you need to use an Iterator, as follows:

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class HashMapIteratorDemo5 {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "aa");
        map.put(2, "bb");
        map.put(3, "cc");

        Iterator<Map.Entry<Integer, String>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, String> entry = it.next();
            int key = entry.getKey();
            if (key == 1) {
                // remove through the iterator, not through the map
                it.remove();
            }
        }
    }
}
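Besides an explicit Iterator, there are a few other ways to change a map safely around a traversal. The sketch below is one possible combination, not the only approach: Collection.removeIf (Java 8+) for removals, Map.Entry.setValue for updating values in place, and a separate map for additions that are applied after the loop. The class name SafeModificationSketch and the pending variable are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class SafeModificationSketch {
    static Map<Integer, String> demo() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "aa");
        map.put(2, "bb");
        map.put(3, "cc");

        // Removal: let the entry view do the traversal and the removal together.
        map.entrySet().removeIf(e -> e.getKey() == 1);

        // Value update: setValue on an entry is not a structural modification.
        for (Map.Entry<Integer, String> e : map.entrySet()) {
            e.setValue(e.getValue().toUpperCase());
        }

        // Addition: collect new mappings separately and apply them after the loop.
        Map<Integer, String> pending = new HashMap<>();
        for (Integer k : map.keySet()) {
            if (k == 3) {
                pending.put(4, "dd");
            }
        }
        map.putAll(pending);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

None of these loops touches the map structurally while its own iterator is active, so the modCount check never fails.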