HashMap source code analysis

In recent years, HashMap has been a popular interview topic at major companies, and many related questions branch off from it. This article covers the following points:

①Basic information of HashMap

②The difference between HashMap and Hashtable

③The underlying implementation of HashMap

④How HashMap handles hash collisions

⑤The implementation principle of ConcurrentHashMap

 

1. Basic information of HashMap

HashMap is a hash-table-based implementation of the Map interface. It provides all of the optional map operations and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) The class makes no guarantees about the order of the map; in particular, it does not guarantee that the order will remain constant over time. In short, HashMap is unordered and not thread-safe.
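
The null-handling described above can be tried out with a short sketch. The class name HashMapBasicsDemo and the sample keys are made up for illustration; only the java.util.HashMap calls come from the JDK.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapBasicsDemo {
    // Builds a small map that uses the null key and a null value.
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("one", 1);
        map.put(null, 0);         // a single null key is allowed
        map.put("missing", null); // null values are allowed too
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.get(null));              // value stored under the null key
        System.out.println(map.containsKey("missing")); // the key exists even though its value is null
        System.out.println(map.size());
        System.out.println(map);                        // iteration order is not guaranteed
    }
}
```

Because get() returns null both for an absent key and for a key mapped to null, containsKey() is the way to tell the two cases apart.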

 

2. The difference between HashMap and Hashtable

       ①HashMap is not thread-safe, whereas Hashtable is thread-safe because its methods are synchronized.

       ②HashMap accepts null values and one null key; Hashtable accepts neither and throws NullPointerException.

       ③ HashMap's iterators are fail-fast, while the Enumeration objects returned by Hashtable's keys() and elements() methods are not. If the structure of a HashMap is changed (elements added or removed) during iteration by anything other than the iterator's own remove() method, whether by another thread or by the iterating thread itself, the iterator throws ConcurrentModificationException. This behavior is best-effort rather than guaranteed, so programs should not depend on it for correctness. This is also the difference between Enumeration and Iterator.

       ④ In a single-threaded environment, HashMap is faster than Hashtable, because every synchronized Hashtable method has to acquire and release a lock. Over a large number of calls, that overhead adds up to a noticeable amount of time.

       ⑤HashMap does not guarantee that the order of its elements stays the same over time.
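
Points ② and ③ above can be checked with a small sketch. The class and method names here are hypothetical; the behavior shown relies only on documented java.util classes, and the fail-fast check is best-effort, so treat the second method as a demonstration rather than a contract.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Map;

class HashtableDifferencesDemo {
    // Returns true if Hashtable rejects a null key with a NullPointerException.
    static boolean hashtableRejectsNullKey() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "v");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Returns true if removing through the map during iteration trips the fail-fast check.
    static boolean directRemoveFailsFast() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural change behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // Removes even values through the iterator itself, which is the supported way.
    static Map<String, Integer> iteratorRemove() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 0) {
                it.remove();
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(hashtableRejectsNullKey());
        System.out.println(directRemoveFailsFast());
        System.out.println(iteratorRemove());
    }
}
```

Note that the exception in directRemoveFailsFast() is raised in a single thread: fail-fast detection is about structural modification during iteration, not only about other threads.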

 

3. The underlying implementation of HashMap:

       Before JDK 1.8, HashMap was backed by an array of Entry objects, each of which heads a linked list. Since JDK 1.8, it is backed by an array, linked lists, and red-black trees. Let's start with the implementation before 1.8.



 HashMap stores its data in an array. Each key-value pair is wrapped in an Entry object, and the Entry class is effectively a singly linked list node: it has a next pointer that links to the next Entry in the same bucket. JDK 1.8 renamed this class to Node but kept the same structure; the 1.8 version of the code is as follows:

 

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

 In JDK 1.8 and later, red-black trees were added. When a bucket's linked list grows beyond 8 nodes, it is converted into a red-black tree, provided the table has at least 64 buckets; with a smaller table, HashMap resizes instead. The structure is as follows:

 



 HashMap computes the bucket in which an Entry is stored from the hash of its key alone. The value plays no part, so entries with equal values can sit in different linked lists.
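
As a rough sketch of how a key is turned into a bucket index, the following mirrors the hash-spreading and masking steps used by HashMap in JDK 1.8. The class name BucketIndexDemo and the method names are invented for this example, and the table length of 16 is simply the default initial capacity.

```java
class BucketIndexDemo {
    // Same idea as HashMap.hash() in JDK 1.8: XOR the high 16 bits into the low 16 bits,
    // so that the high bits still influence the index when the table is small.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // The table length is assumed to be a power of two, so masking with (length - 1)
    // keeps only the low bits and always yields a valid index.
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        int n = 16; // default initial capacity of HashMap
        System.out.println(indexFor("Aa", n));
        System.out.println(indexFor("BB", n));   // "Aa" and "BB" share a hashCode, so they share a bucket
        System.out.println(indexFor("hello", n));
        System.out.println(indexFor(null, n));   // the null key always maps to bucket 0
    }
}
```

Only the key is passed in; the value never appears in the calculation.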

 

4. HashMap's handling of HashCode collisions

       HashMap resolves collisions with separate chaining (sometimes translated as the "zipper method"). The hash function is applied to the key to compute a hash code, and different keys can produce the same one. Entries whose hashes map to the same bucket are therefore linked into a list. On get, HashMap uses the key's hash to locate the bucket; if the bucket holds a linked list, a collision has occurred, and each node's key must be compared to find the matching one.

       With separate chaining, calling put first uses the key's hashCode() to locate the bucket. When the bucket is already occupied, equals() is called to decide whether the key is already present (in which case its value is replaced) or a new node must be added.

       When two different keys have the same hash code, they are stored in the linked list of the same bucket, and the key's equals() method identifies the right key-value pair.
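
The put and get flow described above can be sketched as a tiny chained hash table. This is a simplified illustration, not the JDK code: ChainedMapSketch is a made-up class, the table is fixed at 16 buckets with no resizing or treeification, and new nodes are inserted at the head of the list for brevity.

```java
import java.util.Objects;

class ChainedMapSketch<K, V> {
    // One node of a bucket's singly linked list.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value for the key, or null if the key was absent.
    V put(K key, V value) {
        int h = hash(key);
        int i = (table.length - 1) & h;           // step 1: the hash picks the bucket
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) { // step 2: equals() picks the node
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(h, key, value, table[i]); // collision or empty bucket: link a new node
        return null;
    }

    V get(Object key) {
        int h = hash(key);
        for (Node<K, V> n = table[(table.length - 1) & h]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedMapSketch<String, Integer> map = new ChainedMapSketch<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa", so it lands in the same bucket
        System.out.println(map.get("Aa"));
        System.out.println(map.get("BB"));
    }
}
```

"Aa" and "BB" are a convenient test pair because String.hashCode() gives both the same value, so the lookup has to fall back on equals() to tell them apart.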

 

5. The implementation principle of ConcurrentHashMap

        ConcurrentHashMap is a thread-safe and efficient alternative to HashMap. Using HashMap in a concurrent program can send the program into an infinite loop, while the thread-safe Hashtable is very inefficient under contention.

        Before JDK 1.8, concurrent put operations on a HashMap can cause an infinite loop: when several threads resize the table at the same time, a bucket's Entry linked list can end up forming a ring, so the next pointer of an Entry is never null and a later lookup in that bucket never terminates.

       Hashtable synchronizes every method on a single lock. In a multi-threaded environment, while one thread is inside a synchronized method of a Hashtable, any other thread that calls a synchronized method on the same Hashtable blocks until the lock is released.
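
From the caller's side, the thread-safe alternative looks like the sketch below. ConcurrentCountDemo and its method are hypothetical names; merge() is a real ConcurrentHashMap method and is documented as atomic, so every increment from every thread is counted.

```java
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountDemo {
    // Starts the given number of threads, each incrementing the same counter key,
    // and returns the final count once all threads have finished.
    static int countConcurrently(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.merge("hits", 1, Integer::sum); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000));
    }
}
```

Swapping in a plain HashMap here would make the result unpredictable, since its read-modify-write steps are not atomic.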

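       In JDK 1.7, ConcurrentHashMap is composed of an array of Segment objects, each of which holds an array of HashEntry objects. Segment is a reentrant lock (it extends ReentrantLock) and plays the role of the lock in ConcurrentHashMap, while HashEntry stores the key-value data. (JDK 1.8 dropped segment locking in favor of CAS operations and synchronized blocks on individual buckets.)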

The structure diagram is shown in the following figure:



 

ConcurrentHashMap uses segment locks to protect different parts of the data. Each Segment lock guards one portion of the container, so threads that access data in different segments do not compete for the same lock, which effectively improves the efficiency of concurrent access.
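
The segment-locking idea can be sketched in a few lines. This is a loose illustration, not the JDK 1.7 source: StripedMapSketch is an invented class, each segment simply wraps a plain HashMap, and get() takes the lock for simplicity (the real implementation avoids locking on reads).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

class StripedMapSketch<K, V> {
    // Each segment is itself a lock that guards its own slice of the data.
    private static final class Segment<K, V> extends ReentrantLock {
        final Map<K, V> data = new HashMap<>();
    }

    private final Segment<K, V>[] segments;

    @SuppressWarnings("unchecked")
    StripedMapSketch(int segmentCount) {
        segments = (Segment<K, V>[]) new Segment[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new Segment<>();
        }
    }

    // The key's hash decides which segment (and therefore which lock) is used.
    private Segment<K, V> segmentFor(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return segments[Math.floorMod(h, segments.length)];
    }

    V put(K key, V value) {
        Segment<K, V> s = segmentFor(key);
        s.lock(); // only this segment is locked; other segments stay available
        try {
            return s.data.put(key, value);
        } finally {
            s.unlock();
        }
    }

    V get(Object key) {
        Segment<K, V> s = segmentFor(key);
        s.lock();
        try {
            return s.data.get(key);
        } finally {
            s.unlock();
        }
    }

    // Sums the segment sizes, locking one segment at a time.
    int size() {
        int total = 0;
        for (Segment<K, V> s : segments) {
            s.lock();
            try {
                total += s.data.size();
            } finally {
                s.unlock();
            }
        }
        return total;
    }

    // Usage example: several threads insert disjoint key ranges concurrently.
    static int concurrentFill(int threads, int perThread) {
        StripedMapSketch<Integer, Integer> map = new StripedMapSketch<>(16);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(concurrentFill(4, 1000));
    }
}
```

Two threads block each other only when their keys fall into the same segment, which is the reduction in lock competition the paragraph above describes.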

 

