Detailed explanation of HashMap in Java

The definition of a hash table given by Baidu Baike:

A hash table (also called a hash map) is a data structure that is accessed directly based on the key value. In other words, it maps the key to a position in a table and reads the record stored at that position, which speeds up lookups. The mapping function is called a hash function, and the array that stores the records is called a hash table.

Given a table M and a function f(key): if, for any given key, applying the function to that key yields the address in the table of the record containing the key, then M is called a hash table and f(key) is called a hash function.
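
As a toy illustration of this definition (not the JDK implementation; the class and method names below are made up for the example), a hash function f(key) can simply map each key to a slot in a fixed-size array:

public class ToyHashTable {
    private final String[] slots = new String[16];

    // The hash function f(key): map the key's hashCode onto an array index.
    private int f(String key) {
        return (key.hashCode() & 0x7fffffff) % slots.length;
    }

    public void put(String key, String value) {
        slots[f(key)] = value;   // collisions simply overwrite in this sketch
    }

    public String get(String key) {
        return slots[f(key)];    // direct access at the computed index
    }
}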

A brief overview:

HashMap is an implementation of the Map interface. Elements are stored as key-value pairs, and both null keys and null values are allowed; because keys cannot be repeated, at most one key can be null. In addition, HashMap does not guarantee the order of its elements: iteration order is unspecified and may differ from insertion order. HashMap is not thread safe.
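
A short usage example of these points (a minimal sketch; the class name is made up for the example, and the expected output is noted in the comments):

import java.util.HashMap;
import java.util.Map;

public class HashMapBasics {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", null);   // null values are allowed
        map.put(null, 3);     // at most one null key is allowed
        map.put("a", 4);      // duplicate key: the old value is replaced

        System.out.println(map.size());      // 3
        System.out.println(map.get("a"));    // 4
        // Iteration order is not guaranteed to match insertion order.
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}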

HashMap inherits from AbstractMap:

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {}

Basic attributes

    /**
     * The default initial capacity - MUST be a power of two.
     * Default initial capacity: 16.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The load factor used when none specified in constructor.
     * The default load factor is 0.75.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     * Cached view of the entry set.
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     * The number of elements in the HashMap.
     */
    transient int size;

    /**
     * The load factor for the hash table.
     * Used to decide when the HashMap's capacity needs to be adjusted (resized).
     * @serial
     */
    final float loadFactor;

These are the main fields; each is briefly described in the comments above.

Note: resizing (expansion) of a HashMap is a relatively expensive operation, so if you can estimate how many entries the map will hold, it is best to give it an initial capacity when constructing it, to avoid repeated resizes. HashMap is not thread safe; in a multi-threaded environment, ConcurrentHashMap is recommended instead. (Default capacity: 16)
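
For example (a minimal sketch; the expected entry count of 1000 is an assumption chosen only for illustration):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SizingExample {
    public static void main(String[] args) {
        // If roughly 1000 entries are expected, size the map so that
        // expected < capacity * loadFactor, avoiding repeated resizes.
        int expected = 1000;
        Map<String, String> preSized = new HashMap<>((int) (expected / 0.75f) + 1);
        preSized.put("key", "value");

        // In a multi-threaded environment, prefer ConcurrentHashMap.
        Map<String, String> concurrent = new ConcurrentHashMap<>();
        concurrent.put("key", "value");
    }
}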

First of all, it should be emphasized that HashMap's thread unsafety shows up as infinite loops, data loss, and data overwriting. The infinite loop and data loss problems occurred in JDK 1.7 and were fixed in JDK 1.8, but data overwriting can still happen in 1.8.

Thread unsafety

    /**
     * Implements Map.put and related methods.
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

The line if ((p = tab[i = (n - 1) & hash]) == null) checks whether the target bucket is empty, i.e. whether there is a hash collision. Suppose two threads A and B are both performing a put, and the hash function computes the same index for their keys. Thread A passes this check and is then suspended because its time slice is exhausted. Thread B gets the time slice, finds the bucket still empty, and inserts its element at that index normally. When thread A gets the time slice again, it has already passed the collision check, so it does not check again and inserts directly, overwriting the entry that thread B just inserted. This is one way HashMap is thread unsafe.

There is also the ++size statement near the end of the method (in if (++size > threshold)). Consider threads A and B again, both performing a put at the same time, and assume the current size of the HashMap is 10. Thread A reaches ++size, reads the value 10 from main memory, and is about to increment it when its time slice runs out and it has to give up the CPU. Thread B then gets the CPU, also reads size = 10 from main memory, performs the increment, finishes its put, and writes size = 11 back to main memory. Thread A then gets the CPU again and continues from where it left off (its value of size is still 10); it finishes its put and also writes size = 11 back to main memory. Both threads have performed a put, but size has only increased by 1: one update was overwritten, which again makes HashMap thread unsafe.
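
The lost updates described above can often, though not deterministically, be observed with a sketch like the following (class name made up for the example): two threads put disjoint key ranges into a shared HashMap, so 2000 entries are expected, but the printed size is frequently smaller.

import java.util.HashMap;
import java.util.Map;

public class UnsafePutDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>();

        // Two threads insert disjoint key ranges, so 2000 entries are expected.
        Thread a = new Thread(() -> {
            for (int i = 0; i < 1000; i++) map.put(i, i);
        });
        Thread b = new Thread(() -> {
            for (int i = 1000; i < 2000; i++) map.put(i, i);
        });
        a.start();
        b.start();
        a.join();
        b.join();

        // Because of the races described above (overwritten bucket inserts and
        // lost ++size updates), the printed size is often less than 2000.
        System.out.println("size = " + map.size());
    }
}

Replacing HashMap with ConcurrentHashMap in this sketch makes the final size reliably 2000.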

The initial capacity of HashMap is 16 and the default load factor is 0.75. When a HashMap is resized, the capacity is doubled: capacity * 2. To compute the bucket, HashMap applies a secondary hash to the key's hashCode to obtain a better-distributed hash value, and then combines it with the length of the table array. (The snippet below is the JDK 1.7 code; JDK 1.8 computes the hash as (h = key.hashCode()) ^ (h >>> 16) instead, but the indexing idea h & (length - 1) is the same.)

int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

static int indexFor(int h, int length) {
    return h & (length - 1);
}
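
A small worked example of the index calculation (values chosen only for illustration): because the table length is always a power of two, h & (length - 1) keeps just the low bits of the hash, which for non-negative values is equivalent to h % length but cheaper.

public class IndexForDemo {
    public static void main(String[] args) {
        int length = 16;          // table length, always a power of two
        int h = 0b1011_0110;      // an example hash value (182)

        // length - 1 = 15 = 0b1111, so the AND keeps only the low 4 bits.
        int index = h & (length - 1);
        System.out.println(index);       // 6
        System.out.println(h % length);  // also 6 for non-negative h
    }
}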

 

Source: blog.csdn.net/a159357445566/article/details/108686078