HashMap data structure and algorithm

Table of Contents

Features of HashMap

HashMap before JDK 1.8 (linked lists resolve hash collisions)

HashMap since JDK 1.8 (red-black trees resolve hash collisions)

Why does JDK 1.8 use a red-black tree rather than an AVL tree?


Features of HashMap

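Underlying structure: an array of buckets (Entry[]), a singly linked list of Entry nodes inside each bucket (the node class is named Node in JDK 1.8), and, since JDK 1.8, a red-black tree (a bucket's list is converted to a tree when its length exceeds 8 and the table has at least 64 buckets; smaller tables are resized instead).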

  • Fast storage (put)
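  • Fast search (average time complexity O(1); a bucket with many collisions degrades to O(n) as a linked list, or O(log n) once it has become a red-black tree)
  • Scalable (loadFactor = 0.75, default capacity = 16; the table doubles whenever the number of entries exceeds capacity * loadFactor: 16 -> 32 once size exceeds 12, 32 -> 64 once it exceeds 24, 64 -> 128 once it exceeds 48, ...)
  • Not thread-safe (put, remove, etc. are ordinary unsynchronized methods, whereas the methods of Hashtable are declared synchronized)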
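
The expansion rule in the list above can be checked with a few lines of arithmetic. This is a minimal sketch, not JDK code: the class name ResizeSchedule and the helper threshold are made up for illustration, and only the two default values (16 and 0.75) come from the text.

```java
class ResizeSchedule {
    static final int DEFAULT_INITIAL_CAPACITY = 16;   // default table size
    static final float DEFAULT_LOAD_FACTOR = 0.75f;   // default load factor

    /** Hypothetical helper: the entry count above which a table of this capacity is doubled. */
    static int threshold(int capacity) {
        return (int) (capacity * DEFAULT_LOAD_FACTOR);
    }

    public static void main(String[] args) {
        int capacity = DEFAULT_INITIAL_CAPACITY;
        for (int i = 0; i < 4; i++) {
            System.out.println("capacity " + capacity
                    + ": resize once size exceeds " + threshold(capacity));
            capacity <<= 1; // doubling keeps the capacity a power of two
        }
    }
}
```

Doubling matters because the bucket index is computed with a bit mask, which only works when the capacity is a power of two.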
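
The constructors in the JDK 1.8 source show where the load factor and the initial capacity are applied:

 /* ---------------- Public operations -------------- */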

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

    /**
     * Constructs a new <tt>HashMap</tt> with the same mappings as the
     * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
     * default load factor (0.75) and an initial capacity sufficient to
     * hold the mappings in the specified <tt>Map</tt>.
     *
     * @param   m the map whose mappings are to be placed in this map
     * @throws  NullPointerException if the specified map is null
     */
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }
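
The first constructor stores tableSizeFor(initialCapacity) in threshold, which rounds the requested capacity up to a power of two. The sketch below is modeled on the JDK 8 helper of the same name; the wrapper class TableSize is an assumption made so the snippet runs on its own.

```java
class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Returns the smallest power of two that is >= cap (capped at MAXIMUM_CAPACITY). */
    static int tableSizeFor(int cap) {
        int n = cap - 1;      // subtract 1 so an exact power of two maps to itself
        n |= n >>> 1;         // smear the highest set bit into every lower position
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {0, 1, 10, 16, 17}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

A request such as new HashMap<>(10) therefore ends up with a table of 16 buckets once the table is first allocated.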

The classic HashMap problem is the hash collision:

Hash collision: different keys map to the same array index, that is, to the same bucket (slot).

Singly linked list: one way to resolve collisions; each entry holds a next reference to the following node in the same bucket, so colliding entries form a chain.
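
A collision is easy to reproduce. The sketch below uses the index formula (n - 1) & hash with the bit-spreading step of the JDK 8 hash helper; the class CollisionDemo and its method names are made up for illustration.

```java
class CollisionDemo {
    /** Mixes the high 16 bits into the low 16 bits, as the JDK 8 hash helper does. */
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /** Bucket index for a table whose length is a power of two. */
    static int bucketIndex(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are different strings with the same hashCode (2112).
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());
        System.out.println(bucketIndex("Aa", 16) + " " + bucketIndex("BB", 16));
        // Different hash codes can also share a bucket: 1 and 17 both map to index 1.
        System.out.println(bucketIndex(1, 16) + " " + bucketIndex(17, 16));
    }
}
```

Either kind of collision puts both keys in the same bucket, which is why each entry needs a next reference.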

 

HashMap before JDK 1.8 (linked lists resolve hash collisions)

public V put(K key, V value) {  
        if (key == null)  
            return putForNullKey(value);  
        int hash = hash(key.hashCode());  
        int i = indexFor(hash, table.length);  
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {  
            Object k;  
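            // Check whether the bucket at this index already holds an element with the same hash and the same key. If so, the new value overwrites the old value and the old value is returned.
            // Elements with the same hash map to the same index; if their keys differ, this is a hash collision.
            // After a collision, a single bucket stores not one Entry but a chain of Entry objects.
            // The entries must be traversed in order until the wanted Entry is found. If it sits at the very end of the chain (it was the first one placed in the bucket),
            // the loop has to run all the way to the end to find it.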
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {  
                V oldValue = e.value;  
                e.value = value;  
                return oldValue;  
            }  
        }  
        modCount++;  
        addEntry(hash, key, value, i);  
        return null;  
    }  
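
The chaining idea behind this method can be reduced to a small standalone map. This is a simplified sketch, not the JDK implementation: the class ChainedMap is hypothetical, the table is fixed at 16 buckets, and resizing, modCount and hash spreading are left out.

```java
import java.util.Objects;

class ChainedMap<K, V> {
    /** One node of a bucket's singly linked list. */
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    private int indexFor(Object key) {
        // The mask works because the table length is a power of two.
        return (key == null) ? 0 : key.hashCode() & (table.length - 1);
    }

    public V put(K key, V value) {
        int i = indexFor(key);
        // Walk the chain: an equal key means "overwrite and return the old value".
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // No equal key found: link the new entry in at the head of the chain.
        table[i] = new Entry<>(key, value, table[i]);
        return null;
    }

    public V get(Object key) {
        for (Entry<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

With the colliding keys "Aa" and "BB", both entries land in the same bucket and get has to walk the chain to tell them apart, which is the O(n) worst case that JDK 1.8 addresses with trees.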

HashMap since JDK 1.8 (red-black trees resolve hash collisions)

 /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
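
The condition binCount >= TREEIFY_THRESHOLD - 1 is easy to misread. The sketch below replays only the list-walking loop of putVal on a simplified node type to find the list length at which treeifyBin would be called; TreeifyTrigger, its Node class and both methods are made up for illustration, and key comparison is omitted because every appended key is assumed to be new.

```java
class TreeifyTrigger {
    static final int TREEIFY_THRESHOLD = 8;

    /** Simplified bin node: only the link matters here. */
    static final class Node {
        Node next;
    }

    /** Appends one node to the bin; returns true if putVal would call treeifyBin. */
    static boolean append(Node head) {
        Node p = head;
        for (int binCount = 0; ; ++binCount) {
            if (p.next == null) {
                p.next = new Node();
                return binCount >= TREEIFY_THRESHOLD - 1; // -1 because the head is not counted
            }
            p = p.next;
        }
    }

    /** Grows a bin one node at a time; returns its length when treeification is first requested. */
    static int lengthAtTreeify() {
        Node head = new Node();
        int length = 1;
        while (true) {
            boolean treeify = append(head);
            length++;
            if (treeify) {
                return length;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("treeifyBin requested at list length " + lengthAtTreeify());
    }
}
```

The request comes when the ninth node is appended, that is, when the list length exceeds 8. In the JDK, treeifyBin then resizes the table instead of building a tree if the table has fewer than 64 buckets.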

Why does JDK 1.8 use a red-black tree rather than an AVL tree?

The AVL tree is a self-balancing binary search tree. Its characteristics are:

  • It is a binary search tree (also called a binary sort tree).
  • It maintains a balance condition: at every node, the heights of the left and right subtrees differ by at most 1 (this difference is the balance factor).
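
The balance condition can be written as a short recursive check. This is a minimal sketch: AvlCheck and its Node type are hypothetical, the check covers only the height condition (not the key ordering), and it recomputes heights at every level for clarity rather than speed.

```java
class AvlCheck {
    static final class Node {
        final int key;
        final Node left;
        final Node right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /** Height in nodes; an empty subtree has height 0. */
    static int height(Node n) {
        return (n == null) ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    /** True if every node's left and right subtree heights differ by at most 1. */
    static boolean isAvlBalanced(Node n) {
        if (n == null) {
            return true;
        }
        int balanceFactor = height(n.left) - height(n.right);
        return Math.abs(balanceFactor) <= 1
                && isAvlBalanced(n.left)
                && isAvlBalanced(n.right);
    }

    public static void main(String[] args) {
        Node balanced = new Node(2, new Node(1, null, null), new Node(3, null, null));
        Node chain = new Node(1, null, new Node(2, null, new Node(3, null, null)));
        System.out.println(isAvlBalanced(balanced)); // true
        System.out.println(isAvlBalanced(chain));    // false: the root's balance factor is -2
    }
}
```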

Red-black trees and AVL trees are the most commonly used balanced binary search trees; search, insertion, and deletion all take O(log n) time in both.

AVL trees and red-black trees differ in several ways:
(1) AVL trees are more strictly balanced, so lookups are faster. AVL trees generally suit read- and search-intensive workloads.
(2) Red-black trees are better suited to insert- and update-intensive workloads.
(3) The rebalancing rotations of an AVL tree are generally harder to implement and debug than those of a red-black tree.

Summary:
(1) AVL trees and red-black trees are both height-balanced tree data structures. They are very similar; the real difference lies in the number of rotations performed during an insert or delete.
(2) Both are O(log N), where N is the number of nodes, but in practice AVL trees are faster for search-intensive tasks: with stricter balance, the average traversal is shorter. For insertion and deletion, on the other hand, the AVL tree is slower: it needs more rotations to rebalance after a modification.
(3) In an AVL tree, the heights of the two subtrees of any node differ by at most 1. In a red-black tree, the longest path from the root to a leaf can be up to twice as long as the shortest.
(4) Both give O(log n) lookups, but rebalancing an AVL tree after a deletion may require O(log n) rotations, while a red-black tree needs at most two rotations per insertion and at most three per deletion (although O(log n) nodes may need to be examined and recolored to find where to rotate). A rotation itself is an O(1) operation, because it only reassigns a few pointers.
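
To see why a rotation is O(1), here is a minimal left rotation on a bare binary tree node. It is a sketch under simplifying assumptions: the class Rotation and its Node type are made up, and the parent pointers and red/black colors that a real red-black tree also updates are left out.

```java
class Rotation {
    static final class Node {
        final int key;
        Node left;
        Node right;

        Node(int key) {
            this.key = key;
        }
    }

    /**
     * Rotates the subtree rooted at x to the left and returns the new subtree root.
     * x must have a right child. Only two references change, whatever the subtree size.
     */
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left; // y's left subtree holds keys between x and y, so it moves under x
        y.left = x;       // x becomes y's left child
        return y;
    }

    public static void main(String[] args) {
        // Right-leaning chain 1 -> 2 -> 3 becomes a balanced tree rooted at 2.
        Node one = new Node(1);
        one.right = new Node(2);
        one.right.right = new Node(3);
        Node root = rotateLeft(one);
        System.out.println(root.left.key + " " + root.key + " " + root.right.key); // 1 2 3
    }
}
```

The search-tree ordering is preserved, so the cost of rebalancing comes from how many rotations and recolorings are needed, not from any single rotation.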


Reference: https://blog.csdn.net/21aspnet/article/details/88939297

Origin blog.csdn.net/boonya/article/details/109731265