[HashMap class application and source code analysis under JDK8] Data structure, hash collisions, and conversion of linked lists to red-black trees

Series Article Directory

[Java Basics] StringBuffer and StringBuilder class application and source code analysis
[Java Basics] Array application and source code analysis
[Java Basics] String, memory address analysis, source code
[HashMap class application and source code analysis under JDK8 environment] Part I: initialization with the empty constructor
[HashMap class application and source code analysis under JDK8 environment] Part II: reading the source code to understand the expansion mechanism of HashMap
[HashMap class application and source code analysis under JDK8 environment] Part III: capacity modification experiment
[HashMap class application and source code analysis under JDK8 environment] Part IV: HashMap hash collisions, HashMap storage structure, linked list becomes red-black tree




1. The data structure of HashMap under JDK8

HashMap is built on an array of buckets, each holding a linked list (or a red-black tree). A hash function maps each key to a position in the array, and a node holding the key-value pair is stored at that position.
To insert data, the put method of HashMap first computes the key's hash value (hash(key)) and its array index, then inserts or updates the node at that position. If the number of entries exceeds the threshold (threshold), the table is expanded (resize()); if the linked list in a single bucket grows too long, that bucket is converted to a red-black tree (treeification).
The get method of HashMap locates the bucket from the key's hash value and index, then traverses the linked list or red-black tree there and returns the matching value.
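
To make the array-plus-linked-list layout concrete, here is a minimal sketch of a chained hash table. It is a teaching aid, not the JDK implementation: the class name TinyChainedMap is hypothetical, the capacity is fixed, and there is no resize or treeification. Only the hash spreading, the (n - 1) & hash index calculation, and tail insertion mirror what the JDK8 source in this article does.

```java
import java.util.Objects;

// Hypothetical teaching class, not the JDK implementation:
// fixed number of buckets, no resize, no treeification.
class TinyChainedMap<K, V> {
    // One linked-list node per key-value pair, similar in shape to HashMap.Node.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    private final Node<K, V>[] table;

    @SuppressWarnings("unchecked")
    TinyChainedMap(int powerOfTwoCapacity) {
        table = (Node<K, V>[]) new Node[powerOfTwoCapacity];
    }

    // Same bit-spreading expression as HashMap.hash(key).
    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int h = hash(key);
        int i = (table.length - 1) & h;
        Node<K, V> p = table[i];
        if (p == null) {                 // empty bucket: store the first node
            table[i] = new Node<>(h, key, value);
            return null;
        }
        while (true) {
            if (p.hash == h && Objects.equals(p.key, key)) {
                V old = p.value;         // key already present: replace the value
                p.value = value;
                return old;
            }
            if (p.next == null) {        // reached the tail: append a new node
                p.next = new Node<>(h, key, value);
                return null;
            }
            p = p.next;
        }
    }

    // Walks the bucket's list and returns the matching value, or null.
    V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & h]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> map = new TinyChainedMap<>(16);
        map.put("A", 1);
        map.put("B", 2);
        map.put("A", 3); // same key: the value is replaced
        System.out.println(map.get("A") + " " + map.get("B") + " " + map.get("C")); // 3 2 null
    }
}
```

With a small capacity such as 4, several keys necessarily share a bucket, which is a quick way to exercise the list traversal in put and get.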

1.1、hash

public class HashMap {
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
}

public class Object {
    public native int hashCode();
}

public final class System {
    /**
     * Returns the same hash code for the given object as
     * would be returned by the default method hashCode(),
     * whether or not the given object's class overrides
     * hashCode().
     * The hash code for the null reference is zero.
     *
     * @param x object for which the hashCode is to be calculated
     * @return  the hashCode
     * @since   JDK1.1
     */
    public static native int identityHashCode(Object x);
}
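
The expression (h = key.hashCode()) ^ (h >>> 16) folds the high 16 bits of the hash code into the low 16 bits. The sketch below illustrates why that matters when the index is taken from the low bits only. The class and method names (HashSpreadDemo, spread, index) are made up for this illustration; the spread expression itself is copied from the excerpt above.

```java
class HashSpreadDemo {
    // Same expression as HashMap.hash(Object) in the excerpt above.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n is assumed to be a power of two).
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        // Integer.hashCode() is the int value itself, so these hash codes
        // differ only in bit 16 and above.
        Integer[] keys = {0x10000, 0x20000, 0x30000};
        for (Integer key : keys) {
            int raw = index(key.hashCode(), n);   // low 4 bits of the raw hash code
            int mixed = index(spread(key), n);    // low 4 bits after spreading
            System.out.println(Integer.toHexString(key)
                    + ": raw index=" + raw + ", spread index=" + mixed);
        }
        // prints:
        // 10000: raw index=0, spread index=1
        // 20000: raw index=0, spread index=2
        // 30000: raw index=0, spread index=3
    }
}
```

Without the spreading step all three keys would land in bucket 0 of a 16-slot table; with it they land in three different buckets.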

The following is quoted from: In-depth analysis of the specific implementation of Java objects and classes inside the HotSpot VM

object hash

The _mark field contains a hash code field that holds the hash value of the object. Every Java object has its own hash value. If the Object.hashCode() method is not overridden, the virtual machine generates a hash value for the object automatically. The generation strategy is shown in Listing 3-4:

Code Listing 3-4 Object hash value generation strategy

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0;
  if (hashCode == 0) {
    // Park-Miller random number generator
    value = os::random();
  } else if (hashCode == 1) {
    // Generate stwRandom at every STW pause and use it for randomization
    intptr_t addrBits = cast_from_oop ...

The Java layer calls Object.hashCode() or System.identityHashCode(), which ultimately calls get_next_hash() in runtime/synchronizer at the virtual machine layer to generate the hash value.



1.2、key and value types

    static final int TREEIFY_THRESHOLD = 8;   // bin size at which a linked list is converted to a red-black tree

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // check whether the table has been initialized
        if ((tab = table) == null || (n = tab.length) == 0)
            // if not, call resize() to initialize it and assign the result
            n = (tab = resize()).length;
        // compute the index from the hash; check whether that slot is null
        if ((p = tab[i = (n - 1) & hash]) == null)
            // tab[i] is empty: create a new Node and store it there
            tab[i] = newNode(hash, key, value, null);
        else {
            // tab[i] already holds data: a collision
            Node<K,V> e; K k;
            // the first node in tab[i] has the same hash and the same key as the arguments
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                // same key: remember the node so its value can be replaced below
                e = p;
            else if (p instanceof TreeNode) // the bin is a red-black tree
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // the bin is a linked list
                for (int binCount = 0; ; ++binCount) {
                    // p.next is null, so p is the last node
                    if ((e = p.next) == null) {
                        // create a new Node and append it at the tail of the list
                        p.next = newNode(hash, key, value, null);
                        // binCount >= 8-1 means the list already held 8 nodes
                        // before this append, so convert it to a (red-black) tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // the key already exists in the list: leave the loop
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    // advance p to the next node and keep traversing
                    p = e;
                }
            }
            // the key already exists: update its value and return the old value
            if (e != null) {
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e); // hook called when an existing value is replaced (empty by default)
                return oldValue;
            }
        }
        ++modCount; // modification count
        // expand the table if the new size exceeds the threshold
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict); // hook called after a successful insertion (empty by default)
        return null;
    }

Looking at the putVal source code, the key and value types are generic, so any reference type can be used. Java primitive types are not allowed: generic type arguments must be reference types, and a primitive value has no hashCode() or equals() method to call for comparison. The key of a HashMap can therefore only be a reference type, not a primitive type. To use primitive values as keys, use the corresponding wrapper class, such as Integer, Double, Long, or Float.
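
A short sketch of what this means in practice: an int key is autoboxed to Integer when it is passed to put or get. The class name BoxedKeyDemo and the sample values are chosen only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class BoxedKeyDemo {
    static Map<Integer, String> demo() {
        // Map<int, String> would not compile: type arguments must be reference types.
        Map<Integer, String> map = new HashMap<>();
        int primitiveKey = 42;
        map.put(primitiveKey, "answer");            // the int is autoboxed to Integer
        // Equal Integer values act as the same key, whatever the object identity,
        // because Integer overrides hashCode() and equals().
        map.put(Integer.valueOf(1000), "first");
        map.put(Integer.valueOf(1000), "second");   // replaces "first"
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = demo();
        System.out.println(map.get(42));    // answer
        System.out.println(map.size());     // 2
        System.out.println(map.get(1000));  // second
    }
}
```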

1.3、Node

See the code in [1.2]: the tab variable is an array of Node (Node<K,V>[]), and Node implements the Map.Entry interface.
Node has the fields hash, key, and value, plus a next field that points to the next node (forming the linked list).
Node implements methods such as toString, hashCode, and equals.

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

interface Entry<K,V> {
   K getKey();

   V getValue();

   V setValue(V value);

   boolean equals(Object o);

   int hashCode();

   public static <K extends Comparable<? super K>, V> Comparator<Map.Entry<K,V>> comparingByKey() {
       return (Comparator<Map.Entry<K, V>> & Serializable)
           (c1, c2) -> c1.getKey().compareTo(c2.getKey());
   }

   public static <K, V extends Comparable<? super V>> Comparator<Map.Entry<K,V>> comparingByValue() {
       return (Comparator<Map.Entry<K, V>> & Serializable)
           (c1, c2) -> c1.getValue().compareTo(c2.getValue());
   }

   public static <K, V> Comparator<Map.Entry<K, V>> comparingByKey(Comparator<? super K> cmp) {
       Objects.requireNonNull(cmp);
       return (Comparator<Map.Entry<K, V>> & Serializable)
           (c1, c2) -> cmp.compare(c1.getKey(), c2.getKey());
   }

   public static <K, V> Comparator<Map.Entry<K, V>> comparingByValue(Comparator<? super V> cmp){
       Objects.requireNonNull(cmp);
       return (Comparator<Map.Entry<K, V>> & Serializable)
           (c1, c2) -> cmp.compare(c1.getValue(), c2.getValue());
   }
}
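
The static comparator factories on Map.Entry shown above are mostly useful from application code, since a HashMap itself has no defined iteration order. As a small usage sketch (the class name EntrySortDemo and the method keysSortedByValue are hypothetical), the entries of a map can be copied into a list and sorted by value:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EntrySortDemo {
    // Returns the map's keys ordered by ascending value.
    static List<String> keysSortedByValue(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(map.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            keys.add(e.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("A", 3);
        scores.put("B", 1);
        scores.put("C", 2);
        System.out.println(keysSortedByValue(scores)); // [B, C, A]
    }
}
```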

1.4、TreeNode

See the code in [1.2]: when the node p in a bucket is a TreeNode, it is cast to that type. TreeNode extends the LinkedHashMap.Entry class, which in turn extends HashMap.Node.
TreeNode has the red field and the links parent, left, right, and prev (red-black tree).
TreeNode implements methods such as treeify, find, and putTreeVal.

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }

        final TreeNode<K,V> root() {
            for (TreeNode<K,V> r = this, p;;) {
                if ((p = r.parent) == null)
                    return r;
                r = p;
            }
        }

        static <K,V> void moveRootToFront(Node<K,V>[] tab, TreeNode<K,V> root) {
            ...
        }

        final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
            TreeNode<K,V> p = this;
            do {
                int ph, dir; K pk;
                TreeNode<K,V> pl = p.left, pr = p.right, q;
                if ((ph = p.hash) > h)
                    p = pl;
                else if (ph < h)
                    p = pr;
                else if ((pk = p.key) == k || (k != null && k.equals(pk)))
                    return p;
                else if (pl == null)
                    p = pr;
                else if (pr == null)
                    p = pl;
                else if ((kc != null ||
                          (kc = comparableClassFor(k)) != null) &&
                         (dir = compareComparables(kc, k, pk)) != 0)
                    p = (dir < 0) ? pl : pr;
                else if ((q = pr.find(h, k, kc)) != null)
                    return q;
                else
                    p = pl;
            } while (p != null);
            return null;
        }

        final TreeNode<K,V> getTreeNode(int h, Object k) {
            return ((parent != null) ? root() : this).find(h, k, null);
        }

        static int tieBreakOrder(Object a, Object b) {
            int d;
            if (a == null || b == null ||
                (d = a.getClass().getName().
                 compareTo(b.getClass().getName())) == 0)
                d = (System.identityHashCode(a) <= System.identityHashCode(b) ?
                     -1 : 1);
            return d;
        }

        final void treeify(Node<K,V>[] tab) {
            TreeNode<K,V> root = null;
            for (TreeNode<K,V> x = this, next; x != null; x = next) {
                next = (TreeNode<K,V>)x.next;
                x.left = x.right = null;
                if (root == null) {
                    x.parent = null;
                    x.red = false;
                    root = x;
                }
                else {
                    K k = x.key;
                    int h = x.hash;
                    Class<?> kc = null;
                    for (TreeNode<K,V> p = root;;) {
                        int dir, ph;
                        K pk = p.key;
                        if ((ph = p.hash) > h)
                            dir = -1;
                        else if (ph < h)
                            dir = 1;
                        else if ((kc == null &&
                                  (kc = comparableClassFor(k)) == null) ||
                                 (dir = compareComparables(kc, k, pk)) == 0)
                            dir = tieBreakOrder(k, pk);

                        TreeNode<K,V> xp = p;
                        if ((p = (dir <= 0) ? p.left : p.right) == null) {
                            x.parent = xp;
                            if (dir <= 0)
                                xp.left = x;
                            else
                                xp.right = x;
                            root = balanceInsertion(root, x);
                            break;
                        }
                    }
                }
            }
            moveRootToFront(tab, root);
        }

        final Node<K,V> untreeify(HashMap<K,V> map) {
            Node<K,V> hd = null, tl = null;
            for (Node<K,V> q = this; q != null; q = q.next) {
                Node<K,V> p = map.replacementNode(q, null);
                if (tl == null)
                    hd = p;
                else
                    tl.next = p;
                tl = p;
            }
            return hd;
        }

        /**
         * Tree version of putVal.
         */
        final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab,
                                       int h, K k, V v) {
            Class<?> kc = null;
            boolean searched = false;
            TreeNode<K,V> root = (parent != null) ? root() : this;
            for (TreeNode<K,V> p = root;;) {
                int dir, ph; K pk;
                if ((ph = p.hash) > h)
                    dir = -1;
                else if (ph < h)
                    dir = 1;
                else if ((pk = p.key) == k || (k != null && k.equals(pk)))
                    return p;
                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0) {
                    if (!searched) {
                        TreeNode<K,V> q, ch;
                        searched = true;
                        if (((ch = p.left) != null &&
                             (q = ch.find(h, k, kc)) != null) ||
                            ((ch = p.right) != null &&
                             (q = ch.find(h, k, kc)) != null))
                            return q;
                    }
                    dir = tieBreakOrder(k, pk);
                }

                TreeNode<K,V> xp = p;
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    Node<K,V> xpn = xp.next;
                    TreeNode<K,V> x = map.newTreeNode(h, k, v, xpn);
                    if (dir <= 0)
                        xp.left = x;
                    else
                        xp.right = x;
                    xp.next = x;
                    x.parent = x.prev = xp;
                    if (xpn != null)
                        ((TreeNode<K,V>)xpn).prev = x;
                    moveRootToFront(tab, balanceInsertion(root, x));
                    return null;
                }
            }
        }

        final void removeTreeNode(HashMap<K,V> map, Node<K,V>[] tab,
                                  boolean movable) {
            ...
        }

        final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
            ...
        }

        /* ------------------------------------------------------------ */
        // Red-black tree methods, all adapted from CLR

        static <K,V> TreeNode<K,V> rotateLeft(TreeNode<K,V> root,
                                              TreeNode<K,V> p) {
            ...
            return root;
        }

        static <K,V> TreeNode<K,V> rotateRight(TreeNode<K,V> root,
                                               TreeNode<K,V> p) {
            ...
            return root;
        }

        static <K,V> TreeNode<K,V> balanceInsertion(TreeNode<K,V> root,
                                                    TreeNode<K,V> x) {
            ...
        }

        static <K,V> TreeNode<K,V> balanceDeletion(TreeNode<K,V> root,
                                                   TreeNode<K,V> x) {
            ...
        }

        static <K,V> boolean checkInvariants(TreeNode<K,V> t) {
            ...
            return true;
        }
    }

1.5. Data structure changes when inserting data

See the code in [1.2]: what happens to the data structure when data is inserted?
The resulting structure is illustrated in the second figure in [1].

  • Check whether the table has been initialized.
    If not, call the resize() method to initialize it and assign the result.
  • Compute the index from the hash. If the slot is null,
    tab[i] has no value yet: create a new Node and store it there.
  • If tab[i] already holds data, a hash collision has occurred. There are three cases:
    1. The hash of the node at tab[i] equals the incoming hash, and its key equals the incoming key.
    The key already exists, so its value is simply replaced.
    2. The bin is a red-black tree.
    Call the red-black tree insertion method putTreeVal.
    3. The bin is a linked list.
    Traverse the list. If the next node of p is null, p is the last node, so a new Node is appended at the tail. If binCount has reached 7 (TREEIFY_THRESHOLD - 1) at that point, meaning the list already held 8 nodes before the append, treeifyBin is called to convert the linked list to a red-black tree.
    If the key is found in the list during the traversal, exit the loop; its value is then replaced.
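
The return value of putVal makes these cases observable from the public API: put returns null when a new node is created and the old value when an existing key is updated, and putIfAbsent (which calls putVal with onlyIfAbsent set to true) leaves an existing non-null value untouched. A small sketch, with the class name PutReturnDemo chosen only for this illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutReturnDemo {
    // Collects the return values of a few insertions, followed by the final value.
    static List<Integer> results() {
        Map<String, Integer> map = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        out.add(map.put("A", 1));          // null: new node, no previous value
        out.add(map.put("A", 2));          // 1: existing key, value replaced
        out.add(map.putIfAbsent("A", 3));  // 2: existing key, value kept
        out.add(map.get("A"));             // 2
        return out;
    }

    public static void main(String[] args) {
        System.out.println(results()); // [null, 1, 2, 2]
    }
}
```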

2. Experiment

The experiment covers hash collisions and the conversion of a linked list to a red-black tree. Let's debug and trace the source code step by step to see what happens.

2.1. Hash collision

2.1.1. Bitwise AND operation (&)

  • 1. Compute the identity hash code (System.identityHashCode) of the strings "A", "B", "C", "D", "E", "F", "G", "H".
  • 2. Convert to binary (https://jisuan5.com/decimal/?hex=356573597).
    Decimal 15 in binary is 1111.
    In this experiment the hash code of "A" is 356573597; converting decimal 356573597 to binary gives 10101010000001110000110011101.
    356573597 & 15
    = 10101010000001110000110011101 & 1111
    = 10101010000001110000110011101 & 00000000000000000000000001111 (pad the high bits with 0 so that both operands have the same width)
    = 00000000000000000000000001101 = 1101 (leading zeros can be omitted)
  • 3. After repeating the experiment with 15 (2 to the 4th power, minus 1) replaced by other values, we found that for a power of 2 minus 1 the low bits are all 1, so the results of the AND operation are spread evenly. With a power of 2 or another value, some low bits are 0, and the resulting distribution is uneven.
    int n = 16 - 1;   // binary: 1111
    String[] strs = {"A", "B", "C", "D", "E", "F", "G", "H"};
    for (int i = 0; i < strs.length; i++) {
        // identity hash codes are not guaranteed to be the same across runs or JVMs;
        // the output below is from one particular run
        int h = System.identityHashCode(strs[i]);
        System.out.println("-------------------------");
        System.out.println(h);
        // the label "二进制" means "binary"
        System.out.println("二进制:" + Integer.toBinaryString(h));
        System.out.println(h & n);   // index in a table of length 16
        System.out.println("-------------------------");
    }

-------------------------
356573597
二进制:10101010000001110000110011101
13
-------------------------
-------------------------
1735600054
二进制:1100111011100110010011110110110
6
-------------------------
-------------------------
21685669
二进制:1010010101110010110100101
5
-------------------------
-------------------------
2133927002
二进制:1111111001100010010010001011010
10
-------------------------
-------------------------
1836019240
二进制:1101101011011110110111000101000
8
-------------------------
-------------------------
325040804
二进制:10011010111111011101010100100
4
-------------------------
-------------------------
1173230247
二进制:1000101111011100001001010100111
7
-------------------------
-------------------------
856419764
二进制:110011000010111110110110110100
4
-------------------------
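
The observation in step 3 can be checked directly by listing which results a given mask can produce at all. The sketch below is only an illustration; the class name MaskDemo and the method reachable are hypothetical, and the hashes 0 to 999 stand in for arbitrary hash values.

```java
import java.util.Set;
import java.util.TreeSet;

class MaskDemo {
    // Collects every distinct result of (hash & mask) for hashes 0..999.
    static Set<Integer> reachable(int mask) {
        Set<Integer> slots = new TreeSet<>();
        for (int hash = 0; hash < 1000; hash++) {
            slots.add(hash & mask);
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(reachable(15)); // 1111: all 16 values from 0 to 15
        System.out.println(reachable(16)); // 10000: [0, 16]
        System.out.println(reachable(10)); // 1010: [0, 2, 8, 10]
    }
}
```

With the mask 15 every slot from 0 to 15 is reachable, while a mask with zero bits in the low positions can only ever produce a few values, so most slots of the table would stay empty.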

2.1.2. Hash Collision

In the code of [2.1.1], the hash values of F and H both give 4 when ANDed with 15, so the two keys collide. This AND operation is the same calculation HashMap uses to compute the index position (see [1.5] for a detailed explanation):

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        ...
        if ((p = tab[i = (n - 1) & hash]) == null)
        ...
}

Here are several ways to resolve hash collisions:

  • Chain address method (separate chaining)
    Entries that collide have the same index in the array, and a linked list at that index stores the colliding entries (JDK8's HashMap adopts this method and inserts at the tail).
  • Re-hash method
    When a hash collision occurs, hash again with another hash function, repeating until the conflict no longer occurs. This method is less prone to clustering but increases the computation time.
  • Open address method
    When a hash collision occurs, probe the hash table in a fixed order, starting from the slot where the conflict occurred, until a free slot is found, then store the conflicting element in that slot.
  • Public overflow area
    Divide the hash table into a base table and an overflow table. Entries that collide in the base table are all placed in the overflow area.
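
For contrast with the chained approach that HashMap uses, here is a minimal sketch of the open address method with linear probing. It is not taken from the JDK: the class name LinearProbingSet is hypothetical, the capacity is fixed, and removal is deliberately left out because it needs extra bookkeeping.

```java
class LinearProbingSet {
    private final String[] slots;

    LinearProbingSet(int capacity) {
        slots = new String[capacity];
    }

    // Returns the slot index where the key is stored, or -1 if the table is full.
    int add(String key) {
        int start = Math.floorMod(key.hashCode(), slots.length);
        for (int step = 0; step < slots.length; step++) {
            int i = (start + step) % slots.length;   // probe the next slot in order
            if (slots[i] == null) {
                slots[i] = key;
                return i;
            }
            if (slots[i].equals(key)) {
                return i;                            // already present
            }
        }
        return -1;
    }

    boolean contains(String key) {
        int start = Math.floorMod(key.hashCode(), slots.length);
        for (int step = 0; step < slots.length; step++) {
            int i = (start + step) % slots.length;
            if (slots[i] == null) {
                return false;                        // a free slot ends the probe sequence
            }
            if (slots[i].equals(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbingSet set = new LinearProbingSet(8);
        // "A".hashCode() is 65 and "I".hashCode() is 73; both are 1 modulo 8.
        System.out.println(set.add("A"));        // 1
        System.out.println(set.add("I"));        // 2: slot 1 is taken, so the next slot is used
        System.out.println(set.contains("I"));   // true
    }
}
```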

2.2. Linked list becomes red-black tree

Two assumptions are made. (In HashMap the default load factor is 0.75; once the number of entries exceeds 12 (16*0.75=12), the table is expanded to 32, the results of the AND operation change, and the entries are redistributed. See [1.5] for a detailed explanation.)

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        ...
        // p.next is null, so p is the last node
        if ((e = p.next) == null) {
            // create a new Node and append it at the tail of the list
            p.next = newNode(hash, key, value, null);
            // binCount >= 8-1 means the list already held 8 nodes
            // before this append, so convert it to a (red-black) tree
            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                treeifyBin(tab, hash);
            break;
        }
        ...
}

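1. The hash table does not expand.
2. (F, H ... H8) & 15 are all equal to 4, nine elements in total. In the code above, treeifyBin is called once binCount reaches 7 (TREEIFY_THRESHOLD - 1), that is, when a node is appended to a list that already holds 8 nodes. Under assumption 1, the ninth element therefore triggers the conversion of the linked list to a red-black tree, and any element inserted into the same bin afterwards goes directly through the red-black tree insertion logic (putTreeVal). Without assumption 1, treeifyBin in JDK8 calls resize() instead of building a tree while the table length is below MIN_TREEIFY_CAPACITY (64).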
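
To reproduce a heavily loaded bin without relying on identity hash codes, one option is a key type whose hashCode() is constant. The sketch below assumes such a hypothetical key class (BadKey) that also implements Comparable, which the tree code can use for ordering. The public API does not reveal whether a bin is currently a list or a tree, so the example only shows that lookups stay correct when every key collides; stepping through put in a debugger is still needed to watch treeifyBin being called.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeyDemo {
    // Hypothetical key type: hashCode() is constant, so every instance
    // lands in the same bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 4;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Inserts count keys that all collide and returns the map.
    static Map<BadKey, Integer> fill(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(100);
        System.out.println(map.size());               // 100
        System.out.println(map.get(new BadKey(57)));  // 57
    }
}
```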


Origin blog.csdn.net/s445320/article/details/132556690