HashSet source code analysis: how it guarantees that elements are not repeated

The core question: how does a Set guarantee that its elements are not repeated?

HashSet source code analysis

HashSet constructor

HashSet is backed by a HashMap; the constructor simply creates one.

public HashSet() {
    map = new HashMap<>();
}

add method

The add operation puts the element into the map as a key and stores PRESENT as the value every time. The value carries no meaning. Because PRESENT is a single static object shared by all entries, the only extra cost is one unused reference per entry.

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
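
To see the return value of add in practice, here is a minimal sketch using the real java.util.HashSet. The class name AddReturnValueDemo and the helper addTwice are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class AddReturnValueDemo {
    // Adds the same element twice to a fresh HashSet and reports what happened.
    static boolean[] addTwice(String element) {
        Set<String> set = new HashSet<>();
        boolean first = set.add(element);   // key absent: map.put returns null, so add returns true
        boolean second = set.add(element);  // key present: map.put returns PRESENT, so add returns false
        return new boolean[] { first, second, set.size() == 1 };
    }

    public static void main(String[] args) {
        boolean[] r = addTwice("a");
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "true false true"
    }
}
```

The second call does not grow the set; it only tells the caller that the element was already there.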

HashMap source code

put method

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

hash method

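static final int hash(Object key) {
    int h;
    // Compute the hash: call the hashCode() of the key's actual type,
    // then XOR the high 16 bits into the low 16 bits. A null key hashes to 0.
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}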
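
The effect of the XOR with h >>> 16 can be shown with a small sketch. The class HashSpreadDemo and the methods spread and index are illustrative names, not part of the JDK; spread copies the expression from hash above, and index copies the (n - 1) & hash expression that putVal uses.

```java
class HashSpreadDemo {
    // Same expression as HashMap.hash for a non-null key, applied to a raw hashCode.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n is assumed to be a power of two).
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x10000; // only bit 16 is set
        int b = 0x20000; // only bit 17 is set
        // Without spreading, the low 4 bits are identical, so both land in bucket 0.
        System.out.println(index(a, n) + " " + index(b, n));                 // prints "0 0"
        // With spreading, the high bits reach the low bits and the buckets differ.
        System.out.println(index(spread(a), n) + " " + index(spread(b), n)); // prints "1 2"
    }
}
```

For small tables only the low bits select the bucket, so folding the high half into the low half lets hash codes that differ only in their upper bits still end up in different buckets.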

putVal method

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // tab holds the current bucket array; p is a temporary node used to walk a bucket
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is null or empty, resize() allocates it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // i is the bucket index. If table[i] is empty there is no collision:
    // store the new node there directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // table[i] already holds a node: a collision
    else {
        Node<K,V> e; K k;
        // hash is a field of node p.
        // Compare the hash first, then the key (by reference or by equals).
        // If both match, remember the node in e; its value is overwritten further down.
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket is a red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Same bucket but the head key is different: walk the linked list
        else {
            // Walk towards the tail looking for an equal key. If none is found, append
            // a new node; if one is found, leave it in e so its value can be replaced.
            for (int binCount = 0; ; ++binCount) {
                // No next node: append the new node and leave the loop
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // binCount >= 7 means the bin already held 8 nodes before this append,
                    // so treeifyBin is called (it only resizes while the table is shorter than 64)
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // The next node has the same hash and an equal key: this is the node
                // to overwrite (e was already set by e = p.next), so leave the loop
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // Advance p to the next node (e = p.next was assigned above)
                p = e;
            }
        }
        // e != null means a node with an equal key already exists
        if (e != null) { // existing mapping for key
            // Overwrite its value and return the old value
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // Empty in HashMap; a hook that LinkedHashMap overrides
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Update size and resize if it now exceeds threshold (capacity * load factor)
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
static final int TREEIFY_THRESHOLD = 8;
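
The find-or-insert logic of putVal can be reduced to a much smaller sketch. MiniHashSet below is a hypothetical class written only for illustration, not JDK code. It assumes a fixed table of 16 buckets, never resizes or treeifies, and inserts at the head of the bucket rather than the tail, but the duplicate check is the same hash-then-equals comparison.

```java
class MiniHashSet<E> {
    // One node of a bucket's linked list; there is no value field because a set only needs keys.
    private static final class Node<E> {
        final int hash;
        final E key;
        final Node<E> next;

        Node(int hash, E key, Node<E> next) {
            this.hash = hash;
            this.key = key;
            this.next = next;
        }
    }

    // Fixed length of 16 (a power of two); this sketch never resizes.
    @SuppressWarnings({"unchecked", "rawtypes"})
    private final Node<E>[] table = (Node<E>[]) new Node[16];
    private int size;

    // Same spreading expression as HashMap.hash.
    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns true if the key was added, false if an equal key was already present.
    boolean add(E key) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        // Walk the bucket: compare the hash first, then the key by reference or equals.
        for (Node<E> p = table[i]; p != null; p = p.next) {
            if (p.hash == hash && (p.key == key || (key != null && key.equals(p.key))))
                return false;
        }
        // No equal key found: link a new node in front of the bucket.
        table[i] = new Node<>(hash, key, table[i]);
        size++;
        return true;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        MiniHashSet<String> set = new MiniHashSet<>();
        System.out.println(set.add("a"));             // prints "true"
        System.out.println(set.add(new String("a"))); // prints "false": different object, equal key
        System.out.println(set.size());               // prints "1"
    }
}
```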

Why (n-1) & hash?

The usual way to map a hash value to a table index is to take the hash modulo the table length (the division method), and this is what Hashtable does. It spreads elements through the table fairly evenly, but the modulo needs an integer division, which is slower than a bitwise operation. HashMap uses (length - 1) & hash in place of the modulo. This also spreads elements evenly but is cheaper to compute, and it is one of the improvements HashMap makes over Hashtable.

Why must the capacity of the hash table be a power of 2?

First, if length is an integer power of 2, (n - 1) & hash is equivalent to taking hash modulo length, so it keeps the hashing uniform while being cheaper to compute.
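
This equivalence is easy to check with a short sketch. MaskVsModuloDemo and maskMatchesModulo are illustrative names; Math.floorMod is used as the reference because, unlike the % operator, it never returns a negative result for a positive divisor.

```java
class MaskVsModuloDemo {
    // True if (n - 1) & h equals h mod n (non-negative remainder) for every sample hash.
    static boolean maskMatchesModulo(int n, int[] hashes) {
        for (int h : hashes) {
            if (((n - 1) & h) != Math.floorMod(h, n))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] hashes = { 0, 1, 15, 16, 17, 12345, -1, Integer.MIN_VALUE, Integer.MAX_VALUE };
        System.out.println(maskMatchesModulo(16, hashes)); // prints "true": 16 is a power of two
        System.out.println(maskMatchesModulo(12, hashes)); // prints "false": 11 & 15 is 11, but 15 mod 12 is 3
    }
}
```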

Second, if length is an integer power of 2, it is even, so length - 1 is odd and its lowest bit is 1. The lowest bit of (n - 1) & hash can then be either 0 or 1, depending on the hash, so the resulting index can be either even or odd and every slot can be reached. If length were odd, length - 1 would be even and its lowest bit would be 0, so the lowest bit of (n - 1) & hash would always be 0. Every hash would then map to an even index, and nearly half of the array would be wasted. More generally, when length is a power of 2, length - 1 is a run of 1 bits, so every one of the low bits of the hash takes part in choosing the index. Making length a power of 2 therefore lowers the chance that different hash values collide and lets the elements spread evenly through the table.
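
The wasted slots can be made visible with a short sketch. BucketCoverageDemo and reachableIndexes are illustrative names; the sketch applies (n - 1) & hash to the hash values 0 to 999 and collects the distinct indexes it sees.

```java
import java.util.TreeSet;

class BucketCoverageDemo {
    // Collects every index that (n - 1) & hash produces for hash = 0..999.
    static TreeSet<Integer> reachableIndexes(int n) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int hash = 0; hash < 1000; hash++) {
            seen.add((n - 1) & hash);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachableIndexes(16).size()); // prints "16": every slot is used
        System.out.println(reachableIndexes(15));        // prints "[0, 2, 4, 6, 8, 10, 12, 14]"
    }
}
```

With a length of 15 the mask is 14 (binary 1110), so only the 8 even indexes are ever produced and the 7 odd slots stay empty.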

Static inner class Node

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
// The map is backed by an array of Node
transient Node<K,V>[] table;

Each Node stores the hash, the key, the value, and a reference to the next node in the same bucket.

/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

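Answer:

A HashSet is backed by a HashMap. When an element is stored, it becomes a key in the map: the map computes hash(key) to find the bucket and then compares the element with the keys already there, first by hash and then by reference or equals. If an equal key already exists, no new node is created; only the value of the existing node is overwritten, and because the value is always the same PRESENT object, nothing changes. That is why a set never holds duplicate elements.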
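
Because the duplicate check relies on hashCode and equals, the element type decides what counts as "repeated". The sketch below uses two hypothetical classes, PlainPoint (inherits both methods from Object) and ValuePoint (overrides both), to show the difference.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {
    // Inherits identity-based equals and hashCode from Object.
    static final class PlainPoint {
        final int x, y;
        PlainPoint(int x, int y) { this.x = x; this.y = y; }
    }

    // Overrides equals and hashCode so that points with equal coordinates are equal.
    static final class ValuePoint {
        final int x, y;
        ValuePoint(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValuePoint)) return false;
            ValuePoint p = (ValuePoint) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Adds two PlainPoint objects with the same coordinates and returns the set size.
    static int plainCount() {
        Set<PlainPoint> set = new HashSet<>();
        set.add(new PlainPoint(1, 2));
        set.add(new PlainPoint(1, 2));
        return set.size();
    }

    // Adds two ValuePoint objects with the same coordinates and returns the set size.
    static int valueCount() {
        Set<ValuePoint> set = new HashSet<>();
        set.add(new ValuePoint(1, 2));
        set.add(new ValuePoint(1, 2));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(plainCount()); // prints "2": the two objects are never equal to each other
        System.out.println(valueCount()); // prints "1": equal hash and equals, so the second add is rejected
    }
}
```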

Origin blog.csdn.net/qq_38783664/article/details/111060459