In-depth understanding of HashSet (backed by HashMap under the hood)

First, a sad story

To be honest, this was the first time I got stuck on a question in an interview, badly enough that it derailed the interviewer's attention (...). Fortunately, someone had pointed me in the right direction before and I had studied HashMap carefully, so it was not beyond remedy.

Second, I was stunned

I wanted to take a proper look at HashSet's underlying implementation once I got back, but when I opened the source code I was stunned. Why is it so dazzling? You are a Set, a subclass of Collection, and yet your uncle is a Map. My heart hurts seeing you like this.

Calming down and thinking about it carefully: a Set cannot contain duplicate elements, and a HashMap does not allow duplicate keys. I nearly coughed up a mouthful of old blood: at the time I hadn't thought of it at all, and hadn't even dared to think in that direction.

Go to dalao's blog

So I went online to read the dalao's (expert's) blogs and found this article. (Reprinted from the dalao's blog without permission; it will be deleted on request if it infringes.)

HashSet overview and implementation

HashSet implements the Set interface and is backed by a hash table (actually a HashMap instance). It makes no guarantees about the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
In a HashSet, the elements are stored as the keys of the underlying HashMap's key-value pairs, and every key maps to one and the same value: private static final Object PRESENT = new Object(); (a dummy Object defined as static final to serve as the value of every HashMap entry).

Insertion

When a new value is added, the underlying HashMap determines whether that key already exists (for the details, please head over to the in-depth HashMap article). If it does not exist, the value is inserted as a new key, following HashMap's insertion logic exactly; if it does exist, nothing new is inserted.
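
A quick sketch of what this looks like from the caller's side (the demo class name is made up): add() returns true only when the element was not already present, because the underlying map.put() returns null in that case.

    import java.util.HashSet;
    import java.util.Set;

    public class HashSetAddDemo {
        public static void main(String[] args) {
            Set<String> set = new HashSet<>();
            System.out.println(set.add("a")); // true  -> new key, map.put(...) returned null
            System.out.println(set.add("a")); // false -> key already present, put(...) returned PRESENT
            System.out.println(set.size());   // 1     -> the duplicate was not stored again
        }
    }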

Deletion

Removal follows the same principle as removal in HashMap: the element is removed as a key from the underlying map.
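
Again a minimal sketch (demo class name invented): remove() reports whether the element was actually there, mirroring the return value of the underlying HashMap.remove().

    import java.util.HashSet;
    import java.util.Set;

    public class HashSetRemoveDemo {
        public static void main(String[] args) {
            Set<String> set = new HashSet<>();
            set.add("a");
            System.out.println(set.remove("a")); // true  -> "a" was present and has been removed
            System.out.println(set.remove("a")); // false -> already gone, nothing to remove
            System.out.println(set.isEmpty());   // true
        }
    }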

Source code analysis

Stealing (read: studying) the dalao's annotated code below; if this infringes, please let me know and it will be deleted immediately.

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    static final long serialVersionUID = -5024744406713321676L;

    // The backing HashMap that stores all of the HashSet's elements.
    private transient HashMap<E,Object> map;

    // Dummy Object used as the value for every entry in the backing HashMap; defined as static final.
    private static final Object PRESENT = new Object();

    /**
     * Default no-arg constructor; constructs an empty HashSet.
     *
     * Internally initializes an empty HashMap with the default initial
     * capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<E,Object>();
    }

    /**
     * Constructs a new set containing the elements of the specified collection.
     *
     * Internally creates a HashMap with the default load factor (0.75) and an
     * initial capacity sufficient to hold all elements of the specified collection.
     * @param c the collection whose elements are to be placed into this set.
     */
    public HashSet(Collection<? extends E> c) {
        map = new HashMap<E,Object>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }

    /**
     * Constructs an empty HashSet with the specified initialCapacity and loadFactor.
     *
     * Internally constructs an empty HashMap with the corresponding parameters.
     * @param initialCapacity the initial capacity.
     * @param loadFactor the load factor.
     */
    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<E,Object>(initialCapacity, loadFactor);
    }

    /**
     * Constructs an empty HashSet with the specified initialCapacity.
     *
     * Internally constructs an empty HashMap with that capacity and the
     * default load factor of 0.75.
     * @param initialCapacity the initial capacity.
     */
    public HashSet(int initialCapacity) {
        map = new HashMap<E,Object>(initialCapacity);
    }

    /**
     * Constructs a new, empty linked hash set with the specified initialCapacity
     * and loadFactor. This constructor is package-private and not exposed to the
     * outside; it exists only to support LinkedHashSet.
     *
     * Internally constructs an empty LinkedHashMap with the given parameters.
     * @param initialCapacity the initial capacity.
     * @param loadFactor the load factor.
     * @param dummy marker parameter distinguishing this constructor.
     */
    HashSet(int initialCapacity, float loadFactor, boolean dummy) {
        map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
    }

    /**
     * Returns an iterator over the elements in this set. The elements are
     * returned in no particular order.
     *
     * Internally calls the underlying HashMap's keySet() to return all keys.
     * This shows that the elements of a HashSet are simply stored as the keys
     * of the underlying HashMap, with a single static final Object as the value.
     * @return an Iterator over the elements in this set.
     */
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

    /**
     * Returns the number of elements in this set (its cardinality).
     *
     * Internally calls the HashMap's size() method, which returns the number of
     * entries, i.e. exactly the number of elements in this set.
     * @return the number of elements in this set (its cardinality).
     */
    public int size() {
        return map.size();
    }

    /**
     * Returns true if this set contains no elements.
     *
     * Internally calls the HashMap's isEmpty() to decide whether this HashSet is empty.
     * @return true if this set contains no elements.
     */
    public boolean isEmpty() {
        return map.isEmpty();
    }

    /**
     * Returns true if this set contains the specified element.
     * More formally, returns true if and only if this set contains an element e
     * such that (o==null ? e==null : o.equals(e)).
     *
     * Internally calls the HashMap's containsKey() to check whether the key is present.
     * @param o element whose presence in this set is to be tested.
     * @return true if this set contains the specified element.
     */
    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element e to this set if this set
     * contains no element e2 such that (e==null ? e2==null : e.equals(e2)).
     * If this set already contains the element, the call leaves the set
     * unchanged and returns false.
     *
     * Internally puts the element into the HashMap as a key.
     * When HashMap's put() adds a key-value pair whose key equals an existing
     * key (same hashCode() and equals() returns true), the new value overwrites
     * the old Entry's value, but the key itself is left untouched. Therefore,
     * when an already-present element is added to a HashSet, the new element is
     * not stored again and the existing element is not modified, which is
     * exactly the "no duplicates" property required of a Set.
     * @param e element to be added to this set.
     * @return true if this set did not already contain the specified element.
     */
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

    /**
     * Removes the specified element from this set if it is present.
     * More formally, removes an element e such that
     * (o==null ? e==null : o.equals(e)), if this set contains such an element.
     * Returns true if this set contained the element (or equivalently, if this
     * set changed as a result of the call). (This set will not contain the
     * element once the call returns.)
     *
     * Internally calls the HashMap's remove() method to delete the corresponding Entry.
     * @param o object to be removed from this set, if present.
     * @return true if the set contained the specified element.
     */
    public boolean remove(Object o) {
        return map.remove(o)==PRESENT;
    }

    /**
     * Removes all of the elements from this set. The set will be empty after
     * this call returns.
     *
     * Internally calls the HashMap's clear() method to remove all entries.
     */
    public void clear() {
        map.clear();
    }

    /**
     * Returns a shallow copy of this HashSet instance: the elements themselves
     * are not cloned.
     *
     * Internally calls the HashMap's clone() to obtain a shallow copy of the
     * map and installs it into the new HashSet.
     */
    public Object clone() {
        try {
            HashSet<E> newSet = (HashSet<E>) super.clone();
            newSet.map = (HashMap<E, Object>) map.clone();
            return newSet;
        } catch (CloneNotSupportedException e) {
            throw new InternalError();
        }
    }
}

Notes

  • To put it bluntly, a HashSet is just a HashMap with restricted functionality, so once you understand how HashMap is implemented, HashSet follows naturally.
  • For the objects stored in a HashSet, the main requirement is to override equals() and hashCode() correctly, so that the uniqueness of the objects placed in the Set is guaranteed (see the sketch after this list).
  • Rather than saying that the Set refuses to store duplicate elements, it is more accurate to say that the underlying Map simply overwrites the old value (the way add() uses the return value of map.put() is quite neat).
  • HashSet provides no get() method. The reason is the same as with HashMap: internally the Set is unordered, so elements can only be reached by iteration.
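
A minimal sketch for the equals()/hashCode() point (the Point class and the demo class name are invented for illustration): with both methods overridden consistently, two logically equal objects are treated as one and the same element by the HashSet.

    import java.util.HashSet;
    import java.util.Objects;
    import java.util.Set;

    // Hypothetical value class: equals() and hashCode() are both overridden
    // so that two Points with the same coordinates count as the same element.
    final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public class EqualsHashCodeDemo {
        public static void main(String[] args) {
            Set<Point> points = new HashSet<>();
            points.add(new Point(1, 2));
            points.add(new Point(1, 2)); // treated as a duplicate, not added
            System.out.println(points.size()); // 1
        }
    }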

You may not believe this

I had originally planned to write separate posts analyzing the underlying implementation of each collection, until I discovered that LinkedHashSet extends HashSet and is backed by LinkedHashMap, and that its constructors simply call super(...). At that moment I decided the Set classes might as well be covered together.

LinkedHashSet

Compared with HashSet it adds no new functionality (no new methods); it merely enables the constructor that HashSet reserved for it, and thereby preserves insertion order. The concrete implementation lives in LinkedHashMap, and when using it we do not need to set any parameters ourselves; we can just use it as-is.
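
A small usage sketch (class name invented): LinkedHashSet iterates in insertion order, whereas a plain HashSet makes no such guarantee.

    import java.util.HashSet;
    import java.util.LinkedHashSet;
    import java.util.Set;

    public class LinkedHashSetOrderDemo {
        public static void main(String[] args) {
            Set<String> linked = new LinkedHashSet<>();
            linked.add("banana");
            linked.add("apple");
            linked.add("cherry");
            // Iteration follows insertion order: banana, apple, cherry
            System.out.println(linked);

            Set<String> plain = new HashSet<>();
            plain.add("banana");
            plain.add("apple");
            plain.add("cherry");
            // Iteration order is unspecified and may differ from insertion order
            System.out.println(plain);
        }
    }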

    /**
     * The iteration ordering method for this linked hash map: <tt>true</tt>
     * for access-order, <tt>false</tt> for insertion-order.
     *
     * @serial
     */
    final boolean accessOrder;

After looking at LinkedHashMap's constructors, I found that, since it extends HashMap, its underlying implementation is also HashMap!!! (Heh, I had already figured it out... no wonder HashMap is the one that really deserves the study.) I also noticed that when LinkedHashMap calls its parent's constructor during initialization, it sets the field accessOrder = false along the way. As the source above shows, this is a flag handed to iteration: false means iteration follows insertion order. (Tracing things back to the root like this is very satisfying.)
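
For comparison, here is a small sketch (class name invented) of what the accessOrder flag changes in LinkedHashMap itself: with true, iteration follows access order; with the default false (which is what LinkedHashSet relies on), it follows insertion order.

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class AccessOrderDemo {
        public static void main(String[] args) {
            // accessOrder = true: iteration follows access order (most recently accessed last)
            Map<String, Integer> accessOrdered = new LinkedHashMap<>(16, 0.75f, true);
            accessOrdered.put("a", 1);
            accessOrdered.put("b", 2);
            accessOrdered.put("c", 3);
            accessOrdered.get("a");               // touching "a" moves it to the end
            System.out.println(accessOrdered);    // {b=2, c=3, a=1}

            // accessOrder = false (the default): iteration keeps insertion order regardless of access
            Map<String, Integer> insertionOrdered = new LinkedHashMap<>();
            insertionOrdered.put("a", 1);
            insertionOrdered.put("b", 2);
            insertionOrdered.put("c", 3);
            insertionOrdered.get("a");
            System.out.println(insertionOrdered); // {a=1, b=2, c=3}
        }
    }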

An accidental discovery

While reading the source code, I also ran into an unfamiliar overridden method: public Spliterator<E> spliterator(). After looking it up, I learned it is called a splittable iterator: an iterator designed to traverse the elements of a data source in parallel, in order to make better use of multi-core CPUs.
In fact, this reminds me to pay attention to concurrency safety within the collection framework.
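
As a rough sketch of the idea (class name invented): a Spliterator can hand off part of the elements via trySplit(), so two threads could in principle traverse the halves in parallel; here both halves are simply walked sequentially to show the split.

    import java.util.HashSet;
    import java.util.Set;
    import java.util.Spliterator;

    public class SpliteratorDemo {
        public static void main(String[] args) {
            Set<Integer> set = new HashSet<>();
            for (int i = 0; i < 10; i++) set.add(i);

            Spliterator<Integer> first = set.spliterator();
            Spliterator<Integer> second = first.trySplit(); // split off roughly half for another traversal

            System.out.print("first half:  ");
            first.forEachRemaining(n -> System.out.print(n + " "));
            System.out.print("\nsecond half: ");
            if (second != null) second.forEachRemaining(n -> System.out.print(n + " "));
            System.out.println();
        }
    }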

TreeSet

Going by the pattern of the Set classes so far, my first guess was that TreeSet is backed by a TreeMap (and I was further guessing that TreeMap itself is built on HashMap, a guess to verify in another post). A look at the source code shows the first guess is indeed right (heh, so who exactly is whose father here... a moment of sympathy for Collection, since Map does not even extend the Collection interface):

    public TreeSet() {
        this(new TreeMap<E,Object>());
    }

TreeSet Features and Implementation Mechanism

The elements stored in a TreeSet are ordered (not by insertion order, but sorted by the elements' values), and duplicate elements are not allowed.
Implementing ordered storage requires a comparison step: either the elements implement the Comparable interface (their natural ordering is used), or a Comparator is supplied to the constructor shown below. In essence, what TreeSet cares about is being both duplicate-free and ordered; a short usage sketch follows the constructor.

    /**
     * Constructs a new, empty tree set, sorted according to the specified
     * comparator.  All elements inserted into the set must be <i>mutually
     * comparable</i> by the specified comparator: {@code comparator.compare(e1,
     * e2)} must not throw a {@code ClassCastException} for any elements
     * {@code e1} and {@code e2} in the set.  If the user attempts to add
     * an element to the set that violates this constraint, the
     * {@code add} call will throw a {@code ClassCastException}.
     *
     * @param comparator the comparator that will be used to order this set.
     *        If {@code null}, the {@linkplain Comparable natural
     *        ordering} of the elements will be used.
     */
    public TreeSet(Comparator<? super E> comparator) {
        this(new TreeMap<>(comparator));
    }
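
A short usage sketch (class name invented), matching the constructor above: without a Comparator the elements' natural ordering is used; with one, the supplied ordering wins.

    import java.util.Comparator;
    import java.util.Set;
    import java.util.TreeSet;

    public class TreeSetDemo {
        public static void main(String[] args) {
            // Natural ordering: String implements Comparable, so no Comparator is needed
            Set<String> natural = new TreeSet<>();
            natural.add("banana");
            natural.add("apple");
            natural.add("apple");           // duplicate, not added again
            natural.add("cherry");
            System.out.println(natural);    // [apple, banana, cherry]

            // Explicit Comparator passed to the constructor shown above
            Set<String> descending = new TreeSet<>(Comparator.reverseOrder());
            descending.add("banana");
            descending.add("apple");
            descending.add("cherry");
            System.out.println(descending); // [cherry, banana, apple]
        }
    }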

So

Therefore, when using a Set, you need to pick the right implementation for your own needs. Since there is no get() method, you still have to go through an iterator to reach the element you want. And with that, this deeper look at Set comes to an end; I am off to dig another hole and study TreeMap (funny face).

Short summary

After such an amusing experience, it really does seem necessary to look at the underlying implementations of the commonly used collection classes, so as to avoid this kind of embarrassment again (manual smiley). There is also the Queue branch of the family. In fact, much of it boils down to an in-depth understanding of HashMap. Come on, that is the rhythm, let's go.

Reprinted from: https://blog.csdn.net/Sugar_Rainbow/article/details/68257208
