[Java container source code series] HashSet source code analysis

Start with the class comments in the source code. They tell us the following:

  1. The underlying implementation is based on HashMap, so iteration is not guaranteed to follow insertion order or any other order;
  2. The cost of add, remove, contains, size and similar methods does not grow with the amount of data. This follows from the array-based structure underlying HashMap: ignoring hash collisions, the time complexity is O(1) regardless of data volume;
  3. It is not thread-safe. If you need thread safety, add your own locking or use Collections.synchronizedSet;
  4. If the set is structurally modified during iteration, the iterator fails fast and throws a ConcurrentModificationException.
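
Points 3 and 4 can be observed directly. The following is a minimal sketch rather than anything taken from the JDK source: the class name FailFastDemo and its two methods are made up for illustration, and only standard java.util calls are used.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; the name and methods are illustrative only.
class FailFastDemo {

    // Returns true if adding to the set while iterating over it
    // makes the iterator throw ConcurrentModificationException.
    static boolean modifyWhileIterating() {
        Set<String> set = new HashSet<>();
        set.add("a");
        set.add("b");
        set.add("c");
        try {
            for (String s : set) {
                set.add(s + "!"); // structural change made outside the iterator
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Two threads add disjoint ranges to a synchronized wrapper;
    // returns the final size of the set.
    static int concurrentAdds() {
        Set<Integer> safe = Collections.synchronizedSet(new HashSet<>());
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) safe.add(i); });
        Thread t2 = new Thread(() -> { for (int i = 1000; i < 2000; i++) safe.add(i); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return safe.size();
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating()); // true
        System.out.println(concurrentAdds());       // 2000
    }
}
```

Note that the synchronized wrapper only protects individual calls. Iterating over it still requires holding the wrapper's lock manually, as the Collections.synchronizedSet documentation describes.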

1. Structure

HashSet's inheritance hierarchy, core member variables, and main constructors:

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable {

    // HashMap is held by composition: its keys are the HashSet's elements, its values are always PRESENT (below)
    private transient HashMap<E,Object> map;

    // The value stored in the HashMap; every node shares this same value
    private static final Object PRESENT = new Object();

    //--------------------------- Constructors ---------------------------------------
    // Simply initializes a HashMap
    public HashSet() {
        map = new HashMap<>();
    }

    // Computes the HashMap capacity: the larger of 16 and the value derived from the collection's size
    public HashSet(Collection<? extends E> c) {
        // Choose the optimal initial capacity
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }
}

1.1 Set is implemented on the uniqueness of Map keys

  • A Map never holds two equal keys: putting a key that already exists overwrites the old entry instead of adding a second one
  • Every entry in the underlying Map has the same value (i.e. PRESENT)

1.2 How HashSet composes HashMap

As the class comments showed, HashSet is implemented on top of HashMap. In Java, there are two ways to build a new class on top of an existing base class:

  • Inheritance: extend the base class and override its methods, for example extend HashMap and override its put method;
  • Composition: hold the base class as a field and reuse its capabilities by calling its methods.

HashSet uses composition with HashMap, which has the following advantages:

  • Inheritance says that the parent and child classes are the same kind of thing, but Set and Map express two different concepts, so inheritance is not appropriate. In addition, Java allows a class to extend only one parent class, which makes later extension harder.
  • Composition is more flexible: existing base classes can be combined freely, new methods can be built on top of the base class's methods, and they can be named however you like, with no need to match the base class's method names.

When you face a similar design choice, the rule of thumb is to prefer composition and use inheritance sparingly.
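
To make the composition approach concrete, here is a minimal sketch of a set built around a private HashMap field. SimpleSet is a hypothetical class written for this article, not JDK code; it follows the same PRESENT idea but exposes only three methods.

```java
import java.util.HashMap;

// Hypothetical class for illustration; not part of the JDK.
class SimpleSet<E> {
    // Shared dummy value, mirroring PRESENT in HashSet.
    private static final Object PRESENT = new Object();

    // The map is a private field (composition), so callers never see put/get.
    private final HashMap<E, Object> map = new HashMap<>();

    // put returns null when the key was absent, i.e. the element is new.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        SimpleSet<String> set = new SimpleSet<>();
        set.add("a");
        set.add("a"); // duplicate: the map overwrites the entry, size stays 1
        System.out.println(set.size());
    }
}
```

Had SimpleSet extended HashMap instead, every Map method (put, get, values and so on) would leak into the set's public interface. With a private field, only the methods chosen here are visible.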

1.3 Optimal capacity initialization

The expression Math.max((int) (c.size()/.75f) + 1, 16) in the code above computes the capacity of the HashMap: it takes the larger of the two values in the parentheses (expected size / 0.75 + 1, and the default 16). The calculation shows that the implementer of HashSet knew the internals of HashMap well, in two respects:

  • Comparing against 16 means that if the computed initial capacity is less than 16, the map is initialized with HashMap's default of 16; if it is greater than 16, the computed value is used.
  • HashMap's resize threshold is capacity * 0.75f, and the map is resized once its size exceeds that threshold. Using (int) (c.size()/.75f) + 1 as the initial capacity keeps the threshold no lower than the expected size, so adding all the expected elements does not trigger a resize. This matches HashMap's resize formula.

This gives a template formula for sizing a HashMap up front: take the larger of (expected size / 0.75 + 1) and the default 16. Only 0.75 of the array's slots can be filled before a resize, hence the division by 0.75. The +1 keeps the expected size from landing exactly on, or just past, the 0.75 boundary, for example after the (int) cast truncates the quotient.
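
A few worked values make the formula easier to check. In the sketch below, initialCapacity repeats the expression from the HashSet constructor, while tableSize is a simplified stand-in (an assumption for illustration, not the JDK's code) for the fact that HashMap rounds the requested capacity up to a power of two. CapacityDemo is a hypothetical class name.

```java
// Hypothetical demo class; only initialCapacity mirrors the HashSet source.
class CapacityDemo {

    // Same expression as in HashSet(Collection): expected size -> requested capacity.
    static int initialCapacity(int expectedSize) {
        return Math.max((int) (expectedSize / .75f) + 1, 16);
    }

    // Smallest power of two >= cap. Simplified stand-in for HashMap's
    // internal rounding; written for illustration only.
    static int tableSize(int cap) {
        int n = 1;
        while (n < cap) {
            n <<= 1;
        }
        return n;
    }

    public static void main(String[] args) {
        for (int size : new int[] {5, 12, 13, 100}) {
            int cap = initialCapacity(size);
            int table = tableSize(cap);
            int threshold = (int) (table * 0.75f);
            System.out.printf("size=%d capacity=%d table=%d threshold=%d%n",
                    size, cap, table, threshold);
        }
        // size=5   capacity=16  table=16  threshold=12
        // size=12  capacity=17  table=32  threshold=24
        // size=13  capacity=18  table=32  threshold=24
        // size=100 capacity=134 table=256 threshold=192
    }
}
```

In every row the threshold is at least the expected size, so filling the set from the collection should not need a resize.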

2. Method analysis & API

The other methods of HashSet are fairly simple: they are thin wrappers around the Map API.

2.1 add

add is a thin wrapper around HashMap's put:

public boolean add(E e) {
    // Delegates to HashMap's put and applies a simple check on the result
    // Note: every value stored here is PRESENT
    return map.put(e, PRESENT)==null;
}
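
Because put returns the previous value for a key (null if there was none), add reports true only the first time an element is inserted. A small sketch of that behaviour, using the hypothetical helper AddDemo.addTwice:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class for illustration.
class AddDemo {

    // Adds the same value twice to a fresh set and returns both results.
    static boolean[] addTwice(String value) {
        Set<String> set = new HashSet<>();
        boolean first = set.add(value);  // put returned null    -> true
        boolean second = set.add(value); // put returned PRESENT -> false
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] r = addTwice("x");
        System.out.println(r[0] + " " + r[1]); // true false
    }
}
```

The same holds for a null element, which HashSet permits: the first add(null) returns true and the second returns false.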

2.2 remove

public boolean remove(Object o) {
    // Removal succeeds only if o is present
    // In the HashMap, o is the key; remove returns the value only when the key existed
    return map.remove(o)==PRESENT;
}
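
The return value of remove follows the same pattern as add: it is true only when the element was actually in the set. A short sketch, with RemoveDemo as a hypothetical class name:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class for illustration.
class RemoveDemo {

    // Returns the results of removing a present element, removing it again,
    // and removing an element that was never added.
    static boolean[] removeResults() {
        Set<String> set = new HashSet<>();
        set.add("x");
        boolean present = set.remove("x");         // map.remove returned PRESENT
        boolean again = set.remove("x");           // map.remove returned null
        boolean never = set.remove("never-added"); // map.remove returned null
        return new boolean[] {present, again, never};
    }

    public static void main(String[] args) {
        boolean[] r = removeResults();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true false false
    }
}
```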

2.3 iterator

iterator directly returns the HashMap's key iterator, because the keys of a HashMap are themselves exposed as a Set (keySet()):

public Iterator<E> iterator() {
    return map.keySet().iterator();
}
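
Since the iterator comes from the map's key set, removing through the iterator removes the element from the set without tripping the fail-fast check. The sketch below assumes nothing beyond the standard Iterator API; IteratorDemo and sumOfEvens are hypothetical names.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical demo class for illustration.
class IteratorDemo {

    // Fills a set with 1..n, removes the odd numbers through the iterator,
    // and returns the sum of the remaining elements.
    static int sumOfEvens(int n) {
        Set<Integer> set = new HashSet<>();
        for (int i = 1; i <= n; i++) {
            set.add(i);
        }
        Iterator<Integer> it = set.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 != 0) {
                it.remove(); // safe: the change goes through the iterator itself
            }
        }
        int sum = 0;
        for (int v : set) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvens(10)); // 30
    }
}
```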

Finally, several aspects of HashSet's implementation are worth learning from:

  • Weighing composition against inheritance;
  • Wrapping complex logic so that the exposed interface is as simple and easy to use as possible;
  • When composing another API, learn that API well so you can use it to best effect;
  • The template formula for HashMap's initial size: take the larger of (expected size / 0.75 + 1) and the default 16.

Origin blog.csdn.net/weixin_43935927/article/details/108519853