HashSet, TreeSet source code analysis

Both HashSet and TreeSet are built on top of Map. Our study focuses on how Set reuses the existing functionality of Map to achieve its own goals, that is, how to innovate on top of existing functionality, and then on which small details of the implementation are worth learning from.

One: HashSet

1.1, HashSet class comments

When reading source code, start with the class comments. The class comments of HashSet mainly make the following four points:

  1. The underlying implementation is based on HashMap, so iteration is not guaranteed to follow insertion order or any other order;
  2. The cost of add, remove, contains, size, and similar methods does not grow with the amount of data. This comes from the array-based structure underlying HashMap: as long as the hash function spreads elements well across the buckets (that is, setting hash collisions aside), the time complexity is O(1);
  3. It is not thread-safe. If you need thread safety, do the locking yourself or use Collections.synchronizedSet;
  4. If the set is structurally modified during iteration, the iterator fails fast and throws ConcurrentModificationException;
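
Points 3 and 4 can be tried out with a few lines of code. The following is a minimal sketch; the class name HashSetCommentsDemo and its method names are made up for illustration, and only standard java.util calls are used.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
class HashSetCommentsDemo {

    // Point 4: removing through the set while iterating is a structural
    // modification, so the iterator's next step fails fast.
    static boolean failsFastOnConcurrentChange() {
        Set<String> set = new HashSet<>();
        set.add("a");
        set.add("b");
        set.add("c");
        try {
            for (String s : set) {
                set.remove(s); // modifies the set behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // The safe alternative: remove through the iterator itself.
    static Set<String> removeSafely(Set<String> set, String victim) {
        Iterator<String> it = set.iterator();
        while (it.hasNext()) {
            if (it.next().equals(victim)) {
                it.remove();
            }
        }
        return set;
    }

    // Point 3: wrap the set so that each individual call is synchronized.
    static Set<String> threadSafeSet() {
        return Collections.synchronizedSet(new HashSet<>());
    }
}
```

Note that with Collections.synchronizedSet, iteration still has to be guarded manually by synchronizing on the returned set, as its documentation requires.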

We have already seen the class comments of List and Map, and points 2, 3, and 4 appear there as well. So if someone asks what List, Map, and Set have in common, you can answer with points 2, 3, and 4.

1.2 How does HashSet combine HashMap

We just saw from point 1 of the class comments that HashSet is implemented on top of HashMap. In Java, there are two ways to build something new on top of an existing base class:

  1. Inherit from the base class and override or extend its methods, for example, inherit from HashMap and add an add method;
  2. Compose the base class and reuse its capabilities by calling its methods;

HashSet composes HashMap. The advantages are as follows:

  1. Inheritance says that the parent and child classes are the same kind of thing, whereas Set and Map express two different concepts, so inheritance is not appropriate. In addition, Java only allows a class to extend one parent class, which makes later extension difficult;
  2. Composition is more flexible. Existing base classes can be combined freely, their methods can be extended and orchestrated, and the new methods can be named freely without having to match the method names of the base class;

In our own work, when we face a similar choice, the principle is to prefer composition and use inheritance sparingly.

Composition means holding a HashMap as one of the class's own fields. The following is how HashSet composes it:

// Compose a HashMap: its keys are the elements of the HashSet, and its values are always PRESENT below
private transient HashMap<E,Object> map;
// The dummy value stored for every key in the HashMap
private static final Object PRESENT = new Object();

From these two lines of code, we can see two things:

  1. When we use HashSet, a method such as add takes a single argument, but the put method of the composed Map takes two, key and value. The key is the argument passed to add, and the value is PRESENT from the second line of code. The design is clever: the Map value is replaced with a fixed default, PRESENT;
  2. If a HashSet is shared, multiple threads accessing it will run into thread-safety problems, because none of the subsequent operations take a lock;

When building HashSet on top of HashMap, the authors first chose composition and then used a default value in place of the Map value. The design is clever: it gives users a good experience and is simple and convenient to use. We can borrow this idea in our own work: wrap the complex underlying implementation, absorb default choices internally, and keep the exposed interface as simple and easy to use as possible.

1.2.1, initialization

Initializing a HashSet is relatively simple: it just calls new HashMap directly. The more interesting case is initialization from an existing collection, where the initial capacity of the HashMap is computed. The source code is as follows:

// Compute the initial capacity of the HashMap from the size of the source collection
public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}
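
To see why the constructor divides by .75f, it helps to work through the arithmetic. The sketch below is a simplified model, not JDK code: initialCapacity copies the expression from the constructor above, while tableSize and resizeThreshold are hypothetical helpers that approximate how HashMap rounds the requested capacity up to a power of two and applies its default load factor of 0.75.

```java
// Hypothetical helper class that models the capacity arithmetic.
class HashSetCapacityDemo {

    // The same expression the HashSet(Collection) constructor uses.
    static int initialCapacity(int size) {
        return Math.max((int) (size / .75f) + 1, 16);
    }

    // Smallest power of two >= cap, which is the table size HashMap ends up
    // with (simplified: assumes 1 <= cap <= 2^30).
    static int tableSize(int cap) {
        int n = 1;
        while (n < cap) {
            n <<= 1;
        }
        return n;
    }

    // Number of entries the table holds before HashMap resizes,
    // assuming the default load factor of 0.75.
    static int resizeThreshold(int tableSize) {
        return (int) (tableSize * 0.75f);
    }
}
```

For a source collection of 100 elements, the requested capacity is 134, the table size becomes 256, and the resize threshold is 192. Since 100 is below 192, addAll(c) can insert every element without triggering a resize, which appears to be the purpose of the calculation.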

The other methods of HashSet are relatively simple: they are thin wrappers around the Map API. The add method, for example, is implemented as follows:

public boolean add(E e) {
    // Delegate to HashMap.put; put returns null when the key was not already present
    return map.put(e, PRESENT)==null;
}

The add method shows the benefits of composition: the parameters, name, and return value of the method can all be chosen freely. With inheritance, that would not be possible.
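
The composition approach can be reproduced in a few lines. The following is a minimal sketch under the assumption that we only need a handful of operations; SimpleSet is a hypothetical class written for illustration and is not the JDK's HashSet.

```java
import java.util.HashMap;
import java.util.Iterator;

// Hypothetical set built by composing a HashMap, in the style of HashSet.
class SimpleSet<E> implements Iterable<E> {
    // Shared dummy value stored for every key.
    private static final Object PRESENT = new Object();
    // The composed map; callers never see it.
    private final HashMap<E, Object> map = new HashMap<>();

    // put returns the previous value, so null means the element is new.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    // remove returns the previous value, which is PRESENT if the key existed.
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    public int size() {
        return map.size();
    }

    @Override
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }
}
```

Callers only ever pass a single element, and the two-argument put and the dummy value stay hidden inside the class.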

1.2.2 Summary

The following aspects of the HashSet implementation are worth borrowing when we write our own code:

  1. Weighing and choosing between composition and inheritance;
  2. Wrapping complex logic so that the exposed interface is as simple and easy to use as possible;
  3. When composing another API, learning that API in depth so that you can use it well;

Two: TreeSet

The overall structure of TreeSet is similar to that of HashSet. The underlying composed class is TreeMap, so TreeSet inherits TreeMap's ability to keep keys sorted, and iteration follows the sort order of the keys. We mainly look at two ideas for reusing TreeMap:

2.1, the first idea of reusing TreeMap

Scenario 1: the add method of TreeSet. Let's look at its source code:

public boolean add(E e) {
    return m.put(e, PRESENT)==null;
}

As you can see, it directly uses the put capability of the underlying TreeMap, with no extra logic.
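
Because add delegates to the sorted map's put, a TreeSet keeps its elements in key order and treats elements that compare as equal as duplicates. The following is a small sketch of that behaviour; the class TreeSetOrderDemo and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class, not part of the JDK.
class TreeSetOrderDemo {

    // Natural ordering: iteration follows the sort order of the elements.
    static List<Integer> sortedCopy(List<Integer> input) {
        TreeSet<Integer> set = new TreeSet<>(input);
        return new ArrayList<>(set);
    }

    // Custom ordering: strings of equal length compare as equal, so the
    // underlying put finds an existing key and add returns false for them.
    static List<String> byLength(List<String> input) {
        Comparator<String> byLen = Comparator.comparingInt(String::length);
        TreeSet<String> set = new TreeSet<>(byLen);
        set.addAll(input);
        return new ArrayList<>(set);
    }
}
```

With the input "ccc", "a", "bb", "dd", byLength keeps "a", "bb", "ccc" and drops "dd", because "dd" compares as equal to the already present "bb".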

2.2, the second idea of reusing TreeMap

Scenario 2: we need to iterate over the elements of the TreeSet. Following the same approach as add, we would directly use the existing iteration ability of TreeMap, for example like this:

// Implemented in the style of idea one (hypothetical, not the actual JDK source)
public Iterator<E> iterator() {
    // Directly use the iteration ability of TreeMap.keySet
    return m.keySet().iterator();
}

That would be an implementation in the style of idea one: TreeSet composes TreeMap and simply wraps one of TreeMap's existing capabilities. The actual implementation of TreeSet, however, works the opposite way. Let's look at the source code:

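// The NavigableSet interface (abridged here) specifies iteration and some special lookup methods
// TreeSet implements this interface, so the iteration contract is defined on the Set side
public interface NavigableSet<E> extends SortedSet<E> {
    Iterator<E> iterator();
    E lower(E e);
}
// m.navigableKeySet() returns a KeySet, a nested class of TreeMap that implements
// the NavigableSet interface, that is, the iteration contract defined on the Set side
public Iterator<E> iterator() {
    return m.navigableKeySet().iterator();
}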

The implementation of the NavigableSet interface in TreeMap is as follows:

// To support the Set side, TreeMap provides this implementation of the NavigableSet interface
static final class KeySet<E> extends AbstractSet<E> implements NavigableSet<E> {
    private final NavigableMap<E, ?> m;
    KeySet(NavigableMap<E,?> map) { m = map; }

    public Iterator<E> iterator() {
        if (m instanceof TreeMap)
            return ((TreeMap<E,?>)m).keyIterator();
        else
            return ((TreeMap.NavigableSubMap<E,?>)m).keyIterator();
    }

    public Iterator<E> descendingIterator() {
        if (m instanceof TreeMap)
            return ((TreeMap<E,?>)m).descendingKeyIterator();
        else
            return ((TreeMap.NavigableSubMap<E,?>)m).descendingKeyIterator();
    }

    // The methods below fulfil the NavigableSet contract
    // (including methods inherited from Set and SortedSet)
    public int size() { return m.size(); }
    public boolean isEmpty() { return m.isEmpty(); }
    public boolean contains(Object o) { return m.containsKey(o); }
    public void clear() { m.clear(); }
    public E lower(E e) { return m.lowerKey(e); }
    public E floor(E e) { return m.floorKey(e); }
    public E ceiling(E e) { return m.ceilingKey(e); }
    public E higher(E e) { return m.higherKey(e); }
    public E first() { return m.firstKey(); }
    public E last() { return m.lastKey(); }
    public Comparator<? super E> comparator() { return m.comparator(); }
    public E pollFirst() {
        Map.Entry<E,?> e = m.pollFirstEntry();
        return (e == null) ? null : e.getKey();
    }
    public E pollLast() {
        Map.Entry<E,?> e = m.pollLastEntry();
        return (e == null) ? null : e.getKey();
    }
    public boolean remove(Object o) {
        int oldSize = size();
        m.remove(o);
        return size() != oldSize;
    }
    public NavigableSet<E> subSet(E fromElement, boolean fromInclusive,
                                  E toElement,   boolean toInclusive) {
        return new KeySet<>(m.subMap(fromElement, fromInclusive,
                                      toElement,   toInclusive));
    }
    public NavigableSet<E> headSet(E toElement, boolean inclusive) {
        return new KeySet<>(m.headMap(toElement, inclusive));
    }
    public NavigableSet<E> tailSet(E fromElement, boolean inclusive) {
        return new KeySet<>(m.tailMap(fromElement, inclusive));
    }
    public SortedSet<E> subSet(E fromElement, E toElement) {
        return subSet(fromElement, true, toElement, false);
    }
    public SortedSet<E> headSet(E toElement) {
        return headSet(toElement, false);
    }
    public SortedSet<E> tailSet(E fromElement) {
        return tailSet(fromElement, true);
    }
    public NavigableSet<E> descendingSet() {
        return new KeySet<>(m.descendingMap());
    }

    public Spliterator<E> spliterator() {
        return keySpliteratorFor(m);
    }
}
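
The KeySet class is package-private, but its behaviour can be observed from outside through the public TreeMap.navigableKeySet() method. The following is a short usage sketch; NavigableKeySetDemo and its helper methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeMap;

// Hypothetical demo class, not part of the JDK.
class NavigableKeySetDemo {

    // Builds a TreeMap and returns its key view as a NavigableSet.
    static NavigableSet<Integer> keysOf(int... keys) {
        TreeMap<Integer, String> map = new TreeMap<>();
        for (int k : keys) {
            map.put(k, "v" + k);
        }
        return map.navigableKeySet();
    }

    // Collects the elements in descending order via descendingIterator.
    static List<Integer> descending(NavigableSet<Integer> set) {
        List<Integer> out = new ArrayList<>();
        Iterator<Integer> it = set.descendingIterator();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

For keys 10, 30, and 20, the view iterates as 10, 20, 30, lower(20) yields 10, floor(20) yields 20, higher(30) yields null, and descending returns 30, 20, 10. All of this is served by the map's own tree navigation.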

From the TreeMap source code, we can see that TreeMap implements the various special methods that the Set side specifies through NavigableSet.

The idea here is that TreeSet defines the interface specification and TreeMap is responsible for the implementation, which is the opposite of idea one.

We summarize the two ideas for how TreeSet reuses TreeMap:

  1. TreeSet directly uses existing functions of TreeMap and wraps them into its own API;
  2. TreeSet defines the API it wants as an interface specification and lets TreeMap implement it;

In both ideas the calling relationship is the same: TreeSet calls TreeMap. The relationship between definition and implementation, however, is reversed. In the first, both the definition and the implementation of the function live in TreeMap, and TreeSet simply calls it. In the second, TreeSet defines the interface and TreeMap implements the internal logic: TreeSet is responsible for the interface definition and TreeMap for the concrete implementation. Because the interface is defined by TreeSet, the implementation is exactly what TreeSet needs, so TreeSet does not need to wrap anything and can return the result directly.

Let's think about when each of the two reuse ideas applies:

  1. For simple methods like add, we use idea 1 directly, mainly because add needs no complicated logic, so it is easy for TreeSet to implement on its own;
  2. Idea 2 is mainly suited to complex scenarios such as iteration. TreeSet has several such requirements: it must be able to iterate from the beginning, to fetch the first value, and to fetch the last value. On top of that, the underlying structure of TreeMap is fairly complicated, and TreeSet may not know its details. If TreeSet tried to implement this logic itself, it would struggle. It is better to let TreeSet define the interface and let TreeMap be responsible for the implementation: TreeMap knows its own complex structure well, so its implementation is both accurate and simple;
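
The two ideas can be contrasted in a stripped-down model. The following is a sketch only: OrderedView, SortedStore, and SimpleSortedSet are hypothetical names, and a sorted list stands in for the red-black tree that TreeMap actually uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Idea 2: the contract the set side wants (hypothetical interface).
interface OrderedView<E> {
    Iterator<E> iterator();
    E first();
}

// The provider. It knows its own internals, so it implements the contract.
class SortedStore<E extends Comparable<E>> {
    private final List<E> items = new ArrayList<>();

    // A simple capability that callers can wrap directly (idea 1).
    boolean put(E e) {
        int i = Collections.binarySearch(items, e);
        if (i >= 0) {
            return false;        // already present
        }
        items.add(-i - 1, e);    // insert at the position that keeps order
        return true;
    }

    // The provider returns its own implementation of the contract (idea 2).
    OrderedView<E> orderedView() {
        return new OrderedView<E>() {
            @Override
            public Iterator<E> iterator() {
                return Collections.unmodifiableList(items).iterator();
            }

            @Override
            public E first() {
                return items.isEmpty() ? null : items.get(0);
            }
        };
    }
}

// The consumer: wraps put (idea 1) and passes results through (idea 2).
class SimpleSortedSet<E extends Comparable<E>> {
    private final SortedStore<E> store = new SortedStore<>();

    boolean add(E e) {
        return store.put(e);
    }

    Iterator<E> iterator() {
        return store.orderedView().iterator();
    }

    E first() {
        return store.orderedView().first();
    }
}
```

In this model SimpleSortedSet never needs to know how the elements are stored. It only relies on the OrderedView contract, which mirrors how TreeSet relies on NavigableSet.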

2.3 Summary

These two ideas for reusing TreeMap in TreeSet are very important and come up often in practice, especially the second one. Dubbo's generic invocation and dependency inversion in DDD, for example, both follow the same principle as TreeSet's second reuse idea.

Three: Summary

HashSet's understanding of HashMap's resize threshold, reflected in how it sizes the composed HashMap, is worth learning from. TreeSet's two ideas for reusing TreeMap are also worth learning from, especially the second.

Origin blog.csdn.net/weixin_38478780/article/details/107979931