JDK 7 source code analysis: the underlying implementation of HashSet


One, Overview of HashSet

HashSet is an implementation class of the Set interface and is used to store multiple elements. Its storage characteristics: elements are unordered, cannot be accessed by index, and cannot be repeated.

It also allows null as an element, and it is not thread-safe, because its methods are not synchronized.
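
These characteristics can be checked with a few calls. The following is a minimal sketch; the class and method names are hypothetical, and only the `java.util` types are real API. It also shows `Collections.synchronizedSet`, one standard way to get a synchronized wrapper when several threads share the set.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; not part of the JDK source discussed in this article.
class HashSetBasicsDemo {

    // Builds a set while trying to add a duplicate element and null twice.
    static Set<String> buildSet() {
        Set<String> set = new HashSet<>();
        set.add("a");
        set.add("b");
        set.add("a");   // duplicate: rejected, the set is unchanged
        set.add(null);  // null is accepted as an element
        set.add(null);  // a second null is a duplicate as well
        return set;
    }

    // Wraps a HashSet so that every call is synchronized on the wrapper.
    static Set<String> threadSafeSet() {
        return Collections.synchronizedSet(new HashSet<String>());
    }

    public static void main(String[] args) {
        Set<String> set = buildSet();
        System.out.println(set.size());          // 3 elements: "a", "b" and null
        System.out.println(set.contains(null));  // true
    }
}
```

Iteration order is deliberately not printed here, because HashSet makes no promise about it.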

Two, The storage structure of HashSet

HashSet is implemented on top of HashMap: each element of the HashSet is stored as the key of an entry in an internal HashMap, and the related operations are implemented by calling the corresponding HashMap methods, so the underlying implementation of HashSet is relatively simple.
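
The same "set over the keys of a map" design is also available as a standard utility, `Collections.newSetFromMap`, which wraps an empty Map and uses its keys as the set's elements. The sketch below (class and method names are hypothetical) uses it only to make the relationship visible, by inspecting the backing map after adding elements through the set; it is not how HashSet itself is written.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class illustrating a Set whose elements are the keys of a Map.
class SetOverMapDemo {

    // Adds the elements through the set view and returns the number of keys
    // that ended up in the backing map.
    static int keysAfterAdding(String... elements) {
        Map<String, Boolean> backing = new HashMap<>();
        Set<String> set = Collections.newSetFromMap(backing);
        for (String e : elements) {
            set.add(e);
        }
        return backing.size();
    }

    public static void main(String[] args) {
        System.out.println(keysAfterAdding("a", "b", "a"));  // 2: the duplicate "a" maps to the same key
    }
}
```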

Three, How HashSet is implemented internally (source code analysis)

  1. The basic members of HashSet

    public class HashSet<E> extends AbstractSet<E>
        implements Set<E>, Cloneable, java.io.Serializable
    {
        static final long serialVersionUID = -5024744406713321676L;

        // The backing HashMap; the elements of the HashSet are stored as its keys
        private transient HashMap<E,Object> map;

        // A dummy Object used as the value of every entry in the backing HashMap;
        // it is static final, so all entries share this single instance
        private static final Object PRESENT = new Object();
    
  2. The constructors of HashSet:

    /*
        No-argument constructor: creates an empty backing HashMap
        with the default initial capacity (16) and load factor (0.75).
    */
    public HashSet() {
        map = new HashMap<>();
    }
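    /*
        Creates a HashSet containing the elements of the given Collection.
        The backing HashMap is created with a capacity large enough to hold
        those elements (and at least 16), using the default load factor 0.75.
    */
    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }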
    /*
        Creates a new HashSet with the given capacity and load factor;
        both arguments are passed straight through to the HashMap constructor.
    */
    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<>(initialCapacity, loadFactor);
    }
    /*
        Creates a new HashSet with the given initial capacity.
        The backing HashMap is created with that capacity and
        the default load factor.
    */
    public HashSet(int initialCapacity) {
        map = new HashMap<>(initialCapacity);
    }
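
The capacity expression in the Collection constructor is easier to follow with concrete numbers. The sketch below repeats only that arithmetic in a hypothetical helper (`capacityFor` is not a JDK method): dividing by the load factor 0.75 and adding 1 requests enough capacity that copying the elements with addAll should not trigger a resize, and the result never drops below the default of 16.

```java
// Hypothetical helper that repeats the arithmetic from HashSet(Collection).
class InitialCapacityDemo {

    // Same expression as in the constructor, with c.size() passed in directly.
    static int capacityFor(int elementCount) {
        return Math.max((int) (elementCount / .75f) + 1, 16);
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(3));    // 16  (3 / 0.75 + 1 = 5, below the minimum)
        System.out.println(capacityFor(12));   // 17  (12 / 0.75 + 1)
        System.out.println(capacityFor(100));  // 134 (133 + 1)
    }
}
```

HashMap then rounds the requested capacity up to a power of two, so a request of 17 ends up as a table of 32 buckets.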
    
  3. Implementation of common methods in HashSet:

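    /*
        Adds an element to the HashSet: returns true if it was added, false otherwise.

        Internally this calls HashMap's put method, storing the element as the key
        and the dummy PRESENT object declared in the fields as the value. If the key
        already exists in the HashMap (same hash value, and equals returns true),
        the key is left unchanged and the new value overwrites the old one; since
        every add uses the same dummy Object as the value, nothing actually changes.

        So when an element that already exists is added to a HashSet (the test being:
        same hash code, and equals returns true), the elements of the set do not
        change, which is how the no-duplicates property of Set is satisfied.

        Note that put decides whether an element already exists by calling hashCode
        and equals, so to keep the elements of a HashSet free of duplicates:
        (1) Override hashCode
            Rule: objects with the same content return the same hash code; for
                  efficiency, objects with different content should return
                  different hash codes as far as possible
        (2) Override equals
            Rule: objects with the same content return true
    */
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }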
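    /*
        Returns the number of elements in this Set.
        Internally this calls the Map's size method, which returns the number
        of entries; that number is the number of elements in the Set.
    */
    public int size() {
        return map.size();
    }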
    /*
        Checks whether the HashSet is empty.
        Internally this calls HashMap's isEmpty method.
    */
    public boolean isEmpty() {
        return map.isEmpty();
    }
    /*
        Checks whether the HashSet contains a given element.
        Internally this calls HashMap's containsKey method to check
        whether the HashMap contains that key.
    */
    public boolean contains(Object o) {
        return map.containsKey(o);
    }
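    /*
        Removes an element from the HashSet.
        Internally this calls HashMap's remove method, which returns the value
        of the removed entry (PRESENT) or null if the key was not there.
    */
    public boolean remove(Object o) {
        return map.remove(o)==PRESENT;
    }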
    /*
        Removes all elements from the HashSet.
        Internally this calls HashMap's clear method.
    */
    public void clear() {
        map.clear();
    }
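
The boolean results of add and remove follow directly from what HashMap's put and remove return. The sketch below replays the same calls against a plain HashMap and against a HashSet so the two can be compared; the class name and the local `PRESENT` constant are stand-ins, since the real field is private to HashSet.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class comparing HashSet results with the HashMap calls behind them.
class DelegationDemo {

    // Stand-in for HashSet's private PRESENT field.
    private static final Object PRESENT = new Object();

    // The expressions HashSet uses internally, applied to a plain HashMap.
    static boolean[] viaMap() {
        Map<String, Object> map = new HashMap<>();
        boolean firstAdd = map.put("x", PRESENT) == null;      // no previous mapping: put returns null
        boolean secondAdd = map.put("x", PRESENT) == null;     // put returns the old value, not null
        boolean firstRemove = map.remove("x") == PRESENT;      // remove returns the stored value
        boolean secondRemove = map.remove("x") == PRESENT;     // key is gone: remove returns null
        return new boolean[] { firstAdd, secondAdd, firstRemove, secondRemove };
    }

    // The same sequence through the public HashSet API.
    static boolean[] viaSet() {
        Set<String> set = new HashSet<>();
        return new boolean[] { set.add("x"), set.add("x"), set.remove("x"), set.remove("x") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(viaMap()));  // [true, false, true, false]
        System.out.println(Arrays.toString(viaSet()));  // [true, false, true, false]
    }
}
```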
    

    Application-level analysis: as the analysis of the put method shows, when HashMap stores a key-value pair it looks only at the key and ignores the value; the storage location of each key-value pair (Entry) is computed from the key alone. To keep keys unique, keys with the same content must map to the same storage location, so that the if condition inside the for loop of put holds (the equals method is called in this process) and the new value replaces the old value of the duplicate key.

    If, on the other hand, every key-value pair received the same array index, the linked list at that index would grow very long, and every insertion would have to call equals to compare the content of the keys in that list, which reduces storage efficiency. To improve efficiency, keys with different content should therefore get different array indexes as far as possible, so that the elements in the HashMap are distributed as evenly as possible, ideally one element per location.
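
The cost of a poor hash distribution can be observed by counting equals calls. The following is a rough sketch under stated assumptions: `Key` is a hypothetical element type whose hashCode is either constant (every element collides) or equal to its id (every element has a distinct hash), and a static counter records how often equals runs while the set is filled. Exact counts depend on the JDK version, so none are claimed here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class that counts equals calls made while filling a HashSet.
class CollisionDemo {

    static int equalsCalls = 0;

    static final class Key {
        final int id;
        final boolean constantHash;

        Key(int id, boolean constantHash) {
            this.id = id;
            this.constantHash = constantHash;
        }

        @Override
        public int hashCode() {
            // Constant hash: all keys share one hash value. Otherwise: one hash value per id.
            return constantHash ? 1 : id;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    // Adds n distinct keys and returns how many times equals was invoked.
    static int countEqualsCalls(int n, boolean constantHash) {
        equalsCalls = 0;
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new Key(i, constantHash));
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("constant hash: " + countEqualsCalls(1000, true));
        System.out.println("distinct hash: " + countEqualsCalls(1000, false));
    }
}
```

Both runs store all 1000 elements correctly; the difference is only in how much comparison work each insertion needs.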

    Therefore, if elements of a custom type are stored in a HashSet (that is, used as the keys of the HashMap), the type needs to override both the hashCode method and the equals method:

    (1) Rules for overriding the hashCode method:

      a. Elements with the same content must return the same hash code value

      b. To improve efficiency, elements with different content should return different hash code values as far as possible

    (2) Rule for overriding the equals method: objects with the same content return true.
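
The two rules above can be sketched with a small custom type. `RawPoint` and `Point` are hypothetical classes: the first inherits the identity-based equals and hashCode from Object, while the second overrides both based on its content (using `Objects.hash`, available since Java 7, as one convenient way to combine fields).

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo class: the effect of overriding equals and hashCode on a HashSet.
class EqualsHashCodeDemo {

    // No overrides: two instances with the same content are still different elements.
    static final class RawPoint {
        final int x, y;
        RawPoint(int x, int y) { this.x = x; this.y = y; }
    }

    // Content-based equals and hashCode: same content means same element.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);  // equal content always yields an equal hash code
        }
    }

    static int sizeWithoutOverrides() {
        Set<RawPoint> set = new HashSet<>();
        set.add(new RawPoint(1, 2));
        set.add(new RawPoint(1, 2));
        return set.size();
    }

    static int sizeWithOverrides() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithoutOverrides());  // 2: the "duplicate" is stored twice
        System.out.println(sizeWithOverrides());     // 1: the second add is rejected
    }
}
```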

Four, Summary

Underneath, a HashSet is a HashMap, so first understand how HashMap is implemented, and then study the underlying implementation of HashSet.

Origin blog.csdn.net/Java_lover_zpark/article/details/102969219