Java Collections - Set interface

1. Overview

The definition of the Set interface is very simple. It is essentially a Collection, but it requires that the collection contain no duplicate elements. In other words, if an attempt is made to add an element that already exists in the Set, the add method returns false and the Set itself is left unchanged.
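
The return value of add can be observed directly. The following is a minimal sketch; the class name AddReturnValueDemo and the helper secondAddResult are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class AddReturnValueDemo {

    // Adds the same value twice and returns what the second add() reported.
    static boolean secondAddResult(Set<String> set, String value) {
        set.add(value);          // first insertion: the set changes
        return set.add(value);   // duplicate: the set is left unchanged
    }

    public static void main(String[] args) {
        Set<String> languages = new HashSet<>();
        System.out.println(languages.add("Java"));   // true, the element was added
        System.out.println(languages.add("Java"));   // false, it is already present
        System.out.println(languages.size());        // 1
    }
}
```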

Java provides several main implementations of the Set interface:

  • HashSet: A Set implemented on top of a hash table. It does not guarantee the iteration order of the collection; in particular, it does not guarantee that the order stays constant over time.
  • LinkedHashSet: A HashSet variant implemented with a hash table and a linked list, with predictable iteration order.
  • TreeSet: A tree-based (red-black tree) SortedSet implementation, ordered by the natural ordering of its elements or by the comparator provided when the set is created.
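
The difference in iteration order can be seen by feeding the same input to all three implementations. This is a small sketch; IterationOrderDemo and iterationOrder are illustrative names, and the HashSet line deliberately carries no expected output because its order is unspecified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class IterationOrderDemo {

    // Copies the elements of a set into a list, in the set's iteration order.
    static List<String> iterationOrder(Set<String> set) {
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Python", "C", "Java", "C");

        System.out.println(iterationOrder(new HashSet<>(input)));       // unspecified order
        System.out.println(iterationOrder(new LinkedHashSet<>(input))); // [Python, C, Java]
        System.out.println(iterationOrder(new TreeSet<>(input)));       // [C, Java, Python]
    }
}
```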

2. HashSet

There are several implementations of the Set interface in Java, but the most commonly used is undoubtedly HashSet. We often use it by default.

HashSet is a basic implementation of the Set interface and is widely used in all kinds of programs.

In the class hierarchy, HashSet extends AbstractSet and implements the Set, Cloneable, and Serializable interfaces.

From its source code, we can see that HashSet is actually backed by a HashMap, which is itself a hash table implementation. This design gives HashSet excellent access and lookup performance.

A hash table is a data structure that provides fast element insertion and lookup. In a HashSet, the position of an element in the hash table is determined by its hash code. This means that no matter how many elements the HashSet holds, the time to determine whether an element exists is roughly constant, O(1), on average, provided the hash function spreads the elements well. This is the main source of HashSet's efficiency.
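
The idea of a set that delegates to a map can be sketched as a tiny wrapper around HashMap. This is a simplified illustration and not the JDK source; the class name MapBackedSet is an assumption chosen here, and only a handful of operations are shown.

```java
import java.util.HashMap;
import java.util.Map;

class MapBackedSet<E> {

    // Shared dummy value: only the map's keys carry information.
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    // put() returns the previous value, so null means the key was new.
    boolean add(E element) {
        return map.put(element, PRESENT) == null;
    }

    boolean contains(E element) {
        return map.containsKey(element);
    }

    // remove() returns the old value, which is PRESENT only if the key existed.
    boolean remove(E element) {
        return map.remove(element) == PRESENT;
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("Java"));       // true
        System.out.println(set.add("Java"));       // false
        System.out.println(set.contains("Java"));  // true
        System.out.println(set.size());            // 1
    }
}
```

Because every element is stored as a map key, the uniqueness of keys in the map is what enforces the uniqueness of elements in the set.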

HashSet considers two elements equal when the hashCode() methods of the two objects return the same value and their equals() method returns true. The hashCode() method is used to locate the element's position in the hash table, and the equals() method is used to compare the actual values of elements that land in the same position (a hash collision). This means that if you are going to store your own objects in a HashSet, you should override both methods consistently, so that objects that are equal also have equal hash codes.
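
To see why the overrides matter, consider a minimal sketch with a hypothetical class PlainDate that overrides neither method. It inherits the identity-based equals() and hashCode() from Object, so a HashSet treats two instances with identical field values as different elements. The names MissingOverridesDemo, PlainDate, and addSameValueTwice are invented for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class MissingOverridesDemo {

    // Hypothetical class: equals() and hashCode() are inherited from Object,
    // so two instances are equal only if they are the same object.
    static final class PlainDate {
        final int year, month, day;

        PlainDate(int year, int month, int day) {
            this.year = year;
            this.month = month;
            this.day = day;
        }
    }

    // Adds two dates with identical field values and returns the resulting set.
    static Set<PlainDate> addSameValueTwice() {
        Set<PlainDate> set = new HashSet<>();
        set.add(new PlainDate(2020, 1, 1));
        set.add(new PlainDate(2020, 1, 1)); // a distinct object, so it is stored as well
        return set;
    }

    public static void main(String[] args) {
        Set<PlainDate> set = addSameValueTwice();
        System.out.println(set.size());                               // 2
        System.out.println(set.contains(new PlainDate(2020, 1, 1)));  // false
    }
}
```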

Here's a simple example of overriding the hashCode() and equals() methods:

import java.util.Objects;

public class MyDate {

    private int year;
    private int month;
    private int day;

    public MyDate(int year, int month, int day) {
        this.year = year;
        this.month = month;
        this.day = day;
    }

    @Override
    public boolean equals(Object o) {
        System.out.println("equals() called");

        // Same reference: the objects are trivially equal
        if (this == o) return true;
        // A null argument or an object of another type is never equal
        if (!(o instanceof MyDate)) return false;
        // Cast to the current type
        MyDate myDate = (MyDate) o;
        // Compare primitives with ==; reference-type fields would use equals() (none here)
        return year == myDate.year && month == myDate.month && day == myDate.day;
    }

    @Override
    public int hashCode() {
        System.out.println("hashCode() called");

        // Objects.hash combines the fields into a single int hash value
        return Objects.hash(year, month, day);
    }

    @Override
    public String toString() {
        return "MyDate{" + "year=" + year + ", month=" + month + ", day=" + day + '}';
    }

    // Getters and setters omitted
}

Here's a basic test:

import java.util.HashSet;

public class TestHashSet {

    public static void main(String[] args) {
        // Create a HashSet of strings
        HashSet<String> set = new HashSet<>();
        // Add elements
        set.add("Java");
        set.add("Java"); // duplicate element
        set.add("Python");
        set.add("C");
        // Print the set (order not guaranteed)
        System.out.println(set);

        // Create a HashSet of MyDate objects
        HashSet<MyDate> set1 = new HashSet<>();
        // Add elements
        set1.add(new MyDate(2020, 1, 1));
        set1.add(new MyDate(2020, 1, 1)); // duplicate element
        set1.add(new MyDate(2020, 1, 2));

        // Print the set (order not guaranteed)
        System.out.println(set1);
    }
}

Output of the MyDate part (the line printed for the String set is omitted):

hashCode() called
hashCode() called
equals() called
hashCode() called
[MyDate{year=2020, month=1, day=2}, MyDate{year=2020, month=1, day=1}]

From the results above, we can see that when an add operation is performed, the hashCode() method is called automatically to compute a hash value for the element, which determines its storage location in the hash table. When a duplicate element is added, hashCode() is again called first. This time an element with the same hash value already exists at that location, so the equals() method is called automatically for a further comparison. If the two are determined to be the same element, the add is not performed.

It can be seen that by using a HashMap as its internal structure, HashSet takes advantage of the performance of the hash table. Beyond that, it follows a strict object-equality check based on hashCode() and equals(). This makes HashSet an efficient and reliable choice among the Java collections, in both performance and semantics, and an ideal implementation of the Set interface.

3. LinkedHashSet

In Java's collections framework, LinkedHashSet is a special Set implementation that inherits from HashSet and provides some additional features.

LinkedHashSet is a subclass that extends HashSet. Like HashSet, it implements the Set, Cloneable, and Serializable interfaces.

HashSet is implemented on a hash table, which provides excellent element insertion and lookup performance. However, it does not preserve the insertion order of elements, which can be a disadvantage in some scenarios. That is why LinkedHashSet was introduced. On top of HashSet, it adds two pointer fields, before and after, to each element node (they live in the entries of its backing LinkedHashMap). These fields link the nodes together and thereby record the order in which elements were added.

LinkedHashSet is therefore a combination of a linked list and a hash table. The linked list maintains the insertion order of elements, while the hash table ensures fast insertion and lookup. With this structure, LinkedHashSet not only inherits the high performance of HashSet but also provides a predictable iteration order.

In terms of insertion performance, because LinkedHashSet has to maintain an additional linked list, it is slightly slower than HashSet. This cost is usually acceptable, especially in scenarios where insertion order needs to be preserved.

In terms of iteration performance, LinkedHashSet does very well. Because it maintains a linked list in insertion order, traversing all elements of the Set takes time proportional to the number of elements, regardless of the capacity of the hash table. This makes LinkedHashSet a very good choice for applications that iterate frequently.

Here's a simple use case:

import java.util.LinkedHashSet;

public class LinkedHashSetTest {

    public static void main(String[] args) {
        LinkedHashSet<String> set = new LinkedHashSet<>();

        // Add elements
        set.add("Java");
        set.add("Java"); // duplicate element
        set.add("Python");
        set.add("C");

        // Print the set (insertion order is preserved)
        System.out.println(set); // [Java, Python, C]
    }
}
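
A common use of this property is removing duplicates from a list while keeping the first occurrence of each element in place. The sketch below assumes a helper named dedupKeepingOrder and made-up sample data; neither comes from the original article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class OrderedDedupDemo {

    // Removes duplicates while keeping the first occurrence of each element in place.
    static <T> List<T> dedupKeepingOrder(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<String> visited = Arrays.asList("home", "search", "home", "cart", "search");
        System.out.println(dedupKeepingOrder(visited)); // [home, search, cart]
    }
}
```

Re-adding an element that is already present does not move it, which is why "home" stays in first position.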

4. TreeSet

TreeSet is an important member of the Java collections framework. It provides a collection whose core features are sorting and deduplication.

Under the hood, TreeSet is implemented on top of TreeMap, and the underlying data structure of TreeMap is a red-black tree, a self-balancing binary search tree. Thanks to the properties of the red-black tree, elements of a TreeSet can be inserted, deleted, and searched in O(log n) time while the elements are kept in order.

In the class hierarchy, TreeSet extends AbstractSet and implements the NavigableSet, Cloneable, and Serializable interfaces.

The two core features of TreeSet are deduplication and sorting of elements.

The deduplication logic depends on how elements are compared. TreeSet supports two kinds of comparison: natural ordering and custom ordering.

  • For natural ordering, TreeSet requires its elements to implement the Comparable interface and override the compareTo method. When a new element is added, TreeSet calls the element's compareTo method to compare it with existing elements. If the return value is 0, the two elements are considered equal and the new element is not added to the TreeSet.
  • For custom ordering, you specify an object that implements the Comparator interface when creating the TreeSet. When a new element is added, TreeSet calls the Comparator's compare method to compare elements. Likewise, if compare returns 0, the new element is not added to the TreeSet.
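
One consequence of these rules is that a comparator which returns 0 for two different objects makes TreeSet treat them as duplicates. The sketch below illustrates this with a length-only comparator; the class and method names (ComparatorDedupDemo, byLength, byLengthThenText) are invented for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

class ComparatorDedupDemo {

    // Orders strings by length only, so equal-length strings compare as 0.
    static TreeSet<String> byLength(String... values) {
        Comparator<String> lengthOnly = Comparator.comparingInt(String::length);
        TreeSet<String> set = new TreeSet<>(lengthOnly);
        set.addAll(Arrays.asList(values));
        return set;
    }

    // Adds a tie-breaker so that compare() returns 0 only for identical strings.
    static TreeSet<String> byLengthThenText(String... values) {
        Comparator<String> lengthOnly = Comparator.comparingInt(String::length);
        Comparator<String> withTieBreak = lengthOnly.thenComparing(Comparator.naturalOrder());
        TreeSet<String> set = new TreeSet<>(withTieBreak);
        set.addAll(Arrays.asList(values));
        return set;
    }

    public static void main(String[] args) {
        TreeSet<String> set = byLength("Go", "C#");
        System.out.println(set);                           // [Go]
        System.out.println(set.contains("C#"));            // true, membership also uses compare()
        System.out.println(byLengthThenText("Go", "C#"));  // [C#, Go]
    }
}
```

If distinct elements must all be kept, a tie-breaker like the one in byLengthThenText is one way to make compare return 0 only for elements that are really equal.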

For sorting, TreeSet likewise supports both natural ordering and custom ordering:

  • Natural ordering: TreeSet requires its elements to implement the Comparable interface and override the compareTo method. The return value of compareTo determines the sort order of the elements.
  • Custom ordering: when creating a TreeSet, a Comparator object can be passed in through the constructor. The compare method of the Comparator interface is then used to sort the elements.
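
For natural ordering with a user-defined type, a minimal sketch might look like the following. The Version class is hypothetical and exists only for this illustration; a production class would normally also override equals() and hashCode() so that they agree with compareTo.

```java
import java.util.TreeSet;

class NaturalOrderingDemo {

    // Hypothetical value type; its natural ordering is by major, then by minor.
    static final class Version implements Comparable<Version> {
        final int major, minor;

        Version(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        @Override
        public int compareTo(Version other) {
            int byMajor = Integer.compare(major, other.major);
            return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
        }

        @Override
        public String toString() {
            return major + "." + minor;
        }
    }

    // Builds a TreeSet using the natural ordering defined by compareTo().
    static TreeSet<Version> sample() {
        TreeSet<Version> versions = new TreeSet<>();
        versions.add(new Version(1, 10));
        versions.add(new Version(1, 2));
        versions.add(new Version(0, 9));
        versions.add(new Version(1, 2)); // compareTo() returns 0, so this one is dropped
        return versions;
    }

    public static void main(String[] args) {
        System.out.println(sample()); // [0.9, 1.2, 1.10]
    }
}
```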

Here is a simple practical example:

import java.util.TreeSet;

public class TreeSetTest {

    public static void main(String[] args) {
        /*
         * By default, natural ordering is used: elements are compared with the
         * compareTo method of the Comparable interface.
         * 1. Strings: compared lexicographically by the Unicode values of their characters
         * 2. Custom types: must implement Comparable and override compareTo
         * 3. Integer types: compared by numeric value
         * 4. Floating-point types: compared by numeric value
         * 5. Booleans: false < true
         */
        TreeSet<String> set = new TreeSet<>();

        // Add elements
        set.add("Java");
        set.add("Java"); // duplicate element
        set.add("Python");
        set.add("C");
        set.add("C++");
        set.add("Go");
        set.add("C#");

        // Print the set
        System.out.println(set); // [C, C#, C++, Go, Java, Python]
    }
}

import java.util.TreeSet;

public class TreeSetTest02 {

    public static void main(String[] args) {
        // For custom ordering, pass a Comparator implementation when creating the
        // TreeSet; its compare method defines the ordering rule
        TreeSet<String> set = new TreeSet<>((o1, o2) -> {
            // Compare by string length (strings of equal length count as duplicates)
            return o1.length() - o2.length();
        });

        // Add elements
        set.add("Java");
        set.add("Python");
        set.add("C");
        set.add("C++");
        set.add("Go");

        // Print the set
        System.out.println(set); // [C, Go, C++, Java, Python]
    }
}

5. Choosing the appropriate Set implementation

Which Set implementation to choose depends mainly on your specific needs:

  • If you just need a collection with no duplicate elements and don't care about the order of the elements, then HashSet is a good choice. It provides constant-time basic operations (add, remove, and contains) on average.
  • If you care about the insertion order of elements, then LinkedHashSet is a better choice. It builds on HashSet and uses a linked list to maintain the insertion order of elements.
  • If you need a sorted collection, then TreeSet is the best choice. It stores elements in a red-black tree, providing an ordered view of the collection with O(log n) basic operations.

6. Summary

Here is a simple summary table:

Characteristic    HashSet      LinkedHashSet              TreeSet
Element order     unordered    insertion order            sorted
Allows null       yes          yes                        no with natural ordering; otherwise depends on the comparator
Performance       high         medium                     lower
Based on          HashMap      LinkedHashMap              TreeMap
Special feature   none         records insertion order    sorting
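
On the null row, one caveat is worth checking in code: with natural ordering, TreeSet has to call compareTo on the element, so adding null throws a NullPointerException, while HashSet and LinkedHashSet accept a single null. The sketch below uses an invented helper acceptsNull to probe each implementation; the null-tolerant comparator is only one possible way to make a TreeSet accept null.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

class NullElementDemo {

    // Returns true if the set accepted null, false if add(null) threw NullPointerException.
    static boolean acceptsNull(Set<String> set) {
        try {
            set.add(null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNull(new HashSet<>()));        // true
        System.out.println(acceptsNull(new LinkedHashSet<>()));  // true
        System.out.println(acceptsNull(new TreeSet<>()));        // false

        // A comparator that explicitly handles null changes the answer.
        Comparator<String> nullsFirst = Comparator.nullsFirst(Comparator.naturalOrder());
        System.out.println(acceptsNull(new TreeSet<>(nullsFirst))); // true
    }
}
```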

Origin blog.csdn.net/ly1347889755/article/details/130905250