Compilation of JavaSE knowledge points: Collections



1. What is the difference between arrays and collections?


In Java, arrays and collections are both containers and can be used to store multiple data. There are some differences in their use:

1. Type: An array is the most basic data structure in Java. It is stored linearly in memory and can hold primitive types (such as int and double) as well as reference types (objects). Collection is an interface in Java; a collection is a container for objects and can only hold reference types.

// Arrays
int[] arr1 = new int[5];       // array of int
String[] arr2 = new String[5]; // array of objects

// Collection object
List<String> list = new ArrayList<>(); // stores String elements; without a type argument (raw type), elements are treated as Object

2. Length: The length of an array must be specified when it is created and cannot be changed afterwards. The size of a collection is variable; methods such as add() and remove() change it, and the collection expands and contracts dynamically.

int[] arr = new int[5]; // array length is 5
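To make the contrast concrete, here is a minimal sketch. The class name LengthDemo and the helper grow are hypothetical names chosen for illustration; Arrays.copyOf and the List methods are standard JDK calls. It shows that "growing" an array really means copying into a new array, while a list resizes itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LengthDemo {
    // An array cannot grow in place; "growing" means copying into a new, larger array.
    static int[] grow(int[] source, int newLength) {
        return Arrays.copyOf(source, newLength);
    }

    public static void main(String[] args) {
        int[] arr = new int[5];
        int[] bigger = grow(arr, 10);            // arr itself still has length 5
        System.out.println(arr.length + " -> " + bigger.length);

        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);                         // the list resizes itself as needed
        }
        list.remove(Integer.valueOf(3));         // removes the element 3, not index 3
        System.out.println(list.size());
    }
}
```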

3. Operating on data: Array elements are read and written directly by index, which takes O(1) time. Collection classes expose methods for adding, deleting, searching and so on, so element access goes through method calls.

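// 1. Arrays operate on data by index
int[] arr = new int[5]; // int array; elements default to 0
// Store a value
arr[1] = 18;
// Overwrite the value
arr[1] = 20;
// Read the value
System.out.println(arr[1]); // 20
// Get the array length
System.out.println(arr.length); // 5

// 2. Collections operate on data through method calls
List<String> list = new ArrayList<>();
// Add elements
list.add("张三");
list.add("李四");
list.add("王五");
// Remove the element at index 2
list.remove(2);
// Get an element
System.out.println(list.get(1)); // 李四
// Get the number of elements in the collection
System.out.println(list.size()); // 2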

4. Memory allocation: An array occupies one contiguous block of memory whose size is fixed at creation. A collection allocates memory as needed, and depending on the implementation its elements need not be contiguous (for example, the nodes of a LinkedList are scattered across the heap).

5. Functional extensibility: The collection classes provide a wealth of methods, such as sorting, searching and traversing, which make various operations easy. Arrays themselves have few built-in operations; beyond what the java.util.Arrays utility class offers, you need to write the corresponding code yourself.

6. Generics : Collections support generics, which can specify the type of stored elements, providing better type safety. (Avoid type conversion issues)
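A small sketch of what generics and wrapper classes look like in practice; the class name GenericsDemo and the method sum are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsDemo {
    static int sum(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) {   // each Integer is unboxed to int
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(1);           // autoboxing: the int 1 becomes an Integer
        numbers.add(2);
        // numbers.add("3");      // does not compile: a String is not an Integer
        System.out.println(sum(numbers));
    }
}
```

Because the element type is checked at compile time, no cast is needed when reading elements back.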


Summary:

  • Arrays and collections are containers that can store multiple pieces of data.
  • The length of an array is fixed and cannot be changed once created.
  • The length of the collection is variable and elements can be added or removed dynamically.
  • Arrays can store basic type data and reference type data, while collections can only store reference type data. If you want to store basic type data, you need to use the corresponding wrapper class.
  • Arrays can directly access elements through indexing, while collections usually operate and access elements through method calls.

In general, arrays are suitable when the length is fixed and the element types are simple, while collections are suitable when the length is variable and the element types are complex.

Single-column collections (the Collection hierarchy): [figure not available]

Two-column collections (the Map hierarchy): [figure not available]


2. What is the difference between List, Set and Map?


List, Set and Map are commonly used collection interfaces in Java. They have the following differences:

  • List is an ordered collection that can contain duplicate elements. It allows elements to be accessed in insertion order and operated on by index position. Common implementation classes include ArrayList, LinkedList and Vector.

  • Set is a collection that does not allow duplicate elements. The Set interface itself does not guarantee any iteration order; whether elements come back in insertion order, sorted order or no particular order depends on the implementation. Common implementation classes include HashSet, TreeSet and LinkedHashSet.

    • HashSet: Based on hash table implementation, the order of elements is not guaranteed.
    • TreeSet: Based on red-black tree implementation, sorting according to the natural order of elements or a specified comparator.
    • LinkedHashSet: Based on hash table and linked list implementation, maintaining the insertion order of elements.
  • Map is a collection of key-value pairs in which each key is unique. It allows the corresponding values to be accessed and manipulated via keys. Common implementation classes include HashMap, TreeMap, LinkedHashMap, and ConcurrentHashMap.

    • HashMap: Based on hash table implementation, the order of key-value pairs is not guaranteed.

    • TreeMap: Based on red-black tree implementation, sorted according to the natural order of keys or a specified comparator.

    • LinkedHashMap: Based on hash table and doubly linked list implementation, maintaining the insertion order of key-value pairs.

    • ConcurrentHashMap: Thread-safe Map.
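The ordering differences described above can be observed with a short sketch. The class name SetOrderDemo and its helper methods are hypothetical. HashSet is left out of the printed results on purpose, since its iteration order is unspecified. The Map counterparts (LinkedHashMap, TreeMap) order their keys in the same way.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderDemo {
    // "apple" appears twice; a Set keeps only one copy.
    static final List<String> INPUT = Arrays.asList("banana", "apple", "cherry", "apple");

    static Set<String> insertionOrdered() {
        return new LinkedHashSet<>(INPUT);
    }

    static Set<String> sorted() {
        return new TreeSet<>(INPUT);
    }

    public static void main(String[] args) {
        System.out.println(insertionOrdered()); // [banana, apple, cherry]
        System.out.println(sorted());           // [apple, banana, cherry]
    }
}
```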

ConcurrentHashMap:

ConcurrentHashMap is located in the java.util.concurrent (JUC) package. It is a thread-safe hash table that improves on HashMap for multi-threaded use: it offers thread safety with better concurrency than locking the whole map. How it does so depends on the JDK version:

  • Segment lock (JDK 7 and earlier): the hash table is divided into multiple segments (Segment), each maintaining an independent hash table with its own lock. A lock is only taken when that segment is modified, so different threads can work on different segments simultaneously. This reduces lock granularity and improves concurrency.
  • CAS operation (JDK 8 and later): segments were dropped. CAS (Compare and Swap), a lock-free atomic operation on a shared variable, is used to insert into an empty bucket, and synchronized on the first node of a bucket is used when the bucket already holds elements.
  • volatile modifier: fields such as the table reference and node values are declared volatile to ensure visibility. When one thread modifies a volatile variable, other threads see the latest value on their next read.

The usage of ConcurrentHashMap is similar to HashMap: elements are manipulated through put, get, remove and other methods. Note that although each individual method is thread-safe, a sequence of calls (for example, get followed by put) is not atomic as a whole, so compound operations still need extra synchronization or one of the atomic methods such as putIfAbsent, compute or merge.
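As an illustration of the consistency caveat, the following sketch counts events from several threads. The class name CounterDemo, the method countWithMerge and the key "hits" are hypothetical; merge is a standard ConcurrentHashMap method that performs the read-modify-write for one key atomically.

```java
import java.util.concurrent.ConcurrentHashMap;

class CounterDemo {
    static int countWithMerge(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // Atomic for this key. A separate get() followed by put()
                    // could lose updates when two threads interleave.
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWithMerge(4, 1000)); // 4000
    }
}
```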


3. What are the differences between ArrayList, LinkedList and Vector?


1. Data structure level:

  • ArrayList is backed by a dynamic array and is suitable for random access to elements.
  • LinkedList is backed by a doubly linked list and is suitable for frequent insertion and deletion.
  • Vector is similar to ArrayList and is also backed by a dynamic array, but most of its methods are synchronized. (It has existed since JDK 1.0.)


2. Thread safety level:

  • Both ArrayList and LinkedList are unsafe in multi-threaded scenarios (do not provide built-in synchronization mechanism) and require external synchronization (synchronization method or lock).

    • How to ensure the thread safety of ArrayList: https://blog.csdn.net/qq_44544908/article/details/129020075
  • Vector is thread-safe because most of its methods are synchronized, but is not as performant as ArrayList and LinkedList.
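One way to get a thread-safe list without Vector is sketched below, assuming that the standard wrappers Collections.synchronizedList and CopyOnWriteArrayList suit the workload. The class name SafeListDemo and the helper fill are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SafeListDemo {
    // Appends perThread elements from each of `threads` threads and returns the final size.
    static int fill(List<Integer> list, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        // Wrapper that synchronizes every method call on the underlying ArrayList.
        List<Integer> synced = Collections.synchronizedList(new ArrayList<Integer>());
        // Copy-on-write list: cheap reads, expensive writes.
        List<Integer> copyOnWrite = new CopyOnWriteArrayList<>();

        System.out.println(fill(synced, 4, 1000));      // 4000
        System.out.println(fill(copyOnWrite, 4, 1000)); // 4000
    }
}
```

With a plain ArrayList in place of either wrapper, the final size could be smaller than expected or an exception could be thrown, because add is not atomic.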

3. Performance level:

  • ArrayList performs well for index-based access: it is backed by an array whose elements sit in contiguous memory, so get(index) is O(1). As long as no expansion is triggered, appending at the tail is also O(1). Inserting or deleting in the middle or at the head requires shifting the following elements, so it is O(n), where n is the length of the list.
  • LinkedList performs well for insertion and deletion at the head or tail, or at a node that has already been located (doubly linked list, O(1)), but poorly for index-based access, which must walk the list and is O(n).
  • Vector is slower because of synchronization and is not recommended where thread safety is not needed.

Example: Use of ArrayList, LinkedList, and Vector

package cn.z3inc.list;

import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.Vector;

/**
 * Demonstrates basic use of ArrayList, LinkedList and Vector
 *
 * @author 白豆五
 * @version 2023/8/7
 * @since JDK8
 */
public class ListDemo1 {
    public static void main(String[] args) {
        // ArrayList
        List<String> arrayList = new ArrayList<>();
        Collections.addAll(arrayList, "a", "b", "c", "d", "e", "f", "g"); // add in bulk
        System.out.println("arrayList = " + arrayList); // arrayList = [a, b, c, d, e, f, g]
        arrayList.remove(0);
        System.out.println("arrayList = " + arrayList); // arrayList = [b, c, d, e, f, g]

        // LinkedList
        List<String> linkedList = new LinkedList<>();
        Collections.addAll(linkedList, "a", "b", "c", "d", "e", "f", "g"); // add in bulk
        System.out.println("linkedList = " + linkedList); // linkedList = [a, b, c, d, e, f, g]
        linkedList.remove(0);
        System.out.println("linkedList = " + linkedList); // linkedList = [b, c, d, e, f, g]

        // Vector
        List<String> vector = new Vector<>();
        Collections.addAll(vector, "a", "b", "c", "d", "e", "f", "g"); // add in bulk
        System.out.println("vector = " + vector); // vector = [a, b, c, d, e, f, g]
        vector.remove(0);
        System.out.println("vector = " + vector); // vector = [b, c, d, e, f, g]
    }
}

Running result: each list prints [a, b, c, d, e, f, g] before the removal and [b, c, d, e, f, g] after it.

Summary: ArrayList is backed by a dynamic array; it suits random access, traversal and appending at the end, but inserting and deleting in the middle is slow. LinkedList is backed by a doubly linked list; it suits insertion and deletion, but random access is slow. Vector is similar to ArrayList but thread-safe and slower; it is generally used in multi-threaded environments.


4. Fail-fast mechanism of Java collections


1. Concept:

  • "fail-fast" is an error detection mechanism of the Java collection classes. A fail-fast event may occur when a collection is structurally modified while it is being traversed. For example, while one thread traverses a collection through an iterator, if another thread (or the same thread, bypassing the iterator) makes a structural modification to the collection (adding or deleting elements), a ConcurrentModificationException is thrown.

  • The "fail-fast" mechanism detects the problem early and prevents subsequent operations from continuing on inconsistent data, but it does not guarantee thread safety: it does not prevent other threads from modifying the collection, it only detects the modification on a best-effort basis when the iterator is next used.

2. Principle:

  • The "fail-fast" mechanism is implemented by counting structural modifications. Collections such as ArrayList have a modCount field that records how many times the collection has been structurally modified; every structural modification increments modCount by 1. When an iterator is created, it saves the current modCount in its own expectedModCount field.

  • During iteration, each call to next() (and to the iterator's remove()) compares the iterator's expectedModCount with the collection's modCount. If the two differ, the collection was modified outside the iterator during the iteration, and a ConcurrentModificationException is thrown immediately.



Example: Testing concurrent modification exceptions

package cn.z3inc.list;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/**
 * Demonstrates ConcurrentModificationException
 *
 * @author 白豆五
 * @version 2023/8/7
 * @since JDK8
 */
public class ListDemo2 {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Collections.addAll(list, "a", "b", "c", "d", "e");
        // Get the list's iterator
        Iterator<String> it = list.iterator();

        // Traverse
        // it.hasNext() checks whether there is a next element
        // it.next() returns the element and moves the cursor forward by one
        while (it.hasNext()) {
            System.out.println(it.next());
            list.add("f"); // structural modification during iteration
        }
    }
}

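Running result: the program prints a and then throws java.util.ConcurrentModificationException on the second call to it.next(), because list.add("f") changed modCount during the iteration.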
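For contrast, here is a sketch of two ways to modify a list during traversal without triggering the exception. The class name SafeRemovalDemo and its method names are hypothetical; Iterator.remove() and Collection.removeIf() are standard JDK methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SafeRemovalDemo {
    static List<String> removeWithIterator(List<String> source, String target) {
        List<String> list = new ArrayList<>(source);
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            if (it.next().equals(target)) {
                it.remove(); // the iterator resynchronizes its expectedModCount
            }
        }
        return list;
    }

    static List<String> removeWithRemoveIf(List<String> source, String target) {
        List<String> list = new ArrayList<>(source);
        list.removeIf(item -> item.equals(target)); // available since Java 8
        return list;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c");
        System.out.println(removeWithIterator(source, "b")); // [a, c]
        System.out.println(removeWithRemoveIf(source, "b")); // [a, c]
    }
}
```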


5. Common methods of List interface


Add elements:

  • boolean add(E e): Add an element to the end of the list.
  • void add(int index, E element): Insert an element at the specified position in the list.
  • boolean addAll(int index, Collection<? extends E> c): Insert all elements of another collection into the current list at the specified position.

Get elements:

  • E get(int index): Get the element at the specified index in the list.

  • int indexOf(Object o): Get the index of the first occurrence of the specified element in the list.

  • int lastIndexOf(Object o): Gets the index of the last occurrence of the specified element in the list.

Delete elements:

  • E remove(int index): Delete the element at the specified index from the list, and the return value is the deleted element.
  • boolean remove(Object obj): Remove the specified element from the list (only the first element that appears in the list is deleted).

Modify elements:

  • E set(int index, E element): Replace the element at the specified index; the return value is the element that was previously at that position.

Other methods:

  • int size(): Returns the number of elements in the list.
  • boolean isEmpty(): Used to determine whether the list is empty.
  • boolean contains(Object o): Used to determine whether the list contains the specified element.
  • void clear(): Clear all elements in the list.
  • Object[] toArray(): Convert the list into an array.
  • Iterator<E> iterator(): Gets the iterator used to traverse the list.
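The methods listed above can be exercised together in a short sketch. The class name ListMethodsDemo and the method run are hypothetical; the comments track the list contents after each call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListMethodsDemo {
    static List<String> run() {
        List<String> list = new ArrayList<>();
        list.add("a");                            // [a]
        list.add("c");                            // [a, c]
        list.add(1, "b");                         // [a, b, c]
        list.addAll(0, Arrays.asList("x", "y"));  // [x, y, a, b, c]
        String previous = list.set(0, "z");       // returns "x"; [z, y, a, b, c]
        String removed = list.remove(1);          // returns "y"; [z, a, b, c]
        boolean found = list.remove("c");         // returns true; [z, a, b]
        System.out.println(previous + " " + removed + " " + found);
        return list;
    }

    public static void main(String[] args) {
        List<String> list = run();
        System.out.println(list);                 // [z, a, b]
        System.out.println(list.indexOf("a"));    // 1
        System.out.println(list.contains("y"));   // false
        System.out.println(list.size());          // 3
    }
}
```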

6. Three ways to traverse List


Collection supports two traversal methods: the iterator and the enhanced for loop.

The List interface inherits the Collection interface, so both methods also apply to List. In addition, List supports traversal with a for loop + get(index), giving three methods in total:

  • Iterator.
  • Enhanced for loop.
  • for loop + get(index).

Method 1: Use iterator to traverse

Iterator is a design pattern that provides a unified way to iterate over the elements in a collection without exposing the details of the underlying data structure. By using iterators, we can access each element in the collection in sequence. The iterator pattern encapsulates the internal structure of the collection, making the traversal process simpler, safer, and more flexible.

The iterator pattern is widely used in Java. For example, collection classes (such as List and Set) provide iterators directly, and a Map can be traversed through the iterators of its keySet(), values() or entrySet() views. We can also implement the Iterator interface ourselves to support traversal of custom data structures.

Example:

package cn.z3inc.list;

import org.junit.Test;

import java.util.*;

/**
 * Three ways to traverse a List
 *
 * @author 白豆五
 * @version 2023/8/7
 * @since JDK8
 */
public class ListNBForDemo1 {
    @Test
    public void testIterator() {
        // 1. Create the list
        List<String> list = new ArrayList<>();
        Collections.addAll(list, "a", "b", "c");

        // 2. Get the list's iterator object
        Iterator<String> it = list.iterator();

        // 3. Traverse
        // hasNext() checks whether there is a next element
        // next() returns the next element
        while (it.hasNext()) {
            String item = it.next();
            System.out.println(item);
        }
    }
}

Running result: the elements a, b and c are printed, one per line.

  • list.iterator(): Get the iterator object of the list collection
  • hasNext(): Determine whether there is a next element
  • next(): Get the next element

Method 2: Use the enhanced for loop (foreach)

When foreach traverses an array, the compiler turns it into an ordinary indexed for loop; when it traverses a collection, the compiler turns it into iterator calls.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListNBForDemo2 {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Collections.addAll(list, "a", "b", "c");

        // Traverse the list with the enhanced for loop
        for (String item : list) {
            System.out.println(item);
        }

        // Shorthand
        //list.forEach(item -> System.out.println(item));
        //list.forEach(System.out::println);
    }
}

Running result: the elements a, b and c are printed, one per line.

To verify this, compile the class (javac -encoding utf-8 xxx.java) and view the decompiled .class file.
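The decompiled output is roughly equivalent to the sketch below. The class name ForEachDesugarDemo and both method names are hypothetical, and the exact code a given compiler emits may differ in detail, but the two methods behave the same.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ForEachDesugarDemo {
    // What you write.
    static String joinWithForEach(List<String> list) {
        StringBuilder sb = new StringBuilder();
        for (String item : list) {
            sb.append(item);
        }
        return sb.toString();
    }

    // Roughly what the compiler generates for the loop above.
    static String joinWithIterator(List<String> list) {
        StringBuilder sb = new StringBuilder();
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            String item = it.next();
            sb.append(item);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");
        System.out.println(joinWithForEach(list));  // abc
        System.out.println(joinWithIterator(list)); // abc
    }
}
```

This is also why modifying a collection inside a foreach loop can throw ConcurrentModificationException: an iterator is at work behind the scenes.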



Method 3: Traverse with a for loop + get(index)

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListNBForDemo3 {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Collections.addAll(list, "a", "b", "c");

        // Traverse by index
        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.get(i));
        }
    }
}

7. ArrayList expansion mechanism


In Java, ArrayList is implemented based on dynamic arrays, which can automatically expand according to demand. When we add elements to the ArrayList, if the current capacity is not enough to accommodate the new elements, the ArrayList will automatically expand.

1. Initial capacity:

It is commonly said that if no capacity is specified when creating an ArrayList, the default initial capacity is 10.

In fact (as of JDK 8), creating an empty ArrayList does not immediately allocate an array of length 10; the list only references a shared empty array. An array of length 10 is allocated when the add method is called for the first time.

Therefore, an empty ArrayList object does not immediately occupy the space of 10 elements; the array is allocated only when elements actually need to be stored.


This lazy initialization saves memory, because an ArrayList does not always need a pre-allocated array when it is created. The array is allocated and expanded only when elements actually need to be stored. (If the required capacity is known and is greater than 10, call the ArrayList(int initialCapacity) constructor to avoid repeated expansion.)


2. Expansion process:

When the number of elements in an ArrayList exceeds the current capacity, the list must expand. Expansion is implemented by a private method named grow.

The grow() method first calculates the new capacity, which is 1.5 times the old capacity (the old capacity plus half of it, rounded down). It then creates a new array with Arrays.copyOf, which copies the elements of the old array into the new one.


// Expansion mechanism
private void grow(int minCapacity) {
    // Old capacity
    int oldCapacity = elementData.length;
    // New capacity = old capacity + (old capacity / 2); shifting right by one bit divides by 2
    // Try 1.5x first; if that is still not enough, use the required minCapacity as the array length.
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // Copy the array
    elementData = Arrays.copyOf(elementData, newCapacity);
}
  • Expansion timing: expansion is triggered from the add method through a call to ensureCapacityInternal(), and only happens when the number of elements would exceed the current capacity.

  • Expansion performance: each expansion takes time, because a new array is created, the elements are copied, and the old array is left for garbage collection. Therefore, if we know the number of elements in advance, we can set the capacity of the ArrayList through the constructor or the ensureCapacity method to avoid repeated expansion.

The expansion mechanism of ArrayList can dynamically adapt to changes in data volume, but the expansion operation will consume a certain amount of time and resources. Therefore, when using ArrayList, we should try to avoid frequent expansion to improve performance and efficiency.
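To see how the 1.5x rule plays out, the sketch below replays the capacity arithmetic of the grow() method shown earlier, ignoring the MAX_ARRAY_SIZE checks. The class name GrowthDemo and its methods are hypothetical; the code simulates the calculation rather than inspecting a real ArrayList, whose capacity is not publicly accessible.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthDemo {
    // Same arithmetic as grow(): 1.5x the old capacity, or minCapacity if that is larger.
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        return Math.max(newCapacity, minCapacity);
    }

    // Capacities a default-constructed list passes through while growing
    // one element at a time until it can hold elementCount elements.
    static List<Integer> capacitiesUpTo(int elementCount) {
        List<Integer> capacities = new ArrayList<>();
        int capacity = 10; // capacity after the first add() on a default-constructed list
        capacities.add(capacity);
        while (capacity < elementCount) {
            capacity = nextCapacity(capacity, capacity + 1);
            capacities.add(capacity);
        }
        return capacities;
    }

    public static void main(String[] args) {
        System.out.println(capacitiesUpTo(100)); // [10, 15, 22, 33, 49, 73, 109]

        // Knowing the size in advance avoids all of those intermediate copies.
        List<Integer> presized = new ArrayList<>(100);
        for (int i = 0; i < 100; i++) {
            presized.add(i);
        }
        System.out.println(presized.size());
    }
}
```

Storing 100 elements one by one therefore costs six array copies, while the presized list needs none.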


8. How does HashSet ensure that data is unique?


In daily development, if you use HashSet to store custom objects and want to ensure the uniqueness of the data, the class to which the object belongs must override the equals() and hashCode() methods.

1. Internal structure:

The bottom layer of HashSet is implemented based on HashMap (which can also be understood as a hash table). It actually stores the key of HashMap. When we add an element to a HashSet, we actually store the element as the key of the HashMap, and the value is a fixed Object instance.
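The delegation described above can be sketched in a few lines. MiniHashSet is a hypothetical, simplified class written for this illustration, not the JDK source, although the JDK's HashSet follows the same idea of a shared dummy value.

```java
import java.util.HashMap;

class MiniHashSet<E> {
    // Shared dummy value stored for every key.
    private static final Object PRESENT = new Object();
    private final HashMap<E, Object> map = new HashMap<>();

    // put returns the previous value, or null if the key was absent,
    // so a null result means the element was newly added.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MiniHashSet<String> set = new MiniHashSet<>();
        System.out.println(set.add("a")); // true
        System.out.println(set.add("a")); // false: the key already exists
        System.out.println(set.size());   // 1
    }
}
```

Uniqueness of set elements is therefore exactly the uniqueness of HashMap keys.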



2. hashCode() method:

When we add an element to a HashSet, the element's hashCode() method is called first to calculate a hash value, which determines the bucket in which the element is stored. If the bucket is empty, the element is stored there directly. If the bucket already holds elements, a hash conflict (hash collision) has occurred, and the bucket manages its elements with a linked list or a red-black tree.


3. equals() method:

  • To resolve a hash conflict, the element's equals() method is then called to compare it with the elements already in that bucket (to check whether they are the same by content).
  • Only when the hash values are equal and equals() returns true does HashSet consider the two objects the same and decline to store the second one.



4. Why is it designed this way:

  • hashCode() is a quick way to tell whether two objects are likely to be the same, because calculating the hash value is relatively fast, but it may not be accurate.
  • The equals() method can accurately determine whether two objects are truly the same, but its execution may be relatively slow. (because it requires more comparison operations)

This design can quickly locate elements in most cases, and only need to execute the equals() method for a more precise comparison when a hash collision occurs.


Example: Test HashSet data uniqueness

package cn.z3inc.set;

import java.util.HashSet;
import java.util.Objects;

/**
 * Tests the uniqueness of HashSet data
 *
 * @author 白豆五
 * @version 2023/8/10
 * @since JDK8
 */
public class HashSetDemo01 {
    public static void main(String[] args) {
        HashSet<Person> set = new HashSet<>();
        set.add(new Person("白豆五", 18));
        // These are two different objects, but their contents are the same,
        // so HashSet treats them as equal and does not store the second one
        set.add(new Person("白豆五", 18));
        System.out.println(set);
    }
}

class Person {
    private String name;
    private int age;

    public Person() {
    }

    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    // Override equals: two Person objects are equal when name and age match
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Person person = (Person) o;
        return age == person.age && Objects.equals(name, person.name);
    }

    // Override hashCode: computed from the same fields that equals compares
    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }

    @Override
    public String toString() {
        return "Person{" +
                "name='" + name + '\'' +
                ", age=" + age +
                '}';
    }
}

Running result: only one element is stored, and the program prints [Person{name='白豆五', age=18}].

Note: When we override the equals() method, we must also override the hashCode() method to maintain the contract between the two (equal objects must have equal hash codes). Otherwise, a HashSet might store both of two equal objects, because they have different hash codes.
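The sketch below shows the contract being broken. ContractDemo and BrokenKey are hypothetical classes; to keep the result deterministic, hashCode() deliberately returns a different number for every instance, which imitates a class that overrides equals() but keeps the default identity-based hashCode().

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class ContractDemo {
    static final class BrokenKey {
        private static int nextHash = 0;
        private final String name;
        private final int hash = nextHash++; // distinct per instance, on purpose

        BrokenKey(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BrokenKey && Objects.equals(name, ((BrokenKey) o).name);
        }

        @Override
        public int hashCode() {
            return hash; // violates the contract: equal objects get different hash codes
        }
    }

    static int sizeAfterAddingTwoEqualKeys() {
        Set<BrokenKey> set = new HashSet<>();
        set.add(new BrokenKey("x"));
        set.add(new BrokenKey("x")); // equals() would return true, but it is never reached
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAddingTwoEqualKeys()); // 2
    }
}
```

Because the hash values differ, the second object is never compared with the first, and both end up in the set.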


9. How is HashMap implemented?


1. Basic concepts of HashMap:

HashMap stores key-value pairs in a hash table. The structure of the hash table differs between JDK versions: array + linked list in JDK 7, and array + linked list + red-black tree since JDK 8. On average, HashMap completes insertion, deletion and lookup in O(1) time.

HashMap inheritance relationship: HashMap<K,V> extends AbstractMap<K,V> and implements the Map<K,V>, Cloneable and Serializable interfaces.

Features of HashMap:

  • Keys are hashed: the underlying structure is a hash table (array + linked list/red-black tree).
  • Queries are very fast, and additions and deletions are not slow either.
  • Keys must be unique: the class of the key must override the hashCode and equals methods. If a key is put again, the old value is overwritten.
  • Keys are unordered: iteration order is not guaranteed to match insertion order.
  • Keys have no index: a key cannot be obtained by position.
  • One null key and any number of null values are allowed. (Take care to avoid NullPointerException when using them.)
  • It is not synchronized, so it is not thread-safe, but it is efficient.
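A few of these features can be checked directly, as in the following sketch. The class name HashMapFeaturesDemo and the method build are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapFeaturesDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        Integer previous = map.put("a", 2); // duplicate key: returns 1, value becomes 2
        System.out.println(previous);
        map.put(null, 0);                   // one null key is allowed
        map.put("b", null);                 // null values are allowed
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.get("a"));           // 2
        System.out.println(map.get(null));          // 0
        System.out.println(map.get("b"));           // null, although the key exists
        System.out.println(map.containsKey("b"));   // true
        System.out.println(map.get("missing"));     // null, the key does not exist
    }
}
```

Since get returns null both for a missing key and for a key mapped to null, containsKey is the reliable way to tell the two cases apart.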

HashMap constants:

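/**
 * Default initial size of the table array: 16 (2 to the 4th power)
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * Maximum capacity of the table array: 1073741824 (2 to the 30th power, about 1 billion)
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * Default load factor: 0.75
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * Treeify threshold: 8
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * Threshold at which a tree is converted back into a linked list: 6
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The other condition for treeification: bins are only treeified when the
 * table array length is at least 64; otherwise the table is resized instead
 */
static final int MIN_TREEIFY_CAPACITY = 64;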

HashMap member variables:

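/**
 * The hash table
 */
transient Node<K, V>[] table;

/**
 * Cached set view of all key-value pairs
 */
transient Set<Map.Entry<K, V>> entrySet;

/**
 * Number of elements in the hash table
 */
transient int size;

/**
 * Number of structural modifications of the hash table (used by the fail-fast mechanism)
 */
transient int modCount;

/**
 * Resize threshold: when the number of elements exceeds threshold, a resize is triggered
 * (to keep lookups fast)
 * threshold = capacity (array length) * loadFactor (0.75)
 */
int threshold;

/**
 * Load factor
 */
float loadFactor;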
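How these fields work together can be sketched as follows. The class HashIndexDemo and its method names are hypothetical. The bit-spreading step and the (n - 1) & hash index calculation follow the JDK 8 HashMap source, which is not quoted in this article, so treat them as background rather than as part of the listing above.

```java
class HashIndexDemo {
    // Same bit-spreading step as HashMap.hash() in JDK 8:
    // XOR the high 16 bits of the hash code into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // For a power-of-two table length, (length - 1) & hash keeps the low bits
    // of the hash and always yields a valid index in [0, length).
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    // threshold = capacity * loadFactor
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(spread("a"), 16)); // 1, since "a".hashCode() is 97
        System.out.println(threshold(16, 0.75f));      // 12: the 13th entry triggers a resize
    }
}
```

The index calculation is the reason the table length is always kept at a power of two.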

HashMap construction method:

/**
 * HashMap constructor with parameters
 *
 * @param initialCapacity initial capacity
 * @param loadFactor      load factor
 */
public HashMap(int initialCapacity, float loadFactor) {
    // The three if statements validate the arguments
    // A negative initial capacity is rejected
    if (initialCapacity < 0) {
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    }
    // An initial capacity above the maximum (1 << 30) is clamped to the maximum
    if (initialCapacity > MAXIMUM_CAPACITY) {
        initialCapacity = MAXIMUM_CAPACITY;
    }

    // A load factor that is less than or equal to 0, or NaN, is rejected
    if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    }
    // Store the load factor
    this.loadFactor = loadFactor;
    // tableSizeFor returns the smallest power of two >= initialCapacity (such as 16, 32, 64, 128);
    // it is kept in threshold until the table is allocated on the first insertion
    this.threshold = tableSizeFor(initialCapacity);
}

/**
 * HashMap constructor with parameters
 *
 * @param initialCapacity initial capacity
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}


/**
 * HashMap no-argument constructor
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // set the load factor to 0.75
}
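The constructor above relies on tableSizeFor, which the article does not list. The sketch below mirrors the JDK 8 implementation inside a hypothetical wrapper class TableSizeDemo, so that it can be run on its own; later JDK versions compute the same result in a different way.

```java
class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two that is >= cap (JDK 8 style).
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that an exact power of two maps to itself
        n |= n >>> 1;      // copy the highest set bit into the bit below it...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // ...until every bit below the highest set bit is 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

So new HashMap<>(10) ends up with a table of length 16 once the first entry is inserted.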

Origin blog.csdn.net/qq_46921028/article/details/132152437