Java collection container interview questions

Collection

What is a collection?

A collection is a container for storing data. It can store only reference types, which makes it well suited to storing objects. A collection is also variable in length, so it is the right choice when the number of objects is not known in advance.

Collection Features

1. Collections can only store reference data types; collections are used to store objects.

2. Use an array when the number of objects is fixed, and a collection when it is not, because collections are variable in length.

The difference between collections and arrays

1. Arrays are of fixed length; collections are of variable length.

2. Arrays can store basic data types as well as reference data types; collections can only store reference data types.

3. The elements stored in the array must be of the same data type; the objects stored in the collection can be of different data types.
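A minimal sketch of these differences (the class and variable names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayVsCollection {
    public static void main(String[] args) {
        int[] numbers = new int[3];     // fixed length, can hold the primitive int
        numbers[0] = 1;
        // numbers[3] = 4;              // would throw ArrayIndexOutOfBoundsException

        List<Integer> list = new ArrayList<>(); // variable length, stores Integer objects
        list.add(1);                    // the int 1 is autoboxed to Integer
        list.add(2);
        list.add(3);
        list.add(4);                    // grows automatically, no fixed capacity
        System.out.println(numbers.length + " " + list.size()); // 3 4
    }
}
```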

Benefits of Using the Collections Framework

  • Reduced development effort: it provides nearly every common collection type plus useful methods for iterating over and manipulating data, so we can focus on business logic instead of designing our own collection API
  • Better code quality: building on well-tested core collection classes improves the robustness and usability of our programs
  • Reusability and interoperability
  • Lower maintenance cost: using the collection classes that ship with the JDK reduces the cost of maintaining the code

Sao Dai's understanding: the advantages above really come down to standardization. If we use the collections that ship with the JDK, our code is more uniform; ArrayList, for example, is a single standardized collection whose methods behave the same way everywhere. If each of us designed our own collection it would, first, take a lot of time, and second, each author would choose different method names, so anyone else using your collection would have to work out what each of your methods does; and if they then wrote a collection like yours with yet other names, things would get very messy. Using the JDK's collections keeps everything uniform and standardized.

What are the commonly used collection classes?

The Map interface and the Collection interface are the two root interfaces of the collection framework:

The Collection interface has three sub-interfaces: Set, List, and Queue

The implementation classes of the Set interface mainly include: HashSet, TreeSet, LinkedHashSet, etc.

The implementation classes of the List interface mainly include: ArrayList, LinkedList, Stack, and Vector, etc.

The main sub-interfaces of the Queue interface are Deque and BlockingQueue (note these are themselves interfaces); common implementation classes include LinkedList, ArrayDeque, and PriorityQueue.

The implementation classes of the Map interface mainly include: HashMap, TreeMap, Hashtable, ConcurrentHashMap, etc.

Sao Dai's understanding: Note that there is no TreeList

The underlying data structure of the collection framework

The two most commonly used sub-interfaces of Collection are List and Set:

1、List

ArrayList : ArrayList is implemented on top of an array; its underlying data structure is a resizable array. When an element is inserted and the array is full, the array must grow: a new, larger array is created, the elements of the original array are copied into it, and the new element is then inserted into the new array.

LinkedList : LinkedList is implemented as a linked list; its underlying data structure is a doubly linked list. Inserting an element only requires updating the relevant pointers in the list; there is no array copying or growing as in ArrayList.

Vector : Vector is a thread-safe List implementation class. Its underlying data structure is similar to ArrayList's, also a resizable array. The difference is that Vector's methods are all synchronized, which guarantees thread safety.

Stack : Stack is implemented based on Vector, which is a last-in-first-out (LIFO) data structure that supports push and pop operations.

Sao Dai understands: Vector and Stack are both thread-safe classes
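A quick sketch of Stack's last-in-first-out behavior (Stack extends Vector, so its methods are synchronized):

```java
import java.util.Stack;

public class StackDemo {
    public static void main(String[] args) {
        Stack<String> stack = new Stack<>(); // extends Vector, so thread-safe
        stack.push("first");
        stack.push("second");
        stack.push("third");
        System.out.println(stack.pop());  // third  (last in, first out)
        System.out.println(stack.pop());  // second
        System.out.println(stack.peek()); // first  (peek does not remove)
    }
}
```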

2、Set

HashSet (unordered, unique) : based on HashMap. The default constructor builds a HashMap with an initial capacity of 16 and a load factor of 0.75. A HashSet wraps a HashMap object to store all of its elements: every element put into the HashSet is actually saved as a key of the internal HashMap, while the HashMap's value is always PRESENT, a shared static Object.

LinkedHashSet: LinkedHashSet extends HashSet and is implemented internally through a LinkedHashMap. This mirrors the LinkedHashMap discussed earlier, which is itself implemented on top of HashMap, with only small differences.

TreeSet (ordered, unique): red-black tree (balanced sorted binary tree)
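A short sketch contrasting the three Set implementations (the values are illustrative); all three reject the duplicate, but only LinkedHashSet keeps insertion order and only TreeSet sorts:

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetDemo {
    public static void main(String[] args) {
        Set<Integer> hashSet = new HashSet<>();
        Set<Integer> linkedHashSet = new LinkedHashSet<>();
        Set<Integer> treeSet = new TreeSet<>();
        for (int n : new int[]{3, 1, 2, 3}) { // the duplicate 3 is ignored by all three
            hashSet.add(n);
            linkedHashSet.add(n);
            treeSet.add(n);
        }
        System.out.println(linkedHashSet);  // [3, 1, 2] - insertion order kept
        System.out.println(treeSet);        // [1, 2, 3] - natural (sorted) order
        System.out.println(hashSet.size()); // 3 - duplicates removed
    }
}
```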

3、Map

HashMap: before JDK 1.8, HashMap consisted of an array plus linked lists. The array is the main body of the HashMap, and the linked lists exist mainly to resolve hash collisions (collisions are resolved by chaining). Since JDK 1.8, when a chain's length exceeds a threshold (8 by default) and the table is large enough (64 slots or more), the linked list is converted into a red-black tree to reduce search time.

LinkedHashMap: LinkedHashMap inherits from HashMap, so its underlying layout is still the chained hash structure composed of an array plus linked lists or red-black trees. On top of that structure, LinkedHashMap adds a doubly linked list that maintains the insertion order of the key-value pairs; by operating on that linked list it also implements the access-order logic.

Hashtable : composed of an array plus linked lists; the array is the main body of the Hashtable, and the linked lists exist mainly to resolve hash collisions

TreeMap : Red-black tree (self-balancing sorted binary tree)

ConcurrentHashMap: a thread-safe hash table implementation in Java. In JDK 1.7 its underlying data structure was a segment-locked hash table: the table was divided into multiple segments (16 by default, configurable), each an independent hash table with its own lock, so multiple threads could access different segments concurrently, improving concurrency performance.

In that design, each element is stored in an Entry object containing the key, the value, and a pointer to the next Entry. Each segment holds an Entry array, and each slot of the array is the head of a linked list of elements whose hashes land in the same bucket.

To insert, look up, or delete, the segment is first located from the key's hash, and the operation is then performed inside that segment under its lock. Since each segment has its own lock, different threads can access different segments at the same time, improving concurrency performance.

Since JDK 1.8 the segment locks are gone: ConcurrentHashMap uses the same array plus linked list/red-black tree layout as HashMap, synchronizing with CAS operations and a synchronized block on the first node of each bucket, which gives even finer-grained concurrency. In short, ConcurrentHashMap is a thread-safe hash table implementation in Java that supports efficient multi-threaded concurrent access.
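A minimal sketch of thread-safe updates (the counter key "hits" is illustrative): ConcurrentHashMap's merge performs an atomic per-key read-modify-write, so concurrent increments are not lost.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentHashMapDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                counts.merge("hits", 1, Integer::sum); // atomic update for this key
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counts.get("hits")); // 2000 - no updates lost
    }
}
```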

Which collection classes are thread safe? Which collection classes are thread-unsafe?

Which collection classes are thread safe?

  • Vector: every critical operation is marked with the synchronized keyword, which guarantees thread safety.
  • Hashtable: its methods use the synchronized keyword, so unlike HashMap it is thread-safe.
  • ConcurrentHashMap: uses lock striping (segment locks in JDK 1.7, per-bucket locking since JDK 1.8) to guarantee thread safety efficiently.
  • Stack: a stack, which is also thread-safe because it inherits from Vector.

Which collection classes are thread-unsafe?

  • HashMap
  • ArrayList
  • LinkedList
  • HashSet
  • TreeSet
  • TreeMap

Reasons for thread unsafety

  • HashMap : during put, if the number of elements exceeds the capacity threshold (determined by the load factor), a resize is triggered, which rehashes the contents of the old array into the new, larger array. In a multi-threaded environment (in JDK 1.7), other threads performing put at the same time could link nodes in the same bucket into a closed loop, causing an infinite loop on a later get; concurrent puts can also silently lose updates. So HashMap is not thread-safe.
  • ArrayList : when add() grows the backing array, Arrays.copyOf returns a new array object. If threads A and B enter the grow method at the same time, each executes Arrays.copyOf and gets its own elementData array. If A's assignment happens first and B's later, then List.elementData == B.elementData and thread B's array overwrites thread A's, so thread A's data is lost.
  • LinkedList : the same kind of problem as ArrayList; multiple threads writing, or reading and writing, the same structure at the same time corrupt it.
  • HashSet : its underlying storage is a HashMap, so the thread-safety problems HashMap has also occur in HashSet.
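As a contrast, wrapping the list with Collections.synchronizedList is one simple fix; in this sketch (the class name is illustrative) two threads each add 1000 elements and, because each add() is synchronized, none are lost:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SafeListDemo {
    public static void main(String[] args) throws InterruptedException {
        // The wrapper synchronizes every method call on the underlying ArrayList
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                list.add(i);
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start(); b.start();
        a.join(); b.join();
        // 2000 with the wrapper; a plain ArrayList may end up smaller (lost updates)
        System.out.println(list.size()); // 2000
    }
}
```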

The Java collections' fail-fast mechanism

What is the fast failure mechanism "fail-fast"?

The fail-fast mechanism, that is, the fast-failure mechanism, is an error-detection mechanism in the Java Collection framework.

When multiple threads operate on the contents of the same collection, it can happen that while one thread is iterating over the collection, another thread modifies its contents; this generates a fail-fast event and a ConcurrentModificationException is thrown. Fail-fast events can also occur in a single thread.

For example: while thread A traverses a collection through an iterator, if the collection's contents are changed by another thread, then the next time thread A touches the iterator a ConcurrentModificationException is thrown and a fail-fast event is generated. Note, however, that the fail-fast mechanism does not guarantee an exception on unsynchronized modification; it throws on a best-effort basis, so it should generally be used only to detect bugs.

Where fail-fast appears

The fail-fast mechanism can show up with the common Java collections, such as ArrayList and HashMap, in both single-threaded and multi-threaded environments.

fail-fast in a single-threaded environment

public static void main(String[] args) {
    List<String> list = new ArrayList<>();
    for (int i = 0 ; i < 10 ; i++ ) {
         list.add(i + "");
    }
    Iterator<String> iterator = list.iterator();
    int i = 0 ;
    while(iterator.hasNext()) {
         if (i == 3) {
              list.remove(3);
         }
         System.out.println(iterator.next());
         i ++;
    }
}

This code defines an ArrayList and traverses it with an iterator. Partway through the iteration, an element is deliberately removed through the list itself; the next call to iterator.next() then fails fast with a ConcurrentModificationException.
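By contrast, removing through the iterator itself keeps the iterator's expectedModCount in sync with modCount, so no exception is thrown. A sketch:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IteratorRemoveDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i + "");
        }
        Iterator<String> iterator = list.iterator();
        while (iterator.hasNext()) {
            String s = iterator.next();
            if (s.equals("3")) {
                // updates expectedModCount, so no ConcurrentModificationException
                iterator.remove();
            }
        }
        System.out.println(list.size()); // 9
    }
}
```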

public static void main(String[] args) {
    Map<String, String> map = new HashMap<>();
    for (int i = 0 ; i < 10 ; i ++ ) {
         map.put(i+"", i+"");
    }
    Iterator<Entry<String, String>> it = map.entrySet().iterator();
    int i = 0;
    while (it.hasNext()) {
         if (i == 3) {
              map.remove(3+"");
         }
         Entry<String, String> entry = it.next();
         System.out.println("key= " + entry.getKey() + " and value= " + entry.getValue());
         i++;
    }
}

This code defines a HashMap holding 10 key-value pairs. During the iteration, an element is removed through the map's own remove method, which causes a ConcurrentModificationException to be thrown.

In a multi-threaded environment

public class FailFastTest {
    public static List<String> list = new ArrayList<>();
    private static class MyThread1 extends Thread {
          @Override
          public void run() {
               Iterator<String> iterator = list.iterator();
               while(iterator.hasNext()) {
                    String s = iterator.next();
                    System.out.println(this.getName() + ":" + s);
               try {
                    Thread.sleep(1000);
               } catch (InterruptedException e) {
                    e.printStackTrace();
               }
               }
               super.run();
          }
    }
    private static class MyThread2 extends Thread {
          int i = 0;
          @Override
          public void run() {
               while (i < 10) {
                    System.out.println("thread2:" + i);
                    if (i == 2) {
                          list.remove(i);
                    }
                    try {
                         Thread.sleep(1000);
                    } catch (InterruptedException e) {
                         e.printStackTrace();
                    }
                    i ++;
               }
          }
    }
    public static void main(String[] args) {
          for(int i = 0 ; i < 10;i++){
               list.add(i+"");
            }
          MyThread1 thread1 = new MyThread1();
          MyThread2 thread2 = new MyThread2();
          thread1.setName("thread1");
          thread2.setName("thread2");
          thread1.start();
          thread2.start();
    }
}

Two threads are started: thread1 iterates over the list, while thread2 removes an element during thread1's iteration. The result is again a java.util.ConcurrentModificationException.

The examples above trigger fail-fast through a structural change caused by removal; a structural change caused by addition triggers it in the same way, so no separate example is given here.

Implementation principle analysis

final void checkForComodification() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
}

It can be seen that this method is the key to judging whether to throw ConcurrentModificationException.

In this code, the exception is thrown when modCount != expectedModCount. But expectedModCount is initialized to the value of modCount, so why would modCount != expectedModCount ever occur?

Clearly, expectedModCount is never modified anywhere during the iteration after being initialized to modCount, so only modCount can change. Next, let's look at the ArrayList source code to see when "modCount is not equal to expectedModCount" and how modCount gets modified.

package java.util;
public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    ...
    // Called whenever the list's capacity changes; note that it increments modCount
    public void ensureCapacity(int minCapacity) {
        modCount++;
        int oldCapacity = elementData.length;
        if (minCapacity > oldCapacity) {
            Object oldData[] = elementData;
            int newCapacity = (oldCapacity * 3)/2 + 1;
            if (newCapacity < minCapacity)
                newCapacity = minCapacity;
            // minCapacity is usually close to size, so this is a win:
            elementData = Arrays.copyOf(elementData, newCapacity);
        }
    }
    // Append an element to the end of the list
    public boolean add(E e) {
        // modifies modCount
        ensureCapacity(size + 1);  // Increments modCount!!
        elementData[size++] = e;
        return true;
    }
    // Insert an element at the specified position
    public void add(int index, E element) {
        if (index > size || index < 0)
            throw new IndexOutOfBoundsException(
            "Index: "+index+", Size: "+size);
        // modifies modCount
        ensureCapacity(size+1);  // Increments modCount!!
        System.arraycopy(elementData, index, elementData, index + 1,
             size - index);
        elementData[index] = element;
        size++;
    }
    // Append all elements of another collection
    public boolean addAll(Collection<? extends E> c) {
        Object[] a = c.toArray();
        int numNew = a.length;
        // modifies modCount
        ensureCapacity(size + numNew);  // Increments modCount
        System.arraycopy(a, 0, elementData, size, numNew);
        size += numNew;
        return numNew != 0;
    }
    // Remove the element at the specified position
    public E remove(int index) {
        RangeCheck(index);
        // modifies modCount
        modCount++;
        E oldValue = (E) elementData[index];
        int numMoved = size - index - 1;
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index, numMoved);
        elementData[--size] = null; // Let gc do its work
        return oldValue;
    }
    // Fast removal of the element at the specified position (no bounds check, no return value)
    private void fastRemove(int index) {
        // modifies modCount
        modCount++;
        int numMoved = size - index - 1;
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index,
                             numMoved);
        elementData[--size] = null; // Let gc do its work
    }
    // Clear the list
    public void clear() {
        // modifies modCount
        modCount++;
        // Let gc do its work
        for (int i = 0; i < size; i++)
            elementData[i] = null;
        size = 0;
    }
    ...
}

We found that whether it is add(), remove(), or clear(), as long as it involves modifying the number of elements in the collection, the value of modCount will be changed.

Next, let's systematically sort out how fail-fast is generated. Proceed as follows:

1. Create a new ArrayList named arrayList.

2. Add content to arrayList.

3. Create a new "thread a", and repeatedly read the value of arrayList through Iterator in "thread a".

4. Create a new "thread b", and delete a "node A" in the arrayList in "thread b".

5. At this time, interesting events will occur.

  • At some point, "thread a" creates an Iterator over the arrayList. "Node A" still exists in the arrayList at that moment, and when the Iterator is created, expectedModCount = modCount (call their value N).
  • At some point during "thread a"'s traversal, "thread b" runs and deletes "node A" from the arrayList. When "thread b" executes remove(), "modCount++" runs inside remove(), so modCount becomes N+1.
  • "Thread a" then continues traversing. When it executes next(), checkForComodification() is called to compare expectedModCount with modCount; since expectedModCount = N and modCount = N+1, a ConcurrentModificationException is thrown and a fail-fast event is generated.

So far, we have fully understood how fail-fast is generated!

Summary of Implementation Principles

final void checkForComodification() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
}

There is a checkForComodification method. When multiple threads operate on the same collection and, while one thread is traversing it, another thread changes its contents (by calling add, remove, clear, and so on, each of which increments modCount), the value of modCount changes. When modCount != expectedModCount (expectedModCount is initialized to the value of modCount), a ConcurrentModificationException is thrown and a fail-fast event is generated.

Solutions:

1. During traversal, wrap every operation that changes the value of modCount in synchronized (locking on the collection).

2. Replace ArrayList with CopyOnWriteArrayList.
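CopyOnWriteArrayList avoids fail-fast because its iterators work on a snapshot of the array taken when the iterator is created; a sketch (values are illustrative):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowListDemo {
    public static void main(String[] args) {
        List<String> list = new CopyOnWriteArrayList<>();
        list.add("a"); list.add("b"); list.add("c");
        // The iterator sees a snapshot taken when it was created, so removing
        // during iteration does not throw ConcurrentModificationException
        for (String s : list) {
            if (s.equals("b")) {
                list.remove("b");
            }
            System.out.println(s); // still prints a, b, c (the snapshot)
        }
        System.out.println(list); // [a, c]
    }
}
```

The trade-off is that every mutation copies the whole backing array, so CopyOnWriteArrayList suits read-heavy, write-rare workloads.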

How to ensure that a collection cannot be modified?

You can use the Collections.unmodifiableCollection(Collection c) method to create a read-only view of a collection.

This way, any operation that mutates the collection will throw a java.lang.UnsupportedOperationException.

List<String> list = new ArrayList<>();
list.add("x");
Collection<String> clist = Collections.unmodifiableCollection(list);
clist.add("y"); // throws UnsupportedOperationException at runtime
System.out.println(list.size());

Do List, Set, Map inherit from Collection interface? What is the difference between List, Set, and Map? What are the characteristics of each of the three interfaces of List, Map, and Set when accessing elements?

1. Are List, Set, and Map inherited from the Collection interface?

Java containers are divided into two categories: Collection and Map. Collection has three sub-interfaces: Set, List, and Queue, of which Set and List are the most commonly used. The Map interface is not a sub-interface of Collection.

2. The difference between List, Set and Map

List : an ordered container (elements come out of the collection in the same order they were stored), elements may repeat, multiple null elements may be inserted, and elements have indexes.

The commonly used implementation classes of List interface are ArrayList, LinkedList and Vector.

Set : an unordered container (storage order and retrieval order may differ) that cannot hold duplicate elements and allows at most one null element; the uniqueness of elements must be guaranteed.

Common implementation classes of the Set interface are HashSet, LinkedHashSet and TreeSet.

Map: a collection of key-value pairs that stores the mapping between keys and values. Keys are unordered and unique; values need not be ordered and may repeat. Map does not inherit from the Collection interface. When retrieving an element from a Map, supplying the key object returns the corresponding value object.

Common implementation classes of Map: HashMap, TreeMap, Hashtable, LinkedHashMap, ConcurrentHashMap

3. What are the characteristics of each of the three interfaces of List, Map, and Set when accessing elements?

When the three interfaces List, Map, and Set store elements

  • List stores elements by index (ordered storage), and duplicate elements are allowed
  • The elements stored in a Set are unordered and must be unique (the objects' equals() method determines whether two elements are duplicates)
  • Map stores key-value mappings, which can be one-to-one (key-value) or many-to-one. Note that keys are unordered and must not repeat, while values may repeat.

When the three interfaces List, Map, and Set retrieve elements

  • List: elements can be retrieved with a for loop, a foreach loop, or an Iterator
  • Set: elements can be retrieved with a foreach loop or an Iterator
  • Map: entries can be retrieved with a foreach loop or an Iterator (over entrySet(), keySet(), or values())

What are the ways to traverse a Map collection?

There are several ways to traverse the Map collection:

1. Use an Iterator to traverse the Map. Obtain the Set returned by Map's entrySet() method, get an Iterator from that Set's iterator() method, and finally traverse the entries with a while loop. The sample code is as follows:


   Map<String, Integer> map = new HashMap<>();
   map.put("A", 1);
   map.put("B", 2);
   map.put("C", 3);
   Iterator<Map.Entry<String, Integer>> iterator = map.entrySet().iterator();
   while (iterator.hasNext()) {
       Map.Entry<String, Integer> entry = iterator.next();
       System.out.println(entry.getKey() + " : " + entry.getValue());
   }

2. Use a for-each loop to traverse the Map. Obtain the Set returned by Map's entrySet() method, then iterate over its entries with a for-each loop. The sample code is as follows:


   Map<String, Integer> map = new HashMap<>();
   map.put("A", 1);
   map.put("B", 2);
   map.put("C", 3);
   for (Map.Entry<String, Integer> entry : map.entrySet()) {
       System.out.println(entry.getKey() + " : " + entry.getValue());
   }

3. Traverse the keys or values of the Map. Obtain the Set returned by Map's keySet() method, or the Collection returned by its values() method, and then iterate over it with a for-each loop. The sample code is as follows:


   Map<String, Integer> map = new HashMap<>();
   map.put("A", 1);
   map.put("B", 2);
   map.put("C", 3);
   for (String key : map.keySet()) {
       System.out.println(key + " : " + map.get(key));
   }
   for (Integer value : map.values()) {
       System.out.println(value);
   }

In short, there are many ways to traverse the Map collection, and you need to choose the appropriate way according to your specific needs and scenarios.

The difference between Comparable and Comparator?

  • The Comparable interface comes from the java.lang package and has a compareTo(Object obj) method for sorting; the Comparator interface comes from the java.util package and has a compare(Object obj1, Object obj2) method for sorting.
  • Comparable is a sorting interface: a class that implements Comparable declares that "instances of this class can be sorted". Comparator is a comparator: a separate class implements this interface to act as a comparator for another type.
  • Comparable is equivalent to an "internal comparator"; Comparator is equivalent to an "external comparator".

Respective advantages and disadvantages

  • Comparable is simple to use: any object of a class that implements the Comparable interface is directly comparable, but the class's source code must be modified.
  • Comparator's advantage is that the source code need not be modified; instead a separate comparator is implemented, and when a custom object needs comparing, the comparator is passed along with the objects. Users can also put complex, reusable comparison logic inside a Comparator and apply it to relatively simple objects, which saves a lot of repetitive work.

Introduction and Examples of Comparable and Comparator

Comparable

Comparable is a sorting interface. If a class implements the Comparable interface, it means that the class supports sorting. Lists or arrays of objects of classes that implement the Comparable interface can be automatically sorted by Collections.sort or Arrays.sort. Additionally, objects implementing this interface can be used as keys in an ordered map or collections in an ordered set without specifying a comparator. The interface is defined as follows:

package java.lang;
import java.util.*;
public interface Comparable<T> 
{
    public int compareTo(T o);
}

This interface has only one method, compareTo, which compares the order of this object with the specified object, and returns a negative integer, zero, or a positive integer if the object is less than, equal to, or greater than the specified object.

Now there are two objects of the Person class, how do we compare the size of the two? We can do this by having Person implement the Comparable interface:

public class Person implements Comparable<Person>
{
    String name;
    int age;
    public Person(String name, int age)
    {
        super();
        this.name = name;
        this.age = age;
    }
    public String getName()
    {
        return name;
    }
    public int getAge()
    {
        return age;
    }
    @Override
    public int compareTo(Person p)
    {
        return this.age-p.getAge();
    }
    public static void main(String[] args)
    {
        Person[] people=new Person[]{new Person("xujian", 20),new Person("xiewei", 10)};
        System.out.println("Before sorting");
        for (Person person : people)
        {
            System.out.print(person.getName()+":"+person.getAge());
        }
        Arrays.sort(people);
        System.out.println("\nAfter sorting");
        for (Person person : people)
        {
            System.out.print(person.getName()+":"+person.getAge());
        }
        
    }
}

Result of the run: after Arrays.sort, the Person objects are printed in ascending order of age.

Comparator

Comparator is a comparison interface. If we need to control the ordering of some class, and the class itself does not support sorting (that is, it does not implement the Comparable interface), we can create a "comparator for that class" to do the sorting; the comparator only needs to implement the Comparator interface. In other words, we can create a new comparator by implementing Comparator and then sort objects of the class through it. The interface is defined as follows:

package java.util;
public interface Comparator<T>
 {
    int compare(T o1, T o2);
    boolean equals(Object obj);
 }

Notice

1. A class that wants to act as a Comparator must implement the compare(T o1, T o2) method, but it does not need to provide its own equals(Object obj) method, since one is inherited from Object.

2. int compare(T o1, T o2) is "comparing the size of o1 and o2". Returns "negative", meaning "o1 is less than o2"; returns "zero", meaning "o1 is equal to o2"; returns "positive", meaning "o1 is greater than o2".
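Since Java 8, a Comparator can also be written as a lambda or built with factory methods such as Comparator.comparingInt, instead of a separate class. A small sketch (the string example here is illustrative):

```java
import java.util.Arrays;
import java.util.Comparator;

public class LambdaComparatorDemo {
    public static void main(String[] args) {
        String[] words = {"banana", "fig", "apple"};
        // comparingInt builds a Comparator from an int-valued key extractor
        Arrays.sort(words, Comparator.comparingInt(String::length));
        System.out.println(Arrays.toString(words)); // [fig, apple, banana]
    }
}
```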

Now, if the Person class above did not implement the Comparable interface, how would we compare two instances? We can create a new class that implements the Comparator interface to serve as a "comparator".

public class PersonCompartor implements Comparator<Person>
{
    @Override
    public int compare(Person o1, Person o2)
    {
        return o1.getAge()-o2.getAge();
    }
}
public class Person
{
    String name;
    int age;
    public Person(String name, int age)
    {
        super();
        this.name = name;
        this.age = age;
    }
    public String getName()
    {
        return name;
    }
    public int getAge()
    {
        return age;
    }
    public static void main(String[] args)
    {
        Person[] people=new Person[]{new Person("xujian", 20),new Person("xiewei", 10)};
        System.out.println("Before sorting");
        for (Person person : people)
        {
            System.out.print(person.getName()+":"+person.getAge());
        }
        Arrays.sort(people,new PersonCompartor());
        System.out.println("\nAfter sorting");
        for (Person person : people)
        {
            System.out.print(person.getName()+":"+person.getAge());
        }
    }
}

Result of the run: after Arrays.sort with the PersonCompartor, the Person objects are printed in ascending order of age.

What is the difference between Collection and Collections?

Collection and Collections are two different concepts in the Java collection framework.

1. Collection is an interface in the Java collection framework. It is the root interface of all collection classes and provides some common methods, such as adding, deleting, traversing, etc.

2. Collections is a tool class in the Java collection framework, which provides a series of static methods for operations on collections, such as sorting, searching, replacing, copying, reversing, etc.

In short, Collection is an interface that represents the basic characteristics and behaviors of collection classes, and Collections is a tool class that provides methods for operating collections.

What are the commonly used methods of Collections

Collections is a tool class in the Java collection framework, which provides a series of commonly used collection operation methods, including sorting, searching, replacing, copying, inverting, etc. Commonly used Collections methods include:

1. sort(List<T> list): Sort the List collection, and the sorting rules are natural order (from small to large).

2. sort(List<T> list, Comparator<? super T> c): Sort the List collection, and the sorting rules are specified by Comparator.

3. binarySearch(List<? extends Comparable<? super T>> list, T key): Searches for the specified element in the ordered List collection, and returns the index value of the element.

4. binarySearch(List<? extends T> list, T key, Comparator<? super T> c): Searches for the specified element in the ordered List collection, returns the index value of the element, and the search rule is specified by the Comparator.

5. reverse(List<?> list): Reverse the elements in the List collection.

6. shuffle(List<?> list): Randomly shuffle the order of elements in the List collection.

7. swap(List<?> list, int i, int j): Exchange two elements at the specified position in the List collection.

8. fill(List<? super T> list, T obj): Replace all elements in the List collection with the specified element.

9. copy(List<? super T> dest, List<? extends T> src): Copy the elements in the src collection to the dest collection.

10. max(Collection<? extends T> coll): Returns the largest element in the Collection collection.

11. min(Collection<? extends T> coll): Returns the smallest element in the Collection collection.
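Most of these utilities can be exercised in a short, self-contained sketch (the CollectionsUtilDemo class name and its helper method are illustrative, not part of the JDK):

```java
import java.util.*;

public class CollectionsUtilDemo {
    // Applies sort (natural order) then reverse, returning a new list.
    static List<Integer> sortAndReverse(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        Collections.sort(list);     // natural order: small to large
        Collections.reverse(list);  // now large to small
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(3, 1, 2));
        System.out.println(sortAndReverse(list));              // [3, 2, 1]
        System.out.println(Collections.max(list));             // 3
        System.out.println(Collections.min(list));             // 1
        Collections.sort(list);                                // binarySearch requires a sorted list
        System.out.println(Collections.binarySearch(list, 2)); // 1 (index of 2 in [1, 2, 3])
        Collections.swap(list, 0, 2);
        System.out.println(list);                              // [3, 2, 1]
    }
}
```

Note that binarySearch is only meaningful on an already-sorted list; on an unsorted list its result is undefined.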

Iterator

What is iterator Iterator?

Iterator is an interface in the Java collection framework used to traverse the elements of a collection. It provides a uniform traversal mechanism that works for any Collection type, including List and Set (and, through keySet()/entrySet(), the contents of a Map). It supports one-way traversal only: you can move from front to back, never backwards. During traversal, elements may be removed through the iterator itself, but the collection must not be structurally modified through its own methods, otherwise a ConcurrentModificationException is thrown.

How to use Iterator?

(1) iterator() asks the container for an Iterator, which is positioned before the first element of the sequence.
(2) next() returns the next element in the sequence.
(3) hasNext() checks whether the sequence has more elements.
(4) remove() deletes the element most recently returned by next().

List<String> list = new ArrayList<>();
Iterator<String> iterator = list.iterator(); // every Collection implements the Iterable interface
while (iterator.hasNext()) {
    String string = iterator.next();
    // do something with string
}

What are the advantages of Iterator?

Iterator (Iterator) is an important interface in the Java collection framework, which has the following advantages:

1. Strong versatility: The iterator provides a general traversal method, which can traverse any type of collection, including List, Set, Map, etc.

2. Ease of use: It is very simple to use an iterator to traverse a collection. You only need to call the Iterator method, such as hasNext(), next(), remove(), etc., to complete the traversal.

3. Fail-fast detection: iterators over the standard collections are fail-fast: if the collection is structurally modified during traversal by anything other than the iterator itself, a ConcurrentModificationException is thrown. This surfaces concurrent-access bugs early, although it does not by itself make traversal thread-safe.

4. Support for delete operations: Use iterators to traverse the collection to easily delete elements in the collection without considering the index change after deleting the elements.

5. Efficient performance: iterator traversal is efficient because elements are visited lazily, one at a time, without copying the collection, which matters especially for large collections.

What are the characteristics of Iterator?

Iterator is an interface in the Java collection framework, which is used to traverse the elements in the collection. Its characteristics are as follows:

1. You can traverse any type of collection, including List, Set, Map, etc.

2. One-way traversal of the collection can be realized, that is, it can only be traversed from front to back, not backwards.

3. Elements may be removed through the iterator during the traversal process, but the collection must not otherwise be modified, or a ConcurrentModificationException will be thrown.

4. Generics can be used to avoid the trouble of type conversion.

5. Iterator must exist attached to a collection object, and Iterator itself does not have the function of loading data objects.

How to remove elements in Collection while traversing?

The only correct way to remove elements from a Collection while traversing it is to use the Iterator.remove() method, as follows:

List<Integer> list = new ArrayList<>();
Iterator<Integer> it = list.iterator();
// the correct way to remove
while (it.hasNext()) {
    Integer i = it.next();
    if (i % 2 == 0) { // the removal condition (an example)
        it.remove();
    }
}

// the most common incorrect code looks like this
for (Integer i : list) {
    list.remove(i); // throws ConcurrentModificationException
}

Sao Dai understands: elements can only be removed through the remove method of the iterator instance, not through the remove method of the collection itself. Calling the collection's own remove during traversal triggers the fail-fast mechanism and produces a ConcurrentModificationException, because the collection must not be structurally modified while it is being traversed.

What is the difference between Iterator and ListIterator?

Both Iterator and ListIterator are interfaces in the Java collection framework, which are used to traverse the elements in the collection. Their main differences are as follows:

1. Different traversal directions: Iterator can only traverse the elements in the collection forward, while ListIterator can traverse the elements in the collection forward or backward.

2. The support for element operations is different: Iterator can only traverse the elements in the collection, and cannot perform operations such as adding and modifying, while ListIterator can add, modify, and delete elements in the collection during the traversal process.

3. Different supported methods: ListIterator has more methods than Iterator, such as previous(), hasPrevious(), add(), and set(), which are used to traverse backwards, add elements, and modify elements during the traversal process.

4. The types of collections supported are different: Iterator can traverse any type of collection, including List, Set, Map, etc., while ListIterator can only traverse collections of List type.

Sao Dai understands: If you need to traverse the List collection and support element operations, you should use ListIterator; if you only need to traverse the elements in the collection, you can use Iterator.
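The differences can be seen in a short sketch (the ListIteratorDemo class and its helper are illustrative): a ListIterator walks the list forward, then back again, and can replace elements in place with set().

```java
import java.util.*;

public class ListIteratorDemo {
    // Walks forward, then backward, collecting elements in visit order.
    static List<String> forwardThenBack(List<String> src) {
        List<String> visited = new ArrayList<>();
        ListIterator<String> it = src.listIterator();
        while (it.hasNext()) visited.add(it.next());         // forward pass
        while (it.hasPrevious()) visited.add(it.previous()); // backward pass
        return visited;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("A", "B", "C"));
        System.out.println(forwardThenBack(list)); // [A, B, C, C, B, A]

        // set() replaces the element most recently returned by next()/previous()
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next().equals("B")) it.set("B2");
        }
        System.out.println(list); // [A, B2, C]
    }
}
```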

What are the different ways to iterate over a List? What is the implementation principle of each method?

There are several different ways to iterate over a List collection:

1. for loop traversal

Use the for loop to traverse the List collection, and you can use the subscript to access the elements in the collection. The realization principle is to traverse the List collection through the loop control variable, and access the elements in the collection through the subscript.


List<String> list = new ArrayList<>();
list.add("A");
list.add("B");
list.add("C");
for (int i = 0; i < list.size(); i++) {
    String element = list.get(i);
    System.out.println(element);
}

Explain the principle

The implementation principle of the for loop traversing the List collection is to traverse the List collection through the loop control variable, and access the elements in the collection through subscripts.

The syntax of the for loop is as follows:


for (initialization expression; Boolean expression; update expression) {
    // loop body
}

The initialization expression initializes the loop control variable, the Boolean expression decides whether the loop continues to execute, and the update expression advances the loop control variable.

When the for loop traverses the List collection, the initialization expression usually initializes the loop control variable to 0, the Boolean expression usually judges whether the loop control variable is smaller than the length of the List collection, and the update expression usually adds 1 to the loop control variable. In the loop body, the elements in the List collection can be accessed through subscripts to realize the function of traversing the List collection.

2. foreach loop traversal

Use foreach loop to traverse the List collection, you can directly access the elements in the collection. The implementation principle of foreach loop traversal is to traverse each element in the List collection, automatically obtain the value of the element and assign it to the loop variable, and then execute the statement in the loop body.

List<String> list = new ArrayList<>();
list.add("A");
list.add("B");
list.add("C");
for (String element : list) {
    System.out.println(element);
}

Explain the principle

Specifically, the foreach loop first obtains the iterator of the List collection, and then uses the iterator to traverse each element in the collection. During the traversal process, the loop variable will automatically get the value of the element and assign it to the loop variable. Then, execute the statements in the loop body to complete the traversal operation on the List collection.

Compared with the for loop traversing the List collection, the syntax of the foreach loop traversing the List collection is more concise. It does not need to explicitly use the subscript to access the elements in the collection, and can directly access the element variable. Therefore, the foreach loop is easier to understand and use, and it is one of the common ways to traverse the List collection.

3. Iterator traversal

Use the iterator to traverse the List collection, you can traverse any type of collection, including List, Set, Map, etc. The realization principle is to traverse the elements in the collection by obtaining the iterator of the List collection and using the hasNext() and next() methods.

List<String> list = new ArrayList<>();
list.add("A");
list.add("B");
list.add("C");
Iterator<String> iterator = list.iterator();
while (iterator.hasNext()) {
    String element = iterator.next();
    System.out.println(element);
}

4. ListIterator traversal

Use ListIterator to traverse the List collection, you can traverse the elements in the collection forward or backward, and support element operations. The implementation principle is to traverse the elements in the collection by obtaining the ListIterator of the List collection and using methods such as hasNext(), next(), hasPrevious(), previous(), add(), and set().

List<String> list = new ArrayList<>();
list.add("A");
list.add("B");
list.add("C");
ListIterator<String> iterator = list.listIterator();
while (iterator.hasNext()) {
    String element = iterator.next();
    System.out.println(element);
}

What are the usage scenarios of various collection traversal methods?

Various traversal methods of collections are applicable to various scenarios, as follows:

1. For loop traversal: It is suitable for occasions where the elements in the collection need to be accessed according to the subscript of the element (such as the List collection), for example, when the elements in the collection need to be modified or deleted.

2. Foreach loop traversal: It is suitable for occasions that only need to access the elements in the collection, and do not need to access according to the subscript of the element, for example, when only the elements in the collection need to be read.

3. Iterator traversal: It is suitable for occasions where the elements in the collection need to be modified or deleted, because when the iterator is used to traverse the collection, the elements in the collection can be deleted through the remove() method of the iterator.

4. Lambda expression traversal: It is suitable for occasions that require functional processing of the elements in the collection, such as filtering, mapping, sorting, and other operations on the elements in the collection.
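Since lambda-expression traversal is listed above but not shown earlier, here is a minimal sketch (class and method names are illustrative) using forEach for traversal and the Stream API for functional processing:

```java
import java.util.*;
import java.util.stream.Collectors;

public class LambdaTraversalDemo {
    // Filters and upper-cases elements with the Stream API.
    static List<String> filterAndUpper(List<String> src) {
        return src.stream()
                  .filter(s -> s.startsWith("a"))  // keep elements starting with "a"
                  .map(String::toUpperCase)        // functional mapping
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("apple", "banana", "avocado");
        list.forEach(System.out::println);        // lambda-style traversal
        System.out.println(filterAndUpper(list)); // [APPLE, AVOCADO]
    }
}
```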

Performance analysis of various traversal methods of the collection?

When traversing a collection, different traversal methods have different performances, and the specific performance ranking depends on various factors such as the type, size, and element type of the collection. The following is the performance ranking of various traversal methods in general:

1. For loop traversal: the best performance for random-access lists such as ArrayList, because accessing elements by subscript is fast. (On a LinkedList, however, get(i) is O(n), and an index-based for loop becomes the slowest option.)

2. Iterator traversal: performance is second, because the iterator can directly access the elements in the collection, avoiding the overhead of calculating subscripts every time.

3. Foreach loop traversal: marginally slower than the plain for loop on an ArrayList, because the foreach loop is compiled into iterator calls (first obtain the iterator, then call hasNext()/next()); in practice its performance is essentially the same as explicit iterator traversal.

4. Lambda expression traversal: The performance is poor, because Lambda expressions need to be compiled into function objects, and functional processing is required when traversing the collection, which is slower than directly accessing the elements in the collection.

5. Stream API traversal: The performance is the worst, because the Stream API requires multiple operations, such as filtering, mapping, sorting, etc. Each operation needs to create a new object, which is slower than directly accessing the elements in the collection .

It should be noted that the above performance rankings are for reference only, and the actual performance may be affected by various factors, such as the size of the collection, element type, number of traversals, etc. Therefore, in actual development, it is necessary to select the optimal traversal method according to the specific situation. At the same time, performance analysis tools can also be used to evaluate the performance of different traversal methods.

ArrayList

Talk about the advantages and disadvantages of ArrayList

ArrayList is a commonly used dynamic array implementation in Java, which has the following advantages and disadvantages:

advantage:

1. Fast random access speed: The bottom layer of ArrayList is implemented by an array, and the elements in the array can be directly accessed through subscripts, so the random access speed is fast.

2. Faster traversal speed: ArrayList supports fast traversal, because its bottom layer is an array, which can use CPU cache to improve traversal speed.

3. Works with primitive values via autoboxing: ArrayList stores object types; primitive values can still be added because they are automatically boxed into their wrapper classes (an int becomes an Integer) and unboxed on the way out.

shortcoming:

1. The performance of insertion and deletion operations is poor: Since the bottom layer of ArrayList is implemented by an array, when inserting and deleting elements, other elements need to be moved, resulting in poor performance.

2. Does not support concurrent access by multiple threads: ArrayList is not thread-safe. If multiple threads modify ArrayList at the same time, it may cause data inconsistency or throw ConcurrentModificationException.

3. Waste of memory space: ArrayList needs to specify the initial capacity when it is created. If the capacity is insufficient, it needs to be expanded, and the memory space needs to be reallocated during expansion, which may lead to waste of memory space.

Sao Dai understands: ArrayList is suitable for scenarios with more random access and modification operations, but LinkedList may be more suitable for scenarios with more insertion and deletion operations. At the same time, in the scenario of multi-thread concurrent access, you can use a thread-safe List implementation such as Vector or CopyOnWriteArrayList.

How to realize the conversion between array and List?

Array to List: use Arrays.asList(array). Note that the returned List is a fixed-size view backed by the array; wrap it in new ArrayList<>(...) if a resizable list is needed.

List to array: use the toArray() method that comes with List. Pass a typed array, e.g. toArray(new String[0]), to get a String[] rather than an Object[].

// List to array
List<String> list = new ArrayList<>();
list.add("123");
list.add("456");
String[] array = list.toArray(new String[0]);
// array to List
String[] source = new String[]{"123", "456"};
List<String> view = Arrays.asList(source);

What is the difference between ArrayList and LinkedList?

Both ArrayList and LinkedList are commonly used List implementations in Java, and their differences are mainly reflected in the following aspects:

1. The underlying implementation methods are different: the underlying ArrayList is implemented using an array, while the underlying LinkedList is implemented using a doubly linked list.

2. Random access and traversal performance are different: ArrayList supports fast random access and traversal because the underlying array allows direct index access and benefits from CPU cache locality; LinkedList is slower because reaching an element means walking node by node from the head (or tail), and it cannot exploit the CPU cache in the same way.

3. The performance of insertion and deletion operations is different: because the bottom layer of ArrayList is implemented by an array, other elements need to be moved when inserting and deleting elements, resulting in poor performance; while the performance of insertion and deletion operations of LinkedList is better, because only the relevant elements need to be modified. pointers to neighbor nodes.

4. Different memory space occupation: LinkedList occupies more memory than ArrayList, because each LinkedList node stores two references in addition to its data, one pointing to the previous element and one to the next, which adds up when storing a large number of elements. ArrayList elements store only the data itself, so per-element overhead is smaller; however, ArrayList keeps spare capacity and must reallocate the array when it grows, which can waste some memory space.

Sao Dai understands: ArrayList is suitable for scenarios with more random access and modification operations, but LinkedList may be more suitable for scenarios with more insertion and deletion operations.

Doubly linked list

A doubly linked list is also called a doubly linked list, which is a kind of linked list. Each data node in it has two pointers, which point to the direct successor and direct predecessor respectively. Therefore, starting from any node in the doubly linked list, you can easily access its predecessor node and successor node.

What is the difference between ArrayList and Vector?

Both classes implement the List interface (the List interface inherits the Collection interface), and they are both ordered collections

1. Thread safety: Vector uses Synchronized to achieve thread synchronization, which is thread safe, while ArrayList is not thread safe.

2. Performance: ArrayList is better than Vector in terms of performance. All methods of the Vector class are synchronized, so a Vector object can be safely accessed by two threads, but even when only a single thread accesses the Vector, the code still pays the synchronization overhead.

3. Expansion: When an ArrayList needs to grow, it creates a new array 1.5 times the size of the original and copies the elements over. When a Vector grows, the new array is twice the size of the original. Therefore each Vector expansion consumes more memory than an ArrayList expansion, but Vector expands less often.

Sao Dai understands: Vector is thread-safe, but ArrayList is not thread-safe. If you need to use ArrayList in a multi-threaded environment, you need to perform synchronization.

When inserting data, which one is faster, ArrayList, LinkedList, or Vector?

When inserting data, LinkedList is relatively fast for insertions at either end, or at a position already reached through an iterator, because inserting only relinks the pointers of the neighbouring nodes, which is O(1) once the position is known (reaching an arbitrary position still costs O(n) traversal). ArrayList and Vector are backed by arrays, so inserting an element must shift every subsequent element one slot backward, O(n) where n is the number of elements after the insertion point. Because the methods in Vector are modified with synchronized, Vector is a thread-safe container and its performance is worse than that of ArrayList.

Insertion speed is fast to slow sorting: LinkedList>ArrayList>Vector
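As a sketch of why position matters: once a ListIterator has reached the insertion point, LinkedList inserts in O(1) by relinking nodes (the InsertDemo class and its helper are illustrative):

```java
import java.util.*;

public class InsertDemo {
    // Inserts newValue after the first occurrence of afterValue using the iterator,
    // so a LinkedList only has to relink neighbouring nodes (no element shifting).
    static List<Integer> insertAfter(List<Integer> list, int afterValue, int newValue) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next() == afterValue) {
                it.add(newValue); // O(1) for LinkedList once the position is reached
                break;
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> linked = new LinkedList<>(Arrays.asList(1, 2, 4));
        System.out.println(insertAfter(linked, 2, 3)); // [1, 2, 3, 4]
    }
}
```

The same code also works on an ArrayList, but there the add() call must shift the trailing elements.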

How to use ArrayList in a multi-threaded scenario?

ArrayList is not thread-safe. If you encounter a multi-thread scenario, you can use the synchronizedList method of Collections to convert it into a thread-safe one before using it.

List<String> list = new ArrayList<>();
List<String> synchronizedList = Collections.synchronizedList(list);
synchronizedList.add("aaa");
synchronizedList.add("bbb");
// note: iteration is a compound operation and must still lock the list manually
synchronized (synchronizedList) {
    for (int i = 0; i < synchronizedList.size(); i++) {
        System.out.println(synchronizedList.get(i));
    }
}

Why is elementData of ArrayList decorated with transient?

In ArrayList, elementData is marked transient to save space. Because of ArrayList's automatic expansion mechanism, the elementData array acts as a container whose capacity is usually greater than or equal to the number of elements actually stored. So the designers of ArrayList made elementData transient so that the whole array is not serialized by default, and then serialized it manually in the writeObject method, writing only the elements actually stored rather than the entire elementData array.

For example, if 8 elements are actually stored, the capacity of the elementData array may be 8 × 1.5 = 12. Serializing the elementData array directly would waste the space of 4 empty slots, and when the number of elements is very large this kind of waste becomes very uneconomical.

Definition of serialization and deserialization

Java serialization refers to the process of converting Java objects into byte sequences

Java deserialization refers to the process of restoring a sequence of bytes to a Java object.

transient keyword

Member variables modified by transient will not be serialized.
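A minimal sketch of this behaviour (the TransientDemo class and its fields are made up for illustration): after a serialize/deserialize round trip, the transient field comes back as null while the ordinary field survives.

```java
import java.io.*;

public class TransientDemo implements Serializable {
    String name = "kept";                 // serialized normally
    transient String secret = "dropped";  // transient: skipped by serialization

    // Serializes the object to bytes and reads it back.
    static TransientDemo roundTrip(TransientDemo in) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(in);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (TransientDemo) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        TransientDemo copy = roundTrip(new TransientDemo());
        System.out.println(copy.name);   // kept
        System.out.println(copy.secret); // null: the transient field was not written
    }
}
```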

code analysis

Arrays in ArrayList are defined as follows:

private transient Object[] elementData;

Look at the definition of ArrayList again:

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable

You can see that ArrayList implements the Serializable interface, which means that ArrayList supports serialization.

The role of transient is to hope that the elementData array will not be serialized

Override the writeObject implementation:

private void writeObject(java.io.ObjectOutputStream s) throws java.io.IOException {
    int expectedModCount = modCount;
    s.defaultWriteObject();
    s.writeInt(elementData.length);
    for (int i = 0; i < size; i++)
        s.writeObject(elementData[i]);
    if (modCount != expectedModCount) {
        throw new ConcurrentModificationException();
    }
}

Sao Dai's understanding: From the source code, it can be observed that i<size is used instead of i<elementData.length when looping, indicating that only the elements actually stored are needed for serialization, not the entire array.

What is the difference between Array (array) and ArrayList?

  • Array can store basic data types and reference types, and ArrayList can only store reference types.
  • Array is specified with a fixed size, while the size of ArrayList is automatically expanded.
  • There are not as many built-in methods in Array as in ArrayList, such as addAll, removeAll, iteration and other methods only in ArrayList.
  • The elements stored in the Array array must be of the same data type; the objects stored in the ArrayList can be of different data types.

What is CopyOnWriteArrayList and what application scenarios can it be used for? What are the pros and cons?

What is CopyOnWrite?

A CopyOnWrite container is a copy-on-write container. The intuitive understanding is: when we add an element, we do not add it to the current container directly; we first copy the current container to create a new one, add the element to the new container, and then point the reference of the original container at the new container. The benefit is that the CopyOnWrite container can be read concurrently without locking, because the container being read never changes. The CopyOnWrite container is therefore also an application of the read-write-separation idea: reads and writes operate on different containers.

What is CopyOnWriteArrayList?

CopyOnWriteArrayList is a thread-safe collection. Its implementation creates a new array during each write operation, copies the elements of the original array into the new array, performs the write in the new array, and finally installs the new array in place of the original one.

The read operation is performed directly in the original array, so the read operation does not need to be synchronized, and the read and write separation can be realized. Since a new array is created for each write operation, CopyOnWriteArrayList is relatively slow for write operations, but very performant for read operations.

It should be noted that the iterator of CopyOnWriteArrayList is weakly consistent, that is, during the iteration process, if other threads modify the List, the iterator will not throw ConcurrentModificationException, but there is no guarantee that the iterator can traverse to all elements. The usage scenario of CopyOnWriteArrayList is suitable for scenarios with more reads and fewer writes.

Sao Dai's understanding: why can't the iterator be guaranteed to traverse all elements? Suppose a collection holds ten elements and five of them have already been traversed. If a new element is now inserted among the first five, the iterator will only continue forward from the fifth position, so a change made behind its current position is never visited.

What are the advantages of CopyOnWriteArrayList?

The advantages of CopyOnWriteArrayList mainly include the following points:

1. Thread safety: CopyOnWriteArrayList is a thread-safe implementation of List, which can be used in a multi-threaded environment without additional synchronization processing.

2. Separation of read and write: The read operation and write operation of CopyOnWriteArrayList are separated, and the read operation is directly performed in the original array without synchronization processing, so the performance of the read operation is very high.

3. Weakly consistent iterator: The iterator of CopyOnWriteArrayList is weakly consistent and can be modified during iteration without throwing ConcurrentModificationException.

4. Applicable to scenarios with more reads and fewer writes: Since each write operation needs to create a new array, it is suitable for scenarios with more reads and fewer writes. If the write operations are frequent, performance may decrease.

It should be noted that the write operation of CopyOnWriteArrayList is relatively slow, because each write operation needs to create a new array, so it is suitable for scenarios with more reads and fewer writes. If the frequency of read and write operations is equal, or if the write operation is more frequent, it may cause performance degradation. In addition, because each write operation will create a new array, it will take up more memory space, which needs to be selected according to specific scenarios and requirements.

What are the disadvantages of CopyOnWriteArrayList?

The disadvantages of CopyOnWriteArrayList mainly include the following points:

1. High memory usage: Because each write operation will create a new array, it will take up more memory space.

2. The performance of write operations is low: Since a new array needs to be created for each write operation, the performance of write operations is low, and it is suitable for scenarios with more reads and fewer writes.

3. Data consistency problem: the iterator of CopyOnWriteArrayList is weakly consistent; that is, if other threads modify the List during iteration, the iterator will not throw ConcurrentModificationException, but it is not guaranteed to traverse all elements.

4. Not suitable for scenarios with high real-time requirements: Since a new array needs to be created for each write operation, it is not suitable for scenarios with high real-time requirements, which may cause delays in write operations

How does CopyOnWriteArrayList ensure thread safety when writing?

CopyOnWriteArrayList uses the "Copy-On-Write" strategy to ensure thread safety when writing. When a thread needs to perform a write operation, CopyOnWriteArrayList first creates a new array and copies the elements of the original array into it. Writers themselves are serialized by an internal lock, so only the writing thread touches the new array while it is being built, and other threads can still read the original array without being affected. When the write operation completes, CopyOnWriteArrayList installs the new array in place of the original one, thus ensuring the thread safety of the write operation.
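A small sketch of the snapshot behaviour described above (class and method names are illustrative): the for-each loop iterates over the array that existed when the iterator was created, so elements added during the loop are not seen and no ConcurrentModificationException is thrown.

```java
import java.util.*;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowDemo {
    // The iterator works on the array snapshot taken when it was created,
    // so mutating during iteration does not throw ConcurrentModificationException.
    static List<String> snapshotSeen(CopyOnWriteArrayList<String> list) {
        List<String> seen = new ArrayList<>();
        for (String s : list) {
            seen.add(s);
            list.add(s + "'"); // safe: the write goes to a fresh copy of the backing array
        }
        return seen;
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<String> list =
                new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));
        System.out.println(snapshotSeen(list)); // [a, b] -- additions invisible to the snapshot
        System.out.println(list);               // [a, b, a', b']
    }
}
```

The same loop over a plain ArrayList would throw ConcurrentModificationException on the first add.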

Set

The difference between List and Set

Both List and Set are interfaces in the Java collection framework, and their main differences lie in the following aspects:

1. The order of the elements: the elements in a List are arranged in insertion order and can be accessed by index; the elements in a Set are generally unordered (as in HashSet; LinkedHashSet and TreeSet do maintain an order) and cannot be accessed by index.

2. The uniqueness of elements: elements in a List can be repeated, and multiple null elements can be inserted. Elements in a Set cannot be repeated, each element can appear only once, and at most one null element is allowed (implementations such as TreeSet reject null entirely); the uniqueness of the elements must be guaranteed.

3. Implementation method: Common implementation methods of List include ArrayList, LinkedList, etc.; and common implementation methods of Set include HashSet, TreeSet, LinkedHashSet, etc.

4. Application scenarios: List is suitable for scenarios that need to access elements in the order of insertion, such as maintaining an ordered list; and Set is suitable for scenarios that need to ensure the uniqueness of elements, such as deduplication, search, etc.

Sao Dai's understanding: If you need to access elements in the order of insertion, you can choose List; if you need to ensure the uniqueness of elements, you can choose Set.

Tell me about the implementation principle of HashSet?

The bottom layer of HashSet is actually a HashMap. The default constructor builds a HashMap with an initial capacity of 16 and a load factor of 0.75. All elements put into the HashSet are actually saved as keys of this internal HashMap, while every value is the same shared PRESENT, a static Object instance.

How does HashSet check for duplicates? How does HashSet ensure that data is not repeatable?

When an object is added to a HashSet (underneath, it is stored as a key of the HashMap), its hashCode method is called to obtain a hash value, which is checked for a hash conflict. If there is no conflict, the object is inserted directly. If there is a conflict, equals is called for a further judgment: if the two hashCode() return values are equal and equals also returns true, the added object is a duplicate, so the insertion is abandoned, which satisfies the Set property that elements do not repeat. If a hash conflict occurs but equals returns false, the newly added object is stored at another location. In this way the number of equals calls is greatly reduced, which greatly improves execution speed: without the hashCode function, every element in the set would have to be traversed and compared one by one with equals, which obviously requires many more equals calls.

Sao Dai’s understanding: the hashCode() methods of different objects may return the same value, which is the so-called hash conflict, but calling equals() on objects that are not equal must return false. Only when two objects have equal hashCode() return values and equals() also returns true are they considered the same element.
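A minimal sketch of the resulting Set behavior (class name is illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetDedupDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("java")); // true: first insertion succeeds
        System.out.println(set.add("java")); // false: duplicate detected via hashCode + equals
        System.out.println(set.size());      // 1
    }
}
```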

Relevant provisions of hashCode() and equals()

hashCode() and equals() are two methods defined in Object class, which are widely used in Java to judge whether objects are equal. When using these two methods, you need to pay attention to the following regulations:

1. If two objects are equal (the equals() method returns true), then their hashCode() values ​​must be equal. This is because the implementation of the hashCode() method usually relies on the content of the objects, and if two objects are equal, then their content should also be equal, and therefore their hashCode() values ​​should also be equal.

2. If the hashCode() values of two objects are equal, the objects are not necessarily equal (the equals() method may still return false). This is because the hashCode() method can suffer hash collisions: different objects may produce the same hashCode() value. Therefore, when hashCode() values are equal, the equals() method must still be called to determine whether the objects are truly equal.

3. If a class overrides the equals() method, it should also override the hashCode() method to keep the rules above valid. The hashCode() method in the Object class is computed from the object's address, so if hashCode() is not overridden, equal objects may end up with different hashCode() values, violating the first rule.

4. Conversely, if a class overrides the hashCode() method, it should also override the equals() method. Because of hash collisions, equal hashCode() values alone cannot prove two objects are equal, so equals() must be defined consistently with hashCode() to make the final judgment.

It should be noted that the implementation of hashCode() and equals() should ensure efficiency and correctness, and should not be too complicated or time-consuming. In addition, the return values ​​of the hashCode() method should be as evenly distributed as possible to improve the performance of the hash table.
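To make rules 3 and 4 concrete, here is a minimal sketch of a class that overrides both methods together (the `Point` class and its fields are hypothetical examples, not from the text above):

```java
import java.util.Objects;

public class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Equal objects must produce equal hash codes (rule 1)
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```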

The difference between HashSet and HashMap

Both HashSet and HashMap are implementation classes in the Java collection framework, and their main differences lie in the following aspects:

1. Storage method: HashSet stores a set of unique and unordered elements, while HashMap stores a set of key-value pairs.

2. Bottom layer implementation: HashSet is implemented on top of HashMap: it uses the HashMap to store elements, storing each element itself as a key, with a fixed shared Object as the value for every key; HashMap itself is implemented directly as a hash table.

3. The uniqueness of elements: the elements in HashSet are unique, and each element can only appear once; while the keys in HashMap are unique, but the values ​​can be repeated.

4. Access method: elements in HashSet cannot be accessed by index, but can only be accessed by iterator; while elements in HashMap can be accessed by key.

You need to choose the appropriate collection type according to your specific needs. If you need to store a set of unique, unordered elements, choose HashSet; if you need to store key-value pairs, choose HashMap; and if you need key-value pairs whose iteration order follows insertion order, choose LinkedHashMap.

How do TreeMap and TreeSet compare elements when sorting? How does the sort() method in the Collections tool class compare elements?

How do TreeMap and TreeSet compare elements when sorting?

TreeSet requires that the class of the stored object must implement the Comparable interface, which provides the compareTo() method for comparing elements. When an element is inserted, this method will be called back to compare the size of the element.

TreeMap requires that the stored key-value pair mapping keys must implement the Comparable interface so that elements can be sorted according to the keys.

Sao Dai's understanding: elements are compared by implementing the Comparable interface, which provides the compareTo() method for comparing elements; this method is called back when an element is inserted to compare element order.
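As a minimal sketch of this mechanism (the `Person` class and its fields are hypothetical), a class stored in a TreeSet implements Comparable and lets compareTo() decide the ordering:

```java
import java.util.TreeSet;

public class Person implements Comparable<Person> {
    final String name;
    final int age;

    Person(String name, int age) { this.name = name; this.age = age; }

    @Override
    public int compareTo(Person other) {
        return Integer.compare(this.age, other.age); // sort by age, ascending
    }

    public static void main(String[] args) {
        TreeSet<Person> set = new TreeSet<>();
        set.add(new Person("Tom", 30));
        set.add(new Person("Amy", 25));
        // compareTo() decides the iteration order: youngest first
        System.out.println(set.first().name); // Amy
    }
}
```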

How does the sort() method in the Collections tool class compare elements?

The sort() method of the Collections tool class has two overloaded forms:

  • The single-argument form requires that the objects stored in the container implement the Comparable interface, which provides the compareTo() method; sort() calls this method to compare the elements.
  • The two-argument form accepts elements that do not implement Comparable, but requires a second parameter of type Comparator (whose compare() method you override to define element comparison); in other words, you define a comparator, and sort() actually calls that comparator's compare() method to order the elements.
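A brief sketch of the two overloads (the word list and class name are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortDemo {
    public static void main(String[] args) {
        List<String> words = new ArrayList<>(List.of("banana", "fig", "apple"));

        // Overload 1: elements implement Comparable (String does), natural order
        Collections.sort(words);
        System.out.println(words); // [apple, banana, fig]

        // Overload 2: supply a Comparator; elements need not implement Comparable
        Collections.sort(words, Comparator.comparingInt(String::length));
        System.out.println(words); // [fig, apple, banana]
    }
}
```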


Queue

What is BlockingQueue?

BlockingQueue is an interface in Java concurrent programming. It extends the Queue interface and represents a queue that supports blocking, used to transfer data between threads. When the queue is full, enqueuing threads are blocked until space becomes available; when the queue is empty, dequeuing threads are blocked until the queue holds elements.

BlockingQueue is a thread-safe queue, which is used in a multi-threaded environment and has thread-safe characteristics.

The implementation of BlockingQueue usually relies on synchronization mechanisms such as locks and condition variables to ensure safe access from multiple threads. In a multi-threaded environment, multiple threads can read from and write to a BlockingQueue at the same time without thread-safety issues such as data races.

For example, when using ArrayBlockingQueue, both put() and take() methods need to acquire the lock first, and then wait on the condition variable or wake up other threads to ensure thread safety. When using LinkedBlockingQueue, different threads can read and write to the queue at the same time, because it uses two locks, one for enqueue operation and one for dequeue operation, to avoid competition and deadlock problems.

It should be noted that although BlockingQueue is a thread-safe queue, it still needs to pay attention to some details in actual use, such as queue capacity size, thread pool size, etc., to avoid performance problems or deadlock problems caused by improper use .

The BlockingQueue interface provides a variety of blocking methods, including put(), take(), offer(), poll(), etc., which can be used to implement producer-consumer models, thread pools and other concurrent scenarios.

The main features of the BlockingQueue interface are: when the queue is empty, the take() method will block and wait for the arrival of elements; when the queue is full, the put() method will block and wait for the queue to be free. This blocking waiting mechanism can avoid competition and deadlock problems between multiple threads, and improve the robustness and reliability of the program.
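A minimal producer-consumer sketch using ArrayBlockingQueue's blocking put()/take() (the class name and numbers are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 5; i++) {
                    queue.put(i); // blocks when the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int sum = 0;
        for (int i = 0; i < 5; i++) {
            sum += queue.take(); // blocks when the queue is empty
        }
        producer.join();
        System.out.println(sum); // 15
    }
}
```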

BlockingQueue mainly provides four types of methods, as shown in the following table:

| Operation | Throws exception | Returns special value | Blocks | Blocks with timeout |
| --- | --- | --- | --- | --- |
| Enqueue | add(e) | offer(e) | put(e) | offer(e, time, unit) |
| Dequeue | remove() | poll() | take() | poll(time, unit) |
| Examine head element | element() | peek() | not supported | not supported |

Apart from the throw-an-exception and return-a-special-value methods, which match the definitions of the Queue interface, BlockingQueue adds two kinds of blocking methods: one blocks indefinitely when the queue has no space/elements, until space/elements become available; the other attempts to enqueue/dequeue within a given time, and the waiting time can be customized.
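The behavioral differences among these method families can be sketched on a full queue of capacity 1 (the class name is illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueMethodFamiliesDemo {
    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<String> q = new ArrayBlockingQueue<>(1);
        q.add("a"); // succeeds: the queue still has room

        System.out.println(q.offer("b")); // false: full, returns a special value
        try {
            q.add("b");                   // full: throws
        } catch (IllegalStateException e) {
            System.out.println("add threw IllegalStateException");
        }
        // Timed offer: waits up to 10 ms for space, then gives up
        System.out.println(q.offer("b", 10, TimeUnit.MILLISECONDS)); // false
    }
}
```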

Main implementation classes

Common implementation classes of the BlockingQueue interface include ArrayBlockingQueue, LinkedBlockingQueue, SynchronousQueue, etc. Among them, ArrayBlockingQueue and LinkedBlockingQueue are blocking queues implemented based on arrays and linked lists. SynchronousQueue is a special blocking queue that does not store elements, but only synchronizes between producers and consumers.

| Implementation class | Function |
| --- | --- |
| ArrayBlockingQueue | An array-based blocking queue; it uses an array to store data and must be given a length at construction, so it is a bounded queue |
| LinkedBlockingQueue | A blocking queue based on a linked list; it is unbounded by default, but a maximum element count can be set through the capacity constructor argument, so it can also serve as a bounded queue |
| SynchronousQueue | A queue without buffering; data produced by the producer is handed directly to the consumer and consumed immediately |
| PriorityBlockingQueue | A priority-based blocking queue; the bottom layer is an array, and it is an unbounded queue |
| DelayQueue | A delay queue; its elements can only be dequeued after their specified delay time has elapsed |

Among them, ArrayBlockingQueue and LinkedBlockingQueue are more commonly used in daily development

What is the difference between poll() and remove() in Queue?

In the Queue interface, both poll() and remove() are methods used to obtain and remove the element at the head of the queue, but their behavior is slightly different.

1. poll() method: get and remove an element from the head of the queue, and return null if the queue is empty.

2. remove() method: get and remove an element from the head of the queue, and throw NoSuchElementException if the queue is empty.

In simple terms, the poll() method returns null when the queue is empty, and the remove() method throws an exception when the queue is empty.

Therefore, when using the Queue interface: if you are not sure whether the queue is empty, use the poll() method to obtain and remove the head element, getting null when the queue is empty; if you expect the queue to be non-empty (and want emptiness treated as an error), use the remove() method, which throws an exception when the queue is empty.

It should be noted that when using the remove() method, if the queue is empty, a NoSuchElementException will be thrown, so exception handling or try-catch statements are required to avoid program crashes.
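A minimal sketch of the two behaviors on an empty queue (ArrayDeque is used here only as a convenient non-blocking Queue implementation):

```java
import java.util.ArrayDeque;
import java.util.NoSuchElementException;
import java.util.Queue;

public class PollVsRemoveDemo {
    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();

        System.out.println(queue.poll()); // null: empty queue, no exception
        try {
            queue.remove();               // empty queue: throws
        } catch (NoSuchElementException e) {
            System.out.println("remove() threw NoSuchElementException");
        }
    }
}
```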

HashMap

HashMap finds the position of the bucket

jdk1.7

In JDK 1.7, the calculation method of HashMap's bucket position is different from JDK 1.8. Specifically, the calculation method of the bucket position of HashMap in JDK 1.7 is as follows:

1. First compute the hash value `h` of the key, starting from the `key.hashCode()` method (JDK 1.7 additionally perturbs this value with several shift and XOR operations in its `hash()` method).

2. The bucket position `i` is then calculated by the following formula: i = (n - 1) & h

   Among them, `n` represents the length of the `table` array. In JDK 1.7, as in 1.8, the length of `table` is always a power of 2, so `(n - 1) & h` is equivalent to `h % n` and guarantees that the value of `i` falls within the subscript range of `table`.

3. If there is already an element at the bucket position `i`, then the `equals()` method of the key will be used to compare whether the key is equal. If the keys are equal, the value corresponding to the key is replaced with the new value; if the keys are not equal, the key-value pair is inserted at the end of the linked list.

It should be noted that in JDK 1.7, the expansion mechanism of HashMap is different from that of JDK 1.8. Specifically, when the number of elements in the HashMap exceeds the product of the load factor and the capacity, the HashMap will double the capacity and redistribute all elements into new buckets. This approach may cause multiple keys to be mapped to the same bucket during expansion, thereby reducing the performance of HashMap.

jdk1.8

HashMap finds the position of the bucket in three steps

  • Find the hashcode of the key
  • Call the hash function to get the hash value, and perform XOR operation on the upper 16 bits and lower 16 bits of the hashcode.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); // >>> is the unsigned right shift operator
}
  • Use (length- 1) & hash to AND the hash value and length-1 to find the position of the bucket (the hash value is the value obtained by the above hash function, and length is the length of the table array)
if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);

Sao Dai understands: the hash value alone does not tell you where the element is stored in the array; the array position is obtained only after (length - 1) & hash, and the perturbation step exists to reduce hash conflicts.
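A small sketch replicating the two steps outside HashMap (the key string and table length are arbitrary). For a power-of-2 length, (length - 1) & hash equals the non-negative remainder:

```java
public class BucketIndexDemo {
    // Same perturbation as JDK 8's HashMap.hash(): XOR the high 16 bits into the low 16
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int tableLength = 16;               // default capacity, a power of 2
        int h = hash("example");
        int bucket = (tableLength - 1) & h; // keep only the low 4 bits

        // The bucket index always falls inside the table
        System.out.println(bucket >= 0 && bucket < tableLength); // true
        // For a power-of-2 length, (n - 1) & h equals the non-negative remainder
        System.out.println(bucket == Math.floorMod(h, tableLength)); // true
    }
}
```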

Tell me about the implementation principle of HashMap?

JDK1.7

The backbone of HashMap is an Entry array. Entry is the basic unit of HashMap, and each Entry holds one key-value pair (a Map is, in essence, a collection that saves mappings between pairs of objects), where both key and value are allowed to be null. These Entry objects are scattered across the array, and every element of the HashMap array is initially null.

Sao Dai understands: HashMap cannot guarantee the order of mapping, and the order of data after insertion cannot be guaranteed to remain unchanged (for example, the expansion operation before 1.8 will cause the order to change)

Entry is a static inner class in HashMap.

  static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next; // reference to the next Entry: singly linked list structure
        int hash; // result of hashing the key's hashCode, cached in the Entry to avoid recomputation

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
        // ... remaining methods (getKey, getValue, equals, hashCode, ...) omitted
    }

The overall structure of HashMap is as follows

Sao Dai’s understanding: In simple terms, HashMap is composed of array + linked list. The array is the main body of HashMap, and the linked list exists mainly to resolve hash conflicts. If the array position an element maps to holds no linked list (the Entry's next points to null), operations such as search and add are very fast, needing only one addressing step. If the position does hold a linked list, an add operation is O(n): the list is traversed first, and the value is overwritten if the key already exists, otherwise the new entry is added. A search likewise has to traverse the list, comparing each node's key one by one with the key object's equals() method. Therefore, for performance, the fewer (and shorter) the linked lists in a HashMap, the better.

JDK1.8

The underlying principle of HashMap in JDK 1.8 is implemented using the structure of array + linked list + red-black tree, mainly including the following aspects:

1. Array: HashMap internally maintains an array for storing key-value pairs. The length of the array is fixed and must be a power of 2.

2. Linked list: Each element in the array is the head node of a linked list, used to store key-value pairs that map to the same bucket. When the length of a linked list reaches the threshold of 8 (TREEIFY_THRESHOLD) and the array length is at least 64, the linked list is converted into a red-black tree to improve search efficiency.

3. Red-black tree: When the length of the linked list exceeds a certain threshold, HashMap will convert the linked list into a red-black tree to improve search efficiency. The red-black tree is a self-balancing binary search tree, and the time complexity of its search, insertion, deletion and other operations is O(log n).

4. Expansion: When the number of elements in the HashMap exceeds capacity × load factor (75% of the array length by default), the HashMap performs an expansion operation: the array length is doubled and the position of each element in the new array is recalculated. Because every element's position may change, this process is time-consuming.

5. Hash function: HashMap will use the hashCode() method of the key object to calculate the hash value of the key, and then calculate the position of the key-value pair in the array based on the hash value.

6. Thread safety: HashMap in JDK 1.8 is not thread safe. If multiple threads write to HashMap at the same time, data loss or infinite loop may occur. If you need to use HashMap in a multi-threaded environment, you can use ConcurrentHashMap.

What is the bucket of hashmap?

Buckets in HashMap refer to array elements that store key-value pairs, also known as hash buckets or hash table nodes.

Inside HashMap, key-value pairs are stored in an array, which is called a hash table. Each array element is a bucket, which can store one or more key-value pairs. When a key-value pair in HashMap is added to the hash table, it will calculate which bucket the key-value pair should be stored in according to the hash value of the key. If there are already one or more key-value pairs in the bucket, the new key-value pair will be added to the bucket and become an element in the bucket.

In order to solve the hash collision problem, each bucket of HashMap can store not only one key-value pair, but also multiple key-value pairs. When multiple key-value pairs have the same hash value, they will be added to the same bucket and become a linked list or red-black tree in this bucket. HashMap will determine which data structure to use to store key-value pairs based on the length of the linked list or the number of nodes in the red-black tree, thereby improving search efficiency.

Therefore, the bucket is the basic unit for storing key-value pairs in HashMap, and it is the core to realize the internal data structure of HashMap.

Sao Dai’s understanding: Simply put, the so-called bucket refers to the elements of the entry array, and each entry is a bucket
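A small demonstration that colliding keys still coexist inside one bucket ("Aa" and "BB" are a well-known String hashCode collision, both hashing to 2112):

```java
import java.util.HashMap;
import java.util.Map;

public class BucketChainDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        // "Aa" and "BB" share the same hashCode and land in the same bucket,
        // yet both entries survive because equals() tells them apart
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```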

What are the differences between the underlying implementation of HashMap in JDK1.7 and JDK1.8?

HashMap is a very commonly used data structure in Java. It can be used to store key-value pairs, and can quickly find the corresponding value according to the key. In JDK 1.7 and JDK 1.8, the underlying implementation of HashMap has the following differences:

1. The underlying arrays are initialized at different times. In JDK 1.7 the underlying array is created when the HashMap object is constructed, whereas in JDK 1.8 it is initialized lazily, on the first insertion (inside resize()).

2. The expansion mechanism is different. In JDK 1.7, when the number of elements exceeds 75% of the capacity, HashMap will perform a capacity expansion operation, that is, the capacity will be doubled, and the position of all elements will be recalculated. This expansion strategy may lead to a large number of elements in the same bucket, resulting in an overly long linked list and reduced query efficiency. In JDK 1.8, when the length of the linked list in a bucket is greater than or equal to 8, it will be judged whether the capacity of the current HashMap is greater than or equal to 64. If it is less than 64, it will double the capacity; if it is greater than or equal to 64, it will convert the linked list into a red-black tree to improve query efficiency.

3. The calculation method of the hash function is different. In JDK 1.7, HashMap's hash() function perturbs the key's hashCode() with multiple shift and XOR operations before the bucket index is computed. In JDK 1.8 the hash function is simplified and optimized: it calls the key's hashCode() and XORs the upper 16 bits with the lower 16 bits to obtain the final int hash value. Mixing the high bits into the low bits avoids conflicts that would arise if only the low-order bits took part in the index calculation, making the element distribution more even at lower cost.

4. The insertion position differs on hash conflict. In JDK 1.8, the tail insertion method is used: a new node is appended at the end of the linked list, so the order of nodes matches the insertion order, and expansion preserves the relative order of nodes within each list. In JDK 1.7, the head insertion method is used: a new node becomes the head of the list, so expansion reverses the order of each linked list. Under concurrent use, this head insertion during expansion can produce a circular linked list and an infinite loop, which is one reason JDK 1.8 switched to tail insertion.

5. In JDK 1.8, HashMap will detect whether it needs to be expanded after the element is inserted, that is, insert the element first, and then check whether it needs to be expanded. In JDK 1.7, HashMap will detect whether expansion is required before inserting elements, that is, first detect whether expansion is required, and then insert elements. The purpose of this is to avoid frequent expansion operations when inserting elements, thereby improving the efficiency of inserting elements.

The specific process of the put method of HashMap?

putVal method execution flow chart

①. Determine whether the array table is empty or length=0, if yes, execute resize() to expand;

②. Calculate the hash value according to the key value key to get the inserted array index i, if table[i]==null, directly create a new node to add, turn to ⑥, if table[i] is not empty, turn to ③;

③. Determine whether the first element of table[i] is the same as the key, if it is the same, directly overwrite the value, otherwise turn to ④, the same here refers to hashCode and equals;

④. Determine whether table[i] is a treeNode, that is, whether table[i] is a red-black tree, if it is a red-black tree, directly insert key-value pairs into the tree, otherwise turn to ⑤;

⑤. Traverse table[i] and judge whether the length of the linked list is greater than 8; if it is, convert the linked list into a red-black tree and insert the key-value pair in the tree, otherwise insert it into the linked list; if the key is found to already exist during the traversal, simply overwrite its value;

⑥. After the insertion is successful, judge whether the actual number of key-value pairs size exceeds the maximum capacity threshold. If it exceeds, expand the capacity.

How is the expansion operation of HashMap realized?

JDK1.7 expansion mechanism

When the number of elements in a HashMap exceeds array size × loadFactor (that is, HashMap.Size > Capacity × LoadFactor, where Capacity is the total length of the array, not the number of occupied slots, and LoadFactor is the load factor), the array is expanded. The default loadFactor is 0.75, a compromise value. By default the array size is 16, so once the number of elements exceeds 16 × 0.75 = 12 (this value is the threshold in the code, also called the critical value), the array is expanded to 2 × 16 = 32, i.e. doubled, and the position of each element in the array is then recalculated. The default initial capacity of HashMap is 16 with a load factor of 0.75; when we supply a custom initial capacity through a HashMap constructor, it is rounded up to the next power of 2.

When this load-factor rule is triggered, HashMap performs an expansion operation; the expansion process can be divided into two steps:

  • resize : Create a new Entry empty array with twice the length of the original array
  • transfer: elements in the old array are migrated to the new array

Expansion in HashMap is to call the resize() method

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // if the current array length has already reached the maximum, stop adjusting
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    // allocate the new array with the requested length
    Entry[] newTable = new Entry[newCapacity];
    // migrate the elements of the old array into the new one under the new rule
    transfer(newTable);
    table = newTable;
    // update the threshold (critical value)
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

As you can see from the code, if the old table length has already reached the upper limit, no further expansion happens; otherwise the table is expanded to twice the size of the original array.

The transfer method moves the Nodes of the original table into the new table. JDK 1.7 uses the head insertion method, which means the order of each linked list in the new table is the reverse of the old one. HashMap is not thread-safe, and under concurrent expansion this head insertion can produce a circular linked list.

// migrate elements from the old array into the new array
void transfer(Entry[] newTable) {
    // old array
    Entry[] src = table;
    // new array length
    int newCapacity = newTable.length;
    // iterate over the old array
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity); // index position in the new array
                e.next = newTable[i]; // head insertion: the newly moved entry becomes the list head, earlier data moves toward the tail
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}

The while loop implements the head insertion process.

JDK1.8 expansion

HashMap triggers expansion conditions

  • The default load factor of hashMap is 0.75, that is, if the number of elements in the hashmap exceeds 75% of the total capacity, expansion will be triggered
  • In JDK 1.8, when the length of the linked list in a bucket is greater than or equal to 8, it will judge whether the capacity of the current HashMap is greater than or equal to 64. If it is less than 64, it will double the capacity; if it is greater than or equal to 64, it will convert the linked list into a red-black tree to improve query efficiency.


Whether in JDK 7 or JDK 8, each expansion of HashMap doubles the original capacity and creates a new array, newTable. Expansion in JDK 1.8 is broadly similar to 1.7: all elements of the original array are placed into the new one; the difference lies in how each element's new bucket position is found.

In JDK 7, the bucket position is recalculated according to the three steps written above, but in the third step the length of the new array is used, that is, newCap - 1.

if (e.next == null)
    newTab[e.hash & (newCap - 1)] = e; // insert at the recomputed position

But in JDK 8, for the nodes of a linked list, the hash value is ANDed not with newCap - 1 but directly with the old array length oldCap.

if ((e.hash & oldCap) == 0) {
    newTab[j] = loHead;
} else {
    newTab[j + oldCap] = hiHead;
}
  • If the result of ANDing with oldCap is 0, the element's bucket position stays the same.
  • If the result is non-zero (i.e. oldCap), the new bucket position is the original position + the old array length (oldCap).

Sao Dai’s understanding: although JDK 8 recomputes positions by ANDing with the old array length rather than redoing the full modulo-style calculation of JDK 7, the two approaches are equivalent in essence: both ensure elements are spread evenly across the new array to reduce hash collisions.

Why is the load factor of HashMap 0.75?

The load factor is a ratio: the number of elements in the HashMap divided by its capacity. The larger the load factor, the fuller the table is allowed to become before expanding, which wastes less space but increases the probability of hash collisions; the smaller the load factor, the lower the collision probability, but the more space is wasted.

In JDK 1.8, the default load factor for HashMap is 0.75. This value is obtained through experiments and optimization. When the load factor is 0.75, the space utilization rate of HashMap is relatively high, and the probability of hash collision is relatively low, which can achieve better performance. If the load factor is set too small, it will cause HashMap to perform frequent capacity expansion operations and reduce performance; if the load factor is set too large, it will increase the probability of hash collisions and reduce performance.

It should be noted that in practical applications, the choice of load factor should be determined according to specific conditions. If there are few hash collisions in the application, the load factor can be appropriately increased to reduce space waste; if there are many hash collisions, the load factor can be appropriately reduced to improve performance.

How does HashMap resolve hash conflicts?

Briefly summarize what methods HashMap uses to effectively resolve hash conflicts

  • Use the chain address method (using a hash table) to link data with the same hash value;
  • Use a perturbation function (two perturbations: one shift and one XOR) to reduce the probability of hash collisions and make the data distribution more even;
  • The introduction of red-black tree further reduces the time complexity of traversal, making traversal faster;

What is a hash?

Hash, generally translated as "hashing", transforms an input of any length into a fixed-length output through a hash algorithm; the output is the hash value. This conversion is a compressive mapping: the space of hash values is usually much smaller than the space of inputs, and different inputs may hash to the same output, so the input cannot be uniquely determined from its hash value. Simply put, a hash function compresses a message of any length into a fixed-length message digest.

All hash functions have the following basic property:

If the hash values ​​calculated according to the same hash function are different, the input values ​​must also be different. However, if the hash values ​​calculated by the same hash function are the same, the input values ​​are not necessarily the same.

What are hash collisions?

When two different inputs produce the same hash value under the same hash function, we call it a hash collision.
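A concrete collision in Java: the strings "Aa" and "BB" are different inputs whose hashCode() values are both 2112 (since 65*31 + 97 = 66*31 + 66 = 2112), so they would initially land in the same HashMap bucket:

```java
public class HashCollisionDemo {
    public static void main(String[] args) {
        // classic String hashCode collision: 'A'*31 + 'a' == 'B'*31 + 'B' == 2112
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        System.out.println("Aa".equals("BB")); // false: equal hashes, unequal inputs
    }
}
```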

HashMap data structure

In Java, there are two relatively simple data structures for storing data: arrays and linked lists.

The characteristics of the array are: easy to address, difficult to insert and delete;

The characteristics of the linked list are: difficult to address, but easy to insert and delete;

So we combine arrays and linked lists to take advantage of their respective advantages, and use a method called chain address method to resolve hash conflicts:

In this way, objects with the same hash value can be organized into a linked list placed under the bucket for that hash value. However, compared with the int returned by hashCode(), the initial capacity of HashMap, DEFAULT_INITIAL_CAPACITY = 1 << 4 (i.e. 2 to the 4th power, 16), is far smaller than the range of int. If we simply took the remainder of the hashCode to pick the bucket, the probability of hash collisions would rise greatly; in the worst case the HashMap degenerates into a single linked list (every insertion collides). So the hashCode still needs to be optimized.

hash() function

The problem above arises mainly because taking the remainder of the hashCode lets only its low bits participate in the calculation; the high bits have no effect. The idea, then, is to let the high bits of the hashCode value participate as well, further reducing the probability of hash collisions and spreading the data more evenly. We call such an operation a perturbation. The hash() function in JDK 1.8 is as follows:

static final int hash(Object key) {
    int h;
    // XOR the hash with its own high 16 bits (high/low-bit mixing)
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

This is more concise than in JDK 1.7: where 1.7 used 4 bit operations and 5 XOR operations (9 perturbations in total), 1.8 needs only 1 bit operation and 1 XOR operation (2 perturbations).
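A minimal sketch of what the perturbation buys. The helper `perturb` mirrors the XOR-shift above, and the two hash values are contrived so they differ only in their high bits; without mixing, both would select bucket 0 of a 16-slot table:

```java
public class PerturbDemo {
    // same mixing as JDK 1.8 HashMap.hash(): XOR the hash with its own high 16 bits
    static int perturb(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int mask = 16 - 1;  // default table size 16
        int h1 = 65536;     // 0x10000: differs from h2 only in the high bits
        int h2 = 131072;    // 0x20000
        // without perturbation, both keys land in bucket 0
        System.out.println((h1 & mask) + " " + (h2 & mask));
        // with perturbation, the high bits influence the bucket index
        System.out.println((perturb(h1) & mask) + " " + (perturb(h2) & mask));
    }
}
```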

JDK1.8 adds red-black tree

Through the chain address method (the hash table) and the perturbation function above, we have made the data distribution more even and reduced hash collisions. However, when a HashMap holds a large amount of data and the linked list under some bucket grows to n elements, traversing that bucket costs O(n). To solve this, JDK 1.8 adds the red-black tree data structure to HashMap, which further reduces the traversal complexity to O(log n).

Briefly summarize what methods HashMap uses to effectively resolve hash conflicts

  • the chain address method (separate chaining on a hash table) to link entries with the same hash value;
  • two perturbations inside the hash() function (one unsigned right shift plus one XOR) to reduce the probability of hash collisions and spread the data more evenly;
  • red-black trees, introduced to cut the traversal time complexity and make lookups in long buckets faster.

Four common solutions to hash conflicts

  • Chain address method: Each unit of the hash table is used as the head node of the linked list, and all elements whose hash address is i form a synonym linked list. That is, when a conflict occurs, the keyword is linked at the end of the linked list with the unit as the head node.
  • Open addressing method: When a conflict occurs, look for the next empty hash address. As long as the hash table is large enough, an empty hash address can always be found.
  • Rehashing method: When a conflict occurs, the hash value is recalculated by other functions.
  • Establish a common overflow area: divide the hash table into a basic table and an overflow table, and when a conflict occurs, put the conflicting elements into the overflow table.

Can any class be used as the key of a Map?

In Java, any class can be used as the key of the Map, as long as the class correctly implements `hashCode()` and `equals()` methods. The `hashCode()` method is used to calculate the hash code of the object, and the `equals()` method is used to determine whether two objects are equal.

When using a custom class as the key of the Map, you need to ensure that the class correctly implements `hashCode()` and `equals()` methods. If these two methods are not implemented correctly, the elements in the HashMap may not be stored and retrieved correctly, and may even cause problems such as an infinite loop in the HashMap.

When implementing `hashCode()` and `equals()` methods, the following principles need to be followed:

- If two objects are equal, their hash codes must be equal.
- If two objects have equal hash codes, they are not necessarily equal.
- The `equals()` method must satisfy reflexivity, symmetry, transitivity, and consistency.

It should be noted that if a mutable object is used as the key of the Map, when the object changes, its hash code will also change. This may cause the object to not be correctly retrieved from the HashMap. Therefore, it is generally recommended to use immutable objects as the key of the Map.
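A hedged sketch of that failure mode, using a hypothetical MutableKey class whose hashCode() depends on a mutable field; after the mutation, the lookup probes the wrong bucket while the entry stays stranded in the old one:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyDemo {
    // hypothetical mutable key: hashCode depends on a field that can change
    static class MutableKey {
        int id;
        MutableKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id == id;
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");
        key.id = 2;                        // mutate after insertion: hash code changes
        System.out.println(map.get(key));  // null: the lookup probes a different bucket
        System.out.println(map.size());    // 1: the entry is stranded, not removed
    }
}
```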

Why are wrapper classes such as String and Integer in HashMap suitable as Key?

Wrapper classes such as String and Integer guarantee the immutability and accuracy of hash value calculation, which effectively reduces the probability of hash collisions:

1. They are final and immutable, so the key can never change and the same key never yields different hash values.

2. They already override equals(), hashCode(), and related methods in line with HashMap's internal conventions, so hash calculation errors are unlikely.

What should I do if I use a custom object as the Key of HashMap?

Both hashCode() and equals() must be overridden, and overriding them together guarantees that the key is unique when a custom object is used as a HashMap key. Suppose neither method is overridden and two custom objects with identical content are stored. Object's hashCode() is called first; since it is based on the object's reference address, the two objects (having different addresses) produce different hash codes and are judged to be different objects, so both are stored in the HashMap as keys, which violates key uniqueness. The reasoning is the same as for Set below.

Why must the hashCode method be overridden when equals is overridden?

If only the equals method is overridden, then when two custom objects with the same content are stored in a Set, the Set first checks during deduplication whether the two objects' hash codes are the same. Because hashCode was not overridden, the hashCode method in Object runs directly; it compares based on reference address, so the two objects (with different addresses) yield different hash values. The equals method is therefore never executed, the objects are judged unequal, and two objects with identical content are both inserted into the Set.

However, if hashCode is overridden along with equals, the overridden hashCode method runs during the check. Now the comparison is based on the hash of the objects' fields, so the two hash codes come out equal; equals is then called, finds the two objects are indeed equal, and returns true. The Set therefore does not store two identical pieces of data, and the program behaves correctly.

Summarize

hashCode and equals are used together to judge whether two objects are equal; this two-step scheme exists to speed up insertion and lookup. If equals is overridden without hashCode, then in some scenarios, for example storing two equal custom objects in a Set, the program misbehaves. To guarantee correct execution, always override hashCode whenever you override equals.
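A minimal sketch with HashSet, using a hypothetical Person class that overrides both methods; with both in place, the hash codes match, equals then confirms equality, and the duplicate is rejected:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class EqualsHashCodeDemo {
    // hypothetical value class that overrides BOTH equals() and hashCode()
    static class Person {
        final String name;
        Person(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Person && ((Person) o).name.equals(name);
        }
        @Override public int hashCode() { return Objects.hashCode(name); }
    }

    public static void main(String[] args) {
        Set<Person> set = new HashSet<>();
        set.add(new Person("Tom"));
        set.add(new Person("Tom")); // same content: hash matches, then equals returns true
        System.out.println(set.size()); // 1: the duplicate is not stored
    }
}
```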

Why doesn't HashMap directly use the hash value processed by hashCode() as the subscript of the table?

In theory, the hash value produced by `hashCode()` could be used directly as the subscript of the `table` array, but different keys may produce the same hash value and be mapped to the same `table` subscript, i.e. a hash conflict, which hurts the performance of HashMap.

To solve this problem, HashMap uses a solution called "zipper method". Specifically, each bucket in the HashMap is a linked list. When multiple keys are mapped to the same bucket, they will be stored in the linked list corresponding to the bucket. When searching, HashMap will first find the corresponding bucket according to the hash value of the key, and then look up the key-value pair in the linked list corresponding to the bucket.

Of course, if the linked list is too long, it will affect the performance of HashMap. Therefore, in JDK 8, when the length of the linked list exceeds a certain threshold (8 by default), HashMap will convert the linked list into a red-black tree to improve search efficiency. This approach can guarantee the performance of HashMap in most cases.

Why is the length of HashMap a power of 2?

Because this can make the operation of calculating the bucket position more efficient.

Specifically, if the length of the HashMap is a power of 2, bit operations can be used instead of division operations when calculating bucket positions, thereby increasing the calculation speed.

We all know that to find which slot of the hash table a KEY belongs to, we need to compute hash(KEY) % length.

For efficient access, HashMap must minimize collisions and distribute data evenly, and the index is actually computed as hash(KEY) & (length - 1). & is a bitwise AND, and bit operations are very fast in hardware, while the % calculation is much slower; that is why % is not used. For & to give the same result as %, the length must have 1 subtracted from it, and the equivalence holds only when length is a power of 2 (and the hash is non-negative):

hash(KEY) & (length - 1) == hash(KEY) % length

Since the two calculations give the same result, the faster & is naturally used.

I did a little experiment:

Assume now the length of the array: 2^14 = 16384

The hash value of String key = "zZ1!." is 115398910

public static void main(String[] args) {
    String key = "zZ1!.";
    System.out.println(key.hashCode()); // 115398910
}

hash & (length - 1) = 115398910 & 16383 = 6398 (you can verify this with a calculator); the binary of 6398 is 0001100011111110

hash % length = 115398910 % 16384 = 6398

This can greatly increase the calculation speed, because the bitwise AND operation is much faster than the modulo operation.
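The equivalence can be checked directly for the numbers above; note that it holds only because 16384 is a power of two and this hash happens to be non-negative:

```java
public class IndexDemo {
    public static void main(String[] args) {
        int length = 16384;            // 2^14, a power of two
        int hash = "zZ1!.".hashCode(); // 115398910, as in the experiment above
        // for power-of-two lengths, (hash & (length - 1)) == (hash % length)
        System.out.println(hash & (length - 1)); // 6398
        System.out.println(hash % length);       // 6398
    }
}
```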

Also, power-of-two lengths interact well with collision handling: because length - 1 is a mask whose low bits are all 1s, every bit of the (perturbed) hash participates in selecting the bucket, so keys are spread evenly across all buckets. If the length were not a power of 2, some bucket indices could never be produced by the & operation while others would be produced more often, leaving some buckets with disproportionately long linked lists and hurting search efficiency.

Therefore, in order to improve the performance of HashMap, it is generally recommended to set the length to a power of 2.

Sao Dai understands: the "at most 8 elements per bucket" intuition actually comes from treeification, not from the power of 2. The implementation of HashMap in JDK 1.8 uses a "tree-based" strategy: when the linked list in a bucket grows beyond 8 elements, it is converted into a red-black tree to improve search efficiency.

The prerequisite for treeification is that the chain length in the bucket exceeds the threshold (8 by default). Separately, if the array length is a power of 2, then for any hash value the result of the bitwise AND with (length - 1) must be less than the array length. For example, with an array length of 16, ANDing any hash value with 15 always yields a value between 0 and 15.

Therefore, a power-of-two length guarantees that hash & (length - 1) is always a valid bucket subscript and, combined with the perturbation function, that the buckets are used uniformly; the bound on chain length comes from treeification.

It should be noted that this behavior is specific to the JDK 1.8 implementation of HashMap; other versions or other hash table implementations may not have this property.

So why two perturbations?

Answer: this is to increase the randomness of the low bits of the hash value and make the distribution more uniform, thereby improving the randomness and uniformity of the array subscripts actually used for storage, and ultimately reducing hash conflicts. Two perturbations are enough: at that point both the high and low bits have already participated in the calculation.

What is the difference between HashMap and HashTable?

  • Thread safety: HashMap is not thread-safe, while Hashtable is; Hashtable's internal methods are essentially all modified by synchronized. (If you need thread safety, use ConcurrentHashMap!)
  • Efficiency: because it forgoes synchronization, HashMap is somewhat more efficient than Hashtable. Also, Hashtable is essentially obsolete; do not use it in new code.
  • Support for null keys and values: in HashMap, null can be used as a key (at most one such key can exist), and any number of keys can map to a null value. Hashtable throws NullPointerException as soon as a null key or value is passed to put.
  • Initial capacity and expansion mechanism: Hashtable's initial capacity is 11, and it expands to 2n + 1; HashMap's initial capacity is 16, it expands to 2n, and its capacity after expansion is always a power of 2. Both expand when the element count exceeds 0.75 times the capacity.
  • Underlying data structure: since JDK 1.8, HashMap has changed substantially in how it resolves hash conflicts: when a chain's length exceeds the threshold (8 by default), the linked list is converted to a red-black tree to reduce search time. Hashtable has no such mechanism.

Sao Dai's understanding: As you can see in the class comments of Hashtable, Hashtable is a reserved class and is not recommended to be used. It is recommended to use HashMap in a single-threaded environment instead, and use ConcurrentHashMap instead if multi-threaded use is required.
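A small sketch of the null-key difference between the two classes:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "ok");   // HashMap allows one null key
        hashMap.put("k", null);    // and any number of null values
        System.out.println(hashMap.get(null)); // ok

        Map<String, String> hashtable = new Hashtable<>();
        try {
            hashtable.put(null, "boom"); // Hashtable rejects null keys (and null values)
        } catch (NullPointerException e) {
            System.out.println("Hashtable threw NullPointerException");
        }
    }
}
```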

How to decide to use HashMap or TreeMap?

When choosing HashMap or TreeMap, you need to decide according to specific needs and scenarios.

1. Lookup efficiency: for plain key lookups, HashMap is faster; its hash table gives O(1) average-case lookups, while TreeMap's red-black tree gives O(log n). Choose TreeMap when you also need ordered traversal or range queries over the keys.

2. Insertion and deletion efficiency: HashMap also wins here, with O(1) average-case insertion and deletion versus TreeMap's O(log n).

3. Memory usage: if you operate on a large amount of data and memory matters, HashMap is usually preferable; it stores entries in an array of buckets and generally occupies less memory per entry than TreeMap's tree nodes.

4. Element order: if the elements must be kept sorted, you must choose TreeMap; its red-black tree guarantees the key order.

To sum up, choose TreeMap when key ordering or range queries matter; choose HashMap for raw lookup, insertion, and deletion speed and lower memory use. If you need a predictable iteration order (insertion or access order) together with hash-table performance, consider LinkedHashMap, which is implemented on a hash table plus a doubly linked list.

Knowledge charging station

  • The key of TreeMap<K,V> must implement java.lang.Comparable (or a Comparator must be supplied), so iterating a TreeMap yields keys in ascending order by default. TreeMap is implemented on a red-black tree structure and suits traversing keys in natural or custom order.
  • The key of HashMap<K,V> relies on hashCode() for a uniform, hashed distribution, and sorting is not supported. The data structure is buckets (an array) holding linked lists or red-black trees; it suits inserting, deleting, and locating elements in a Map.
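A small sketch of the ordering difference (the map contents are illustrative): TreeMap iterates keys in sorted order, while HashMap makes no ordering guarantee.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> tree = new TreeMap<>();
        tree.put("banana", 2);
        tree.put("apple", 1);
        tree.put("cherry", 3);
        // TreeMap iterates in ascending key order
        // (keys must be Comparable, or a Comparator must be supplied)
        System.out.println(tree.keySet()); // [apple, banana, cherry]

        Map<String, Integer> hash = new HashMap<>(tree);
        // HashMap iteration order depends on hash buckets and is not guaranteed
        System.out.println(hash.keySet());
    }
}
```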

Is HashMap thread safe? why?

HashMap is not thread safe!

Thread insecurity shows up differently by JDK version.

In JDK 1.7, expansion under multiple threads can cause infinite loops or data loss. The root cause is that the transfer method used during expansion moves nodes with head insertion, which reverses the order of the linked list; that reversal is the key to forming the infinite loop.

In JDK 1.8, concurrent puts can overwrite each other's data. Suppose threads A and B are both performing put operations (put on a HashMap actually calls putVal()), and the insertion subscripts computed from their hashes are the same. Thread A executes the statement in putVal() that checks whether a hash collision occurred (the `if ((p = tab[i = (n - 1) & hash]) == null)` line in the source below), finds the slot empty, and is then suspended because its time slice is exhausted: the check's result is kept, but the insertion that follows has not yet run. Thread B then gets the time slice and also calls putVal(); since the computed subscript is the same and A never inserted, B's check also finds no collision, and B inserts its element normally. When thread A later regains the time slice, it does not repeat the check it already performed: it inserts directly, overwriting the data thread B inserted. Hence HashMap is not thread-safe.

reason:

  • HashMap thread insecurity in JDK1.7 is reflected in multi-threaded expansion leading to infinite loop and data loss
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];  // thread A is suspended after this line because its time slice is exhausted
            newTable[i] = e;
            e = next;
        }
    }
}

The expansion of HashMap is achieved by calling the above transfer method. After expansion, the elements must be transferred to the new array by using the head insertion method. The head insertion method will reverse the order of the linked list, which is also the key point of forming an infinite loop.

The above code mainly looks at the following four sentences of code

// record the next node
Entry<K,V> next = e.next;
// the next three lines implement head insertion
e.next = newTable[i];
newTable[i] = e;
e = next;

Simulating capacity expansion causes infinite loop and data loss

Assume that there are now two threads A and B performing expansion operations on the following HashMap at the same time:

The result after normal expansion is as follows:

But when thread A finishes executing e.next = newTable[i] (line 10 of the transfer function above), its CPU time slice is exhausted and thread A is suspended, as shown in the figure below:

Analysis: in thread A, e = 3, next = 7, e.next = null. In Figure 1 the first node in the list is 3 and the next is 7, so on the first pass through transfer, e = 3 and next = 7. By Figure 1, e.next should still be 7, but thread A has executed e.next = newTable[i], and newTable[i] is a slot of the newly allocated array, which is null; so after that statement, e.next = null.

After thread A is suspended, thread B executes normally after getting the time slice, and completes the resize expansion operation. The results are as follows:

After thread B completes the expansion, the newTable and table in the main memory are the latest

That is to say, in main memory 7.next = 3 and 3.next = null. (Here you might wonder why, after thread B expanded, thread A just carries on: thread A does not know at all that thread B has expanded, and it does not care; it simply resumes the code it had not yet executed. But because it continues expanding on the basis of the data thread B left in memory, thread A produces the data loss and infinite loop described below.)

Then thread A gets the CPU time slice and continues to execute newTable[i] = e and e = next. Thread A's data before suspension was e = 3, next = 7, e.next = null; after executing these two statements, newTable[3] = 3 and e = 7. The situation of thread A after this round of the loop is as follows:

Analysis: newTable[3] = 3, e = 7 (newTable[3] = 3 can be read as a pointer: the array slot newTable[3] now points at node 3; likewise e is a pointer that moves from node 3 to node 7).

Then the next iteration runs. Now e = 7; reading e.next from main memory finds 7.next = 3, so next = 3, and 7 is put into the new array by head insertion. The loop continues:

// at this point e = 7; in memory 7.next = 3, 3.next = null, newTable[3] = 3
// The JMM specifies that all variables live in main memory, and each thread has its own
// working memory holding copies of the variables it uses. A thread must read and write
// variables in its working memory, never directly in main memory, and one thread's
// working memory cannot be accessed directly by another thread: all sharing goes
// through main memory.
Entry<K,V> next = e.next;  // -------> next = 7.next = 3
// the next three lines implement head insertion
e.next = newTable[i];      // -------> e.next = 3 (note e is still 7: this points 7 at 3)
newTable[i] = e;           // -------> newTable[3] = e = 7
e = next;                  // -------> e = next = 3

Note: JMM stipulates that all variables are stored in the main memory (Main Memory), each thread has its own working memory (Work Memory), and the working memory of the thread saves the variables used by the thread from the main memory. A copy of the copy in . Threads must read and write variables in the working memory, but cannot directly read and write variables in the main memory. At the same time, variables in the working memory of this thread cannot be directly accessed by other threads, and must be completed through the main memory.

Therefore, after thread B finishes executing, its working memory holds its results 7.next = 3 and 3.next = null; these are synchronized to main memory, and thread A later copies them from main memory into its own working memory.

After executing this round of loop, the result is as follows

Note: e = 3 with 3.next = null is the state left in main memory after thread B executed.

Analysis: the only thing to watch here is that when e.next = newTable[i] executed above, the result was e.next = 3, and e was 7, not 3! That statement links 7 to 3, pointing from 7 at 3. (At first I took e.next = 3 as the data basis for the next iteration and produced the diagram below; in fact, when e = 3, the basis should be e.next = null.)

This is wrong!

In the last round, next = 3 and e = 3; executing the next round we find 3.next = null, so this round is the final one.

// at this point e = 3; in memory next = 3, newTable[3] = 7, e = 3, e.next = null
// any thread's results are written back to main memory, which is why concurrency breaks:
// the data thread A now reads from memory was produced by thread B, not what A stored earlier
Entry<K,V> next = e.next;  // -------> next = null
// the next three lines implement head insertion
e.next = newTable[i];      // -------> e.next = 7 (this forms the ring: 3 -> 7 and 7 -> 3)
newTable[i] = e;           // -------> newTable[3] = 3
e = next;                  // -------> e = null, so the while loop exits

After executing this round of loop, the result is as follows

After the loop above, e = null, so no further iteration occurs, and the expansion operations of threads A and B are both complete. Clearly, after thread A finishes, a ring structure (3.next = 7 and 7.next = 3) has appeared in the HashMap, and any later traversal of that bucket will loop forever. And as the figure above shows, element 5 was inexplicably lost during the expansion: data loss.

  • HashMap thread insecurity in JDK1.8 is mainly reflected in data coverage

Suppose threads A and B are both performing put operations (put on a HashMap actually calls putVal()), and the insertion subscripts computed from their hashes are the same. Thread A executes the statement in putVal() that checks whether a hash collision occurred (the `if ((p = tab[i = (n - 1) & hash]) == null)` line in the source below), finds the slot empty, and is suspended when its time slice runs out: the check's result is kept, but the insertion that follows has not yet executed. Thread B then gets the time slice and also calls putVal(); since the computed subscript is the same and A never inserted, B's check also finds no collision, and B inserts its element normally, completing the insertion. When thread A regains the time slice, it does not repeat the collision check it already performed; it inserts directly, overwriting the data thread B inserted. Hence thread-unsafe.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}


final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab;
    Node<K,V> p;
    int n, i;
    // if the map currently holds no data, execute resize() and record the new length n
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // check whether a hash collision occurred; if not, insert directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // otherwise there is already an element in this slot
    else {
        Node<K,V> e; K k;
        // if that element's key equals the key being inserted, just replace the value
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 1. if the current node is a TreeNode, execute putTreeVal
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // otherwise traverse this chain, much as in JDK 7
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // 2. after inserting, check whether treeifyBin should run
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // 3.
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // check the threshold and decide whether to resize
    if (++size > threshold)
        resize();
    // 4.
    afterNodeInsertion(evict);
    return null;
}

There are usually 4 ways to solve hash conflicts

(1) Open addressing (its most common form is linear probing): starting from the position where the conflict occurs, search the hash table in a fixed order for a free slot and store the conflicting element there. ThreadLocal uses linear probing to resolve hash conflicts.

a situation like this

A key=name is stored at index 1 of the hash table. When key=hobby is added again, the index obtained by hash calculation is also 1, which is a hash conflict.

With open addressing, we search to the right in order for a free slot to hold the conflicting key: if the position to the right of the conflict is empty, the key is stored there; if it is occupied (still a conflict), we keep moving one position to the right until the key can be stored. In the example above, after the conflict the key should go to index 2; there is no data at index 2, so it is stored there directly. Had index 2 held data or conflicted, we would move on to index 3, and so on, until the key is stored in the hash table.

(2) Chained addressing, a very common method: simply put, keys with hash conflicts are stored in a singly linked list. HashMap, for example, is implemented with chained addressing.

a situation like this

Conflicting keys are directly stored in a one-way linked list.

(3) Re-hashing: when the key computed by one hash function conflicts, another hash function is used to hash the key, repeating until no conflict occurs. This method increases calculation time and has a large impact on performance.

(4) Establishing a common overflow area: the hash table is divided into two parts, a basic table and an overflow table, and all conflicting elements are put into the overflow table.
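A toy sketch of method (1), linear probing, under stated simplifications (fixed table size, no resizing or deletion; the class and method names are hypothetical):

```java
public class LinearProbingDemo {
    // a toy open-addressing table: on collision, scan right for the next free slot
    static final int SIZE = 8;
    static String[] keys = new String[SIZE];

    static void put(String key) {
        int i = Math.abs(key.hashCode()) % SIZE;
        // probe the next slot (wrapping around) until a free or matching slot is found;
        // assumes the table never fills up completely
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % SIZE;
        }
        keys[i] = key;
    }

    static int indexOf(String key) {
        int i = Math.abs(key.hashCode()) % SIZE;
        while (keys[i] != null) {
            if (keys[i].equals(key)) return i;
            i = (i + 1) % SIZE;
        }
        return -1; // hit an empty slot: the key is absent
    }

    public static void main(String[] args) {
        put("name");
        put("hobby");
        System.out.println(indexOf("name") + " " + indexOf("hobby"));
    }
}
```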

Supplement: since JDK 1.8, HashMap resolves hash conflicts with chained addressing plus red-black trees. The red-black tree exists to fix the lookup cost of an overly long linked list: when the length of a list exceeds 8 and the capacity of the hash table is at least 64, adding another element to that list triggers conversion of the list into a tree.

HashMap is not thread-safe, what if you want to ensure thread safety? What can be used?

(1) Use HashTable (not recommended)

private Map<String, Object> map = new Hashtable<>();

Looking at the Hashtable source, its get/put methods are all modified with the synchronized keyword. This modifier means the method holds a synchronization lock: when any thread enters the method, it must first acquire the table's monitor, and if another thread is already inside a synchronized method of the same table, the current thread has to wait until that thread finishes.

Because of this, Hashtable's thread safety is method-level blocking on a single shared lock: only one thread can execute get or put at any moment, and a get and a put cannot run at the same time. This synchronized collection is therefore very inefficient and generally not recommended.
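The synchronized modifier on these methods can actually be observed through reflection, a small check using only the JDK's standard reflection API:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Hashtable;

// Confirm via reflection that Hashtable.get/put are declared synchronized.
public class HashtableSyncDemo {
    public static void main(String[] args) throws Exception {
        Method put = Hashtable.class.getMethod("put", Object.class, Object.class);
        Method get = Hashtable.class.getMethod("get", Object.class);
        System.out.println(Modifier.isSynchronized(put.getModifiers())); // true
        System.out.println(Modifier.isSynchronized(get.getModifiers())); // true
    }
}
```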

(2) Use SynchronizedMap (not recommended)

private Map<String, Object> map = Collections.synchronizedMap(new HashMap<>());

Here we use the Collections utility class directly: Collections.synchronizedMap(new HashMap<>()) returns a SynchronizedMap object that wraps the HashMap passed in.

The implementation of this synchronization is also fairly simple. As the source below shows, SynchronizedMap guards every operation with an object lock (Hashtable uses synchronized methods; SynchronizedMap synchronizes on a mutex object). Every operation on the wrapped HashMap must first acquire this mutex, so performance is no better than Hashtable, and it is likewise not recommended.

    public static <K,V> Map<K,V> synchronizedMap(Map<K,V> m) {
        return new SynchronizedMap<>(m);
    }

    private static class SynchronizedMap<K,V>
            implements Map<K,V>, Serializable {
        private static final long serialVersionUID = 1978198479659022715L;

        private final Map<K,V> m;  // Backing Map
        final Object mutex;        // Object on which to synchronize

        SynchronizedMap(Map<K,V> m) {
            this.m = Objects.requireNonNull(m);
            mutex = this;
        }

        ...

        // SynchronizedMap's put still delegates to the backing HashMap's own put
        public V put(K key, V value) {
            synchronized (mutex) {return m.put(key, value);}
        }
    }
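A short usage sketch: the wrapper's internal mutex covers individual calls but not a whole traversal, so the Collections.synchronizedMap Javadoc requires callers to synchronize on the wrapper themselves while iterating:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncMapDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());
        map.put("a", 1);
        map.put("b", 2);
        // Individual get/put calls are synchronized internally, but iteration
        // is not; the Javadoc requires manual synchronization on the wrapper:
        synchronized (map) {
            for (String k : map.keySet()) {
                System.out.println(k + "=" + map.get(k));
            }
        }
    }
}
```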

(3) ConcurrentHashMap (recommended)

private Map<String, Object> map = new ConcurrentHashMap<>();

This implementation is the most complex, but it is also the most efficient and the recommended thread-safe Map, and each JDK version implements it differently. Before JDK 8 it used segment locking, dividing the table into 16 segments and locking only one segment at a time; JDK 8 dropped the segments and implements it with CAS plus synchronized, and also added red-black trees.
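A quick demonstration of why ConcurrentHashMap is the recommended choice: its atomic compound operations (merge here) let multiple threads update the same key without losing writes, with no external locking:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentMapDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                map.merge("count", 1, Integer::sum); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(map.get("count")); // 2000, no updates lost
    }
}
```

Doing the same with a plain HashMap and `map.put("count", map.get("count") + 1)` would lose updates (and could even corrupt the table), which is exactly the hidden bug the text warns about.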

ConcurrentHashMap

What is ConcurrentHashMap? What is its implementation principle?

In a multi-threaded environment, put operations on a HashMap may lose data. To avoid such hidden bugs, it is strongly recommended to use ConcurrentHashMap instead of HashMap.

Hashtable is a thread-safe class that uses synchronized to lock the entire hash table: every operation locks the whole table for exclusive use, so all reads and writes compete for one lock, which makes it very inefficient.

ConcurrentHashMap can read data without locking, and its internal structure keeps the lock granularity as small as possible for write operations, allowing multiple modifications to proceed concurrently. The key is lock segmentation: multiple locks control modifications to different parts of the hash table. In the JDK 1.7 implementation, these parts are represented internally by Segments; each Segment is essentially a small Hashtable with its own lock, so modifications can proceed concurrently as long as they occur on different Segments. The JDK 1.8 implementation reduces the granularity further: JDK 1.7 locks a whole Segment containing multiple HashEntry buckets, while JDK 1.8 locks a single HashEntry (the head node of a bucket).

Implementation principle

JDK1.7

The ConcurrentHashMap in JDK 1.7 is composed of a Segment array plus HashEntry linked-list arrays: the hash bucket array is divided into Segments, and each Segment contains n buckets (HashEntry lists). Thread safety is achieved through segment locking.

Locating an element in this ConcurrentHashMap requires two hash operations: the first hash locates the Segment, and the second locates the head of the linked list where the element lives. The side effect is that the hashing process is longer than in an ordinary HashMap, but the benefit is that a write only locks the Segment the element belongs to and does not affect the other Segments. In the best case, ConcurrentHashMap can support as many simultaneous writes as there are Segments (when the writes happen to be evenly distributed across all Segments), so this structure greatly improves ConcurrentHashMap's concurrency.

As shown in the figure below, the data is divided into segments, and each segment is assigned its own lock. While one thread holds the lock on one segment, the data in the other segments can still be accessed by other threads, achieving true concurrent access.

Segment is an internal class of ConcurrentHashMap, the main components are as follows:

Segment extends ReentrantLock, so a Segment is a reentrant lock that plays the role of the lock. There are 16 Segments by default, i.e., the default concurrency level is 16.

HashEntry, which stores the elements, is also a static inner class, with the following main components:

The value field and the next pointer of HashEntry are declared volatile, which guarantees the visibility of reads in a multi-threaded environment!

Why use a secondary hash?

The main reason is to construct separate locks, so that modifying the map does not lock the whole container, improving concurrency. Of course, nothing is perfect: the cost of the second hash is that the whole hashing process is longer than HashMap's single hash, so if you are not in a concurrent situation, do not use ConcurrentHashMap.

Up to and including Java 7, ConcurrentHashMap mainly used the segment locking mechanism: operating on a Segment locked that Segment and blocked other non-query operations on it. From Java 8 onward, a CAS lock-free algorithm is used as well: this optimistic operation checks the current value before completing and executes only if it matches the expected result, which optimizes concurrent operations nicely.

JDK1.8

In terms of data structure, the JDK 1.8 ConcurrentHashMap adopts the same Node array + linked list + red-black tree structure as HashMap; in terms of locking, the original Segment lock is abandoned in favor of CAS + synchronized, achieving finer-grained locks.

The lock is controlled at the level of individual hash bucket elements: only the head node of a linked list or the root node of a red-black tree needs to be locked, so reads and writes on the other buckets are unaffected, which greatly improves concurrency.
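A heavily simplified sketch of this JDK 8 locking pattern: CAS into an empty bucket, otherwise synchronize on that bucket's head node only. The Node class, table size, and hash below are toy stand-ins for illustration, not the actual JDK source:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class BucketLockSketch {
    static final class Node {
        final String key; volatile String val; volatile Node next;
        Node(String k, String v) { key = k; val = v; }
    }

    final AtomicReferenceArray<Node> table = new AtomicReferenceArray<>(16);

    public void put(String key, String val) {
        int i = (key.hashCode() & 0x7fffffff) % table.length();
        for (;;) {
            Node head = table.get(i);
            if (head == null) {
                // Empty bucket: try a lock-free CAS insert.
                if (table.compareAndSet(i, null, new Node(key, val))) return;
            } else {
                synchronized (head) {                 // lock only this bucket's head
                    if (table.get(i) != head) continue; // head changed, retry
                    for (Node n = head; ; n = n.next) {
                        if (n.key.equals(key)) { n.val = val; return; }
                        if (n.next == null) { n.next = new Node(key, val); return; }
                    }
                }
            }
        }
    }

    public String get(String key) {               // reads need no lock at all
        int i = (key.hashCode() & 0x7fffffff) % table.length();
        for (Node n = table.get(i); n != null; n = n.next)
            if (n.key.equals(key)) return n.val;
        return null;
    }

    public static void main(String[] args) {
        BucketLockSketch m = new BucketLockSketch();
        m.put("a", "1");
        System.out.println(m.get("a")); // prints "1"
    }
}
```

Note how a write on one bucket never blocks reads or writes on any other bucket, which is exactly the fine-grained behavior described above (the real implementation adds resizing, treeification, and volatile table reads via Unsafe).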

What is the purpose of the final and volatile modifiers on variables in ConcurrentHashMap?

In ConcurrentHashMap, using final and volatile to modify variables can improve thread safety and visibility. The specific functions are as follows: 

1. final: a variable modified with final is immutable, i.e., its value cannot be changed after initialization. In ConcurrentHashMap, marking the key and hash fields of the Node class final guarantees they are never modified after construction, avoiding competition and conflicts between threads. Final fields can be safely accessed and shared without synchronization.

2. volatile: A variable modified with volatile means that the variable is visible, that is, the modification of the variable will be immediately perceived by other threads. In ConcurrentHashMap, the count and modCount variables in the Segment class are decorated with volatile to ensure that their modifications are visible to other threads, thereby avoiding conflicts between threads and data inconsistencies.

It should be noted that although using final and volatile to modify variables can improve thread safety and visibility, thread safety and correctness cannot be fully guaranteed. When using ConcurrentHashMap, you also need to pay attention to other thread safety issues, such as the use of iterators, the atomicity of composite operations, and so on.

Why does the JDK 1.8 ConcurrentHashMap use synchronized instead of ReentrantLock?

In JDK1.8, ConcurrentHashMap internally optimizes the implementation of locks, using CAS operations and synchronized keywords to replace the original reentrant lock ReentrantLock, the main reasons are as follows:

1. Performance optimization: the JVM optimizes the synchronized keyword directly (biased locks, lightweight locks, spin locks), so in ConcurrentHashMap's short critical sections it avoids the extra CAS-based bookkeeping that ReentrantLock incurs, improving concurrency performance.

2. Improve throughput: ConcurrentHashMap is a highly concurrent hash table implementation class that needs to support multiple threads for simultaneous read and write operations. Using the synchronized keyword can reduce lock competition and conflicts, thereby improving throughput and concurrent performance.

3. Simplify the code: Using the synchronized keyword can avoid complex operations that require manual locking and unlocking in ReentrantLock, thus simplifying the code implementation. This can reduce the complexity of the code and the possibility of errors, and improve the maintainability and readability of the code.

It should be noted that although ConcurrentHashMap uses the synchronized keyword in place of ReentrantLock in JDK 1.8, this is not whole-table synchronization in the traditional sense: the lock is applied per bucket (the head node of each bin), so the table is effectively divided into many independently locked parts, enabling efficient concurrent access. This fine-grained locking avoids lock granularity that is too coarse or too fine, thereby improving concurrency performance and scalability.

Sao Dai understands: synchronized performs well in both contended and uncontended situations, because the JVM can automatically upgrade it through optimized lock states such as lightweight locks and spin locks, avoiding the overhead of thread blocking and switching and thereby improving concurrency performance. ReentrantLock, by contrast, requires explicit lock/unlock calls and CAS operations, which bring extra cost.

Can we use ConcurrentHashMap instead of Hashtable?

We know Hashtable is synchronized, but ConcurrentHashMap synchronizes better because it only locks part of the map at a time. ConcurrentHashMap can certainly replace Hashtable, although Hashtable's whole-table locking provides stronger consistency guarantees. Both can be used in a multi-threaded environment, but once a Hashtable grows to a certain size its performance drops sharply, because iteration holds the lock for a long time. Since ConcurrentHashMap introduces segmentation, no matter how large it becomes only a certain part of the map needs to be locked, and other threads do not have to wait for an iteration to complete before accessing the map. In short, during iteration ConcurrentHashMap locks only part of the map, while Hashtable locks the whole map.

Are there any advantages and disadvantages of ConcurrentHashMap?

Advantages

1. High concurrency performance: ConcurrentHashMap internally uses lock segmentation technology to divide the entire hash table into multiple segments, and each segment has an independent lock, thus achieving efficient concurrent access. In the case of concurrent access by multiple threads, ConcurrentHashMap can provide high concurrent performance and throughput.

2. Thread safety: ConcurrentHashMap is a thread-safe hash table implementation class that can support multiple threads to read and write operations at the same time without additional synchronization measures. In the case of concurrent access by multiple threads, ConcurrentHashMap can guarantee data consistency and correctness.

3. Scalability: ConcurrentHashMap internally uses lock segmentation technology, which can adjust the granularity and quantity of locks according to actual needs, so as to achieve scalability. In the case of concurrent access by multiple threads, ConcurrentHashMap can dynamically adjust the number and size of locks according to the actual concurrency situation to provide better concurrency performance.

4. Efficient iterator: ConcurrentHashMap's iterators are weakly consistent rather than fail-fast: they tolerate concurrent modification by other threads during iteration instead of throwing ConcurrentModificationException. Under concurrent access, ConcurrentHashMap can therefore provide efficient iteration for traversing and reading the data.

5. Customizability: ConcurrentHashMap provides many customizable parameters and configuration options, and parameters such as the size of the hash table, load factor, and concurrency level can be adjusted according to actual needs to achieve better performance and scalability.
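The tunable constructor mentioned in the customizability point can be used like this; the capacity, load factor, and concurrency-level values here are arbitrary example numbers:

```java
import java.util.concurrent.ConcurrentHashMap;

public class TunedMapDemo {
    public static void main(String[] args) {
        // initialCapacity = 64, loadFactor = 0.75, and an estimated
        // 8 concurrently writing threads (a sizing hint in JDK 8).
        ConcurrentHashMap<String, Integer> map =
                new ConcurrentHashMap<>(64, 0.75f, 8);
        map.put("a", 1);
        System.out.println(map.get("a")); // 1
    }
}
```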

Disadvantages

1. Memory usage: ConcurrentHashMap needs to maintain multiple segments and multiple linked lists to support highly concurrent read and write operations, resulting in high memory usage. In the case of a large amount of data, it may cause problems such as memory overflow.

2. Disorder: ConcurrentHashMap internally uses a hash table to store data. The hash table is characterized by disorder, so the order of data cannot be guaranteed. If the data needs to be accessed in a certain order, additional sorting operations are required.

3. Weak consistency of the iterator: ConcurrentHashMap's iterator is weakly consistent, i.e., if other threads modify the hash table during iteration, the iterator does not fail, but it may or may not reflect those modifications, so the traversal is not a consistent snapshot. Keep this in mind when iterating over a ConcurrentHashMap.

4. Expansion cost: ConcurrentHashMap needs to be expanded internally to support more data storage and higher concurrency performance. However, the expansion operation requires complex operations such as data migration and rebuilding the hash table, which may lead to performance degradation and increased latency.

Differences between ConcurrentHashMap in JDK 7 and JDK 8

ConcurrentHashMap has been improved and optimized in both JDK 7 and JDK 8. The main differences are as follows:

1. Data structure: ConcurrentHashMap in JDK 7 internally uses segment lock technology to achieve concurrent access. Each segment is an independent hash table that can be read and written independently. ConcurrentHashMap in JDK 8 uses CAS operation and linked list/red-black tree structure to achieve concurrent access, which can better support high concurrency and large-scale data.

2. Concurrent performance: ConcurrentHashMap in JDK 8 has improved concurrency performance, mainly because it uses CAS operations and linked list/red-black tree structures to achieve concurrent access, reducing lock competition and overhead, thereby improving concurrency performance. JDK1.7 uses Segment's segmentation lock mechanism to achieve thread safety, in which Segment inherits from ReentrantLock. JDK1.8 uses CAS+synchronized to ensure thread safety.

3. Memory usage: ConcurrentHashMap in JDK 8 has improved in terms of memory usage, mainly because it uses a linked list/red-black tree structure to store data, which can make better use of memory space and reduce memory usage.

4. Expansion strategy: The ConcurrentHashMap in JDK 8 has improved in terms of expansion strategy, mainly because it uses a linked list/red-black tree structure to store data, which can better support concurrent expansion, reduce data migration and rebuild hash table overhead.

5. API interface: ConcurrentHashMap in JDK 8 adds some new API interfaces, such as forEach, reduce, search, etc., which can more conveniently traverse and operate the hash table.

6. Lock granularity: JDK 1.7 locks the Segment on which the data operation occurs, while JDK 1.8 locks only the head node of the linked list or the root node of the red-black tree.
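The JDK 8 bulk APIs mentioned in point 5 can be used like this; the first argument is the parallelism threshold (Long.MAX_VALUE forces sequential execution, 1 permits full parallelism):

```java
import java.util.concurrent.ConcurrentHashMap;

public class BulkOpsDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        // reduceValues: fold all values with a reducer.
        Integer sum = map.reduceValues(Long.MAX_VALUE, Integer::sum);

        // search: return the first non-null result of the function.
        String hit = map.search(Long.MAX_VALUE, (k, v) -> v == 2 ? k : null);

        System.out.println(sum + " " + hit); // 6 b
    }
}
```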

What is the concurrency of ConcurrentHashMap in Java?

Concurrency level can be understood as the maximum number of threads that can update a ConcurrentHashMap at the same time without lock contention. In JDK 1.7 it is simply the number of segment locks, i.e., the length of the Segment[] array; the default is 16, and the value can be set in the constructor.

If you set the concurrency level yourself, ConcurrentHashMap uses the smallest power of 2 greater than or equal to that value as the actual concurrency level; for example, if you set it to 17 (2^4 < 17 < 2^5), the actual concurrency level is 32 (2^5).
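The rounding-up rule can be sketched in a few lines (a simple loop version; the JDK uses bit tricks to the same effect):

```java
public class ConcurrencyLevelDemo {
    // Smallest power of two >= c, as JDK 7's ConcurrentHashMap
    // does for a requested concurrency level.
    static int sizeFor(int c) {
        int n = 1;
        while (n < c) n <<= 1;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(sizeFor(17)); // 32
        System.out.println(sizeFor(16)); // 16
    }
}
```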

If the concurrency level is set too low, it causes serious lock contention; if it is set too high, accesses that would have fallen in the same segment are spread across different segments, the CPU cache hit rate drops, and program performance degrades.

In JDK1.8, the concept of Segment has been abandoned, and the Node array + linked list + red-black tree structure has been chosen, and the concurrency depends on the size of the array.

Is ConcurrentHashMap iterator strong consistency or weak consistency?

ConcurrentHashMap's iterators are weakly consistent, not strongly consistent.

A weakly consistent iterator may see modifications made by other threads during traversal, but the consistency of the traversal result is not guaranteed: instead of throwing ConcurrentModificationException, it may miss elements added during traversal or encounter duplicate elements.

This is because ConcurrentHashMap uses fine-grained locking internally (segment locks in JDK 7) so that different parts of the table can be read and written independently. During traversal, if another thread modifies a part of the table, the iterator may see inconsistent elements. Therefore, ConcurrentHashMap's iterator can only guarantee eventual consistency, not strong consistency.

To avoid surprises, you can use locks or other synchronization measures during traversal, or use ConcurrentHashMap's bulk APIs such as forEach, reduce, and search, which traverse in a safer and more predictable way.
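Weak consistency is easy to demonstrate: the iterator below survives a modification made mid-iteration, where a HashMap iterator would fail fast with ConcurrentModificationException:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeakIteratorDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 4; i++) map.put(i, "v" + i);

        Iterator<Map.Entry<Integer, String>> it = map.entrySet().iterator();
        it.next();
        map.put(99, "added mid-iteration"); // structural modification
        while (it.hasNext()) it.next();     // no ConcurrentModificationException
        System.out.println("done");
    }
}
```

Whether the iterator actually sees the entry for key 99 is not guaranteed either way; that is precisely what "weakly consistent" means.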

Why does ConcurrentHashMap not support key or value is null?

Because ConcurrentHashMap is designed for multi-threading: if ConcurrentHashMap.get(key) returns null, the result is ambiguous. The key may be absent, or the key may exist with a null value.

So why doesn't HashMap have this ambiguity?

HashMap can resolve the ambiguity with the containsKey method: if the key is absent it returns false, and if the key exists with a null value it returns true, so the two cases can be told apart. Note that both HashMap and ConcurrentHashMap have containsKey; the reason one is ambiguous and the other is not is that one is used single-threaded and the other multi-threaded.

public boolean containsKey(Object key) {
    return getNode(hash(key), key) != null;
}

Why can't ConcurrentHashMap solve the ambiguity problem?

Because ConcurrentHashMap is thread-safe and generally used in a concurrent environment: after get returns null, by the time you call containsKey there is no way to guarantee that some other thread has not slipped in between the two calls and inserted or deleted the key you are querying, so the check-then-act sequence proves nothing.
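Both behaviors are easy to verify: HashMap disambiguates a null get via containsKey, while ConcurrentHashMap simply rejects null keys and values outright:

```java
import java.util.HashMap;
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> hm = new HashMap<>();
        hm.put("k", null);
        // get() == null is ambiguous on its own; containsKey disambiguates.
        System.out.println(hm.get("k") == null);  // true
        System.out.println(hm.containsKey("k"));  // true

        ConcurrentHashMap<String, String> chm = new ConcurrentHashMap<>();
        try {
            chm.put("k", null); // null value is rejected outright
        } catch (NullPointerException e) {
            System.out.println("null value rejected");
        }
    }
}
```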

What is the difference between SynchronizedMap and ConcurrentHashMap?

Both SynchronizedMap and ConcurrentHashMap are thread-safe Map implementation classes, but there are the following differences between them:

1. Data structure: SynchronizedMap is a synchronized wrapper around an ordinary Map (typically a HashMap), while ConcurrentHashMap is a hash table designed from the ground up for concurrency.

2. Concurrent performance: ConcurrentHashMap beats SynchronizedMap on concurrency, because its fine-grained locking allows reads and writes to proceed at the same time, while SynchronizedMap funnels every operation through one synchronization lock and performs poorly in concurrent environments.

3. Lock granularity: ConcurrentHashMap has finer lock granularity and supports higher concurrency, while SynchronizedMap has one coarse lock and supports only one thread's access at a time.

4. Expansion strategy: The expansion strategy of ConcurrentHashMap is better and can be expanded without affecting the concurrency performance, while the expansion strategy of SynchronizedMap needs to use synchronization locks to ensure thread safety, which may affect concurrency performance.

5. Null value: ConcurrentHashMap does not support the key or value being null, while SynchronizedMap allows the key or value to be null.

To sum up, although SynchronizedMap and ConcurrentHashMap are both thread-safe Map implementations, they differ in concurrency performance, lock granularity, expansion strategy, and null handling, so in actual development you need to choose the appropriate class for the scenario. If you need high concurrency and large-scale data, use ConcurrentHashMap; if the amount of data is small, SynchronizedMap is acceptable.

The difference between HashMap and ConcurrentHashMap

Both HashMap and ConcurrentHashMap are Map collections implemented by hash tables, but there are the following differences between them:

1. Thread safety: HashMap is not thread-safe, while ConcurrentHashMap is. ConcurrentHashMap uses fine-grained locking to allow concurrent reads and writes, whereas HashMap offers no synchronization at all: callers must synchronize externally, and unsynchronized concurrent use can lose or corrupt data.

2. Concurrent performance: ConcurrentHashMap is far better under concurrency, since its internal locking lets reads and writes proceed simultaneously, while a HashMap shared between threads must be wrapped in external synchronization, which performs poorly in concurrent environments.

3. Lock granularity: ConcurrentHashMap locks at a fine granularity and supports high concurrency, while HashMap has no internal locking at all; making it thread-safe means locking the whole map externally, which reduces access to one thread at a time.

4. Expansion strategy: ConcurrentHashMap can resize without blocking all concurrent access, while a HashMap being resized concurrently without external synchronization is unsafe, and resizing under an external lock blocks everything.

5. Null value: ConcurrentHashMap does not support the key or value being null, while HashMap allows the key or value to be null.

To sum up, HashMap and ConcurrentHashMap are both hash-table-based Map collections, but they differ in thread safety, concurrency performance, lock granularity, expansion strategy, and null handling; choose the implementation that fits the scenario. If you need high concurrency and large-scale data, use ConcurrentHashMap; for single-threaded use or small amounts of data, HashMap is fine.

The difference between ConcurrentHashMap and Hashtable?

Both ConcurrentHashMap and Hashtable are thread-safe hash table implementation classes, but there are the following differences between them:

1. Data structure: both are hash tables; the difference is how they synchronize. ConcurrentHashMap locks at fine granularity internally (Segments in JDK 7, bucket head nodes in JDK 8), while Hashtable simply declares every method synchronized on the whole table.

2. Concurrent performance: ConcurrentHashMap is better than Hashtable under concurrency, because its fine-grained locking allows simultaneous reads and writes, while Hashtable serializes every operation on one lock and performs poorly in concurrent environments.

3. Lock granularity: ConcurrentHashMap has finer lock granularity and supports higher concurrency, while Hashtable's single table-wide lock supports only one thread's access at a time.

4. Expansion strategy: The expansion strategy of ConcurrentHashMap is better and can be expanded without affecting the concurrency performance, while the expansion strategy of Hashtable needs to use synchronization locks to ensure thread safety, which may affect concurrency performance.

5. Null value: ConcurrentHashMap does not support null key or value, while Hashtable allows null key or value.

6. Iterator: ConcurrentHashMap's iterator is weakly consistent and tolerates concurrent modification during iteration, while Hashtable's iterator is fail-fast: concurrent modification during iteration throws ConcurrentModificationException.

To sum up, although ConcurrentHashMap and Hashtable are both thread-safe hash table implementations, they differ in concurrency performance, lock granularity, expansion strategy, null handling, and iterators, so in actual development you must choose according to the scenario. For high concurrency and large-scale data, use ConcurrentHashMap; Hashtable is a legacy class and is rarely the right choice even for small amounts of data.

Comparison diagrams of the three structures:

Hashtable:

ConcurrentHashMap in JDK 1.7:

ConcurrentHashMap in JDK 1.8 (TreeBin: red-black tree node; Node: linked-list node):

ConcurrentHashMap combines the advantages of both HashMap and Hashtable: HashMap does not consider synchronization at all, while Hashtable synchronizes everything, locking the entire structure on every operation. ConcurrentHashMap's locking is much finer-grained.


Origin blog.csdn.net/qq_50954361/article/details/131375108