Java Collection FAQ (Completed)

1. What are the common collections?

Collection-related classes and interfaces are all in java.util, and are mainly divided into three types: List, Map, and Set.

Among them, Collection is the parent interface of the collections; it mainly has two sub-interfaces, List and Set:

  • List: the stored elements are ordered and can be repeated.
  • Set: the stored elements are unordered and cannot be repeated.

Map is a separate interface: a collection of key-value pair mappings.

1.1 How to choose a collection?

We choose the appropriate collection mainly based on the characteristics of the collection.

For example: When you need to get the element value based on the key value, use the collection under the Map interface.

When you need to sort, choose TreeMap. When you don't need to sort, choose HashMap.

If you need to ensure thread safety, use ConcurrentHashMap.

When you only need to store element values, choose a collection that implements the Collection interface.

When you need to ensure that the elements are unique, choose a collection that implements the Set interface, such as TreeSet or HashSet.

When there is no need to ensure that the elements are unique, choose one that implements the List interface, such as ArrayList or LinkedList.

When you need a queue, choose the collection under the Queue interface.

For single-ended queue, choose PriorityQueue; for double-ended queue, choose ArrayDeque.
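
A minimal sketch of these choices (all classes are from java.util; the variable names are illustrative):

```java
import java.util.*;

public class ChooseCollection {
    public static void main(String[] args) {
        // Key-value lookup -> a Map; sorted keys -> TreeMap
        Map<String, Integer> ages = new TreeMap<>();
        ages.put("bob", 25);
        ages.put("alice", 30);
        System.out.println(ages.keySet()); // sorted: [alice, bob]

        // Unique elements -> a Set (duplicates are collapsed)
        Set<String> tags = new HashSet<>(Arrays.asList("a", "b", "a"));
        System.out.println(tags.size()); // 2

        // Ordered, repeatable elements -> a List
        List<String> items = new ArrayList<>(Arrays.asList("x", "x"));
        System.out.println(items.size()); // 2

        // Double-ended queue -> ArrayDeque
        Deque<String> deque = new ArrayDeque<>();
        deque.addFirst("head");
        deque.addLast("tail");
        System.out.println(deque); // [head, tail]
    }
}
```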

2. What is the difference between ArrayList and LinkedList?

**(1)** The data structures are different

  • ArrayList is implemented based on arrays
  • LinkedList is implemented based on a doubly linked list

**(2)** In most cases, ArrayList is better for lookups, and LinkedList is better for insertions and deletions.

  • ArrayList is implemented on an array, so get(int index) reads directly through the array subscript in O(1) time. LinkedList is implemented on a linked list, so get(int index) must traverse the list, taking O(n) time. Of course, a value search such as indexOf(element) must traverse either collection, in O(n) time.

  • For ArrayList, inserting or deleting at the end of the array is direct. Inserting in the middle, however, requires shifting every element after the insertion position, and may even trigger an expansion. For a doubly linked list, insertion and deletion only require updating the links of the predecessor node, the successor node, and the inserted node; no elements need to be moved.

Note that there is a potential trap here. LinkedList's advantage for insertion and deletion shows up in the average number of element moves, not in time complexity: in general, insertion and deletion are O(n) for both collections, because LinkedList must first traverse to the position.

**(3)** Whether random access is supported

  • ArrayList is array-based, so it can be indexed by subscript and supports random access. Accordingly, it implements the RandomAccess interface, a marker interface whose only purpose is to indicate that random access is supported.
  • LinkedList is list-based, so it cannot fetch an element directly by index. It does not implement the RandomAccess interface, marking that random access is not supported.

**(4)** Memory occupation. ArrayList is based on an array and occupies a contiguous block of memory; LinkedList is based on a linked list and its memory is non-contiguous. Each has some extra space overhead:

  • ArrayList preallocates its array, so there may be unused slots — a certain amount of wasted space.
  • Each node of LinkedList needs to store the predecessor and successor, so each node will take up more space.
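
The RandomAccess point above can be checked directly: it is a marker interface with no methods, used only to signal that indexed access is cheap. A small sketch:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

public class AccessDemo {
    // RandomAccess is a marker interface: no methods, it only signals
    // that indexed access (get(i)) is fast for this implementation.
    static boolean supportsRandomAccess(List<?> list) {
        return list instanceof RandomAccess;
    }

    public static void main(String[] args) {
        System.out.println(supportsRandomAccess(new ArrayList<>()));  // true
        System.out.println(supportsRandomAccess(new LinkedList<>())); // false
    }
}
```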

3. Do you understand the expansion mechanism of ArrayList?

ArrayList is a collection built on an array. The capacity of an array is fixed when it is created; if the array is full and another element is inserted, it would overflow. Therefore, before inserting, ArrayList first checks whether expansion is needed: if the current size + 1 exceeds the array length, it expands.

ArrayList expands by creating a new array 1.5 times the size of the old one and copying the values of the original array into it.
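
A minimal sketch of that growth rule (the real JDK 8 grow() also handles a minimum required capacity and an upper size bound, both omitted here):

```java
public class GrowSketch {
    // Mirrors the 1.5x growth rule described above:
    // newCapacity = oldCapacity + oldCapacity / 2 (JDK 8 uses a right shift).
    static int newCapacity(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(10)); // 15
        System.out.println(newCapacity(15)); // 22
    }
}
```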

4. Do you understand fail-fast and fail-safe?

Fail-fast: fail-fast is an error detection mechanism for Java collections.

  • When an iterator is traversing a collection, if thread B modifies the contents of the collection (adds, deletes, or changes elements) while thread A is traversing it, a ConcurrentModificationException will be thrown.
  • modCount principle: the iterator accesses the collection contents directly during traversal, relying on a modCount variable whose value changes whenever the contents of the collection change. Each time the iterator calls hasNext()/next() to move to the next element, it checks whether modCount equals the expectedModCount value: if so, the traversal continues; otherwise, the exception is thrown and the traversal terminates.
  • Note: the condition for throwing this exception is detecting modCount != expectedModCount. If the collection is modified but modCount happens to be set to the expectedModCount value, the exception will not be thrown. Therefore, concurrent code must not rely on whether this exception is thrown; the exception is only suitable for detecting concurrent-modification bugs.
  • Scenario: the collection classes under the java.util package are all fail-fast and cannot be modified concurrently under multiple threads (modified during iteration) — for example, the ArrayList class.

fail-safe

  • Collection containers that use a fail-safe mechanism do not directly access the collection content during traversal. Instead, they first copy the original collection content and traverse the copied collection.
  • Principle: since the iteration traverses a copy of the original collection, modifications made to the original collection during traversal cannot be detected by the iterator, so no ConcurrentModificationException is triggered.
  • Disadvantages: copying the contents avoids ConcurrentModificationException, but by the same token the iterator cannot see the modified contents. That is, the iterator traverses the copy of the collection taken at the moment traversal began, and modifications made to the original collection during the traversal are invisible to it.
  • Scenario: Containers under the java.util.concurrent package are all fail-safe and can be used concurrently and modified in multiple threads, such as the CopyOnWriteArrayList class.
What is the difference between fail-fast and fail-safe iterators?
  • Fail-fast is performed directly on the container. During the traversal process, once the data in the container is found to be modified, a ConcurrentModificationException exception will be thrown immediately, causing the traversal to fail. Common containers that use the fail-fast method include HashMap and ArrayList.
  • The fail-safe traversal is based on a clone of the container. Therefore, modification of the content in the container does not affect traversal. Common containers traversed using fail-safe methods include ConcurrentHashMap and CopyOnWriteArrayList.
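
The contrast can be demonstrated directly: structurally modifying a java.util.ArrayList during a for-each loop throws ConcurrentModificationException (even in a single thread), while CopyOnWriteArrayList iterates over a snapshot and does not:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class FailFastDemo {
    // Returns true if the list's iterator is fail-fast
    // (i.e. it throws when the list is modified mid-iteration).
    static boolean modifyWhileIterating(List<Integer> list) {
        try {
            for (Integer i : list) {
                if (i == 1) list.remove(i); // structural modification mid-iteration
            }
            return false; // no exception: fail-safe
        } catch (ConcurrentModificationException e) {
            return true;  // fail-fast detected the modification
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating(new ArrayList<>(List.of(1, 2, 3))));            // true
        System.out.println(modifyWhileIterating(new CopyOnWriteArrayList<>(List.of(1, 2, 3)))); // false
    }
}
```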

5. What is the underlying data structure of HashMap?


In JDK 7, HashMap consists of "array + linked list". The array is the main body of HashMap, and the linked list mainly exists to resolve hash conflicts.

In JDK 8, HashMap consists of "array + linked list + red-black tree". If the linked list is too long, it will seriously affect the performance of HashMap. The time complexity of red-black tree search is O(logn), while the linked list is a terrible O(n). Therefore, JDK 8 has further optimized the data structure and introduced red-black trees. Linked lists and red-black trees will be converted when certain conditions are met:

  • When the linked list length exceeds 8 and the array length has reached 64, the list is converted to a red-black tree.
  • Before converting the linked list into a red-black tree, it will be judged. If the length of the current array is less than 64, it will choose to expand the array first instead of converting it to a red-black tree to reduce search time.

6. Why is the threshold for converting a linked list into a red-black tree 8?

Ideally, with random hash codes, the number of nodes that land in each hash bucket follows a Poisson distribution. Using the Poisson formula, one can tabulate the probability of each bucket size; the probability of a bucket reaching 8 elements is already very small, and it shrinks further for larger counts. The original authors therefore chose 8 based on probability statistics.

Red-black tree nodes are roughly twice the size of ordinary nodes, so converting to a red-black tree trades space for time. It is more of a fallback strategy that guarantees search efficiency in extreme cases.

Why choose 8 as the threshold? It is related to statistics. Ideally, with random hash codes, the nodes in a bucket follow a Poisson distribution, and the probability decreases as the count grows: the probability of a bucket holding 8 nodes is only about 0.00000006.

As for the threshold for converting a red-black tree back to a linked list, why is it 6 instead of 8? This is because if this threshold is also set to 8, if a collision occurs and the node increase or decrease is just around 8, there will be continuous conversion between the linked list and the red-black tree, resulting in a waste of resources.

7. Why not use red-black trees directly when resolving hash conflicts? And choose to use linked list first and then convert to red-black tree?

Because the red-black tree needs to perform operations such as left rotation, right rotation, and color change to maintain balance, but the singly linked list does not.

With fewer than 8 elements, a linked list already provides adequate query performance. Above 8, the red-black tree's O(logn) lookup beats the linked list's O(n), so the tree is needed to speed up queries; the trade-off is that adding new nodes becomes slower.

Therefore, using a red-black tree from the start would make additions slower while the element count is small, which would simply waste performance.

8. Why does the hash value need to be ANDed with length-1?

  • Taking the hash value modulo the array length works, but the modulo operation is expensive compared with a bit operation.
  • When length is always 2 raised to the nth power, h & (length - 1) is equivalent to taking the hash modulo length, i.e. h % length, but & is more efficient than %.

9. How is the storage index of key in HashMap calculated?

First compute the hashCode from the key, then derive the hash value from the hashCode, and finally compute the storage location as hash & (length - 1).
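
A sketch of that calculation, mirroring JDK 8's hash-spreading step (XOR the high 16 bits into the low 16 bits) followed by the mask:

```java
public class IndexDemo {
    // JDK 8's spreading step: XOR the high 16 bits into the low 16 bits,
    // so that small table sizes still see the high bits of the hashCode.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // For a power-of-two length, h & (length - 1) equals the
    // non-negative remainder of h modulo length.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h = hash("example");
        System.out.println(indexFor(h, 16) == Math.floorMod(h, 16)); // true
    }
}
```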

10. What is the default loading factor of HashMap? Why is it 0.75, not 0.6 or 0.8?

As a general rule, the default load factor (0.75) provides a good compromise between time and space costs: a higher factor wastes less space but raises the probability of collisions, while a lower factor wastes space and causes more frequent resizing. 0.75 also makes capacity × loadFactor an integer for the power-of-two capacities HashMap uses (e.g. 16 × 0.75 = 12).

11. What are the ways to resolve hash conflicts? Which HashMap to use?

Methods to resolve Hash conflicts are:

  • Open addressing method: also called the re-hashing method. The basic idea: if p = H(key) conflicts, hash again based on p to get p1 = H(p); if p1 conflicts again, hash based on p1, and so on, until a non-conflicting hash address pi is found. Consequently, the hash table must be at least as long as the number of elements to be stored, and because of re-hashing, a deleted node can only be marked as deleted rather than actually removed.
  • Re-hash method: double or multiple hashing — provide several different hash functions. When R1 = H1(key1) conflicts, compute R2 = H2(key1), and so on until there is no conflict. This makes clustering less likely but increases computation time.
  • Chaining method: also called the zipper method. Elements with the same hash value form a singly linked list of synonyms, and the head pointer of that list is stored in the i-th slot of the hash table. Search, insertion, and deletion take place mainly within the synonym list. Chaining suits cases with frequent insertions and deletions.
  • Public overflow area: split the hash table into a base table and an overflow table; whenever a conflict occurs, the overflowing data is placed in the overflow area.

The chain address method is used in HashMap.

Why is the length of the HashMap array a power of 2?

A power of 2 helps reduce the chance of collisions. If length is a power of 2, then length - 1 in binary is of the form 11111..., so the binary AND with h is very fast and no table slots are wasted. Consider the following example:

When length = 15 (mask 14), hash values 6 and 7 produce the same result, which means they map to the same position in the table — a collision — and form a linked list in one slot; 4 and 5 also collide the same way. The result is slower queries.

Analyzing further, a great deal of space is also wasted. Taking length = 15 as an example, the eight positions 1, 3, 5, 7, 9, 11, 13, and 15 never hold data: when a hash value is ANDed with 14 (binary 1110), the last bit of the result is always 0, so positions 0001, 0011, 0101, 0111, 1001, 1011, 1101, and 1111 can never store data.
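
The effect of the mask is easy to verify: count how many distinct slots h & mask can produce for mask 14 (length 15) versus mask 15 (length 16):

```java
import java.util.HashSet;
import java.util.Set;

public class MaskDemo {
    // Collect every slot index that h & mask can produce.
    static Set<Integer> reachableSlots(int mask) {
        Set<Integer> slots = new HashSet<>();
        for (int h = 0; h < 1024; h++) {
            slots.add(h & mask);
        }
        return slots;
    }

    public static void main(String[] args) {
        // length 16 -> mask 15 (1111): all 16 slots reachable
        System.out.println(reachableSlots(15).size()); // 16
        // length 15 -> mask 14 (1110): last bit always 0, only 8 even slots
        System.out.println(reachableSlots(14).size()); // 8
    }
}
```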

12. What is generally used as the key of HashMap?

Generally, immutable classes such as Integer and String are used as keys of HashMap, and String is the most common.

  • Because a String is immutable, its hashCode is computed once and cached, and never needs recalculating.
  • Because looking up an object uses the equals() and hashCode() methods, it is essential that the key class overrides both methods correctly. Classes such as Integer and String override hashCode() and equals() in a standard, well-behaved way.

13. Why is HashMap thread-unsafe?

  • In JDK 7, expansion under multi-threading will cause an infinite loop.
  • Multi-threaded put may cause elements to be lost.
  • When put and get are concurrent, get may be null.

14. What is the process of put method of HashMap?

Taking JDK 8 as an example, the brief process is as follows:

1. First, calculate the hash value based on the value of key and find the subscript of the element stored in the array;

2. If the array is empty, call resize for initialization;

3. If there is no hash conflict, place it directly in the corresponding array subscript;

4. If there is a conflict and the key already exists, the value will be overwritten;

5. If the node is found to be a red-black tree after a conflict, hang the node on the tree;

6. If the collision node is a linked list, check whether its length exceeds 8: if it does and the array capacity is less than 64, expand; if it does and the array capacity is 64 or more, convert the list into a red-black tree. Otherwise insert the key-value pair into the linked list, overwriting the value if the key already exists.
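
The array + linked-list part of this flow can be sketched as a toy map (tree conversion, resizing, and null keys are all omitted, and head insertion is used for brevity; MiniMap is a hypothetical name, not the JDK implementation):

```java
public class MiniMap<K, V> {
    static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final Node<K, V>[] table;

    @SuppressWarnings("unchecked")
    MiniMap(int capacity) { table = new Node[capacity]; } // capacity assumed a power of 2

    public V put(K key, V value) {
        int i = key.hashCode() & (table.length - 1);   // step 1: compute the bucket index
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {                   // step 4: key exists -> overwrite
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        Node<K, V> node = new Node<>(key, value);      // steps 3/6: no match -> new node
        node.next = table[i];
        table[i] = node;
        return null;
    }

    public V get(K key) {
        int i = key.hashCode() & (table.length - 1);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    public static void main(String[] args) {
        MiniMap<String, Integer> m = new MiniMap<>(16);
        m.put("a", 1);
        m.put("a", 2); // overwrite
        System.out.println(m.get("a")); // 2
    }
}
```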

15. What is the difference between ArrayList and LinkedList?

  • 1. ArrayList is based on an array, and its storage space is contiguous. LinkedList is based on a linked list, and its storage space is non-contiguous. (LinkedList is a doubly linked list.)

  • 2. For random access (get and set), ArrayList beats LinkedList, because LinkedList has to move a pointer along the list.

  • 3. For insertion and deletion (add and remove), LinkedList has the advantage, because ArrayList has to move data.

  • 4. For the same amount of data, LinkedList may occupy less space, because ArrayList reserves room for future additions, while LinkedList only needs to allocate one node per added element.

16. What is the difference between Collection and Collections?

  • Collection is a collection interface that provides common interface methods for basic operations on collection objects. All collections are its subclasses, such as List, Set, etc.
  • Collections is a utility class containing many static methods; it cannot be instantiated and is used purely as a tool class — for example, the sorting method Collections.sort(list) and the reversal method Collections.reverse(list).

17. In HashSet, what is the relationship between equals and hashCode?

Both equals and hashCode are inherited from the Object class. By default, equals determines whether two references point to the same memory address, while hashCode converts the object into a hash code according to defined hashing rules. Elements stored in a HashSet cannot repeat, and the hashCode and equals methods together decide whether two stored objects are the same:

  • If the hashCode values of two objects differ, the two objects are not the same.
  • If the hashCode values of two objects are the same, the objects' equals method is called; if equals returns true, the two objects are the same, otherwise they are not.
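
A sketch of that contract in practice: a key class that overrides both methods consistently, so HashSet can deduplicate by value rather than by reference (Point is an illustrative class, not from the JDK):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class PointSetDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        // Must be consistent with equals: equal objects -> equal hash codes.
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    public static void main(String[] args) {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // same hashCode, equals returns true -> rejected
        System.out.println(set.size()); // 1
    }
}
```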

18. What is the difference between HashMap and Hashtable?

  • Thread safety: HashMap is not thread-safe; Hashtable is thread-safe, because its internal methods are basically all synchronized. (If you need thread safety, use ConcurrentHashMap — Hashtable is obsolete.)
  • Efficiency: because of the locking, HashMap is more efficient than Hashtable. In addition, Hashtable is essentially obsolete; don't use it in new code.
  • Null keys and values: HashMap can store null keys and values — at most one null key, any number of null values. Hashtable allows neither null keys nor null values and throws a NullPointerException.
  • Initial capacity and growth: if no initial capacity is specified, Hashtable defaults to 11 and each expansion grows the capacity to 2n+1, while HashMap defaults to 16 and doubles on each expansion. If an initial capacity is given, Hashtable uses the given size directly, while HashMap rounds it up to a power of 2.
  • Underlying data structure: since JDK 1.8, HashMap converts a linked list into a red-black tree when its length exceeds the threshold (default 8) — unless the current array length is under 64, in which case it expands the array first — to reduce search time. Hashtable has no such mechanism.
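
The null-handling difference is easy to demonstrate:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    // Returns true if the map throws NullPointerException for a null key.
    static boolean rejectsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "value"); // allowed: one null key
        hashMap.put("key", null);   // allowed: null values
        System.out.println(hashMap.get(null));                    // value
        System.out.println(rejectsNullKey(new Hashtable<>()));    // true
    }
}
```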

20. Briefly describe Java’s Set

Set is a collection that does not allow duplicate elements and does not guarantee element order. Java has three main Set implementations:

HashSet is implemented via HashMap: the HashMap key is the element stored in the HashSet, and the value is a shared Object constant named PRESENT. When judging whether elements are the same, it first compares hashCodes and, only if they are equal, then compares with equals. Query is O(1).

LinkedHashSet inherits from HashSet, is implemented through LinkedHashMap, and uses a doubly linked list to maintain the insertion order of elements.

TreeSet is implemented via TreeMap; the underlying data structure is a red-black tree. When an element is added, it is inserted at the appropriate position according to the comparison rules, so the set remains ordered. Query is O(logn).
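
The ordering differences among the implementations are directly observable:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

public class SetOrderDemo {
    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(3, 1, 2);

        // LinkedHashSet: a doubly linked list preserves insertion order
        System.out.println(new LinkedHashSet<>(input)); // [3, 1, 2]

        // TreeSet: a red-black tree keeps elements sorted
        System.out.println(new TreeSet<>(input));       // [1, 2, 3]
    }
}
```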

21. Briefly describe Java’s HashMap

Before JDK8, the underlying implementation was array + linked list, but JDK8 changed it to array + linked list/red-black tree. The main member variables include the table array to store data, the number of elements size, and the load factor loadFactor. Data in HashMap exists in the form of key-value pairs. The hash value corresponding to the key is used to calculate the array subscript. If the hash value of the key of two elements is the same, a hash conflict will occur and they will be placed on the same linked list.

The table array records HashMap data. Each subscript corresponds to a linked list. All hash conflict data will be stored in the same linked list. The Node/Entry node contains four member variables: key, value, next pointer and hash value. After JDK8, linked lists exceeding 8 will be converted into red-black trees.

If size / capacity > load factor, HashMap performs an expansion. The default initial capacity is 16, the capacity is always a power of 2 with a maximum of 1 << 30, and the default load factor is 0.75.

22. Why is HashMap thread-unsafe?

HashMap is not thread-safe and these problems may occur:

  • Expansion infinite loop under multi-threading. HashMap in JDK1.7 uses the head insertion method to insert elements. In a multi-threaded environment, expansion may lead to the emergence of a circular linked list, forming an infinite loop. Therefore, JDK1.8 uses the tail insertion method to insert elements, which will maintain the original order of the linked list elements during expansion, and will not cause the problem of circular linked lists.

  • Multi-threaded put may cause elements to be lost. Multiple threads execute put operations at the same time. If the calculated index positions are the same, it will cause the previous key to be overwritten by the next key, resulting in the loss of elements. This problem exists in both JDK 1.7 and JDK 1.8.

  • When put and get are executed concurrently, get may be null. When thread 1 executes put, rehash occurs because the number of elements exceeds the threshold. Thread 2 executes get at this time, which may cause this problem. This problem exists in both JDK 1.7 and JDK 1.8.

23. Briefly describe Java’s TreeMap

TreeMap is a Map implemented on a red-black tree, i.e. a balanced sorted binary tree. Since insertion, deletion, and traversal in a red-black tree cost O(logN), its performance is lower than that of a hash table. However, a hash table cannot output key-value pairs in order, while a red-black tree can output them sorted by key.

24. What are the similarities and differences between ArrayList, Vector and LinkedList?

  • ArrayList and Vector are scalable arrays — arrays whose length can be changed dynamically — while LinkedList is a doubly linked list.
  • ArrayList and Vector are both implemented based on the Object[] array that stores elements. They open up a continuous space in the memory for storage and support subscript and index access. But when it comes to inserting elements, you may need to move the elements in the container, and the insertion efficiency is low. When the stored elements exceed the initial capacity of the container, both ArrayList and Vector will expand.
  • Vector is thread-safe and most of its methods are directly or indirectly synchronized. ArrayList is not thread-safe and its methods are not synchronized. LinkedList is also not thread-safe.
  • LinkedList is implemented as a doubly linked list; accessing an element by index requires traversing from one end, so random access is slow, but inserting an element requires no data movement, so insertion is fast.

25. The difference between HashMap and HashSet

The bottom layer of HashSet is implemented based on HashMap.

They differ as follows:

  • Interface implemented: HashMap implements the Map interface; HashSet implements the Set interface.
  • What is stored: HashMap stores key-value pairs; HashSet stores objects.
  • How elements are added: HashMap calls put() to add elements; HashSet calls add().
  • How the hashCode is computed: HashMap uses the key to compute the hashcode; HashSet uses the member object. Two different objects can have the same hashcode, so the equals() method is also used to determine object equality.

26. What main optimizations has JDK 1.8 made to HashMap? Why?

The HashMap of jdk1.8 has five main optimization points:

  1. Data structure: array + linked list changed to array + linked list / red-black tree.

    Reason: when a hash conflict occurs, elements are stored in a linked list; if the list grows too long, it is converted into a red-black tree, reducing the time complexity from O(n) to O(logn).

  2. Linked list insertion method: changed from head insertion to tail insertion.

    Simply put, when inserting into an array slot that already holds an element, 1.7 puts the new element into the array slot and makes the original node the new node's successor, whereas 1.8 traverses the linked list and appends the element at its tail.

    Reason: during expansion, 1.7's head insertion reverses the linked list, which can produce a cycle in a multi-threaded environment.

  3. Expansion rehash: when expanding, 1.7 rehashes every element in the old array to locate its position in the new array; 1.8 uses simpler logic that needs no recomputation through the hash function — the new position is either unchanged, or the old index plus the old capacity.

    Reason: improves expansion efficiency, making resizing faster.

  4. Expansion timing: 1.7 checks whether expansion is needed before inserting; 1.8 inserts first and then checks whether expansion is needed after the insertion completes.

  5. Hash function: 1.7 performs four shifts and four XORs; JDK 1.8 does it only once.

    Reason: four rounds bring little marginal benefit, so it was reduced to one round to improve efficiency.
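
Optimization 3 (the resize rehash rule) can be verified directly: when a power-of-two capacity doubles, masking with the new capacity gives either the old index or the old index plus the old capacity, decided by a single extra hash bit:

```java
public class ResizeIndexDemo {
    // Checks, for every hash value in a sample range, that
    // h & (2*oldCap - 1) == (h & (oldCap - 1)) [+ oldCap if the extra bit is set].
    static boolean ruleHolds(int oldCap) {
        int newCap = oldCap << 1; // capacity doubles on resize
        for (int h = 0; h < 10_000; h++) {
            int oldIndex = h & (oldCap - 1);
            int predicted = (h & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
            if ((h & (newCap - 1)) != predicted) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(ruleHolds(16)); // true
    }
}
```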

27. What other construction methods of hash functions do you know?

The hash construction method called in HashMap is:

  • The division-remainder method: H(key) = key % p (p <= N) — divide the key by a positive integer p no larger than the hash table length and take the remainder as the address. Of course, HashMap optimizes and adapts this for a more efficient and more evenly distributed hash.

In addition, there are several common hash function construction methods:

  • Direct addressing method

    Map the key directly to the corresponding array position; for example, key 1232 is placed at subscript 1232.

  • Digit analysis method

    Take certain digits of the key (such as the tens and hundreds digits) as the mapped position.

  • Mid-square method

    Square the key and take its middle digits as the mapped position.

  • Folding method

    Split the key into several segments with the same number of digits, then use their sum as the mapped position.
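
As a small illustration of the division-remainder method (p = 13 is an arbitrary prime chosen for the example, not a HashMap value):

```java
public class DivisionHash {
    // Division-remainder method: p is usually a prime no larger than the
    // table length, chosen to spread keys more evenly.
    static int hash(int key, int p) {
        return Math.floorMod(key, p); // floorMod keeps the result non-negative
    }

    public static void main(String[] args) {
        System.out.println(hash(1232, 13)); // 1232 % 13 = 10
        System.out.println(hash(-5, 13));   // 8, not -5
    }
}
```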

28. How much do you know about red-black trees? Why not use binary tree/balanced tree?

The red-black tree is essentially a binary search tree. In order to maintain balance, it adds some rules based on the binary search tree:

  1. Each node is either red or black;
  2. The root node is always black;
  3. All leaf nodes are black (note that the leaf nodes mentioned here are actually NULL nodes in the graph);
  4. The two child nodes of each red node must be black;
  5. The path from any node to every leaf node in its subtree contains the same number of black nodes;

The reason why binary trees are not used:

A red-black tree is a balanced binary tree. The worst-case time complexity of insertion, deletion, and search is O(logn), which avoids the worst-case O(n) time complexity of a binary tree.

Why not use a balanced binary tree (AVL tree):

A balanced binary tree is more strictly balanced than a red-black tree and needs more rotations to maintain its balance. This means it is more expensive to keep balanced, so its insertion and deletion efficiency is lower than a red-black tree's.

Do you know how the red-black tree maintains balance?

Red-black trees maintain balance in two ways: rotation and recoloring.

  • Rotation: there are two kinds of rotation, left rotation and right rotation.

29. Are the internal nodes of HashMap ordered?

HashMap is unordered and randomly inserted based on hash value. If you want to use an ordered Map, you can use LinkedHashMap or TreeMap.



Origin blog.csdn.net/pachupingminku/article/details/132692497