Detailed explanation of Java collections (1) - Java full-stack knowledge (6)

1. What collections are there in Java?

The root interface of the collection hierarchy in the java.util package is Collection, and it has three main sub-interfaces:

  • Set: a collection that holds no duplicate elements and, in general, guarantees no particular order
  • List: an ordered collection that allows duplicate elements
  • Queue: a queue, typically processed in first-in, first-out (FIFO) order

Common implementation classes of the Set interface include HashSet and TreeSet.

The List interface is implemented by ArrayList, LinkedList, Stack, and others.

Queue implementations include PriorityQueue (a priority queue), ArrayDeque, and LinkedList (a queue backed by a linked list).

Map represents a collection of key-value pairs. It is a separate interface that does not extend Collection, and its implementations include HashMap and TreeMap.
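
To make the four kinds of collection concrete, here is a minimal sketch. The class name CollectionKindsDemo and its helper methods are made up for illustration; only the java.util types come from the text above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
class CollectionKindsDemo {
    // Set: duplicates are dropped, so the size counts distinct elements only.
    static int distinctCount(List<String> items) {
        Set<String> set = new HashSet<>(items);
        return set.size();
    }

    // List: insertion order and duplicates are both preserved.
    static List<String> copyAsList(List<String> items) {
        return new ArrayList<>(items);
    }

    // Queue: ArrayDeque hands elements back in first-in, first-out order.
    static String firstOut(List<String> items) {
        Queue<String> queue = new ArrayDeque<>(items);
        return queue.poll();
    }

    // Map: each key maps to one value; here, a word to its number of occurrences.
    static Map<String, Integer> countWords(List<String> items) {
        Map<String, Integer> counts = new HashMap<>();
        for (String item : items) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "a", "b", "c");
        System.out.println(distinctCount(words));        // 3
        System.out.println(copyAsList(words));           // [b, a, b, c]
        System.out.println(firstOut(words));             // b
        System.out.println(countWords(words).get("b"));  // 2
    }
}
```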

2. What are the thread-safe collections?

Most of the collections in the java.util package are not thread-safe. A few are, such as Hashtable and Vector, but these are legacy APIs; as the name shows, Hashtable does not even follow the Java naming convention. Although they are thread-safe, they perform poorly compared with the other commonly used collections. So if we need a thread-safe collection, we can use the Collections utility class to wrap a HashMap in a thread-safe Map, as follows:

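// Needs java.util.Collections, java.util.HashMap and java.util.Map.
// Every method of safeMap synchronizes on one lock around unsafeMap.
Map<String, String> unsafeMap = new HashMap<>();
Map<String, String> safeMap = Collections.synchronizedMap(unsafeMap);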

Or you can use the corresponding collection under the java.util.concurrent package.
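
The two options can be compared side by side. The sketch below is only an illustration: SafeMapDemo, countHits, and sumValues are hypothetical names, and the thread counts are arbitrary. It relies on merge being a single atomic call on both a synchronized wrapper and a ConcurrentHashMap, and on the documented requirement to hold the wrapper's lock while iterating over it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of the JDK.
class SafeMapDemo {
    // Starts `threads` workers that each add 1 to the same key `perThread` times.
    static int countHits(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // One atomic read-modify-write per call on both map types used below.
                    map.merge("hits", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return map.getOrDefault("hits", 0);
    }

    // Iterating over a synchronized wrapper must be guarded by the wrapper itself.
    static int sumValues(Map<String, Integer> synchronizedMap) {
        int total = 0;
        synchronized (synchronizedMap) {
            for (int value : synchronizedMap.values()) {
                total += value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> wrapped = Collections.synchronizedMap(new HashMap<String, Integer>());
        System.out.println(countHits(wrapped, 4, 1000));                                   // 4000
        System.out.println(countHits(new ConcurrentHashMap<String, Integer>(), 4, 1000));  // 4000
        System.out.println(sumValues(wrapped));                                            // 4000
    }
}
```

Passing a plain HashMap to countHits instead would be a data race, and the final count could come out lower than expected.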

Since JDK 1.5, Java has provided the java.util.concurrent package, which contains thread-safe implementations of the commonly used collections:

  • Collection classes starting with Concurrent: these classes support concurrent access, including concurrent writes from multiple threads. All write operations are thread-safe, while read operations generally do not need to lock. They use more sophisticated algorithms that avoid locking the entire collection, so they perform better under concurrent writes.
  • Collection classes starting with CopyOnWrite: these classes implement write operations by copying the underlying array. A read operation reads the current array directly, without locking or blocking. A write operation takes a lock, copies the underlying array, applies the change to the copy, and then replaces the old array with the new one. Because readers never see a partially modified array, this is thread-safe, at the cost of one array copy per write.
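
The snapshot behaviour of the CopyOnWrite classes shows up even in a single thread. In the sketch below (CopyOnWriteDemo and iterateWhileAdding are hypothetical names), the loop modifies the list it is iterating over: a CopyOnWriteArrayList iterator keeps reading the array that existed when iteration began, while an ArrayList iterator fails fast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class, not part of the JDK.
class CopyOnWriteDemo {
    // Adds one element per iteration step and returns how many elements the loop visited.
    static int iterateWhileAdding(List<String> list) {
        int seen = 0;
        for (String item : list) {
            list.add(item + "-copy");
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> cow = new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        System.out.println(iterateWhileAdding(cow)); // 3: the iterator saw the original array only
        System.out.println(cow.size());              // 6: the writes went to fresh copies

        List<String> plain = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            iterateWhileAdding(plain);
        } catch (ConcurrentModificationException e) {
            System.out.println("ArrayList fails fast");
        }
    }
}
```

Since every write copies the whole array, this trade-off tends to pay off when reads greatly outnumber writes.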

3. The implementation class of the Map interface

There are four common implementation classes of the Map interface: HashMap, TreeMap, ConcurrentHashMap, and LinkedHashMap.

Choose the Map implementation according to how it will be used:

  • If you just need a map and do not care about ordering or thread safety, HashMap is the best choice because it is the most efficient: lookup, insertion, and deletion all take O(1) time on average.
  • If you need thread safety but not ordering, use ConcurrentHashMap, which achieves thread safety through segment locks (JDK 1.7) or CAS plus synchronized on individual buckets (JDK 1.8 and later).
  • If you only need entries kept in insertion order, use LinkedHashMap.
  • If you need keys sorted by their natural order or by a custom rule, use TreeMap. TreeMap is implemented with a red-black tree and keeps its keys sorted.
  • If you need both sorting and thread safety, you can wrap a TreeMap with the Collections utility class (for example, Collections.synchronizedSortedMap), or use ConcurrentSkipListMap from java.util.concurrent.
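
The ordering differences in this list are easy to observe. The following sketch inserts the same three keys into several Map implementations and returns the keys in iteration order; MapOrderDemo and keysAfterInserts are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical demo class, not part of the JDK.
class MapOrderDemo {
    // Inserts the same keys in a fixed order and reports how the map iterates over them.
    static List<String> keysAfterInserts(Map<String, Integer> map) {
        map.put("banana", 2);
        map.put("apple", 1);
        map.put("cherry", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Insertion order is kept.
        System.out.println(keysAfterInserts(new LinkedHashMap<String, Integer>()));
        // [banana, apple, cherry]

        // Natural (alphabetical) key order.
        System.out.println(keysAfterInserts(new TreeMap<String, Integer>()));
        // [apple, banana, cherry]

        // Custom rule: reverse alphabetical order.
        System.out.println(keysAfterInserts(new TreeMap<String, Integer>(Comparator.reverseOrder())));
        // [cherry, banana, apple]

        // Sorted and thread-safe: a synchronized wrapper, or a concurrent sorted map.
        System.out.println(keysAfterInserts(
                Collections.synchronizedSortedMap(new TreeMap<String, Integer>())));
        // [apple, banana, cherry]
        System.out.println(keysAfterInserts(new ConcurrentSkipListMap<String, Integer>()));
        // [apple, banana, cherry]
    }
}
```

HashMap is left out on purpose: its iteration order is unspecified, so any output shown for it would be misleading.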

4. The difference between TreeMap and HashMap

  1. HashMap is implemented with a hash table plus linked lists (before JDK 1.8), or with a hash table plus linked lists and red-black trees (JDK 1.8 onward). TreeMap is implemented with a red-black tree.
  2. HashMap is more efficient: because it is based on a hash table, its operations take O(1) time on average, while TreeMap's operations on its red-black tree take O(log N).
  3. TreeMap keeps its keys sorted, while HashMap has no defined order; this follows from the underlying data structures.
  4. Neither is thread-safe.
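
What the extra O(log N) cost of TreeMap buys is a set of order-based queries that a hash table cannot answer. A small sketch, with a made-up grade table and the hypothetical names TreeMapRangeDemo, gradeTable, and gradeFor:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the JDK.
class TreeMapRangeDemo {
    // Minimum score needed for each grade; the values are invented for this example.
    static TreeMap<Integer, String> gradeTable() {
        TreeMap<Integer, String> table = new TreeMap<>();
        table.put(70, "C");
        table.put(90, "A");
        table.put(80, "B");
        return table;
    }

    // floorEntry finds the entry with the greatest key that is <= score.
    static String gradeFor(int score) {
        Map.Entry<Integer, String> entry = gradeTable().floorEntry(score);
        return entry == null ? "F" : entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> table = gradeTable();
        System.out.println(table.keySet());             // [70, 80, 90]
        System.out.println(table.firstKey());           // 70
        System.out.println(table.headMap(90).keySet()); // [70, 80]
        System.out.println(gradeFor(85));               // B
        System.out.println(gradeFor(60));               // F
    }
}
```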

5. The mechanism of HashMap

Starting with JDK 1.8, the underlying data structure of HashMap changed from hash table + linked list to hash table + linked list + red-black tree. The reason is that with severe hash collisions, the linked list at a single position of the hash table can grow very long, and lookups in that bucket degrade to O(n). Converting a long list into a red-black tree keeps lookups at O(log N).

  1. The default initial capacity of HashMap is 16, and the size is checked on every insertion. If the number of stored entries exceeds load factor (default 0.75) * current capacity (default 16), that is, more than 12 entries with the defaults, HashMap resizes to twice its capacity.
  2. To resolve hash collisions, entries that fall into the same bucket are stored in a singly linked list. When an element is added to a bucket whose list has already reached the threshold (8 by default), the list is converted into a red-black tree to improve performance. When a treeified bucket shrinks to the lower threshold (6 by default), the red-black tree is converted back into a linked list.
  3. Before converting a linked list into a red-black tree, HashMap also checks whether the length of the table array has reached a threshold (64). If it has not, the conversion is skipped and the array is resized first.
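
The three rules above can be turned into a small model to check the arithmetic. This is a simplified sketch and not the JDK source: HashMapGrowthSketch, capacityAfter, and onInsertInto are hypothetical names, the constants are the default values quoted in the text, and the model ignores details such as lazy table allocation.

```java
// Simplified model of the rules described above; not the real HashMap code.
class HashMapGrowthSketch {
    static final int DEFAULT_CAPACITY = 16;
    static final double LOAD_FACTOR = 0.75;
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Table capacity after inserting `entries` distinct keys into a default map.
    static int capacityAfter(int entries) {
        int capacity = DEFAULT_CAPACITY;
        for (int size = 1; size <= entries; size++) {
            if (size > capacity * LOAD_FACTOR) {
                capacity *= 2; // resize to twice the capacity
            }
        }
        return capacity;
    }

    // What happens when one more node lands in a bucket that already holds
    // `nodesAlreadyInBucket` list nodes, given the current table capacity.
    static String onInsertInto(int nodesAlreadyInBucket, int tableCapacity) {
        if (nodesAlreadyInBucket < TREEIFY_THRESHOLD) {
            return "APPEND";  // the list is still short: just link the new node
        }
        return tableCapacity < MIN_TREEIFY_CAPACITY
                ? "RESIZE"    // table too small: grow the array instead of treeifying
                : "TREEIFY";  // convert the bucket's list into a red-black tree
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12));    // 16 (12 entries do not exceed 16 * 0.75)
        System.out.println(capacityAfter(13));    // 32
        System.out.println(capacityAfter(25));    // 64
        System.out.println(onInsertInto(7, 64));  // APPEND
        System.out.println(onInsertInto(8, 16));  // RESIZE
        System.out.println(onInsertInto(8, 64));  // TREEIFY
    }
}
```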

5.1. Why do you need to double the size for expansion?

When the array length doubles, one extra high-order bit of the hash takes part in determining the array index. As a result, when an element's index is recomputed from its hash, exactly one of two things happens: if that extra bit is 0, the index is unchanged; if it is 1, the index becomes "10000 + original index" (in binary, for an original length of 16), that is, "original length + original index". Each entry therefore either stays where it is or moves by exactly the original length.
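
This bit-level claim can be checked directly. The sketch below assumes the index is computed as hash & (capacity - 1), which works because the capacity is a power of two; it leaves out the extra hash-spreading step that HashMap applies first. ResizeIndexDemo and its methods are hypothetical names.

```java
// Hypothetical demo class; a simplified view of how a bucket index is derived.
class ResizeIndexDemo {
    // For a power-of-two capacity, the low bits of the hash select the bucket.
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    // True if doubling the table leaves the entry in place (extra bit is 0)
    // or moves it by exactly the old capacity (extra bit is 1).
    static boolean staysOrMovesByOldCapacity(int hash, int oldCapacity) {
        int oldIndex = indexFor(hash, oldCapacity);
        int newIndex = indexFor(hash, oldCapacity * 2);
        boolean extraBitIsZero = (hash & oldCapacity) == 0;
        return extraBitIsZero ? newIndex == oldIndex
                              : newIndex == oldIndex + oldCapacity;
    }

    public static void main(String[] args) {
        // hash 5 = 00101 in binary: the new bit (value 16) is 0, so the index stays 5.
        System.out.println(indexFor(5, 16) + " -> " + indexFor(5, 32));   // 5 -> 5
        // hash 21 = 10101 in binary: the new bit is 1, so the index becomes 5 + 16.
        System.out.println(indexFor(21, 16) + " -> " + indexFor(21, 32)); // 5 -> 21
    }
}
```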

5.2. Why are the thresholds 8 and 6, instead of converting at 8 or more and converting back below 8?

Because some HashMaps add and remove elements frequently, the length of a bucket's linked list may fluctuate between 7 and 8. If both conversions shared the single threshold of 8, the bucket would be rebuilt as a red-black tree and then as a linked list over and over, wasting a lot of system resources on converting the underlying data structure. Red-black trees were introduced to improve efficiency, but in this situation they would reduce it instead. Leaving a gap between the two thresholds avoids this back-and-forth.
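
A toy simulation makes the cost of a single threshold visible. The model below is a deliberate simplification and not how HashMap tracks its buckets: ThresholdHysteresisDemo and conversions are hypothetical names, and the bucket is reduced to a sequence of sizes.

```java
// Toy model: counts list <-> tree conversions for a bucket whose size changes over time.
class ThresholdHysteresisDemo {
    // `sizes` is the bucket size after each operation. The bucket becomes a tree
    // at `treeifyAt` or more nodes and a list again at `untreeifyAt` or fewer.
    static int conversions(int[] sizes, int treeifyAt, int untreeifyAt) {
        boolean isTree = false;
        int count = 0;
        for (int size : sizes) {
            if (!isTree && size >= treeifyAt) {
                isTree = true;
                count++;
            } else if (isTree && size <= untreeifyAt) {
                isTree = false;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] fluctuating = {7, 8, 7, 8, 7, 8};
        // Single boundary (tree at >= 8, list at < 8): a conversion on every change.
        System.out.println(conversions(fluctuating, 8, 7)); // 5
        // Gap between the thresholds (8 and 6): one conversion, then the tree is kept.
        System.out.println(conversions(fluctuating, 8, 6)); // 1
    }
}
```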

5.3. Why does HashMap use a red-black tree instead of a B-tree?

B-trees and B+ trees are better suited to storing large amounts of data, typically on external storage, whereas red-black trees were introduced here only to fix the slow lookups caused by overly long linked lists. A single bucket of a HashMap cannot hold a large number of elements. With so few elements, a B/B+ tree would pack all the data into a single node, and lookups would be no faster than in the linked list.

Origin blog.csdn.net/dghehe/article/details/130031795