Java Collections Framework: Detailed Interview Notes

1.1 Overview

On the one hand, Java, as an object-oriented language, represents everything as objects. To make it convenient to work with many objects at once, they need to be stored in a container such as a collection.

On the other hand, ordinary arrays have drawbacks for storing objects (most notably, an array's length is fixed once it is created). A Java collection acts as a container that can dynamically hold references to a varying number of objects.

1.2 Composition

The collection framework has two main root interfaces: the Collection interface and the Map interface. The main sub-interfaces of Collection are List and Set. Commonly used implementation classes of the List interface include ArrayList, LinkedList, and Vector; commonly used implementation classes of the Set interface include HashSet, TreeSet, and LinkedHashSet; and commonly used implementation classes of the Map interface include Hashtable, HashMap, TreeMap, etc.
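To make the hierarchy above concrete, here is a small illustrative sketch that checks which root interface each common implementation class belongs to. It only uses standard java.util types; the class name CollectionHierarchyDemo is made up for this example.

```java
import java.util.*;

// Illustrative check of the interface hierarchy; the class name is hypothetical.
class CollectionHierarchyDemo {
    static boolean isCollection(Object o) { return o instanceof Collection; }
    static boolean isMap(Object o) { return o instanceof Map; }

    public static void main(String[] args) {
        Object[] samples = {
            new ArrayList<String>(), new LinkedList<String>(), new Vector<String>(),
            new HashSet<String>(), new TreeSet<String>(), new LinkedHashSet<String>(),
            new HashMap<String, Integer>(), new Hashtable<String, Integer>(), new TreeMap<String, Integer>()
        };
        // Lists and Sets report Collection=true, Map=false; the three maps report the opposite.
        for (Object o : samples) {
            System.out.printf("%-13s Collection=%-5b Map=%b%n",
                    o.getClass().getSimpleName(), isCollection(o), isMap(o));
        }
    }
}
```

Note that Map does not extend Collection; it is a separate root of the framework.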

1.3 Respective characteristics

1.3.1 List collection

The main feature of a List is that its elements are ordered: a List lets you control exactly where each element is inserted, access elements directly by index, and store duplicate elements.
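The three List properties just described (insertion at a chosen position, index access, duplicates allowed) can be seen in a few lines of standard API usage; the class name is only illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative demo of List behaviour; the class name is hypothetical.
class ListBasicsDemo {
    static List<String> build() {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("c");
        list.add(1, "b"); // insert at an explicit index -> [a, b, c]
        list.add("a");    // duplicates are allowed  -> [a, b, c, a]
        return list;
    }

    public static void main(String[] args) {
        List<String> list = build();
        System.out.println(list);                                            // [a, b, c, a]
        System.out.println(list.get(2));                                     // c (direct access by index)
        System.out.println(list.indexOf("a") + " " + list.lastIndexOf("a")); // 0 3
    }
}
```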

Among the implementations, ArrayList and Vector both store data in an array, while LinkedList is implemented as a doubly linked list. Because an array supports indexed access, any element can be located quickly, so ArrayList is very efficient for lookups and traversal. Insertion and deletion are a different story: if the insertion or deletion position is near the front of a large array, every element after that position has to be shifted backward or forward, which makes these operations slow.

Because LinkedList is a doubly linked list, each node stores a reference to the previous node and a reference to the next node, which links all the nodes together. To insert or delete, once the target position has been found, you only need to update the pointers of the neighboring nodes, which makes insertion and deletion much more efficient. LinkedList has its own drawback, though: its nodes have no index, so finding an element means following pointers through the list node by node, which is slow.
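The following is a hypothetical, minimal doubly linked list (it is NOT the source of java.util.LinkedList) that illustrates the point above: inserting or unlinking a node that has already been located only rewires neighbor pointers, while locating a node by position requires walking the list.

```java
// Hypothetical minimal doubly linked list for illustration; not java.util.LinkedList's code.
class MiniLinkedList {
    static final class Node {
        int value;
        Node prev, next;
        Node(int value) { this.value = value; }
    }

    Node head, tail;
    int size;

    void addLast(int value) {
        Node n = new Node(value);
        if (tail == null) { head = tail = n; }
        else { tail.next = n; n.prev = tail; tail = n; }
        size++;
    }

    // Insert after an already-located node: only neighbouring pointers change.
    void insertAfter(Node at, int value) {
        Node n = new Node(value);
        n.prev = at;
        n.next = at.next;
        if (at.next != null) at.next.prev = n; else tail = n;
        at.next = n;
        size++;
    }

    // Unlink an already-located node: again only neighbouring pointers change.
    void unlink(Node n) {
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
        n.prev = n.next = null;
        size--;
    }

    // Locating a node by position means walking from the head: O(n).
    Node nodeAt(int index) {
        Node cur = head;
        for (int i = 0; i < index; i++) cur = cur.next;
        return cur;
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (Node cur = head; cur != null; cur = cur.next) {
            sb.append(cur.value);
            if (cur.next != null) sb.append(", ");
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        MiniLinkedList list = new MiniLinkedList();
        for (int v : new int[] {1, 2, 4}) list.addLast(v);
        list.insertAfter(list.nodeAt(1), 3); // [1, 2, 3, 4]
        list.unlink(list.nodeAt(0));         // [2, 3, 4]
        System.out.println(list);
    }
}
```

The real LinkedList additionally starts its search from whichever end of the list is closer to the requested index, but the search is still linear.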

The difference between ArrayList and Vector is that Vector is thread-safe (its methods are synchronized). Because of that locking overhead, ArrayList generally performs better than Vector. In short: use ArrayList when lookups dominate, use Vector when thread safety is required, and use LinkedList when insertions and deletions are frequent.
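As a rough illustration of Vector's thread safety, the sketch below has two threads append to the same Vector concurrently. Because Vector.add() is synchronized, no element is lost. The helper name concurrentAdds is hypothetical. Running the same code against a plain ArrayList may lose elements or throw an exception, but that outcome is not guaranteed on any given run.

```java
import java.util.List;
import java.util.Vector;

// Hypothetical demo: two threads append to one shared list at the same time.
class VectorThreadSafetyDemo {
    static int concurrentAdds(List<Integer> list, int perThread) {
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) list.add(i);
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        // Vector's synchronized add() keeps the count exact.
        System.out.println(concurrentAdds(new Vector<>(), 10_000)); // 20000
    }
}
```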

1.3.2 Set collection

The main features of a Set are that its elements cannot be repeated and have no index. Among the Set implementations, LinkedHashSet preserves insertion order, HashSet makes no guarantee about iteration order, and TreeSet keeps its elements sorted (which is not the same as insertion order).
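A quick way to see the ordering difference is to add the same values to each Set implementation and look at the iteration order. The helper name iterationOrder is only illustrative.

```java
import java.util.*;

// Illustrative comparison of iteration order across Set implementations.
class SetOrderDemo {
    static List<Integer> iterationOrder(Set<Integer> set) {
        Collections.addAll(set, 30, 10, 20, 10); // the duplicate 10 is ignored
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(iterationOrder(new LinkedHashSet<>())); // [30, 10, 20] insertion order
        System.out.println(iterationOrder(new TreeSet<>()));       // [10, 20, 30] sorted order
        System.out.println(iterationOrder(new HashSet<>()));       // order not guaranteed
    }
}
```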

HashSet is the typical implementation of the Set interface. It is backed by a HashMap: the elements of a HashSet are stored as the keys of the internal HashMap, so elements cannot be repeated and at most one null is allowed. To guarantee uniqueness, when a new element is added, HashSet first uses the element's hashCode() value to find its bucket. If no element in that bucket has the same hash value, the new element is inserted directly. If one does, the element's equals() method is called to compare the two: if equals() returns true, they are the same element and the add is rejected; if it returns false, the new element is stored in the same bucket, chained in a linked list (or, since JDK 1.8, possibly a red-black tree).
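The hashCode()/equals() check described above is why a class stored in a HashSet should override both methods consistently. The Person class below is a hypothetical example; two Person objects with the same name and age count as one element.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class used to show HashSet's duplicate detection.
class HashSetDedupDemo {
    static final class Person {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person p = (Person) o;
            return age == p.age && name.equals(p.name);
        }

        // Equal objects must produce equal hash codes, or HashSet may never call equals()
        @Override public int hashCode() { return Objects.hash(name, age); }
    }

    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        System.out.println(people.add(new Person("Tom", 20))); // true
        System.out.println(people.add(new Person("Tom", 20))); // false: same hash and equals() is true
        System.out.println(people.add(null));                   // true
        System.out.println(people.add(null));                   // false: only one null
        System.out.println(people.size());                      // 2
    }
}
```

Without the two overrides, both Person objects would be added, because the default equals() compares object identity.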

LinkedHashSet is backed by a LinkedHashMap: on top of the HashSet structure it adds a doubly linked list that records the insertion order of elements. As a result, iterating over it takes time proportional to the number of elements rather than to the table capacity, so it can be more efficient than HashSet in traversal-heavy scenarios.

The characteristic of TreeSet is that its elements are kept sorted. TreeSet supports two ways of sorting: natural ordering, or a specified Comparator. With natural ordering, the elements must implement the Comparable interface and override compareTo() to define the comparison rule; when a Comparator is passed to the TreeSet constructor, the Comparator defines the rule instead.
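The sketch below contrasts the two sorting modes: natural ordering via String's own compareTo(), and a custom Comparator (shorter words first, ties broken alphabetically, an arbitrary rule chosen for illustration). It also shows that, without a Comparator, adding an element that does not implement Comparable fails with a ClassCastException.

```java
import java.util.*;

// Illustrative comparison of TreeSet's two ordering modes; the word list is arbitrary.
class TreeSetOrderingDemo {
    // Natural ordering: String implements Comparable<String>
    static List<String> natural(String... words) {
        return new ArrayList<>(new TreeSet<String>(Arrays.asList(words)));
    }

    // Custom ordering supplied through a Comparator
    static List<String> byLength(String... words) {
        Comparator<String> cmp = Comparator.comparingInt(String::length)
                .thenComparing(Comparator.<String>naturalOrder());
        TreeSet<String> set = new TreeSet<>(cmp);
        set.addAll(Arrays.asList(words));
        return new ArrayList<>(set);
    }

    // Without a Comparator, a non-Comparable element is rejected
    static boolean rejectsNonComparable() {
        try {
            new TreeSet<Object>().add(new Object());
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(natural("pear", "fig", "apple", "kiwi"));  // [apple, fig, kiwi, pear]
        System.out.println(byLength("pear", "fig", "apple", "kiwi")); // [fig, kiwi, pear, apple]
        System.out.println(rejectsNonComparable());                    // true
    }
}
```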

1.3.3 Map collection

A Map is a special collection based on key-value pairs. Keys in a Map cannot be repeated, but values can; each key uniquely identifies one value. Commonly used Map implementations include HashMap and Hashtable.

The difference between the two is that Hashtable is thread-safe, because every one of its methods is marked synchronized. In addition, Hashtable does not allow null keys, whereas HashMap does: when HashMap computes the hash of a key, it handles null specially and uses 0 as its hash value.
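The null-key difference is easy to verify: HashMap accepts a null key, while Hashtable.put() throws a NullPointerException. The helper name acceptsNullKey is hypothetical.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Illustrative demo of null-key handling in HashMap vs Hashtable.
class NullKeyDemo {
    static boolean acceptsNullKey(Map<String, Integer> map) {
        try {
            map.put(null, 1);
            Integer v = map.get(null);
            return v != null && v == 1;
        } catch (NullPointerException e) {
            return false; // Hashtable rejects null keys
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));   // true
        System.out.println(acceptsNullKey(new Hashtable<>())); // false
    }
}
```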

During development, when the data to be stored naturally consists of key-value pairs, consider using a Map. In practice, a HashMap is structurally very similar to the JSON used to exchange data between the front end and the back end, so it is often used as an intermediate form when converting that data. When thread safety is required in multithreaded code, Hashtable can be used, but for better performance ConcurrentHashMap is recommended.
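As a sketch of the ConcurrentHashMap recommendation, the hypothetical word counter below lets many pool tasks update one shared map at the same time. ConcurrentHashMap.merge() is performed atomically, so concurrent increments are not lost, and no method-level lock on the whole map (as in Hashtable) is needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical word counter: many tasks update one shared map concurrently.
class ConcurrentCountDemo {
    static Map<String, Integer> count(String[] words, int threads) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String w : words) {
            // merge() runs atomically on ConcurrentHashMap, so no increment is lost
            pool.execute(() -> counts.merge(w, 1, Integer::sum));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(new String[] {"a", "b", "a", "c", "a"}, 4);
        System.out.println(counts.get("a") + " " + counts.get("b") + " " + counts.get("c")); // 3 1 1
    }
}
```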

2.1 Part of the underlying implementation of ArrayList

ArrayList is backed by an array that grows dynamically. Because a Java array's length is fixed once it is created, ArrayList achieves dynamic growth by allocating a larger array and copying the existing elements into it when necessary.

2.1.1 Initialization

When an ArrayList is created with the no-argument constructor, an empty array is assigned to elementData (no real storage is allocated yet); the actual array, with a capacity of 10, is created when the first element is added. When the constructor that takes an initial capacity is used and the argument is 0, a different shared empty array is assigned to elementData, similar to the no-argument case. If the argument is greater than 0, an Object array of that size is created and assigned to elementData.

Why use two different empty arrays? They let ArrayList tell whether the list was created with the no-argument constructor or with an explicit capacity of 0: only in the no-argument case does the first add expand the array to 10. In versions before 1.8, the constructor immediately allocated a new array of the requested capacity, so creating many collections with a capacity of 0 produced many separate arrays in memory. Sharing a single empty array instead saves memory and improves performance.
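The following is a hypothetical, simplified model of the constructor and first-add logic described above. The two sentinel field names mirror those in the JDK 8 ArrayList source, but this class is an illustration under that assumption, not the real implementation (for example, overflow checks are omitted).

```java
import java.util.Arrays;

// Hypothetical simplified model of ArrayList's lazy initialisation; NOT the JDK source.
class LazyArrayModel {
    private static final int DEFAULT_CAPACITY = 10;
    // Two distinct empty arrays: each {} initializer creates a separate array object.
    private static final Object[] EMPTY_ELEMENTDATA = {};
    private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

    Object[] elementData;
    int size;

    LazyArrayModel() { elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA; }

    LazyArrayModel(int initialCapacity) {
        if (initialCapacity > 0) elementData = new Object[initialCapacity];
        else if (initialCapacity == 0) elementData = EMPTY_ELEMENTDATA;
        else throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
    }

    void add(Object e) {
        int minCapacity = size + 1;
        // Only the no-arg sentinel jumps straight to the default capacity of 10.
        if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        if (minCapacity > elementData.length) {
            int oldCapacity = elementData.length;
            int newCapacity = Math.max(oldCapacity + (oldCapacity >> 1), minCapacity);
            elementData = Arrays.copyOf(elementData, newCapacity);
        }
        elementData[size++] = e;
    }

    int capacity() { return elementData.length; }

    public static void main(String[] args) {
        LazyArrayModel noArg = new LazyArrayModel();
        System.out.println(noArg.capacity()); // 0 (nothing allocated yet)
        noArg.add("x");
        System.out.println(noArg.capacity()); // 10
        LazyArrayModel zero = new LazyArrayModel(0);
        zero.add("x");
        System.out.println(zero.capacity());  // 1
    }
}
```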

2.1.2 ArrayList expansion

Expansion is triggered when the array can no longer hold a new element. The new array's capacity is 1.5 times that of the old one (computed as oldCapacity + (oldCapacity >> 1)), and the elements of the old array are copied into the new array with Arrays.copyOf(), which relies on System.arraycopy() internally. Deletion also uses System.arraycopy(): the elements after the deleted position are shifted one position forward, and then the last slot is set to null so that the object it referenced can be reclaimed by the GC. Insertion at a given index works in a similar way, shifting elements one position backward.
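A hypothetical helper class that mirrors the growth formula and the remove-and-shift step described above (it is an illustration, not the JDK source):

```java
import java.util.Arrays;

// Hypothetical helpers mirroring ArrayList's 1.5x growth and removal shifting.
class ArrayGrowthModel {
    static int grow(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1); // 1.5x via a right shift
    }

    // Removes elements[index] from the first `size` slots and returns the new size.
    static int remove(Object[] elements, int size, int index) {
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            // shift the tail one slot to the left
            System.arraycopy(elements, index + 1, elements, index, numMoved);
        }
        elements[size - 1] = null; // clear the stale slot so the GC can reclaim the object
        return size - 1;
    }

    public static void main(String[] args) {
        int cap = 10;
        for (int i = 0; i < 3; i++) {
            cap = grow(cap);
            System.out.print(cap + " "); // 15 22 33
        }
        System.out.println();
        Object[] data = Arrays.copyOf(new Object[] {"a", "b", "c", "d"}, 6); // capacity 6, size 4
        int size = remove(data, 4, 1);
        System.out.println(size + " " + Arrays.toString(data)); // 3 [a, c, d, null, null, null]
    }
}
```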

2.2 Part of the underlying implementation of HashMap

2.2.1 Structural differences of HashMap

Before JDK 1.8, HashMap is implemented as an array plus linked lists; from JDK 1.8 on, it is implemented as an array plus linked lists plus red-black trees.

Before 1.8: the storage position (bucket) is determined from hash(key), and data is stored at that position in the form of a linked list; when several entries land at the same array position, they form the linked list of that bucket. Each stored mapping is represented by an Entry, which contains the key, the value, the hash, and next (a reference to the next Entry).

In 1.8 and later: each mapping is represented by a Node, which likewise contains the key, the value, the hash, and next (a reference to the next Node). When a bucket's linked list grows longer than 8 and the array length is at least 64, the list is converted into a red-black tree (if the array is still shorter than 64, the table is resized instead). When a red-black tree bin is split during a resize and a part has 6 or fewer nodes, that part is converted back into a linked list.
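The conversion rules above can be summarized in a small hypothetical helper. The constant names and values mirror HashMap's TREEIFY_THRESHOLD, UNTREEIFY_THRESHOLD and MIN_TREEIFY_CAPACITY in JDK 8, but the methods are an illustration of the decision, not the JDK's actual code structure.

```java
// Hypothetical summary of JDK 8 HashMap's bin-conversion rules; not the JDK source.
class BinShapeRules {
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE_TABLE, TREEIFY }

    // Decision after a put leaves `binLength` nodes in one bucket of a table of `tableLength` slots.
    static Action afterPut(int binLength, int tableLength) {
        if (binLength <= TREEIFY_THRESHOLD) return Action.KEEP_LIST;
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE_TABLE : Action.TREEIFY;
    }

    // When a resize splits a tree bin, a part this small goes back to a linked list.
    static boolean untreeify(int treeBinSize) {
        return treeBinSize <= UNTREEIFY_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(afterPut(8, 64));  // KEEP_LIST
        System.out.println(afterPut(9, 16));  // RESIZE_TABLE
        System.out.println(afterPut(9, 64));  // TREEIFY
        System.out.println(untreeify(6));     // true
    }
}
```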

2.2.2 Insert data

When inserting data, HashMap first checks whether the table array is empty; if it is, the array is initialized with a capacity of 16. Then the bucket index is computed as (n - 1) & hash, where n is the array length. If that bucket is empty, the new node is stored there directly. If it already holds data, a hash collision has occurred, and HashMap checks whether the keys are equal: if they are, the new value overwrites the old value and the old value is returned. If the keys are not equal, HashMap checks whether the bucket holds a red-black tree (and inserts into the tree); otherwise it appends to the linked list and decides whether the list should be converted into a red-black tree. Finally, it checks whether the table needs to be expanded.
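The put steps above can be sketched as a heavily simplified, hypothetical map: separate chaining only, with no red-black trees and no resizing. It follows the described flow (lazy init to 16, index via (n - 1) & hash, store directly into an empty bucket, overwrite and return the old value on an equal key, otherwise append), but it is not the JDK implementation. In the usage check, "Aa" and "BB" are used because they have the same String.hashCode() (2112).

```java
import java.util.Objects;

// Hypothetical simplified HashMap-style put/get: chaining only, no trees, no resize.
class SimpleHashMap<K, V> {
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    private Node<K, V>[] table;
    private int size;

    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    @SuppressWarnings("unchecked")
    V put(K key, V value) {
        if (table == null) table = (Node<K, V>[]) new Node[16]; // lazy init, capacity 16
        int hash = hash(key);
        int i = (table.length - 1) & hash;                        // bucket index
        Node<K, V> p = table[i];
        if (p == null) {                                          // empty bucket: store directly
            table[i] = new Node<>(hash, key, value, null);
            size++;
            return null;
        }
        while (true) {                                            // collision: walk the chain
            if (p.hash == hash && Objects.equals(p.key, key)) {
                V old = p.value;                                  // equal key: overwrite
                p.value = value;
                return old;
            }
            if (p.next == null) break;
            p = p.next;
        }
        p.next = new Node<>(hash, key, value, null);              // append at the tail
        size++;
        return null;
    }

    V get(Object key) {
        if (table == null) return null;
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    int size() { return size; }

    public static void main(String[] args) {
        SimpleHashMap<String, Integer> m = new SimpleHashMap<>();
        m.put("Aa", 1);
        m.put("BB", 2);                      // same hashCode as "Aa": chained in the same bucket
        System.out.println(m.put("Aa", 3));  // 1 (old value returned)
        System.out.println(m.get("Aa") + " " + m.get("BB") + " " + m.size()); // 3 2 2
    }
}
```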

2.2.3 Initialization

If no capacity is specified, the default capacity is 16 (the array itself is allocated lazily, on the first put) and the default load factor is 0.75.

If a capacity is specified, the table size is rounded up to the smallest power of 2 that is greater than or equal to the specified capacity.
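The rounding rule can be illustrated with the bit trick used by tableSizeFor() in JDK 8's HashMap, reproduced below in a hypothetical TableSizing class, together with the default threshold (capacity × load factor) at which a resize happens.

```java
// Sketch of the JDK 8 HashMap.tableSizeFor bit trick plus the default threshold; class name is hypothetical.
class TableSizing {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= cap: smear the highest set bit of (cap - 1) downward, then add 1.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(17)); // 32
        int defaultThreshold = (int) (16 * 0.75f);
        System.out.println(defaultThreshold); // 12: the table is resized once size exceeds this
    }
}
```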

2.2.4 Hash function design

The hash function of JDK 1.7 performs four shifts and four XORs, which is less efficient than the JDK 1.8 version. The JDK 1.8 hash function XORs the key's hashCode() with its own upper 16 bits, i.e. h ^ (h >>> 16) (this is called the perturbation function). In both versions the goal is to spread hash values as evenly as possible and reduce the probability of hash collisions.
The purpose of the hash is to locate the position in the array where the data is stored. JDK 1.8 locates it by AND-ing the hash value with n - 1 (where n is the array length), i.e. (n - 1) & hash; because n is always a power of 2, this is equivalent to a modulo operation but more efficient.
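For comparison, the sketch below places the two hash functions and the index calculation side by side. hash7 reproduces only the bit-mixing step of JDK 1.7's HashMap.hash() applied to an already computed hashCode() (seed handling omitted), and hash8 matches JDK 1.8's HashMap.hash(); the class name is hypothetical.

```java
// Side-by-side sketch of the JDK 1.7 / 1.8 hash mixing and the bucket index; class name is hypothetical.
class HashFunctions {
    // JDK 1.7-style mixing of a hashCode: four shifts, four XORs
    static int hash7(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // JDK 1.8-style hash: one shift, one XOR; null keys hash to 0
    static int hash8(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n, where n is a power of two
    static int indexFor(int hash, int n) {
        return hash & (n - 1);
    }

    public static void main(String[] args) {
        int h = "hello".hashCode();
        System.out.println("1.7 mix: " + hash7(h) + ", 1.8 hash: " + hash8("hello"));
        // For a power-of-two n, hash & (n - 1) equals Math.floorMod(hash, n), even for negative hashes.
        System.out.println(indexFor(hash8("hello"), 16) == Math.floorMod(hash8("hello"), 16));
    }
}
```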

2.2.5 Why not use the key's hashCode() directly as the hash value instead of applying ^ (h >>> 16)?

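Because if the key's hashCode() were used directly as the hash value, hash collisions would occur easily: the index (n - 1) & hash only uses the low bits of the hash, so keys whose hash codes differ only in their high bits would all land in the same bucket.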

The perturbation function h ^ (h >>> 16) mixes the high 16 bits of the original hash code into the low 16 bits, which increases the randomness of the low bits. Since the low bits now carry information from the high bits, the high bits also take part in determining the bucket index.
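A small worked example of the effect: the hash codes 0x10000, 0x20000 and 0x30000 (chosen for illustration) differ only above bit 15, so in a 16-slot table their raw indexes are all 0. After h ^ (h >>> 16), their indexes become 1, 2 and 3. The class and method names are hypothetical.

```java
// Worked example of the perturbation function; class and method names are hypothetical.
class PerturbationDemo {
    static int spread(int h) { return h ^ (h >>> 16); }

    static int rawIndex(int h, int n) { return h & (n - 1); }

    static int spreadIndex(int h, int n) { return spread(h) & (n - 1); }

    public static void main(String[] args) {
        int n = 16;
        int[] hashCodes = {0x10000, 0x20000, 0x30000}; // differ only in the high bits
        for (int h : hashCodes) {
            System.out.println("raw index = " + rawIndex(h, n) + ", spread index = " + spreadIndex(h, n));
        }
        // raw index is 0 for all three; spread indexes are 1, 2 and 3
    }
}
```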


Origin blog.csdn.net/ILIKETANGBOHU/article/details/127205179