Java interview questions: collections

Java Collections

A collection is a variable-length container for storing data. To be precise, it stores references to objects rather than the objects themselves, which is why a collection cannot hold primitive values directly (they are stored through wrapper classes such as Integer). The collection classes live in the java.util package, and the three main kinds are Set, List and Map.

  • 1. Collection: the root interface of List, Set and Queue.
  • 2. Iterator: the interface used to traverse the elements of a collection.
  • 3. Map: the root interface of the key-value mapping types.


Difference between a collection and an array

  • Arrays are fixed length; collections are variable length.
  • Arrays can store primitive data types as well as reference data types; collections can only store reference data types.
  • Elements stored in an array must be of the same data type; objects stored in a collection can be of different data types.
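The first two points can be illustrated with a minimal sketch. The class and method names here are made up for the example; it only shows that an array keeps its length while a list grows, and that primitives are boxed before they enter a collection.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayVsList {
    // Adds one more element than the array could hold and returns the list size.
    static int grow() {
        int[] fixed = new int[3];                 // length stays 3; holds primitives directly
        List<Integer> growable = new ArrayList<>();
        for (int i = 0; i <= fixed.length; i++) {
            growable.add(i);                      // int is autoboxed to Integer; the list grows
        }
        return growable.size();
    }

    public static void main(String[] args) {
        System.out.println(grow());
    }
}
```

Writing to fixed[3] in the same loop would throw ArrayIndexOutOfBoundsException, while the list simply grows.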

What is the difference between Collection and Collections?

  • Collection is the most basic collection interface. Its main sub-interfaces are List, Set and Queue, each of which defines a different way of storing elements.
  • Collections is a utility class that contains static methods for working with collections (searching, sorting, thread-safe wrappers, and so on). It cannot be instantiated; it serves the collections framework as a helper class.
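As a small sketch of the difference, the code below uses the Collection-side type List for storage and the static helpers of Collections to sort and search it. The class name CollectionsDemo and its methods are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CollectionsDemo {
    // Returns a sorted copy; Collections.sort works on any List.
    static List<Integer> sortedCopy(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    // binarySearch requires the list to be sorted already.
    static int indexOf(List<Integer> sorted, int key) {
        return Collections.binarySearch(sorted, key);
    }

    public static void main(String[] args) {
        List<Integer> sorted = sortedCopy(Arrays.asList(3, 1, 2));
        System.out.println(sorted);
        System.out.println(indexOf(sorted, 2));
    }
}
```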

What is the difference between List, Set and Map?

  • List: elements are ordered (kept in insertion order), may repeat, may include multiple null elements, and each has an index.
  • Set: elements are unique. Iteration order is generally not guaranteed (for example in HashSet), and at most one null element is allowed where nulls are permitted at all. A Set has no index, so it is traversed with an iterator or a for-each loop.
  • Map: stores elements as key-value pairs. Keys are unique, values may repeat, and each key maps to at most one value. Ordering depends on the implementation (HashMap guarantees none). Map does not extend the Collection interface. Given a key, the Map returns the corresponding value.
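A short sketch of these rules, using ArrayList, HashSet and HashMap as representative implementations (the class name ListSetMapDemo is invented for the example):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ListSetMapDemo {
    // A List keeps duplicates and multiple nulls.
    static int listSize() {
        List<String> list = new ArrayList<>();
        list.add("a"); list.add("a"); list.add(null); list.add(null);
        return list.size();
    }

    // A HashSet drops duplicates and keeps at most one null.
    static int setSize() {
        Set<String> set = new HashSet<>();
        set.add("a"); set.add("a"); set.add(null); set.add(null);
        return set.size();
    }

    // Putting the same key twice replaces the old value.
    static String mapValue() {
        Map<String, String> map = new HashMap<>();
        map.put("k", "v1");
        map.put("k", "v2");
        return map.get("k");
    }

    public static void main(String[] args) {
        System.out.println(listSize() + " " + setSize() + " " + mapValue());
    }
}
```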

List collection

  • ArrayList: backed by a dynamic Object[] array stored contiguously in memory, so it suits index-based (random) access and frequent lookups. The drawback of an array is that there can be no gaps between elements: when the array is full, a larger one must be allocated and the existing elements copied over, and inserting or deleting in the middle requires shifting elements, which is expensive. ArrayList is therefore a good fit for random lookups and traversal, and a poor fit for frequent insertions and deletions in the middle.
  • Vector: also backed by a dynamic Object[] array, but its methods are synchronized, so only one thread can modify a Vector at a time. This avoids the inconsistency caused by concurrent writes, but synchronization has a cost, so access is slower than with ArrayList.
  • LinkedList: a doubly linked list (circular before JDK 1.6; JDK 1.7 removed the cycle), well suited to dynamic insertion and deletion; random access is relatively slow. It also provides methods that are not defined in the List interface for operating on the head and tail elements, so it can be used as a stack, a queue or a double-ended queue.
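The head and tail operations of LinkedList come from the Deque interface. A minimal sketch of using it as a stack and as a queue (the class name LinkedListDeque is illustrative):

```java
import java.util.Deque;
import java.util.LinkedList;

class LinkedListDeque {
    // push/pop work on the head: last in, first out.
    static String stackOrder() {
        Deque<String> stack = new LinkedList<>();
        stack.push("a"); stack.push("b"); stack.push("c");
        return stack.pop() + stack.pop() + stack.pop();
    }

    // offer adds at the tail, poll removes from the head: first in, first out.
    static String queueOrder() {
        Deque<String> queue = new LinkedList<>();
        queue.offer("a"); queue.offer("b"); queue.offer("c");
        return queue.poll() + queue.poll() + queue.poll();
    }

    public static void main(String[] args) {
        System.out.println(stackOrder() + " " + queueOrder());
    }
}
```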

Set collection

  • HashSet (unordered, unique): implemented on top of HashMap. Each element is stored as a key of the internal HashMap, and every key maps to the same shared dummy value, PRESENT. The implementation is therefore fairly simple: apart from clone(), writeObject() and readObject(), which HashSet implements itself, its methods call the corresponding HashMap methods directly.
  • LinkedHashSet: a subclass of HashSet whose internal implementation is a LinkedHashMap, so it keeps elements in insertion order. The relationship mirrors that of LinkedHashMap to HashMap, with some small differences.
  • TreeSet (ordered, unique): backed by a red-black tree (a self-balancing binary search tree). Each object passed to add() is inserted at the position that keeps the elements in sorted order (ascending by default, or as defined by a Comparator). Integer and String objects can be sorted out of the box; objects of a custom class must either implement the Comparable interface and override compareTo(), or be supplied with a Comparator.
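A sketch of the TreeSet requirement for custom classes, assuming a hypothetical Person class that is ordered by age:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class TreeSetDemo {
    // Hypothetical element type; TreeSet needs it to be Comparable.
    static final class Person implements Comparable<Person> {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public int compareTo(Person other) {
            return Integer.compare(age, other.age); // ascending by age
        }
    }

    // Elements come back in compareTo order, not insertion order.
    static List<String> namesByAge() {
        TreeSet<Person> people = new TreeSet<>();
        people.add(new Person("Carol", 41));
        people.add(new Person("Alice", 23));
        people.add(new Person("Bob", 35));
        List<String> names = new ArrayList<>();
        for (Person p : people) {
            names.add(p.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesByAge());
    }
}
```

Note that TreeSet treats two elements whose compareTo() returns 0 as duplicates, so two people with the same age would collapse into one entry in this sketch.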

Map collection

  • HashMap: before JDK 1.8, HashMap consisted of an array plus linked lists. The array is the main body of HashMap, and the linked lists exist mainly to resolve hash conflicts. JDK 1.8 changed conflict resolution significantly: when a linked list grows longer than the threshold (8 by default), it is converted into a red-black tree to reduce search time. Before converting, HashMap checks the array length; if it is less than 64, the array is expanded first instead of converting the list.
  • LinkedHashMap: inherits from HashMap, so its bottom layer is still a separate-chaining hash structure made up of an array and linked lists or red-black trees. On top of that, LinkedHashMap adds a doubly linked list so that the insertion order of key-value pairs is maintained. By manipulating this linked list it also implements access-order logic.
  • Hashtable: array + linked list. The array is the main body of Hashtable, and the linked lists exist mainly to resolve hash conflicts.
  • TreeMap: red-black tree (a self-balancing binary search tree).
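The ordering behaviour of these maps can be compared with a short sketch. The three-argument LinkedHashMap constructor with accessOrder set to true is the standard way to switch on access ordering; the class name MapOrderDemo is illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapOrderDemo {
    private static void fill(Map<String, Integer> map) {
        map.put("b", 2);
        map.put("a", 1);
        map.put("c", 3);
    }

    // Default LinkedHashMap: keys come back in insertion order.
    static List<String> insertionOrder() {
        Map<String, Integer> map = new LinkedHashMap<>();
        fill(map);
        return new ArrayList<>(map.keySet());
    }

    // accessOrder = true: a get() moves the entry to the end.
    static List<String> accessOrder() {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        fill(map);
        map.get("b");
        return new ArrayList<>(map.keySet());
    }

    // TreeMap: keys come back in sorted order.
    static List<String> sortedOrder() {
        Map<String, Integer> map = new TreeMap<>();
        fill(map);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(insertionOrder() + " " + accessOrder() + " " + sortedOrder());
    }
}
```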

Difference between ArrayList and LinkedList?

  • Underlying data structure: ArrayList uses an Object array; LinkedList uses a doubly linked list (circular before JDK 1.6; JDK 1.7 removed the cycle).
  • Thread safety: neither ArrayList nor LinkedList is synchronized, so neither is thread-safe.
  • Whether insertion and deletion are affected by element position:
    ① ArrayList uses array storage, so the cost of insertion and deletion depends on the element position. add(E e) appends the element to the end of the list, which is O(1) but may trigger an expansion. Inserting or deleting at a specified position i is O(n - i), because the (n - i) elements from position i onward must each be shifted one position backward (on insert) or forward (on delete).
    ② LinkedList uses linked storage, so appending with add(E e) or removing at the head or tail is not affected by position and is approximately O(1). Inserting or deleting at a specified position i is approximately O(n), because the list must first be traversed to that position.
  • Random access efficiency: ArrayList is more efficient than LinkedList for random access, because LinkedList has to follow node references from one end of the list to reach an element.
  • Memory usage: each LinkedList node takes more memory than an ArrayList slot, because a node stores references to the previous and next nodes in addition to the data. The wasted space in ArrayList is mainly the spare capacity reserved at the end of the array.

ArrayList is more recommended when the elements in the collection need to be read frequently, and LinkedList is more recommended when there are many insertion and deletion operations.

The expansion mechanism of ArrayList:
When an ArrayList is created with the no-argument constructor, it is initialized with an empty array. Capacity is only allocated when an element is actually added: adding the first element expands the capacity to 10. After that, each expansion grows the capacity to about 1.5 times the previous value.
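The 1.5x growth can be sketched as simple arithmetic. The sketch below assumes the growth step is oldCapacity + (oldCapacity >> 1), which is how the OpenJDK source computes it; the real grow method also handles minimum-capacity requests and overflow, which are left out here. GrowthSketch and its methods are names invented for the example.

```java
import java.util.Arrays;

class GrowthSketch {
    // Assumed growth step: old capacity plus half of it (integer shift).
    static int nextCapacity(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    // Capacities seen after successive expansions, starting from the default 10.
    static int[] capacities(int steps) {
        int[] result = new int[steps];
        int capacity = 10;
        for (int i = 0; i < steps; i++) {
            result[i] = capacity;
            capacity = nextCapacity(capacity);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(capacities(5)));
    }
}
```

Because of the integer shift the factor is not exactly 1.5: 15 grows to 22, not 22.5.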


How to use ArrayList in multi-threaded scenarios?

ArrayList is not thread-safe. In a multi-threaded scenario, you can wrap it in a thread-safe container with the synchronizedList method of Collections before using it. For example:

import java.util.*;

List<String> list = new ArrayList<>();
List<String> synchronizedList = Collections.synchronizedList(list);
synchronizedList.add("aaa");
synchronizedList.add("bbb");

// Single calls are synchronized; hold the lock for a whole traversal.
synchronized (synchronizedList) {
    for (int i = 0; i < synchronizedList.size(); i++) {
        System.out.println(synchronizedList.get(i));
    }
}

You can also use the thread-safe CopyOnWriteArrayList. Its add, remove and set methods take a lock and copy the underlying array on every write, while reads need no lock. In the JDK 8 source the lock is a ReentrantLock: final ReentrantLock lock = this.lock;
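One visible consequence of copy-on-write is that an iterator works on a snapshot of the array. The sketch below (class name CowDemo is illustrative) adds elements while iterating, which would throw ConcurrentModificationException on an ArrayList.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CowDemo {
    // Returns {elements seen by the loop, final list size}.
    static int[] iterateWhileAdding() {
        List<String> list = new CopyOnWriteArrayList<>();
        list.add("aaa");
        list.add("bbb");
        int seen = 0;
        for (String s : list) {   // iterates over the snapshot taken here
            list.add(s + "!");    // each write copies the array; the snapshot is unaffected
            seen++;
        }
        return new int[] {seen, list.size()};
    }

    public static void main(String[] args) {
        int[] result = iterateWhileAdding();
        System.out.println(result[0] + " " + result[1]);
    }
}
```

Since every write copies the whole array, this container fits read-heavy workloads better than write-heavy ones.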


What is the difference between HashMap and Hashtable?

  • Thread safety: HashMap is not thread-safe; Hashtable is thread-safe (its methods are essentially all marked synchronized).
  • Efficiency: because it does no synchronization, HashMap is somewhat more efficient than Hashtable.
  • Null keys and values: HashMap allows null keys and null values, although only one key can be null. Hashtable allows neither.
  • Capacity and expansion: HashMap's default initial capacity is 16 and each expansion doubles it; Hashtable's default initial capacity is 11 and each expansion grows it to 2n + 1.
  • Underlying data structure: since JDK 1.8, HashMap converts a linked list into a red-black tree when the list grows longer than the threshold (8 by default), to reduce search time. Before converting, it checks the array length; if it is less than 64, the array is expanded first instead. Hashtable has no such mechanism.
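The null-handling difference is easy to check with a small sketch (the class name NullKeyDemo is illustrative):

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // HashMap accepts one null key and any number of null values.
    static boolean hashMapAcceptsNull() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "v");
        map.put("k", null);
        return map.containsKey(null) && map.containsKey("k") && map.get("k") == null;
    }

    // Hashtable throws NullPointerException for a null key.
    static boolean hashtableRejectsNull() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "v");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(hashMapAcceptsNull() + " " + hashtableRejectsNull());
    }
}
```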

Difference between HashMap and HashSet

HashMap | HashSet
Implements the Map interface | Implements the Set interface
Stores key-value pairs | Stores objects only
Call put() to add an entry to the map | Call add() to add an element to the set
Uses the key to compute the hashCode | Uses the member object itself to compute the hashCode
Retrieves values by unique key, which is fast | Often described as slower, although its operations delegate to an internal HashMap

How does HashSet check for duplicates?

When you add an object to a HashSet, the HashSet first computes the object's hashCode to decide where the object goes, and compares it with the hashCodes of the objects already stored there. If no stored object has the same hashCode, the HashSet assumes the object is not a duplicate. If an object with the same hashCode is found, equals() is called to check whether the two objects are really the same. If they are, the add operation does not succeed.
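The check above depends on the element class overriding both hashCode() and equals(). A minimal sketch with two hypothetical classes, one that overrides them and one that does not:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class DuplicateCheckDemo {
    // Overrides equals and hashCode, so equal points count as duplicates.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() { return Objects.hash(x, y); }
    }

    // Inherits identity-based equals and hashCode from Object.
    static final class RawPoint {
        final int x, y;
        RawPoint(int x, int y) { this.x = x; this.y = y; }
    }

    static int sizeWithEquals() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // same hashCode, equals() is true: rejected
        return set.size();
    }

    static int sizeWithoutEquals() {
        Set<RawPoint> set = new HashSet<>();
        set.add(new RawPoint(1, 2));
        set.add(new RawPoint(1, 2)); // distinct objects, equals() is false: kept
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithEquals() + " " + sizeWithoutEquals());
    }
}
```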


How does HashMap work?

Hash algorithm: a hash algorithm maps binary data of any length to a smaller binary value of fixed length. This smaller value is called the hash value.

Perturbation function: this is the hash() method of HashMap. It exists to guard against poorly implemented hashCode() methods; in other words, it reduces collisions. Mixing the high bits of the hash into the low bits makes the low bits more random, so the computed array indexes are distributed more uniformly and hash conflicts become less frequent. (JDK 1.8 perturbs twice, with one shift operation and one XOR operation, so that both the high and low bits take part in the index calculation.)
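A sketch of the JDK 1.8 style perturbation, h ^ (h >>> 16), followed by the index calculation (n - 1) & hash. The class name HashSpread and the sample hash values are chosen for illustration: both values differ only in their high bits, so without perturbation they land in the same bucket of a 16-slot table.

```java
class HashSpread {
    // One shift plus one XOR: folds the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Raw hashes collide: both low-bit patterns are zero.
        System.out.println(index(a, 16) + " " + index(b, 16));
        // After spreading, the high bits influence the index.
        System.out.println(index(spread(a), 16) + " " + index(spread(b), 16));
    }
}
```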

Before JDK1.8:
Before JDK1.8, the bottom layer of HashMap was an array combined with linked lists, that is, hashing with separate chaining.

  • When we put an element, HashMap passes the key's hashCode through the perturbation function to obtain the hash value, then determines the storage position with (n - 1) & hash, where n is the array length.
  • If there is already an element at that position, HashMap checks whether its hash value and key are the same as those of the element being stored.
  • If they are the same, the value is overwritten directly. If they are different, the new key-value pair is added to the linked list at that position; this is how separate chaining (the "zipper method") resolves the conflict.
  • When getting, HashMap goes directly to the index for the hash value and then compares keys to find the corresponding value.
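The put and get steps above can be sketched as a tiny separate-chaining map. This is an illustration, not the JDK code: the class ChainedMapSketch is invented, the table never resizes, null keys are not handled, and the perturbation used is the simpler JDK 1.8 form rather than the longer JDK 1.7 one.

```java
class ChainedMapSketch {
    // One node of a bucket's linked list.
    static final class Entry {
        final String key;
        int value;
        Entry next;

        Entry(String key, int value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[] table = new Entry[16]; // fixed power-of-two length

    private int indexFor(String key) {
        int h = key.hashCode();
        int hash = h ^ (h >>> 16);            // perturbation (JDK 1.8 form, assumed here)
        return (table.length - 1) & hash;     // (n - 1) & hash
    }

    void put(String key, int value) {
        int i = indexFor(key);
        for (Entry e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {          // same key: overwrite
                e.value = value;
                return;
            }
        }
        table[i] = new Entry(key, value, table[i]); // different key: chain at the head
    }

    Integer get(String key) {
        for (Entry e = table[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedMapSketch map = new ChainedMapSketch();
        map.put("Aa", 1);  // "Aa" and "BB" have the same hashCode,
        map.put("BB", 2);  // so they share one bucket
        System.out.println(map.get("Aa") + " " + map.get("BB"));
    }
}
```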

After JDK1.8:
JDK1.8 introduced red-black trees. When a linked list grows longer than the threshold (8 by default), it is converted into a red-black tree to reduce search time. Before converting, HashMap checks the array length; if it is less than 64, the array is expanded first instead. (A tree is converted back into a linked list when its size drops to 6 or fewer.)

TreeMap, TreeSet and HashMap (since JDK1.8) all use red-black trees internally. The red-black tree addresses a defect of the plain binary search tree, which can degenerate into a linear structure in some cases.

Aspect | JDK 1.7 | JDK 1.8
Storage structure | Array + linked list | Array + linked list + red-black tree
Initialization method | Separate function: inflateTable() | Integrated into the expansion function resize()
Hash value calculation | 9 perturbations = 4 bit operations + 5 XOR operations | 2 perturbations = 1 bit operation + 1 XOR operation
Rules for storing data | No conflict: store in the array; conflict: store in the linked list | No conflict: store in the array; conflict: store in the linked list or the red-black tree, depending on the length of the linked list
Inserting data | Head insertion (the data already in the bucket is moved back one position and the new data is placed at the head) | Tail insertion (appended to the tail of the linked list, or inserted into the red-black tree)
Storage location after expansion | Recalculated in the original way (hashCode -> perturbation function -> h & (length - 1)) | Derived from the expansion rule (new position = original position, or original position + old capacity)

What is a red-black tree?

  • A red-black tree is a special binary search tree. Each node has a storage bit that records its color, which is either red or black.
  • The root node is always black, and every empty leaf node (NIL) is also black.
  • If a node is red, its children must be black.
  • Every path from a node down to its descendant leaf nodes contains the same number of black nodes. (This ensures that no path is more than twice as long as any other, so the red-black tree stays close to a balanced binary tree.)

The basic operations on a red-black tree are insertion and deletion. After either one, rotations and recoloring are applied so that the tree still satisfies the red-black properties.

How is the expansion operation of HashMap implemented?

  • In JDK1.8, resize() is called to expand the capacity when the number of key-value pairs in the HashMap exceeds the threshold, or when the table is first initialized;
  • Each expansion doubles the capacity;
  • After expansion, each Node either stays at its original index or moves to the original index plus the old capacity.

In putVal(), resize() is called in two places: once when the table is initialized for the first time, and once when the number of entries exceeds the threshold (12 for the default table). During expansion the elements in each bucket are redistributed. This is also an optimization in JDK1.8: in 1.7, each element's position was recalculated from its hash after expansion, whereas in 1.8 the elements of a bucket are split by testing whether (e.hash & oldCap) is 0. An element either stays at its original index or moves to the original index plus the old capacity.
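The split rule can be checked with a little bit arithmetic. The sketch below (ResizeSplit is an invented name) compares the shortcut, which tests the single bit hash & oldCap, against recomputing the index from scratch for the doubled table.

```java
class ResizeSplit {
    // JDK 1.8 style: stay in place, or move by exactly the old capacity.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    // Full recomputation against the doubled table length.
    static int recomputed(int hash, int oldCap) {
        return hash & (2 * oldCap - 1);
    }

    // True if both approaches agree for every hash in [0, limit).
    static boolean agrees(int oldCap, int limit) {
        for (int h = 0; h < limit; h++) {
            if (newIndex(h, oldCap) != recomputed(h, oldCap)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(newIndex(5, 16));   // bit 16 is clear: stays at index 5
        System.out.println(newIndex(21, 16));  // bit 16 is set: moves to 5 + 16
        System.out.println(agrees(16, 1000));
    }
}
```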


Why is the length of HashMap a power of 2?

To make HashMap access efficient and keep collisions to a minimum, data should be distributed as evenly as possible, so that every linked list or red-black tree has roughly the same length. The question is which algorithm decides the bucket an entry goes into. The first idea is to take the remainder with %. However, when the divisor is a power of 2, the remainder operation is equivalent to a bitwise AND with the divisor minus one: hash % length == hash & (length - 1) holds provided that length is 2 to the power of n. The bitwise & is cheaper than %, which explains why the length of HashMap is a power of 2.
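A quick sketch that checks the equivalence for a power-of-two length and shows that it fails for other lengths. The class and method names are illustrative. The loop only covers non-negative hashes, because Java's % returns a negative remainder for a negative hash, while the mask always yields a valid index.

```java
class PowerOfTwoIndex {
    // True if h % length equals h & (length - 1) for every h in [0, limit).
    static boolean sameAsModulo(int length, int limit) {
        for (int h = 0; h < limit; h++) {
            if (h % length != (h & (length - 1))) {
                return false;
            }
        }
        return true;
    }

    static int maskIndex(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(sameAsModulo(16, 1000)); // power of two: identical
        System.out.println(sameAsModulo(10, 1000)); // not a power of two: differs
        System.out.println(maskIndex(-1, 16));      // mask stays within 0..15
    }
}
```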

Implementation principle of ConcurrentHashMap

In JDK1.7, ConcurrentHashMap is implemented as a Segment array plus linked lists. A ConcurrentHashMap contains an array of Segments, and each Segment contains an array of HashEntry objects. A HashEntry stores one key-value pair, and HashEntry objects are linked together into a list. Each Segment carries its own ReentrantLock (Segment extends ReentrantLock), so while one thread holds the lock on one Segment, the other Segments can still be accessed by other threads.
Element lookup hashes twice: the first hash locates the Segment, and the second locates the head of the linked list that holds the element.

In JDK1.8, ConcurrentHashMap drops the Segment lock and uses a Node array plus linked lists and red-black trees (linked lists use Node, red-black trees use TreeNode). The data structure is similar to that of HashMap in 1.8. The val and next fields of Node are declared volatile so that reads and replacements are visible across threads. CAS and synchronized are used to ensure concurrency safety. synchronized only locks the head node of the current linked list or the root of the current red-black tree, so threads only contend when their keys fall into the same bucket, which gives much finer-grained locking than one lock per Segment.
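From the caller's side, the result of this design is that compound updates such as merge() are atomic without any external locking. A minimal usage sketch (the class name ConcurrentCount and the key "hits" are invented for the example):

```java
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCount {
    // Several threads increment the same key; merge() is atomic per key.
    static int countConcurrently(int threads, int perThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait for every worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 1000));
    }
}
```

With a plain HashMap and get-then-put, the same loop could lose updates.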


Origin blog.csdn.net/Lzy410992/article/details/119332937