Java collection framework interview questions (30 questions)

1. What are the common collections?

Collection-related classes and interfaces are all in java.util, and they are mainly divided into three types: List, Map, and Set.

[Figure: Java collection framework hierarchy]

Among them, Collection is the parent interface of List and Set, which are its two main sub-interfaces:

  • List: The stored elements are ordered and repeatable.
  • Set: The stored elements are unordered and cannot be repeated.

Map is a separate interface: a collection of key-value mapping structures.

List

For List there isn’t much to ask — though I wouldn’t rule out the interviewer taking a different angle. For example, maybe the interviewer has read this too, haha.

2. What is the difference between ArrayList and LinkedList?

(1) Different data structures

  • ArrayList is implemented based on arrays
  • LinkedList is implemented based on a doubly linked list

[Figure: ArrayList (array) vs. LinkedList (doubly linked list)]

(2) In most cases, ArrayList is better for lookups, and LinkedList is better for insertions and deletions.

  • ArrayList is array-based: get(int index) reads directly through the array subscript, with O(1) time complexity;
  • LinkedList is list-based: get(int index) must traverse the linked list, with O(n) time complexity. Of course, for get(E element), both collections need to traverse, with O(n) time complexity.
  • If an ArrayList addition or deletion happens at the end of the array, it can be done directly; but inserting into a middle position requires shifting the elements after the insertion point, and may even trigger expansion;
  • Inserting into or deleting from a doubly linked list only requires updating the pointers of the predecessor node, successor node, and inserted node, without moving elements.

[Figures: element insertion in an array vs. in a doubly linked list]

Note the possible trap here: LinkedList being better for additions and deletions shows up in the average number of steps, not in time complexity — addition and deletion are O(n) for both.

(3) Whether to support random access

  • ArrayList is array-based, so it can look up by subscript and supports random access. It also implements the RandomAccess interface, a marker interface that only indicates random access is supported.
  • LinkedList is list-based, so it cannot fetch an element directly by index. It does not implement the RandomAccess interface, marking that it does not support random access.
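A quick way to see the random-access difference in code — a small illustrative sketch:

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

List<Integer> arrayList = new ArrayList<>(List.of(1, 2, 3));
List<Integer> linkedList = new LinkedList<>(arrayList);

System.out.println(arrayList instanceof RandomAccess);   // true
System.out.println(linkedList instanceof RandomAccess);  // false

// get(i) is O(1) on ArrayList but O(n) on LinkedList, so index-based
// loops over a LinkedList are a classic performance trap; prefer the
// iterator (for-each) when traversing a linked list.
for (Integer i : linkedList) {
    System.out.println(i);
}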

(4) Memory occupancy. ArrayList is based on an array, a contiguous block of memory; LinkedList is based on a linked list, with non-contiguous memory. Each has some extra space overhead:

  • ArrayList pre-allocates its array, so there may be unused slots and some wasted space.
  • Each LinkedList node must store predecessor and successor pointers, so each node takes up more space.

3. Do you understand the expansion mechanism of ArrayList?

ArrayList is an array-based collection, and an array's capacity is fixed when it is allocated. If the array is full and you insert again, it would overflow. So before inserting, ArrayList first checks whether expansion is needed: if the current size + 1 exceeds the array length, it expands.

ArrayList expansion creates a new array 1.5 times the size and copies the elements of the original array into it.
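A simplified sketch of the idea behind the growth step (the JDK's actual grow() also handles overflow and minimum-capacity edge cases):

int oldCapacity = elementData.length;
int newCapacity = oldCapacity + (oldCapacity >> 1);               // 1.5x the old capacity
elementData = java.util.Arrays.copyOf(elementData, newCapacity);  // copy into the new array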


4. Do you know how ArrayList handles serialization? Why is the array marked transient?

ArrayList handles serialization specially: the array that stores its elements, elementData, is marked transient. The transient keyword prevents the marked field from being included in default serialization.

Why doesn't ArrayList serialize the element array directly?

For efficiency. The array may have a length of 100 while only 50 slots are actually used; the unused 50 don't need to be serialized. Skipping them improves serialization and deserialization efficiency and saves space.

So how to serialize ArrayList?

ArrayList customizes its serialization and deserialization strategy through two methods, writeObject and readObject, which in fact use the two streams ObjectOutputStream and ObjectInputStream directly.

/**
 * Custom serialization
 */
private void writeObject(java.io.ObjectOutputStream s) throws java.io.IOException {
    // fail-fast: remember modCount to detect concurrent modification later
    int expectedModCount = modCount;
    // serialize the fields not marked static or transient, including size
    s.defaultWriteObject();
    // write out the length (kept for compatibility)
    s.writeInt(size);
    // serialize the first `size` elements of the array
    for (int i = 0; i < size; i++) {
        s.writeObject(elementData[i]);
    }
    if (modCount != expectedModCount) {
        throw new ConcurrentModificationException();
    }
}

/*
 * Custom deserialization
 */
private void readObject(java.io.ObjectInputStream s) throws java.io.IOException, ClassNotFoundException {
    elementData = EMPTY_ELEMENTDATA;
    // deserialize the fields not marked static or transient, including size
    s.defaultReadObject();
    // read in capacity (ignored)
    s.readInt();
    if (size > 0) {
        // allocate the array
        int capacity = calculateCapacity(elementData, size);
        SharedSecrets.getJavaOISAccess().checkArray(s, Object[].class, capacity);
        ensureCapacityInternal(size);

        Object[] a = elementData;
        // deserialize the elements and fill them into the array
        for (int i = 0; i < size; i++) {
            a[i] = s.readObject();
        }
    }
}

5. Do you understand fail-fast and fail-safe?

Fail-fast: fail-fast is an error-detection mechanism for Java collections.

  • When an iterator is traversing a collection, if thread B modifies the collection's contents (adds, deletes, or changes elements) while thread A is traversing, a ConcurrentModificationException is thrown.
  • Principle: the iterator accesses the collection's contents directly during traversal and relies on a modCount variable; modCount changes whenever the collection is structurally modified. Each time the iterator calls hasNext()/next(), it checks whether modCount still equals the expected expectedModCount; if so, traversal continues, otherwise the exception is thrown and traversal stops.
  • Note: the throwing condition is modCount != expectedModCount. If the collection changes and modCount happens to be set back to the expectedModCount value, the exception is not thrown. Therefore concurrent programming cannot rely on whether this exception is thrown; it is only recommended for detecting concurrent-modification bugs.
  • Scenario: the collection classes under the java.util package are all fail-fast and cannot be modified concurrently (modified during iteration) under multi-threading, for example the ArrayList class.

fail-safe

  • Collection containers with a fail-safe mechanism do not access the collection contents directly during traversal. Instead, they first copy the original contents and traverse the copy.
  • Principle: since iteration walks a copy of the original collection, modifications to the original collection during traversal cannot be detected by the iterator, so ConcurrentModificationException is not triggered.
  • Disadvantage: copying avoids ConcurrentModificationException, but the iterator also cannot see the modified content — it traverses the snapshot taken when the traversal started, and any modifications to the original collection during traversal are invisible to it.
  • Scenario: containers under the java.util.concurrent package are fail-safe and can be used and modified concurrently by multiple threads, for example the CopyOnWriteArrayList class.
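A minimal sketch showing both behaviors side by side:

import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

List<String> arrayList = new ArrayList<>(List.of("a", "b", "c"));
try {
    for (String s : arrayList) {
        arrayList.remove(s);          // structural modification mid-iteration
    }
} catch (ConcurrentModificationException e) {
    System.out.println("fail-fast: " + e);
}

List<String> cowList = new CopyOnWriteArrayList<>(List.of("a", "b", "c"));
for (String s : cowList) {
    cowList.remove(s);                // iterator walks a snapshot, no exception
}
System.out.println("fail-safe, list is now: " + cowList);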

6. What are some ways to make ArrayList thread-safe?

Fail-fast is only a detection mechanism that may be triggered; ArrayList itself is still not thread-safe. Thread safety for ArrayList can generally be achieved with these approaches (see the sketch below):

  • Use Vector instead of ArrayList. (Not recommended; Vector is a legacy class.)
  • Wrap the ArrayList with Collections.synchronizedList and operate on the wrapped list.
  • Use CopyOnWriteArrayList instead of ArrayList.
  • When using ArrayList, have the application control reads and writes through its own synchronization.
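For example, with Collections.synchronizedList — note that iteration must still be synchronized manually, as the class's documentation requires:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

List<Integer> list = Collections.synchronizedList(new ArrayList<>());
list.add(1);                      // individual operations are synchronized internally

synchronized (list) {             // but iteration needs an explicit lock on the list
    for (Integer i : list) {
        System.out.println(i);
    }
}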

7. How much do you know about CopyOnWriteArrayList?

CopyOnWriteArrayList is the thread-safe version of ArrayList.

Its name — copy-on-write — already reveals its principle.

CopyOnWriteArrayList adopts a concurrency strategy that separates reading and writing. The container allows concurrent reads, and read operations are lock-free and fast. For write operations, such as adding an element, it first makes a copy of the current underlying array, performs the write on the new copy, and then points the array reference at the new copy.
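A minimal sketch of the copy-on-write idea (illustrative only — the real CopyOnWriteArrayList is more involved and guards writers with a lock):

import java.util.Arrays;

class CowList<E> {
    private volatile Object[] array = new Object[0];

    public synchronized boolean add(E e) {   // writers are serialized
        Object[] copy = Arrays.copyOf(array, array.length + 1);
        copy[array.length] = e;              // the write goes to the new copy
        array = copy;                        // swap the reference; readers see old or new
        return true;
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {                // reads are lock-free
        return (E) array[index];
    }
}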


Map

Among Maps, HashMap is without doubt the most important. Interviews are packed with HashMap questions, so be well prepared for all of them.

8. Can you tell me about the data structure of HashMap?

The data structure in JDK 1.7 is array + linked list. Is anyone still using JDK 1.7? No way…

Let’s talk about the data structure of JDK1.8:

The data structure in JDK 1.8 is array + linked list + red-black tree.

The data structure diagram is as follows:

[Figure: HashMap structure — bucket array + linked lists + red-black trees]

Among them, the bucket array stores the data elements, the linked lists resolve conflicts, and the red-black trees improve query efficiency.

  • Data elements are mapped to an index in the bucket array through the mapping relationship, i.e. the hash function.
  • If a conflict occurs, a linked list is pulled from the conflicting position and the conflicting element is appended to it.
  • If the linked list length > 8 and the array size >= 64, the linked list is converted into a red-black tree.
  • If the number of red-black-tree nodes drops below 6, the tree is converted back into a linked list.

9. How much do you know about red-black trees? Why not use binary tree/balanced tree?

A red-black tree is essentially a binary search tree. To stay balanced, it adds some rules on top of a binary search tree:

  1. Each node is either red or black;
  2. The root node is always black;
  3. All leaf nodes are black (note that the leaf nodes here are actually NULL nodes in the graph);
  4. The two child nodes of each red node must be black;
  5. The path from any node to every leaf node in its subtree contains the same number of black nodes;

[Figure: a red-black tree]

Why not use an ordinary binary search tree:

A red-black tree is a balanced binary tree whose worst-case time complexity for insertion, deletion, and lookup is O(log n), avoiding the binary search tree's worst case of O(n) (when it degenerates into a linked list).

Why not use a strictly balanced binary tree (AVL tree):

A balanced binary tree is more strictly balanced than a red-black tree, so it needs more rotations to stay balanced. That means it is less efficient at maintaining balance, and its insertion and deletion performance is lower than a red-black tree's.

10. Do you know how the red-black tree maintains balance?

Red-black trees maintain balance in two ways: rotation and recoloring.

  • Rotation: there are two kinds, left rotation and right rotation

[Figures: left rotation and right rotation]

  • Recoloring: flipping nodes between red and black

[Figure: recoloring]

11. Do you know the put process of HashMap?

Let’s start with a flow chart:

[Figure: HashMap put flow chart]

  1. First, perturb the hash value to obtain a new hash: (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);

  2. Determine whether the tab is empty or has a length of 0, and if so, perform expansion operations.

    if((tab = table) == null || (n = tab.length) == 0)
    	n =  (tab = resize()).length;
    
  3. Calculate the subscript from the hash value: tab[i = (n - 1) & hash]. If no data is stored at that subscript, insert directly; otherwise the existing entry needs to be overwritten.

  4. Determine whether tab[i] is a tree node: if so, insert the node into the red-black tree; otherwise insert it into the linked list.

  5. When inserting into the linked list, if the list length is greater than or equal to 8, convert it into a red-black tree: treeifyBin(tab, hash);

  6. Finally, check whether the size exceeds the threshold; if it does, expand the table.

12. How does HashMap find elements?

Let’s look at the flow chart first:

[Figure: HashMap get flow chart]

HashMap search is much simpler:

  1. Use the perturbation function to obtain a new hash value
  2. Calculate array subscript and obtain node
  3. If the current node matches the key, return it directly.
  4. Otherwise, if the current node is a tree node, search the red-black tree.
  5. Otherwise, traverse the linked list to find the element.

13. How is the hash/perturbation function of HashMap designed?

The hash function of HashMap first gets the hashCode of the key, which is a 32-bit int type value, and then performs an XOR operation on the high 16 bits and low 16 bits of the hashCode .

hashCode() obtains the hash code; it returns an int and is defined in the Object class as a native method. It typically converts the object's memory address into an integer and returns it.

static final int hash(Object key) {
    int h;
    // XOR the key's hashCode with the hashCode shifted right by 16 bits
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

This design is to reduce the probability of hash collision.

14. Why can hash/perturbation functions reduce hash collisions?

Because key.hashCode() calls the hash function of the key's type and returns an int hash value. The int range is -2147483648 to 2147483647, about 4 billion possible values.

As long as the hash function is mapped relatively evenly and loosely, collisions are unlikely to occur in general applications. But the problem is that an array with a length of 4 billion cannot fit in the memory.

Since the initial size of the HashMap array is only 16, the hash value must be reduced modulo the array length before it can be used as an array subscript.

In the source code, the modulo is implemented as a bitwise AND (&) of the hash value and (array length - 1); the bit operation is faster than the % operator.

bucketIndex = indexFor(hash, table.length);

static int indexFor(int h, int length) {
    return h & (length - 1);
}

By the way, this also explains why the array length of HashMap must be an integer power of 2.

Because this (array length - 1) is exactly equivalent to a "low bit mask". The result of the operation is that all the high bits of the hash value are reset to zero, and only the low bits are retained for array subscript access.

Taking the initial length 16 as an example, 16 - 1 = 15, which in binary is 0000 0000 0000 0000 0000 0000 0000 1111. ANDing it with some hash value keeps only the lowest four bits, for example:

    some hash value:  1010 0110 0111 1001 1000 1100 0011 1010
  & (16 - 1 = 15):    0000 0000 0000 0000 0000 0000 0000 1111
  = index:            0000 0000 0000 0000 0000 0000 0000 1010  (10)

This is fast, but it creates a new problem: even if the hash values are fairly well distributed overall, taking only the last few bits makes collisions serious. Worse, if the hash codes themselves follow a pattern, such as an arithmetic progression, the low bits can repeat regularly and collide even more.

This is where the perturbation function shows its value. Take a look at the schematic:

[Figure: the perturbation function — XOR of the high 16 and low 16 bits]

Shifting right by 16 bits is exactly half of 32. XORing the high half with the low half mixes the high and low bits of the original hash code, increasing the randomness of the low bits; and since the mixed low bits carry features of the high bits, the high-bit information is preserved in disguise as well.

15. Why is the capacity of HashMap a power of 2?

  • The first reason is to make the hash-to-index mapping convenient:

To place an element in the table array, you would use hash % array size to locate the position; HashMap instead uses hash & (array size - 1), which achieves the same effect. This works because the size of a HashMap is a power of 2: only one binary digit of such a number is 1, and subtracting 1 turns that 1 into 0 and all the 0s below it into 1s. The & operation then gives the same result as %, and bit operations are much more efficient than %.

When the capacity of HashMap is 2^n, (capacity - 1) in binary has the form 111...111. Bit-ANDing it with the hash of an added element therefore hashes fully, so elements are distributed evenly across the positions of the HashMap, reducing hash collisions.

  • The second reason is that during expansion, the new size is also a power of 2, which lets elements that previously collided be transferred cleanly to the new table.

Let's briefly look at HashMap's expansion trigger: HashMap expands when the number of elements exceeds load factor * capacity.

++modCount;
if (++size > threshold)
    resize();
afterNodeInsertion(evict);
return null;

During put, when size exceeds threshold, expansion is triggered.

16. If you initialize a HashMap with new HashMap<>(17), how is the 17 handled?

To put it simply, if the value passed at initialization is not a power of 2, HashMap looks upward for the nearest power of 2, so when 17 is passed in, the actual capacity of the HashMap is 32.

Let's look at the details. HashMap's initialization includes this method:

public HashMap(int initialCapacity, float loadFactor) {
    ...
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
  • The threshold is calculated by the tableSizeFor method from the initial-capacity argument passed in.
  • This method finds the smallest power of 2 that is not smaller than the given value; for 17, that is 32.
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
  • MAXIMUM_CAPACITY = 1 << 30 is the upper bound — the largest capacity a Map can have.
  • The calculation ORs (|) the number with itself shifted right by 1, 2, 4, 8, and 16 bits, which fills every bit below the highest 1 with 1. A number whose low bits are all 1 is one less than a power of 2, so adding 1 at the end gives the result.

Taking 17 as an example, let’s take a look at the process of initializing the table capacity:

  cap = 17
  n = cap - 1     = 16  (binary 1 0000)
  n |= n >>> 1    → 1 1000  (24)
  n |= n >>> 2    → 1 1110  (30)
  n |= n >>> 4    → 1 1111  (31)
  n |= n >>> 8    → 1 1111  (31, unchanged)
  n |= n >>> 16   → 1 1111  (31, unchanged)
  return n + 1    = 32

17. What other construction methods of hash functions do you know?

The hash construction method used in HashMap is:

  • Division-remainder method: H(key) = key % p (p <= N); the key is divided by a positive integer p not larger than the hash-table length, and the remainder is used as the address. Of course, HashMap has optimized this into a more efficient, better-balanced form (the bit-AND masking above).

In addition, there are several common hash function construction methods:

  • Direct addressing method:
    map the key directly to the corresponding array position; for example, 1232 goes to subscript 1232.
  • Digit analysis method:
    take certain digits of the key (such as the tens and hundreds digits) as the mapping position.
  • Mid-square method:
    take the middle digits of the key squared as the mapping position.
  • Folding method:
    split the key into several segments with the same number of digits, then use their sum as the mapping position.


18. What are the methods to resolve hash conflicts?

We now know that HashMap uses linked lists to deal with hash conflicts. This approach is called:

  • Chaining (separate chaining): pull a linked list at the conflicting position and put conflicting elements into it.

In addition, there are some common conflict resolution methods:

  • Open addressing: starting from the conflicting position, search onward for an empty slot for the conflicting element.

    There are several ways to probe for a free slot:

    • Linear probing: starting from the conflicting position, check the next slot, and the next, until a free slot is found.
    • Quadratic probing: starting from the conflicting position x, probe x + 1² the first time, x + 2² the second time, … until a free slot is found.

  • Rehashing: use another hash function to recompute the address of the conflicting element.

  • Public overflow area: create a separate array and put conflicting elements into it.

19. Why is the threshold for converting a HashMap linked list into a red-black tree 8?

Treeification happens when the table array length is greater than 64 and the linked list length is greater than 8.

Why 8? The source-code comments give the answer.

[Figure: the JDK source comment explaining the Poisson distribution of bucket sizes]

A red-black tree node is roughly twice the size of an ordinary node, so converting to a red-black tree trades space for time. It is more of a fallback strategy to guarantee search efficiency in extreme cases.

Why choose 8 as the threshold? It comes from statistics. Ideally, with random hash codes, the number of nodes in a bucket follows a Poisson distribution, and the probability decreases as the count grows; the probability of a bucket reaching 8 nodes is only 0.00000006.

As for why the threshold for converting a red-black tree back to a linked list is 6 rather than 8: if both thresholds were 8, and node counts happened to fluctuate around 8 as collisions occurred, the structure would keep converting back and forth between linked list and red-black tree, wasting resources.

20. When will the capacity be expanded? Why is the expansion factor 0.75?

To keep the probability of hash conflicts low, when the number of elements in a HashMap reaches a critical value, expansion is triggered and all elements are rehashed into the enlarged container — a very time-consuming operation.

This critical value threshold is determined by the loading factor and the capacity of the current container. If the default construction method is used:

Threshold = Default capacity (DEFAULT_INITIAL_CAPACITY) * Default expansion factor (DEFAULT_LOAD_FACTOR)

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
static final float DEFAULT_LOAD_FACTOR = 0.75f;

That is, expansion is triggered when the element count exceeds 16 × 0.75 = 12.

So why was 0.75 chosen as the default loading factor of HashMap?

Simply put, it is a trade-off between space cost and time cost.

There is such a comment in HashMap:

As a general rule, the default load factor (.75) provides a good compromise between time and space costs. Higher values reduce space overhead but increase lookup cost (reflected in most operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.


We know HashMap maps a hash to a bucket by masking (equivalent to taking a remainder), and the load factor decides at what element count expansion happens.

If we set it larger, expansion happens later, with more elements and fewer empty slots; the probability of hash conflicts rises, and the time cost of lookups rises with it.

If we set it smaller, expansion happens earlier, with fewer elements and more empty slots; the probability of hash conflicts drops and lookups get cheaper, but more space is needed to store the same elements, so the space cost rises.

21. Do you understand the expansion mechanism?

HashMap is implemented with an array plus linked lists and red-black trees, but the length of the bucket array that stores the key-value pairs is fixed, determined by the initialization parameters.

As more data is inserted and the threshold is exceeded, the table must expand to store more data. A very important point: the JDK 1.8 optimization removes the need to recompute each element's hash during expansion.

Because the capacity of HashMap is always a power of 2, the expanded length is twice the original and the new capacity is also a power of 2. Therefore, each element either stays at its original position or moves to its original position plus a power of 2 (the old capacity).

Look at this figure: n is the table length. Part (a) shows how key1 and key2 determine their index before expansion; part (b) shows how they determine their index after expansion.

[Figure: index calculation for key1 and key2 before and after expansion]

After the element's hash is re-evaluated, because n has doubled, the mask n - 1 covers one more high bit (shown in red), so the new index changes like this:

[Figure: the new index either stays the same or gains the new high bit]

So during expansion you only need to check whether the new high bit of the original hash is 0 or 1: if 0, the index is unchanged; if 1, the new index becomes original index + oldCap. See the diagram of expanding from 16 to 32:

[Figure: expansion from 16 to 32]

The main logic of expansion node migration:

if (e.next == null)
    newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
    // split the tree node
    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
    // split the list into two lists, lo and hi, tracking each list's head and tail
    Node<K,V> loHead = null, loTail = null;
    Node<K,V> hiHead = null, hiTail = null;
    Node<K,V> next;
    do {
        // walk the list, appending each node to the matching new list
        next = e.next;
        if ((e.hash & oldCap) == 0) {
            if (loTail == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
        }
        else {
            if (hiTail == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
        }
    } while ((e = next) != null);
    // the lo list keeps the original index in the new table
    if (loTail != null) {
        loTail.next = null;
        newTab[j] = loHead;
    }
    // the hi list goes to index j + oldCap
    if (hiTail != null) {
        hiTail.next = null;
        newTab[j + oldCap] = hiHead;
    }
}

22. What main optimizations did JDK 1.8 make to HashMap? Why?

JDK 1.8's HashMap has five main optimization points:

  1. Data structure: array + linked list changed to array + linked list or red-black tree

    Reason: When a hash conflict occurs, the elements will be stored in the linked list. If the linked list is too long, it will be converted into a red-black tree, reducing the time complexity from O(n) to O(logn).

  2. Linked-list insertion method: changed from head insertion to tail insertion.

    To put it simply, when inserting and the array position already holds an element, 1.7 puts the new element into the array slot and makes the original node the new node's successor; 1.8 traverses the linked list and places the new element at its tail.

    Reason: when 1.7's head insertion expands the table, it reverses the linked list, and in a multi-threaded environment a cycle can form.

  3. Expansion rehash: during expansion, 1.7 rehashes the elements of the original array to position them in the new array; 1.8 uses simpler judgment logic that needs no recomputation through the hash function — the new position either stays the same or becomes index + old capacity.

    Reason: Improve the efficiency of expansion and expand capacity faster.

  4. Expansion timing: when inserting, 1.7 first checks whether expansion is needed and then inserts; 1.8 inserts first and checks whether expansion is needed after the insertion completes.

  5. Hash function: 1.7 does four shifts and four XORs; JDK 1.8 does only one of each.

    Reason: doing it four times has little marginal benefit; doing it once is simpler and more efficient.

23. Can you design and implement a HashMap yourself?

Kuaishou interviewers like to ask this one.
Don't panic — we probably can't write the red-black-tree version, but the array + linked-list version is no big problem. For details see: Handwritten HashMap, the Kuaishou interviewer calls it an expert!

Overall design:

  • Hash function: hashCode() + division-remainder method
  • Conflict resolution: chaining
  • Expansion: rehash nodes to their new positions

[Figure: overall design of the handwritten HashMap]

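A minimal sketch following this design — illustrative, simplified code (hashCode() + division-remainder, chaining, rehash on expansion), not the JDK implementation:

public class SimpleHashMap<K, V> {
    private static final int DEFAULT_CAPACITY = 16;
    private static final float LOAD_FACTOR = 0.75f;

    private static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    @SuppressWarnings("unchecked")
    private Entry<K, V>[] table = new Entry[DEFAULT_CAPACITY];
    private int size;

    // hash function: hashCode() + division-remainder
    private int indexFor(K key, int length) {
        return key == null ? 0 : (key.hashCode() & 0x7fffffff) % length;
    }

    public V put(K key, V value) {
        int i = indexFor(key, table.length);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (key == null ? e.key == null : key.equals(e.key)) {
                V old = e.value;                 // existing key: overwrite
                e.value = value;
                return old;
            }
        }
        Entry<K, V> entry = new Entry<>(key, value);
        entry.next = table[i];                   // conflict resolution: chaining (head insertion)
        table[i] = entry;
        if (++size > table.length * LOAD_FACTOR) resize();
        return null;
    }

    public V get(K key) {
        for (Entry<K, V> e = table[indexFor(key, table.length)]; e != null; e = e.next) {
            if (key == null ? e.key == null : key.equals(e.key)) return e.value;
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        Entry<K, V>[] newTable = new Entry[table.length * 2];
        for (Entry<K, V> head : table) {         // expansion: rehash every node
            while (head != null) {
                Entry<K, V> next = head.next;
                int i = indexFor(head.key, newTable.length);
                head.next = newTable[i];
                newTable[i] = head;
                head = next;
            }
        }
        table = newTable;
    }
}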

24. Is HashMap thread-safe? What problems arise under multi-threading?

HashMap is not thread-safe and these problems may occur:

  • Infinite loop during expansion under multi-threading. HashMap in JDK 1.7 inserts elements with head insertion; in a multi-threaded environment, expansion can then produce a circular linked list, causing an infinite loop. JDK 1.8 switched to tail insertion, which keeps the list elements in their original order during expansion and avoids the circular-list problem.
  • Multi-threaded put may lose elements. If multiple threads put at the same time and compute the same index, one key can overwrite another, losing an element. This problem exists in both JDK 1.7 and JDK 1.8.
  • When put and get run concurrently, get may return null. If thread 1's put triggers a rehash because the element count exceeded the threshold, a concurrent get by thread 2 may see null. This problem exists in both JDK 1.7 and JDK 1.8.

25. Is there any way to solve HashMap's thread-safety problem?

Java offers Hashtable, Collections.synchronizedMap, and ConcurrentHashMap as thread-safe Maps (see the snippet below).

  • Hashtable simply adds the synchronized keyword to its operation methods, locking the entire table array; the granularity is coarse.
  • Collections.synchronizedMap is an inner class of the Collections utility. It wraps the passed-in Map in a SynchronizedMap object that holds an internal mutex and synchronizes on it inside each method.
  • ConcurrentHashMap uses segment locks in JDK 1.7 and CAS + synchronized in JDK 1.8.
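For reference, creating each of the three:

import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

Map<String, Integer> m1 = new Hashtable<>();                             // legacy, whole-table lock
Map<String, Integer> m2 = Collections.synchronizedMap(new HashMap<>()); // mutex wrapper
Map<String, Integer> m3 = new ConcurrentHashMap<>();                    // preferred for concurrency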

26. Can you talk about the implementation of ConcurrentHashMap in detail?

ConcurrentHashMap's thread safety is based on segment locks in JDK 1.7 and on CAS + synchronized in JDK 1.8.

1.7 Segment lock

Structurally, the 1.7 version of ConcurrentHashMap adopts a segment-lock mechanism. It contains a Segment array; Segment extends ReentrantLock, and each Segment holds an array of HashEntry. HashEntry itself is a linked-list node that stores a key and value and a pointer to the next node.

In effect, each Segment is its own HashMap. The default number of Segments is 16, which means 16 threads can write concurrently; Segments do not affect one another.

[Figure: ConcurrentHashMap 1.7 — Segment array of HashEntry lists]

put process

The whole flow is very similar to HashMap's, except that it first locates the specific Segment and then operates under its ReentrantLock; the rest is basically the same as HashMap.

  1. Compute the hash and locate the Segment; if the Segment is empty, initialize it first.
  2. Acquire the ReentrantLock. If that fails, spin; if spinning exceeds the limit, block until the lock is acquired.
  3. Traverse the HashEntry list just like HashMap: if an entry with the same key and hash exists, replace its value directly; otherwise insert it into the linked list.

[Figure: ConcurrentHashMap 1.7 put flow]

get process

Get is also very simple: the hash locates the Segment, and the linked list is then traversed to find the element. Note that value is volatile, so get requires no locking.

1.8 CAS+synchronized

CAS stands for Compare-And-Swap. It is an atomic CPU instruction that compares two values and, if they are equal, atomically updates a memory location with a new value.

A CAS operation takes two inputs: an old value (the value expected before the operation) and a new value. It compares the current value with the old value; if nothing has changed, it swaps in the new value, otherwise it does nothing.
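CAS as exposed in Java, for example through java.util.concurrent.atomic:

import java.util.concurrent.atomic.AtomicInteger;

AtomicInteger counter = new AtomicInteger(0);

boolean first  = counter.compareAndSet(0, 1); // expected 0, found 0 → swap, returns true
boolean second = counter.compareAndSet(0, 2); // expected 0, found 1 → no swap, returns false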

JDK 1.8's thread safety does not change the data structure — it is the same as HashMap's: array + linked list + red-black tree. The key to thread safety lies in the put process.

[Figure: ConcurrentHashMap 1.8 — CAS + synchronized on bucket heads]

put process

  1. First compute the hash and loop over the node array; if the table is empty, initialize it with CAS + spin.
  2. If the target array position is empty, write the data directly with a CAS spin.
  3. If hash == MOVED, the table is being resized; help with the expansion.
  4. Otherwise, lock the bucket head with synchronized and write the data, handling both linked-list and red-black-tree cases. Linked-list insertion works like HashMap's: an equal key hash overwrites, otherwise the node is appended at the tail; if the list length exceeds 8, it is converted to a red-black tree.

get query

Get is very simple and basically the same as HashMap's: compute the position from the key; if the entry at the table position matches the key, return it; if the bucket is a red-black tree, search the tree; otherwise traverse the linked list.

27. Are the internal nodes of HashMap ordered?

HashMap is unordered; insertion position depends on the hash value. If you need an ordered Map, use LinkedHashMap or TreeMap.

28. Tell me how LinkedHashMap achieves ordering?

LinkedHashMap maintains a doubly linked list with head and tail pointers. Its node type, Entry, inherits HashMap's Node attributes and additionally has before and after pointers identifying the previous and next nodes.

[Figure: LinkedHashMap — HashMap plus a doubly linked list]

Ordering can follow insertion order or access order (see the sketch below).

[Figure: insertion order vs. access order]
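For example, access order plus an eviction rule turns LinkedHashMap into a simple LRU cache — a small illustrative sketch (the three-argument constructor's accessOrder = true enables access ordering):

import java.util.LinkedHashMap;
import java.util.Map;

Map<String, Integer> lru = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        return size() > 3;                  // evict the least recently used beyond 3 entries
    }
};
lru.put("a", 1); lru.put("b", 2); lru.put("c", 3);
lru.get("a");                               // touching "a" moves it to most recently used
lru.put("d", 4);                            // evicts "b", the least recently used
System.out.println(lru.keySet());           // [c, a, d]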

29. Tell me how to achieve ordering in TreeMap?

TreeMap sorts by the natural order of the keys or by a Comparator, implemented internally with a red-black tree. So either the key's class implements the Comparable interface, or you supply a custom comparator implementing the Comparator interface to the TreeMap constructor for key comparison.

[Figure: TreeMap ordering via a red-black tree]
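Both options in code — natural ordering, and a custom Comparator passed to the constructor:

import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

Map<String, Integer> natural = new TreeMap<>();   // String implements Comparable

Map<String, Integer> byLength = new TreeMap<>(
        Comparator.comparingInt(String::length)   // custom order: shorter keys first
                  .thenComparing(Comparator.naturalOrder()));
byLength.put("banana", 2);
byLength.put("fig", 1);
byLength.put("apple", 3);
System.out.println(byLength.keySet());            // [fig, apple, banana]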

Set

There is not much to ask about Set in interviews — HashSet is here to make up the numbers.

30. Tell me about the underlying implementation of HashSet?

The bottom layer of HashSet is implemented on top of HashMap. (The source code of HashSet is tiny: apart from clone(), writeObject(), and readObject(), which HashSet must implement itself, every other method delegates to HashMap.)

HashSet's add method directly calls HashMap's put method, using the added element as the key and a shared PRESENT Object as the value. Whether the insertion succeeded is determined by whether the return value is null.

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}

Source address: Noodle Counterattack: Java Collection of Thirty Questions


Origin blog.csdn.net/weixin_45483322/article/details/132326115