HashMap related interview questions

The implementation principle of HashMap

The bottom layer of HashMap is implemented with an array plus linked lists. An Entry array stores the key-value pairs, each key-value pair being an Entry instance, and each Entry also serves as a node of a singly linked list. In JDK 1.8, when the length of a linked list is greater than 8, the list is converted into a red-black tree.

The role of arrays and linked lists in HashMap

  1. The array determines the bucket position: the bucket index is computed from the hash value of the key modulo the length of the array.
  2. The linked list resolves hash conflicts: when keys map to the same index, their entries form a linked list hanging off that array slot, and new nodes are inserted with head insertion. What is head insertion? The newly arrived node is placed directly at the head of the list. Why? Because the designers assumed that entries inserted later are more likely to be accessed soon (JDK 1.8 switched to tail insertion). A sketch of the idea follows this list.
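
A minimal sketch of the chaining idea (the Node class here is a made-up simplification of the JDK's internal node, and head insertion is the JDK 1.7 behavior):

```java
// HeadInsertDemo.java: hypothetical, simplified bucket node and head insertion.
public class HeadInsertDemo {
    static class Node {
        final int hash;
        final String key;
        String value;
        Node next; // next node in the same bucket's chain

        Node(int hash, String key, String value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    public static void main(String[] args) {
        Node[] table = new Node[16];
        int index = 3; // pretend both keys map to bucket 3

        // Head insertion: the new node points at the old head and becomes the new head.
        table[index] = new Node(3, "a", "1", table[index]);
        table[index] = new Node(3, "b", "2", table[index]);

        for (Node n = table[index]; n != null; n = n.next) {
            System.out.println(n.key); // prints "b" then "a": the latest insert sits at the head
        }
    }
}
```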

Solutions to hash conflicts

  1. Open addressing
  2. Separate chaining (the approach HashMap uses)
  3. Rehashing (using a second hash function)
  4. Establishing a public overflow area
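
As a rough illustration of open addressing (HashMap itself uses separate chaining), a linear-probing insert might look like this; the table size and key type are arbitrary assumptions for the demo:

```java
// LinearProbingDemo.java: a sketch of open addressing with linear probing (no resizing).
public class LinearProbingDemo {
    private final int capacity = 16;
    private final Integer[] keys = new Integer[capacity];

    void insert(int key) {
        int index = (Integer.hashCode(key) & 0x7fffffff) % capacity;
        while (keys[index] != null) {        // slot occupied: probe the next slot
            index = (index + 1) % capacity;  // wrap around at the end of the array
        }
        keys[index] = key;
    }

    public static void main(String[] args) {
        LinearProbingDemo demo = new LinearProbingDemo();
        demo.insert(1);
        demo.insert(17); // collides with 1 in a 16-slot table, so it lands in the next free slot
    }
}
```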

Can a LinkedList be used instead of the array?

Yes. But once the hash value is known, an array lets us jump straight to the bucket position by index, whereas a LinkedList would have to be traversed, so determining the bucket through an array is faster.

HashMap expansion conditions

When the number of stored entries exceeds load factor * current capacity, the map must be resized.
The load factor defaults to 0.75, chosen to keep hash collisions as rare as practical without wasting too much array space.
The current capacity is the current size of the underlying array.
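For example, with the default capacity of 16 and the default load factor of 0.75, the threshold is 16 * 0.75 = 12, so inserting the 13th entry triggers a resize and the capacity doubles to 32.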

Why the capacity is always a power of 2

  1. To store and retrieve efficiently, HashMap should have as few collisions as possible, i.e. the data should be spread as evenly as possible so that every chain is roughly the same length. The chain an entry goes into is decided by a modulo operation: hash % length.
    However, modulo is not as fast as a bit operation.
    The source code therefore optimizes it to hash & (length - 1).
    In other words, it relies on hash % length == hash & (length - 1).
    Guaranteeing that the capacity is 2 to the n-th power guarantees that every bit of (length - 1) is 1, so the & keeps exactly the low bits of the hash and the equivalence holds.
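A quick way to check the equivalence (the values are arbitrary):

```java
// PowerOfTwoDemo.java: hash % length and hash & (length - 1) agree when length is a power of 2.
public class PowerOfTwoDemo {
    public static void main(String[] args) {
        int length = 16;    // power of 2, so length - 1 = 0b1111 (all low bits set)
        int hash = 12345;
        System.out.println(hash % length);        // 9
        System.out.println(hash & (length - 1));  // 9: same bucket, computed with a cheap AND
    }
}
```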

HashMap put element process

  1. The key's hashCode() is processed by a perturbation function to obtain the hash value, and (n - 1) & hash then determines the bucket where the entry will be stored.
  2. If there is no collision, the entry is put directly into the bucket.
  3. If there is a collision, compare the hash value and key of each existing element with the one being stored. If they match, overwrite the value so that keys stay unique. If they differ, resolve the conflict with chaining and store the new entry as a node of the bucket's linked list.
  4. If a linked list grows beyond 8 elements, it is converted into a red-black tree (provided the table has at least 64 buckets; otherwise the map is resized instead).
  5. If the number of entries exceeds the threshold (load factor * capacity), resize.
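
A heavily simplified sketch of this flow (TinyMap is a made-up class for illustration: no treeify, no null keys, no real resize, String keys and values only; it is not the JDK implementation):

```java
// TinyMap.java: illustrative only.
public class TinyMap {
    static class Node {
        final int hash; final String key; String value; Node next;
        Node(int hash, String key, String value, Node next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    Node[] table = new Node[16];
    int size = 0;
    int threshold = 12; // capacity 16 * load factor 0.75

    public void put(String key, String value) {
        int h = key.hashCode();
        int hash = h ^ (h >>> 16);              // perturbation, as in JDK 1.8
        int index = (table.length - 1) & hash;  // locate the bucket
        for (Node n = table[index]; n != null; n = n.next) {
            if (n.hash == hash && n.key.equals(key)) {
                n.value = value;                // same key: overwrite to keep keys unique
                return;
            }
        }
        table[index] = new Node(hash, key, value, table[index]); // chain the new node
        if (++size > threshold) {
            // resize() would go here: double the table and redistribute the nodes
        }
        // JDK 1.8 additionally converts a chain longer than 8 into a red-black tree
    }

    public static void main(String[] args) {
        TinyMap map = new TinyMap();
        map.put("a", "1");
        map.put("a", "2"); // overwrites the value stored under "a"
    }
}
```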

HashMap get element process

  1. Hash the key's hashCode() and use the result to compute the index.
  2. If the first node in the bucket matches, return it directly; if there was a conflict, walk the chain and use key.equals(k) to find the corresponding Entry.
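
Continuing the TinyMap sketch above, a matching lookup could look like this (again only an illustration):

```java
// Lookup: hash to the bucket, then walk the chain and use equals() to resolve collisions.
public String get(String key) {
    int h = key.hashCode();
    int hash = h ^ (h >>> 16);
    int index = (table.length - 1) & hash;
    for (Node n = table[index]; n != null; n = n.next) {
        if (n.hash == hash && n.key.equals(key)) {
            return n.value;
        }
    }
    return null; // key not present
}
```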

Which hash algorithms do you know?

  1. A hash function maps a large range of values onto a smaller range; the purpose is usually to save space and make the data easy to store and look up.
  2. Well-known ones include MurmurHash, MD4, and MD5.

Talk about the implementation of String's hashCode()

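The original post showed the JDK source here; in JDK 8 it looks roughly like this (value is String's internal char array and hash caches the result):

```java
// java.lang.String#hashCode (JDK 8, roughly)
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char[] val = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];   // fold each character in with weight 31
        }
        hash = h;                  // cache the result; it never changes for a String
    }
    return h;
}
```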
As the code shows, String's hashCode() uses 31 as the weight: each character's value is folded into the running result, relying on natural integer overflow.
The formula works out to s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
Then why the prime 31?
Mainly because 31 is an odd prime and 31 * i == 32 * i - i == (i << 5) - i, so the multiplication can be replaced by a shift and a subtraction, which is faster than a general multiplication.

Improvements to HashMap in JDK 1.8

  1. Added the conversion of a linked list into a red-black tree when its length is greater than 8.
  2. Optimized the hash function to mix in the high bits: h ^ (h >>> 16) (see the sketch after this list).
  3. After a resize, each element either stays at its original index or moves to the original index plus the old capacity (a power of 2), and the order within each chain is preserved. This fixed the infinite-loop problem of the old head-insertion rehash.
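
The JDK 1.8 hash() mentioned in point 2 is roughly the following; XOR-ing the high 16 bits into the low 16 bits lets the high bits influence the bucket index even when the table is small:

```java
// java.util.HashMap#hash (JDK 8, roughly)
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```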

Why not use red-black trees directly when resolving hash conflicts, instead of starting with a linked list and only then converting to a red-black tree?

Because a red-black tree needs left rotations, right rotations, and re-coloring to stay balanced, whereas a singly linked list does not.
When a bucket holds fewer than 8 elements, the linked list structure already guarantees good query performance. Once it holds more than 8, the red-black tree speeds up lookups, but inserting new nodes becomes slower.
So if the red-black tree structure were used from the start, buckets with only a few elements would pay the slower insertion cost for nothing, which would be a waste of performance.

Can a binary search tree be used to replace the red-black tree?

Yes, but an ordinary binary search tree can degenerate into a linear structure, in which case traversing and searching become very slow.

Once a linked list has turned into a red-black tree, when does it degenerate back into a linked list?

When the number of elements drops to 6, it turns back into a linked list. The gap (7 sits between 6 and 8) exists to prevent frequent back-and-forth conversion between the linked list and the tree.
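
For reference, the thresholds in the JDK 8 source are:

```java
// From java.util.HashMap (JDK 8)
static final int TREEIFY_THRESHOLD   = 8;   // a chain this long may be turned into a tree
static final int UNTREEIFY_THRESHOLD = 6;   // a tree this small is turned back into a chain during resize
static final int MIN_TREEIFY_CAPACITY = 64; // below this table size, resize instead of treeifying
```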

What are the problems with HashMap in a concurrent programming environment?

  1. (A problem before JDK 1.8) Multi-threaded resizing could cause an infinite loop: a concurrent rehash could link the elements into a circular list.
  2. Elements may be lost when multiple threads put concurrently.
  3. After putting a non-null element, get may return null.
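
A quick way to observe the lost-update problem (the key ranges and iteration counts are arbitrary, and the exact outcome varies between runs):

```java
import java.util.HashMap;
import java.util.Map;
// import java.util.concurrent.ConcurrentHashMap; // swap in to make the demo correct

public class ConcurrentPutDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>();           // not thread-safe
        // Map<Integer, Integer> map = new ConcurrentHashMap<>(); // thread-safe alternative

        Runnable writer1 = () -> { for (int i = 0;      i < 10_000; i++) map.put(i, i); };
        Runnable writer2 = () -> { for (int i = 10_000; i < 20_000; i++) map.put(i, i); };

        Thread t1 = new Thread(writer1);
        Thread t2 = new Thread(writer2);
        t1.start(); t2.start();
        t1.join();  t2.join();

        // All 20,000 keys are distinct, so the size should be 20,000. With a plain HashMap
        // it is often smaller because concurrent puts and resizes interfere with each other
        // (and on pre-1.8 JDKs the program can even hang in an infinite loop).
        System.out.println(map.size());
    }
}
```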

How to solve concurrency issues

Use ConcurrentHashMap, a thread-safe collection class. Note: although Hashtable is also thread-safe, it is effectively obsolete and rarely used anymore.

Can the key be null?

Yes. When the key is null, the hash computed for it is 0, so the entry is placed in the first slot of the array.

What is generally used as the key of HashMap

Generally, immutable classes such as Integer and String are used as the keys of HashMap. String is the most commonly used.

  1. Because a String is immutable, its hashCode is cached when first computed and never needs to be recalculated. This makes strings very well suited as Map keys and faster to process than most other key objects, which is why HashMap keys are so often strings.
  2. Because equals() and hashCode() are used when looking up an entry, it is essential that the key class overrides both methods correctly. Integer and String already override hashCode() and equals() properly.

What is the problem when using a mutable class as the key of HashMap?

Its hashCode may change after the entry is stored, so the value can no longer be retrieved with get.
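
A small illustration with a hypothetical mutable Point class used as the key:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyDemo {
    static class Point {          // mutable: x can change after construction
        int x;
        Point(int x) { this.x = x; }
        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x;
        }
        @Override public int hashCode() { return Objects.hash(x); }
    }

    public static void main(String[] args) {
        Map<Point, String> map = new HashMap<>();
        Point key = new Point(1);
        map.put(key, "value");

        key.x = 2;                                 // mutate the key after it has been stored
        System.out.println(map.get(key));          // null: the new hashCode leads to a different bucket
        System.out.println(map.get(new Point(1))); // also null: right bucket, but equals() no longer matches
    }
}
```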

How can you implement a custom class as the key of HashMap?

Two important test points:

  • Override the hashCode() and equals() methods
  • How to design an immutable class
  1. The rules behind the first point:
    (1) If two objects are equal, their hashCodes must be equal.
    (2) If two objects are not equal, their hashCodes are not necessarily different.
    (3) If the hashCodes are equal, the two objects are not necessarily equal.
    (4) If the hashCodes are not equal, the two objects must be unequal.
  2. Design principles for the second point:
    (1) Add the final modifier to the class so it cannot be subclassed.
    (2) Make all member variables private, and also mark them final.
    (3) Do not provide any methods that change member variables, including setters.
    (4) Initialize all members through the constructor and perform a deep copy there. Simply assigning the incoming array to the field does not guarantee immutability: myArray and the caller's array would then point to the same object, so the caller could still change the contents of myArray by modifying that array outside of ImmutableDemo. The constructor must copy the data instead (see the sketch after this list).
    (5) In getter methods, do not return the internal object itself; clone it and return the copy. This prevents the member object from leaking: otherwise a caller could obtain it through the getter and manipulate it directly, changing the supposedly immutable state.
    The String type overrides hashCode() to return a value based on the string's content, so strings with the same content have the same hash code.
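
A minimal sketch combining both points; ImmutableDemo and myArray are the names used above, the rest is illustrative:

```java
import java.util.Arrays;

// Immutable, with equals() and hashCode() overridden, so it is safe to use as a HashMap key.
public final class ImmutableDemo {                         // final: cannot be subclassed
    private final int[] myArray;                           // private final member

    public ImmutableDemo(int[] array) {
        this.myArray = Arrays.copyOf(array, array.length); // deep copy: don't keep the caller's reference
    }

    public int[] getMyArray() {
        return Arrays.copyOf(myArray, myArray.length);     // return a copy, never the field itself
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ImmutableDemo)) return false;
        return Arrays.equals(myArray, ((ImmutableDemo) o).myArray);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(myArray);                   // equal contents produce equal hash codes
    }
}
```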

The difference between HashMap and Hashtable

  1. Thread safety: HashMap is not thread-safe (solution: use ConcurrentHashMap). Hashtable is thread-safe, and all of its methods are marked synchronized.
  2. Efficiency: because Hashtable is synchronized, it is less efficient than HashMap, and Hashtable is basically no longer used.
  3. Null keys and values: a HashMap can hold one null key (stored in the first slot of the array) and any number of null values. Hashtable allows neither null keys nor null values.
  4. The initial capacity and growth are different:
  • The default initial capacity of HashMap is 16, and every expansion keeps the capacity a power of 2. Even if an initial size is passed in, it is rounded up to a power of 2 (HashMap's internal tableSizeFor method guarantees this; see the sketch after this list). Hashtable, by contrast, defaults to a capacity of 11 and grows to 2n + 1 on each resize.
    As for why a power of 2: it reduces collisions and makes access efficient. HashMap needs to map the hash onto the array (index = hash % length) and use the remainder as the array index, but the source code computes it as hash & (length - 1). In other words, hash % length == hash & (length - 1), and keeping the capacity a power of 2 guarantees that every bit of (length - 1) is 1, so the & gives the same result.
    (This was already written above; writing it again to reinforce the point.)
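
The tableSizeFor method mentioned above looks roughly like this in the JDK 8 source (MAXIMUM_CAPACITY is 1 << 30):

```java
// Rounds cap up to the next power of two by smearing the highest set bit
// into every lower position and then adding 1.
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```

For example, tableSizeFor(13) returns 16 and tableSizeFor(16) returns 16.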

The infinite loop problem caused by HashMap multi-threaded operation

We all know that when the number of entries in the map exceeds load factor * current capacity, a resize is required (resize is one step of the rehash; the rehash consists of the resize method and the transfer method).
A rehash performed concurrently can link elements into a circular list. Although JDK 1.8 changed the resize so that this particular problem no longer occurs, it is still not recommended to use HashMap from multiple threads, because other problems such as lost data remain. ConcurrentHashMap is the recommended choice in a concurrent environment.

To be honest, I am a little confused here. Did JDK 1.8 really solve this problem? How? Did the official documentation say so? What changed in 1.8 is that a chain longer than 8 becomes a red-black tree and one that shrinks below 6 degenerates back into a chain; wasn't the red-black tree introduced to improve query efficiency? And quite a few people have reported that 1.8 can still run into trouble during a concurrent rehash.
ps: I think the official position amounts to: HashMap was never designed with concurrency in mind; ConcurrentHashMap exists for that purpose, so stop using HashMap in concurrent scenarios.

The difference between ConcurrentHashMap and Hashtable

We saw earlier that HashMap is not synchronized and that the official recommendation for concurrent use is ConcurrentHashMap. We also know that Hashtable has a data structure very similar to HashMap's; although it falls short of HashMap in many respects, it is synchronized.
The differences between the two:

  1. Underlying data structure:
  • In JDK 1.7, the bottom layer of ConcurrentHashMap is a segmented array + linked lists. In JDK 1.8 it adopts the same array + linked list / red-black tree structure as HashMap. (Strictly speaking the comparison is a bit loose, because HashMap itself was also just an array + linked lists back in 1.7.)
  • The underlying data structure of Hashtable is likewise an array + linked lists. (In other words, times have changed but Hashtable has not improved, so it has been left behind.)
  2. The way thread safety is achieved is different:

In JDK 1.7, ConcurrentHashMap uses segment locking: the bucket array is divided into segments, and each lock protects only part of the data in the container. Threads accessing data in different segments do not contend for the same lock, which raises the level of concurrent access. By JDK 1.8 the segment concept had been abandoned: the map is implemented directly as an array + linked lists + red-black trees, and concurrency is controlled with synchronized and CAS (synchronized has been heavily optimized since JDK 1.6). The whole thing looks like an optimized, thread-safe HashMap. The Segment data structure can still be seen in JDK 1.8, but its fields have been simplified; it is kept only for compatibility with older versions.

Hashtable (a single lock for the whole table): it relies on synchronized to ensure thread safety, which is very inefficient. When one thread is inside a synchronized method, other threads calling synchronized methods are blocked or left polling. For example, while one thread is adding an element with put, no other thread can put or even get, and the fiercer the competition, the lower the efficiency.

So although Hashtable is simpler to implement than ConcurrentHashMap, it is much less efficient: it simply locks everything and applies the same heavy synchronization to every operation.
Content source: JavaGuide; WeChat public account: Lonely Smoke


Origin blog.csdn.net/H1517043456/article/details/107537853