The implementation principles of HashMap, with interview questions and answers about its internals attached at the end

1. Introduction

 HashMap is an unsynchronized, hash-table-based implementation of the Map interface. This implementation provides all of the optional map operations and permits null values and a null key. The class makes no guarantees about the order of the map.

2. The data structure of HashMap:

HashMap actually uses an "array of linked lists" data structure: each array slot stores the head node of a linked list. In other words, it is a combination of an array and linked lists.

As the structure described above shows, the bottom layer of HashMap is an array, and each item in the array is a linked list. When a new HashMap is created, this array is initialized. The source code is as follows:
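(The source excerpt from the original post is missing here; the following is a simplified sketch modeled on the pre-JDK-8 HashMap source. The field names follow that source, but the details are abbreviated; the Entry type is sketched just below.)

// Simplified sketch of the HashMap fields and default constructor (pre-JDK-8 style).
public class HashMap<K, V> {

    // The bucket array; its length is always a power of two.
    transient Entry<K, V>[] table;

    static final int DEFAULT_INITIAL_CAPACITY = 16;

    @SuppressWarnings("unchecked")
    public HashMap() {
        // A new HashMap starts out with an empty bucket array of the default capacity.
        this.table = new Entry[DEFAULT_INITIAL_CAPACITY];
    }
}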

It can be seen that Entry is the element type of the array. Each Entry is essentially a key-value pair, and it holds a reference to the next element, which is what forms the linked list.
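(Again a sketch rather than the exact JDK code: the Entry type, renamed Node in JDK 8, looks roughly like this.)

// Simplified sketch of the Entry node: one key-value pair plus a link to the
// next node in the same bucket, which is what forms the linked list.
static class Entry<K, V> implements java.util.Map.Entry<K, V> {
    final K key;
    V value;
    Entry<K, V> next;   // next entry in the same bucket
    final int hash;     // cached (re-mixed) hash of the key

    Entry(int hash, K key, V value, Entry<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public K getKey() { return key; }
    public V getValue() { return value; }
    public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}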

3. How HashMap stores and retrieves elements:

1) Storage:
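(The put() source shown in the original post is missing; below is a simplified sketch in the style of the pre-JDK-8 source. putForNullKey(), hash(), indexFor() and addEntry() are the helper methods discussed in the surrounding text; null-key and resize details are abbreviated.)

// Simplified sketch of put(): find the bucket from the key's hash, overwrite the
// value if an equal key already exists, otherwise insert a new Entry.
public V put(K key, V value) {
    if (key == null) {
        return putForNullKey(value);          // null keys are stored in bucket 0
    }
    int hash = hash(key.hashCode());          // re-mix the key's hashCode
    int i = indexFor(hash, table.length);     // bucket index = hash & (length - 1)

    // Walk the linked list in bucket i; if an equal key is found, replace its value.
    for (Entry<K, V> e = table[i]; e != null; e = e.next) {
        if (e.hash == hash && (e.key == key || key.equals(e.key))) {
            V oldValue = e.value;
            e.value = value;
            return oldValue;
        }
    }

    modCount++;                               // structural modification (see the Fail-Fast section)
    addEntry(hash, key, value, i);            // no matching key: insert a new Entry at the head
    return null;
}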

As the put() source above shows: when we put an element into the HashMap, the hash is first recomputed from the key's hashCode, and the element's position in the array (i.e., its index) is derived from that hash. If other elements are already stored at that position, the elements at that position are kept as a linked list: the newly added element is placed at the head of the chain, and the earliest added element ends up at the tail. If there is no element at that position, the element is placed directly into that slot of the array.

The addEntry(hash, key, value, i) method places the key-value pair at index i of the table array, according to the computed hash. addEntry is a package-private method of HashMap (declared without any of the public, protected, or private modifiers, i.e., default access; note there is no "default" keyword for this in the code). Its code is roughly as follows:
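(Sketch in the style of the pre-JDK-8 source; the resize() call and the threshold field are explained in sections 4 and 5 below.)

// Simplified sketch of addEntry(): the new Entry is placed at the head of bucket
// bucketIndex, pointing at the previous head, and the table grows once the
// threshold is exceeded.
void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K, V> oldHead = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, oldHead); // new element becomes the head of the chain
    if (size++ >= threshold) {
        resize(2 * table.length);   // double the array once the threshold is passed
    }
}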

When the system decides where to store a key-value pair in the HashMap, it does not consider the value in the Entry at all; it calculates and determines the storage location of each Entry based solely on the key. We can regard the value in the Map as an attachment of the key: once the system has determined where the key is stored, the value is simply stored alongside it.

The hash(int h) method recomputes a hash from the key's hashCode. The algorithm mixes the high-order bits into the low-order bits, to avoid the collisions that would otherwise occur when hash codes differ only in their high-order bits (since only the low-order bits are used to pick the bucket).
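(A sketch of the supplemental hash from the pre-JDK-8 source, together with indexFor(), which maps the hash to a bucket.)

// The supplemental hash XORs shifted copies of the hashCode so that the
// high-order bits also influence the low-order bits used for the bucket index.
static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

// Because table.length is always a power of two, h & (length - 1) keeps exactly
// the low-order bits of h, which is a cheap replacement for h % length.
static int indexFor(int h, int length) {
    return h & (length - 1);
}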

The indexFor() method is quite clever: it obtains the object's bucket index via h & (table.length - 1), and the length of HashMap's underlying array is always a power of two, which is one of HashMap's speed optimizations. The HashMap constructor contains code to this effect:
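(The constructor excerpt is missing from the post; the capacity rounding in the pre-JDK-8 constructor is essentially this loop.)

// Round the requested initialCapacity up to the next power of two.
int capacity = 1;
while (capacity < initialCapacity) {
    capacity <<= 1;
}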

This code ensures that the capacity of the HashMap is always a power of two at initialization; that is, the length of the underlying array is always a power of two.

When length is a power of two, h & (length - 1) is equivalent to taking h modulo length, i.e., h % length, but & is more efficient than %.

This looks very simple, but there is actually some subtlety to it. Let's take an example to illustrate:

Suppose the array length is 15 or 16, and the (already re-hashed) hash values are 8 and 9. The results of the & operation are as follows:
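(The comparison table from the original post did not survive; working the four AND operations out by hand gives the following.)

8 & (16 - 1):  1000 & 1111 = 1000, index 8
9 & (16 - 1):  1001 & 1111 = 1001, index 9
8 & (15 - 1):  1000 & 1110 = 1000, index 8
9 & (15 - 1):  1001 & 1110 = 1000, index 8  (collides with 8)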

As can be seen from the example above: when 8 and 9 are ANDed with (15 - 1) = 14 (binary 1110), they produce the same result, 1000. That is, they map to the same array position, which causes a collision: 8 and 9 end up in the same slot and form a linked list, and a query then has to traverse that list to find 8 or 9, which reduces query efficiency. We can also see that when the array length is 15, ANDing any hash with (15 - 1) = 1110 forces the last bit to 0, so the positions 0001, 0011, 0101, 1001, 1011, 0111 and 1101 can never hold any element. The waste of space is considerable, and worse, the number of usable positions becomes much smaller than the array length, which further increases the chance of collisions and slows down queries!

When the array length is 16, i.e., a power of two, every bit of 2^n - 1 is 1 (for example 2^4 - 1 = 15, which is 1111 in binary), so the result of the & operation is identical to the low-order bits of the original hash. In addition, the hash(int h) method further mixes the key's hashCode by bringing in the high-order bits, so only keys whose re-hashed values are actually equal end up at the same array position and form a linked list.

Therefore, when the array length is a power of two, different keys are less likely to be computed to the same index, the data is distributed more evenly across the array, collisions are less likely, and queries are faster because there is less need to traverse a linked list at any given position.

According to the put() source above, when the program tries to put a key-value pair into the HashMap, it first determines the Entry's storage position from the return value of the key's hashCode(): if two keys' hash values map to the same position, the two entries are stored at the same location. If the two keys additionally compare equal via equals(), the value of the newly added Entry overwrites the value of the existing Entry, but the key is not replaced. If the two keys compare unequal via equals(), the newly added Entry forms an Entry chain with the existing Entry, and the new Entry sits at the head of the chain; see the addEntry() sketch above for details.
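2) Reading:

(The get() source the next paragraph refers to is missing from the post; here is a simplified sketch in the same pre-JDK-8 style, with getForNullKey() handling the null-key case.)

// Simplified sketch of get(): locate the bucket from the key's hash, then walk
// the linked list comparing keys with equals().
public V get(Object key) {
    if (key == null) {
        return getForNullKey();               // the null key lives in bucket 0
    }
    int hash = hash(key.hashCode());
    for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
        if (e.hash == hash && (e.key == key || key.equals(e.key))) {
            return e.value;
        }
    }
    return null;                              // no matching key found
}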

With the storage hashing above as a basis, this code is easy to understand. As the get() source shows: when retrieving an element from the HashMap, the key's hash is computed first to locate the corresponding position in the array, and the key's equals() method is then used to find the required element in the linked list at that position.

3) To sum up: at the bottom layer, HashMap treats a key-value pair as a single whole, and that whole is an Entry object. HashMap uses an Entry[] array underneath to store all the key-value pairs. When an Entry needs to be stored, its position in the array is determined by the hash algorithm, and its position within the linked list at that array slot is determined using equals(); when an Entry needs to be retrieved, its position in the array is likewise found by the hash algorithm, and the Entry is then taken out of the linked list at that position using equals().

4. HashMap resize (rehash):

As more and more elements are put into the HashMap, the probability of hash collisions grows, because the array length is fixed. So, to keep queries efficient, the HashMap's array has to be expanded; array expansion also appears in ArrayList and is a common operation. After the HashMap array is expanded, the most expensive step appears: every piece of data in the original array must have its position in the new array recalculated and be placed there. This is the resize.

So when does HashMap expand? When the number of elements in the HashMap exceeds arraySize * loadFactor, the array is expanded. The default loadFactor is 0.75, which is a compromise value. That is, by default, with an array size of 16, once the number of elements exceeds 16 * 0.75 = 12 (this value is called threshold in the code, also known as the critical value), the array is expanded to 2 * 16 = 32, i.e., doubled, and the position of every element in the array is then recalculated. This is a very expensive operation, so if we already know roughly how many elements the HashMap will hold, presetting the initial capacity can effectively improve HashMap's performance.

The code for HashMap expansion is as follows:
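(The expansion code in the original post is missing; the following sketch merges the resize() and transfer() methods of the pre-JDK-8 source into one simplified method.)

// Simplified sketch of expansion: allocate a bucket array of the new capacity,
// then rehash every existing Entry into it.
void resize(int newCapacity) {
    Entry<K, V>[] oldTable = table;
    @SuppressWarnings("unchecked")
    Entry<K, V>[] newTable = new Entry[newCapacity];

    // Move every entry of every old bucket into its new bucket.
    for (int j = 0; j < oldTable.length; j++) {
        Entry<K, V> e = oldTable[j];
        while (e != null) {
            Entry<K, V> next = e.next;
            int i = indexFor(e.hash, newCapacity);   // recompute the index for the new length
            e.next = newTable[i];                    // head insertion into the new bucket
            newTable[i] = e;
            e = next;
        }
    }
    table = newTable;
    threshold = (int) (newCapacity * loadFactor);    // new expansion trigger
}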

5. HashMap performance parameters:

HashMap contains the following constructors:

  1. HashMap(): Constructs a HashMap with an initial capacity of 16 and a load factor of 0.75.

  2. HashMap(int initialCapacity): Construct a HashMap with an initial capacity of initialCapacity and a load factor of 0.75.

  3. HashMap(int initialCapacity, float loadFactor): Create a HashMap with the specified initial capacity and specified load factor.

  4. HashMap's basic constructor HashMap(int initialCapacity, float loadFactor) takes two parameters, which are the initial capacity initialCapacity and the load factor loadFactor.

  5. initialCapacity: the initial capacity of the HashMap, i.e., the length of the underlying array.

  6. loadFactor: The load factor loadFactor is defined as: the actual number of elements of the hash table (n) / the capacity of the hash table (m).

The load factor measures how fully a hash table's space is used: the larger the load factor, the more heavily the hash table is filled, and vice versa. For a hash table that resolves collisions by chaining, the average time to find an element is O(1 + α). So if the load factor is larger, space is used more fully, but lookups become slower; if the load factor is too small, the data in the hash table is too sparse and space is badly wasted.

In HashMap's implementation, the point at which the map must grow is tracked by the threshold field:
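(Sketch of the relevant fields from the pre-JDK-8 source.)

// The next size value at which the table will be resized: threshold = (int) (capacity * loadFactor).
int threshold;

// The load factor chosen when the map was constructed.
final float loadFactor;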

Combined with the definition of the load factor, the threshold is the maximum number of elements allowed under the current loadFactor and capacity; once the number of elements exceeds it, a resize is performed, which lowers the actual load factor (in other words, although the array length is capacity, the trigger for expansion is the threshold). The default load factor of 0.75 is a balanced trade-off between space and time efficiency. When the number of elements exceeds the threshold, the HashMap's capacity after the resize is twice its previous capacity:
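(This is the check at the end of addEntry(), shown again here as a sketch.)

if (size++ >= threshold) {
    resize(2 * table.length);   // double the capacity once the threshold is exceeded
}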

6. Fail-Fast mechanism:

We know that java.util.HashMap is not thread-safe, so if another thread modifies the map while we are iterating over it, a ConcurrentModificationException is thrown. This is the so-called fail-fast strategy. (It is also mentioned in the Core Java books.)

This strategy is implemented in the source code through the modCount field. As the name implies, modCount is the modification count: any structural modification of the HashMap's contents increments it, and during iterator initialization this value is copied into the iterator's expectedModCount.

During iteration, modCount is compared against expectedModCount; if they are not equal, it means the Map has been modified outside the iterator, for example by another thread:
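(The iterator excerpt is missing from the post; this is a simplified sketch of nextEntry() in HashMap's internal HashIterator from the pre-JDK-8 source. The next, index and current fields are the iterator's own cursor state.)

// Simplified sketch of HashIterator.nextEntry(): any structural modification made
// outside the iterator changes modCount and triggers the exception.
final Entry<K, V> nextEntry() {
    if (modCount != expectedModCount) {
        throw new java.util.ConcurrentModificationException();
    }
    Entry<K, V> e = next;                     // the entry this call will return
    if (e == null) {
        throw new java.util.NoSuchElementException();
    }
    // Advance the cursor, skipping ahead to the next non-empty bucket if needed.
    if ((next = e.next) == null) {
        Entry<K, V>[] t = table;
        while (index < t.length && (next = t[index++]) == null) {
            // keep scanning
        }
    }
    current = e;
    return e;
}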

Note that modCount is declared volatile (in the JDK version this article is based on), which guarantees that modifications made by one thread are visible to others. Keep in mind that volatile only guarantees visibility of writes to that variable, not atomicity of compound operations, so it does not by itself make HashMap thread-safe.

HashMap's API documentation states:

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed: generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depends on this exception for its correctness; the fail-fast behavior of iterators should be used only to detect bugs.

Below are some questions about these underlying principles that Internet companies often ask in interviews.

What a good hash algorithm for HashMap aims for

1. Distribute keys evenly across the array

2. Avoid collisions as far as possible

A few questions about HashMap

1. The concept of hashing

Hashing refers to using a hash algorithm to map a key object to the bucket in which its value object will be stored.

2. The method of resolving collisions in HashMap

The bucket (storage location) is derived from the key's hashCode. Different keys may map to the same location, which is a hash collision. To resolve this, each bucket holds a linked list: when a collision occurs in the HashMap, the new key-value (Entry) object is stored as another node of that bucket's linked list.

3. The application of equals() and hashCode() and their importance in HashMap

When two different key objects have the same hashCode, they are stored in the linked list at the same bucket location. The linked list is traversed and each stored key is compared with the lookup key using equals() to find the right key-value (Entry) object. get() uses both hashCode() and equals(); put() uses hashCode() to locate the bucket and equals() to decide whether an existing entry should be overwritten. A sketch of a key class that overrides both methods consistently follows (it also illustrates question 4 below).
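(As an illustration only, not from the original post: a minimal immutable key class whose hashCode() and equals() are based on the same final fields, so its hash cannot change after the key has been stored. The class name OrderKey and its fields are made up for this example.)

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable key: all fields are final and hashCode()/equals() agree.
final class OrderKey {
    private final String customerId;
    private final long orderNo;

    OrderKey(String customerId, long orderNo) {
        this.customerId = customerId;
        this.orderNo = orderNo;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof OrderKey)) return false;
        OrderKey other = (OrderKey) o;
        return orderNo == other.orderNo && customerId.equals(other.customerId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(customerId, orderNo);   // derived from the same fields as equals()
    }

    public static void main(String[] args) {
        Map<OrderKey, String> orders = new HashMap<>();
        orders.put(new OrderKey("c42", 1001L), "shipped");
        // A different but equal key hashes to the same bucket and matches via equals().
        System.out.println(orders.get(new OrderKey("c42", 1001L)));   // prints "shipped"
    }
}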

4. Benefits of Immutable Objects

It is very appropriate to use immutable objects as keys.

1) If a key object's hashCode at the time it is stored differs from its hashCode at the time it is looked up, the value object can no longer be found. An immutable key's hashCode cannot change after it is stored.

2) Immutable objects are also inherently thread-safe, so they can be shared as keys across threads without extra synchronization.

5. Race conditions in HashMap under multithreading

When multiple threads try to resize the HashMap at the same time, the elements stored in a bucket's linked list can end up reversed, because when entries are moved to the new bucket array, HashMap (before JDK 8) does not append each element to the tail of the list but inserts it at the head, to avoid traversing to the tail each time. Under this race condition the list can end up containing a cycle, which leads to an infinite loop.

6. Resizing the HashMap

If, after putting an element, the size of the HashMap exceeds capacity * load factor (0.75 by default), the HashMap is expanded. The expansion process creates a new bucket array twice the size of the original and stores the existing Entry objects in the new array according to their recomputed indexes. This rehashing process is very time-consuming, so try to choose an appropriate initial HashMap size during development to minimize the number of expansions.

7. What is the underlying data structure of HashMap?

HashMap consists of a bucket array plus a linked list hanging off each bucket. Every key-value (Entry) object is stored in one of these linked lists; the linked list structure exists mainly to hold key-value objects that land in the same bucket after a hash collision.
