Implementation principle of HashMap

HashMap overview

HashMap is a non-synchronized, hash table based implementation of the Map interface. This implementation provides all optional map operations and permits null values and null keys. This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
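As a quick, hedged illustration of these properties (the class and variable names below are made up for this example), the following snippet stores a null key and a null value:

import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put(null, "value for null key");    // a null key is allowed
        map.put("someKey", null);               // a null value is allowed
        System.out.println(map.get(null));      // prints: value for null key
        System.out.println(map.get("someKey")); // prints: null
    }
}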

HashMap data structure

In the Java programming language there are two basic storage structures: the array and the reference (a simulated pointer). All other data structures can be built from these two, and HashMap is no exception. HashMap is in fact a "chained hash" data structure, that is, a combination of an array and linked lists.

The bottom layer of HashMap is an array structure, and each item in the array is the head of a linked list. When a new HashMap is created, this array is initialized.

 

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry[] table;

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;
    ……
}

 

It can be seen that Entry is the element type of the array. Each Entry is in fact a key-value pair, and it holds a reference to the next Entry, which is what forms the linked list.

HashMap storage and retrieval implementation

storage

 

public V put(K key, V value) {
    // HashMap allows null keys and null values.
    // When the key is null, call the putForNullKey method, which stores
    // the value in the bucket at index 0 of the array.
    if (key == null)
        return putForNullKey(value);
    // Recompute the hash value from the key's hashCode.
    int hash = hash(key.hashCode());
    // Find the index in the table that corresponds to this hash value.
    int i = indexFor(hash, table.length);
    // If the Entry at index i is not null, walk the chain from e to e.next.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // No existing Entry matches this key: record the structural modification
    modCount++;
    // and add the key and value at index i.
    addEntry(hash, key, value, i);
    return null;
}

  

 

It can be seen from the above source code that when we put an element into the HashMap, we first recompute a hash value from the key's hashCode and use that hash value to obtain the element's position (i.e. the index) in the array. If other elements are already stored at that position, the elements at that position are stored as a linked list: the newly added element is placed at the head of the chain, and the earliest added element ends up at the tail. If there is no element at that position in the array, the element is placed directly at that position.
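To make the chaining behaviour concrete, here is a minimal sketch (the BadKey class and its constant hashCode are invented purely for illustration) in which every key lands in the same bucket, so entries are chained and distinguished only by equals:

import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // A deliberately bad key: every instance returns the same hashCode,
    // so all entries end up chained in a single bucket.
    static class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(this.name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<BadKey, Integer>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2);   // same bucket, chained via next
        map.put(new BadKey("a"), 3);   // equals() matches: the old value 1 is replaced
        System.out.println(map.get(new BadKey("a"))); // prints 3
        System.out.println(map.get(new BadKey("b"))); // prints 2
    }
}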

The addEntry(hash, key, value, i) method places the key-value pair at index i of the table array according to the calculated hash value. addEntry is a package-private method provided by HashMap. The code is as follows:

 

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Get the Entry currently stored at bucketIndex.
    Entry<K,V> e = table[bucketIndex];
    // Put the newly created Entry at bucketIndex and let it point to the original Entry.
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // If the number of key-value pairs in the Map exceeds the threshold,
    // extend the length of the table to twice its original size.
    if (size++ >= threshold)
        resize(2 * table.length);
}

  

 

When the system decides where to store a key-value pair in the HashMap, it does not consider the value in the Entry at all; it calculates and determines the storage location of each Entry based solely on the key. We can regard the values in the Map collection as attachments of the keys: once the system has determined where a key is stored, the value is simply stored along with it.

The hash(int h) method recomputes a hash from the key's hashCode. This algorithm mixes the high bits into the low bits, to prevent the collisions that would occur when hash codes differ only in their high bits while the low bits stay the same.

 

static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);  
}
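As a hedged illustration (this helper class is not part of the JDK; it simply copies the mixing shown above), the sketch below takes two hashCodes that differ only in their high bits and shows that, after mixing, these particular values no longer fall into the same bucket of a length-16 table:

public class HashMixDemo {
    // The same bit-mixing as HashMap's hash(int h) shown above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int a = 0x10000;   // differs from b only in the high bits
        int b = 0x20000;
        int length = 16;
        // Without mixing, both keys fall into bucket 0.
        System.out.println((a & (length - 1)) + " vs " + (b & (length - 1)));
        // After mixing, the low bits differ, so the buckets (usually) differ too.
        System.out.println((hash(a) & (length - 1)) + " vs " + (hash(b) & (length - 1)));
    }
}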

  

 

We can see that to find an element in a HashMap, we need to find its position in the array according to the hash value of the key; how that position is computed is the hash algorithm. As mentioned earlier, the data structure of HashMap is a combination of an array and linked lists, so we naturally want the elements to be distributed across the array as evenly as possible, ideally with only one element at each position. Then, when the hash algorithm gives us a position, we immediately know that the element there is the one we want, without traversing a linked list, which greatly improves query efficiency.

For any given object, as long as its hashCode() returns the same value, the hash code computed by the hash(int h) method is always the same. The first idea that comes to mind is to take the hash value modulo the array length, which would distribute elements relatively evenly. However, the "modulo" operation is relatively expensive, so HashMap does this instead: it calls the indexFor(int h, int length) method to compute the index in the table array at which the object should be stored. The code of indexFor(int h, int length) is as follows:

 

static int indexFor(int h, int length) {  
    return h & (length-1);
}

  

 

This method is very clever: it obtains the object's storage index through h & (table.length - 1), and the length of the underlying array of HashMap is always a power of two, which is how HashMap optimizes for speed. The HashMap constructor contains the following code:

 

 

int capacity = 1;
while (capacity < initialCapacity)  
    capacity <<= 1;

  

 

This code ensures that the capacity of HashMap is always 2 to the nth power when initialized, that is, the length of the underlying array is always 2 to the nth power.

When length is always a power of two, h & (length - 1) is equivalent to taking the hash modulo length, i.e. h % length, but & is more efficient than %.
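A small, hedged sketch to check this equivalence (the indexFor copy below mirrors the JDK method shown earlier; the sample values are arbitrary non-negative hashes):

public class IndexForDemo {
    // Same logic as HashMap's indexFor: only valid when length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // a power of two, like HashMap's table length
        for (int h : new int[] {8, 9, 33, 1025}) {
            System.out.println(h + ": " + indexFor(h, length) + " == " + (h % length));
        }
    }
}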

This looks very simple, but it is actually quite subtle. Let's take an example to illustrate:

Assuming that the array lengths are 15 and 16, and the optimized hash codes are 8 and 9, respectively, the results of the & operation are as follows:

h & (table.length-1)     hash     table.length-1     result
8 & (15-1):              1000  &  1110            =  1000
9 & (15-1):              1001  &  1110            =  1000
8 & (16-1):              1000  &  1111            =  1000
9 & (16-1):              1001  &  1111            =  1001

As can be seen from the above example: when 8 and 9 are ANDed with 15-1 (1110), they produce the same result, i.e. they are mapped to the same position in the array. This is a collision: 8 and 9 end up in the same bucket and form a linked list, so a query for 8 or 9 has to traverse that list, which reduces query efficiency. We can also see that when the array length is 15, every hash value is ANDed with 1110, so the last bit of the result is always 0. Positions such as 0001, 0011, 0101, 0111, 1001, 1011 and 1101 can never hold an element, so quite a lot of space is wasted. Worse, the number of usable positions becomes much smaller than the array length, which further increases the probability of collisions and slows queries down. When the array length is 16, i.e. a power of two, every bit of length-1 (1111) is 1, so the low bits of the hash pass through the AND unchanged and every index can be used. In addition, the hash(int h) method further mixes the high bits of the key's hashCode, so only keys whose mixed hash values are identical end up at the same array position and form a linked list.

Therefore, when the length of the array is the nth power of 2, the probability that different keys are calculated to have the same index is smaller, then the data is distributed evenly on the array, that is to say, the probability of collision is small. When there is no need to traverse the linked list at a certain position, the query efficiency will be higher.
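The bit patterns in the table above can be reproduced with a few lines of Java (nothing here is HashMap internals; it simply prints the AND results for the example values):

public class MaskDemo {
    public static void main(String[] args) {
        int[] lengths = {15, 16};
        int[] hashes = {8, 9};
        for (int length : lengths) {
            for (int h : hashes) {
                int index = h & (length - 1);
                System.out.printf("%d & (%d - 1) = %s & %s = %s%n",
                        h, length,
                        Integer.toBinaryString(h),
                        Integer.toBinaryString(length - 1),
                        Integer.toBinaryString(index));
            }
        }
    }
}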

According to the source code of the put method above, when the program tries to put a key-value pair into the HashMap, it first determines the Entry's storage location from the return value of the key's hashCode(): if the hashCode() values of two keys map to the same index, the two Entry objects are stored at the same location. If the two keys also compare equal via equals(), the value of the newly added Entry overwrites the value of the original Entry in the collection, but the key is not overwritten. If the two keys compare unequal via equals(), the newly added Entry forms an Entry chain with the original Entry, and the newly added Entry sits at the head of the chain; see the description of the addEntry() method above for details.

read

public V get(Object key) {
    // A null key is always stored in the bucket at index 0.
    if (key == null)
        return getForNullKey();
    // Recompute the hash from the key's hashCode, locate the bucket,
    // and walk the chain looking for an Entry whose key matches via == or equals.
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

  

 

With the storage hash algorithm above as a basis, this code is easy to understand. As can be seen from the source: when getting an element from the HashMap, first compute the hash of the key's hashCode to find the corresponding position in the array, and then use the key's equals method to find the required element in the linked list at that position.

Summary

Simply put, at the bottom layer HashMap treats each key-value pair as a whole, and this whole is an Entry object. The bottom layer of HashMap uses an Entry[] array to store all the key-value pairs. When an Entry needs to be stored, its slot in the array is determined by the hash algorithm, and its position within the linked list at that slot is determined by the equals method. When an Entry needs to be retrieved, its slot in the array is likewise found by the hash algorithm, and the Entry is then located in the linked list at that slot using the equals method.

HashMap resize (rehash)

When there are more and more elements in the HashMap, the probability of hash conflicts gets higher and higher, because the length of the array is fixed. Therefore, in order to improve query efficiency, the HashMap's array has to be expanded. Array expansion also appears in ArrayList; it is a common operation. After the HashMap array is expanded comes the most performance-consuming step: every entry in the original array must have its position in the new array recalculated and be moved there. This is the resize.

So when does HashMap expand? When the number of elements in the HashMap exceeds array size * loadFactor, the array is expanded. The default value of loadFactor is 0.75, which is a compromise. That is to say, by default the array size is 16, so when the number of elements in the HashMap exceeds 16 * 0.75 = 12, the array size is expanded to 2 * 16 = 32, i.e. doubled, and the position of every element in the array is recalculated. This is a very performance-intensive operation, so if we can predict the number of elements the HashMap will hold, presetting the capacity can noticeably improve the performance of the HashMap.
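A hedged sketch of what such presizing looks like in practice (the expected element count of 1000 is just an example value):

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedElements = 1000;
        // Choose an initial capacity large enough that expectedElements stays below
        // capacity * loadFactor (0.75), so no resize happens while the map is filled.
        int initialCapacity = (int) (expectedElements / 0.75f) + 1;
        Map<Integer, String> map = new HashMap<Integer, String>(initialCapacity);
        for (int i = 0; i < expectedElements; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(map.size()); // 1000, inserted without any rehashing
    }
}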

Performance parameters of HashMap

HashMap contains the following constructors:

  1. HashMap(): Constructs a HashMap with an initial capacity of 16 and a load factor of 0.75.

  2. HashMap(int initialCapacity): Constructs a HashMap with an initial capacity of initialCapacity and a load factor of 0.75.

  3. HashMap(int initialCapacity, float loadFactor): Create a HashMap with the specified initial capacity and specified load factor.

HashMap's basic constructor HashMap(int initialCapacity, float loadFactor) takes two parameters: the initial capacity initialCapacity and the load factor loadFactor.

The load factor loadFactor measures how full a hash table is allowed to get. The larger the load factor, the more fully the hash table is filled, and vice versa. For a hash table that resolves collisions with linked lists, the average time to find an element is O(1 + a), where a is the load factor. If the load factor is large, space is used more fully, but the result is lower lookup efficiency; if the load factor is too small, the data in the hash table is too sparse, causing serious waste of space.

In the implementation of HashMap, the threshold field determines the maximum number of elements the HashMap can hold before it must resize:

threshold = (int)(capacity * loadFactor);

Combined with the definition of the load factor, the threshold is the maximum number of elements allowed for the given loadFactor and capacity. When this number is exceeded, a resize is performed to bring the actual load factor back down. The default load factor of 0.75 is a balanced choice between space and time efficiency. After a resize, the capacity of the HashMap is twice the old capacity.
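For example, with the default capacity and load factor (a quick sanity check, not JDK source):

public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor);
        // Prints 12: once more than 12 entries are stored, the table grows to 32.
        System.out.println(threshold);
    }
}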

Fail-Fast mechanism

We know that java.util.HashMap is not thread-safe, so if another thread modifies the map while it is being traversed with an iterator, a ConcurrentModificationException will be thrown. This is the so-called fail-fast strategy.

This strategy is implemented in the source code through the modCount field. As the name implies, modCount is the modification count: any modification of the HashMap's contents increments this value, and during iterator initialization the current value is copied into the iterator's expectedModCount.

 

HashIterator() {
    expectedModCount = modCount;
    if (size > 0) { // advance to first entry
        Entry[] t = table;
        while (index < t.length && (next = t[index++]) == null)
            ;   // skip empty buckets until the first non-null Entry is found
    }
}

   

In the iteration process, judge whether modCount and expectedModCount are equal. If they are not equal, it means that other threads have modified the Map:

Note that modCount is declared volatile, which guarantees the visibility of modifications between threads.

 

final Entry<K,V> nextEntry() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    ……  // the rest of the method (omitted here) returns the next Entry in the chain
}

   

HashMap's API documentation states:

The iterators returned by all of HashMap's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed; generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. It is therefore wrong to write a program whose correctness depends on this exception: the fail-fast behavior of iterators should be used only to detect bugs.
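A small, hedged sketch that provokes the exception; note that a structural modification during iteration trips the same modCount check even within a single thread:

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);

        Iterator<Map.Entry<String, Integer>> iter = map.entrySet().iterator();
        while (iter.hasNext()) {
            iter.next();
            // Structural modification outside the iterator: modCount changes,
            // so the next call to next() throws ConcurrentModificationException.
            map.put("c", 3);
        }
    }
}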

Two ways to traverse HashMap

The first

 

Map map = new HashMap();
Iterator iter = map.entrySet().iterator();
while (iter.hasNext()) {
    Map.Entry entry = (Map.Entry) iter.next();
    Object key = entry.getKey();
    Object val = entry.getValue();
}

   

This is the more efficient way to traverse; prefer it whenever possible!

The second

 

Map map = new HashMap();
Iterator iter = map.keySet().iterator();
while (iter.hasNext()) {
    Object key = iter.next();
    Object val = map.get(key);
}

   

This is less efficient, because every key requires an extra lookup via map.get(key); use it as little as possible!
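For reference, here is a hedged sketch of the same entrySet traversal written with generics and the enhanced for loop (Java 5+), which compiles down to the efficient iterator form shown in the first example:

import java.util.HashMap;
import java.util.Map;

public class TraversalDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);

        // One pass over entrySet(): no extra map.get(key) per key.
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}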

 

 

Original address: http://www.importnew.com/16301.html

 
