In-depth analysis of HashMap: understanding Hash, underlying implementation and expansion mechanism

Table of contents

1. Brief description

Underlying implementation

2. Hash technology

Hash function

Hash collision

3. The underlying implementation of HashMap

Data structure

Storage structure

4. Expansion mechanism

When will it be expanded?

How to expand capacity

5. Summary

1. Brief description

HashMap is one of the most commonly used data structures in Java. It stores data as key-value pairs and offers efficient lookup, insertion and deletion. This article describes the underlying implementation of HashMap in detail, covering hashing, the underlying data structure and the expansion mechanism, to help readers understand how HashMap works.

HashMap is part of the Java Collections Framework. It is implemented on top of a hash table and allows any object to be used as a key for storing and retrieving values. HashMap is not synchronized: if multiple threads access it at the same time and at least one of them modifies it structurally, it must be synchronized externally.
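As a quick illustration of the operations described above, the sketch below inserts, overwrites, looks up and removes entries, and then wraps the map with Collections.synchronizedMap as one possible form of external synchronization. The class name BasicUsageDemo and the sample keys are made up for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class BasicUsageDemo {
    /** Builds a small map and exercises insert, overwrite and removal. */
    static Map<String, Integer> buildStock() {
        Map<String, Integer> stock = new HashMap<>();
        stock.put("apple", 3);   // insert
        stock.put("pear", 5);    // insert
        stock.put("apple", 4);   // same key: the old value is replaced
        stock.remove("pear");    // delete
        return stock;
    }

    /** Returns a view whose methods are all synchronized on the wrapper. */
    static Map<String, Integer> threadSafeView(Map<String, Integer> plain) {
        return Collections.synchronizedMap(plain);
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = buildStock();
        System.out.println(stock.get("apple"));         // lookup by key
        System.out.println(stock.containsKey("pear"));  // removed above

        // Share only the wrapper between threads, never the plain map.
        Map<String, Integer> shared = threadSafeView(stock);
        shared.put("plum", 7);
        System.out.println(shared.size());
    }
}
```

When many threads update the map heavily, java.util.concurrent.ConcurrentHashMap is usually a better fit than a synchronized wrapper, since the wrapper serializes every call on a single lock.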

Underlying implementation

Each key-value pair is held in a Node object. The simplified listing below shows its fields:

public class HashMap<K, V> {
    static class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Other code...
}

2. Hash technology

Hash function

A hash function is an algorithm that maps data of arbitrary length to a fixed-length value. In HashMap, hashing starts from the key's hashCode() and reduces it to an index in the underlying array, so that key-value pairs can be stored and found quickly.
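The mapping from key to index can be sketched as follows. The spread step mirrors what java.util.HashMap has done since Java 8 (XOR the high 16 bits of hashCode() into the low 16 bits), and the index is obtained by masking with the table length minus one, which assumes the length is a power of two. The class and method names (HashIndexDemo, spread, indexFor) are chosen for this example only.

```java
class HashIndexDemo {
    /** Mixes the high 16 bits of hashCode() into the low 16 bits. */
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /** Maps a hash to a bucket index; tableLength must be a power of two. */
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        String key = "hello";
        int hash = spread(key);
        System.out.println("hashCode = " + key.hashCode());
        System.out.println("spread   = " + hash);
        System.out.println("index in a 16-slot table = " + indexFor(hash, 16));

        // Without spreading, 65536 would land in bucket 0 of a 16-slot table,
        // because only the low 4 bits are used; with spreading it lands in 1.
        System.out.println(indexFor(spread(65536), 16));
    }
}
```

The spreading step matters because small tables only look at the lowest bits of the hash. Folding the high bits in reduces collisions for keys whose hash codes differ only in their upper bits.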

Hash collision

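When two or more different keys are mapped to the same index position, either because they have the same hash value or because their hash values reduce to the same index, the result is called a hash collision. HashMap resolves collisions with linked lists and red-black trees: all key-value pairs that land on the same index are kept together in that bucket, and a lookup compares keys with equals() to find the right one.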
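A small demonstration of a collision: the strings "Aa" and "BB" are a well-known pair with identical hashCode() values, so they always land in the same bucket, yet HashMap keeps both entries apart. The class name CollisionDemo is made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    /** Stores two keys whose hash codes are identical. */
    static Map<String, Integer> collidingMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        Map<String, Integer> map = collidingMap();
        // Both entries survive: equals() tells the keys apart inside the bucket.
        System.out.println(map.get("Aa"));
        System.out.println(map.get("BB"));
    }
}
```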

3. The underlying implementation of HashMap

Data structure

Since Java 8, HashMap is implemented as an array combined with linked lists and red-black trees. The array is the main body of the HashMap, and each slot (bucket) holds the entries that map to that index. A linked list chains together the entries that collide in the same bucket. When a list grows beyond a threshold (8 by default), it is converted into a red-black tree to improve lookup efficiency, provided the array has at least 64 slots; with a smaller array, HashMap resizes instead.

Storage structure

The storage structure of HashMap is an array of Node objects. Node is a static nested class that implements the Map.Entry interface. Each Node holds four fields: key, value, hash (the cached hash of the key) and next (a reference to the next Node in the same bucket). When a hash collision occurs, the new key-value pair is linked into the list of that bucket.
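To make the array-plus-linked-list layout concrete, here is a minimal sketch of a chained hash table. It is not the JDK source: TinyChainedMap is a hypothetical class with a fixed 16-slot table, no resizing and no tree conversion, and it only shows how nodes are chained within a bucket.

```java
import java.util.Objects;

class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed-size bucket array (assumption: 16 slots, never resized).
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    private int size;

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /** Inserts or overwrites; returns the previous value or null. */
    V put(K key, V value) {
        int hash = hash(key);
        int index = hash & (table.length - 1);
        Node<K, V> node = table[index];
        if (node == null) {                       // empty bucket
            table[index] = new Node<>(hash, key, value, null);
            size++;
            return null;
        }
        while (true) {
            if (node.hash == hash && Objects.equals(node.key, key)) {
                V old = node.value;               // key already present
                node.value = value;
                return old;
            }
            if (node.next == null) {              // collision: append at tail
                node.next = new Node<>(hash, key, value, null);
                size++;
                return null;
            }
            node = node.next;
        }
    }

    /** Walks the bucket's chain until the key matches. */
    V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> n = table[hash & (table.length - 1)]; n != null; n = n.next) {
            if (n.hash == hash && Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> map = new TinyChainedMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);   // same hash as "Aa": chained in the same bucket
        System.out.println(map.get("Aa") + " " + map.get("BB"));
    }
}
```

Comparing the cached hash before calling equals() is a cheap way to skip most non-matching nodes in a chain.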

4. Expansion mechanism

When will it be expanded?

Expansion is triggered when the number of entries in the HashMap exceeds the threshold, which is the array capacity multiplied by the load factor (0.75 by default). With the default capacity of 16, the threshold is 16 × 0.75 = 12, so the 13th insertion triggers an expansion. The load factor controls how full the array may become before it grows. The larger the load factor, the higher the space utilization of the array, but the greater the probability of collisions; the smaller the load factor, the lower the space utilization, but the smaller the probability of collisions. Choosing an appropriate load factor therefore balances space utilization against collision probability.
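The threshold arithmetic can be worked through in a few lines. The helper names below (ThresholdDemo, threshold, capacityFor) are hypothetical and not part of the JDK; capacityFor simply searches for the smallest power-of-two capacity whose threshold covers the expected number of entries.

```java
import java.util.HashMap;
import java.util.Map;

class ThresholdDemo {
    /** Entry count above which a table of the given capacity is resized. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    /** Smallest power-of-two capacity that holds expectedEntries without a resize. */
    static int capacityFor(int expectedEntries, float loadFactor) {
        int capacity = 1;
        // Stop at 2^30, the largest power of two that fits in an int.
        while (capacity < (1 << 30) && threshold(capacity, loadFactor) < expectedEntries) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));     // 12
        System.out.println(capacityFor(100, 0.75f));  // 256

        // Pre-sizing the map avoids intermediate resizes while it is filled.
        Map<Integer, String> map = new HashMap<>(capacityFor(100, 0.75f), 0.75f);
        for (int i = 0; i < 100; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(map.size());
    }
}
```

For 100 entries, a capacity of 128 would not be enough, because its threshold is 96; the next power of two, 256, has a threshold of 192.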

How to expand capacity

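The expansion operation consists of two steps: creating a new array and moving the existing entries into it. First, HashMap creates a new array twice the size of the original. Then it walks through every bucket of the original array and places each key-value pair at its index in the new array. The hash value of a key is not recalculated, because it is cached in the hash field of the Node; only the index changes, since the index is derived from the hash and the array length. Because the capacity is always a power of two, doubling it brings one more bit of the hash into play, so each entry either stays at its old index or moves to the old index plus the old capacity. This redistribution is commonly called "rehashing". The listing below is a simplified sketch that appears to be modeled on the pre-Java 8 resize(int) method, in which a transfer helper moves the entries.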

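// Simplified sketch: table, threshold, loadFactor, MAXIMUM_CAPACITY and
// transfer(...) are members of the enclosing HashMap class (not shown).
@SuppressWarnings("unchecked")
void resize(int newCapacity) {
    Node<K, V>[] oldTable = table;
    int oldCapacity = oldTable.length;

    // The table cannot grow any further: raise the threshold so that
    // no more expansions are attempted.
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    // Allocate the larger table (generic arrays need an unchecked cast).
    Node<K, V>[] newTable = (Node<K, V>[]) new Node[newCapacity];
    // Move every entry from the old table into its bucket in newTable.
    transfer(newTable);
    table = newTable;
    // The next expansion happens once size exceeds newCapacity * loadFactor.
    threshold = (int) (newCapacity * loadFactor);
}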
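The effect of doubling a power-of-two table on bucket indices can be checked with a short sketch. It assumes the index is computed as hash & (capacity - 1); the class and method names (ResizeSplitDemo, indexFor, staysInPlace) are made up for this example. Testing the single bit hash & oldCapacity is the check the Java 8 implementation uses to split a bucket into a "stay" list and a "move" list.

```java
class ResizeSplitDemo {
    /** Bucket index for a hash in a table whose capacity is a power of two. */
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    /** True if an entry with this hash keeps its index when the table doubles. */
    static boolean staysInPlace(int hash, int oldCapacity) {
        return (hash & oldCapacity) == 0;
    }

    public static void main(String[] args) {
        int oldCapacity = 16;
        int newCapacity = oldCapacity * 2;
        // All four hashes share bucket 5 in the 16-slot table.
        int[] hashes = {5, 21, 37, 53};
        for (int hash : hashes) {
            int oldIndex = indexFor(hash, oldCapacity);
            int newIndex = indexFor(hash, newCapacity);
            System.out.println(hash + ": " + oldIndex + " -> " + newIndex
                    + (staysInPlace(hash, oldCapacity) ? " (stays)" : " (moves by 16)"));
        }
    }
}
```

With these sample hashes, 5 and 37 stay in bucket 5, while 21 and 53 move to bucket 21 (5 + 16). No hash is recomputed at any point.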

5. Summary

This article has described the underlying implementation of HashMap, covering hashing, the underlying data structure and the expansion mechanism. HashMap is an efficient data structure that uses a hash table to store and retrieve key-value pairs. A solid understanding of how it works makes it easier to use well in practice. In actual development, the load factor and initial capacity should be chosen to suit the expected number of entries, so that unnecessary expansions and excessive collisions are avoided.
