Data Structures and Algorithms: From Hash Table to HashMap (1)

Introduction to Hash Table

Arrays and linked lists generally do not use any property of the data to decide where each item is stored, so a lookup has to traverse them from front to back until the desired item is found.

For example, to check whether the int array [12,34,56,324,24387] contains the key 324, we have to start at index 0 and scan forward until we reach 324, so the lookup takes O(n) time. A linked list behaves the same way.
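The front-to-back scan described above can be sketched as follows. This is only an illustration: the class name LinearSearchDemo and the method indexOf are made up for this example and do not come from any library.

```java
class LinearSearchDemo {
    // Returns the index of target in values, or -1 if it is absent.
    // Every element may have to be inspected, so the cost is O(n).
    static int indexOf(int[] values, int target) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] values = {12, 34, 56, 324, 24387};
        System.out.println(indexOf(values, 324)); // prints 3
        System.out.println(indexOf(values, 99));  // prints -1
    }
}
```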

Binary search trees and red-black trees, by contrast, place data according to its ordering, which is a property of the data itself, so their lookups are much faster than those of arrays and linked lists.

[Figure: binary tree]

For example, to look for the value 5 in the binary tree shown in the figure, we first compare with 10, then with 3, and finally find 5. The lookup takes O(h) time, where h is the height of the tree. The expected height of a binary search tree built from randomly ordered keys is O(log n), and the height of a red-black tree is always O(log n), so lookups in both are much faster than in arrays and linked lists.
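A minimal sketch of such a lookup is shown below. The original figure is not available, so the tree here is an assumption: it is built by inserting 10, 3, 15, 1 and 5, which yields the same comparison path (10, then 3, then 5). The names BstDemo, insert and comparisonsToFind are hypothetical.

```java
class BstDemo {
    static final class Node {
        final int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    // Plain (unbalanced) binary search tree insertion; duplicates are ignored.
    static Node insert(Node root, int key) {
        if (root == null) {
            return new Node(key);
        }
        if (key < root.key) {
            root.left = insert(root.left, key);
        } else if (key > root.key) {
            root.right = insert(root.right, key);
        }
        return root;
    }

    // Returns how many nodes were compared before key was found, or -1 if absent.
    // At most one node per level is visited, so the cost is O(h).
    static int comparisonsToFind(Node root, int key) {
        int comparisons = 0;
        Node current = root;
        while (current != null) {
            comparisons++;
            if (key == current.key) {
                return comparisons;
            }
            current = key < current.key ? current.left : current.right;
        }
        return -1;
    }

    public static void main(String[] args) {
        Node root = null;
        for (int key : new int[]{10, 3, 15, 1, 5}) {
            root = insert(root, key);
        }
        System.out.println(comparisonsToFind(root, 5)); // prints 3 (10, 3, 5)
        System.out.println(comparisonsToFind(root, 7)); // prints -1
    }
}
```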

However, binary search trees and red-black trees only exploit the ordering between items. If the data itself directly determined where each item is stored, lookups would be as fast as possible.

Consider the simplest case first: the keys are the integers 0 to 99. We can use an array of length 100 and store key 0 at index 0, key 1 at index 1, and so on up to key 99 at the last index. A lookup then takes O(1) time, because we only need to check whether the slot at the key's index holds any data.

public interface Entry<K,V> {
    K getKey();
    V getValue();
}

public class SimpleEntry implements Entry<Integer,String> {
    private Integer key;
    private String value;

    public SimpleEntry() {
    }

    public SimpleEntry(Integer key, String value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public Integer getKey() {
        return key;
    }

    @Override
    public String getValue() {
        return value;
    }

    @Override
    public String toString() {
        return "SimpleEntry{" +
                "key=" + key +
                ", value='" + value + '\'' +
                '}';
    }
}

public class SimpleMap {
    private Entry<Integer, String>[] data;

    public SimpleMap() {
        this.data = new SimpleEntry[100];
    }

    public void put(Integer key, String value) {
        data[key] = new SimpleEntry(key, value);
    }

    public String get(Integer key) {
        Entry<Integer, String> entry = data[key];
        if (entry != null) {
            return entry.getValue();
        }
        return null;
    }

}

public class Test {
    public static void main(String[] args) {
        SimpleMap simpleMap = new SimpleMap();
        simpleMap.put(10,"adj");
        simpleMap.put(13,"dfh45");
        simpleMap.put(23,"453vete");
        simpleMap.put(67,"459vrj");
        simpleMap.put(49,"547843vnrmd");

        System.out.println(simpleMap.get(13));
        System.out.println(simpleMap.get(67));
        System.out.println(simpleMap.get(1));

    }
}

// Output
dfh45
459vrj
null

The above code implements an extremely simple hash table to save data.

In practice, though, the range of possible keys is usually far larger than 100, and we cannot create an array that large. The number of items actually stored is also usually far smaller than the number of possible keys, so such an array would waste a great deal of space.

To solve this problem, our predecessors came up with the hash function. A hash function maps the very large (possibly infinite) set of keys onto a fixed number of slots. Before an item is saved, the hash value of its key is computed, and the item is then stored in the slot that corresponds to that hash value.

This raises a problem: what should we do when two different keys have the same hash value?
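To make the mapping and the resulting conflict concrete, here is a small sketch that assumes the hash function is simply the remainder after dividing by the number of slots. The names HashSlotDemo and slotFor are hypothetical. Math.floorMod is used instead of % so that negative keys also land in a valid slot.

```java
class HashSlotDemo {
    // Maps any int key onto one of slotCount slots, in the range [0, slotCount).
    static int slotFor(int key, int slotCount) {
        return Math.floorMod(key, slotCount);
    }

    public static void main(String[] args) {
        System.out.println(slotFor(10, 100));  // prints 10
        System.out.println(slotFor(110, 100)); // prints 10, the same slot as key 10
        System.out.println(slotFor(-1, 100));  // prints 99
    }
}
```

Keys 10 and 110 are different but end up in the same slot, which is exactly the kind of conflict that has to be resolved.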

Ideally, conflicts would never occur, but that is impossible when there are more possible keys than slots. So we make the hash values as evenly and randomly distributed as possible to reduce the probability of a conflict, and we need a way to resolve the conflicts that still occur.

There are two common ways to resolve conflicts: the link method (also known as separate chaining) and the open addressing method. We introduce each in turn.

Open Addressing

The idea behind open addressing is very simple. When inserting an item, if the slot given by the computed hash value is already occupied by another key, we look for another slot according to a fixed rule, and keep looking until a free slot is found.

Here is a simple open addressing implementation. If a slot is occupied, it tries the next adjacent slot (a scheme known as linear probing), and the hash value is simply the remainder of the key divided by 100:

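public class SimpleMap2 {
    private static final int CAPACITY = 100;
    private Entry<Integer, String>[] data;

    public SimpleMap2() {
        this.data = new SimpleEntry[CAPACITY];
    }

    public void put(Integer key, String value) {
        int index = hash(key);
        // Probe at most CAPACITY slots so that a full table cannot loop forever
        for (int probes = 0; probes < CAPACITY; probes++) {
            Entry<Integer, String> entry = data[index];
            // Either the slot is empty, or it already holds the key being inserted
            if (entry == null || entry.getKey().equals(key)) {
                data[index] = new SimpleEntry(key, value);
                return;
            }
            index = getNextIndex(index);
        }
        throw new IllegalStateException("hash table is full");
    }

    public String get(Integer key) {
        int index = hash(key);
        for (int probes = 0; probes < CAPACITY; probes++) {
            Entry<Integer, String> entry = data[index];
            if (entry == null) {
                // An empty slot ends the probe sequence: the key is absent
                return null;
            }
            if (entry.getKey().equals(key)) {
                return entry.getValue();
            }
            index = getNextIndex(index);
        }
        return null;
    }

    private int hash(Integer key) {
        // floorMod keeps the result in [0, CAPACITY) even for negative keys
        return Math.floorMod(key, CAPACITY);
    }

    private int getNextIndex(int index) {
        // Wrap around to slot 0 after the last slot instead of running off the array
        return (index + 1) % CAPACITY;
    }

}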

SimpleMap2 simpleMap2 = new SimpleMap2();
simpleMap2.put(10,"adj");
simpleMap2.put(110,"dfh45");
simpleMap2.put(11,"453vete");
simpleMap2.put(4,"459vrj");
simpleMap2.put(347,"547843vnrmd");

System.out.println(simpleMap2.get(110));
System.out.println(simpleMap2.get(4));
System.out.println(simpleMap2.get(1));

// Output
dfh45
459vrj
null

As you can see, the probing rule and the hash function here are extremely simple, and we only considered keys that are int values. In real code, we can first convert any object to an int value and then compute the hash value from that int. In Java, the function that converts an object to an int is hashCode. Its default implementation in Object is identity-based and is often described as being derived from the object's memory address, although the exact behavior depends on the JVM.
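As a rough sketch of that two-step conversion, the helper below first calls hashCode and then reduces the result to a slot index. The names HashCodeDemo and indexFor are hypothetical, and the remainder-based reduction is the same simplification used in this article, not what HashMap actually does.

```java
class HashCodeDemo {
    // Step 1: turn the object into an int via hashCode().
    // Step 2: reduce that int to a slot index in [0, slotCount).
    static int indexFor(Object key, int slotCount) {
        return Math.floorMod(key.hashCode(), slotCount);
    }

    public static void main(String[] args) {
        // Integer.hashCode() returns the int value itself, so 347 lands in slot 47.
        System.out.println(indexFor(347, 100)); // prints 47

        // String overrides hashCode based on its characters, so equal strings
        // always map to the same slot even when they are distinct objects.
        String a = "hello";
        String b = new String("hello");
        System.out.println(indexFor(a, 100) == indexFor(b, 100)); // prints true
    }
}
```

Classes that override equals are expected to override hashCode consistently, otherwise equal keys could end up in different slots.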

Link method

The idea behind the link method is also very simple: each slot holds a linked list (or another data structure), and all items with the same hash value are stored in the same list.

Java's HashMap uses the link method to resolve conflicts, so we do not implement it here. When we introduce HashMap in the next article, we will go through the link method in detail.
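Leaving the full treatment to the HashMap article, the basic idea can still be illustrated with a very small sketch. This is not HashMap's code: the class ChainedMapDemo, its fixed capacity of 100, and the remainder-based hash are all assumptions made for the illustration.

```java
class ChainedMapDemo {
    private static final int CAPACITY = 100;

    // One node of the singly linked list stored in each slot.
    private static final class Node {
        final Integer key;
        String value;
        Node next;

        Node(Integer key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] buckets = new Node[CAPACITY];

    public void put(Integer key, String value) {
        int index = Math.floorMod(key, CAPACITY);
        // If the key is already in this slot's list, overwrite its value.
        for (Node n = buckets[index]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value;
                return;
            }
        }
        // Otherwise prepend a new node; conflicting keys share the same list.
        buckets[index] = new Node(key, value, buckets[index]);
    }

    public String get(Integer key) {
        int index = Math.floorMod(key, CAPACITY);
        for (Node n = buckets[index]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedMapDemo map = new ChainedMapDemo();
        map.put(10, "adj");
        map.put(110, "dfh45"); // same slot as key 10
        System.out.println(map.get(10));  // prints adj
        System.out.println(map.get(110)); // prints dfh45
        System.out.println(map.get(210)); // prints null
    }
}
```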

Both the link method and open addressing resolve conflicts, but the performance of either drops significantly when there are many conflicts. So when the number of stored items grows large relative to the number of slots, a new hash function with more slots is typically chosen and the existing items are redistributed, which reduces the probability of conflict. This process is called rehashing. Its concrete implementation will also be covered when we read the HashMap source code.
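As a rough preview of the idea, the sketch below doubles the number of slots once the table is more than three quarters full and moves every item to its new slot. The class ResizingMapDemo, the starting capacity of 4, and the three-quarters threshold are choices made for this illustration only, and the code is not taken from HashMap.

```java
class ResizingMapDemo {
    private static final class Node {
        final int key;
        String value;
        Node next;

        Node(int key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node[] buckets = new Node[4];
    private int size;

    int capacity() {
        return buckets.length;
    }

    void put(int key, String value) {
        int index = Math.floorMod(key, buckets.length);
        for (Node n = buckets[index]; n != null; n = n.next) {
            if (n.key == key) {
                n.value = value;
                return;
            }
        }
        buckets[index] = new Node(key, value, buckets[index]);
        size++;
        // Grow once more than 3/4 of the slots' worth of items is stored.
        if (size > buckets.length * 3 / 4) {
            rehash(buckets.length * 2);
        }
    }

    String get(int key) {
        int index = Math.floorMod(key, buckets.length);
        for (Node n = buckets[index]; n != null; n = n.next) {
            if (n.key == key) {
                return n.value;
            }
        }
        return null;
    }

    // Allocates a larger table and relinks every node into its new slot.
    private void rehash(int newCapacity) {
        Node[] old = buckets;
        buckets = new Node[newCapacity];
        for (Node head : old) {
            Node n = head;
            while (n != null) {
                Node next = n.next;
                int index = Math.floorMod(n.key, newCapacity);
                n.next = buckets[index];
                buckets[index] = n;
                n = next;
            }
        }
    }

    public static void main(String[] args) {
        ResizingMapDemo map = new ResizingMapDemo();
        for (int key = 0; key < 10; key++) {
            map.put(key, "v" + key);
        }
        System.out.println(map.capacity()); // prints 16 (grew 4 -> 8 -> 16)
        System.out.println(map.get(7));     // prints v7
    }
}
```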

Star laughter _
