Understanding the hash table and implementing one yourself

Hash table

  • Internally, a hash table is an array;
  • A hash function (hashFunc) maps a key to a subscript of that array, usually by taking a remainder (%) or by applying some other function;
  • Hash function: hash(key) = key % capacity, where capacity is the total size of the underlying array that stores the elements.
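The formula above can be sketched in a few lines of Java. This is only an illustration: the class name HashIndexDemo and the method indexFor are made up for this example, and Math.floorMod is used in place of a bare % so that negative keys also land inside the array.

```java
// Illustrative sketch (hypothetical names): hash(key) = key % capacity
class HashIndexDemo {
    // Math.floorMod keeps the result in [0, capacity) even for negative keys,
    // whereas Java's % operator can return a negative remainder.
    static int indexFor(int key, int capacity) {
        return Math.floorMod(key, capacity);
    }

    public static void main(String[] args) {
        int capacity = 10;
        for (int key : new int[] {4, 17, 44, -3}) {
            System.out.println(key + " -> " + indexFor(key, capacity));
        }
    }
}
```

With a capacity of 10, the keys 4 and 44 both map to subscript 4, which is exactly the kind of collision discussed later.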

1. Concept

  • Ideally, the element being searched for can be fetched from the table in a single step, without any comparisons. If a storage structure is built so that some function (hashFunc) establishes a one-to-one mapping between an element's key and its storage location, then a search can find the element quickly through that function.
  1. Insert an element: compute the storage location from the key of the element with this function, and store the element at that location.
  2. Search for an element: apply the same computation to the key, treat the result as the storage location, and compare the key of the element stored there. If the keys are equal, the search succeeds.
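The two steps above can be sketched as follows. This is a minimal illustration that deliberately does not handle collisions; the class SimpleSlotTable and its methods are hypothetical names, not part of the implementation later in this article.

```java
// Minimal sketch (hypothetical names): insert and search use the same function.
class SimpleSlotTable {
    private final Integer[] slots; // null means the slot is empty

    SimpleSlotTable(int capacity) {
        slots = new Integer[capacity];
    }

    private int indexFor(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Insert: compute the location from the key and store the key there.
    // Returns false if the slot already holds a different key
    // (collisions are not resolved in this sketch).
    boolean insert(int key) {
        int i = indexFor(key);
        if (slots[i] != null && slots[i] != key) {
            return false;
        }
        slots[i] = key;
        return true;
    }

    // Search: compute the same location and compare the key stored there.
    boolean contains(int key) {
        Integer stored = slots[indexFor(key)];
        return stored != null && stored == key;
    }
}
```

Because the location is computed rather than searched for, both operations touch exactly one slot.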

2. Hash collision

  • Hash collision: different keys can be mapped to the same subscript by the same hash function;
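A short sketch makes the collision visible. The keys and the capacity of 10 are assumptions chosen for illustration, and CollisionDemo is a hypothetical name.

```java
// Illustrative sketch (hypothetical names): count how many keys land on each subscript.
class CollisionDemo {
    static int[] countPerIndex(int[] keys, int capacity) {
        int[] counts = new int[capacity];
        for (int key : keys) {
            counts[Math.floorMod(key, capacity)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countPerIndex(new int[] {4, 14, 44, 7}, 10);
        // Any subscript with a count above 1 holds colliding keys.
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 1) {
                System.out.println("subscript " + i + " has " + counts[i] + " keys");
            }
        }
    }
}
```

Here 4, 14 and 44 all leave remainder 4 when divided by 10, so they compete for the same position.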

3. Collision avoidance

  • First, be clear about one thing: because the capacity of the underlying array is usually smaller than the number of distinct keys that may need to be stored, collisions are inevitable. What we can do is keep the collision rate as low as possible.

4. Collision avoidance: hash function design

  • Common hash functions
  1. Direct addressing method (commonly used): take a linear function of the key as the hash address: Hash(Key) = A*Key + B. Advantages: simple and uniform. Disadvantages: the distribution of the keys must be known in advance. Use case: lookups over a small, contiguous range of keys.
  2. Division (remainder) method (commonly used): let m be the number of addresses available in the hash table, and take as the divisor the largest prime p that is not greater than m. The hash function Hash(key) = key % p (p <= m) then converts the key into a hash address.
  3. Mid-square method (for awareness only)
  4. Folding method (for awareness only)
  5. Random number method (for awareness only)
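The division method from item 2 can be sketched as below. The helper names (DivisionHash, isPrime, largestPrimeAtMost) are hypothetical, and the trial-division primality test is simply the shortest way to find p for small tables.

```java
// Sketch of the division method (hypothetical names): Hash(key) = key % p, p prime, p <= m.
class DivisionHash {
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Largest prime p with p <= m; requires m >= 2.
    static int largestPrimeAtMost(int m) {
        if (m < 2) {
            throw new IllegalArgumentException("m must be at least 2");
        }
        int p = m;
        while (!isPrime(p)) {
            p--;
        }
        return p;
    }

    static int hash(int key, int m) {
        return Math.floorMod(key, largestPrimeAtMost(m));
    }
}
```

For a table with m = 10 addresses the divisor is p = 7, so the key 44 hashes to 44 % 7 = 2. Addresses 7 to 9 are never produced directly, which is the price paid for using a prime divisor.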

5. Collision avoidance: load factor

  • Load factor = number of elements currently in the hash table / size of the underlying array
  • The collision rate rises as the load factor rises, so lowering the load factor lowers the collision rate. Since the number of elements cannot be reduced, the load factor is lowered by enlarging the array;
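The load factor calculation and its effect on resizing can be sketched like this. The threshold of 0.75 matches the value used in the implementation in section 7; the class and method names are hypothetical.

```java
// Illustrative sketch (hypothetical names): load factor = size / capacity.
class LoadFactorDemo {
    static double loadFactor(int size, int capacity) {
        return (double) size / capacity;
    }

    static boolean needsResize(int size, int capacity, double threshold) {
        return loadFactor(size, capacity) > threshold;
    }

    public static void main(String[] args) {
        // 8 elements in 10 slots exceeds 0.75, so the array should grow.
        System.out.println(needsResize(8, 10, 0.75));
        // After doubling the array the same 8 elements give a lower load factor.
        System.out.println(loadFactor(8, 20));
    }
}
```

Doubling the array halves the load factor without removing any element, which is why resizing is the usual way to bring the collision rate back down.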

6. Collision resolution: two methods

6.1 Closed hashing

  • Concept: also called open addressing. When a collision occurs and the hash table is not full, there must be an empty position somewhere in the table, so the key can be stored in the "next" empty position after the position where the collision occurred.
  • Ways to find the next empty position:
  1. Linear probing: starting from the collision position, probe the following positions one by one until an empty position is found.
  • For example, when 44 is inserted its hash address is 4, but position 4 is already occupied. Linear probing moves on to position 5, which is also occupied, and keeps advancing one position at a time until it reaches position 8, which is empty; 44 is stored there.
  2. Quadratic probing: the drawback of linear probing is that colliding entries pile up next to one another, because the next empty position is searched for one slot at a time. Quadratic probing avoids this by computing the next candidate position as Hi = (H0 + i^2) % m, or Hi = (H0 - i^2) % m, where i = 1, 2, 3, ..., H0 is the position obtained by applying the hash function Hash(x) to the key of the element, and m is the size of the table.

  • For example, when 44 is inserted its hash address is 4, but position 4 is already occupied, so quadratic probing tries 4 + 1^2 = 5, which is also occupied, and then 4 + 2^2 = 8, where 44 is stored. Because the array is finite, a probe may run past its end, so each candidate is taken modulo the capacity (5 % capacity, 8 % capacity) to keep it inside the array.
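Both probing strategies can be sketched side by side. The table size of 10 and the occupied positions 4 to 7 are assumptions made to reproduce the 44 example; ProbingDemo and its methods are hypothetical names, and only the Hi = (H0 + i^2) % m variant of quadratic probing is shown.

```java
// Sketch of closed hashing probes (hypothetical names). A null slot is empty.
class ProbingDemo {
    // Linear probing: try h0, h0 + 1, h0 + 2, ... (mod m) until a slot is empty.
    static int linearProbe(Integer[] table, int key) {
        int m = table.length;
        int h0 = Math.floorMod(key, m);
        for (int i = 0; i < m; i++) {
            int idx = (h0 + i) % m;
            if (table[idx] == null) {
                return idx;
            }
        }
        return -1; // the table is full
    }

    // Quadratic probing: try (h0 + i^2) % m for i = 0, 1, 2, ...
    static int quadraticProbe(Integer[] table, int key) {
        int m = table.length;
        int h0 = Math.floorMod(key, m);
        for (int i = 0; i < m; i++) {
            int idx = (int) ((h0 + (long) i * i) % m);
            if (table[idx] == null) {
                return idx;
            }
        }
        return -1; // no empty slot on this probe sequence
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        table[4] = 4;
        table[5] = 5;
        table[6] = 6;
        table[7] = 7;
        System.out.println("linear    -> " + linearProbe(table, 44));    // 8
        System.out.println("quadratic -> " + quadraticProbe(table, 44)); // 8
    }
}
```

In this particular table both strategies end at position 8, but linear probing visits 4, 5, 6, 7, 8 while quadratic probing visits only 4, 5, 8. Note that quadratic probing does not necessarily visit every slot, so it can fail to find an empty position even when one exists.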

6.2 Open hashing / hash buckets (key topic to master)

  • Concept: open hashing is also called separate chaining (the chained address method). A hash function first computes a hash address for every key in the set. Keys with the same address belong to the same subset, and each subset is called a bucket. The elements in each bucket are linked into a singly linked list, and the head node of each list is stored in the hash table.
  • Each bucket therefore holds the elements that collided with one another. A new colliding node can be linked in at the tail of the list (tail insertion) or at the head (head insertion); the implementation in section 7 uses head insertion;
  • In the JDK 1.8 HashMap, a bucket's linked list is converted to a red-black tree once it grows beyond 8 nodes, provided the table has at least 64 buckets; otherwise the table is resized instead;
  1. JDK 1.7: array + linked list (head insertion);
  2. JDK 1.8: array + linked list (tail insertion) + red-black tree;
  • Resizing: every element must be traversed and rehashed, because its subscript depends on the length of the array;
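A small sketch shows why rehashing is needed after the array grows. The capacities 10 and 20 and the keys are assumptions for illustration; RehashDemo is a hypothetical name.

```java
// Illustrative sketch (hypothetical names): the same key maps to a
// different subscript once the capacity changes.
class RehashDemo {
    static int indexFor(int key, int capacity) {
        return Math.floorMod(key, capacity);
    }

    public static void main(String[] args) {
        for (int key : new int[] {4, 14}) {
            System.out.println(key + ": " + indexFor(key, 10) + " -> " + indexFor(key, 20));
        }
    }
}
```

With 10 buckets the keys 4 and 14 share bucket 4. With 20 buckets, 4 stays in bucket 4 but 14 belongs in bucket 14, so copying the old lists over unchanged would leave 14 where a lookup can no longer find it.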

7. Implementation

7.1 Hash table insert element (hash bucket)

// Put a key-value pair into the hash table
    public void put(int key, int value) {
        // floorMod keeps the subscript non-negative even for negative keys
        int index = Math.floorMod(key, array.length);
        // Walk the linked list stored in this bucket
        for (Node cur = array[index]; cur != null; cur = cur.next) {
            if (cur.data == key) {
                // The key is already present: overwrite its value
                cur.value = value;
                return;
            }
        }
        // No node with this key exists, so insert at the head of the list:
        // link the new node to the current head first, then make it the new head
        Node node = new Node(key, value);
        node.next = array[index];
        array[index] = node;
        usedSize++; // one more element stored in the table
        // Resize once the load factor exceeds 0.75
        if (loadFactor() > 0.75) {
            resize();
        }
    }

7.2 Calculate the load factor

// Compute the load factor
    public float loadFactor() {
        // Multiply by 1.0f first to force floating-point division
        return usedSize * 1.0f / array.length;
    }

7.3 Expanding the hash table

// Resize the table (every element must be rehashed)
    public void resize() {
        Node[] newArray = new Node[2 * array.length]; // double the capacity
        for (int i = 0; i < array.length; i++) {
            // array[i] is the head of a linked list; walk every node in it
            Node curNext = null; // remembers the next node so the rest of the list is not lost
            for (Node cur = array[i]; cur != null; cur = curNext) {
                curNext = cur.next;
                // Compute the subscript in the new array (non-negative even for negative keys)
                int index = Math.floorMod(cur.data, newArray.length);
                // Head insertion into the new bucket
                cur.next = newArray[index];
                newArray[index] = cur;
            }
        }
        this.array = newArray; // switch to the resized table
    }

7.4 Obtain value according to key

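// Look up the value stored for a key
    public int getvalue(int key) {
        // floorMod keeps the subscript non-negative even for negative keys
        int index = Math.floorMod(key, array.length);
        for (Node cur = array[index]; cur != null; cur = cur.next) {
            if (cur.data == key) {
                return cur.value;
            }
        }
        return -1; // key not found (indistinguishable from a stored value of -1)
    }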
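One limitation of returning -1 for a missing key is that a caller cannot tell it apart from a stored value of -1. A possible alternative, sketched here under the assumption that changing the return type is acceptable, is to return java.util.OptionalInt. The class OptionalLookupDemo and its find method are hypothetical and use a fixed capacity without resizing to keep the sketch short.

```java
import java.util.OptionalInt;

// Hypothetical variant of the bucket lookup that reports "not found" explicitly.
class OptionalLookupDemo {
    static class Node {
        final int key;
        int value;
        Node next;

        Node(int key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] buckets = new Node[10]; // fixed capacity, no resize in this sketch

    void put(int key, int value) {
        int index = Math.floorMod(key, buckets.length);
        for (Node cur = buckets[index]; cur != null; cur = cur.next) {
            if (cur.key == key) {
                cur.value = value; // key already present: overwrite
                return;
            }
        }
        buckets[index] = new Node(key, value, buckets[index]); // head insertion
    }

    OptionalInt find(int key) {
        int index = Math.floorMod(key, buckets.length);
        for (Node cur = buckets[index]; cur != null; cur = cur.next) {
            if (cur.key == key) {
                return OptionalInt.of(cur.value);
            }
        }
        return OptionalInt.empty(); // key not present
    }
}
```

A caller can then write find(key).isPresent() before reading the value, so a stored -1 and a missing key are no longer confused.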

8. Complete source code and results

// Hash table implemented with hash buckets (separate chaining)
class HashBuck {

    static class Node {
        int data; // the key, as in map.put(key, value)
        int value;
        Node next;

        public Node(int data, int value) {
            this.data = data;
            this.value = value;
        }
    }

    Node[] array; // each slot holds the head node of a bucket's linked list
    int usedSize; // number of elements stored in the table

    public HashBuck() {
        this.array = new Node[10];
        usedSize = 0;
    }

    // Put a key-value pair into the hash table
    public void put(int key, int value) {
        // floorMod keeps the subscript non-negative even for negative keys
        int index = Math.floorMod(key, array.length);
        // Walk the linked list stored in this bucket
        for (Node cur = array[index]; cur != null; cur = cur.next) {
            if (cur.data == key) {
                // The key is already present: overwrite its value
                cur.value = value;
                return;
            }
        }
        // No node with this key exists, so insert at the head of the list:
        // link the new node to the current head first, then make it the new head
        Node node = new Node(key, value);
        node.next = array[index];
        array[index] = node;
        usedSize++; // one more element stored in the table
        // Resize once the load factor exceeds 0.75
        if (loadFactor() > 0.75) {
            resize();
        }
    }

    // Compute the load factor
    public float loadFactor() {
        // Multiply by 1.0f first to force floating-point division
        return usedSize * 1.0f / array.length;
    }

    // Resize the table (every element must be rehashed)
    public void resize() {
        Node[] newArray = new Node[2 * array.length]; // double the capacity
        for (int i = 0; i < array.length; i++) {
            // array[i] is the head of a linked list; walk every node in it
            Node curNext = null; // remembers the next node so the rest of the list is not lost
            for (Node cur = array[i]; cur != null; cur = curNext) {
                curNext = cur.next;
                // Compute the subscript in the new array
                int index = Math.floorMod(cur.data, newArray.length);
                // Head insertion into the new bucket
                cur.next = newArray[index];
                newArray[index] = cur;
            }
        }
        this.array = newArray; // switch to the resized table
    }

    // Look up the value stored for a key
    public int getvalue(int key) {
        int index = Math.floorMod(key, array.length);
        for (Node cur = array[index]; cur != null; cur = cur.next) {
            if (cur.data == key) {
                return cur.value;
            }
        }
        return -1; // key not found (indistinguishable from a stored value of -1)
    }
}


public class TestDemo {
    public static void main(String[] args) {
        HashBuck hashBuck = new HashBuck();
        hashBuck.put(1, 1);
        hashBuck.put(2, 2);
        hashBuck.put(3, 3);
        hashBuck.put(4, 4);
        hashBuck.put(5, 5);
        hashBuck.put(6, 6);
        hashBuck.put(7, 7);
        System.out.println("abcd");
    }
}

  • Note: you can verify the result yourself in a debugger. Set a breakpoint on the output statement and inspect hashBuck to see the inserted values.

Origin blog.csdn.net/qq_45665172/article/details/109813531