An Illustrated, In-Depth Study of Hash Tables (Part 1)

Contents

1. Introduction to Hash Tables
  1. Hash collision
  2. Hash collision resolution in JDK 1.8

2. Hash function
  1. The implementation steps of the hash function in the hash table
  2. How to generate the hash value of the key?

3. Summary


1. Introduction to Hash Tables

A hash table is also known as a hash map.

In JDK 1.8, Java's HashMap is implemented on top of an array plus singly linked lists plus red-black trees.

 

Adding, searching, and deleting all follow the same process:

(1) Use the hash function to compute the array index corresponding to the key, O(1)

(2) Locate the array element at that index and perform the operation, O(1)

A hash table trades space for time.
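As a quick illustration of the three operations, the snippet below uses java.util.HashMap; each call internally hashes the key to find an array index and then works on that slot. The class name BasicOps is only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class BasicOps {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("apple", 3);                          // add: hash the key, locate the slot, store
        System.out.println(map.get("apple"));         // search: same two steps, prints 3
        map.remove("apple");                          // delete: same two steps, then unlink
        System.out.println(map.containsKey("apple")); // prints false
    }
}
```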

1. Hash Collision

A hash collision occurs when two different keys produce the same result from the hash function.

Solutions:

(1) Open addressing

On a collision, probe other slots according to a fixed rule until an empty slot is found

  • Linear probing

  • Quadratic probing
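A minimal sketch of open addressing, assuming integer keys, a fixed capacity, and key mod capacity as the home slot; the class and method names are hypothetical. The `quadratic` flag switches the probe offset from i to i * i.

```java
public class ProbingTable {
    private final Integer[] slots;
    private final boolean quadratic;

    public ProbingTable(int capacity, boolean quadratic) {
        this.slots = new Integer[capacity];
        this.quadratic = quadratic;
    }

    // Inserts key and returns the slot it ended up in, or -1 if no free slot was found.
    public int insert(int key) {
        int m = slots.length;
        int home = Math.floorMod(key, m);
        for (int i = 0; i < m; i++) {
            // Linear probing:    home, home+1, home+2, home+3, ...
            // Quadratic probing: home, home+1, home+4, home+9, ...
            long offset = quadratic ? (long) i * i : i;
            int idx = (int) ((home + offset) % m);
            if (slots[idx] == null || slots[idx] == key) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }
}
```

With capacity 7, the keys 3, 10 and 17 all have home slot 3. Linear probing places them in slots 3, 4, 5; quadratic probing places them in 3, 4, 0. Quadratic probing does not necessarily visit every slot, so it can return -1 even when the table still has free space.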

(2) Rehashing

Design multiple hash functions; if one produces a collision, try the next
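A sketch of the multiple-hash-function idea: try each hash function in turn and use the first free slot. The class name and the two example functions (k and k / 7) are arbitrary choices for illustration.

```java
import java.util.function.IntUnaryOperator;

public class MultiHashTable {
    private final Integer[] slots;
    private final IntUnaryOperator[] hashFunctions;

    public MultiHashTable(int capacity, IntUnaryOperator... hashFunctions) {
        this.slots = new Integer[capacity];
        this.hashFunctions = hashFunctions;
    }

    // Tries H1, H2, ... in order; returns the slot used, or -1 if every function collided.
    public int insert(int key) {
        for (IntUnaryOperator h : hashFunctions) {
            int idx = Math.floorMod(h.applyAsInt(key), slots.length);
            if (slots[idx] == null || slots[idx] == key) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }
}
```

For example, `new MultiHashTable(7, k -> k, k -> k / 7)` stores 3 in slot 3; 10 collides under H1 (slot 3) and goes to slot 1 under H2; 17 collides under H1 and goes to slot 2. If all functions collide, a real table would need another fallback.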

(3) Separate chaining

Elements that map to the same index are linked together in a linked list
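A small sketch of chaining with integer keys and key mod 7 as the index; java.util.LinkedList (which happens to be doubly linked) stands in for the chain, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ChainingDemo {
    // Puts each key into the linked list of bucket (key mod buckets).
    static List<LinkedList<Integer>> chain(int[] keys, int buckets) {
        List<LinkedList<Integer>> table = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            table.add(new LinkedList<>());
        }
        for (int key : keys) {
            table.get(Math.floorMod(key, buckets)).add(key); // colliding keys share one list
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(chain(new int[] {3, 10, 17, 5}, 7));
        // prints [[], [], [], [3, 10, 17], [], [5], []]
    }
}
```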

Strictly speaking, (4) to (6) below are ways to construct the hash function itself; a well-chosen one reduces collisions rather than resolving them.

(4) Folding method: split the key into several parts with the same number of digits (the last part may have fewer digits), then add the parts together, discarding any carry out of the top digit, and use the sum as the hash address. There are two ways to add the parts: shift folding and boundary folding. Shift folding aligns the lowest digits of all parts and adds them; boundary folding folds the parts back and forth along the split boundaries from one end to the other, so that every second part is reversed, and then aligns and adds them.
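A sketch of both folding variants for decimal keys given as digit strings; the part width, the left-to-right split, and the names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class FoldingHash {
    // Splits a digit string into parts of `width` digits from left to right;
    // the last part may be shorter.
    static List<String> split(String digits, int width) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < digits.length(); i += width) {
            parts.add(digits.substring(i, Math.min(i + width, digits.length())));
        }
        return parts;
    }

    // boundary == false: shift folding (add the parts with their lowest digits aligned).
    // boundary == true:  boundary folding (reverse every second part, then add).
    static int fold(String digits, int width, boolean boundary) {
        int modulus = 1;
        for (int i = 0; i < width; i++) {
            modulus *= 10;
        }
        List<String> parts = split(digits, width);
        int sum = 0;
        for (int i = 0; i < parts.size(); i++) {
            String part = parts.get(i);
            if (boundary && i % 2 == 1) {
                part = new StringBuilder(part).reverse().toString();
            }
            sum += Integer.parseInt(part);
        }
        return sum % modulus; // discard the carry beyond `width` digits
    }
}
```

For the key 987654321 split into 987 | 654 | 321, shift folding gives 987 + 654 + 321 = 1962, so the hash address is 962; boundary folding reverses the middle part, giving 987 + 456 + 321 = 1764, so the address is 764.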

(5) Random number method: choose a random function and use the random value of the key as its hash address, that is, H(key) = random(key), where random is a random function. This method is usually used when keys differ in length.
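One way to sketch H(key) = random(key) in Java is to seed java.util.Random with the key: the same seed always produces the same sequence, so the same key always gets the same address (a truly random value would make lookups impossible). Restricting keys to long values and the class name are assumptions for this illustration.

```java
import java.util.Random;

public class RandomHash {
    // H(key) = random(key): a pseudo-random generator seeded with the key,
    // reduced to an address in [0, m).
    static int hash(long key, int m) {
        return new Random(key).nextInt(m);
    }
}
```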

(6) Division (remainder) method: use the remainder of the key divided by a number p that is no greater than the table length m as the hash address, that is, H(key) = key MOD p with p <= m. The key can be reduced directly, or after folding, squaring, or similar operations. The choice of p matters: p is usually a prime (or m itself). A poorly chosen p easily produces synonyms, that is, different keys with the same hash address.
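To see why the choice of p matters, this sketch hashes the made-up keys 0, 4, 8, ..., 60 into a table of size m = 16, once with p = 16 and once with the prime p = 13. Every key shares the factor 4 with 16, so p = 16 leaves most addresses unused and produces many synonyms.

```java
import java.util.HashSet;
import java.util.Set;

public class DivisionHash {
    // H(key) = key MOD p
    static int hash(int key, int p) {
        return Math.floorMod(key, p);
    }

    // How many distinct hash addresses the keys occupy for a given p.
    static int distinctAddresses(int[] keys, int p) {
        Set<Integer> used = new HashSet<>();
        for (int key : keys) {
            used.add(hash(key, p));
        }
        return used.size();
    }

    public static void main(String[] args) {
        int[] keys = new int[16];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = 4 * i; // 0, 4, 8, ..., 60
        }
        System.out.println(distinctAddresses(keys, 16)); // prints 4
        System.out.println(distinctAddresses(keys, 13)); // prints 13
    }
}
```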

2. Hash collision resolution in JDK 1.8

  • By default, colliding elements are strung together in a singly linked list

  • When an element is added, the singly linked list may be converted into a red-black tree

    • Specifically, when the table capacity is at least 64 and the list has more than 8 nodes; if the capacity is still below 64, HashMap resizes the table instead

  • When a red-black tree shrinks to a small enough size, it is converted back into a singly linked list (for example, when a resize splits a tree bucket into parts of 6 or fewer nodes)

  • In short, the hash table in JDK 1.8 resolves hash collisions with linked lists plus red-black trees
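The thresholds can be summarized in a small model. The constant names and values below match those declared in java.util.HashMap (TREEIFY_THRESHOLD = 8, UNTREEIFY_THRESHOLD = 6, MIN_TREEIFY_CAPACITY = 64), but the methods are a simplified sketch of the decision, not HashMap's actual code.

```java
public class TreeifyRule {
    // Same names and values as the constants in java.util.HashMap (JDK 8)
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE, TREEIFY }

    // Simplified model: what happens right after appending a node
    // leaves a bucket's linked list with listSize nodes.
    static Action afterAppend(int tableLength, int listSize) {
        if (listSize <= TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;
        }
        // Long list but small table: resize instead of converting to a tree
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    // When a resize splits a tree bucket, a part with this many nodes or fewer
    // is turned back into a linked list.
    static boolean untreeifyOnSplit(int partSize) {
        return partSize <= UNTREEIFY_THRESHOLD;
    }
}
```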

 

Why use a singly linked list?

(1) After the hash function computes the index for a key, the table traverses all keys stored at that index. If an equal key is found, its value is overwritten; otherwise the new element is inserted at the tail of the list. Because every insertion walks the list from the head anyway, only forward traversal is needed, and a singly linked list is enough.

(2) A singly linked list node has one fewer pointer than a doubly linked list node, which saves memory.
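A minimal separate-chaining put/get that mirrors the steps above: walk the singly linked list from the head, overwrite on an equal key, otherwise append at the tail. The class SimpleChainedMap, its fixed capacity of 16, and the lack of resizing are simplifications for illustration; java.util.HashMap additionally spreads the hash bits and resizes.

```java
import java.util.Objects;

public class SimpleChainedMap<K, V> {
    // Singly linked node: a `next` pointer only, no `prev`
    static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // length is a power of two
    private int size;

    private int indexFor(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return h & (table.length - 1);
    }

    // Returns the previous value for key, or null if the key was new.
    public V put(K key, V value) {
        int i = indexFor(key);
        if (table[i] == null) {            // empty bucket: store directly
            table[i] = new Node<>(key, value);
            size++;
            return null;
        }
        Node<K, V> cur = table[i];
        while (true) {                     // walk the list from the head
            if (Objects.equals(cur.key, key)) {
                V old = cur.value;         // equal key: overwrite the value
                cur.value = value;
                return old;
            }
            if (cur.next == null) {
                break;
            }
            cur = cur.next;
        }
        cur.next = new Node<>(key, value); // no equal key: append at the tail
        size++;
        return null;
    }

    public V get(Object key) {
        for (Node<K, V> cur = table[indexFor(key)]; cur != null; cur = cur.next) {
            if (Objects.equals(cur.key, key)) {
                return cur.value;
            }
        }
        return null;
    }

    public int size() {
        return size;
    }
}
```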

2. Hash function

1. The implementation steps of the hash function in the hash table:

(1) First, generate the hash value of the key (it must be an integer)

(2) Then combine the hash value with the array size to produce an index

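 public int hash(Object key){
     int h = (key == null) ? 0 : key.hashCode();
     // floorMod keeps the index in [0, table.length - 1] even for a negative h;
     // a plain h % table.length would return a negative index in that case
     return Math.floorMod(h, table.length);
 }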

To improve efficiency, the % operation can be replaced by the bitwise & operation, provided the array length is designed to be a power of two (2^n):

 public int hash(Object key){
     int h = (key == null) ? 0 : key.hashCode();
     // Only valid when table.length is a power of two (2^n)
     return h & (table.length - 1);
 }

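When the length is a power of two, table.length - 1 is all ones in binary (for example, 16 - 1 = 15 = 0b1111), so the & operation keeps only the low bits of the hash value. The result is therefore always between 0 and table.length - 1, the range of valid array indexes, and the index can never go out of bounds.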
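A small check of these claims, assuming a table length of 16: for non-negative hash values, h % 16 and h & 15 agree, while for a negative hash value % returns a negative number that cannot be used as an index and & still yields a value in [0, 15]. The class and method names are hypothetical.

```java
public class IndexDemo {
    static int indexByMod(int hash, int length) {
        return hash % length;
    }

    // Matches hash % length for non-negative hashes when length is a power of two;
    // always lands in [0, length - 1]
    static int indexByAnd(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexByMod(37, 16));  // prints 5
        System.out.println(indexByAnd(37, 16));  // prints 5
        System.out.println(indexByMod(-37, 16)); // prints -5, not a valid index
        System.out.println(indexByAnd(-37, 16)); // prints 11
    }
}
```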

A good hash function

distributes hash values more evenly -> fewer hash collisions -> better hash table performance
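To make "more evenly distributed" concrete, this sketch counts how many of 16 buckets the keys "k0" to "k99" occupy under a deliberately poor hash (the string length) and under String.hashCode(). Both the key set and the poor hash are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.ToIntFunction;

public class DistributionDemo {
    // Counts how many distinct buckets (out of `length`, a power of two)
    // the keys "k0" .. "k99" occupy under the given hash function.
    static int bucketsUsed(ToIntFunction<String> hash, int length) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < 100; i++) {
            used.add(hash.applyAsInt("k" + i) & (length - 1));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // Poor hash: only the length is used, so the keys fall into just two groups
        System.out.println(bucketsUsed(String::length, 16)); // prints 2
        // String.hashCode() uses every character
        System.out.println(bucketsUsed(String::hashCode, 16));
    }
}
```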

2. How to generate the hash value of the key?

Common key types include:

integers, floating-point numbers, strings, and custom objects

Different key types generate hash values in different ways, but the goals are the same:

  • Make each key's hash value as unique as possible

  • Let all of the key's information take part in the computation, so that different keys are more likely to get different hash values

In Java, a hash value is always an int (hashCode() returns int)

(1) Integers: the integer itself is used as the hash value

(2) Floating-point numbers

Reinterpret the stored binary representation as an integer value

 public static int hashCode(float value){
     // Reinterpret the 32 IEEE 754 bits of the float as an int
     return Float.floatToIntBits(value);
 }

(3) Long type

 public static int hashCode(long value){
     return (int) (value ^ (value >>> 32));
 }

(4) Double type

 public static int hashCode(double value){
     // Reinterpret the 64 IEEE 754 bits of the double as a long
     long bits = Double.doubleToLongBits(value);
     return (int) (bits ^ (bits >>> 32));
 }
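The formulas above can be checked against the JDK's own static helpers Integer.hashCode, Float.hashCode, Long.hashCode and Double.hashCode (all available since Java 8); the class and method names here are hypothetical.

```java
public class PrimitiveHashes {
    static int hashFloat(float value) {
        return Float.floatToIntBits(value);
    }

    static int hashLong(long value) {
        return (int) (value ^ (value >>> 32));
    }

    static int hashDouble(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        System.out.println(hashFloat(1.5f) == Float.hashCode(1.5f));                 // prints true
        System.out.println(hashLong(123456789012L) == Long.hashCode(123456789012L)); // prints true
        System.out.println(hashDouble(3.14) == Double.hashCode(3.14));               // prints true
        System.out.println(Integer.hashCode(42));                                    // prints 42
    }
}
```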

The role of >>> and ^

value >>> 32 shifts the high 32 bits down into the low 32 bits, and ^ mixes them with the original low 32 bits, so the final 32-bit hash value depends on all 64 bits

In other words, all of the key's information is used to compute the hash value
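A quick demonstration of why the mixing matters: two long values that differ only in their high 32 bits collide under plain truncation, but not after XOR-ing the high half into the low half. The values and names are chosen for illustration.

```java
public class HighBitsDemo {
    // Truncation only: the high 32 bits are simply thrown away
    static int truncate(long value) {
        return (int) value;
    }

    // Mixing: the high 32 bits are XORed into the low 32 bits before truncation
    static int mix(long value) {
        return (int) (value ^ (value >>> 32));
    }

    public static void main(String[] args) {
        long a = 1L;
        long b = (1L << 32) | 1L; // differs from a only in the high 32 bits
        System.out.println(truncate(a) + " " + truncate(b)); // prints 1 1
        System.out.println(mix(a) + " " + mix(b));           // prints 1 0
    }
}
```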

(5) Hash value of string

A string consists of several characters.

For example, the string "hash" consists of the four characters h, a, s, and h (each character is essentially an integer).

Therefore, with a multiplier n, the hash value of "hash" can be written as h * n^3 + a * n^2 + s * n + h. Computed this way, work is repeated (n * n is computed for n^2 and again inside n * n * n), so it is rewritten in the equivalent nested form ((h * n + a) * n + s) * n + h.

In the JDK, the multiplier n is 31.

31 is an odd prime, and the JVM can optimize 31 * i into (i << 5) - i, since 31 * i = 32 * i - i.
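The nested form is Horner's rule, and it reproduces String.hashCode() exactly, including int overflow. This sketch also shows the shift-and-subtract form of the multiplication by 31; the class and method names are hypothetical.

```java
public class StringHashDemo {
    // Horner's rule: ((c0 * 31 + c1) * 31 + c2) * 31 + c3 ...
    static int hornerHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Same computation with 31 * h written as (h << 5) - h
    static int hornerHashShift(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = ((h << 5) - h) + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hornerHash("hash") == "hash".hashCode());      // prints true
        System.out.println(hornerHashShift("hash") == hornerHash("hash")); // prints true
    }
}
```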

(6) Custom objects

Create a new Person class

 import java.util.HashMap;
 import java.util.Map;

 class Person {
     // name
     private String name;
     // height
     private float height;
     // age
     private int age;

     public Person() {
     }

     public Person(String name, float height, int age) {
         this.name = name;
         this.height = height;
         this.age = age;
     }
 }

 public class Demo01 {
     public static void main(String[] args) {
         Person p1 = new Person("张三", 1.7f, 20);
         Person p2 = new Person("张三", 1.7f, 20);

         // Object's default hashCode() is identity-based, so these usually differ
         System.out.println(p1.hashCode());
         System.out.println(p2.hashCode());
         Map<Object, Object> map = new HashMap<>();
         map.put(p1, 1);
         map.put(p2, 2);
         System.out.println(map.size());
     }
 }
 /**
  * Output (the two hash codes vary from run to run):
  * 2129789493
  * 668386784
  * 2
  */

p1 and p2 have exactly the same field values, and we want objects with the same field values to count as the same key. However, calling hashCode() directly returns different values for them (Object's default hashCode() is identity-based), and the map ends up holding two key-value pairs.

How can this be resolved?

The usual fix is to override hashCode() in the class:

 @Override
 public int hashCode() {
     int hashCode = Integer.hashCode(age);
     hashCode = hashCode * 31 + Float.hashCode(height);
     hashCode = hashCode * 31 + (name != null ? name.hashCode() : 0);
     return hashCode;
 }

With this override, p1 and p2 get the same hash value. In practice you rarely write this by hand: an IDE can generate the hashCode() override for you, typically as a call to Objects.hash:

 @Override
 public int hashCode() {
     return Objects.hash(name, height, age);
 }

Following Objects.hash() into its implementation, which delegates to Arrays.hashCode(Object[]), shows that the JDK source uses the same approach:

 // The arguments passed to Objects.hash(...) arrive here as the Object array a
 public static int hashCode(Object a[]) {
     if (a == null)
         return 0;

     int result = 1;
     // Loop over the elements, folding each element's hashCode into the hash of the whole array
     for (Object element : a)
         result = 31 * result + (element == null ? 0 : element.hashCode());

     return result;
 }
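A quick check that the loop above is exactly what Objects.hash computes. The helper manualHash is hypothetical, and the ASCII name "Zhang San" stands in for the Person fields.

```java
import java.util.Arrays;
import java.util.Objects;

public class ObjectsHashDemo {
    // Hypothetical helper: the same loop as Arrays.hashCode(Object[])
    static int manualHash(Object... values) {
        int result = 1;
        for (Object element : values) {
            result = 31 * result + (element == null ? 0 : element.hashCode());
        }
        return result;
    }

    public static void main(String[] args) {
        String name = "Zhang San";
        float height = 1.7f;
        int age = 20;
        // Objects.hash(...) simply returns Arrays.hashCode(values)
        System.out.println(manualHash(name, height, age) == Objects.hash(name, height, age)); // prints true
        System.out.println(Objects.hash(name, height, age)
                == Arrays.hashCode(new Object[] {name, height, age}));                         // prints true
    }
}
```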

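In Java, every class inherits hashCode() and equals() from Object, but a custom class used as a HashMap key must override both so that objects with the same field values are treated as the same key. HashMap also allows null as a key.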

Why override equals() as well?

After overriding hashCode(), Person objects with identical field values have identical hash values. But if we now add p1 and p2 to the map, will p2 overwrite p1? In other words, will the map deduplicate them?

 System.out.println(p1.hashCode());
 System.out.println(p2.hashCode());
 Map<Object, Object> map = new HashMap<>();
 map.put(p1, 1);
 map.put(p2, 2);
 System.out.println(map.size());
 // Output: -407057726
 //         -407057726
 //         2

The map size is 2: both p1 and p2 were added, so no deduplication happened.

Why is this?

A common misconception when first learning hash tables is that equal hash values mean equal elements. That is wrong: as discussed under hash collisions, different keys can produce the same hash value. The guarantee only runs one way: equal keys must have equal hash values. Overriding hashCode() only ensures that equal keys get the same hash value, and therefore the same array index. When a hash collision occurs and the bucket is a singly linked list, the table walks the list from head to tail and compares the keys themselves to decide whether the key is already present. In short, comparing hash values alone is not a reliable way to decide whether two keys are the same.

How to compare keys?

The obvious options are == and equals().

For objects, == compares memory addresses (references). Two objects created separately with new never have the same address, even when all their fields are identical, so == is the wrong tool here. We compare keys with equals() instead, overriding it to compare the fields.

 // Override equals in the Person class

 @Override
 public boolean equals(Object obj) {
     // Same reference: the object is being compared with itself
     if (this == obj) return true;
     // obj is null or not a Person: the two cannot be equal
     if (obj == null || getClass() != obj.getClass()) return false;
     // obj is a Person instance, so the cast is safe
     Person person = (Person) obj;
     // Compare height
     if (Float.compare(person.height, height) != 0) return false;
     // Compare age
     if (age != person.age) return false;
     // Compare name (null-safe)
     return name != null ? name.equals(person.name) : person.name == null;
 }

Now, when we add p1 and p2 to the map, the second put overwrites the value of the first and the map size is 1: the duplicate key is removed.
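Putting both overrides together, here is a self-contained version of the example. It uses the Objects.hash form of hashCode() and Objects.equals for the null-safe name comparison; the wrapper class PersonKeyDemo and the ASCII name "Zhang San" are only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class PersonKeyDemo {
    static final class Person {
        private final String name;
        private final float height;
        private final int age;

        Person(String name, float height, int age) {
            this.name = name;
            this.height = height;
            this.age = age;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, height, age);
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            Person other = (Person) obj;
            return Float.compare(other.height, height) == 0
                    && age == other.age
                    && Objects.equals(name, other.name);
        }
    }

    public static void main(String[] args) {
        Person p1 = new Person("Zhang San", 1.7f, 20);
        Person p2 = new Person("Zhang San", 1.7f, 20);
        Map<Object, Object> map = new HashMap<>();
        map.put(p1, 1);
        map.put(p2, 2);                  // equal key: the value is overwritten
        System.out.println(map.size());  // prints 1
        System.out.println(map.get(p1)); // prints 2
    }
}
```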

3. Summary:

hashCode() computes the hash value. It must give equal keys the same hash value, which is then used to find the array index.

equals() decides whether two keys are actually equal when a hash collision puts them in the same bucket.

Note: when a hash value is turned into an array index, the same hash value always yields the same index, but different hash values can also yield the same index, because the hash value is ANDed with (array length - 1) and only its low bits survive.
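A tiny example of the note, assuming a table length of 16: the different hash values 1, 17 and 33 share the same low 4 bits and therefore the same index. The class name is hypothetical.

```java
public class SameIndexDemo {
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(1, 16));  // prints 1
        System.out.println(indexFor(17, 16)); // prints 1: a different hash value, same index
        System.out.println(indexFor(33, 16)); // prints 1
    }
}
```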

After overriding hashCode() and equals(), suppose the map receives three key-value pairs keyed by p1, "key1", and p2, where p2 has the same hash value as p1. The process is as follows:

(1) Add (p1, 123) to the map. The hash function computes p1's hash value and ANDs it with (array length - 1). Assume the resulting index is 1; that slot is empty, so the entry is stored there.

 

(2) Add ("key1", 456). The index computed for "key1" is also 1. That slot already holds data, so the map walks the list from head to tail comparing keys; "key1" is not equal to p1, so ("key1", 456) is added as a new node.

(3) Add (p2, 789). The index for p2 is also 1. The map walks the list from the head, finds that p2 equals p1, and overwrites the value stored for that key with 789.

That is how a hash table, once hashCode() and equals() are overridden, handles both hash collisions and repeated keys.
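The same three steps can be reproduced with java.util.HashMap and two real colliding keys: the strings "Aa" and "BB" have the same String.hashCode() value (2112), so they always land in the same bucket. Here "Aa" plays the role of p1, "BB" the role of "key1", and a separately created new String("Aa") the role of p2; the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionWalkthrough {
    static Map<String, Integer> run() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 123);             // step (1): the bucket is empty, store directly
        map.put("BB", 456);             // step (2): same bucket, different key, new node added
        map.put(new String("Aa"), 789); // step (3): same bucket, equal key, value overwritten
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // prints true
        Map<String, Integer> map = run();
        System.out.println(map.size());    // prints 2
        System.out.println(map.get("Aa")); // prints 789
        System.out.println(map.get("BB")); // prints 456
    }
}
```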


Source: blog.csdn.net/qq_52595134/article/details/123242424