Analysis of java8 hash algorithm

what is hash algorithm

When many javaers use HashMap, they know that this data structure is very easy to use, the access speed is fast, and any type of key-value pair can be plugged into it, which is very convenient. But the implementation mechanism behind the scenes may not be understood.

The underlying data structure of HashMap is an array, and a linked list is stored in the array. To ensure fast insertion of key-value pairs and fast retrieval through keys, it is necessary to convert keys into array indexes, that is to say, the ability to convert any key into Integer type data is required. And this conversion algorithm is the hash algorithm .

hash algorithm in jdk

In order to convert any key to an integer, jdk provides a hashCode method for each object, which provides an algorithm for converting keys to integers.

Let's go to the source code and see the hash algorithm implementation of several key classes in jdk.

Integer

private final int value;

public int hashCode() {
    return value;
}

Very simple, Integer's hash algorithm is to get its value directly. The effect is good, after all, the int integer range is very large, and the dispersion is wide and the conflict is small.

Double

private final double value;

public int hashCode() {
	long bits = doubleToLongBits(value);
	return (int)(bits ^ (bits >>> 32));
}

Since double cannot be used as an index, it needs to be converted to an integer

  • Since the bottom layer of the double data type is represented by a 64-bit bit code, it is encoded by the IEEE floating point standard. If you use the 8-byte integer encoding method, you can get a number of type long
  • The long type is too large as an index range and needs to be converted to an int type. Simply getting the low 32 bits here can easily lead to uneven hashing, because the high bits are not used. Therefore, a logical right shift is used here by 32 bits, and the high 32 bits and the low 32 bits are XORed, so that the high bits and low bits can be used.
  • The final number is cast to int, leaving only the low 32 bits that have been fully scrambled

The resulting int type integer is sufficiently scrambled, and the hash is uniform.

The XOR operation is used instead of other AND, OR, etc., mainly because XOR has a better scramble effect

Boolean

private final boolean value;

public int hashCode() {
	return value ? 1231 : 1237;
}

Simple and rude, directly use two prime numbers as the index of true or false. These two prime numbers are large enough to be used as indices with a low probability of collision

String

private final char value[];

public int hashCode() {
	int hash = 0;
	for (int i = 0; i < value.length; i++) {
		hash = hash * 31 + value[i];
	}
	return hash;
}

Set the string bit s, and use s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]the scramble algorithm implemented by the formula. This algorithm ensures that each character of the string can be fully utilized, and uses the prime number 31 as the base, weights bit by bit, disrupts the weight and arrangement position of the entire character, and makes the hashing effect better

If you can't understand the relationship between the above formula and the implementation algorithm, you can decompose this formula:

f(n)=s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

f(n)=f(n-1)*31+s[n-1]

Using the latter recursion formula, the recursive algorithm can be easily written:

public int hashRecur(String s) {
	int len = s.length();
	if (len == 0) {
		return 0;
	}
	return 31 * hashRecur(s.substring(0, len - 1)) + s.charAt(len - 1);
}

According to this recursive algorithm, it is not difficult to convert it into an iterative algorithm:

public int hashLoop(String s) {
	int hash = 0;
	for (int i = 0; i < s.length(); i++) {
		hash = hash * 31 + s.charAt(i);
	}
	return hash;
}

This algorithm is basically the same as the jdk algorithm (the jdk algorithm has been optimized a little)

custom object

If the hashCode method is not implemented, the system default will be used: use the physical address of the object, convert it to an int integer, and use it as an index. This method is not flexible, and sometimes it is necessary to select some attributes to identify the object according to the situation

public class Person {

	String name;
	
	int age;
	
	String id;
	
	boolean male;

	@Override
	public int hashCode() {
		final int prime = 31;
		int result = 1;
		result = prime * result + age;
		result = prime * result + ((id == null) ? 0 : id.hashCode());
		result = prime * result + (male ? 1231 : 1237);
		result = prime * result + ((name == null) ? 0 : name.hashCode());
		return result;
	}
}

This is the solution for eclipse auto-completion, and it is also recommended by the system. The principle is consistent with the hashCode of String. The hashCode is obtained for each element, and the bitwise weighted sum is used to ensure uniform hashing.

into an array of the specified size

According to the above method, the index M of each object is obtained, and it seems that the array can be inserted directly. However, there is a problem. These integers are 32-bit signed types, which may be very large and may be negative. The array is often set to a specified size N, and it is necessary to ensure that M can be scattered into 0 ~ N-1.

In order to be able to intercept a part of M, the simplest is to use M mod N, N is 2 to the k power. To ensure that negative numbers are also applicable, useM & N-1

There is still a problem in this way. Its principle is to intercept the range within N as the final result, resulting in the high bits of the 32-bit integer M not being used.

public int hash(Object key) {
    int M;
    int hash = (key == null) ? 0 : (M = key.hashCode()) ^ (M >>> 16);
    return hash & (N-1)
}

So, I saw the second hash algorithm of HashMap in java8. The idea is: logically shift the 32-bit integer M to the right by 16 bits, let the upper 16 bits and the lower 16 bits perform the OR operation, fully disrupt them, and finally truncate the integer within N. The index obtained in this way is evenly scattered.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=325442278&siteId=291194637
Recommended