Detailed explanation of equals and hashCode

A few days ago, while reviewing Java fundamentals, I came across the two methods equals and hashCode and realized I was quite unsure about how to write hashCode and about several related questions. After some careful analysis and hands-on coding, I finally worked it out. If you spot any mistakes or omissions, please point them out.


1. First, let's look at the default equals and hashCode methods inherited from Object.

1.1 equals

The equals method in Object behaves exactly like "==": it compares references, that is, whether both variables point to the same object.

public boolean equals(Object obj) {
    return (this == obj);
}

1.2 hashCode

The default hashCode method is declared in Object like this:

public native int hashCode();

This is a native method: it is implemented inside the JVM rather than in Java, so the declaration has no body. Older JDK documentation notes that it is typically "implemented by converting the internal address of the object into an integer", but this technique is not required, and the actual value depends on the JVM. What matters is that, by default, each object gets a hash code tied to its identity rather than to its contents.
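A minimal sketch of the default behaviour, assuming a hypothetical class `Plain` that overrides neither method. `System.identityHashCode` is the standard call that returns the value `Object.hashCode` would produce for an object; whether that number has anything to do with an address is up to the JVM.

```java
public class DefaultHashDemo {
    // Hypothetical class that overrides neither equals nor hashCode.
    static class Plain { }

    // True when the inherited hashCode is the identity hash code.
    static boolean usesIdentityHash(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        Plain a = new Plain();
        Plain b = new Plain();
        System.out.println(usesIdentityHash(a)); // true
        System.out.println(a.equals(a));         // true: same reference
        System.out.println(a.equals(b));         // false: two different objects
    }
}
```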

1.3 Summary

So, without overriding equals and hashCode:

(1) If two objects are equal according to equals, their hashCode values must be equal. The default equals uses "==", so it returns true only when both references point to the same object, and calling hashCode on one object always gives the same value.

(2) If two objects have equal hashCode values, they are not necessarily equal according to equals. Why? Consider how a hash table works. It combines direct addressing with chaining: it first computes the hash value of the item to insert and then places the item in the bucket that corresponds to that value. A hash function returns an int, so there are at most 2^32 distinct hash values; with enough objects, two different objects will eventually produce the same hash value. This is a hash collision. Chaining handles it by turning the bucket into a linked list, so one list can hold several objects with the same hashCode. They are still different objects, so equals returns false for any pair of them. You can think of hashCode as a person's name and equals as an ID number: many people share a name, but they are not the same person.
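Identity hash collisions cannot be produced on demand, so the sketch below illustrates point (2) with String instead, which overrides both methods. The strings "Aa" and "BB" are a well-known pair with the same hash code; the class name `CollisionDemo` is only for illustration.

```java
public class CollisionDemo {
    // True when a and b collide: same hash code, but not equal.
    static boolean collides(Object a, Object b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());      // 2112
        System.out.println("BB".hashCode());      // 2112
        System.out.println("Aa".equals("BB"));    // false
        System.out.println(collides("Aa", "BB")); // true
    }
}
```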

2. Overriding equals and hashCode

2.1 Overriding neither

import java.util.*;

public class Test {
    public static void main(String[] args) {
        Person p1 = new Person();
        p1.name = "张三";

        Person p2 = new Person();
        p2.name = "李四";

        Person p3 = new Person();
        p3.name = "张三";

        Set<Person> set = new HashSet<>();
        set.add(p1);
        set.add(p2);
        set.add(p3);

        // Print every element that actually ended up in the set
        for (Person p : set) {
            System.out.println("name=" + p.name);
        }

        System.out.println("p1.hashCode=" + p1.hashCode());
        System.out.println("p2.hashCode=" + p2.hashCode());
        System.out.println("p3.hashCode=" + p3.hashCode());
        System.out.println();

        System.out.println("p1 equals p2," + p1.equals(p2));
        System.out.println("p1 equals p3," + p1.equals(p3));
    }
}

class Person {
    String name;
}

output:

[Figure: console output]

equals differs, hashCode differs

As the output shows, the duplicate is inserted when nothing is overridden. By default, both equals and hashCode are based on object identity: p1 and p3 are different objects, so they get different hash values (ignoring collisions) and are never considered equal, and the set keeps both.

2.2 Overriding only equals

import java.util.*;

public class Test {
    public static void main(String[] args) {
        Person p1 = new Person();
        p1.name = "张三";

        Person p2 = new Person();
        p2.name = "李四";

        Person p3 = new Person();
        p3.name = "张三";

        Set<Person> set = new HashSet<>();
        set.add(p1);
        set.add(p2);
        set.add(p3);

        // Print every element that actually ended up in the set
        for (Person p : set) {
            System.out.println("name=" + p.name);
        }

        System.out.println("p1.hashCode=" + p1.hashCode());
        System.out.println("p2.hashCode=" + p2.hashCode());
        System.out.println("p3.hashCode=" + p3.hashCode());
        System.out.println();

        System.out.println("p1 equals p2," + p1.equals(p2));
        System.out.println("p1 equals p3," + p1.equals(p3));
    }
}

class Person {
    String name;

    // Override equals: two Person objects are equal when their names are equal
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj instanceof Person) {
            Person p = (Person) obj;
            // Objects.equals avoids a NullPointerException when name is null
            return Objects.equals(this.name, p.name);
        }
        return false;
    }
}

output:

[Figure: console output]

equals same, hashCode differs

After overriding equals, p1 and p3 compare as equal, yet the duplicate is still inserted because their hashCode values differ. To keep duplicates out, hashCode must be overridden as well.

2.3 Overriding only hashCode

import java.util.*;

public class Test {
    public static void main(String[] args) {
        Person p1 = new Person();
        p1.name = "张三";

        Person p2 = new Person();
        p2.name = "李四";

        Person p3 = new Person();
        p3.name = "张三";

        Set<Person> set = new HashSet<>();
        set.add(p1);
        set.add(p2);
        set.add(p3);

        // Print every element that actually ended up in the set
        for (Person p : set) {
            System.out.println("name=" + p.name);
        }

        System.out.println("p1.hashCode=" + p1.hashCode());
        System.out.println("p2.hashCode=" + p2.hashCode());
        System.out.println("p3.hashCode=" + p3.hashCode());
        System.out.println();

        System.out.println("p1 equals p2," + p1.equals(p2));
        System.out.println("p1 equals p3," + p1.equals(p3));
    }
}

class Person {
    String name;

    // Override hashCode: derive the hash from the name
    @Override
    public int hashCode() {
        return (name == null) ? 0 : name.hashCode();
    }
}

output:

[Figure: console output]

equals differs, hashCode same

The duplicate is still inserted. When adding an element, the set compares hashCode first and calls equals only if the hash codes match; the element counts as a duplicate only when equals also returns true. Here the hash codes match but the default equals returns false, so p1 and p3 are treated as different objects and both are stored.

2.4 Overriding both equals and hashCode

import java.util.*;

public class Test {
    public static void main(String[] args) {
        Person p1 = new Person();
        p1.name = "张三";

        Person p2 = new Person();
        p2.name = "李四";

        Person p3 = new Person();
        p3.name = "张三";

        Set<Person> set = new HashSet<>();
        set.add(p1);
        set.add(p2);
        set.add(p3);

        // Print every element that actually ended up in the set
        for (Person p : set) {
            System.out.println("name=" + p.name);
        }

        System.out.println("p1.hashCode=" + p1.hashCode());
        System.out.println("p2.hashCode=" + p2.hashCode());
        System.out.println("p3.hashCode=" + p3.hashCode());
        System.out.println();

        System.out.println("p1 equals p2," + p1.equals(p2));
        System.out.println("p1 equals p3," + p1.equals(p3));
    }
}

class Person {
    String name;

    // Override hashCode: derive the hash from the name
    @Override
    public int hashCode() {
        return (name == null) ? 0 : name.hashCode();
    }

    // Override equals: two Person objects are equal when their names are equal
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj instanceof Person) {
            Person p = (Person) obj;
            // Objects.equals avoids a NullPointerException when name is null
            return Objects.equals(this.name, p.name);
        }
        return false;
    }
}

output:

[Figure: console output]

equals same, hashCode same

Done: only one 张三 (Zhang San) is inserted.
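For comparison, here is a more compact sketch of the same idea using the helpers in `java.util.Objects`. The class name `PersonSetDemo`, the constructor, and the helper `distinctCount` are illustrative assumptions and not part of the code above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class PersonSetDemo {
    static final class Person {
        final String name;

        Person(String name) { this.name = name; }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Person)) return false;
            return Objects.equals(name, ((Person) obj).name); // null-safe comparison
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name); // 0 for null, otherwise name.hashCode()
        }
    }

    // Adds one Person per name to a HashSet and returns how many were kept.
    static int distinctCount(String... names) {
        Set<Person> set = new HashSet<>();
        for (String n : names) {
            set.add(new Person(n));
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount("张三", "李四", "张三")); // 2
    }
}
```

Because equals and hashCode are derived from the same field, two Person objects with the same name always land in the same bucket and compare as equal there.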

2.5 Summary

When equals and hashCode are both overridden, the general contract of hashCode must be satisfied:

(1) During the execution of a Java application, when the hashCode method is called multiple times on the same object, the same integer must be returned consistently, provided that the information used to compare the objects with equals has not been modified. This integer need not be consistent from one execution of an application to another execution of the same application.

(2) If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

(3) If two objects are not equal according to the equals(Object) method, then calling the hashCode method on each of the two objects is not required to produce different integer results. However, programmers should be aware that producing distinct integer results for unequal objects can improve hash table performance.
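The proviso in rule (1) has a practical consequence, sketched below under the assumption of a hypothetical mutable class `Tag`: if a field used by hashCode changes while the object sits in a HashSet, the set looks for it under the new hash code and no longer finds it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class MutableKeyDemo {
    // Hypothetical class whose hash code depends on a mutable field.
    static class Tag {
        String label;

        Tag(String label) { this.label = label; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Tag && Objects.equals(label, ((Tag) o).label);
        }

        @Override
        public int hashCode() { return Objects.hashCode(label); }
    }

    // Returns whether the set still finds the element after its label changes.
    static boolean foundAfterMutation() {
        Set<Tag> set = new HashSet<>();
        Tag t = new Tag("a");
        set.add(t);             // stored under the hash code of "a" (97)
        t.label = "b";          // hash code is now that of "b" (98)
        return set.contains(t); // searched under the new hash code
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false
    }
}
```

One common way to avoid this is to base equals and hashCode only on fields that never change after construction.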

Therefore, when rewriting the method, the following conclusions are drawn:

​If two

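(4) Do you have to override hashCode when you override equals?

Not strictly. If the class is only ever compared with equals and never placed in a hash-based container, the program still behaves correctly. But as soon as its objects go into containers such as HashSet or HashMap, both methods must be overridden to avoid duplicate elements. Since it is hard to guarantee that a class will never be used that way, overriding both together is the safer habit.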




When learning Java, and collections in particular, equals and hashCode come up constantly, and interview questions often ask about things like the difference between equals and ==. Here we take a closer look at these two methods, starting from the source code.

1 Overview

First of all, equals and hashCode are both methods of the Object base class:

public boolean equals(Object obj) {
    return (this == obj);
}

public native int hashCode();

From the source code we can see that equals, by default, checks whether two references point to the same object. hashCode is a native method (a method implemented not in Java but in another language such as C/C++, usually so that it can work closely with the underlying machine). By default, and only by default, hashCode returns an identity-based value for the object. It is often described as the object's memory address, although the specification does not require this and the actual value depends on the JVM. We can also see this indirectly through toString: the default toString returns "class name@hexadecimal value", and the source shows that this value is simply the return value of hashCode().

public String toString() {
    return getClass().getName() + "@" + Integer.toHexString(hashCode());
}

Interview question: does the hashCode method return the memory address of the object? Answer: not necessarily. The hashCode inherited from Object returns an identity-based value that a JVM may derive from the address but is not required to. Besides, classes often need to override hashCode, for example when their objects are stored in a Map, and an overridden hashCode is computed from the object's fields and has nothing to do with the address.
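The link between the default toString and hashCode can be checked directly. This is a small sketch; the helper name `expectedDefault` is made up for the example.

```java
public class ToStringDemo {
    // Rebuilds the string that Object.toString produces for an object.
    static String expectedDefault(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Object o = new Object();
        System.out.println(o);                                       // java.lang.Object@<hex of hashCode>
        System.out.println(o.toString().equals(expectedDefault(o))); // true
    }
}
```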

2. Detailed explanation of equals

equals is a method of the base class Object, so every object we create has it, and every class is free to override it. For example:

String str1 = new String("abc");
String str2 = new String("abc");
str1.equals(str2);
// result: true, although str1 and str2 are two distinct objects

Clearly the String class overrides equals: the comparison is by content, not by reference. Here is the equals method of String (as found in JDK 8):

public boolean equals(Object anObject) {
    // First check whether both references point to the same object
    if (this == anObject) {
        return true;
    }
    // Then check whether the other object is also a String
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        // If the lengths match, compare the char arrays element by element
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

The source also shows that equals does more than evaluate this == obj. Many classes in the JDK override it in a similar way. When we define our own class, which fields decide whether two objects are equal depends on the business requirements, but every implementation must follow these rules:

  • Reflexive. For any non-null reference value x, x.equals(x) must return true.
  • Symmetric. For any non-null reference values x and y, x.equals(y) returns true if and only if y.equals(x) returns true.
  • Transitive. For any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) must return true.
  • Consistent. For any non-null reference values x and y, multiple calls to x.equals(y) consistently return true or consistently return false, provided no information used in the equals comparison is modified.
  • For any non-null reference value x, x.equals(null) must return false.
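These rules are easy to break by accident. The sketch below shows a violation of symmetry with a hypothetical class `LooseName` that tries to be "helpfully" equal to plain strings; String knows nothing about `LooseName`, so the comparison only works in one direction.

```java
import java.util.Locale;

public class SymmetryDemo {
    // Hypothetical class: stores a lower-cased name and also accepts Strings in equals.
    static final class LooseName {
        private final String value;

        LooseName(String value) { this.value = value.toLowerCase(Locale.ROOT); }

        @Override
        public boolean equals(Object o) {
            if (o instanceof LooseName) {
                return value.equals(((LooseName) o).value);
            }
            if (o instanceof String) {
                // This branch is the mistake: String.equals never returns true for a LooseName
                return value.equals(((String) o).toLowerCase(Locale.ROOT));
            }
            return false;
        }

        @Override
        public int hashCode() { return value.hashCode(); }
    }

    public static void main(String[] args) {
        LooseName n = new LooseName("Java");
        String s = "java";
        System.out.println(n.equals(s)); // true
        System.out.println(s.equals(n)); // false: symmetry is violated
    }
}
```

Removing the String branch restores symmetry, at the cost of never being equal to a plain String.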

2.1 equals and ==

equals is often contrasted with ==.

Java data types fall into primitive types and reference types. There are eight primitive types: byte, short, int, long, float, double, boolean, and char. For primitive types, == compares values.

For reference types, == compares references, that is, whether both operands point to the same object.

 int a = 10;
 int b = 10;
 float c = 10.0f;
 System.out.println("(a == b) = " + (a == b));//true
 System.out.println("(b == c) = " + (b == c));//true
 String s1 = "123";
 String s2 = "123";
 System.out.println(s1==s2);//true

The difference between equals and the == operator is summarized as follows:

  1. If both operands of == are primitive types, it checks whether their values are equal.
  2. If both operands of == are reference types, it checks whether they refer to the same memory address. If it returns true, both operands refer to the same object.
  3. The equals method of the Object base class compares the memory addresses of two objects by default. For a class that does not override equals, it gives the same result as the == operator.
  4. equals is used to compare reference types for logical equality. Provided the rules for equals listed earlier are respected, we consider two objects equal as long as the attributes we choose to compare are the same.

Here is a classic interview question:

 String s1 = "abc";
 String s2 = "abc";
 System.out.println(s1==s2);//true
 System.out.println(s1.equals(s2));//true
 String s3 = new String("100");
 String s4 = new String("100");
 System.out.println(s3==s4);//false
 System.out.println(s3.equals(s4));//true
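The results above follow from how string literals are handled: identical literals share one object in the string pool, while each `new String(...)` creates a separate object. A short sketch using `String.intern`, which returns the pooled instance (the helper `sameReference` is only for illustration):

```java
public class InternDemo {
    // Reference comparison, spelled out as a method for the example.
    static boolean sameReference(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "abc";
        String created = new String("abc");
        System.out.println(sameReference(literal, created));          // false: separate object
        System.out.println(literal.equals(created));                  // true: same characters
        System.out.println(sameReference(literal, created.intern())); // true: pooled instance
    }
}
```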

3. hashCode method

The hashCode method is not used as often as equals. It has to be understood together with Java's Map containers. A container that uses hashing, such as HashMap, uses the return value of an object's hashCode to make a first decision about where the object goes in the container, and then applies its own hashing logic internally to store and retrieve elements.

3.1 Introduction to hash algorithm

Put simply, a hash algorithm converts the key of an object into a storage address by applying a specific mathematical function or some other computation. The function it uses is called a hash function.

Let's illustrate with an example:

Suppose we want to find the index of the value 10 in an array storing the elements {0,3,6,10,48,5}. We have to traverse the array to find it. When the array is very large, this traversal is slow and can seriously hurt the performance of the program.

Suppose instead that we place elements according to a rule when storing them, so that when we later look for an element the same rule tells us where it is. Normally the elements of an array simply sit in the order they were added. If we instead use a fixed mathematical function to map each element's value to an array index, then to fetch an element with a given value we can apply the mapping and jump straight to it.

One of the simplest common hash functions is the division-remainder method: divide the value to be stored by a constant and use the remainder as the index. Let's look at an example:

Store 323, 458, 25, 340, 28, 969, and 77 in an array of length 11 using the division-remainder method, with the array length 11 as the constant. Each number is stored at the index given by its remainder modulo 11, as shown in the figure below:

[Figure: the seven values stored in an 11-slot array, each at index value % 11]

Now, to find the position of 77 in the array, all we need is the index 77 % 11 = 0, and arr[77 % 11] is 77.

But this simple hash algorithm has an obvious weakness. For example, both 77 and 88 leave a remainder of 0 when divided by 11, but index 0 already holds 77, so 88 has nowhere to go. In hashing this situation is called a collision:

Collision: If two different data get the same result after being operated by the same hash function, then this phenomenon is called a collision.
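The division-remainder example can be reproduced in a few lines. This is a sketch that assumes non-negative values and uses -1 to mark an empty slot; the names `ModuloHashDemo`, `index`, and `place` are made up for illustration.

```java
import java.util.Arrays;

public class ModuloHashDemo {
    static final int SIZE = 11;

    // The division-remainder hash function: index = value mod table size.
    static int index(int value) {
        return value % SIZE;
    }

    // Places non-negative values in a table; -1 marks an empty slot.
    // Throws on a collision, because this naive scheme cannot resolve one.
    static int[] place(int... values) {
        int[] table = new int[SIZE];
        Arrays.fill(table, -1);
        for (int v : values) {
            int i = index(v);
            if (table[i] != -1) {
                throw new IllegalStateException(v + " collides with " + table[i] + " at index " + i);
            }
            table[i] = v;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] table = place(323, 458, 25, 340, 28, 969, 77);
        System.out.println(Arrays.toString(table));
        // [77, 969, -1, 25, 323, -1, 28, 458, -1, -1, 340]
        System.out.println(table[index(77)] == 77); // true: found without scanning
        System.out.println(index(88));              // 0, the slot that 77 already occupies
    }
}
```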

So when designing a hash function, we should try to:

  1. Reduce the likelihood of collisions.
  2. Distribute the stored elements as evenly as possible across the container's slots (we call them buckets).

Collisions can never be ruled out completely, though, so anything that relies on hashCode also needs a separate strategy for resolving them.

3.2 The relationship between hashCode method and hash algorithm

In Java, a class's hashCode method embodies a hash algorithm. For example, here is the hashCode algorithm of String (as found in JDK 8):

public int hashCode() {
    int h = hash; // cached hash code, 0 by default
    if (h == 0 && value.length > 0) {
        char val[] = value;
        // Every element of the string's char array takes part in the calculation
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
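The loop above can be replayed by hand. The sketch below recomputes the same polynomial, h = 31 * h + c for each character, in a standalone helper (the name `polyHash` is made up) and compares it with the built-in result.

```java
public class PolyHashDemo {
    // Recomputes s[0]*31^(n-1) + ... + s[n-1] with int overflow, like the loop above.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, 'c' = 99:
        // h = 97, then 97 * 31 + 98 = 3105, then 3105 * 31 + 99 = 96354
        System.out.println(polyHash("abc"));  // 96354
        System.out.println("abc".hashCode()); // 96354
    }
}
```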

As mentioned earlier, hashCode is closely tied to the collection classes that use hash tables. Take Set as an example. A Set does not allow duplicate elements, so how does it decide whether a new element is already present? One answer is to compare with equals. But if the Set already holds 10,000 elements, storing one more would then take up to 10,000 equals calls, which is far too slow. How can the Set be both efficient and free of duplicates? The answer lies in hashCode.

From the analysis above, a hash algorithm applies a specific operation to obtain the storage location of the data, and hashCode plays the role of that operation. For simplicity, we can think of the value returned by hashCode as the storage location of the element (in reality the collection does further calculation internally to spread elements as evenly as possible, and different classes may use different hash algorithms).

When a Set stores an element, it first calls hashCode and checks whether anything is already stored at the corresponding location. If not, the Set cannot contain an equal element, and the new element is stored there directly. If something is already there, that is, a collision has occurred, the Set calls equals to compare the new element with what is stored at that location. If they are equal, the new element is not stored. If they are not, the new element still has to be stored, either in the same bucket (chaining, which is what HashMap and HashSet do) or, in other hash table designs, at another location. This keeps the Set free of duplicates at the lowest possible cost.
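The procedure just described can be sketched as a toy set with a fixed number of buckets. This is a simplified illustration and not how java.util.HashSet is actually implemented; the class `TinyHashSet` is made up, never resizes, and assumes non-null elements.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();
    private int size;

    public TinyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // hashCode picks the bucket; floorMod keeps the index non-negative.
    private List<E> bucketFor(Object o) {
        return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
    }

    // Returns false if an equal element is already present.
    public boolean add(E e) {
        List<E> bucket = bucketFor(e);
        for (E existing : bucket) {
            if (existing.equals(e)) {
                return false; // equals is only called inside one bucket
            }
        }
        bucket.add(e); // collision without equality: chain in the same bucket
        size++;
        return true;
    }

    public boolean contains(Object o) {
        return bucketFor(o).contains(o);
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        TinyHashSet<String> set = new TinyHashSet<>(16);
        System.out.println(set.add("Aa")); // true
        System.out.println(set.add("BB")); // true: same hash code as "Aa", but not equal
        System.out.println(set.add("Aa")); // false: duplicate
        System.out.println(set.size());    // 2
    }
}
```

With 10,000 elements spread over enough buckets, adding one more element compares it only with the few elements in its own bucket, not with all 10,000.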

Interview question: what is the purpose of the hashCode method? Answer: hashCode exists mainly to speed up lookup and storage in hash-based containers such as HashSet, Hashtable, and HashMap. It is used to determine where an object is stored in the hash structure.

3.3 The relationship between hashCode and equals method

There is such a comment on the equals method of the Object class:

Note that it is generally necessary to override the {@code hashCode} method whenever this method is overridden, so as to maintain the general contract for the {@code hashCode} method, which states that equal objects must have equal hash codes.

So if we override equals for any reason, the contract requires us to override hashCode as well, so that objects that are equal according to equals have equal hash codes.

Object also has several requirements for the hashCode method:

  1. During the execution of a Java application, multiple calls to the hashCode method on the same object must consistently return the same integer, provided the information used in equals comparisons on the object has not been modified. This integer need not be consistent from one execution of an application to another execution of the same application.
  2. If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  3. If two objects are not equal according to the equals(Object) method, then calling the hashCode method on each of the two objects is not required to produce different integer results. However, programmers should be aware that producing distinct integer results for unequal objects can improve hash table performance.

Combined with the equals method, we can make the following summary:

  1. Two objects for which equals returns true must have equal hash codes.
  2. Two objects with the same hashCode return value do not necessarily make equals return true.

Consider the first conclusion. Why is it required? Take Set again: a Set first calls hashCode to find where the object should be stored. If two equal objects returned different hash codes, both would be stored in the Set, which is clearly wrong. So two objects for which equals returns true must have equal hash codes.

Now the second conclusion: why can two objects with the same hashCode value still be different? Because no hash algorithm can avoid collisions entirely. Since collisions cannot be ruled out, two different objects may end up with the same hash value, and all we can do is make the hashCode values of different objects differ as often as possible. This is exactly what happens when HashMap stores key-value pairs. Up to JDK 1.7, HashMap resolved key hash collisions with separate chaining (the so-called "zipper method"). The implementation details are left for a later analysis of HashMap.
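An extreme case makes the second conclusion concrete. In the sketch below, a hypothetical class `Id` returns the same hash code for every instance. This is legal under the contract, and a HashSet still rejects only true duplicates, because equals has the final say; the price is that every element collides with every other, so lookups slow down.

```java
import java.util.HashSet;
import java.util.Set;

public class ConstantHashDemo {
    // Hypothetical class with a legal but poor hashCode.
    static final class Id {
        final int value;

        Id(int value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Id && ((Id) o).value == value;
        }

        @Override
        public int hashCode() {
            return 1; // every instance lands in the same bucket
        }
    }

    // Adds one Id per value and returns how many the set kept.
    static int distinct(int... values) {
        Set<Id> set = new HashSet<>();
        for (int v : values) {
            set.add(new Id(v));
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinct(1, 2, 3, 2, 1)); // 3
    }
}
```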


Origin blog.csdn.net/qq_43842093/article/details/132529561