Java: equals(), hashCode(), and HashMap

1. What is equals()?

First, look at the implementation of java.lang.Object.equals():

public boolean equals(Object obj) {
    return (this == obj);
}

/*

The equals method for class Object implements the most discriminating possible 
equivalence relation on objects; that is, for any non-null reference values x 
and y, this method returns true if and only if x and y refer to the same object 
(x == y has the value true).

Note that it is generally necessary to override the hashCode method whenever 
this method is overridden, so as to maintain the general contract for the 
hashCode method, which states that equal objects must have equal hash codes.

*/

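Explanation:

In the Object class's implementation, equals() returns true only when both references point to the same object (== compares references, that is, object identity).

When you override equals(), you normally have to override hashCode() too, so that two objects that are equal according to equals() also return the same hashCode() value.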

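In addition:

1. Primitive types have no equals() method; primitive values can only be compared with ==.
2. String.equals() compares the character content rather than the reference, because the String class overrides equals().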
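
To make the points above concrete, here is a minimal sketch of a class that overrides equals() and hashCode() together. The Point class and its fields are hypothetical names chosen for illustration; they do not come from the JDK or from this article's sources.

```java
import java.util.Objects;

/** Hypothetical value class, used only to illustrate the equals()/hashCode() pairing. */
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;               // same reference: trivially equal
        if (!(obj instanceof Point)) return false;  // also rejects null
        Point other = (Point) obj;
        return x == other.x && y == other.y;        // compare by value
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);                  // built from the same fields as equals()
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a == b);                        // false: two distinct objects
        System.out.println(a.equals(b));                   // true: same field values
        System.out.println(a.hashCode() == b.hashCode());  // true: contract is kept

        String s1 = new String("java");
        String s2 = new String("java");
        System.out.println(s1 == s2);                      // false: different references
        System.out.println(s1.equals(s2));                 // true: same characters
    }
}
```

Because hashCode() is computed from exactly the fields that equals() compares, two equal Point objects always report the same hash code.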

2. What is hashCode()? What is it used for?

First, look at the implementation of java.lang.Object.hashCode():


public native int hashCode();

/*

Returns a hash code value for the object. This method is supported for the benefit
of hashtables such as those provided by java.util.Hashtable.

The general contract of hashCode is:

1. Whenever it is invoked on the same object more than once during an execution 
   of a Java application, the hashCode method must consistently return the same 
   integer, provided no information used in equals comparisons on the object is 
   modified. This integer need not remain consistent from one execution of an 
   application to another execution of the same application.

2. If two objects are equal according to the equals(Object) method, then calling 
   the hashCode method on each of the two objects must produce the same integer 
   result.

3. It is not required that if two objects are unequal according to the 
   equals(java.lang.Object) method, then calling the hashCode method on each of 
   the two objects must produce distinct integer results. However, the programmer
   should be aware that producing distinct integer results for unequal objects may
   improve the performance of hashtables.


As much as is reasonably practical, the hashCode method defined by class Object 
does return distinct integers for distinct objects. (This is typically implemented
by converting the internal address of the object into an integer, but this 
implementation technique is not required by the Java(TM) programming language.)


*/

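Explanation:

hashCode() exists to support the hash-based classes of the Java Collections Framework, for example Hashtable, HashMap, and HashSet.

Requirements:

1. If two objects are equal according to equals(), their hashCode() values must be the same.
2. If two objects are not equal according to equals(), their hashCode() values may still be the same.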
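
Requirement 2 is easy to see with two short strings. The sketch below is only an illustration (the class name HashCollisionDemo is made up); it relies on the documented String.hashCode() formula, under which "Aa" and "BB" happen to produce the same value.

```java
/** Illustration only: two unequal strings that share one hash code. */
final class HashCollisionDemo {

    /** True when the two objects are NOT equal but still report the same hash code. */
    static boolean collides(Object a, Object b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("Aa".equals("BB"));    // false
        System.out.println("Aa".hashCode());      // 2112  (65 * 31 + 97)
        System.out.println("BB".hashCode());      // 2112  (66 * 31 + 66)
        System.out.println(collides("Aa", "BB")); // true
    }
}
```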

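hashCode() in the Object class:

Given the requirements above, a typical implementation derives an int from the object's internal address. As far as is reasonably practical this gives distinct objects distinct hash values, but it is not a guarantee: the Javadoc quoted above says the technique is not required, and an int has only 2^32 possible values.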

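In addition:

As far as the Java language is concerned, hashCode() and equals() are two independent methods: neither belongs to or depends on the other, and the compiler does not check that they agree.
The only link between them is the contract quoted above. Under that contract, "equals() returns true" implies "the hashCode() values are equal", but not the other way round; a class that overrides only one of the two methods can break even that.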
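
The following sketch shows what can go wrong when only equals() is overridden. BrokenKey is a hypothetical class written for this example. Since it inherits the identity-based hashCode() from Object, a lookup with an equal but distinct key usually searches the wrong bucket; the exact outcome depends on the JVM's identity hash codes, so the last printed value is described rather than promised.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical key class that overrides equals() but NOT hashCode(). */
final class BrokenKey {
    private final int id;

    BrokenKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof BrokenKey && ((BrokenKey) obj).id == id;
    }

    // hashCode() is deliberately not overridden, so the identity-based
    // implementation from Object is used. The compiler accepts this.

    public static void main(String[] args) {
        Map<BrokenKey, String> map = new HashMap<>();
        map.put(new BrokenKey(1), "value");

        BrokenKey probe = new BrokenKey(1);
        System.out.println(probe.equals(new BrokenKey(1))); // true

        // Almost always prints null: the probe's hash code differs from the
        // stored key's, so HashMap never gets as far as calling equals().
        System.out.println(map.get(probe));
    }
}
```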

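3.1 What is a hash algorithm?

A hash algorithm is also called a hash function. The name comes from the English verb "to hash" (to chop up and mix), not from a person's name.

A hash algorithm transforms an input of arbitrary length (also called the pre-image) into an output of fixed length; that output is the hash value.

The transformation is a compressing map: the space of hash values is usually far smaller than the space of inputs, so different inputs can hash to the same output (a collision).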
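
As a small illustration of "arbitrary-length input, fixed-length output", here is a sketch of a polynomial string hash. The class and method names (SimpleHash, hash31) are made up for this example; the formula is the one the String.hashCode() Javadoc describes, so the two should agree.

```java
/** Illustration only: folds a string of any length into a 32-bit int. */
final class SimpleHash {

    static int hash31(String input) {
        int h = 0;
        for (int i = 0; i < input.length(); i++) {
            // int overflow simply wraps around, which keeps the output at 32 bits
            h = 31 * h + input.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String shortInput = "hi";
        String longInput = "a much longer input that still maps to a single int";
        System.out.println(hash31(shortInput) == shortInput.hashCode()); // true
        System.out.println(hash31(longInput) == longInput.hashCode());   // true
        System.out.println(hash31("Aa") == hash31("BB"));                // true: a collision
    }
}
```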

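3.2 Why use a hash algorithm?

The output of a hash algorithm has a fixed length, so it can be turned into an address (in practice, an index into an array).
On lookup, computing the hash value of the key gives the location of the storage cell directly,
so its contents can be read quickly without scanning the other cells.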

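4. HashMap

Putting a value into a HashMap:

A HashMap first allocates a block of memory (an array) and divides it into storage slots (buckets).
Each bucket is identified by its index in the array. A bucket is not limited to a single key-value pair: entries whose keys map to the same bucket are chained together in a singly linked list (an internal entry chain, not java.util.LinkedList). In Java 7 and earlier a new entry goes to the head of the chain; Java 8 and later append it to the tail and turn long chains into trees.

When a value is stored, HashMap calls hashCode() on the key object to work out the bucket index, then stores the entry in that bucket. A key cannot be a primitive value, since primitives have no hashCode() method; a primitive passed as a key is autoboxed to its wrapper type, such as Integer.

If an entry already in that bucket (the whole chain is traversed) has the same hash value as the new key and equals() returns true, the existing value is overwritten.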

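Getting a value from a HashMap:

When an object is used as the key to fetch a value, HashMap works as follows:
it uses the hashCode() of that object to locate the bucket,
then calls equals() to compare the object with the key or keys stored in that bucket (there may be several), and returns the value of the entry for which equals() is true.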
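
The put and get steps described above can be sketched as a tiny chained hash table. This is a deliberately simplified illustration, not the real java.util.HashMap: TinyHashMap, Node, and hashOf are hypothetical names, there is no resizing, and new entries go to the head of the chain as in the older behaviour the text describes.

```java
import java.util.Objects;

/** Deliberately simplified chained hash table; NOT the real java.util.HashMap. */
final class TinyHashMap<K, V> {

    /** One key-value pair in a bucket's singly linked chain. */
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    TinyHashMap(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        buckets = (Node<K, V>[]) new Node[capacity];
    }

    private static int hashOf(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    /** Stores the pair; returns the previous value for an equal key, or null. */
    V put(K key, V value) {
        int h = hashOf(key);
        int i = h & (buckets.length - 1);            // hash value -> bucket index
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;                     // same key: overwrite the value
                return old;
            }
        }
        buckets[i] = new Node<>(h, key, value, buckets[i]); // new entry at the head
        return null;
    }

    /** Finds the bucket via the hash value, then walks the chain using equals(). */
    V get(Object key) {
        int h = hashOf(key);
        for (Node<K, V> n = buckets[h & (buckets.length - 1)]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyHashMap<String, Integer> map = new TinyHashMap<>(16);
        map.put("Aa", 1);
        map.put("BB", 2);                      // same hash code as "Aa": same bucket
        System.out.println(map.get("Aa"));     // 1
        System.out.println(map.get("BB"));     // 2
        System.out.println(map.put("Aa", 3));  // 1 (old value is replaced)
        System.out.println(map.get("Aa"));     // 3
    }
}
```

"Aa" and "BB" share a hash code, so they end up in one chain; only the equals() check tells them apart.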

-----------------------------------------------------------------

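HashMap implements the Map interface, permits null values and the null key, and makes no guarantee about the order of its mappings.

Two parameters affect the performance of a HashMap:

Initial capacity: the number of buckets in the hash table at the time it is created.
Load factor: a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries exceeds the product of the capacity and the load factor,
the table is rehashed (it is enlarged and the bucket index of every entry is recomputed).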

Source code:

//

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;  // default initial capacity

static final float DEFAULT_LOAD_FACTOR = 0.75f;      // default load factor

//

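As you can see:

The default initial capacity is 1 << 4, that is 16 (16 buckets to start with).

The default load factor is 0.75.

     The load factor measures how fully the space of a hash table is used.
     The larger the load factor, the more fully the space is used.

     For a hash table that uses chaining (each bucket holds a linked list),
     the average time complexity of looking up an element is O(1 + a), where 1 is
     the hash-to-bucket mapping and a is the average chain length.
     A larger load factor uses space more fully but makes lookups slower.
     A smaller load factor leaves the table too sparse and wastes space.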
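
With the defaults above, the rehash point works out to 16 × 0.75 = 12 entries. The sketch below only does that arithmetic; ResizeThreshold and threshold are hypothetical names, and the doubling of the capacity in the loop is an assumption made for illustration.

```java
/** Illustration only: the entry count beyond which a table of a given size is rehashed. */
final class ResizeThreshold {

    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Assumes the capacity doubles each time the table grows.
        for (int capacity = 16; capacity <= 64; capacity *= 2) {
            System.out.println(capacity + " buckets -> rehash beyond "
                    + threshold(capacity, 0.75f) + " entries");
        }
        // 16 buckets -> rehash beyond 12 entries
        // 32 buckets -> rehash beyond 24 entries
        // 64 buckets -> rehash beyond 48 entries
    }
}
```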

--

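As a HashMap holds more and more entries, the chance that two keys land in the same bucket keeps rising, because the length of the array is fixed.
To keep lookups fast, the array has to be enlarged and the index of every entry in the new array recomputed.
This is expensive. So if you know roughly how many entries a HashMap will hold, passing a suitable initial capacity to the constructor improves performance.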
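
One way to act on this is to size the map up front. The sketch below uses the standard HashMap(int initialCapacity) constructor; the helper names capacityFor and build, and the sizing formula itself, are assumptions for illustration rather than anything HashMap prescribes.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: choosing an initial capacity so that no rehash should be needed. */
final class PresizedMap {

    /** Smallest requested capacity whose threshold is not below expectedEntries. */
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    static Map<Integer, String> build(int expectedEntries) {
        // HashMap rounds the requested capacity up to a power of two internally.
        Map<Integer, String> map = new HashMap<>(capacityFor(expectedEntries, 0.75f));
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(1000, 0.75f)); // 1334
        System.out.println(build(1000).size());       // 1000
    }
}
```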

--

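A bucket can hold objects with different hashCode() values:
The value returned by hashCode() is not used directly as the array index, because it can be very large (or negative).
Instead it is reduced modulo the capacity to get the index at which the object is stored: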

static int indexFor(int h, int length) {
    return h & (length-1);
}

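Because the capacity of a HashMap must be a power of two, h & (length-1) gives the same result as h % length for non-negative h,
and is cheaper to compute. For negative h, the % operator in Java returns a negative result, whereas h & (length-1) still yields a valid index between 0 and length-1.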
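
A quick check of that equivalence, as a sketch: indexFor is copied from the snippet above, while the class name IndexForDemo and the sample values 37 and -7 are chosen only for illustration. Math.floorMod is included because it matches the bit mask for negative inputs as well.

```java
/** Illustration only: compares h & (length - 1) with h % length and Math.floorMod. */
final class IndexForDemo {

    static int indexFor(int h, int length) {
        return h & (length - 1);   // valid only when length is a power of two
    }

    public static void main(String[] args) {
        int length = 16;

        System.out.println(indexFor(37, length));       // 5
        System.out.println(37 % length);                // 5

        System.out.println(indexFor(-7, length));       // 9
        System.out.println(-7 % length);                // -7 (not usable as an index)
        System.out.println(Math.floorMod(-7, length));  // 9
    }
}
```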

--

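The fail-fast mechanism

HashMap is not thread-safe. To keep an iteration from silently producing wrong or incomplete results,
an iterator fails immediately and throws ConcurrentModificationException if the map is structurally modified
after the iterator was created, other than through the iterator's own remove() method. This covers modification by another thread as well as modification by the iterating thread itself, and the failure is reported at once instead of after the iteration has finished.

This is the fail-fast mechanism.

Do not write code that relies on catching this exception to decide what to do next. Detection is best-effort, so the exception should be used only to detect bugs.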
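
The behaviour can be reproduced in a single thread. In the sketch below (FailFastDemo, sample, and the two method names are made up for this example), the try/catch exists only to report what happened in a demo; it is not a pattern to copy into real code. The second method shows the supported way to remove entries during iteration.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Illustration only: triggering and avoiding ConcurrentModificationException. */
final class FailFastDemo {

    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    /** Removes through the map while iterating; returns true if the iterator failed fast. */
    static boolean modifyWhileIterating() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.remove(key);   // structural change behind the iterator's back
            }
        } catch (ConcurrentModificationException e) {
            return true;           // caught here for demonstration purposes only
        }
        return false;
    }

    /** Removes odd values through the iterator itself, which is allowed. */
    static Map<String, Integer> removeOddValuesSafely() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 1) {
                it.remove();
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating());   // true
        System.out.println(removeOddValuesSafely());  // {b=2}
    }
}
```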

_______________________________________________________________________________

HashMap series (1):
Java: equals(), hashCode(), and HashMap

HashMap series (2):
Java: HashMap in Depth

HashMap series (3):
Databases: Indexes

HashMap series (4):
Java: HashMap vs. Hashtable
