A Partial Walkthrough of the HashMap 1.8 Source Code

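Compared with 1.7 and earlier, HashMap in 1.8 changes its underlying structure to array + linked list + red-black tree. The red-black tree was introduced to speed up lookups in buckets that hold many colliding keys. This post excerpts parts of the source code and analyzes them: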

The opening comment
Several class constants
Constructors
The put, hash, get and remove methods
Why the capacity is a power of two

1. The opening comment
I think it is worth reading, since it gives a broad summary of HashMap.

/**
 * Hash table based implementation of the <tt>Map</tt> interface.  This
 * implementation provides all of the optional map operations, and permits
 * <tt>null</tt> values and the <tt>null</tt> key.  (The <tt>HashMap</tt>
 * class is roughly equivalent to <tt>Hashtable</tt>, except that it is
 * unsynchronized and permits nulls.)  This class makes no guarantees as to
 * the order of the map; in particular, it does not guarantee that the order
 * will remain constant over time.
 *

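This paragraph says that HashMap implements the Map interface and provides all of the optional map operations.
It permits null values (any number of them) and a single null key.
It is roughly equivalent to Hashtable, with two differences:

HashMap permits null keys and null values, while Hashtable permits neither;
HashMap is unsynchronized, i.e. not thread-safe, while Hashtable is synchronized.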
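
As a quick check of the null-handling difference, here is a minimal sketch. The class and method names (NullSupportDemo, acceptsNulls) are made up for this illustration; only HashMap, Hashtable and Map.put come from the JDK.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullSupportDemo {
    // Returns true if the given map accepts both a null key and a null value.
    // (Hypothetical helper, not part of the JDK.)
    static boolean acceptsNulls(Map<String, String> map) {
        try {
            map.put(null, "value stored under the null key");
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            // Hashtable rejects null keys and null values.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap accepts nulls:   " + acceptsNulls(new HashMap<String, String>()));
        System.out.println("Hashtable accepts nulls: " + acceptsNulls(new Hashtable<String, String>()));
    }
}
```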

* <p>This implementation provides constant-time performance for the basic
* operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
* disperses the elements properly among the buckets.  Iteration over
* collection views requires time proportional to the "capacity" of the
* <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
* of key-value mappings).  Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.

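When the hash function spreads the elements evenly across the buckets (which I take to mean the individual slots of the array), get and put on a HashMap run in constant time, O(1).
Iterating over a HashMap's collection views takes time proportional to the number of buckets (the array size, i.e. the capacity) plus the number of key-value mappings.
So if iteration performance matters, do not set the initial capacity too high (or the load factor too low).
The load factor determines when the array is resized; its default value is 0.75.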

<p>An instance of <tt>HashMap</tt> has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>.  The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created.  The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased.  When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.

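Two parameters of a HashMap instance affect its performance: the initial capacity and the load factor.
The initial capacity is the capacity of the hash table at the time it is created. The load factor is a measure of how full the table is allowed to get before its capacity is increased.
When the number of entries (key-value mappings) exceeds load factor * capacity, the table is resized, i.e. rehashed, to roughly twice the number of buckets.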
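
The rule "resize once the entry count exceeds load factor * capacity" can be tabulated with a small sketch. ThresholdDemo and its threshold method are hypothetical names for this example; they only reproduce the arithmetic described above and do not read HashMap's internal state.

```java
class ThresholdDemo {
    // Largest number of entries a table of this capacity holds before it is resized,
    // following the rule threshold = capacity * loadFactor. (Illustrative helper.)
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Walk through a few doublings starting from the default capacity of 16.
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println("capacity " + capacity
                    + " -> resize after more than " + threshold(capacity, 0.75f) + " entries");
        }
    }
}
```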

<p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs.  Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations.  If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.

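The default load factor of 0.75 offers a good tradeoff between time and space costs. A higher load factor reduces the space overhead but increases the lookup cost.
When choosing the initial capacity, take the expected number of entries and the load factor into account so as to minimize the number of rehash operations.
If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.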
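
One way to apply this sizing rule is sketched below. The helper name initialCapacityFor is an assumption for this example (it is not a JDK method); it simply picks a capacity strictly greater than entries / load factor.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // Smallest int capacity that is strictly greater than expectedEntries / loadFactor.
    // (Hypothetical helper, written for this illustration.)
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        int capacity = initialCapacityFor(expected, 0.75f);
        Map<Integer, String> map = new HashMap<Integer, String>(capacity);
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println("requested capacity " + capacity + ", stored " + map.size() + " entries");
    }
}
```

HashMap rounds the requested capacity up to the next power of two internally, so the table actually allocated may be larger than the number passed to the constructor.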

<p>If many mappings are to be stored in a <tt>HashMap</tt>
* instance, creating it with a sufficiently large capacity will allow
* the mappings to be stored more efficiently than letting it perform
* automatic rehashing as needed to grow the table.  Note that using
* many keys with the same {@code hashCode()} is a sure way to slow
* down performance of any hash table. To ameliorate impact, when keys
* are {@link Comparable}, this class may use comparison order among
* keys to help break ties.

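If many mappings are going to be stored in a HashMap, create it with a sufficiently large capacity rather than letting it grow through repeated rehashing.

Note: using many keys with the same hashCode() is a sure way to slow down any hash table. To reduce the impact, when the keys implement Comparable, HashMap may use their comparison order to break ties: once a bucket has been converted to a red-black tree, keys with equal hashes can be ordered by compareTo, so a lookup in that bucket does not have to scan every node.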
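
To see the situation the Javadoc describes, here is a minimal sketch with a deliberately bad key type. BadKey and CollidingKeyDemo are hypothetical classes invented for this example; every BadKey has the same hashCode(), so all entries share one bucket, and the keys implement Comparable so that the map is able to order them inside a treeified bucket.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeyDemo {
    // Hypothetical key type: hashCode() is constant, so every key collides.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Fills a map with n colliding keys; the value is the key's id.
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<BadKey, Integer>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(10_000);
        // Lookups stay correct even though all keys share a single bucket.
        System.out.println(map.size() + " entries, last value = " + map.get(new BadKey(9_999)));
    }
}
```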

/* <p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a hash map concurrently, and at least one of
* the threads modifies the map structurally, it <i>must</i> be
* synchronized externally.  (A structural modification is any operation
* that adds or deletes one or more mappings; merely changing the value
* associated with a key that an instance already contains is not a
* structural modification.)  This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.

*If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method.  This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
*   Map m = Collections.synchronizedMap(new HashMap(...));</pre>*/

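HashMap is not synchronized. If multiple threads access a HashMap concurrently and at least one of them modifies it structurally (a structural modification adds or deletes one or more mappings; merely changing the value associated with an existing key does not count), it must be synchronized externally. This is typically done by synchronizing on some object that naturally encapsulates the map (as I understand it, by thread-safety measures such as the synchronized keyword).
If no such object exists, the map should be wrapped with Collections.synchronizedMap(). This is best done at creation time, to prevent accidental unsynchronized access to the map. Usage example: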
Map m = Collections.synchronizedMap(new HashMap(…));
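
A slightly fuller sketch of the wrapper in use follows. The class name, the fillConcurrently helper, and the thread and key counts are assumptions chosen for this example; each thread writes its own disjoint range of keys, so the final size is predictable when the map is properly synchronized.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {
    // Starts `threads` writers that each insert `perThread` distinct keys
    // into one shared synchronized map, then returns the final size.
    static int fillConcurrently(int threads, final int perThread) {
        final Map<Integer, Integer> map =
                Collections.synchronizedMap(new HashMap<Integer, Integer>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // disjoint key range per thread
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("entries stored: " + fillConcurrently(4, 1000));
    }
}
```

Note that the wrapper only makes individual calls atomic; iterating over the wrapped map still requires synchronizing on it manually, as the Collections.synchronizedMap documentation points out.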

* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}.  Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.

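HashMap's iterators are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way other than through the iterator's own remove method, the iterator throws ConcurrentModificationException. Faced with concurrent modification, the iterator therefore fails quickly and cleanly instead of risking arbitrary, non-deterministic behavior at some later time.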

<p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification.  Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>

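Note: the fail-fast behavior of an iterator cannot be guaranteed, because, generally speaking, no hard guarantees are possible in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException only on a best-effort basis.
It would therefore be wrong to write a program whose correctness depends on this exception: fail-fast behavior should be used only to detect bugs.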
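
The single-threaded case is the easiest way to observe fail-fast behavior. The sketch below uses made-up names (FailFastDemo, modifyDuringIteration, removeViaIterator) and a three-entry sample map: adding a key during a for-each loop makes the iterator fail, while removing through the iterator itself is allowed.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adds a new key while iterating; returns true if the iterator detected it.
    static boolean modifyDuringIteration() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("new-" + key, 0); // structural modification behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes the odd values through the iterator's own remove(); returns the remaining size.
    static int removeViaIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 1) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("exception on direct modification: " + modifyDuringIteration());
        System.out.println("size after iterator.remove(): " + removeViaIterator());
    }
}
```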

Now on to the code.
2. Several class constants:
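DEFAULT_INITIAL_CAPACITY: the default initial capacity, 2 to the 4th power (16). It must be a power of two (explained at the end).

MAXIMUM_CAPACITY: the maximum capacity, 2 to the 30th power, which must also be a power of two.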

DEFAULT_LOAD_FACTOR: the default load factor, 0.75.

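TREEIFY_THRESHOLD: the threshold (8) for converting a linked list into a tree. A bucket is converted to a red-black tree when an element is added to a bucket that already holds at least 8 nodes.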


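UNTREEIFY_THRESHOLD: when a tree bucket shrinks to 6 or fewer nodes (checked when buckets are split during a resize), it is converted back to a linked list. (I have seen it suggested that the gap of 2 between this and TREEIFY_THRESHOLD keeps the map from converting back and forth between tree and list too often, which seems reasonable to me.)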
MIN_TREEIFY_CAPACITY: buckets can be treeified only when the capacity is at least this value (64). Below it, a bucket with too many nodes triggers a resize of the table instead of treeification.
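
How TREEIFY_THRESHOLD and MIN_TREEIFY_CAPACITY interact can be summarized in a small decision function. This is only a sketch of the rule as described in this section: the class, the Action enum and the onAppend method are hypothetical, and the two constants are copied here by value rather than read from HashMap (where they are not public).

```java
class TreeifyDecisionDemo {
    // Values mirrored from the constants discussed above.
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    // Decides what happens when a new node is appended to a list bucket.
    // binSizeBeforeInsert: number of nodes already in the bucket.
    // capacity: current length of the table.
    static Action onAppend(int binSizeBeforeInsert, int capacity) {
        if (binSizeBeforeInsert < TREEIFY_THRESHOLD) {
            return Action.NONE;            // list is still short enough
        }
        if (capacity < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE;          // small table: spread the entries out instead
        }
        return Action.TREEIFY;             // large table: convert the bucket to a tree
    }

    public static void main(String[] args) {
        System.out.println(onAppend(7, 64));  // NONE
        System.out.println(onAppend(8, 16));  // RESIZE
        System.out.println(onAppend(8, 64));  // TREEIFY
    }
}
```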

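Node: a static nested class that represents one entry. It is used for the linked-list nodes, and the red-black tree nodes (TreeNode) are a subclass of it.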

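3. HashMap constructors: there are four

One taking two parameters, the initial capacity initialCapacity and the load factor loadFactor; I will not walk through its code here.
One taking only the initial capacity; it calls the previous constructor with the default load factor of 0.75.
The no-argument constructor (the most commonly used).
One taking a Map; it copies the mappings of the given map into the new HashMap.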
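
For reference, the four constructors can be exercised as follows. The class ConstructorDemo and its sizes method are names made up for this example, and the capacity and load factor values are arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

class ConstructorDemo {
    // Builds maps with all four constructors and returns the sizes of the
    // original map and of its copy after the copy has been modified.
    static int[] sizes() {
        Map<String, Integer> withBoth = new HashMap<String, Integer>(32, 0.5f); // capacity + load factor
        Map<String, Integer> withCapacity = new HashMap<String, Integer>(32);   // capacity only
        Map<String, Integer> defaults = new HashMap<String, Integer>();         // no arguments
        defaults.put("x", 1);

        Map<String, Integer> copy = new HashMap<String, Integer>(defaults);     // copy of another map
        copy.put("y", 2); // the copy is independent of the source map

        withBoth.clear();
        withCapacity.clear();
        return new int[] { defaults.size(), copy.size() };
    }

    public static void main(String[] args) {
        int[] s = sizes();
        System.out.println("original: " + s[0] + ", copy: " + s[1]);
    }
}
```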

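4. The put, hash, get and remove methods
put(): adds a key-value mapping to the HashMap. It calls hash() to compute the hash of the key and then calls putVal().
hash(): XORs the high 16 bits of key.hashCode() into the low 16 bits.
The source comment gives the reason. The array index for a key is eventually computed as hash & (n-1), which for a small table only looks at the low-order bits. If hashCode() were used directly, keys whose hash codes differ only in the high-order bits would always collide; XORing the high bits into the low bits makes that less likely.
For example, take two keys, key1 and key2, whose hashCode() values have the same low-order bits but different high-order bits.

Although key1 != key2, key1.hashCode() & (n-1) == key2.hashCode() & (n-1), so without mixing both keys land in the same bucket.

After hash(), the differing high bits have been folded into the low bits, so hash(key1) & (n-1) and hash(key2) & (n-1) can come out different.

This reduces hash collisions.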
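
A worked example with concrete numbers makes the effect visible. The two hash codes below (0x00010005 and 0x00020005) and the table size n = 16 are hypothetical values chosen for this sketch, and spread and index are illustrative helpers that repeat the XOR and masking steps described above.

```java
class HashSpreadDemo {
    // XOR the high 16 bits into the low 16 bits, as hash() does for a non-null key.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two).
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h1 = 0x00010005; // hypothetical hashCode() of key1
        int h2 = 0x00020005; // hypothetical hashCode() of key2
        int n = 16;

        // Without spreading, only the low 4 bits matter and both keys collide.
        System.out.println("raw:    " + index(h1, n) + " vs " + index(h2, n));
        // With spreading, the high bits influence the index.
        System.out.println("spread: " + index(spread(h1), n) + " vs " + index(spread(h2), n));
    }
}
```

With these values both raw indexes are 5, while the spread indexes are 4 and 7.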

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

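The rough steps of put are:

First make sure the array exists (it is created via resize() if it is null or empty).
Then check whether index hash & (n-1) of the array is null, i.e. whether the slot for the key being inserted is already occupied (a hash collision).
	If the slot is empty, create a Node holding the key and value and store it there.
	If it is not empty:
		If the head node's key equals the key being inserted, no new Node is needed; that node's value is simply updated.
		Otherwise:
			If the head node is a tree node, call putTreeVal.
			If the head node is a linked-list node, traverse the list:
				If the key is not in the list, append a new Node at the tail, and call treeifyBin if the list has become long enough.
				If it is, that node's value is simply updated.
After the steps above, if a new node was added:
++modCount (the structural modification count)
and check whether the table needs to be resized.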
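
From the caller's side, the two outcomes of putVal (new node versus existing mapping) show up in the return value. The sketch below uses a hypothetical PutDemo class; put and putIfAbsent are the real Map methods, and putIfAbsent corresponds to calling putVal with onlyIfAbsent set to true.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutDemo {
    // Returns, in order: the result of the first put, of the overwriting put,
    // of putIfAbsent on an existing key, and the value finally stored.
    static List<Integer> putResults() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        Integer first = map.put("a", 1);             // no previous mapping: returns null
        Integer second = map.put("a", 2);            // existing key: value replaced, old value returned
        Integer third = map.putIfAbsent("a", 3);     // existing key: value kept, current value returned
        return Arrays.asList(first, second, third, map.get("a"));
    }

    public static void main(String[] args) {
        System.out.println(putResults());
    }
}
```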

The get method:

It calls getNode():

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

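The rough steps of get:

Check that the array exists, is not empty, and that the slot at the key's index is not empty:
	If the key of the head Node in that slot equals the key, return that node.
	If the head node's key differs:
		If the head node is a tree node, call getTreeNode to search the tree.
		If the head node is a linked-list node, traverse the list and return the node holding the key, if there is one.
If no node with the key is found, return null.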
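
One consequence worth illustrating: because HashMap permits null values, a null result from get does not by itself tell you whether the key is absent. A minimal sketch, with GetDemo and its sample map being made-up names for this example:

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {
    // A map with one key that is present but mapped to null.
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<String, String>();
        map.put("present", null);
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        // Both lookups return null...
        System.out.println(map.get("present"));
        System.out.println(map.get("missing"));
        // ...so containsKey is needed to tell the two cases apart.
        System.out.println(map.containsKey("present"));
        System.out.println(map.containsKey("missing"));
    }
}
```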

The remove method

It calls removeNode:

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

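The steps of removeNode:

Check that the array exists, is not empty, and that there is a Node at index hash & (n-1):
First find the node for the key:
	If the head node's key equals the key: node = p
	Otherwise:
		If the head node is a tree node: call getTreeNode
		If the head node is a linked-list node: traverse the list to find the Node for the key
Once the node is found, if matchValue is true the value must match as well (remove(key) passes matchValue = false, so the value is ignored there; remove(key, value) passes true):
	If the node is a tree node: call removeTreeNode
	If it is the head node: point the slot at its successor
	If it is a node inside the list: unlink it from its predecessor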
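
The effect of matchValue can be seen through the two public remove overloads. RemoveDemo and its removals method are hypothetical names for this sketch; remove(Object) and remove(Object, Object) are the real Map methods.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RemoveDemo {
    // Returns the results of a sequence of removals followed by the final size.
    static List<Object> removals() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);

        boolean wrongValue = map.remove("a", 99); // value does not match: nothing removed
        Integer removed = map.remove("a");        // value ignored: returns the old value
        Integer again = map.remove("a");          // key already gone: returns null
        boolean rightValue = map.remove("b", 2);  // key and value match: removed

        return Arrays.<Object>asList(wrongValue, removed, again, rightValue, map.size());
    }

    public static void main(String[] args) {
        System.out.println(removals());
    }
}
```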

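Finally, why is the capacity, i.e. the array size, a power of two?
The reason is simple: when the capacity is a power of two, capacity-1 has all of its low-order bits set, so hash & (capacity-1) equals hash modulo capacity (as a non-negative remainder) and can reach every index from 0 to capacity-1. With any other capacity the mask would contain zero bits, and some indexes could never be used.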
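
The point about unreachable indexes can be checked by brute force. In the sketch below, PowerOfTwoDemo and reachableIndices are made-up names, and the capacity of 10 is a hypothetical non-power-of-two value that HashMap itself would never use.

```java
import java.util.Set;
import java.util.TreeSet;

class PowerOfTwoDemo {
    // Collects every index that hash & (capacity - 1) produces for hashes 0..limit-1.
    static Set<Integer> reachableIndices(int capacity, int limit) {
        Set<Integer> seen = new TreeSet<Integer>();
        for (int hash = 0; hash < limit; hash++) {
            seen.add(hash & (capacity - 1));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Power of two: mask is 1111 in binary, every index 0..15 is reachable.
        System.out.println("capacity 16: " + reachableIndices(16, 1000));
        // Not a power of two: mask is 1001 in binary, most indexes are never used.
        System.out.println("capacity 10: " + reachableIndices(10, 1000));
    }
}
```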
