A Detailed Look at the Underlying Principles of HashMap

Hello everyone, 可傥 here. Long time no see — I was busy with a few things for a while and fell behind on posting, so today I'm back with a new article. Without further ado, let's dig into the underlying principles and implementation of HashMap in Java.

Overview

HashMap is implemented on top of a hash table; each element is a key-value pair, and both null keys and null values are allowed. It is an unordered collection and is not thread-safe; in multithreaded environments you can use ConcurrentHashMap from the java.util.concurrent package instead. HashMap implements the Serializable interface, so it supports serialization, and it implements Cloneable, so it can be cloned.

Data Structure

You have probably heard the saying: before JDK 1.8, HashMap was backed by an array plus linked lists; since JDK 1.8, it is backed by an array plus linked lists plus red-black trees. But if you read the JDK source, you will find the official comment notes that, under ideal hashing, the probability of a bin ever growing long enough to be converted to a red-black tree is less than 1 in 10 million. Here is the official comment:

    /* Ideally, under random hashCodes, the frequency of nodes in bins
     * follows a Poisson distribution with a parameter of about 0.5 on
     * average for the default resizing threshold of 0.75. Ignoring
     * variance, the expected occurrences of list size k are
     * (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million (the probability of a bin holding more than 8 nodes is under 1 in 10 million)
     * */

A bin is converted to a red-black tree only when it holds more than 8 nodes (and the array length is at least 64). Next, let's unpack the data structure with a diagram and the source code. (Figure: the JDK 1.8 layout — array + linked lists + red-black trees.) The diagram makes the post-JDK-1.8 data structure clear; listing HashMap's fields makes it even more concrete. The source and its original comments are reproduced in full below — read the English Javadoc if you are comfortable with it, or just read the translated annotations:


    /**
     * The default initial capacity - MUST be a power of two.
     * Default initial capacity: 16. Must be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     * Maximum capacity: 2^30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     * Default load factor: 0.75 (why 0.75 is discussed later).
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     * Bin count threshold for converting a linked list into a red-black tree: 8.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     * Bin count threshold for converting a red-black tree back into a linked list: 6.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     * Minimum table length before any bin may be treeified.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     * Linked-list node storing a key-value pair; implements Map.Entry.
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
		//... some methods omitted
    }
    
    /**
     * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
     * extends Node) so can be used as extension of either regular or
     * linked node.
     * Red-black tree node, storing a key-value pair in a treeified bin.
     */

    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }

        /**
         * Returns root of tree containing this node.
         */
        final TreeNode<K,V> root() {
            for (TreeNode<K,V> r = this, p;;) {
                if ((p = r.parent) == null)
                    return r;
                r = p;
            }
        }
        //... remaining tree methods omitted
    }

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     * The array where the data is stored.
     */
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     * Cached entrySet view used for iteration.
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     * The number of key-value pairs.
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified.
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     * The number of structural modifications: changes to the number
     * of mappings, or rehashes.
     */
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    //the size at which the next resize occurs (capacity * load factor)
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     * The load factor; the resize threshold is capacity * loadFactor.
     */
    final float loadFactor;


With the diagram and the source, each field's meaning is spelled out in the annotations above; now let's walk through the data structure itself. A HashMap does not allocate its table up front: the first put(K, V) initializes an array of length 16, and the bucket for each entry is computed from the key's hash. When the number of stored entries exceeds the resize threshold, the table is resized to twice its previous length, and both the threshold and size bookkeeping are updated. When the table length is at least 64 and a bucket's linked list grows beyond 8 nodes, that bucket is converted to a red-black tree; if, during a later resize, the number of nodes in a treeified bucket drops to 6 or fewer, it is converted back to a linked list. That is HashMap's data structure in a nutshell.
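A side note on the power-of-two invariant mentioned above: when you pass an initial capacity to the constructor, JDK 8's HashMap rounds it up to the next power of two with its tableSizeFor method. Below is a self-contained sketch — the class name is mine, but the method body mirrors the JDK 8 source:

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two >= cap by smearing the highest
    // set bit of (cap - 1) into every lower position, then adding one.
    // Mirrors JDK 8's HashMap.tableSizeFor.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(13)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

Subtracting 1 before smearing is what makes an exact power of two (like 16) map to itself rather than the next power up.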

How HashMap Places Elements

So how does HashMap decide which array slot an element lands in? Let's analyze it with the source code.


    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     * 
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }


From the source we can see that the hash is obtained by XORing the high 16 bits of hashCode() into the low 16 bits: (h = key.hashCode()) ^ (h >>> 16). The bucket index is then computed by ANDing the hash with (n - 1), where n is the table length. (Figure: index computation.) On insertion, put delegates to putVal(); let's analyze that with the source as well. (Figure: putVal() source.)
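The two steps above can be reproduced outside HashMap. The hash method below is copied from the JDK 8 source; indexFor and the class name are my own names for illustration:

```java
public class HashIndexDemo {
    // JDK 8's HashMap.hash: XOR the high 16 bits of hashCode()
    // into the low 16 bits so the high bits also affect the index.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of the given power-of-two capacity:
    // (n - 1) & hash equals hash % n when n is a power of two.
    static int indexFor(Object key, int capacity) {
        return (capacity - 1) & hash(key);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(null, 16)); // 0 -- null keys go to bucket 0
        System.out.println(indexFor("a", 16));  // 1 -- "a".hashCode() is 97, and 97 & 15 == 1
    }
}
```

This is why the capacity must be a power of two: only then is (n - 1) a string of low 1-bits, making the AND behave as a fast modulo.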

The execution flow of HashMap's put method can be understood from the flowchart. (Figure: put() flowchart.)

Resize Mechanism

Resizing is also clearest when read alongside the source.

(Figure: the resize() source code.)

The table starts at length 16, and a resize is triggered once the number of entries reaches load factor × capacity: the first threshold is 16 × 0.75 = 12. The capacity always doubles, staying a power of two, and the index computation (n - 1) & hash explains why it must. After a resize, the mask gains one more bit, so every entry either stays at its old index or moves to old index + oldCap: +16 after the first resize, +32 after the second, and so on. As the source shows, there is no need to recompute any hashes; the stored hash alone determines each entry's new position.
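The stay-or-move rule can be checked directly: after doubling, the new mask has exactly one extra bit, so the single bit (hash & oldCap) decides whether an entry stays put or moves up by oldCap. A small sketch (names are mine):

```java
public class ResizeSplitDemo {
    // New bucket index after the table doubles from oldCap to 2 * oldCap:
    // the extra mask bit (h & oldCap) picks between oldIdx and oldIdx + oldCap.
    static int newIndex(int h, int oldCap) {
        int oldIdx = (oldCap - 1) & h;
        return (h & oldCap) == 0 ? oldIdx : oldIdx + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // Check the rule against recomputing (newCap - 1) & h for many hashes.
        for (int h = 0; h < 1024; h++) {
            int direct = (2 * oldCap - 1) & h;
            if (direct != newIndex(h, oldCap))
                throw new AssertionError("mismatch at h=" + h);
        }
        System.out.println("stay-or-move rule verified");
    }
}
```

This is exactly the `(e.hash & oldCap) == 0` test you can see in JDK 8's resize() when it splits each bucket into a "low" and a "high" list.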

Why the Default Load Factor Is 0.75

A high load factor reduces space overhead and improves space utilization, but increases lookup cost; a low load factor shortens lookups, but wastes space and triggers more rehash operations. The default of 0.75 is a compromise between space utilization and lookup cost: the JDK comment models bin occupancy with a Poisson distribution, and at 0.75 long collision chains are already vanishingly rare.
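The JDK comment quoted earlier models bin occupancy as Poisson with parameter about 0.5, so its table of probabilities can be reproduced from P(k) = exp(-0.5) · 0.5^k / k!. A quick check (class name is mine):

```java
public class PoissonDemo {
    // Expected frequency of a bin holding exactly k nodes under the
    // Poisson(0.5) model from the JDK source comment:
    // P(k) = exp(-0.5) * 0.5^k / k!
    static double prob(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) factorial *= i;
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++)
            System.out.printf("%d: %.8f%n", k, prob(k));
        // k = 8 comes out around 0.00000006 -- under 1 in 10 million,
        // which is why treeification is expected to be rare in practice.
    }
}
```

Running this reproduces the 0.60653066, 0.30326533, ... series from the source comment.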

That's it for HashMap for now; later posts may walk through its internal methods one by one. This is 可傥, sharing what I've learned along the way. Good night, everyone. My CSDN blog: blog.csdn.net/kaneandblan…


Reprinted from juejin.im/post/7017842485502148621