HashMap: Implementation and Source Code Analysis

Preface

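Today I interviewed at a fairly large company. The process had three stages: a written test, a technical interview, and an interview with a manager. Overall it left me feeling bad. It wasn't that I had an off day; I realized that my fundamentals really are weak. For example, my résumé says I'm familiar with data structures and algorithms, yet when the interviewer asked how HashMap is implemented and what its details are, I stammered and could only give a vague answer: "I guess it's made of buckets or something. When you add an entry, it computes the hashCode of the key and puts the entry into one of the buckets. And the buckets are implemented as linked lists."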

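I'm exhausted. A relative referred me for this job, and I still performed this badly! Sigh. All I can do is keep going and work harder.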

Overview of HashMap

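According to the definition on Baidu, HashMap is a hash-table-based implementation of the Map interface. It provides all of the optional map operations and permits null values and the null key. (Interviewers often ask you to compare Hashtable and HashMap. Apart from Hashtable being thread-safe and not permitting null keys or values, the two are roughly equivalent.) One more thing to note: this class makes no guarantees about the order of the map; in particular, it does not guarantee that the order will remain constant over time.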

How HashMap Is Implemented

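Today the interviewer asked me how HashMap is implemented. I had actually read about it, but I was so nervous that I forgot everything... Awkward.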

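In Java 7, HashMap is implemented with an array plus linked lists. If you are familiar with these two data structures, you already know their trade-offs: an array gives fast access by index, while a linked list makes it cheap to add another element. Internally, HashMap maintains an array, and every slot of that array is a Node (the class was named Entry in Java 7; the source quoted in this post is from Java 8).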

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table; // the array of Node buckets

You may wonder: wasn't HashMap supposed to be made of an array and linked lists? Not so fast. Let's first look at the Node data structure.

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;   // hash code of the key
        final K key;      // key
        V value;          // value
        Node<K,V> next;   // next node in the same bucket
        // ... (remaining members omitted)
    }

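Notice that Node has a next field. As you have probably guessed, the HashMap array holds Node objects, and each Node uses next to point to the following node in the same bucket. That is what forms the linked list.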
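To make the "array of chains" picture concrete, here is a toy sketch. It is not the JDK code: TinyChainedMap and its Node are hypothetical names of my own, keys are assumed to be non-null Strings, the table never resizes, and new nodes are prepended rather than appended as in Java 8.

```java
// A toy map with a fixed-size bucket array; each bucket is a singly linked chain.
// Illustration only: no resizing, no trees, non-null String keys assumed.
class TinyChainedMap {
    // Hypothetical node type mirroring the fields of HashMap.Node
    static final class Node {
        final int hash;
        final String key;
        int value;
        Node next;

        Node(int hash, String key, int value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] table = new Node[16]; // length is a power of two

    void put(String key, int value) {
        int hash = key.hashCode();
        int i = (table.length - 1) & hash;               // pick the bucket
        for (Node e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                e.value = value;                         // key already present: overwrite
                return;
            }
        }
        table[i] = new Node(hash, key, value, table[i]); // prepend to the chain
    }

    Integer get(String key) {
        int hash = key.hashCode();
        for (Node e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;                                     // no node with this key
    }
}
```

Two keys that land in the same slot simply share a chain, and both put and get walk that chain comparing hash and key.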

Implementation of put

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

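From the Javadoc on put we can see that keys in a HashMap are unique: putting a key that already exists replaces the old value. put itself just calls putVal, passing the hash of the key, the key, the value, and two booleans. The first boolean, onlyIfAbsent, means "do not change the value if the key already exists"; the second, evict, is handed to the afterNodeInsertion callback (false means the table is in creation mode).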
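A small sketch of what onlyIfAbsent looks like from the outside. As far as I can tell from the JDK 8 source, putIfAbsent reaches putVal with onlyIfAbsent set to true, while put passes false. The class name PutDemo is my own.

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);           // new key
        map.put("a", 2);           // same key: the value is replaced, no duplicate key
        map.putIfAbsent("a", 3);   // key present with a non-null value: left unchanged
        map.putIfAbsent("b", 4);   // key absent: inserted
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.size() + " " + map.get("a") + " " + map.get("b")); // prints "2 2 4"
    }
}
```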

Implementation of putVal

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;    // table not allocated yet (or empty): create it via resize()
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);     // the bucket is empty: place a new Node there
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)        // the bin is a red-black tree: use the tree insertion
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
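                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {        // reached the end of the chain: append a new node
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st; the chain has reached TREEIFY_THRESHOLD
                            treeifyBin(tab, hash);        // convert the chain to a red-black tree (may resize instead while the table is small)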
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // the key already exists: replace the value if onlyIfAbsent is false or the old value is null
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
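        ++modCount;  // counts structural modifications (used by fail-fast iterators)
        if (++size > threshold)    // size exceeded the threshold (capacity * load factor): grow the table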
            resize();
        afterNodeInsertion(evict);
        return null;
    }

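The source teaches us something new: besides the array and linked lists, there is also a red-black tree! For why this is done, see the section on overly long chains below.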

One more thing worth noting: put returns the previous value associated with the key (or null if there was none).
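Going back to the putVal source above, the bucket is chosen with (n - 1) & hash. Because the table length n is always a power of two, this is the same as taking the hash modulo n, only cheaper. The sketch below checks that, and also reproduces the bit-spreading step that, to my knowledge, HashMap.hash() applies in JDK 8 before indexing. IndexDemo, spread, and indexFor are names I made up for illustration.

```java
class IndexDemo {
    // Same idea as HashMap.hash() in JDK 8: fold the high 16 bits into the low 16
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two)
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            int hash = spread(key);
            // Both expressions give the same bucket when n is a power of two
            System.out.println(key + " -> bucket " + indexFor(hash, n)
                    + " (floorMod gives " + Math.floorMod(hash, n) + ")");
        }
    }
}
```

The mask also works for negative hash values, which a plain % would map to a negative index.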

Implementation of remove

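Removal works much like insertion. removeNode takes the hash, the key, a value, and two booleans, matchValue and movable. When matchValue is true, the entry is removed only if its value equals the given value. When movable is false and the bin is a red-black tree, removing the node does not move other nodes. The relevant code: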

    /**
     * ...
     * @param movable if false do not move other nodes while removing
     * @return the node, or null if none
     */
    final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        ...
        if (node instanceof TreeNode)        // movable only matters when the bin is a red-black tree
            ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
        ...
    }
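The matchValue flag is visible through the public API: as I read the JDK 8 source, remove(key) calls removeNode with matchValue false, and the two-argument remove(key, value) calls it with matchValue true. A quick sketch (RemoveDemo is a name of my own):

```java
import java.util.HashMap;
import java.util.Map;

class RemoveDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();
        Integer old = map.remove("a");         // by key only: returns the removed value
        boolean wrong = map.remove("b", 99);   // value does not match: nothing is removed
        boolean right = map.remove("b", 2);    // value matches: the entry is removed
        System.out.println(old + " " + wrong + " " + right + " " + map.isEmpty());
        // prints "1 false true true"
    }
}
```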

Implementation of get

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    /**
     * Implements Map.get and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

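The get method is simple too. It uses the hash to locate the bucket, then compares each node's hash and key (by == or equals) until it finds a match. If there is none, it returns null.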
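One consequence of get returning e.value or null: a null result does not tell you whether the key is missing or is mapped to null. A minimal sketch of the difference, using containsKey to disambiguate (GetDemo is my own name):

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("present", "value");
        map.put("nullValue", null);   // HashMap allows null values
        map.put(null, "nullKey");     // and a single null key
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        System.out.println(map.get("nullValue"));          // null: the key exists, its value is null
        System.out.println(map.get("missing"));            // null: there is no such key
        System.out.println(map.containsKey("nullValue"));  // true
        System.out.println(map.containsKey("missing"));    // false
        System.out.println(map.get(null));                 // nullKey
    }
}
```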

The Problem of Overly Long Chains

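Simply put, in Java 7 every HashMap bucket holds a linked list. When hash collisions are severe, the chain in some buckets becomes very long, which hurts performance (a lookup, for example, degrades to O(n)).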

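Java 8's HashMap has a countermeasure: once a chain reaches TREEIFY_THRESHOLD, it is converted into a red-black tree. One benefit is that the worst-case lookup within that bucket improves from O(n) to O(log n).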

Hash Collisions

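Adding an entry, like most other operations, starts by computing a hash value. When different keys end up with the same hash value, we have a hash collision. Java 7 resolves collisions with linked lists: a colliding entry is added to the bucket's list. Java 8 resolves them with either a linked list or a red-black tree.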
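A well-known pair of colliding keys is "Aa" and "BB": both strings have hashCode 2112. The sketch below (CollisionDemo is my own name) shows that both entries still live side by side in the map, because the lookup compares keys with equals after matching the hash.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);   // same hashCode as "Aa", so it lands in the same bucket
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode());  // true
        Map<String, Integer> map = build();
        System.out.println(map.get("Aa") + " " + map.get("BB")); // prints "1 2"
    }
}
```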

Concurrency

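As mentioned earlier, HashMap is not thread-safe. If you need a thread-safe map, ConcurrentHashMap is the usual recommendation. But what if you want to use a HashMap and still be thread-safe?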

Here are a few approaches I found online:

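import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hashtable
Map<String, String> hashtable = new Hashtable<>();

// synchronizedMap
Map<String, String> synchronizedHashMap = Collections.synchronizedMap(new HashMap<String, String>());

// ConcurrentHashMap
Map<String, String> concurrentHashMap = new ConcurrentHashMap<>();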

Let's analyze the three approaches above.

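Hashtable: if you look at the source, you will see that get, put, and the other methods of Hashtable are declared with the synchronized keyword. This means that while one writer thread holds the object's lock, no other thread, readers included, can operate on the Hashtable. That is inefficient.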

synchronizedMap:

    public static <K,V> Map<K,V> synchronizedMap(Map<K,V> m) {
        return new SynchronizedMap<>(m);
    }

    /**
     * @serial include
     */
    private static class SynchronizedMap<K,V>
        implements Map<K,V>, Serializable {
        private static final long serialVersionUID = 1978198479659022715L;

        private final Map<K,V> m;     // Backing Map
        final Object      mutex;        // Object on which to synchronize

        SynchronizedMap(Map<K,V> m) {
            this.m = Objects.requireNonNull(m);
            mutex = this;
        }

        SynchronizedMap(Map<K,V> m, Object mutex) {
            this.m = m;
            this.mutex = mutex;
        }

        public int size() {
            synchronized (mutex) {return m.size();}
        }
        public boolean isEmpty() {
            synchronized (mutex) {return m.isEmpty();}
        }
        public boolean containsKey(Object key) {
            synchronized (mutex) {return m.containsKey(key);}
        }
        public boolean containsValue(Object value) {
            synchronized (mutex) {return m.containsValue(value);}
        }
        public V get(Object key) {
            synchronized (mutex) {return m.get(key);}
        }

        public V put(K key, V value) {
            synchronized (mutex) {return m.put(key, value);}
        }
        public V remove(Object key) {
            synchronized (mutex) {return m.remove(key);}
        }
        public void putAll(Map<? extends K, ? extends V> map) {
            synchronized (mutex) {m.putAll(map);}
        }
        public void clear() {
            synchronized (mutex) {m.clear();}
        }
        // ... (remaining methods omitted)
    }

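The source shows that calling synchronizedMap() returns a SynchronizedMap object, and SynchronizedMap wraps every operation in a synchronized block on a shared mutex. That is what makes operations on the Map thread-safe.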
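As a rough sanity check of that claim, the sketch below fills a map from several threads, each writing its own distinct keys, and returns the final size. SyncMapDemo and fillConcurrently are hypothetical names, and the thread and key counts are arbitrary. With a synchronized wrapper (or a ConcurrentHashMap) no entry should be lost.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SyncMapDemo {
    // Fill the given map from several threads; each thread writes its own distinct keys.
    static int fillConcurrently(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();                       // wait for every writer to finish
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> safe = Collections.synchronizedMap(new HashMap<>());
        System.out.println(fillConcurrently(safe, 4, 1000)); // 4000
    }
}
```

Passing a plain HashMap to the same method is not safe: entries may be lost or the map may be corrupted, though a single run will not necessarily show it.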