HashMap深入学习

一、简单介绍

HashMap是一种比较常用的容器，其结构是数组加链表。外层数组，每个数组元素是链表。用链表解决hash冲突。结合了数组易寻址及链表易插入删除的优点。

二、HahMap的结构

1、主要属性：

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 16;

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry[] table;

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;

    /**
     * The next size value at which to resize (capacity * load factor).
     * @serial
     */
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient volatile int modCount;

分类	属性名	详细
类变量（default static final）	DEFAULT_INITIAL_CAPACITY	默认初始容量，初始化不设置初始值以16为初始值，初始值必须是2的幂值。（table数组的size）
类变量（default static final）	MAXIMUM_CAPACITY	最大容量，1>>>30(table数组size)
类变量（default static final）	DEFAULT_LOAD_FACTOR	默认比例因子 0.75，当初始化没有传值时使用。（怎么计算的待研究）
实例变量（default transient Entry[]）	table	数组，用来存放hashMap单元数据，每个数组存一个链表结构。数组大小为2的幂值，大小不够可以resize
实例变量（default transient int）	size	HashMap数据总数
实例变量（default）	threshold	table resize 的阈值=capacity * load factor
实例变量（final float）	loadFactor	加载因子 size/capacity, 默认为0.75
实例变量（transient volatile int）	modCount	有新增entry就会加1,remove也加1。用来做简单的并发控制。

2、主要方法实现

1、初始化

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)(capacity * loadFactor);
        table = new Entry[capacity];
        init();
    }

入参：（1）initialCapacity 初始容量，指外层数组的初始大小。（2）loadFactor 加载因子，当 size/capacity>loadFactor 需要resize扩大数组大小，默认为0.75。注：数组容量为大于传入初始容量的最小2的幂值。 2、hash方法

    /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

入参是key值的hashcode，目的是让hash表更松散，具体原理待研究。因为hashmap表长度是2的幂，经过indexfor的计算，就是取hashcode的末几位，hash方法的作用，是让末几位受到其他位差异的影响更多，让hashmap更松散。

3、indexfor方法

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

因为hashmap表长度是2的幂，经过indexfor的计算，就是取hashcode的末几位

4、get方法

    /**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

HashMap允许key为null，key为null放在数组第一位。

获取方法就是，先根据key算出数组下标，取到链表，再遍历链表获取到key相同的值，获取其value值。

遍历的时候，会先比较hash值，我理解这是为了提高性能，因为遍历的时候大多数情况应该是不等的，而大多数的不等通过hash值得比较就能过滤掉。不用通过equals比较，因为hash值是算好的，equals还得再算一遍，有些复杂的可能性能消耗不少。

5、put方法

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

根据key算出table下标；取到对应链表，遍历链表，获取key值相同的entry更新value值返回oldValue；如果没有新增entry，返回null。 6、addEntry 方法

    /**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
	Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length);
    }

根据指定的hash值、key、value、bucketIndex（table下标）新增entry；如果大小超过阈值（capacity*loadFactor），table阔成原来的两倍。 7、resize方法

    /**
     * Rehashes the contents of this map into a new array with a
     * larger capacity.  This method is called automatically when the
     * number of keys in this map reaches its threshold.
     *
     * If current capacity is MAXIMUM_CAPACITY, this method does not
     * resize the map, but sets threshold to Integer.MAX_VALUE.
     * This has the effect of preventing future calls.
     *
     * @param newCapacity the new capacity, MUST be a power of two;
     *        must be greater than current capacity unless current
     *        capacity is MAXIMUM_CAPACITY (in which case value
     *        is irrelevant).
     */
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable);
        table = newTable;
        threshold = (int)(newCapacity * loadFactor);
    }

如果旧表容量已经到MAXIMUM_CAPACITY， threshold设置成最大int值，table不做扩大。否则根据传入新capacity，创建新table，将旧table数据迁移到新table，重新计算threshold（阈值）。

8、transfer方法，将oldTable数据迁移到newTable

    /**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                //oldtable数据设置为null,我理解能够快速垃圾回收。
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }

遍历旧table的entry，重新计算index，存入到new table。

9、HashIterator内部类，HashMap迭代器基类。

private abstract class HashIterator<E> implements Iterator<E> {
        Entry<K,V> next;	// next entry to return
        int expectedModCount;	// For fast-fail
        int index;		// current slot
        Entry<K,V> current;	// current entry

        HashIterator() {
            expectedModCount = modCount;
            if (size > 0) { // advance to first entry
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
        }

        public final boolean hasNext() {
            return next != null;
        }

        final Entry<K,V> nextEntry() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Entry<K,V> e = next;
            if (e == null)
                throw new NoSuchElementException();

            if ((next = e.next) == null) {
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
	    current = e;
            return e;
        }
        
        //可以用这个方法一边遍历一边删除。
        public void remove() {
            if (current == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Object k = current.key;
            current = null;
            HashMap.this.removeEntryForKey(k);
            expectedModCount = modCount;
        }

    }

    private final class ValueIterator extends HashIterator<V> {
        public V next() {
            return nextEntry().value;
        }
    }

    private final class KeyIterator extends HashIterator<K> {
        public K next() {
            return nextEntry().getKey();
        }
    }

    private final class EntryIterator extends HashIterator<Map.Entry<K,V>> {
        public Map.Entry<K,V> next() {
            return nextEntry();
        }
    }

    // Subclass overrides these to alter behavior of views' iterator() method
    Iterator<K> newKeyIterator()   {
        return new KeyIterator();
    }
    Iterator<V> newValueIterator()   {
        return new ValueIterator();
    }
    Iterator<Map.Entry<K,V>> newEntryIterator()   {
        return new EntryIterator();
    }

三、优势

1、功能方面：

（1）根据key值寻找对应value，应用范围非常广。

2、性能方面

（1）结合数组寻址快速，链表插入删除快速的优点，寻找删除快速。

四、劣势

1、功能

（1）无序

2、性能

3、可靠性

（1）非线程安全的，modCout参数有简单的防并发功能。

五、应用场景和使用注意

（1）大小已知时，HashMap初始化需要传入长度参数，因为不传参数，HashMap的bucket个数不够会做resize比较耗性能。

HashMap深入学习

猜你喜欢