Java Improvement Series ----- HashMap

HashMap is one of the most frequently used Collections. It is the hash-table-based implementation of the Map interface and stores data in key-value form. In a HashMap, the key and value are always treated as a single unit: the system computes the storage position from the key using a hash algorithm, so we can always store and retrieve the value quickly by key. Below we analyze how HashMap stores and retrieves data.

1. Definition

      HashMap implements the Map interface and extends AbstractMap. The Map interface defines the rules for mapping keys to values, while the AbstractMap class provides a skeletal implementation of the Map interface to minimize the effort required to implement it. In fact AbstractMap already implements Map, so declaring implements Map again here is redundant; presumably the author of the class spelled it out to make the declaration clearer.

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

2. Constructors

      HashMap provides three constructors:

      HashMap(): constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).

      HashMap(int initialCapacity): constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).

      HashMap(int initialCapacity, float loadFactor): constructs an empty HashMap with the specified initial capacity and load factor.

      Two parameters are mentioned here: the initial capacity and the load factor. Both are important to HashMap performance. The capacity is the number of buckets in the hash table, and the initial capacity is the capacity when the table is created. The load factor is a measure of how full the hash table is allowed to become before its capacity is automatically increased; it gauges how fully the table's space is used. The larger the load factor, the fuller the table; the smaller, the sparser. For a hash table that resolves collisions by chaining, the average time to find an element is O(1 + a), where a is the load factor. A larger load factor therefore uses space more fully at the cost of slower lookups; a load factor that is too small leaves the table's data too sparse and wastes space badly. The default load factor is 0.75, and in most cases there is no need to change it.
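As a quick illustration of these parameters, here is a minimal sketch (the class name is just for this example) that uses each of the three constructors; the capacity and load-factor values in the comments are the documented defaults:

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapCtors {
    public static void main(String[] args) {
        // default constructor: initial capacity 16, load factor 0.75
        Map<String, Integer> a = new HashMap<>();
        // explicit initial capacity, default load factor 0.75;
        // internally the capacity is rounded up to a power of two (32 here)
        Map<String, Integer> b = new HashMap<>(20);
        // explicit capacity and load factor: a resize is triggered
        // once the size reaches roughly capacity * loadFactor entries
        Map<String, Integer> c = new HashMap<>(32, 0.5f);

        a.put("x", 1);
        b.put("y", 2);
        c.put("z", 3);
        System.out.println(a.size() + " " + b.size() + " " + c.size()); // 1 1 1
    }
}
```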

      HashMap is a data structure that supports fast storage and retrieval; to understand its performance, you must understand its data structure.

3. Data Structure

      We know that the two most commonly used building blocks in Java are arrays and simulated pointers (references); almost all data structures can be built by combining the two, and HashMap is no exception. HashMap is in fact a "chained hash table", with the following data structure:

[Figure: HashMap data structure diagram - an array in which each slot holds a linked list]

      As the figure shows, the underlying implementation of HashMap is still an array; it is just that each element of the array is a linked list. The initialCapacity parameter determines the length of that array. Below is the source of the HashMap constructor:

public HashMap(int initialCapacity, float loadFactor) {
        // the initial capacity cannot be negative
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: "
                    + initialCapacity);
        // the initial capacity cannot exceed the maximum capacity (2^30)
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        // the load factor must be a positive number
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: "
                    + loadFactor);

        // find the smallest power of 2 that is >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        // set the resize threshold; once size reaches it the table is expanded
        threshold = (int) (capacity * loadFactor);
        // initialize the table array
        table = new Entry[capacity];
        init();
    }

      As the source shows, every time a new HashMap is constructed, a table array is initialized. The elements of the table array are Entry nodes.

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        final int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
        .......
    }

      Entry is an inner class of HashMap. It contains the key, the value, the next node next, and the hash value. The next field is crucial: it is what links the items stored in the table array into linked lists.

      Having briefly analyzed the data structure of HashMap, we now explore how HashMap achieves fast storage and retrieval.

4. Storage: put(key, value)

      First, let's look at the source code:

public V put(K key, V value) {
        // when the key is null, putForNullKey stores the value at table[0];
        // HashMap permits one null key
        if (key == null)
            return putForNullKey(value);
        // compute the hash of the key
        int hash = hash(key.hashCode());          // ------ (1)
        // compute the position of the hash in the table array
        int i = indexFor(hash, table.length);     // ------ (2)
        // iterate the chain starting at table[i], looking for the key
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            Object k;
            // same hash (so same chain) and equal key:
            // overwrite the value and return the old one
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;                  // old value replaced by new
                e.recordAccess(this);
                return oldValue;                  // return the old value
            }
        }
        // increment the structural modification count
        modCount++;
        // add the key-value pair at index i
        addEntry(hash, key, value, i);
        return null;
    }

      From the source we can clearly see how HashMap saves data: first it checks whether the key is null; if so, it calls putForNullKey directly. If the key is not null, it computes the key's hash value and uses it to locate the index in the table array. If elements already exist at that position, it compares keys to see whether the same key is present; if it is, the original value is overwritten; otherwise the new element is stored at the head of the chain, with the earlier elements behind it. If there is no element at that position, the pair is stored directly. The process looks simple, but there is more beneath the surface, as follows:

      1. The iteration. The reason for iterating here is to guard against an existing entry with the same key: if an entry with the same hash and an equal key is found, HashMap replaces the old value with the new value and leaves the key untouched. This explains why a HashMap can never contain two identical keys.
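This replace-on-duplicate behavior is easy to observe through the public API; the sketch below (class name is illustrative) shows that put returns the previous value when a key is overwritten:

```java
import java.util.HashMap;

public class PutOverwrite {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        // the first put of a key returns null (no previous value)
        Integer prev1 = map.put("k", 1);
        // a second put with the same key overwrites and returns the old value
        Integer prev2 = map.put("k", 2);
        // the map still holds only one entry for "k"
        System.out.println(prev1 + " " + prev2 + " " + map.size()); // null 1 1
    }
}
```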

      2. Lines (1) and (2). Here lies the essence of HashMap. First is the hash method, a purely mathematical computation that produces the hash value h.

static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

      We know that the data in the table should be distributed as evenly as possible (ideally each bucket holds only one element, so lookups hit directly), neither too dense nor too sparse: too dense slows queries, too sparse wastes space. After computing the hash value, how do we ensure the elements are evenly distributed across the table? The first thought is the modulo operation, but modulo is relatively expensive, so HashMap handles it differently: it calls the indexFor method.

static int indexFor(int h, int length) {
        return h & (length-1);
    }

      The length of the HashMap's underlying array is always a power of two, guaranteed by the capacity <<= 1 loop in the constructor. When the length is 2^n, h & (length - 1) is equivalent to taking h modulo length, but much faster than a direct modulo; this is one of HashMap's speed optimizations. Why 2^n matters is explained below.
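The equivalence can be checked directly. This small sketch copies the one-line indexFor computation and compares it against the modulo for a power-of-two length (the class name is just for the example):

```java
public class IndexForDemo {
    // the same computation as HashMap's indexFor
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // a power of two, as HashMap guarantees
        for (int h = 0; h < 100; h++) {
            // for power-of-two lengths the bitmask equals the modulo
            if (indexFor(h, length) != h % length)
                throw new AssertionError("mismatch at h=" + h);
        }
        System.out.println("h & (length-1) == h % length for all tested h");
    }
}
```

Note the equivalence holds only for non-negative h and power-of-two lengths, which is exactly the situation the constructor arranges.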

      Returning to indexFor, the method has only one statement, h & (length - 1). Beyond standing in for the modulo operation, it has a very important responsibility: distributing the table data uniformly and making full use of the space.

      Suppose the length is 16 (a power of two) or 15, and h is 5, 6, or 7.

      h    h & (16 - 1)    h & (15 - 1)
      5         5               4
      6         6               6
      7         7               6

      When length = 15, h = 6 and h = 7 give the same result, meaning they are stored at the same position in the table; in other words, a collision occurs, and 6 and 7 form a linked list at one position, which slows down queries. Admittedly three numbers are not much of a sample, so let's look at h = 0 through 15.

[Table: h & 14 versus h & 15 for h = 0..15, showing 8 collisions when length = 15]

      From the table we see that a total of 8 collisions occur, and the wasted space is considerable: positions 1, 3, 5, 7, 9, 11, 13, and 15 hold no records, i.e. no data is ever stored there. This is because when these hashes are ANDed with 14, the last bit of the result is always 0, so positions 0001, 0011, 0101, 0111, 1001, 1011, 1101, and 1111 can never store data. The usable space shrinks, which further increases the collision probability, and queries slow down. When length = 16, length - 1 = 15, i.e. 1111, so the AND preserves the low bits of the hash unchanged and every slot remains reachable. So when length = 2^n, different hash values collide with lower probability, the data is distributed more evenly across the table array, and queries are faster.
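The collision counts above can be reproduced in a few lines; this sketch (class name is illustrative) counts how many distinct bucket indices h & (length - 1) produces for h = 0..15 under each length:

```java
import java.util.HashSet;
import java.util.Set;

public class CollisionDemo {
    public static void main(String[] args) {
        // collect the distinct bucket indices for h = 0..15
        Set<Integer> buckets15 = new HashSet<>();
        Set<Integer> buckets16 = new HashSet<>();
        for (int h = 0; h <= 15; h++) {
            buckets15.add(h & (15 - 1)); // mask 14 = 1110: last bit always 0
            buckets16.add(h & (16 - 1)); // mask 15 = 1111: identity for 0..15
        }
        // length 15 uses only 8 slots (all even); length 16 uses all 16
        System.out.println(buckets15.size() + " vs " + buckets16.size()); // 8 vs 16
    }
}
```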

      Let's review the put flow here: when we add a key-value pair to a HashMap, the system first computes the key's hash value, then uses that hash to determine the storage position in the table. If there is no element at that position, the pair is inserted directly. Otherwise it iterates the element chain at that position, comparing the hash values of the keys in turn. If the two hashes are equal and the keys are equal (e.hash == hash && ((k = e.key) == key || key.equals(k))), the new Entry's value overwrites the existing node's value. If the hashes are equal but the keys are not, the new node is inserted at the head of the chain. The concrete implementation is in the addEntry method, as follows:

void addEntry(int hash, K key, V value, int bucketIndex) {
        // fetch the Entry currently at bucketIndex
        Entry<K, V> e = table[bucketIndex];
        // place the new Entry at bucketIndex and point it at the previous Entry
        table[bucketIndex] = new Entry<K, V>(hash, key, value, e);
        // if the element count exceeds the threshold, double the capacity
        if (size++ >= threshold)
            resize(2 * table.length);
    }

      Two points in this method deserve attention:

      First, how a chain is created. This is a very elegant design. The system always places the new Entry object at bucketIndex. If an object already exists there, the newly added Entry points at the existing Entry, forming an Entry chain; if there is no Entry at bucketIndex, i.e. e == null, the new Entry points at null and no chain is formed.
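The head insertion can be mimicked with a toy stand-in for Entry (this is not the JDK class, just a sketch of the linking logic): each new node is constructed with the old bucket head as its next, exactly as addEntry does.

```java
public class ChainDemo {
    // minimal stand-in for HashMap.Entry, for illustration only
    static class Entry {
        final String key;
        final Entry next;
        Entry(String key, Entry next) { this.key = key; this.next = next; }
    }

    public static void main(String[] args) {
        // simulate addEntry at one bucket: each new entry points at the old head
        Entry bucket = null;                 // empty bucket: e == null
        bucket = new Entry("a", bucket);     // a -> null
        bucket = new Entry("b", bucket);     // b -> a -> null
        bucket = new Entry("c", bucket);     // c -> b -> a -> null

        StringBuilder order = new StringBuilder();
        for (Entry e = bucket; e != null; e = e.next)
            order.append(e.key);
        System.out.println(order); // cba: the newest entry is the chain head
    }
}
```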

      Second, resizing.

      As the number of elements in a HashMap grows, the collision probability rises and the resulting chains get longer and longer, which inevitably hurts the HashMap's speed. To keep the HashMap efficient, the system must expand it at a certain threshold: the point at which the number of elements equals table array length * load factor. Resizing is a very expensive operation, because the positions of all the data in the new table array must be recomputed and the data copied over. So if we know in advance how many elements the HashMap will hold, presetting that count can noticeably improve the HashMap's performance.
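A common sizing rule that follows from the threshold formula is to pass roughly expectedSize / loadFactor + 1 as the initial capacity, so no resize happens while inserting. A minimal sketch (class name is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 1000;
        // default constructor: the table resizes repeatedly as the map grows
        Map<Integer, Integer> grown = new HashMap<>();
        // presized: capacity chosen so size never reaches capacity * 0.75,
        // so inserting `expected` entries triggers no resize
        Map<Integer, Integer> presized = new HashMap<>((int) (expected / 0.75f) + 1);

        for (int i = 0; i < expected; i++) {
            grown.put(i, i);
            presized.put(i, i);
        }
        // both hold the same data; only the internal resizing work differs
        System.out.println(grown.size() + " " + presized.size()); // 1000 1000
    }
}
```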

5. Retrieval: get(key)

      Compared with storing into a HashMap, retrieval is fairly simple: use the key's hash to find the Entry at the corresponding index in the table array, then return the value associated with that key.

public V get(Object key) {
        // for a null key, getForNullKey returns the corresponding value
        if (key == null)
            return getForNullKey();
        // compute the hash from the key's hashCode
        int hash = hash(key.hashCode());
        // walk the chain at the computed index in the table array
        for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            Object k;
            // if this entry's key matches the searched key, return its value
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

      The ability to retrieve a value quickly by key here owes not only to HashMap's data structure but also, in no small part, to Entry. As mentioned earlier, HashMap does not store key and value separately; it treats them as one unit, the key-value pair, and that unit is the Entry object, the value being essentially just an attachment to the key. During storage, the system decides the Entry's position in the table array from the key's hashCode; during retrieval, it likewise uses the key's hashCode to locate the corresponding Entry object.
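One consequence of get's final return null is worth noting in a short sketch (class name is illustrative): since HashMap allows null values, get returns null both for a missing key and for a key mapped to null, so containsKey is needed to tell the two cases apart.

```java
import java.util.HashMap;

public class GetDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("present", 42);
        map.put("nullValue", null); // HashMap also permits null values

        System.out.println(map.get("present"));           // 42
        // null for a missing key AND for a stored null value
        System.out.println(map.get("missing"));           // null
        System.out.println(map.get("nullValue"));         // null
        // containsKey distinguishes the two cases
        System.out.println(map.containsKey("nullValue")); // true
        System.out.println(map.containsKey("missing"));   // false
    }
}
```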

Origin www.cnblogs.com/diandianquanquan/p/11422641.html