JDK source code learning - HashMap

What is hash?

  
   A hash function takes input data of any length and produces a fixed-length output, the hash value, which serves as a fingerprint of the input. The input space is generally much larger than the output space, so different inputs may produce the same hash value, and it is difficult to recover the input from the hash value alone. In this sense a hash function acts like a lossy compression step: it maps messages of different lengths to fixed-length values.

  
   A hash function has one useful property: for the same hash function, if two computed hash values differ, the inputs must differ. The converse does not hold: two different inputs may still produce the same hash value.

  
   When different inputs produce the same hash value, a hash collision has occurred. Collisions must be resolved. The usual approaches are open addressing (the open address method), separate chaining (the chain address method), and rehashing.
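   As a small illustration of a collision, the sketch below uses two short strings whose String.hashCode() values happen to be equal. The class name HashCollisionDemo and the helper isCollision are made up for this example.

```java
class HashCollisionDemo {
    /** True when x and y are different objects by equals() but share a hash code. */
    static boolean isCollision(Object x, Object y) {
        return !x.equals(y) && x.hashCode() == y.hashCode();
    }

    public static void main(String[] args) {
        // String.hashCode() for a two-character string is c0 * 31 + c1:
        // "Aa" -> 65 * 31 + 97 = 2112, "BB" -> 66 * 31 + 66 = 2112.
        String a = "Aa";
        String b = "BB";
        System.out.println(a.hashCode());      // 2112
        System.out.println(b.hashCode());      // 2112
        System.out.println(isCollision(a, b)); // true
    }
}
```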

  
   Open addressing: when a collision occurs, probe other slots of the hash table in turn until an empty one is found, and store the element there. As long as the table is large enough to still have free slots, an empty one can be found.
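   A minimal sketch of open addressing, assuming the simplest probe sequence (linear probing: try the next slot, wrapping around). The class LinearProbingSet is hypothetical and only stores int keys; it is not how java.util.HashMap works.

```java
class LinearProbingSet {
    private final Integer[] slots;   // null marks an empty slot

    LinearProbingSet(int capacity) {
        slots = new Integer[capacity];
    }

    /** Stores key and returns the slot index used, or -1 if the table is full. */
    int add(int key) {
        int start = Math.floorMod(key, slots.length);
        for (int step = 0; step < slots.length; step++) {
            int i = (start + step) % slots.length;   // next slot, wrapping around
            if (slots[i] == null || slots[i] == key) {
                slots[i] = key;
                return i;
            }
        }
        return -1;
    }

    boolean contains(int key) {
        int start = Math.floorMod(key, slots.length);
        for (int step = 0; step < slots.length; step++) {
            Integer v = slots[(start + step) % slots.length];
            if (v == null) return false;   // an empty slot ends the probe sequence
            if (v == key) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbingSet set = new LinearProbingSet(8);
        System.out.println(set.add(1));   // 1
        System.out.println(set.add(9));   // 2, because slot 1 (9 mod 8) is taken
        System.out.println(set.add(17));  // 3, because slots 1 and 2 are taken
    }
}
```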

  
   Separate chaining: this approach combines two data structures, an array and a linked list. An array is fast to search: an element can be reached directly through its index. Its weakness is slow insertion and deletion: inserting in the middle of an array means shifting many elements, which is relatively inefficient. A linked list is the opposite: insertion and deletion are cheap because only the pointers of the neighboring nodes change, but searching is slow because the list has to be walked. Separate chaining combines the strengths of both: the array is used to locate a position, and the linked lists are used for insertion and deletion.

  
   Elements that hash to the same array position are linked into one list, and the head node of that list is stored in the array, so each slot of the array corresponds to one linked list.
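   The array-plus-linked-list layout can be sketched as follows. ChainedTable and its Node class are hypothetical names for this illustration, the table has a fixed size of 8, and it never resizes; the real HashMap is more involved.

```java
class ChainedTable {
    /** One node of a bucket's linked list. */
    static final class Node {
        final String key;
        int value;
        Node next;

        Node(String key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] buckets = new Node[8];   // each slot holds the head of a list

    private int indexOf(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        int i = indexOf(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {   // same key: overwrite the value
                n.value = value;
                return;
            }
        }
        buckets[i] = new Node(key, value, buckets[i]);   // new node becomes the head
    }

    Integer get(String key) {
        for (Node n = buckets[indexOf(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    int chainLength(String key) {
        int len = 0;
        for (Node n = buckets[indexOf(key)]; n != null; n = n.next) len++;
        return len;
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable();
        t.put("Aa", 1);
        t.put("BB", 2);   // same hash code as "Aa", so it lands in the same bucket
        System.out.println(t.get("Aa"));          // 1
        System.out.println(t.get("BB"));          // 2
        System.out.println(t.chainLength("Aa"));  // 2
    }
}
```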

  
   Rehashing: when a hash address collides, compute another address with a different hash function, and repeat until no collision occurs.
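   A rough sketch of the rehashing idea, assuming a small fixed family of hash functions that are tried in order. The class RehashTable, the table size of 11, and the three functions are all invented for this example; a real implementation would need a strategy for the case where every function collides.

```java
import java.util.function.IntUnaryOperator;

class RehashTable {
    private static final int SIZE = 11;
    private final Integer[] slots = new Integer[SIZE];   // null marks an empty slot

    // Hypothetical family of hash functions, tried one after another.
    private final IntUnaryOperator[] hashes = {
        k -> Math.floorMod(k, SIZE),
        k -> Math.floorMod(k / SIZE, SIZE),
        k -> Math.floorMod(k * 7 + 3, SIZE)
    };

    /** Stores key and returns the slot used, or -1 if every hash function collided. */
    int add(int key) {
        for (IntUnaryOperator h : hashes) {
            int i = h.applyAsInt(key);
            if (slots[i] == null || slots[i] == key) {
                slots[i] = key;
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        RehashTable t = new RehashTable();
        System.out.println(t.add(5));    // 5  (5 mod 11)
        System.out.println(t.add(16));   // 1  (16 mod 11 = 5 collides, 16 / 11 = 1 is free)
        System.out.println(t.add(27));   // 2  (27 mod 11 = 5 collides, 27 / 11 = 2 is free)
    }
}
```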

HashMap

  
   In the JDK 1.7 source, HashMap uses separate chaining: it keeps an array of type Entry, and each array slot holds a linked list of Entry objects. Each Entry holds one key-value pair.

 
  Storing elements 
 

  
   To add a key-value pair to a HashMap, call the put method.

public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        // Using the computed hash, locate index i in the table array and insert the key-value pair there
        addEntry(hash, key, value, i);
        return null;
    }

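   The put method first computes a hash value from the key, and that hash value determines where the entry is stored in the array (that is, its array index). A null key is handled separately by putForNullKey. If the slot is empty, the new entry is simply placed there. If the slot already holds an entry, the two keys were mapped to the same array index. In that case the keys are compared: if an existing entry has the same hash and its key is the same object or is equal according to equals, the new value overwrites the old value, the key itself is left unchanged, and the old value is returned. If no existing key matches, the entries are kept as a linked list: the new Entry is placed at the head of the list and the existing entries follow it.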
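   The return values of put can be observed from the outside with a short program. This is only a usage sketch with the public java.util.HashMap API; the class name PutDemo is made up, and the keys "Aa" and "BB" are chosen because their hash codes are equal.

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("Aa", 1));  // null: no previous value for this key
        System.out.println(map.put("Aa", 2));  // 1: the key matched, the old value is returned
        System.out.println(map.put("BB", 3));  // null: same hash code as "Aa" but a different key
        System.out.println(map.size());        // 2
        System.out.println(map.get("Aa"));     // 2
    }
}
```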
   
 

 
  Reading elements 
 

   To read an element from a HashMap, call the get method.
   
 

public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

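   The get method first checks whether the key is null. If it is, getForNullKey returns the value stored under the null key (or null if there is none). Otherwise it calls getEntry, which computes the hash value of the key, uses it to locate the corresponding slot of the table array, and then walks the linked list in that slot, comparing keys until it finds the matching entry.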
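   A short usage sketch of get with the public java.util.HashMap API. It shows that a null key can hold a value, and that a null result from get does not by itself say whether the key is present. The class name GetDemo is made up.

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "stored under the null key");
        map.put("k", null);   // a present key with a null value

        System.out.println(map.get(null));               // stored under the null key
        System.out.println(map.get("k"));                // null
        System.out.println(map.get("missing"));          // null
        // containsKey tells the two null results apart.
        System.out.println(map.containsKey("k"));        // true
        System.out.println(map.containsKey("missing"));  // false
    }
}
```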
   
 

   How HashMap grows its capacity
   
 

   The default initial capacity of the array is 1 shifted left by 4 bits, that is, 2^4 = 16.
   
 

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

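   As more and more elements are added to a HashMap, hash collisions become more likely, because the length of the array is fixed. To keep lookups efficient, the array has to be enlarged. When exactly that happens depends on the load factor, whose default value is 0.75.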
   
 

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

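   When the number of elements reaches the threshold, which is the array capacity multiplied by the load factor, the capacity is doubled. (In JDK 1.7, addEntry additionally checks that the target slot is already occupied before it resizes.)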
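   The arithmetic behind the threshold can be sketched as below. This is a simplified model, not the JDK code: the class ResizeThresholdDemo and its threshold helper are hypothetical, and the upper bound that the real implementation applies at the maximum capacity is ignored.

```java
class ResizeThresholdDemo {
    /** Simplified threshold: capacity times load factor, truncated to an int. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f;
        int capacity = 1 << 4;   // default initial capacity, 16
        for (int round = 0; round < 3; round++) {
            System.out.println(capacity + " -> " + threshold(capacity, loadFactor));
            capacity <<= 1;      // doubling, as a resize does
        }
        // Prints:
        // 16 -> 12
        // 32 -> 24
        // 64 -> 48
    }
}
```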
   
 

 
  Why is the load factor 0.75 and not some other number? 
 

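   To answer this, consider what the load factor means. The load factor measures how full a hash table is allowed to get: the larger its value, the higher the space utilization; the smaller its value, the lower the utilization. Because HashMap uses separate chaining, the average time to look up an element is constant. A larger load factor uses space more fully but makes the chains longer and lookups slower; a load factor that is too small leaves the table sparse and wastes space. The value 0.75 was chosen as a balance between time and space efficiency.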
   
 

 
  Why does the capacity double, rather than grow by 1.5x or 3x? 
 

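   This is mainly a performance decision. Doubling keeps the capacity a power of two, and with a power-of-two capacity the work can be done with bit operations: the new capacity is a one-bit shift, and the array index is computed as h & (length - 1) in indexFor instead of with a remainder operation. Growing by 1.5x or 3x would break the power-of-two property, and the index would need a slower modulo. As for why bit operations are fast: a shift or a bitwise AND is a single cheap machine instruction, whereas integer division and remainder cost considerably more. Numbers are stored in binary either way, so no decimal conversion is involved.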
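   The effect of a power-of-two capacity can be checked with a few lines. The helper below mirrors the h & (length - 1) expression; the class name IndexForDemo and the sample hash values are chosen for this sketch. It also shows a side effect of doubling: an entry either keeps its index or moves by exactly the old capacity.

```java
class IndexForDemo {
    /** Index for a hash in a table whose length is a power of two. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int[] hashes = {2112, -7, 123456789};
        for (int h : hashes) {
            int oldIndex = indexFor(h, 16);
            int newIndex = indexFor(h, 32);
            // For a power-of-two length the mask gives the same result as a
            // non-negative remainder, even for negative hash values.
            boolean sameAsMod = oldIndex == Math.floorMod(h, 16);
            // After doubling, the index is unchanged or shifted by the old capacity.
            boolean staysOrMoves = newIndex == oldIndex || newIndex == oldIndex + 16;
            System.out.println(h + ": " + oldIndex + " -> " + newIndex
                    + " " + sameAsMod + " " + staysOrMoves);
        }
    }
}
```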
   
 
