Java collections: HashMap and principles underlying implementation (source code parsing)

Note: the contents of the article based on JDK1.7 analysis. Article 1.8 at the end to explain the changes made.

First, first to familiarize yourself with our common HashMap:
1, Overview
HashMap Map-based interface, the storage elements are key to the way and allows the use of null and construction of a null value, because the key must be unique, so only one key is null, additional elements into HashMap not guarantee the order, which is disorderly, and not sequentially into the same. HashMap is thread safe.

2, the inheritance
public class the HashMap <K, V> the extends AbstractMap <K, V>
the implements the Map <K, V>, the Cloneable, the Serializable
. 3, the basic properties of
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4 ; // default initialization size 16
static final float DEFAULT_LOAD_FACTOR = 0.75f; // load factor 0.75
static Final the Entry [] = {} EMPTY_TABLE <,??>; // default initialization array
transient int size; // number of elements in the HashMap
int threshold; // determining whether to adjust the capacity of HashMap

Note: HashMap expansion operation is a very time consuming task, so if we can estimate Map of capacity, it is best to give it a default initial value, to avoid multiple expansion. HashMap thread is unsafe, multi-threaded environment is recommended ConcurrentHashMap.

**

Second, the difference between frequently asked HashMap and Hashtable of

**
1, thread safety
of both the main difference is that Hashtable is thread safe, while HashMap is not thread safe.

Hashtable implementation methods which are added to the synchronized keyword to ensure thread synchronization, and therefore relatively HashMap performance will be higher, we usually recommend the use of HashMap Unless otherwise required when used in a multithreaded environment if used HashMap need to use Collections. synchronizedMap () method to obtain a thread-safe collection.

Note: Collections.synchronizedMap () implementation principle is defined inside the class Collections SynchronizedMap. This class implements the Map interface, using synchronized method call to ensure thread synchronization, or we pass the course actually HashMap instance operation , simply means that Collections.synchronizedMap () method helped us to automatically add the synchronized during operation HashMap to implement thread synchronization, similar to other Collections.synchronizedXX method is similar principle.

2, for different null of
HashMap null can be used as a key, while Hashtable does not allow null as a key
though HashMap support null value as a key, but the proposal is still to avoid such use, because if not used with care, and if therefore cause problems, investigation and it was very cumbersome.
Note: When HashMap null as key, is always stored in a node table on the first array.

3, inheritance structure
HashMap is an implementation of the Map interface, HashTable implements Map interface and the abstract class Dictionary.

4, the initial capacity and expansion of
the initial capacity of the HashMap 16, Hashtable initial capacity of 11, both the fill factor of 0.75 is the default.

When the current expansion HashMap i.e. double the capacity: Capacity 2, when the expansion is Hashtable i.e. +1 double capacity: Capacity 2 +. 1.

5, two different methods of calculating hash
Hashtable hash calculated hashcode key is used as the length of the array table directly modulo

int hash = key.hashCode();
int index = (hash & 0x7FFFFFFF) % tab.length;
  • 1
  • 2

HashMap hash calculated on the key hashcode conducted a second hash, the hash value to obtain a better, then the array lengths are touch table.

int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);

static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
 
 static int indexFor(int h, int length) {
        return h & (length-1);
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13

Three, HashMap data storage structure

. 1, a HashMap and the chain arrays to achieve the stored data
HashMap employed Entry array to store the key-value pairs, each pair consisting of a key Entry Entity, Entry class is actually a one-way linked list structure having Next pointer, you can connect the next Entry entity, in order to solve the problem Hash conflict.

Array storage interval is continuous, severe memory footprint, it is a great space complex. However, the array binary search time complexity is small, is O (1); Array features are: easy addressing, inserting and deleting difficulties;

Discrete list storage section, take up more relaxed memory, so the space complexity is very small, but a large time complexity of O (N). Features list are: addressing difficult, insertions and deletions easy.
Here Insert Picture Description
Here Insert Picture Description
Here Insert Picture Description

从上图我们可以发现数据结构由数组+链表组成,一个长度为16的数组中,每个元素存储的是一个链表的头结点。那么这些元素是按照什么样的规则存储到数组中呢。一般情况是通过hash(key.hashCode())%len获得,也就是元素的key的哈希值对数组长度取模得到。比如上述哈希表中,12%16=12,28%16=12,108%16=12,140%16=12。所以12、28、108以及140都存储在数组下标为12的位置。

HashMap里面实现一个静态内部类Entry,其重要的属性有 hash,key,value,next。

HashMap里面用到链式数据结构的一个概念。上面我们提到过Entry类里面有一个next属性,作用是指向下一个Entry。打个比方, 第一个键值对A进来,通过计算其key的hash得到的index=0,记做:Entry[0] = A。一会后又进来一个键值对B,通过计算其index也等于0,现在怎么办?HashMap会这样做:B.next = A,Entry[0] = B,如果又进来C,index也等于0,那么C.next = B,Entry[0] = C;这样我们发现index=0的地方其实存取了A,B,C三个键值对,他们通过next这个属性链接在一起。所以疑问不用担心。**也就是说数组中存储的是最后插入的元素。**到这里为止,HashMap的大致实现,我们应该已经清楚了。

 public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value); //null总是放在数组的第一个链表中
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        //遍历链表
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            //如果key在链表中已存在,则替换为新value
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
 
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

 

void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e); //参数e, 是Entry.next
    //如果size超过threshold,则扩充table大小。再散列
    if (size++ >= threshold)
            resize(2 * table.length);
}
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31

四、重要方法深度解析

1、构造方法

HashMap()    //无参构造方法
HashMap(int initialCapacity)  //指定初始容量的构造方法 
HashMap(int initialCapacity, float loadFactor) //指定初始容量和负载因子
HashMap(Map<? extends K,? extends V> m)  //指定集合,转化为HashMap
  • 1
  • 2
  • 3
  • 4

HashMap提供了四个构造方法,构造方法中 ,依靠第三个方法来执行的,但是前三个方法都没有进行数组的初始化操作,即使调用了构造方法此时存放HaspMap中数组元素的table表长度依旧为0 。在第四个构造方法中调用了inflateTable()方法完成了table的初始化操作,并将m中的元素添加到HashMap中。

2、添加方法

public V put(K key, V value) {
        if (table == EMPTY_TABLE) { //是否初始化
            inflateTable(threshold);
        }
        if (key == null) //放置在0号位置
            return putForNullKey(value);
        int hash = hash(key); //计算hash值
        int i = indexFor(hash, table.length);  //计算在Entry[]中的存储位置
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i); //添加到Map中
        return null;
}
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22

在该方法中,添加键值对时,首先进行table是否初始化的判断,如果没有进行初始化(分配空间,Entry[]数组的长度)。然后进行key是否为null的判断,如果key==null ,放置在Entry[]的0号位置。计算在Entry[]数组的存储位置,判断该位置上是否已有元素,如果已经有元素存在,则遍历该Entry[]数组位置上的单链表。判断key是否存在,如果key已经存在,则用新的value值,替换点旧的value值,并将旧的value值返回。如果key不存在于HashMap中,程序继续向下执行。将key-vlaue, 生成Entry实体,添加到HashMap中的Entry[]数组中。
3、addEntry()

/*
 * hash hash值
 * key 键值
 * value value值
 * bucketIndex Entry[]数组中的存储索引
 * / 
void addEntry(int hash, K key, V value, int bucketIndex) {
     if ((size >= threshold) && (null != table[bucketIndex])) {
         resize(2 * table.length); //扩容操作,将数据元素重新计算位置后放入newTable中,链表的顺序与之前的顺序相反
         hash = (null != key) ? hash(key) : 0;
         bucketIndex = indexFor(hash, table.length);
     }

    createEntry(hash, key, value, bucketIndex);
}
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20

添加到方法的具体操作,在添加之前先进行容量的判断,如果当前容量达到了阈值,并且需要存储到Entry[]数组中,先进性扩容操作,空充的容量为table长度的2倍。重新计算hash值,和数组存储的位置,扩容后的链表顺序与扩容前的链表顺序相反。然后将新添加的Entry实体存放到当前Entry[]位置链表的头部。在1.8之前,新插入的元素都是放在了链表的头部位置,但是这种操作在高并发的环境下容易导致死锁,所以1.8之后,新插入的元素都放在了链表的尾部。
4、获取方法:get

public V get(Object key) {
     if (key == null)
         //返回table[0] 的value值
         return getForNullKey();
     Entry<K,V> entry = getEntry(key);

     return null == entry ? null : entry.getValue();
}
final Entry<K,V> getEntry(Object key) {
     if (size == 0) {
         return null;
     }

     int hash = (key == null) ? 0 : hash(key);
     for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
         Object k;
         if (e.hash == hash &&
             ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
      }
     return null;
}
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24

在get方法中,首先计算hash值,然后调用indexFor()方法得到该key在table中的存储位置,得到该位置的单链表,遍历列表找到key和指定key内容相等的Entry,返回entry.value值。

5、删除方法

public V remove(Object key) {
     Entry<K,V> e = removeEntryForKey(key);
     return (e == null ? null : e.value);
}
final Entry<K,V> removeEntryForKey(Object key) {
     if (size == 0) {
         return null;
     }
     int hash = (key == null) ? 0 : hash(key);
     int i = indexFor(hash, table.length);
     Entry<K,V> prev = table[i];
     Entry<K,V> e = prev;

     while (e != null) {
         Entry<K,V> next = e.next;
         Object k;
         if (e.hash == hash &&
             ((k = e.key) == key || (key != null && key.equals(k)))) {
             modCount++;
             size--;
             if (prev == e)
                 table[i] = next;
             else
                 prev.next = next;
             e.recordRemoval(this);
             return e;
         }
         prev = e;
         e = next;
    }

    return e;
}
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33

删除操作,先计算指定key的hash值,然后计算出table中的存储位置,判断当前位置是否Entry实体存在,如果没有直接返回,若当前位置有Entry实体存在,则开始遍历列表。定义了三个Entry引用,分别为pre, e ,next。 在循环遍历的过程中,首先判断pre 和 e 是否相等,若相等表明,table的当前位置只有一个元素,直接将table[i] = next = null 。若形成了pre -> e -> next 的连接关系,判断e的key是否和指定的key 相等,若相等则让pre -> next ,e 失去引用。

6、containsKey

public boolean containsKey(Object key) {
        return getEntry(key) != null;
    }
final Entry<K,V> getEntry(Object key) {
        int hash = (key == null) ? 0 : hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15

containsKey方法是先计算hash然后使用hash和table.length取摸得到index值,遍历table[index]元素查找是否包含key相同的值。

7、containsValue

public boolean containsValue(Object value) {
    if (value == null)
            return containsNullValue();

    Entry[] tab = table;
        for (int i = 0; i < tab.length ; i++)
            for (Entry e = tab[i] ; e != null ; e = e.next)
                if (value.equals(e.value))
                    return true;
    return false;
    }
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11

containsValue方法就比较粗暴了,就是直接遍历所有元素直到找到value,由此可见HashMap的containsValue方法本质上和普通数组和list的contains方法没什么区别,你别指望它会像containsKey那么高效。

五、JDK 1.8的 改变

1, HashMap using a linked list arrays + + red-black tree implementation.
In Jdk1.8 in HashMap implementation made some changes, but the basic idea did not become, but in some places is optimized, the following look at these changes in place, data structures by an array + linked list, change list storage array + + red-black tree, when the chain length exceeds a threshold value (8), the list is converted to red-black tree. Further increase in performance.
Here Insert Picture Description

2, put simple analytical method:

public V put(K key, V value) {
    //调用putVal()方法完成
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    //判断table是否初始化,否则初始化操作
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    //计算存储的索引位置,如果没有元素,直接赋值
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        //节点若已经存在,执行赋值操作
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        //判断链表是否是红黑树
        else if (p instanceof TreeNode)
            //红黑树对象操作
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            //为链表,
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    //链表长度8,将链表转化为红黑树存储
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                //key存在,直接覆盖
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    //记录修改次数
    ++modCount;
    //判断是否需要扩容
    if (++size > threshold)
        resize();
    //空操作
    afterNodeInsertion(evict);
    return null;
}
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58

If there are key nodes, return the old value, Null if there is no return.

Original Address: https: //www.cnblogs.com/java-jun-world2099/p/9258605.html

Guess you like

Origin www.cnblogs.com/scorates/p/11595930.html