HashMap Interview Questions and Source Code Analysis (TODO)

HashMap is the collection type we use most in daily work. It extends AbstractMap and implements the Map<K,V>, Cloneable, and Serializable interfaces:

The inheritance diagram is shown below:

 

 

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

HashMap stores entries according to the hashCode of the key, so in most cases the target bucket can be located directly, giving very fast access. However, iteration order is not guaranteed, and HashMap is not thread-safe. If thread safety is required, use Collections.synchronizedMap or ConcurrentHashMap.
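For example, a minimal sketch of the two thread-safe alternatives (class and key names are just for illustration):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeMaps {
    public static void main(String[] args) {
        // Collections.synchronizedMap guards every call with one shared lock
        Map<String, Integer> syncMap =
                Collections.synchronizedMap(new HashMap<>());
        syncMap.put("a", 1);

        // ConcurrentHashMap locks at finer granularity and usually scales better
        Map<String, Integer> concurrentMap = new ConcurrentHashMap<>();
        concurrentMap.put("b", 2);

        System.out.println(syncMap + " " + concurrentMap);
    }
}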

Hashtable: thread-safe, but its concurrency is not as good as ConcurrentHashMap; otherwise its functionality is similar to HashMap.

TreeMap: implements the SortedMap interface, so its entries are stored in order; by default, keys are kept in ascending order.
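A quick illustration of that default ordering (a minimal sketch):

import java.util.TreeMap;

public class TreeMapOrder {
    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(3, "c");
        map.put(1, "a");
        map.put(2, "b");
        System.out.println(map); // {1=a, 2=b, 3=c}: keys in ascending order
    }
}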

HashMap is an array + linked list + red-black tree (JDK 1.8 added the red-black tree part); a bucket's linked list is converted into a red-black tree when its length grows beyond 8.

So what exactly is HashMap's underlying storage structure, and what advantages does it have?

 static class Node<K,V> implements Map.Entry<K,V> {
        final int hash; // used to locate the bucket index
        final K key;
        V value;
        Node<K,V> next; // reference to the next node in the chain

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                // compare key with key and value with value
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

 

This inner class Node implements the Map.Entry<K,V> interface; it is essentially a key-value pair (a mapping). JDK 1.8 uses Node in place of the original Entry:

 transient Entry[] table; // the table field as declared before JDK 1.8

 

HashMap stores put <K,V> pairs using the separate-chaining principle. When we call put, hashCode() is first invoked on the key and hashed to compute the storage position in the table; get(key) likewise computes the position from the key's hash value and then compares keys to find the value.

How is the index into the hash bucket array determined?

Whether adding, deleting, looking up, or updating, locating the hash bucket index is essential. The index is computed from the key's hashCode by mixing in the high bits and then performing a modulo-like masking operation.

Method one:
static final int hash(Object key) {   // JDK 1.8 (JDK 1.7 is equivalent in principle)
    int h;
    // step 1: h = key.hashCode() obtains the hashCode value
    // step 2: h ^ (h >>> 16) mixes the high 16 bits into the low 16 bits
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Method two:
static int indexFor(int h, int length) {  // JDK 1.7 source; JDK 1.8 drops this method but applies the same principle inline
    return h & (length - 1);  // the modulo-like masking step
}

 

For a given key, method one always yields the same hash value, and taking that value modulo the array length would spread elements fairly evenly. But the modulo operation is relatively expensive, so HashMap does it differently: method two computes the index in the table array at which the object should be stored.

This method is very clever: it locates the object's bucket with h & (table.length - 1), and the length of HashMap's underlying array is always a power of 2, which is a speed optimization. When length is a power of 2, h & (length - 1) is equivalent to h % length, but & is more efficient than %. JDK 1.8 additionally optimizes the high-bit mixing.

The XOR of the high and low 16 bits of hashCode(), (h = key.hashCode()) ^ (h >>> 16), is chosen for speed, utility, and quality: even when the table array is fairly small, the high bits of the hash still participate in the index computation, without much overhead.
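A small self-contained sketch of the two steps (the hash method mirrors the JDK source shown above; the demo class and key are illustrative):

public class HashIndexDemo {
    // same mixing as HashMap.hash(): XOR the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int length = 16;              // table length, always a power of two
        int h = hash("hello");
        int index = h & (length - 1); // masking instead of the % operator
        // masking agrees with floorMod whenever length is a power of two
        System.out.println(index + " == " + Math.floorMod(h, length));
    }
}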

Next, let's look at how HashMap puts a key:

 

public V put(K key, V value) {
    // hash the key's hashCode()
    return putVal(hash(key), key, value, false, true);
}

 

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

 

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // 1. if the table is null or has length 0, initialize it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // 2. compute index i; if that bucket is empty, insert a new Node
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // 3. if the hash values are equal and the keys are equal (and not null),
        //    remember the node so its value can be replaced below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 4. if the bucket holds a red-black tree, insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // 5. otherwise traverse the linked list
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // -1 for 1st; TREEIFY_THRESHOLD = 8: convert to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);
                    break;
                }
                // if the key already exists, stop; its value is overwritten below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount; // bump the fail-fast counter
    // 6. if size exceeds the threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
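The return value described above (the old value for an existing key, otherwise null) is easy to observe:

import java.util.HashMap;

public class PutReturnDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("k", 1)); // null: no previous mapping
        System.out.println(map.put("k", 2)); // 1: overwriting returns the old value
        System.out.println(map.get("k"));    // 2
    }
}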

 

Resizing with resize()

Literally, resizing means expanding the capacity: when elements keep being added and the internal array can no longer hold more content, the array's length is increased so more data can be stored. A Java array cannot grow automatically, so a new, larger array is needed to replace the original one.

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table; // the node array before resizing
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // already at the maximum: give up growing, just raise the threshold
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // if the maximum is not exceeded, double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

 

When resizing, JDK 1.8 avoids recomputing the hash as 1.7 did: it only needs to check whether the newly significant bit of the original hash value is 0 or 1. If it is 0, the index is unchanged; if it is 1, the new index is "original index + oldCap". The diagram below illustrates a resize from 16 to 32:

 

This design is indeed very clever: it saves the time of recomputing hash values, and since the newly examined bit can be considered randomly 0 or 1, the resize process spreads the previously colliding nodes evenly across the new buckets. This is an optimization added in JDK 1.8. One difference worth noting: in JDK 1.7, when the old lists were migrated and elements mapped to the same index in the new table, the list elements ended up reversed; as the figure above shows, JDK 1.8 does not reverse them.
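A minimal sketch of that split rule (the hash values here are arbitrary examples):

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;          // old table length
        int newCap = oldCap << 1; // doubled to 32
        for (int hash : new int[]{5, 21, 37, 53}) {
            int oldIndex = hash & (oldCap - 1);
            // the bit selected by oldCap decides: 0 keeps the index, 1 adds oldCap
            int newIndex = (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
            // identical to recomputing the index against the new capacity
            System.out.println(newIndex == (hash & (newCap - 1))); // always true
        }
    }
}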

 

 

 

HashMap organizes colliding elements into a linked list. In JDK 1.8, if the number of colliding elements at one position exceeds a limit (8 by default), a red-black tree replaces the list to improve speed. When two different keys produce the same hash value and are located at the same storage position, we call it a hash conflict / hash collision.

The more uniformly the hash algorithm spreads its values, the lower the probability of collisions and the higher the map's storage efficiency. The size of table[] also determines the collision probability: if the table is large, even a poor algorithm produces few collisions; conversely, if the table is small, even a very good algorithm produces many collisions. So a trade-off must be struck between space cost and time cost.

Therefore we need both an appropriate table size and a good hash algorithm.

 transient Node<K,V>[] table; // the hash bucket array

 

Since we have mentioned the size of the hash bucket array, let's talk about resizing. From HashMap's fields we learn:

    int threshold;          // the resize threshold: threshold = capacity * load factor; once the number of stored entries reaches it, the capacity must grow
    final float loadFactor; // the load factor
    int modCount;           // count of structural modifications
    int size;               // number of key-value pairs

 

First, the initial length of Node[] table is 16 by default, and the load factor defaults to 0.75. threshold is the maximum number of Nodes (key-value pairs) the HashMap can hold before growing: threshold = length * loadFactor. In other words, once the array length is fixed, a larger load factor allows more key-value pairs.

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16
static final float DEFAULT_LOAD_FACTOR = 0.75f;

 

Combining this with the formula, threshold is the maximum number of entries allowed; beyond it, resize() must be called, and the capacity after resizing is twice the previous one. For example, with the defaults, threshold = 16 * 0.75 = 12, so the 13th insertion triggers a resize to capacity 32.

size is the number of key-value pairs in the HashMap, while modCount records the number of times the internal structure changes. A structural change means something like putting a new key-value pair; overwriting the value of an existing key does not count as a structural change.
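modCount is what makes HashMap's iterators fail fast; a minimal demonstration (this program intentionally throws):

import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "a");
        map.put(2, "b");
        // a structural modification during iteration changes modCount,
        // so the next iterator step throws ConcurrentModificationException
        for (Integer k : map.keySet()) {
            map.put(3, "c");
        }
    }
}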

One remaining problem: even if the load factor and hash algorithm are designed sensibly, overly long chains can still occur, and once a chain grows too long it severely hurts HashMap's performance. So JDK 1.8 optimizes the data structure by introducing the red-black tree: when a list grows too long (8 by default), it is converted into a red-black tree, using the tree's fast insert/delete/search to improve HashMap's performance. See the figure below.

 

 

 

How hash collisions are resolved:

HashMap resolves collisions using separate chaining. (Common methods include open addressing, rehashing, separate chaining, and a public overflow area — to be discussed later. TODO)
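For contrast with separate chaining, here is a minimal, hypothetical sketch of open addressing with linear probing (fixed-size; it omits resizing and deletion, and assumes the table never fills up):

public class LinearProbing {
    private final Object[] keys = new Object[16];
    private final Object[] vals = new Object[16];

    public void put(Object key, Object val) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        // on collision, probe the next slot until a free or matching one is found
        while (keys[i] != null && !keys[i].equals(key))
            i = (i + 1) % keys.length;
        keys[i] = key;
        vals[i] = val;
    }

    public Object get(Object key) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        while (keys[i] != null) {
            if (keys[i].equals(key)) return vals[i];
            i = (i + 1) % keys.length;
        }
        return null;
    }
}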

 

Not thread-safe

 In multi-threaded scenarios, avoid HashMap and use the thread-safe ConcurrentHashMap instead.

 

public class HashMapInfiniteLoop {  

    private static HashMap<Integer,String> map = new HashMap<Integer,String>(2,0.75f);  
    public static void main(String[] args) {  
        map.put(5, "C");  

        new Thread("Thread1") {  
            public void run() {  
                map.put(7, "B");  
                System.out.println(map);  
            };  
        }.start();  
        new Thread("Thread2") {  
            public void run() {  
                map.put(3, "A");  
                System.out.println(map);  
            };  
        }.start();        
    }  
}

Here the map is initialized as an array of length 2 with loadFactor = 0.75, so threshold = 2 * 0.75 = 1; that is, when the second key is put, the map must resize.
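For reference, the walkthrough below steps through JDK 1.7's transfer method (reproduced from the JDK 1.7 source; JDK 1.8 no longer has this method):

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;          // the line both threads race on
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];              // head insertion reverses the list
            newTable[i] = e;
            e = next;
        }
    }
}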

Using breakpoints, let Thread 1 and Thread 2 both stop at the first line of the transfer method shown above. Note that at this point both threads have already added their data successfully. Advance Thread 1's breakpoint to the line "Entry next = e.next;", then release Thread 2's breakpoint and let it carry out the resize. The result is shown in the figure below.

Note that Thread 1's e points to key(3) and its next points to key(7); after Thread 2's rehash, these point into Thread 2's reorganized list.

When Thread 1 is scheduled back and resumes, it first executes newTable[i] = e and then e = next, which leaves e pointing to key(7); the next loop iteration's next = e.next then leaves next pointing to key(3).

e.next = newTable[i] makes key(3).next point to key(7). Note that key(7).next already points to key(3), and so a circular linked list appears.

Then, when Thread 1 calls map.get(11), the tragedy arrives: an infinite loop.

Performance comparison: JDK 1.8 vs. JDK 1.7

In a HashMap, if the hash algorithm gives every key a distinct bucket index — that is, the hash is excellent — then getKey runs in O(1) time. If the hash produces many collisions, and in the extreme case every key lands on the same index, then all key-value pairs pile up in one bucket, either in one linked list or in one red-black tree, with time complexity O(n) or O(log n) respectively. Given the many optimizations in JDK 1.8, its overall performance is better than JDK 1.7's; below we demonstrate this with examples from two angles.

The case of a fairly uniform hash

To make testing easier, we first write a Key class:

class Key implements Comparable<Key> {

    private final int value;

    Key(int value) {
        this.value = value;
    }

    @Override
    public int compareTo(Key o) {
        return Integer.compare(this.value, o.value);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass())
            return false;
        Key key = (Key) o;
        return value == key.value;
    }

    @Override
    public int hashCode() {
        return value;
    }
}

This class overrides equals and provides a very good hashCode function: no two distinct values share a hashCode, since value itself is used as the hashCode. To avoid frequent GC, I cache the immutable Key instances instead of creating them over and over:

public class Keys {

    public static final int MAX_KEY = 10_000_000;
    private static final Key[] KEYS_CACHE = new Key[MAX_KEY];

    static {
        for (int i = 0; i < MAX_KEY; ++i) {
            KEYS_CACHE[i] = new Key(i);
        }
    }

    public static Key of(int value) {
        return KEYS_CACHE[value];
    }
}

Now we begin the experiment. All the test needs to do is create HashMaps of different sizes (1, 10, 100, ..., 10,000,000), sized up front so resizing is ruled out, and measure lookup time:

   static void test(int mapSize) {

        HashMap<Key, Integer> map = new HashMap<Key,Integer>(mapSize);
        for (int i = 0; i < mapSize; ++i) {
            map.put(Keys.of(i), i);
        }

        long beginTime = System.nanoTime(); // nanosecond timestamp
        for (int i = 0; i < mapSize; i++) {
            map.get(Keys.of(i));
        }
        long endTime = System.nanoTime();
        System.out.println(endTime - beginTime);
    }

    public static void main(String[] args) {
        for (int i = 10; i <= 10_000_000; i *= 10) {
            test(i);
        }
    }

The test looks up different values and measures the elapsed time. To compute an average time for getKey, we run all the get calls, take the total time, and divide by the number of keys. The average is mainly for comparison; absolute values are affected by many environmental factors. The results are as follows:

The results show that JDK 1.8's performance is more than 15% higher than JDK 1.7's, and in some size ranges more than 100% higher. Because the hash is fairly uniform, the red-black tree introduced in JDK 1.8 has little effect here. Next, let's look at a highly non-uniform hash.

The case of an extremely non-uniform hash

Suppose we have a very bad Key: all of its instances return the same hashCode value. This is the worst case for a HashMap. We modify the code as follows:

class Key implements Comparable<Key> {

    //...

    @Override
    public int hashCode() {
        return 1;
    }
}

Running the same main method produces the results in the table below:

The results show that as size grows, JDK 1.7's cost trends upward, while JDK 1.8's clearly trends downward, exhibiting stable logarithmic growth. When a linked list becomes too long, HashMap dynamically replaces it with a red-black tree, which reduces the time complexity from O(n) to O(log n). The times for uniform and non-uniform hashes also differ markedly, and comparing the two cases shows the importance of a good hash algorithm.

 

 

Summary

(1) Resizing is a particularly performance-hungry operation, so when using a HashMap, estimate the map's size and pass a rough initial value at construction time to avoid frequent resizing (see the sketch below).
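A sketch of the usual sizing rule of thumb (newMapFor is a hypothetical helper): to hold n entries without resizing, the initial capacity should be at least n / loadFactor.

import java.util.HashMap;
import java.util.Map;

public class Presize {
    // capacity >= expected / loadFactor keeps size below the threshold
    static <K, V> Map<K, V> newMapFor(int expectedSize) {
        int initialCapacity = (int) (expectedSize / 0.75f) + 1;
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = newMapFor(100);
        for (int i = 0; i < 100; i++) map.put("k" + i, i); // no resize occurs
        System.out.println(map.size());
    }
}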

(2) The load factor can be modified, and can even be greater than 1, but it is best not to change it casually unless the situation is truly special.

(3) HashMap is not thread-safe. Do not operate on a HashMap concurrently; use ConcurrentHashMap instead.

(4) The red-black tree introduced in JDK 1.8 greatly optimizes HashMap's performance.

References

Java 8系列之重新认识HashMap https://zhuanlan.zhihu.com/p/21673805

 

 

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

1. How does HashMap work? What is its internal data structure?

The underlying storage is a hash table (an array plus linked lists); when a list grows too long it is converted into a red-black tree, so lookups run in O(log n) time.

2. Walk through HashMap's put method.

Hash the key, then compute the array index.
If there is no hash collision, place the node directly into the slot.
If there is a collision, link the node onto the end of the list.
If the list length exceeds the threshold (8 by default), convert the list into a red-black tree.
If the node already exists, replace the old value.
If the map is full (capacity * load factor), resize.

3. How is the hash function in HashMap implemented? What other hash implementations are there?

The high 16 bits are kept unchanged, and the low 16 bits are XORed with the high 16 bits.
(n - 1) & hash yields the index.
What other hash implementations are there? (Consult references and blogs.)

4. How does HashMap resolve collisions, and how does resizing work? If a value in the old array moves to the new array, its position has certainly changed — how is its position in the new array determined?

Add the node to the linked list.
Expand the capacity to twice the original, then recompute each node's placement.
The value can only be in one of two places: the original index, or the index <original index + original capacity>.

5. Setting HashMap aside, what methods exist for resolving hash collisions?

Open addressing and separate chaining.

6. If some Entry chain in a HashMap is too long and lookup time may reach O(n), how can it be optimized?

Convert the linked list into a red-black tree — already implemented in JDK 1.8.

 


Source: www.cnblogs.com/xiaosisong/p/12290251.html