Hard Interviewer Series: These Five Aspects Are All You Need to Know About HashMap (1)

One, the data structure of HashMap

HashMap<String, String> map = new HashMap<>();
map.put("1","Kobe");

These two lines of code store a key-value pair in the HashMap. This raises a question: how does HashMap store data efficiently?

Starting from this question, we should first understand the underlying data structure of HashMap.

HashMap in JDK 1.8: array + linked list (singly linked) + red-black tree

 

As we all know, HashMap is a container for key-value pairs (key, value). Looking at the figure above, what should go into each small slot: the key, the value, or both?

If you have read the source code, you know that HashMap takes an object-oriented approach here and wraps each [key, value] pair in an object:

class Node{
    private String key;
    private String value;
}

So each small slot holds a Node. To model the full structure in more detail, only small changes to Node are needed:

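Node[] table = new Node[16];  // the array (HashMap's default capacity is 16)

class Node {
    private String key;
    private String value;
    Node next;              // link to the next Node: a singly linked list
}

class TreeNode extends Node {     // pseudocode for a red-black tree node
    TreeNode parent;
    TreeNode left;
    TreeNode right;
    boolean red;
}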

Two, hash function and collision

To store a key-value pair in the HashMap, we first need to decide at which array index the Node wrapping that key and value will be placed.

To compute that position, we need:

  • the array length, length
  • a way to get an integer in the range [0, length-1]

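(1) Our first thought might be to use

new Random().nextInt(length);

But this causes two problems:

  1. Random indices repeat too easily, so collisions are frequent.
  2. There is nothing to go on when looking a key up later, because the index cannot be recomputed from the key.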

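(2) This is where hashCode comes in:

  1. Get an integer
int hash = key.hashCode();  // a 32-bit int made of 0s and 1s
For example, "1".hashCode() is 49, which already lies outside the index range [0, 15] of a 16-slot array; in general a hash code can be any int value.

  2. Control the range of this integer
We then need to bring the hash into the required range: index = hash % length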

But this still causes problems. Consider hash = key.hashCode(): if the keys are the Integers 31, 47, 63 and so on (an Integer's hashCode() is its own value), hash % 16 is 15 for every one of them. Those Nodes are then much more likely to end up in the same slot while other slots stay empty, which wastes storage.

Instead of %, HashMap computes the index with a bitwise AND: hash & (length - 1). A single AND is cheaper than a division.

 

For length 16 the result also falls between 0 and 15. For a non-negative hash it is exactly the same as hash % 16, and unlike %, it never yields a negative index.
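To see the two forms side by side, the sketch below computes both for a few sample hashes with length 16. `MaskVsModulo`, `byModulo` and `byMask` are hypothetical names chosen for this illustration, and the hash values are arbitrary (49 is what `"1".hashCode()` returns).

```java
// Compares hash % length with hash & (length - 1) for a power-of-two length.
// MaskVsModulo and its methods are hypothetical names for illustration.
class MaskVsModulo {

    // Index by remainder: may be negative when hash is negative.
    static int byModulo(int hash, int length) {
        return hash % length;
    }

    // Index by masking the low bits: always in [0, length - 1] when length is a power of two.
    static int byMask(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int[] hashes = {31, 47, 49, 12345, -17};
        for (int h : hashes) {
            System.out.println(h + ": % gives " + byModulo(h, length)
                    + ", & gives " + byMask(h, length));
        }
    }
}
```

For the non-negative hashes both methods agree (49 maps to 1, 12345 to 9). The negative hash -17 shows the one difference: `-17 % 16` is -1 in Java, which is not a valid index, while `-17 & 15` is 15.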

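Even so, different hash values can still produce the same index. With length 16 only the low 4 bits are used, so, for example, 0x10005 and 0x20005 both map to index 5.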

 

To reduce such collisions, the high 16 bits of the original hash value are XORed into its low 16 bits: hash = h ^ (h >>> 16), where h = key.hashCode().

 

Hash function: XOR the high 16 bits of key.hashCode() into its low 16 bits. The high bits then also influence the low bits used for indexing, so the last few bits of the final hash are much less likely to repeat than those of the original hash code.
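A small sketch of this two-step calculation, assuming a 16-slot table. `spread()` follows the same expression as JDK 8's `HashMap.hash()`; `HashSpreadDemo` and `indexFor()` are hypothetical names, the latter standing in for the `(n - 1) & hash` expression that the JDK writes inline. The two sample hashes differ only in their high 16 bits.

```java
// Sketch of JDK 8's two-step index calculation.
// spread() reproduces the logic of HashMap.hash(); HashSpreadDemo and indexFor()
// are hypothetical names used only for this illustration.
class HashSpreadDemo {

    // XOR the high 16 bits of hashCode() into the low 16 bits; a null key hashes to 0.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // n is assumed to be a power of two, so n - 1 is a mask of low 1-bits.
    static int indexFor(int hash, int n) {
        return hash & (n - 1);
    }

    public static void main(String[] args) {
        int n = 16;
        // Integer.hashCode() returns the value itself; a and b differ only in the high 16 bits.
        int a = 0x10005, b = 0x20005;
        System.out.println(indexFor(a, n) + " " + indexFor(b, n));                 // 5 5
        System.out.println(indexFor(spread(a), n) + " " + indexFor(spread(b), n)); // 4 7
    }
}
```

Without spreading, both values land in slot 5; after the XOR, the high bits change the low bits and the values land in slots 4 and 7.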

hash collision

If hash & (n - 1) yields the same index for two different keys, that is a hash collision.

 

Here op1 is the hash and op2 is length - 1. op2 should have the form 01111. If it does not, then wherever op2 has a 0 bit, the corresponding bit of the result is 0 no matter whether that bit of op1 is 1 or 0, which increases the probability of collisions.

With op2 in that form, the index depends only on op1: every bit of op2 below its leading 0 is 1, so the low bits of op1 pass through unchanged. This is why the array length must be a power of 2, e.g. 10000 - 1 = 01111 (16 - 1 = 15).
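The sketch below, assuming sample hashes 0 to 999 and using the hypothetical names `PowerOfTwoDemo` and `reachableBuckets`, counts how many slots the mask can reach for length 16 versus length 15. Its `tableSizeFor()` follows the bit-smearing approach JDK 8's `HashMap` uses to round a requested capacity (for example from `new HashMap<>(10)`) up to a power of two.

```java
import java.util.HashSet;
import java.util.Set;

// Shows why the table length should be a power of two.
// PowerOfTwoDemo and reachableBuckets are hypothetical names for illustration.
class PowerOfTwoDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Counts the distinct indices hash & (length - 1) produces for hashes 0..999.
    static int reachableBuckets(int length) {
        Set<Integer> seen = new HashSet<>();
        for (int h = 0; h < 1000; h++) {
            seen.add(h & (length - 1));
        }
        return seen.size();
    }

    // Rounds cap up to the next power of two by copying the highest 1-bit
    // into every lower position, then adding 1 (same approach as JDK 8).
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(reachableBuckets(16)); // mask 01111: 16
        System.out.println(reachableBuckets(15)); // mask 01110: 8
        System.out.println(tableSizeFor(10));     // 16
        System.out.println(tableSizeFor(17));     // 32
    }
}
```

With length 15 the mask is 01110, so only the 8 even slots are ever used; with length 16 all 16 slots are reachable.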

Three, the process of put

After creating a HashMap, we put data into it. put() first computes the hash of the key with hash(key); this hash is used later to determine the Node's position. With the hash computed, put() hands off to putVal(), and the actual insertion proceeds as follows.

  1. Check whether the Node array has been initialized; if not, initialize it.
if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;

Initialization in resize():

newCap = DEFAULT_INITIAL_CAPACITY;        // default array size: 16
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);        // 16 * 0.75 = 12, the resize threshold

Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];        // create the array

  2. Use the hash returned by the hash function to compute the Node's array index, then store the data.

Suppose the computed index is 1. If slot 1 is empty, a new Node is created and placed there directly. If slot 1 already holds an element, there are three cases:

(1) The key is the same: the value is replaced directly.
(2) The key is different and the bucket is a linked list: the new Node is appended to the list.
(3) The key is different and the bucket is a red-black tree: the new Node is inserted into the tree.

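if ((p = tab[i = (n - 1) & hash]) == null)
    // no Node at this index yet: create one and put it straight into the array
    tab[i] = newNode(hash, key, value, null);
else {
    Node<K,V> e; K k;
    // same key: remember the existing node so its value can be replaced
    if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
        e = p;
    // different key, and the bucket is already a red-black tree: insert into the tree
    else if (p instanceof TreeNode)
        e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
    else {
        // different key, bucket is a linked list: walk to the last node (next == null)
        // and append the new Node after it
        for (int binCount = 0; ; ++binCount) {
            if ((e = p.next) == null) {
                p.next = newNode(hash, key, value, null);
                // each time a node is appended, check whether the list has grown past 8 nodes
                if (binCount >= TREEIFY_THRESHOLD - 1)
                    // convert the linked list to a red-black tree
                    treeifyBin(tab, hash);
                break;
            }
            // found a node with the same key: stop, its value is replaced below
            if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                break;
            p = e;
        }
    }
    // an existing mapping was found: replace its value and return the old one
    if (e != null) {
        V oldValue = e.value;
        if (!onlyIfAbsent || oldValue == null)
            e.value = value;
        afterNodeAccess(e);
        return oldValue;
    }
}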
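To see the first two cases without the rest of the JDK machinery, here is a deliberately simplified, self-contained sketch. It is not the JDK implementation: `SimpleHashMap` is a hypothetical class with a fixed 16-slot table, and it leaves out resizing, treeification (case 3) and `onlyIfAbsent`. Its `hash()` uses the same high/low XOR as `HashMap.hash()`.

```java
import java.util.Objects;

// Simplified HashMap-style put/get: fixed 16-slot table, chaining only.
// SimpleHashMap is a hypothetical teaching class, not java.util.HashMap.
class SimpleHashMap<K, V> {

    // Each slot holds the head of a singly linked list of Nodes.
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    // Same spreading as HashMap.hash(): XOR the high 16 bits into the low 16.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value for key, or null if the key was absent.
    V put(K key, V value) {
        int hash = hash(key);
        int i = hash & (table.length - 1);
        Node<K, V> p = table[i];
        if (p == null) {                       // empty slot: store a new Node directly
            table[i] = new Node<>(hash, key, value, null);
            return null;
        }
        while (true) {
            if (p.hash == hash && Objects.equals(key, p.key)) { // case 1: same key, replace value
                V old = p.value;
                p.value = value;
                return old;
            }
            if (p.next == null) {              // case 2: different key, append at the tail
                p.next = new Node<>(hash, key, value, null);
                return null;                   // (the JDK would check TREEIFY_THRESHOLD here)
            }
            p = p.next;
        }
    }

    // Walks the chain in the key's slot and returns the matching value, or null.
    V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(key, e.key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

For example, the `Integer` keys 5 and 21 both land in slot 5, so `put(5, "a")` followed by `put(21, "b")` builds a two-node chain there, and a later `put(5, "c")` returns `"a"` and replaces it.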

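Note: when a bucket's linked list grows beyond 8 nodes, it is converted to a red-black tree, but only if the array length is at least 64 (MIN_TREEIFY_CAPACITY); otherwise the array is resized instead. When a resize splits a tree bin and one half has 6 or fewer nodes, that half is converted back to a linked list.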
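These rules can be written as a small decision helper. The three constants have the same names and values as in JDK 8's `HashMap`; `TreeifyRules`, `BinAction`, `afterAppend` and `shouldUntreeify` are hypothetical names for this sketch, which models only the decisions, not the tree itself.

```java
// Decision rules around treeification, using JDK 8's constant names and values.
// TreeifyRules, BinAction, afterAppend and shouldUntreeify are hypothetical names.
class TreeifyRules {
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum BinAction { NONE, RESIZE, TREEIFY }

    // listLength: nodes in the linked-list bucket after the new node was appended.
    static BinAction afterAppend(int listLength, int tableLength) {
        if (listLength <= TREEIFY_THRESHOLD) {
            return BinAction.NONE;              // 8 or fewer nodes: stay a list
        }
        // treeifyBin() resizes instead while the table is still small.
        return (tableLength < MIN_TREEIFY_CAPACITY) ? BinAction.RESIZE : BinAction.TREEIFY;
    }

    // When a resize splits a tree bin, a half with 6 or fewer nodes becomes a list again.
    static boolean shouldUntreeify(int nodesInSplitHalf) {
        return nodesInSplitHalf <= UNTREEIFY_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(8, 16));   // NONE
        System.out.println(afterAppend(9, 16));   // RESIZE
        System.out.println(afterAppend(9, 64));   // TREEIFY
        System.out.println(shouldUntreeify(6));   // true
    }
}
```

The gap between 8 and 6 means a bucket hovering around the threshold does not flip back and forth between list and tree on every change.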

Four, the expansion of HashMap

When the size of the array cannot meet the storage requirements, the HashMap needs to be expanded.

Expansion creates a new, larger array and migrates the buckets of the old array (linked lists and red-black trees) into it.

Note: each expansion doubles the capacity, e.g. 16 → 32, so the array length always stays a power of 2.

So when does expansion happen?

With an array length of 16, expansion happens once the total number of entries in the map exceeds 16 * 0.75 = 12.

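// the 0.75 in the source code is the load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;

if (++size > threshold)        // resize threshold; here threshold is 16 * 0.75 = 12
    resize();               // resize() handles both initialization and expansion

// MAXIMUM_CAPACITY is 2^30; if the old array has already reached it, it is not expanded any further
if (oldCap >= MAXIMUM_CAPACITY) {
    threshold = Integer.MAX_VALUE;
    return oldTab;
}

// otherwise, shift the old capacity left by one bit, i.e. double it
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
    // the new array is twice as large, so the threshold doubles too: 12 -> 24
    newThr = oldThr << 1;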
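The trigger condition and the doubling can be simulated in isolation. This sketch assumes a map created with the defaults (capacity 16, load factor 0.75) and only tracks the numbers; it does not inspect a real `java.util.HashMap`. `ResizeSchedule` and `capacityAfter` are hypothetical names.

```java
// Simulates the growth of a default HashMap (capacity 16, load factor 0.75) as
// distinct keys are added one at a time. It models only the arithmetic of
// "if (++size > threshold) resize()" plus the doubling of capacity and threshold.
// ResizeSchedule and capacityAfter are hypothetical names.
class ResizeSchedule {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Table capacity after inserting `entries` distinct keys.
    static int capacityAfter(int entries) {
        int capacity = DEFAULT_INITIAL_CAPACITY;
        int threshold = (int) (DEFAULT_LOAD_FACTOR * capacity); // 12
        for (int size = 1; size <= entries; size++) {
            if (size > threshold) {   // this entry pushed size over the threshold
                capacity <<= 1;       // 16 -> 32 -> 64 ...
                threshold <<= 1;      // 12 -> 24 -> 48 ...
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int n : new int[] {12, 13, 24, 25, 100}) {
            System.out.println(n + " entries -> capacity " + capacityAfter(n));
        }
    }
}
```

Under these assumptions, 12 entries still fit in 16 slots, the 13th triggers the resize to 32, the 25th the resize to 64, and 100 entries end up in a table of 256.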

With the new capacity and threshold computed, the new array can be created:

Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];

Then you need to migrate the nodes of the old array to the new array:

  1. Iterate over every index of the old array.
  2. Check whether the current index holds an element; only non-empty slots need migrating.
  3. If the slot holds a single node (no next node), move it directly to its index in the new array.
  4. If the slot holds a red-black tree, split the tree between its two possible new positions.
  5. If the slot holds a linked list, split it into a "low" list that stays at index j and a "high" list that moves to index j + oldCap.
if (oldTab != null) {
    // iterate over every index of the old array
    for (int j = 0; j < oldCap; ++j) {
        Node<K,V> e;
        // only non-empty slots need migrating
        if ((e = oldTab[j]) != null) {
            oldTab[j] = null;
            // a single node with nothing after it
            if (e.next == null)
                // compute the node's index in the new array
                newTab[e.hash & (newCap - 1)] = e;
            // the slot holds a red-black tree
            else if (e instanceof TreeNode)
                ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
            // the slot holds a linked list
            else {
                Node<K,V> loHead = null, loTail = null;
                Node<K,V> hiHead = null, hiTail = null;
                Node<K,V> next;
                do {
                    next = e.next;
                    // this node stays at the same index j in the new array
                    if ((e.hash & oldCap) == 0) {
                        if (loTail == null)
                            loHead = e;
                        else
                            loTail.next = e;
                        loTail = e;
                    }
                    // this node moves to index j + oldCap in the new array, e.g. 1 + 16 = 17
                    else {
                        if (hiTail == null)
                            hiHead = e;
                        else
                            hiTail.next = e;
                        hiTail = e;
                    }
                } while ((e = next) != null);
                if (loTail != null) {
                    loTail.next = null;
                    newTab[j] = loHead;
                }
                if (hiTail != null) {
                    hiTail.next = null;
                    newTab[j + oldCap] = hiHead;
                }
            }
        }
    }
}
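The lo/hi split works because doubling the length adds exactly one bit, the `oldCap` bit, to the index mask. The sketch below compares the split rule with recomputing the index from scratch; `SplitRuleDemo` and its methods are hypothetical names.

```java
// Checks the lo/hi split rule used when a table doubles from oldCap to 2 * oldCap:
// a node in old slot j goes to j if (hash & oldCap) == 0, otherwise to j + oldCap.
// SplitRuleDemo and its methods are hypothetical names for illustration.
class SplitRuleDemo {

    // New index according to the split rule (only one bit is inspected).
    static int newIndexBySplitRule(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return ((hash & oldCap) == 0) ? j : j + oldCap;
    }

    // New index recomputed directly with the doubled mask.
    static int newIndexDirect(int hash, int oldCap) {
        return hash & (2 * oldCap - 1);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {1, 17, 33, 49}; // all in old slot 1
        for (int h : hashes) {
            System.out.println(h + " -> " + newIndexBySplitRule(h, oldCap)
                    + " (direct: " + newIndexDirect(h, oldCap) + ")");
        }
    }
}
```

All four sample hashes sit in old slot 1; those with the 16-bit clear (1, 33) stay at index 1, and those with it set (17, 49) move to 1 + 16 = 17. Because only one bit is inspected, no node needs its full index recomputed, and the relative order of nodes within each half is preserved.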

Five, thread safety

If running the same operations from multiple threads leaves the data in a different final state than running them on a single thread, the code is not thread-safe. Thread safety requires three properties: atomicity, visibility, and ordering.

One approach is mutual exclusion: other threads may operate on the map only after the current thread's operation has completed or exited abnormally, for example by adding the synchronized keyword to the put process. But then every operation has to acquire the same lock, which greatly reduces efficiency. Alternatively, you can use Hashtable or ConcurrentHashMap; we won't go into detail about them here.
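As a brief illustration of the ConcurrentHashMap route, the sketch below has several threads increment the same counter. It relies on `ConcurrentHashMap.merge`, which the JDK documents as performing the entire method invocation atomically; `ConcurrentCountDemo` and `countConcurrently` are hypothetical names, and the thread and iteration counts are arbitrary.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Several threads increment one counter in a ConcurrentHashMap.
// merge() does the read-modify-write for a key atomically, so no increments are lost.
// ConcurrentCountDemo and countConcurrently are hypothetical names.
class ConcurrentCountDemo {

    static Map<String, Integer> countConcurrently(int threads, int incrementsPerThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.merge("kobe", 1, Integer::sum); // atomic increment of one key
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES); // wait for all tasks to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000).get("kobe")); // 40000
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same code is not thread-safe and can lose increments, so its final count is not guaranteed.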

Original link: http://m6z.cn/6s8bYq


Origin blog.csdn.net/weixin_48182198/article/details/109334942