HashMap related issues and analysis

Foreword :
    I have watched a number of HashMap videos and read related materials, but with project work this topic has sat on the back burner for a while. To really learn something, you have to work through it and write it down yourself.
    Today I will recall what I know and write up the key points and common interview questions about HashMap.
    This post will not walk through the source code. If you prefer reading source, wait for a future post devoted to a pure code walkthrough.
 
First, a look at the red-black tree (the original post included a diagram of one here)
 
 
Remarks :
  1. Every node is either red or black;
  2. The root node must be black;
  3. All leaf nodes (the null/NIL nodes) are black;
  4. Both children of a red node are black (a red-black tree never has two adjacent red nodes);
  5. Every path from a given node down to each of its leaf nodes contains the same number of black nodes;
  6. A node newly inserted into the red-black tree is colored red.
     A red-black tree is a balanced binary search tree and must maintain its balance automatically; the six rules above are what the red-black tree enforces to keep itself balanced.
 
The rapid-fire interview questions :
 
   Why does the HashMap array have a default length? What is that length, and why is it 16?
 
   Why is it written as 1 << 4 instead of 16 directly?
 
   What is the upper limit on capacity? Why must it be a power of 2 rather than something else?
 
Analysis:
 
In JDK 1.8, the capacity is set when the HashMap constructor is called; in JDK 1.7, this work is deferred until the first put operation. When we pass an initial capacity, HashMap does not necessarily use that value directly: it uses the smallest power of 2 greater than or equal to it as the actual initial capacity. The purpose is to make hashing more efficient.
If no initial capacity is set, then as elements keep being added, HashMap will resize repeatedly, and its resize mechanism requires rebuilding the hash table on every resize, which hurts performance badly. So, to improve efficiency, a sensible default (or preset) length is needed.
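The rounding described above can be sketched with the bit-smearing routine JDK 1.8 uses (`HashMap.tableSizeFor`), reproduced here as a standalone class for illustration (the class name `CapacityDemo` is my own):

```java
public class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of 2 >= cap, as in JDK 1.8's HashMap.tableSizeFor:
    // smear the highest set bit of (cap - 1) into every lower bit, then add 1.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // prints 16
        System.out.println(tableSizeFor(16)); // prints 16
        System.out.println(tableSizeFor(17)); // prints 32
    }
}
```

So `new HashMap<>(10)` ends up with a table of length 16, and `new HashMap<>(17)` with a table of length 32.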
 
The default value of Capacity in HashMap is 16 ------- written as: 1 << 4 (spelling it as 1 << 4 rather than 16 makes it obvious at a glance that the value is a power of two);
 
The default value is 16 for the following considerations:
    it reduces hash collisions;
    it improves map query efficiency;
    allocating too little would cause frequent resizing;
    allocating too much would waste memory.
In the put process, hashCode() is called on the key to compute a hash value, and the resulting int is then reduced modulo the array length. For performance, however, Java implements this modulo with a bitwise AND: index = hash & (length - 1). For the result to cover the full range 0 to (2^n) - 1 so that every slot can be used evenly, the array length must be a power of 2: then length - 1 is an odd number whose low-order bits are all 1s (for example, 16 - 1 = 15 = 0b1111), and ANDing it with the hash can yield every index from 0 to (2^n) - 1. If length - 1 were not such a value, some indices in that range could never be produced by the AND, leaving those array slots unused; the remaining slots would then suffer more hash collisions, their bucket structures would grow more complex, and the query efficiency of the whole map would suffer.
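A minimal sketch of that index computation follows; the hash-spreading step mirrors JDK 1.8's `HashMap.hash()`, but the class and method names here are my own:

```java
public class IndexDemo {
    // Spread the high bits of hashCode into the low bits, as JDK 1.8 does,
    // so the masking step below is also influenced by the upper bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // For a power-of-two length, (length - 1) masks the low bits of the hash,
    // which is equivalent to a non-negative modulo but much cheaper.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (Object key : new Object[]{"a", "hashmap", 42, -12345}) {
            int h = hash(key);
            // Bitwise AND agrees with floorMod for power-of-two lengths:
            assert indexFor(h, length) == Math.floorMod(h, length);
        }
        System.out.println("index of \"hashmap\" = " + indexFor(hash("hashmap"), length));
    }
}
```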
 
In addition, when the array length is a power of 2, different keys are less likely to compute the same index, so the data spreads more evenly across the array. With fewer collisions, a lookup rarely needs to traverse a linked list at a given position, and query efficiency is higher.
 
The maximum value of Capacity in HashMap is 1073741824 ------- written as: 1 << 30
 
 
Next is the load factor
 
What is the load factor?
         The load factor controls the resize mechanism: once the number of entries in the container reaches the threshold it defines, a resize is triggered.
For example, with a capacity of 16 and a load factor of 0.75, 16 * 0.75 = 12, so as soon as 12 entries are stored, the map resizes.
Having understood what the load factor means, why it is needed should require no further explanation.
Why is the load factor 0.75 rather than 0.5 or 1.0?
    Analysis:
       HashMap is, in the end, a data structure, and the main concern of any data structure is the trade-off between time and space. So how does HashMap balance the two?
 
      Load factor 1.0
       A load factor of 1.0 means a resize happens only after all 16 slots of the array are filled.
       This causes a big problem: hash collisions are unavoidable, and with a load factor of 1.0 there will be many of them, so the underlying linked lists and red-black trees become very deep and complex, which is terrible for query efficiency. In this case, time is sacrificed to maximize space utilization.
 
      Load factor 0.5
       A load factor of 0.5 means a resize starts as soon as the array is half full. With fewer entries per table, hash collisions decrease, the linked lists stay short and the red-black trees stay low, and query efficiency rises. But space utilization drops sharply: data that once fit in 10M of space now needs 20M.
 
Summary :
 
When the load factor is 0.75, space utilization is relatively high while a good number of hash collisions are still avoided, keeping the underlying linked lists and red-black trees shallow. It strikes a balance between time and space efficiency.
 
Why is a bucket's linked list converted to a red-black tree at length 8? And why is it converted back to a linked list at 6?
 
        When the hashCode implementation disperses keys well, the tree structure is almost never used, because entries spread evenly across the buckets and no single linked list gets anywhere near the threshold.
        With a poor hashCode the dispersion degrades, and the JDK cannot stop users from implementing bad hash algorithms, so the data may end up unevenly distributed.
        Even so, under the ideal random-hashCode model the number of nodes per bucket follows a Poisson distribution, and the probability of a linked list reaching a length of 8 is about 0.00000006, which is practically negligible.
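That figure can be checked directly. Assuming the model described in the comment in JDK 8's HashMap source, bucket sizes follow a Poisson distribution with parameter 0.5 (the class name `PoissonDemo` is illustrative):

```java
public class PoissonDemo {
    // P(a bucket holds exactly k entries) under a Poisson(0.5) model,
    // the parameter quoted in the JDK 8 HashMap source comment.
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) factorial *= i;
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // Roughly 0.00000006: a chain of length 8 is almost never reached.
        System.out.printf("P(k = 8) = %.8f%n", poisson(0.5, 8));
    }
}
```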
 
        Therefore, choosing 8 as the threshold for converting a linked list into a red-black tree was certainly not a whim.
 
         Red-black tree operations involve left rotations, right rotations, and so on, which a singly linked list does not need, so operating on tree nodes costs considerably more. To keep that cost down, when the node count drops low enough (to 6), it is the better choice to convert the red-black tree back into a linked list before operating further; using 6 rather than 8 also leaves a gap that avoids constant back-and-forth conversion when the bucket size hovers around the threshold.
 
HashMap's resize
 
    1. Expansion
 
         When an element is added to the container, HashMap checks the current number of entries. If it is greater than or equal to the threshold (the current array length multiplied by the load factor), the map resizes automatically.
        In simple terms:
        When the number of entries in the HashMap exceeds array size * loadFactor (0.75 by default), the array is resized.
        By default the array size is 16, so once the number of entries exceeds 16 * 0.75 = 12, the array size doubles to 2 * 16 = 32, the position of every element is recomputed by rehashing (after the rehash, each element either stays at its original index or moves to original index + old capacity), and the threshold is updated.
    
Remarks:
 
      When rehashing in JDK 1.7, entries from an old bucket that map to the same index in the new table have their linked-list order reversed during migration; JDK 1.8 preserves the original order.
 
     JDK 1.8's optimization of resizing:
            Instead of recomputing each index from scratch, it checks the one extra hash bit that the doubled capacity exposes: if that bit is 0, the index is unchanged; if it is 1, the new index becomes original index + oldCap. This saves recomputing hash values, and because that extra bit can be considered randomly 0 or 1, the resize spreads the previously colliding nodes evenly across the new buckets.
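The split can be verified in a few lines; `oldCap` and the sample hash values below are arbitrary illustrations:

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;          // old table length (a power of 2)
        int newCap = oldCap << 1; // doubled during resize

        for (int hash : new int[]{5, 21, 100, -7}) {
            int oldIndex = hash & (oldCap - 1);
            int newIndex = hash & (newCap - 1);
            // The newly exposed hash bit decides where the node lands:
            if ((hash & oldCap) == 0) {
                assert newIndex == oldIndex;          // index unchanged
            } else {
                assert newIndex == oldIndex + oldCap; // moved by oldCap
            }
        }
        System.out.println("every node stayed put or moved by exactly oldCap");
    }
}
```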
 
   2. Initialization
        
       A HashMap is initialized with an initial capacity and a load factor. If the initial capacity is less than the maximum number of entries divided by the load factor, rehash operations will occur as the map fills.
       A rehash rebuilds the internal data structure, generally doubling the array length, and recomputes the position of every element, which is a very expensive operation. Therefore, if we can predict how many entries a HashMap will hold, presetting the capacity accordingly can noticeably improve its performance.
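One common presizing idiom looks like the sketch below; the helper name `capacityFor` is my own choice:

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    // Pick an initial capacity large enough that `expected` entries never
    // trigger a resize under the default 0.75 load factor.
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        // capacityFor(1000) = 1334; HashMap rounds it up to 2048 internally,
        // and 2048 * 0.75 = 1536 >= 1000, so filling the map never resizes it.
        Map<Integer, Integer> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put(i, i * i);
        }
        System.out.println("size = " + map.size()); // prints size = 1000
    }
}
```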
 
There is still a lot in HashMap worth studying that I will not go into here. If you are interested, read the source code and work through it slowly. Keep at it!
 
 
 
        
    
 
 

Origin blog.csdn.net/weixin_43562937/article/details/106589776