Some thoughts on HashMap

1. The role of HashMap's load factor

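When the number of entries in the HashMap (counting the entries stored in linked lists and red-black trees) exceeds the array length multiplied by the load factor (0.75 by default), the array is expanded to twice its length. For example, with the default length of 16 the threshold is 16 * 0.75 = 12, so adding the 13th entry triggers the expansion.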

2. Why the load factor of HashMap is 0.75

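The default of 0.75 is a trade-off between space utilization and query cost. A higher load factor wastes fewer array slots but produces more hash conflicts and therefore slower lookups; a lower one reduces conflicts but leaves more of the array empty and triggers expansion more often.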

3. Why the number of slots must be 2^n

The most obvious way to map a hash value evenly onto the slots is the remainder (%) operation. The key point: when the divisor is a power of 2, the remainder is equivalent to a bitwise AND (&), that is, hash % length == hash & (length - 1) for a non-negative hash, provided that length is 2 to the power of n. The bit operation & is cheaper to compute than %, which is why the length of a HashMap's array is always a power of 2.
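The equivalence can be checked with a small sketch. The class and method names below are made up for illustration and are not part of HashMap. Math.floorMod is used for the remainder because Java's % returns a negative result for a negative hash, while the mask never does.

```java
class IndexDemo {
    // Index via bit mask; only valid when length is a power of two.
    static int indexByMask(int hash, int length) {
        return hash & (length - 1);
    }

    // Index via remainder; Math.floorMod keeps the result non-negative.
    static int indexByMod(int hash, int length) {
        return Math.floorMod(hash, length);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int hash : new int[] {7, 17, 23, 39, -5}) {
            System.out.println(hash + " -> mask " + indexByMask(hash, length)
                    + ", mod " + indexByMod(hash, length));
        }
    }
}
```

Both methods return the same subscript for every input as long as length is a power of two; with a length such as 10 the mask would skip some slots entirely.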

4. Methods for resolving Hash conflicts

1. Open addressing

f_i(key) = (f(key) + d_i) MOD m (d_i = 0, 1, 2, 3, ..., m-1)

key: the element to be put into the array (hash table); f(key): the hash function; d_i: the offset used for the i-th probe; m: the length of the array

When a conflict occurs, a probe sequence is formed in the hash table using some probing technique. The search moves slot by slot along this sequence until the given key is found or an open address (an empty slot) is encountered. On insertion, the new node is stored in the first open address that is found. On lookup, reaching an open address means the key is not in the table, that is, the search fails.

(1) Linear probing

The idea is: calculate the subscript of the element in the array with the formula. If that subscript is empty, put the element there directly; if it is occupied, increase d_i in the formula by 1 each time and recalculate until an empty subscript is found. If no empty subscript exists, the array is full and needs to be expanded.
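A minimal sketch of linear probing for int keys, assuming f(key) = key MOD m and no deletion. The class name and the choice to return -1 when the table is full are illustrative only; a real table would expand instead.

```java
class LinearProbingTable {
    private final Integer[] slots;

    LinearProbingTable(int capacity) {
        slots = new Integer[capacity];
    }

    // Returns the subscript where key was stored, or -1 if the table is full.
    int insert(int key) {
        int m = slots.length;
        int home = Math.floorMod(key, m);      // f(key): assumed hash function
        for (int d = 0; d < m; d++) {          // d_i = 0, 1, ..., m - 1
            int idx = (home + d) % m;
            if (slots[idx] == null || slots[idx] == key) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1; // every slot is occupied; a real table would expand here
    }

    // Returns the subscript of key, or -1 if it is not in the table.
    int find(int key) {
        int m = slots.length;
        int home = Math.floorMod(key, m);
        for (int d = 0; d < m; d++) {
            int idx = (home + d) % m;
            if (slots[idx] == null) return -1; // open address: search fails
            if (slots[idx] == key) return idx;
        }
        return -1;
    }
}
```

In a table of length 8, the keys 3, 11 and 19 all hash to subscript 3, so they end up in subscripts 3, 4 and 5.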

(2) Quadratic probing

The idea is: find an empty subscript by changing how d_i is calculated. Specifically, d_i = -1^2, 1^2, -2^2, 2^2, ..., -q^2, q^2, with q <= m / 2. I have not studied why d_i takes these values and simply copied them, but the idea behind this probing method must be understood.

The case it targets is when every subscript after the calculated one is occupied while free subscripts remain before it. Linear probing can still find a free subscript, but only after a relatively large number of probes. Probing on both sides of the calculated subscript with growing steps can reduce the number of probes.
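The probe order can be listed with a short sketch. It assumes the offsets are taken in the order -1^2, 1^2, -2^2, 2^2, ... and wrapped around the array with MOD m; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class QuadraticProbeDemo {
    // Subscripts visited for a key whose calculated subscript is `home`
    // in a table of length m, using offsets -q^2 and +q^2 for q = 1..m/2.
    static List<Integer> probeSequence(int home, int m) {
        List<Integer> order = new ArrayList<>();
        order.add(home);
        for (int q = 1; q <= m / 2; q++) {
            order.add(Math.floorMod(home - q * q, m));
            order.add(Math.floorMod(home + q * q, m));
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(probeSequence(6, 11));
    }
}
```

For home subscript 6 in a table of length 11, the sequence starts 6, 5, 7, 2, 10, so free subscripts on both sides of 6 are reached within a few probes.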

(3) Pseudo-random probing

The idea is: the value of d_i is obtained from a pseudo-random function. As long as the seed of the random function is the same, the sequence of d_i values is also the same, so a lookup replays exactly the probe sequence that was used on insertion.
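A small sketch of the reproducibility this relies on, using java.util.Random with a fixed seed. How the seed is chosen (for example, derived from the key) is an assumption left out here, and the names are illustrative.

```java
import java.util.Random;

class PseudoRandomProbeDemo {
    // First `count` offsets d_i for a table of length m, generated from `seed`.
    static int[] offsets(long seed, int m, int count) {
        Random random = new Random(seed);
        int[] d = new int[count];
        for (int i = 0; i < count; i++) {
            d[i] = random.nextInt(m); // value in [0, m)
        }
        return d;
    }
}
```

Calling offsets twice with the same seed yields the same array, which is what lets a lookup follow the path taken by the insert.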

In short, as long as the hash table is not full and the probe sequence can reach every slot (as linear probing does), open addressing can always find an address that does not conflict. It is a common way to resolve conflicts.

2. Chaining (zipper method)

When a Hash conflict occurs, the conflicting elements are linked into a list on that slot. HashMap resolves Hash conflicts with the zipper method.
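A minimal sketch of the zipper method with String keys and int values. It is not HashMap's code: it has no expansion and no red-black trees, and it prepends new nodes for brevity, whereas HashMap (since Java 8) appends them to the tail of the list.

```java
class ChainedTable {
    // One node of a slot's singly linked list.
    static final class Node {
        final String key;
        int value;
        Node next;

        Node(String key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] buckets = new Node[16]; // power-of-two length

    private int indexFor(String key) {
        return key.hashCode() & (buckets.length - 1);
    }

    void put(String key, int value) {
        int i = indexFor(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { // key already present: overwrite
                n.value = value;
                return;
            }
        }
        buckets[i] = new Node(key, value, buckets[i]); // prepend to the list
    }

    // Returns the value for key, or null if the key is absent.
    Integer get(String key) {
        for (Node n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }
}
```

The strings "Aa" and "BB" have the same hashCode in Java, so putting both exercises the conflict path: they share a slot and are told apart by equals.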

5. Why does the linked list turn into a red-black tree when the length of the linked list reaches 8?

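The threshold of 8 is a trade-off between space and time. Tree nodes take about twice the space of ordinary nodes, so they are only worth using when a list becomes long. With 0.75 as the load factor and well-distributed hash codes, the number of nodes in a slot roughly follows a Poisson distribution with a parameter of about 0.5, so it is almost impossible for the length of a linked list to reach 8.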

Quoting the comments in the HashMap source code:

* 0:    0.60653066
* 1:    0.30326533
* 2:    0.07581633
* 3:    0.01263606
* 4:    0.00157952
* 5:    0.00015795
* 6:    0.00001316
* 7:    0.00000094
* 8:    0.00000006
* more: less than 1 in ten million
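The same source comment states that these figures come from a Poisson distribution with a parameter of about 0.5, that is, exp(-0.5) * pow(0.5, k) / factorial(k). A short sketch to reproduce them (the class name is made up for illustration):

```java
class PoissonBinSizes {
    // Probability that a slot holds exactly k nodes under a Poisson
    // distribution with mean lambda.
    static double probability(int k, double lambda) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k, 0.5));
        }
    }
}
```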

6. What happens to the position of the elements during HashMap expansion?

There are three cases:

  • For a slot holding a single element: recalculate the subscript directly from the stored hash value (hash & (newCap - 1)) and put the element into the new array.
  • For linked lists: split the linked list into two. Nodes with (hash & oldCap) == 0 form the "lo" list, which keeps the original subscript in the new array; the remaining nodes form the "hi" list, which is placed at the original subscript + oldCap in the new array.
  • For the red-black tree: split the tree into two linked lists in the same way, keep the "lo" list at the original subscript and put the "hi" list at the original subscript + oldCap. Finally, re-judge whether each of the two lists needs to be converted back to a red-black tree.

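Key code (the loop that splits one linked list):

do {
    next = e.next;
    // The bit at position oldCap decides which list the node joins.
    if ((e.hash & oldCap) == 0) {
        // "lo" list: keeps the original subscript
        if (loTail == null)
            loHead = e;
        else
            loTail.next = e;
        loTail = e;
    }
    else {
        // "hi" list: moves to the original subscript + oldCap
        if (hiTail == null)
            hiHead = e;
        else
            hiTail.next = e;
        hiTail = e;
    }
} while ((e = next) != null);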

For example: oldCap is 16, so the length of the new array after expansion is 32. Take the elements 7, 17, 23 and 39 (the hash value of an integer is treated as the integer itself). In the old array 7, 23 and 39 share subscript 7, and 17 sits at subscript 1.

  7 :0000 0111
& 16:0001 0000
---------------
 = :0000 0000 # 0, stays at the original subscript 7

  17:0001 0001
& 16:0001 0000
---------------
 = :0001 0000 # Not 0, moves to subscript 1 + 16 = 17

  23:0001 0111
& 16:0001 0000
---------------
 = :0001 0000 # Not 0, moves to subscript 7 + 16 = 23

  39:0010 0111
& 16:0001 0000
---------------
 = :0000 0000 # 0, stays at the original subscript 7 (39 & 31 = 7)
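The rule can also be checked in code. The sketch below is an illustration with made-up names, not HashMap's implementation; it applies the (hash & oldCap) test to single hash values and compares the result with recalculating the subscript from scratch.

```java
class ResizeSplitDemo {
    // Subscript of an element after the array doubles from oldCap to 2 * oldCap.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        for (int hash : new int[] {7, 17, 23, 39}) {
            System.out.println(hash + ": " + (hash & (oldCap - 1))
                    + " -> " + newIndex(hash, oldCap));
        }
    }
}
```

For every hash value, newIndex(hash, oldCap) equals hash & (2 * oldCap - 1), which is why the split never needs to recompute the full subscript for each node.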




Origin blog.51cto.com/15138908/2676975