Original | Understand these two points about HashMap and the interview will be no problem

HashMap is a regular in backend interviews. What is the default initial capacity? What is the load factor? Is it thread-safe? What happens during a put operation? During a get? How do the JDK 1.7 and 1.8 implementations differ? If you can answer this series of questions fluently, you already understand HashMap quite well. Recently, though, a teammate gave a tech-sharing session that taught me quite a bit, and I would like to share it here.

We hold a tech-sharing session every Friday and take turns presenting. The arrangement works well: we sit down together, discuss one topic in depth, and everyone benefits from the exchange of ideas.

Here are two questions. See if you can answer them.

1. How do you find the smallest power of 2 that is greater than or equal to the given initial capacity?

2. What special processing does HashMap apply when it hashes a key? Why does it do this?

Try to think them through yourself first, then read on.

The following analysis is based on JDK 1.8.

Analysis

Question 1: How do you find the smallest power of 2 that is greater than or equal to the given initial capacity?

When we create a HashMap with the default constructor, it is built with an initial capacity of 16 and a load factor of 0.75. The drawback is that with a large amount of data the map resizes frequently, and each resize moves data around. To avoid resizing and improve performance, we usually estimate the capacity and create the map with the constructor that takes a capacity. Let's look at the source code:
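As a minimal sketch of the capacity estimate described above, the expected number of entries can be divided by the load factor so that the resize threshold is not crossed while filling the map. The helper name `capacityFor` and the class `PresizedMapDemo` are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PresizedMapDemo {
    // Hypothetical helper: an initial capacity large enough to hold
    // `expected` entries without triggering a resize at this load factor.
    static int capacityFor(int expected, float loadFactor) {
        return (int) Math.ceil(expected / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 1000;
        // Pass the estimated capacity to the constructor instead of
        // relying on the default capacity of 16.
        Map<String, Integer> map = new HashMap<>(capacityFor(expected, 0.75f));
        for (int i = 0; i < expected; i++) {
            map.put("key-" + i, i);
        }
        System.out.println(map.size());
    }
}
```

For 1000 expected entries this requests a capacity of 1334, which HashMap rounds up to 2048, giving a threshold of 1536.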

~~~ java
public HashMap(int initialCapacity, float loadFactor) {
    ...
    // If the initial capacity exceeds the maximum capacity, cap it at the maximum capacity, 2^30
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    ...
    this.loadFactor = loadFactor;
    // tableSizeFor computes the smallest power of 2 that is >= the given initial capacity
    this.threshold = tableSizeFor(initialCapacity);
}
~~~

The source shows that the maximum capacity is 2^30, i.e. the length of HashMap's internal array lies in the range [0, 2^30]. The constructor then computes the smallest power of 2 that is greater than or equal to the initial capacity. The **tableSizeFor** method is the key part, so let's look at its source:

~~~ java
// Returns a power of two size for the given target capacity
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
~~~

The design of this method is very clever. HashMap must guarantee that its capacity is a power of 2, and the method achieves exactly that: if the input cap is itself a power of 2, it returns cap; if the input cap is not a power of 2, it returns the smallest power of 2 greater than cap.
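The behavior described above can be checked with a small standalone copy of the method. This is a sketch: the class name `TableSizeDemo` is hypothetical, and `MAXIMUM_CAPACITY` is redeclared locally with the value 2^30 stated in the article.

```java
class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit-smearing logic as the JDK 1.8 method quoted above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] inputs = {1, 16, 17, 18, 1000};
        for (int cap : inputs) {
            // A power of 2 maps to itself; anything else rounds up.
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

Inputs 16 and 18 produce 16 and 32 respectively, matching the two cases in the paragraph above.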

Why must the capacity be a power of 2?

Because the array index for a key is obtained by ANDing the key's hash value with the array length minus 1, as in: tab[i = (n - 1) & hash]

1. When n is a power of 2, every bit of n - 1 below n's single 1 bit is 1, so each of those bits in (n - 1) & hash can be either 0 or 1 depending on the hash value. This keeps the distribution uniform, and the bitwise operation is also fast.

2. If n is not a power of 2, some bits of n - 1 are 0, so certain indexes can never be produced and hash collisions become more frequent.
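The two points above can be illustrated with a short sketch that collects every index `(n - 1) & hash` can produce for a range of hash values. The class `IndexMaskDemo` and the method `reachableIndexes` are hypothetical names, and n = 10 is an arbitrary non-power-of-2 chosen for the comparison.

```java
import java.util.Set;
import java.util.TreeSet;

class IndexMaskDemo {
    // Collect every index that (n - 1) & hash yields for hashes 0..999.
    static Set<Integer> reachableIndexes(int n) {
        Set<Integer> seen = new TreeSet<>();
        for (int hash = 0; hash < 1000; hash++) {
            seen.add((n - 1) & hash);
        }
        return seen;
    }

    public static void main(String[] args) {
        // n = 16: mask is 1111, so all 16 slots are reachable.
        System.out.println(reachableIndexes(16));
        // n = 10: mask is 1001, so only slots 0, 1, 8 and 9 are reachable.
        System.out.println(reachableIndexes(10));
    }
}
```

With n = 10, six of the ten slots would never be used, which is why collisions pile up when n is not a power of 2.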

The method first computes cap - 1. This avoids the case where the input cap is already a power of 2 and the final result comes out as twice cap: since the given cap already meets HashMap's requirement, there is no need to initialize a HashMap with double the capacity. Don't worry if this is unclear for now; an example later in the article demonstrates it.

As described above, the maximum capacity of HashMap is 2^30, so n = cap - 1 fits in 30 bits. We will therefore use a 30-bit number to demonstrate the shift-and-OR operations in the algorithm. Assume n = 001xxx xxxxxxxx xxxxxxxx xxxxxxxx (x stands for a bit that may be 0 or 1; we do not care which).

First shift: n |= n >>> 1. This ORs n with itself shifted right by one bit, so the bit immediately to the right of n's highest 1 is also set to 1.

~~~
n           001xxx xxxxxxxx xxxxxxxx xxxxxxxx
n >>> 1     0001xx xxxxxxxx xxxxxxxx xxxxxxxx
OR result   0011xx xxxxxxxx xxxxxxxx xxxxxxxx
The bit to the right of n's highest 1 is now also 1, so the top two bits are consecutive 1s
~~~

Second shift: n |= n >>> 2

~~~
n           0011xx xxxxxxxx xxxxxxxx xxxxxxxx
n >>> 2     000011 xxxxxxxx xxxxxxxx xxxxxxxx
OR result   001111 xxxxxxxx xxxxxxxx xxxxxxxx
n now has 4 consecutive high 1s
~~~

Third shift: n |= n >>> 4

~~~
n           001111 xxxxxxxx xxxxxxxx xxxxxxxx
n >>> 4     000000 1111xxxx xxxxxxxx xxxxxxxx
OR result   001111 1111xxxx xxxxxxxx xxxxxxxx
n now has 8 consecutive high 1s
~~~

Fourth shift: n |= n >>> 8

~~~
n           001111 1111xxxx xxxxxxxx xxxxxxxx
n >>> 8     000000 00001111 1111xxxx xxxxxxxx
OR result   001111 11111111 1111xxxx xxxxxxxx
n now has 16 consecutive high 1s
~~~

Fifth shift: n |= n >>> 16

~~~
n           001111 11111111 1111xxxx xxxxxxxx
n >>> 16    000000 00000000 00001111 11111111
OR result   001111 11111111 11111111 11111111
Every bit below n's highest 1 is now set to 1
~~~

Finally, n is compared with the maximum capacity. If n >= 2^30, the maximum capacity is used. If n < 2^30, the method returns n + 1: because every bit below the highest 1 is now 1, adding 1 yields the smallest power of 2 greater than that number.

For the example above, 001111 11111111 11111111 11111111 plus 1 gives 010000 00000000 00000000 00000000, which is the smallest power of 2 that is not less than the given value.

Let's walk through a concrete example, cap = 18.

[Figure: step-by-step calculation for cap = 18]

We input 18 and get 32, which is exactly the smallest power of 2 greater than 18.
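The cap = 18 walkthrough can be reproduced with a small tracing sketch that records n after the initial decrement and after each of the five shift-and-OR steps. The class `SmearTraceDemo` and the method `trace` are hypothetical names used only for this illustration.

```java
class SmearTraceDemo {
    // Returns n after the decrement (index 0) and after each
    // shift-or step with shifts 1, 2, 4, 8, 16 (indexes 1..5).
    static int[] trace(int cap) {
        int[] steps = new int[6];
        int n = cap - 1;
        steps[0] = n;
        int i = 1;
        for (int shift = 1; shift <= 16; shift <<= 1) {
            n |= n >>> shift;
            steps[i++] = n;
        }
        return steps;
    }

    public static void main(String[] args) {
        int[] steps = trace(18);
        for (int step : steps) {
            // Print each intermediate value in binary.
            System.out.println(Integer.toBinaryString(step));
        }
        // The last value plus 1 is the resulting capacity.
        System.out.println(steps[5] + 1);
    }
}
```

For cap = 18, n goes 17 (10001), 25 (11001), 31 (11111) and then stays at 31, so the result is 31 + 1 = 32.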

What is the output if cap itself is a power of 2?

[Figure: step-by-step calculation for cap = 16]

The demonstration shows that when cap is itself a power of 2, the output is cap itself.

One question was left open above: why subtract 1 from cap first? As explained, this avoids the case where the input is already a power of 2 and the final result comes out as 2 * cap, which wastes space. See the following demonstration.

[Figure: calculation for cap = 16 without subtracting 1]

The demonstration shows that with an input of 16 and no decrement, the final result is 32, which wastes space. That is what makes the algorithm so clever: it subtracts 1 from cap first.
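The effect of the initial decrement can be sketched by comparing two simplified variants side by side. This is an illustration only: the names `NoDecrementDemo`, `smear`, `withDecrement`, and `withoutDecrement` are hypothetical, and the bounds checks of the real method are left out for brevity.

```java
class NoDecrementDemo {
    // Set every bit below the highest 1 bit of n.
    static int smear(int n) {
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return n;
    }

    // Simplified version of the JDK approach: subtract 1 first.
    static int withDecrement(int cap) {
        return smear(cap - 1) + 1;
    }

    // Variant that skips the decrement, for comparison only.
    static int withoutDecrement(int cap) {
        return smear(cap) + 1;
    }

    public static void main(String[] args) {
        System.out.println(withDecrement(16));    // 16
        System.out.println(withoutDecrement(16)); // 32
    }
}
```

For inputs that are not powers of 2, such as 18, both variants return 32; the decrement only matters when cap is already a power of 2.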

Question 2: What special processing does HashMap apply when it hashes a key? Why does it do this?

First, we know that when HashMap performs a put, it hashes the key first. Let's go straight to the relevant source:

~~~ java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
~~~

We can see that when the key is hashed, the method computes (h = key.hashCode()) ^ (h >>> 16).

~~~
Original hashCode:       10110101 01001100 10010101 11011111
Shifted right 16 bits:   00000000 00000000 10110101 01001100
XOR result:              10110101 01001100 00100000 10010011
~~~

This operation XORs the key's hashCode with the same hashCode shifted right by 16 bits (differing bits give 1, identical bits give 0). The resulting hash value mixes the high and low bits, which makes the generated hash values more evenly dispersed.
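The bit pattern from the example above can be verified with a short sketch. The class `HashSpreadDemo` and the helpers `spread` and `bits` are hypothetical names; `spread` applies the same XOR as the quoted `hash` method to a raw int instead of a key object.

```java
class HashSpreadDemo {
    // XOR the value with its own upper 16 bits shifted down.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Format an int as 32 binary digits in groups of 8.
    static String bits(int v) {
        StringBuilder sb = new StringBuilder();
        for (int i = 31; i >= 0; i--) {
            sb.append((v >>> i) & 1);
            if (i % 8 == 0 && i != 0) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int h = 0xB54C95DF; // 10110101 01001100 10010101 11011111
        System.out.println(bits(h));
        System.out.println(bits(h >>> 16));
        System.out.println(bits(spread(h)));
    }
}
```

The upper 16 bits are unchanged because they are XORed with zeros; only the lower 16 bits pick up information from the upper half.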

Some explanation is needed here. From the earlier description we know the array capacity lies in the range [0, 2^30]. That upper bound is very large, and in practice the array is usually fairly small, such as the default size of 16. Suppose three different keys produce the following hashCode values:

19305951 00000001 00100110 10010101 11011111

128357855 00000111 10100110 10010101 11011111

38367 00000000 00000000 10010101 11011111

What the three have in common is that their lower 16 bits are exactly the same while their upper 16 bits differ. Their array index is computed with (n - 1) & hash, where n is 16 and n - 1 = 15. In binary, 15 is:

00000000 00000000 00000000 00001111

19305951, 128357855, and 38367 are each ANDed with 15, with the following results:

[Figure: hash collision, all three values produce the same index]

All three produce the same result, which means they would be placed in the linked list or red-black tree under the same index. That is clearly not what we want.

So we XOR each hash value with itself shifted right by 16 bits, then AND the result with 15, and see whether the collision is resolved.

[Figure: hash collision resolved]

After the XOR with the 16-bit right shift, the upper bits take part in the index calculation, so keys that differ only in their upper 16 bits can be assigned to different buckets, which reduces hash collisions. The idea is to mix the high and low bits to improve dispersion.
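The comparison above can be sketched in code using the three hashCode values from the example. The class `CollisionDemo` and the helpers `spread` and `index` are hypothetical names; the table size of 256 is an extra assumption added to show the effect on a slightly larger table.

```java
class CollisionDemo {
    // Same XOR as the hash method quoted earlier, applied to a raw int.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of 2).
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int[] hashCodes = {19305951, 128357855, 38367};
        int[] tableSizes = {16, 256};
        for (int n : tableSizes) {
            for (int h : hashCodes) {
                System.out.println("n=" + n + " hashCode=" + h
                        + " raw index=" + index(h, n)
                        + " spread index=" + index(spread(h), n));
            }
        }
    }
}
```

Without spreading, all three keys land in bucket 15 when n = 16. With spreading, the indexes become 9, 9, and 15: the third key separates, while the first two still share a bucket because their upper 16 bits happen to end in the same four bits (0110). With n = 256 the spread indexes are 249, 121, and 223, so all three keys separate, whereas the raw indexes would all be 223.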

Summary

HashMap has many more points worth studying. Once you thoroughly understand the two above, you can only admire the skill of the authors who wrote this code, and these ideas are worth borrowing in our own work. I hope this explanation helps you master both points. If anything is unclear, leave a comment or send me a private message.



Origin juejin.im/post/5dc3a9c46fb9a04aa416d166