Reading the HashMap source code with ease
- Preface
- HashMap source code analysis
- Summary
Preface
HashMap is something every Java learner knows well. Yet for such a familiar thing, once you really dig into it at the source-code level, there are many parts worth learning from and thinking about. Let's explore the HashMap source code together.
HashMap source code analysis
HashMap is based on a hash table and implements the Map interface. It stores data as key-value pairs; both the key and the value are allowed to be null. Iteration order is not guaranteed, and HashMap is not thread-safe.
Data storage in HashMap
In HashMap, every put operation hashes the key and then computes, from the hash result, an index within the table's range; this index is the storage location for the current key's value. Since hashing is involved, hash collisions are bound to occur. In versions before JDK 1.8, each index stored a linked list; in JDK 1.8 and later this was optimized: when a chain grows longer than 8, it is converted into a red-black tree, and when the number of elements in the red-black tree drops to 6 or fewer, the tree degrades back into a linked list.
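These thresholds appear as named constants in the JDK 1.8 source. The values below mirror `java.util.HashMap` (reproduced as a standalone sketch, since the originals are package-private; `MIN_TREEIFY_CAPACITY` is an extra JDK constant not discussed above):

```java
// Thresholds mirrored from the JDK 1.8 java.util.HashMap source.
public class HashMapThresholds {
    // Chain length above which a bucket's linked list becomes a red-black tree.
    static final int TREEIFY_THRESHOLD = 8;
    // Node count at or below which a tree bucket degrades back to a linked list.
    static final int UNTREEIFY_THRESHOLD = 6;
    // Minimum table size before treeification happens at all;
    // below this, the table is resized instead of treeified.
    static final int MIN_TREEIFY_CAPACITY = 64;

    public static void main(String[] args) {
        System.out.println(TREEIFY_THRESHOLD + " " + UNTREEIFY_THRESHOLD); // prints 8 6
    }
}
```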
The figure below is a schematic of the JDK 1.8 HashMap storage structure (each slot in the array is referred to as a "bucket"):
In HashMap, data storage is implemented with an array + linked list + red-black tree (JDK 1.8), while earlier versions used only an array + linked list.
Why it is recommended to specify the size when initializing a HashMap
When initializing a HashMap, it is usually recommended to estimate the likely size and then construct the HashMap object with that capacity specified. Why? To answer this question, let's look at how HashMap is initialized.
The figure below shows what happens when we create a HashMap object without specifying any parameters:
Load factor
You can see that there is a default DEFAULT_LOAD_FACTOR (load factor), whose value is 0.75 by default. The effect of the load factor: when the number of elements in the HashMap reaches 0.75 of the total capacity, an expansion is performed automatically. From the code above you can also see that when we do not specify a size, the HashMap does not initialize any capacity, so we can venture a guess: when the put method is called, it must first check whether the current HashMap has been initialized, and initialize it if not.
HashMap default capacity
The put method checks whether the current HashMap is initialized; if not, it calls the resize method to initialize it. resize is used not only for initialization but also for expansion. The red box in the figure below marks the main initialization part:
As you can see, when the HashMap is initialized, two main variables are set: newCap, representing the current number of buckets in the HashMap (each array index represents a bucket); and newThr, the expansion threshold. We can never wait until all the buckets are used up before expanding, so a threshold must be set; once it is reached, expansion starts. The threshold is the total capacity multiplied by the load factor.
From the above we know the default load factor is 0.75 and the default number of buckets is 16, so when we initialize a HashMap without specifying a size, it expands automatically once it holds 12 elements (how expansion works is analyzed later). Expansion involves migrating the old data, which costs performance, so when the total number of elements to be stored can be estimated, it is recommended to specify the HashMap's capacity in advance and avoid the expansion operation.
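As a quick check of the arithmetic, the default threshold works out as follows (a standalone sketch using the JDK's default values):

```java
public class ThresholdDemo {
    static final float DEFAULT_LOAD_FACTOR = 0.75f; // JDK default load factor
    static final int DEFAULT_INITIAL_CAPACITY = 16; // JDK default bucket count

    // Expansion is triggered once the element count exceeds capacity * loadFactor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR)); // prints 12
    }
}
```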
PS: Note that when we speak of a HashMap's capacity, we generally mean the number of buckets. How many elements each bucket can hold depends on memory, so a capacity of 16 does not mean that only 16 key-value pairs can be stored.
What is the maximum capacity of HashMap
When we manually specify the capacity while initializing a HashMap, the following method is always called to initialize it:
Look at line 453: when the capacity we specify is greater than MAXIMUM_CAPACITY, it is replaced with MAXIMUM_CAPACITY. So what is this MAXIMUM_CAPACITY?
As the figure above shows, MAXIMUM_CAPACITY is 2 to the 30th power, while an int can represent up to 2 to the 31st power minus 1. Doesn't that shrink the usable range? The comment above the constant explains it: HashMap's capacity must be a power of 2, and since the largest positive int, 2^31 - 1, is less than 2^31, the largest power of 2 that fits in an int is 2^30.
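The relationship between these limits can be verified directly (a standalone sketch; `MAXIMUM_CAPACITY` mirrors the JDK constant):

```java
public class CapacityLimits {
    // JDK constant: the largest power of 2 representable as a positive int.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    public static void main(String[] args) {
        System.out.println(MAXIMUM_CAPACITY);  // 1073741824, i.e. 2^30
        System.out.println(Integer.MAX_VALUE); // 2147483647, i.e. 2^31 - 1
        // 1 << 31 overflows the sign bit, so 2^31 itself is not representable:
        System.out.println(1 << 31);           // -2147483648
    }
}
```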
Going back to the earlier constructor with parameters, it finally calls a tableSizeFor method, whose role is to adjust the HashMap's capacity:
If you don't understand bitwise operations, you may not see exactly what this is doing. In fact the method does just one thing: it adjusts the capacity we passed in up to a power of 2. That is also why, at the very beginning, 1 is subtracted from the capacity we passed in: it makes the adjustment uniform (without it, a capacity that is already a power of 2, such as 16, would be rounded up to the next power, 32).
Let's take a simple example to explain the method above. Bit operations work on binary, so if the capacity we pass in is 5, it converts to binary as 0000 0000 0000 0000 0000 0000 0000 0101. We must ensure the final number is 2 to the Nth power, and the easiest way is to take the leftmost 1 of the current binary number and turn every bit to its right into 1, which yields the corresponding 2^N - 1. For example, when we pass in 5, to get a power of 2 it must be adjusted up to 2 to the 3rd power, that is, 8; so what needs to be done is to adjust the last 3 bits from 101 to 111, giving 2^3 - 1 = 7, and adding 1 to that total yields the desired power of 2.
Again taking 5 as the example: an unsigned right shift by 1 gives 0000 0000 0000 0000 0000 0000 0000 0010, and an | (OR) with the original value 0000 0000 0000 0000 0000 0000 0000 0101 (a result bit is 0 only when both input bits are 0) gives 0000 0000 0000 0000 0000 0000 0000 0111; the two highest effective bits are now both 1. For a number this small, no matter how much further we shift right and OR again, the result no longer changes. The remaining shifts (by 2, 4, 8, and 16) are there to guarantee that even when the leading 1 sits as high as bit 31, all 31 bits below the highest bit end up as 1.
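The whole cascade can be written out as runnable code; the method body below is reproduced from the JDK 1.8 HashMap source:

```java
public class TableSizeFor {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Reproduced from the JDK 1.8 HashMap source: rounds cap up to the
    // nearest power of 2. The initial cap - 1 avoids doubling a value that
    // is already a power of 2 (e.g. 16 stays 16 instead of becoming 32).
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;   // smear the leading 1 into the bit below it
        n |= n >>> 2;   // now the top 2 ones smear into the next 2 bits ...
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;  // ... until every bit below the leading 1 is set
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(5));   // 8
        System.out.println(tableSizeFor(16));  // 16
        System.out.println(tableSizeFor(17));  // 32
    }
}
```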
At this point you may wonder: why go to all this trouble to ensure that the HashMap's capacity, i.e. the number of buckets, is a power of 2?
Why the capacity of HashMap should be 2 to the power of N
The reason for making sure the HashMap's capacity is a power of 2 is very simple: to avoid, as far as possible, the hash results distributing data unevenly across the buckets, where some buckets hold too many elements and query efficiency suffers.
Continuing with the put method, the red box in the lower part of the figure shows the index-calculation algorithm: the index is obtained by taking the current array length minus 1 (the HashMap's bottom layer uses a Node array to store elements) and computing an & with the hash value:
The & operation yields a 1 in a bit position only when both numbers have a 1 there. If n-1 converted to binary contains many 0s, like 1000, then when it is &-ed with the hash value only the single 1 bit is effective and every other position is always 0, i.e. effectively wasted; the probability of hash collisions then rises sharply. If it is 1111 instead, all four bits effectively participate in the calculation, greatly reducing the probability of collisions. This is exactly why, during initialization, the series of | operations first changes every bit after the leading 1 into a 1.
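The difference between a sparse mask like 1000 and a dense mask like 1111 is easy to demonstrate (a standalone sketch; the sample hash values are arbitrary):

```java
public class IndexMaskDemo {
    public static void main(String[] args) {
        int[] hashes = {0b0001, 0b0011, 0b0101, 0b0111}; // four sample hash values

        // Sparse mask 1000: only bit 3 survives the &, so all four samples
        // collide at index 0.
        for (int h : hashes) System.out.print((h & 0b1000) + " "); // 0 0 0 0
        System.out.println();

        // Dense mask 1111 (n - 1 for n = 16): all four low bits participate,
        // so the samples land in four distinct buckets.
        for (int h : hashes) System.out.print((h & 0b1111) + " "); // 1 3 5 7
        System.out.println();
    }
}
```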
Talk about the hash operation in HashMap
We said above that a hash value is used when calculating the position where a key finally lands. So how is this hash value obtained?
The figure below shows HashMap's method for calculating the hash value:
We can see that this calculation is quite special: it is not simply obtained from the hashCode method; the hashCode result is also unsigned-right-shifted by 16 bits and then XOR-ed with itself to give the final value.
The goal is to let the high 16 bits participate in the calculation as well, further avoiding hash collisions. Because a HashMap's capacity is always a power of 2 and usually a small one, in most cases only the low bits of the hash take part in the index calculation; mixing in the high 16 bits this way lets them influence the result too, reducing collisions to some extent. The reason exclusive-or is used rather than & or | is that an & operation would bias the result toward 0 and an | operation would bias it toward 1, while ^ preserves the characteristics of both inputs and gives a better-distributed result.
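The hash-spreading step can be reproduced as follows; the `hash` method body matches the JDK 1.8 source, and the two sample hashCodes (which differ only in their high 16 bits) are hypothetical:

```java
public class HashSpread {
    // Matches JDK 1.8 HashMap.hash(Object): XOR the high 16 bits of
    // hashCode() into the low 16 bits; null keys hash to 0 (bucket 0).
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Small helper whose hashCode is exactly the int we pass in.
    static Object wrap(int h) {
        return new Object() { @Override public int hashCode() { return h; } };
    }

    public static void main(String[] args) {
        // Two hypothetical hashCodes that differ only in their high 16 bits:
        int a = 0x0001_0004, b = 0x0002_0004;
        int n = 16; // table size
        // Without spreading they land in the same bucket:
        System.out.println((a & (n - 1)) + " " + (b & (n - 1)));                 // 4 4
        // After spreading, the high bits separate them into different buckets:
        System.out.println((hash(wrap(a)) & (n - 1)) + " " + (hash(wrap(b)) & (n - 1))); // 5 6
    }
}
```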
put element process
The put flow was already touched on above: if the HashMap is not initialized, it initializes first; it then checks whether an element already exists at the position for the current key. If there is no element, the new one is stored directly; if there is, execution goes into the else branch below:
The else branch mainly contains 4 pieces of logic:

- Check whether the current key and the original key are the same; if so, return the existing node directly.
- If the current key and the existing key are not equal, check whether the element stored in the bucket is a TreeNode node; if it is, the bucket currently holds a red-black tree, and the element is stored according to the red-black tree's algorithm.
- If the current key and the existing key are not equal and the bucket stores a linked list, traverse the nodes: if a node's next is empty, insert the current element directly at the end of the list; if it is not empty, first check whether the keys are the same, returning directly if they are equal, otherwise moving on to the next node, until a key matches or next is empty.
- After insertion into the linked list, if the current list's length is greater than TREEIFY_THRESHOLD (default 8), the list is switched to red-black tree storage.
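The chain traversal in the third step can be sketched as a minimal standalone version; the simplified `Node` class and `putIntoChain` helper are illustrative inventions, and the real JDK code additionally handles TreeNode buckets and treeification:

```java
import java.util.Objects;

// A minimal sketch of the linked-list branch of put, under simplified
// assumptions (no TreeNode handling, no resize, no treeification).
public class PutSketch {
    static class Node {
        final int hash; final Object key; Object value; Node next;
        Node(int hash, Object key, Object value) {
            this.hash = hash; this.key = key; this.value = value;
        }
    }

    // Walks the chain in a bucket: returns the existing node with an equal
    // key (step 1/3 of the branch), or appends a new node at the tail and
    // returns null (nothing to overwrite).
    static Node putIntoChain(Node head, int hash, Object key, Object value) {
        for (Node e = head; ; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) return e;
            if (e.next == null) {
                e.next = new Node(hash, key, value);
                return null;
            }
        }
    }

    public static void main(String[] args) {
        Node head = new Node("a".hashCode(), "a", 1);
        putIntoChain(head, "b".hashCode(), "b", 2);           // appended at tail
        Node existing = putIntoChain(head, "a".hashCode(), "a", 99); // key found
        System.out.println(existing.value); // 1: the old value, which the real
                                            // put would then overwrite
    }
}
```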
After this processing there is a final check that decides whether to overwrite the old element: if e != null, a value for the current key already exists, and the onlyIfAbsent variable (false by default) is consulted. False means the old value should be overwritten, so the next operation overwrites it and returns the old value.
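This overwrite behavior is observable through the public API: `put` overwrites (onlyIfAbsent = false) and returns the old value, while `putIfAbsent` passes onlyIfAbsent = true internally and leaves the existing value alone:

```java
import java.util.HashMap;

public class OverwriteDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("k", 1);
        // put overwrites and returns the previous value:
        System.out.println(map.put("k", 2));         // 1
        System.out.println(map.get("k"));            // 2
        // putIfAbsent does not overwrite; it returns the existing value:
        System.out.println(map.putIfAbsent("k", 3)); // 2
        System.out.println(map.get("k"));            // 2
    }
}
```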
Expansion
When the amount of data stored in the HashMap exceeds the threshold (current bucket count * load factor), the expansion operation is triggered:
So let's take a look at the resize method:
The first red box checks whether the current capacity has already reached MAXIMUM_CAPACITY, the 2^30 value mentioned earlier. Once that value is reached, expansion only sets the threshold to the maximum int value; no further expansion is performed.
The second red box is the expansion itself: the new capacity is the old capacity left-shifted by 1, i.e. doubled. Of course, if the expanded capacity does not meet the conditions in the second red box, it is set in the third red box instead.
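The doubling arithmetic is just a left shift, mirrored here as a standalone sketch with the default capacity and threshold:

```java
public class DoubleCapacity {
    public static void main(String[] args) {
        int oldCap = 16, oldThr = 12;
        // Mirrors the resize arithmetic: capacity and threshold both double.
        int newCap = oldCap << 1; // left shift by 1 == multiply by 2
        int newThr = oldThr << 1;
        System.out.println(newCap + " " + newThr); // 32 24
    }
}
```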
How to deal with the original data after expansion
The expansion itself is very simple; the key question is how to handle the existing data. Without looking at the code, we can roughly enumerate the migration scenarios (buckets without data need no handling):

- The current bucket position holds only the head element itself, with no other elements behind it.
- The current bucket position holds elements in a linked-list structure.
- The current bucket position holds elements in a red-black tree.
Let's look at the data-migration part of the resize source:
The red box part is easy to understand: first see whether the element in the current bucket stands alone; if so, just recalculate its index and assign it into the new array. If it is a red-black tree, the tree is broken up and reorganized; we'll skip that part for now. The final else part handles linked lists, so let's focus on that.
Linked list data processing
Processing the linked list follows one core idea: an element's index in the new table either stays the same, or is the original index plus oldCap.
The complete source code of the linked-list data processing is shown in the following figure:
The key condition is e.hash & oldCap. Why does this result being equal to 0 mean that the element's position does not change?
Before explaining, recall the tableSizeFor method, which adjusts n-1 into a shape like 00001111. Take an initial capacity of 16 as the example: n-1 is 01111 and n is 10000. If e.hash & oldCap == 0, it means the 5th bit of the hash value is 0. After expansion, 10000 becomes 100000, and the corresponding n-1 is 011111; the difference from the old n-1 is exactly the 5th bit (the 6th bit is 0 and does not affect the result). So when e.hash & oldCap == 0, the 5th bit has no effect on the index calculation and the position stays unchanged; and when e.hash & oldCap != 0, the 5th bit does affect the result, and since that bit contributes a 1, the computed position is exactly one oldCap (i.e. 16) larger. The same holds when expanding at other capacities: as long as e.hash & oldCap == 0, the index is unchanged, and when it is not equal to 0, the index gains an extra oldCap.
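The split rule can be checked with concrete numbers (a standalone sketch; the two sample hashes are arbitrary and differ only in the bit worth oldCap):

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;                     // old n = 10000 in binary
        int[] hashes = {0b0_0101, 0b1_0101}; // 5 and 21: differ only in bit 5

        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);     // index in the old table
            int newIndex = h & (oldCap * 2 - 1); // index after doubling
            // The rule: the bit worth oldCap decides whether the element
            // stays put or moves forward by exactly oldCap.
            int predicted = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.println(oldIndex + " -> " + newIndex
                    + " (predicted " + predicted + ")");
        }
        // prints:
        // 5 -> 5 (predicted 5)
        // 5 -> 21 (predicted 21)
    }
}
```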
After the loop completes, the nodes are placed as shown in the following source code:
By the same token, 32 is 100000, which illustrates the point that only the most significant 1 bit needs attention, because only that bit, when it takes part in the & operation with e.hash, can possibly produce a 1.
Summary
This article analyzed how HashMap is initialized and how the put method is built, explained that the HashMap capacity is set to a power of 2 to help prevent hash collisions, and finally introduced HashMap's expansion.