Gold March, Silver April interview prep: an easy-to-understand walkthrough of the HashMap source code

Preface

Every Java developer is familiar with HashMap, yet even for such a familiar class, really digging into it at the source-code level reveals many details worth learning from and thinking about. Let's explore the HashMap source code together.

HashMap source code analysis

HashMap is based on a hash table and implements the Map interface. It stores key-value pairs, allows null for both keys and values, does not guarantee iteration order, and is not thread-safe.

Data storage in HashMap

In HashMap, each put operation hashes the key, and the hash result is then used to compute an index within the bounds of the table; this index is the bucket where the key-value pair is stored. Since hashing is involved, hash collisions are inevitable. In jdk1.7 and earlier versions, each bucket stores colliding entries in a linked list. From jdk1.8 onward this is optimized: when a list grows longer than 8 (and the table has at least 64 buckets), it is converted to a red-black tree, and when the number of nodes in such a tree drops to 6 or fewer, it degenerates back into a linked list.
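The thresholds mentioned above are plain constants in the JDK 8 source of java.util.HashMap; the sketch below reproduces them (the class name Thresholds is ours, just for illustration):

```java
// The treeify/untreeify constants as they appear in java.util.HashMap (JDK 8).
public class Thresholds {
    // A bucket's linked list is converted to a red-black tree once it holds
    // more than this many nodes (and the table has at least 64 buckets).
    static final int TREEIFY_THRESHOLD = 8;
    // During resize, a tree with this many nodes or fewer shrinks back to a list.
    static final int UNTREEIFY_THRESHOLD = 6;
    // Treeification only happens once the table itself is at least this large;
    // below that, HashMap prefers to resize instead of building a tree.
    static final int MIN_TREEIFY_CAPACITY = 64;

    public static void main(String[] args) {
        System.out.println(TREEIFY_THRESHOLD);    // 8
        System.out.println(UNTREEIFY_THRESHOLD);  // 6
        System.out.println(MIN_TREEIFY_CAPACITY); // 64
    }
}
```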

The figure below is a schematic of the jdk1.8 HashMap storage structure (each storage slot is referred to as a bucket):

[Figure: jdk1.8 HashMap storage structure (array + linked list + red-black tree)]

In jdk1.8, HashMap stores its data using an array + linked list + red-black tree; earlier versions used only an array + linked list.

Why it is recommended to specify a size when initializing a HashMap

When initializing a HashMap, it is usually recommended to estimate the likely number of entries and pass that capacity to the constructor. Why is that? To answer this question, let's look at how HashMap initializes itself.

The figure below shows what happens when we create a HashMap without specifying any parameters:

[Figure: the no-argument HashMap constructor]

Load factor

You can see there is a default DEFAULT_LOAD_FACTOR (load factor), whose value is 0.75. The load factor controls when HashMap expands: once the number of entries in use reaches 0.75 of the total capacity, expansion is performed automatically. You can also see from the code above that when we do not specify a size, HashMap does not allocate any capacity at construction time; we can therefore venture a guess that when put is called, it will first check whether the HashMap has been initialized and, if not, initialize it first.
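The threshold arithmetic is simple enough to sketch directly; the class and method names below are ours, but the constants are the JDK defaults:

```java
// A minimal illustration of the resize threshold described above:
// threshold = capacity * loadFactor, with the JDK defaults of 16 and 0.75f.
public class LoadFactorDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Number of entries that can be stored before a resize is triggered.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the defaults, the 13th insertion triggers a resize.
        System.out.println(threshold(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR)); // 12
        // After doubling to 32 buckets, the threshold doubles as well.
        System.out.println(threshold(32, DEFAULT_LOAD_FACTOR)); // 24
    }
}
```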

HashMap default capacity

The put method checks whether the current HashMap is initialized; if not, it calls the resize method to initialize it. resize is used not only for initialization but also for expansion. The red box in the figure below highlights the initialization part:

[Figure: the initialization branch of the resize method]

As you can see, initializing a HashMap mainly sets 2 variables. One is newCap, the number of buckets in the HashMap (each array index represents one bucket). The other is newThr, the expansion threshold: we can never wait until every bucket is used up before expanding, so a threshold must be set, and once it is reached, expansion begins. The threshold is the total capacity multiplied by the load factor.

From the above we know the default load factor is 0.75 and the default number of buckets is 16, so when we initialize a HashMap without specifying a size, it will automatically expand once 12 buckets' worth of entries are in use (how expansion works is analyzed later). Expansion involves migrating the old data, which costs performance, so when the total number of elements to be stored can be estimated, it is recommended to specify the HashMap's capacity in advance and avoid the expansion operation altogether.

PS: note that when we talk about a HashMap's capacity, we generally mean the number of buckets; how many elements each bucket can hold depends only on memory. So a capacity of 16 does not mean only 16 key-value pairs can be stored.
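The pre-sizing advice above is commonly implemented with a small helper; the arithmetic below is the same one Guava's Maps.newHashMapWithExpectedSize uses, though the class and method names here are our own illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Pick an initial capacity large enough that `expected` entries stay under the
// 0.75 load-factor threshold, so no resize happens while filling the map.
public class PresizedMap {
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    public static void main(String[] args) {
        // 100 / 0.75 + 1 = 134; HashMap rounds this up to 256 buckets internally,
        // whose threshold (192) comfortably exceeds the 100 planned entries.
        Map<String, Integer> scores = new HashMap<>(capacityFor(100));
        for (int i = 0; i < 100; i++) {
            scores.put("key" + i, i);
        }
        System.out.println(scores.size()); // 100
    }
}
```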

What is the maximum capacity of HashMap

When we manually specify the capacity at initialization time, HashMap always calls the following method:

[Figure: the HashMap(int initialCapacity, float loadFactor) constructor]

Look at line 453: when the capacity we specify is greater than MAXIMUM_CAPACITY, it is clamped to MAXIMUM_CAPACITY. So what is this MAXIMUM_CAPACITY?

[Figure: the MAXIMUM_CAPACITY field and its comment]

As the figure above shows, MAXIMUM_CAPACITY is 2 to the 30th power, while the int range goes up to 2 to the 31st power minus 1. Doesn't that shrink the usable range? The comment above the field explains it: HashMap's capacity must always be a power of 2, and since the largest positive int is less than 2 to the 31st, the largest usable power of 2 is 2 to the 30th.

Going back to the parameterized constructor, it finally calls a tableSizeFor method, whose role is to adjust the HashMap's capacity:

[Figure: the tableSizeFor method]

If you don't understand bitwise operations, this may look cryptic. In fact, the method does exactly one thing: it rounds the capacity we pass in up to a power of 2. The subtraction of 1 at the very beginning exists so that a value that is already a power of 2 comes back unchanged instead of being doubled.

Let's walk through a simple example. Bit operations work on binary, so suppose the capacity we pass in is 5; in binary that is 0000 0000 0000 0000 0000 0000 0000 0101. We need to turn this into a power of 2, and the easiest route is to take the leftmost 1 and propagate it rightward until every bit below it is also 1, giving 2 to the N minus 1; adding 1 at the end then yields 2 to the N. For example, passing in 5, the next power of 2 is 2 to the 3rd, which is 8: the low 3 bits 101 are adjusted to 111, producing 2 to the 3rd minus 1 (that is, 7), and the final +1 rounds the total capacity up to a power of 2.

Again taking 5 as an example: an unsigned right shift by 1 gives 0000 0000 0000 0000 0000 0000 0000 0010, and OR-ing (|) this with the original value 0000 0000 0000 0000 0000 0000 0000 0101 (a result bit is 0 only when both input bits are 0) yields 0000 0000 0000 0000 0000 0000 0000 0111; the bit just below the highest 1 has now become 1. With the top two bits set, shifting by 2 and OR-ing sets the top four bits, and the subsequent shifts by 4, 8 and 16 continue the cascade. Since an int has at most 31 significant bits below the sign bit, these shifts together guarantee that every bit below the highest 1 ends up set, no matter where that highest 1 sits.
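The shift-and-OR cascade just described can be reproduced directly; the method body below mirrors the JDK 8 implementation of tableSizeFor (only the enclosing class name is ours):

```java
// Rounds a requested capacity up to the next power of two, as described above:
// the OR cascade smears the highest set bit of (cap - 1) into every lower bit,
// producing 2^N - 1, and the final +1 yields the power of two itself.
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1; // so an exact power of two is returned unchanged
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(5));  // 8
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(17)); // 32
    }
}
```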

At this point you may wonder: why go to so much trouble to ensure that the HashMap's capacity, i.e. the number of buckets, is a power of 2?

Why the capacity of HashMap should be 2 to the power of N

The reason for keeping HashMap's capacity at a power of 2 is simple: it helps the hash results distribute evenly across the buckets, avoiding buckets that accumulate too many elements and hurt query efficiency.

Continuing with the put method, the red line in the figure below is the algorithm that computes the bucket index: the current array length minus 1 (the HashMap bottom layer uses a Node array to store elements) is combined with the hash value using a & operation:

[Figure: index calculation (n - 1) & hash in the put method]

The & operation yields 1 only when both bits are 1. If n-1 in binary contained many 0s, like 1000, then when it is &-ed with the hash value, only the single 1 position would have any effect; every other position would come out 0 regardless of the hash, which greatly raises the probability of hash collisions. If instead n-1 is 1111, all four low bits of the hash participate effectively in the calculation, greatly reducing collisions. This is why, at initialization time, the series of | operations turns every bit below the leading 1 into 1.
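For a power-of-two table length, this mask trick is exactly equivalent to a non-negative modulo, which a quick check can confirm (class and method names below are ours, for illustration):

```java
// For a power-of-two length n, (n - 1) & hash is a cheap substitute for a
// non-negative modulo, because n - 1 is an all-ones bit mask.
public class IndexDemo {
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // power of two, so n - 1 == 0b1111
        for (int hash : new int[]{5, 21, -3, Integer.MAX_VALUE}) {
            // The bit-mask result matches the mathematical modulo, even for
            // negative hash values where the % operator would misbehave.
            System.out.println(indexFor(hash, n) == Math.floorMod(hash, n)); // true
        }
    }
}
```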

A look at the hash operation in HashMap

Above we mentioned that a hash value is used when computing which bucket a key finally lands in. So how is this hash value obtained?

The figure below shows HashMap's method for computing hash values:

[Figure: the hash method]

As we can see, this calculation is special: it does not simply call hashCode; it also takes the hashCode result, unsigned-right-shifts it by 16, and XORs that with the original value to give the final result.

The goal is to let the high 16 bits participate in the computation as well, further reducing hash collisions. Because HashMap's capacity is always a power of 2 and is usually much smaller than 2 to the 16th, only the low bits would otherwise ever survive the (n - 1) mask; folding the high 16 bits into the low bits lets them influence the result to some extent. As for why XOR is used rather than & or |: the & operation would bias the result toward 0, the | operation would bias it toward 1, while ^ preserves the characteristics of both inputs and so gives a better spread.
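The method body below mirrors the JDK 8 implementation of this spreading function (only the enclosing class name is ours):

```java
// XOR the high 16 bits of hashCode() into the low 16 bits, so the high bits
// still influence the bucket index even when only the low bits are masked.
public class HashDemo {
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        System.out.println(hash(null));  // 0 (null keys map to bucket 0)
        System.out.println(hash(1));     // 1 (small values are unchanged)
        System.out.println(hash(65537)); // 65536 (bit 16 folds back into bit 0)
    }
}
```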

put element process

The early part of the put process was already covered above: if the HashMap is not initialized, it initializes first; then it checks whether the bucket for the current key already holds an element. If there is no element, the entry is stored directly; if there is, execution enters the else branch below:

[Figure: the else branch of putVal]

The else branch consists of 4 main pieces of logic:

  1. Check whether the current key equals the key of the first node in the bucket; if they are equal, that existing node is used directly.
  2. If the keys are not equal, check whether the node stored in the bucket is a TreeNode; if so, the bucket is currently a red-black tree, and the entry is stored according to the red-black tree algorithm.
  3. If the keys are not equal and the bucket stores a linked list, traverse the list node by node: when a node's next is null, append the new element there; when it is not null, first check whether the keys are equal — if equal, use the existing node; if not, continue to the next node, until a key matches or next is null.
  4. After inserting into the linked list, if its length now exceeds TREEIFY_THRESHOLD (default 8), the list is converted to a red-black tree.

After that, there is one final check that decides whether to overwrite the old element: if e != null, the current key already exists, so the onlyIfAbsent variable is consulted; it defaults to false, meaning old values are overwritten, so the value is replaced and the old value is returned.

[Figure: overwriting the old value when the key already exists]
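The branching just described can be condensed into a heavily simplified sketch. The code below keeps only the linked-list path — no treeification, no resize, no null-key handling — and all names (SimpleMap, Node) are illustrative, not the JDK's:

```java
import java.util.Objects;

// A stripped-down sketch of the put logic described above: a fixed 16-bucket
// table whose buckets are singly linked lists.
public class SimpleMap<K, V> {
    static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value) { this.hash = hash; this.key = key; this.value = value; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    public V put(K key, V value) {
        int hash = key.hashCode() ^ (key.hashCode() >>> 16); // JDK-style spreading
        int i = (table.length - 1) & hash;                   // bucket index
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;  // key already present: overwrite, return old value
                e.value = value;
                return old;
            }
            if (e.next == null) { // reached the tail: append a new node
                e.next = new Node<>(hash, key, value);
                return null;
            }
        }
        table[i] = new Node<>(hash, key, value); // empty bucket: place node directly
        return null;
    }

    public V get(K key) {
        int hash = key.hashCode() ^ (key.hashCode() >>> 16);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        SimpleMap<String, Integer> map = new SimpleMap<>();
        System.out.println(map.put("a", 1)); // null (no previous value)
        System.out.println(map.put("a", 2)); // 1 (old value returned, 2 stored)
        System.out.println(map.get("a"));    // 2
    }
}
```

"Aa" and "BB" famously share the same hashCode, so putting both exercises the collision path through the linked list.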

Expansion

When the amount of data stored in the HashMap exceeds the threshold (current number of buckets * load factor), the expansion operation is triggered:

[Figure: the resize trigger in putVal]

So let's take a look at the resize method:

[Figure: capacity and threshold calculation in resize]

The first red line checks whether the current capacity has already reached MAXIMUM_CAPACITY, the 2-to-the-30th value mentioned earlier. Once this value is reached, the threshold is simply set to the maximum int value, and no further expansion is ever attempted.

The second red line performs the expansion itself: the new capacity is the old capacity shifted left by 1 place, i.e. doubled. Of course, if the doubled capacity does not satisfy the conditions in the second red box, it is set in the third red box instead.

How to deal with the original data after expansion

Expansion itself is very simple; the key question is how to migrate the original data. Without looking at the code, we can roughly enumerate the migration scenarios (empty buckets need no handling):

  1. The current bucket holds a single element, with nothing chained below it.
  2. The current bucket holds elements chained in a linked list.
  3. The current bucket holds elements organized as a red-black tree.

Let's look at the data-migration part of the resize source code:

[Figure: the data-migration part of resize]

The part in the red box is easy to understand. First it checks whether the element in the current bucket is alone; if so, it just recomputes the index and places it in the new array. If the bucket is a red-black tree, the tree is split up and reorganized; we will skip that part for now. The final else branch handles linked lists, so let's focus on that.

Linked list data processing

Linked-list processing has one core idea: after migration, an element's index either stays the same, or becomes the original index plus oldCap.

The complete linked-list-processing source code is shown in the figure below:

[Figure: linked-list migration (lo/hi split) in resize]

The key condition is e.hash & oldCap. Why does this result being 0 mean that the element's position does not change?

Before explaining, recall the tableSizeFor method, which keeps n-1 in a shape like 00001111. For example, with an initial capacity of 16, n is 10000 and n-1 is 01111. After expansion, n becomes 100000 and the new n-1 is 011111; the only difference from the old n-1 is the 5th bit (the 6th bit is 0 and does not affect the result). If e.hash & oldCap == 0, the 5th bit of the hash is 0, so that bit contributes nothing to the new index and the position stays unchanged. If e.hash & oldCap != 0, the 5th bit is 1, and the new index comes out exactly one oldCap (here 16) larger than the old one. The same reasoning holds at every capacity: whenever e.hash & oldCap == 0 the index is unchanged, and otherwise the index increases by oldCap.
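This split rule is easy to verify against a direct recomputation with the new mask; the class and method names below are ours, for illustration:

```java
// Demonstrates the split rule described above: after doubling from oldCap to
// 2 * oldCap, an element either stays at its old index or moves exactly
// oldCap slots up, depending on the single extra hash bit (hash & oldCap).
public class ResizeSplit {
    static int newIndex(int hash, int oldCap) {
        int oldIndex = (oldCap - 1) & hash;
        // (hash & oldCap) == 0 -> "lo" list, index unchanged
        // (hash & oldCap) != 0 -> "hi" list, index + oldCap
        return ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        for (int hash = 0; hash < 64; hash++) {
            // The shortcut always agrees with recomputing against the new mask.
            System.out.println(newIndex(hash, oldCap) == ((2 * oldCap - 1) & hash)); // true
        }
    }
}
```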

After the loop over the nodes finishes, they are attached to the new table as follows:

[Figure: appending the lo/hi lists to the new table]

Similarly, 32 is 100000, which illustrates the same point: only the new highest bit needs attention, because only that bit of e.hash can produce a 1 in the & operation.

Summary

This article analyzed how HashMap is initialized and how the put method works, explained why HashMap's capacity is kept at a power of 2 to reduce hash collisions, and finally introduced HashMap's expansion.


Origin blog.csdn.net/zwx900102/article/details/113849103