Why does a HashMap bin have to reach 8 nodes before it is converted to a red-black tree?

I was asked this question in an interview. I had wondered about it while studying HashMap but never looked into it carefully, so in the interview all I could say was that it is a trade-off between time and space. The interviewer told me to go back and check it properly. It was embarrassing, so I came back and looked again.
Let's go straight to the source code:

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

The threshold defined above is 8.
The HashMap source also contains a section of implementation notes, from which I excerpt a few important passages. The first one is below. Its gist is that when a bin becomes too large, it is converted into a bin of TreeNodes, structured similarly to those in TreeMap, that is, as a red-black tree:

This map usually acts as a binned (bucketed) hash table, but
when bins get too large, they are transformed into bins of TreeNodes,
each structured similarly to those in java.util.TreeMap

Reading on, a TreeNode takes about twice the space of a regular Node, so a bin is converted to TreeNodes only when it contains enough nodes, and "enough" is decided by the value of TREEIFY_THRESHOLD. When the number of nodes in the bin shrinks again, it is converted back to a plain bin. Checking the source code, we find that a linked list is converted into a red-black tree when its length reaches 8, and a tree bin is converted back into an ordinary linked-list bin when its size drops to 6.
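One plausible reading of the gap between 8 and 6 is that it keeps a bin whose size hovers around the threshold from being converted back and forth. The sketch below is a simplified model of a single bin, not the real java.util.HashMap logic (the real code has further conditions, for example a minimum table capacity before treeifying). The class name BinHysteresisSketch and its methods are hypothetical and exist only for illustration.

```java
// Simplified model of one bin; NOT the actual HashMap implementation.
class BinHysteresisSketch {
    private final int treeifyAt;   // list -> tree when size reaches this value
    private final int untreeifyAt; // tree -> list when size drops to this value
    private int size;
    private boolean tree;
    private int conversions;       // how many times the bin changed representation

    BinHysteresisSketch(int treeifyAt, int untreeifyAt) {
        this.treeifyAt = treeifyAt;
        this.untreeifyAt = untreeifyAt;
    }

    void add() {
        size++;
        if (!tree && size >= treeifyAt) {
            tree = true;
            conversions++;
        }
    }

    void remove() {
        if (size == 0) {
            return;
        }
        size--;
        if (tree && size <= untreeifyAt) {
            tree = false;
            conversions++;
        }
    }

    /** Fills the bin up to treeifyAt, then removes and re-adds one node per cycle. */
    static int conversionsWhenOscillating(int treeifyAt, int untreeifyAt, int cycles) {
        BinHysteresisSketch bin = new BinHysteresisSketch(treeifyAt, untreeifyAt);
        for (int i = 0; i < treeifyAt; i++) {
            bin.add();
        }
        for (int i = 0; i < cycles; i++) {
            bin.remove();
            bin.add();
        }
        return bin.conversions;
    }

    public static void main(String[] args) {
        // Thresholds 8 and 6, as described in the article.
        System.out.println("8/6: " + conversionsWhenOscillating(8, 6, 10));
        // Hypothetical thresholds with no gap, for comparison only.
        System.out.println("8/7: " + conversionsWhenOscillating(8, 7, 10));
    }
}
```

In this model, a bin that oscillates between 7 and 8 nodes is converted once with thresholds 8 and 6, but on every removal and insertion when the two thresholds are adjacent.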

This explains why a bin does not start out as TreeNodes and is converted only after it holds a certain number of nodes. Put plainly, it is a trade-off between space and time:

Because TreeNodes are about twice the size of regular nodes, we
use them only when bins contain enough nodes to warrant use
(see TREEIFY_THRESHOLD). And when they become too small (due to
removal or resizing) they are converted back to plain bins.  In
usages with well-distributed user hashCodes, tree bins are
rarely used.  Ideally, under random hashCodes, the frequency of
nodes in bins follows a Poisson distribution
(http://en.wikipedia.org/wiki/Poisson_distribution) with a
parameter of about 0.5 on average for the default resizing
threshold of 0.75, although with a large variance because of
resizing granularity. Ignoring variance, the expected
occurrences of list size k are (exp(-0.5)*pow(0.5, k)/factorial(k)). 
The first values are:
0:    0.60653066
1:    0.30326533
2:    0.07581633
3:    0.01263606
4:    0.00157952
5:    0.00015795
6:    0.00001316
7:    0.00000094
8:    0.00000006
more: less than 1 in ten million
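
The table in the quoted comment can be reproduced from the formula it gives, exp(-0.5)*pow(0.5, k)/factorial(k). The following is a minimal sketch; the class name PoissonBinSizes and the method poisson are my own names, not part of the JDK.

```java
import java.util.Locale;

class PoissonBinSizes {
    /** Poisson probability mass: exp(-lambda) * lambda^k / k! */
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        double lambda = 0.5; // parameter stated in the HashMap implementation notes
        for (int k = 0; k <= 8; k++) {
            // Locale.ROOT keeps the decimal point independent of the system locale.
            System.out.println(String.format(Locale.ROOT, "%d: %.8f", k, poisson(lambda, k)));
        }
    }
}
```

Printed to eight decimal places, the values for k = 0 to 8 should agree with the list in the comment.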
  • The implementation note quoted above also says that when hashCodes are well distributed, tree bins are rarely used: the data is spread evenly across the bins, so the linked list in a bin almost never reaches the threshold. The JDK cannot stop users from implementing a poor hash algorithm, however, and a poor one can lead to an uneven distribution. Ideally, with random hashCodes, the number of nodes per bin follows a Poisson distribution. From the table, the probability that the linked list in a bin reaches 8 elements is 0.00000006, which is practically impossible. So the choice of 8 was not made off the top of someone's head; it comes from probability and statistics. It shows how rigorous and scientific the changes and optimizations are in a language that has been developed for almost 30 years.

  • Searching for this question online, I found many copies of the following answer (presumably reposted from one another): the average search length of a red-black tree is log2(n), so at length 8 it is log2(8) = 3, while the average search length of a linked list is n/2, so at length 8 it is 8/2 = 4, which makes conversion to a tree worthwhile. If the length of the linked list is 6 or less, then 6/2 = 3 and log2(6) ≈ 2.6; the tree would still be slightly faster, but the time spent converting to and building the tree is not negligible.
    I think this answer is not rigorous enough: why is 3 versus 4 enough to justify converting, while 2.6 versus 3 is not? At the very least, I cannot agree with this view.
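
The numbers in that answer are easy to tabulate. The sketch below prints the two idealized averages it relies on, n/2 for a list and log2(n) for a balanced tree; these are the formulas from the quoted answer, not measurements of HashMap, and the class and method names are hypothetical.

```java
import java.util.Locale;

class SearchLengthComparison {
    /** Idealized average search length of a linked list with n nodes. */
    static double listAverage(int n) {
        return n / 2.0;
    }

    /** Idealized average search length of a balanced tree with n nodes. */
    static double treeAverage(int n) {
        return Math.log(n) / Math.log(2);
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 10; n++) {
            double list = listAverage(n);
            double tree = treeAverage(n);
            System.out.println(String.format(Locale.ROOT,
                    "n=%d list=%.2f tree=%.2f difference=%.2f", n, list, tree, list - tree));
        }
    }
}
```

Under these formulas the difference grows smoothly with n, and nothing special happens between 6 and 8, which is why the argument does not single out 8 as the threshold.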

Origin blog.csdn.net/zl1107604962/article/details/108682208