Initial capacity and load factor in HashMap

Looking at the underlying source code of HashMap, the default initial capacity is 16 and the default load factor is 0.75.

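For reference, these defaults are declared in the OpenJDK java.util.HashMap source roughly as follows (excerpted, comments shortened):

```java
// The default initial capacity - MUST be a power of two.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// The load factor used when none specified in constructor.
static final float DEFAULT_LOAD_FACTOR = 0.75f;
```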
HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity of the table when it is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the table is expanded and rehashed (that is, the internal data structure is rebuilt), and the expanded table has twice the original capacity.
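A minimal sketch of that arithmetic, assuming only the default values above: the resize threshold is always capacity × load factor, and each expansion doubles the capacity.

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;        // default initial capacity
        float loadFactor = 0.75f; // default load factor
        // Each expansion doubles the capacity; the resize threshold scales with it.
        for (int i = 0; i < 4; i++) {
            int threshold = (int) (capacity * loadFactor);
            System.out.println("capacity=" + capacity + ", resize once size exceeds " + threshold);
            capacity *= 2;
        }
    }
}
```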

To reduce the probability of collisions, expansion is triggered when the number of entries in the HashMap reaches a threshold, and all elements are rehashed into the enlarged table, so rehashing is a very time-consuming operation.

This threshold is determined by the load factor and the capacity of the current container: DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR, i.e. 16 × 0.75 = 12 by default, so the expansion operation is triggered once the number of entries exceeds 12.
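As a rough illustration (not production code), the sketch below peeks at HashMap's internal table array via reflection — the field name table is an OpenJDK implementation detail — and shows the table doubling from 16 to 32 buckets when the 13th entry is inserted. On JDK 9+ this may require running with --add-opens java.base/java.util=ALL-UNNAMED.

```java
import java.lang.reflect.Field;
import java.util.HashMap;

public class ResizeDemo {
    // Reads the length of HashMap's internal bucket array via reflection.
    static int tableLength(HashMap<?, ?> map) throws Exception {
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(map);
        return table == null ? 0 : table.length;
    }

    public static void main(String[] args) throws Exception {
        HashMap<Integer, Integer> map = new HashMap<>(); // capacity 16, load factor 0.75
        for (int i = 1; i <= 13; i++) {
            map.put(i, i);
            System.out.println("size=" + i + ", table length=" + tableLength(map));
        }
        // Expected: the table length stays 16 up to size 12, then doubles to 32
        // when the 13th entry pushes the size past the threshold 16 * 0.75 = 12.
    }
}
```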

With this understanding, a question that is often asked in interviews follows naturally: why is the default load factor 0.75?
  • If the load factor is too high, such as 1, space overhead drops and space utilization improves, but lookup cost rises because buckets accumulate longer collision chains.
  • If the load factor is too low, such as 0.5, lookups are faster, but space utilization is very low and the number of rehash operations increases.

When setting the initial capacity, consider the number of entries the map needs to hold together with its load factor, so as to minimize the number of rehash operations. Therefore, when using HashMap it is generally recommended to set the initial capacity based on the estimated number of entries to reduce expansion operations.
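A minimal sketch of that recommendation, using a hypothetical helper withExpectedSize (not part of the JDK; similar in spirit to Guava's Maps.newHashMapWithExpectedSize): to insert N entries without a resize, the initial capacity needs to be at least N / 0.75.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    // Hypothetical helper: picks an initial capacity large enough that
    // expectedEntries can be inserted without triggering a resize,
    // given the default load factor of 0.75.
    static <K, V> Map<K, V> withExpectedSize(int expectedEntries) {
        int initialCapacity = (int) (expectedEntries / 0.75f) + 1;
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        // For 100 entries: 100 / 0.75 + 1 = 134; HashMap rounds this up to the
        // next power of two (256), whose threshold 256 * 0.75 = 192 >= 100,
        // so no rehash happens while the 100 entries are inserted.
        Map<String, Integer> map = withExpectedSize(100);
        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);
        }
        System.out.println("inserted " + map.size() + " entries without resizing");
    }
}
```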

Therefore, choosing 0.75 as the default load factor is a compromise between time and space costs. The comment in the JDK source explains that, under ideal conditions with random hash codes, the number of nodes that land in a given hash bucket follows a Poisson distribution, and it gives a table of bucket sizes and their probabilities.
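As a rough check on that comment, the sketch below computes the Poisson probabilities P(k) = e^(-λ) · λ^k / k! with λ = 0.5 (the average bucket occupancy the JDK comment assumes under the default load factor) for bucket sizes 0 through 8.

```java
public class PoissonBuckets {
    public static void main(String[] args) {
        // Model the number of entries per bucket as Poisson-distributed
        // with an average of about 0.5 entries per bucket.
        double lambda = 0.5;
        double factorial = 1.0;
        for (int k = 0; k <= 8; k++) {
            if (k > 0) {
                factorial *= k; // k! built up incrementally
            }
            double probability = Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
            System.out.printf("entries in bucket = %d, probability = %.8f%n", k, probability);
        }
    }
}
```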

As these probabilities show, by the time a bucket holds 8 elements the probability is already vanishingly small (about 0.00000006); in other words, with 0.75 as the load factor, it is almost impossible for the linked list at any collision location to grow longer than 8.

To summarize: 0.75 is a compromise between improving space utilization and keeping lookup costs low, and the Poisson analysis above shows that with this load factor long collision chains are extremely rare.

Origin blog.csdn.net/weixin_44723496/article/details/112387738