HashMap source code reading and debugging notes
Reading the HashMap source took me about a day and a half. It contains many details, and some of them come up again and again in interviews, where you are asked to explain your own understanding. Links to some good blog posts are collected at the end of the article, and a fully commented copy of the source is on my GitHub, so this article will not quote much source code directly.
Data structures
In HashMap 1.8
- Underlying data structure: array + linked list + red-black tree
In HashMap 1.7
- Underlying data structure: array + linked list
Benefits of array + linked list: it balances the two structures against each other, trading the array's high cost of insertion and deletion against the linked list's high cost of lookup.
Benefits of adding the red-black tree: it balances space and time. The red-black tree optimizes lookup to the O(log n) level, but each tree node must store left and right child pointers in addition to the next pointer that HashMap nodes keep for the list; when a bin shrinks below 6 nodes it is converted back into a linked list.
Head insertion vs. tail insertion: JDK 1.7 inserts new nodes at the head of a chain, while JDK 1.8 appends at the tail.
Why can HashMap hit an infinite loop (appearing as a deadlock) under high concurrency? In JDK 1.7, two threads resizing at the same time can, because of head insertion, reverse part of a chain and link it into a cycle, after which a get() on that bucket spins forever.
Why is the initial capacity of the array set to 16?
HashMap's initial array capacity is 16 so that the index computed from a node's hash value (the hash bucket index) distributes nodes evenly across the array.
An example will make this clear.
The bucket index is obtained by taking the hash value modulo the array length, which keeps the index in bounds. The source code replaces the modulo with a bitwise operation to cut the cost of the division:
first = tab[(n - 1) & hash]
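To see the equivalence concretely, here is a minimal sketch (the class name `IndexDemo` and the sample value are mine) showing that for a power-of-two length, the bitwise AND gives the same index as the modulo:

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16;                      // table length, a power of two
        int hash = 0b1010100111;         // the example hash value from the article
        int byMask = (n - 1) & hash;     // HashMap's bitwise version
        int byMod  = hash % n;           // equivalent modulo for a non-negative hash
        System.out.println(byMask + " " + byMod);  // prints: 7 7
    }
}
```

Note the equivalence only holds when n is a power of two and the hash is non-negative, which is exactly why HashMap keeps its capacity a power of two.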
First, understand that a hash value is an int, i.e. 32 bits. Suppose the low bits of a computed hash are 1010100111, and compare hash bucket counts of 10 and 16: the results differ.
Decimal to binary: 10 -> 1010B, 16 -> 10000B
1010B - 1 = 1001B    10000B - 1 = 1111B
  1010100111        1010100111
&       1001      &       1111
        0001              0111
The two computed indices differ, so no problem so far. But now suppose another element with hash code 1010100011 is put into the HashMap:
  1010100011        1010100011
&       1001      &       1111
        0001              0011
We can see at a glance that the two hash codes, even though their low four bits differ, land in the same slot under mask 1001B, because the zero bits of the mask throw hash bits away; the chain at that index grows very long and lookups become slow. When the length is 16 the mask is 1111B, every low bit participates in the index, and only keys with identical low bits share a chain. This is a typical space-for-time trade-off, and the same reasoning applies to 32, 64, 128, and so on. Hence the comment on the initialization constant in the source:
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
So the initial size of 16 reveals the secret of the array capacity: it is always kept a power of two.
To see how HashMap guarantees the capacity is a power of two, look at the tableSizeFor() function:
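As a small illustration (class name mine), the two example hash codes from above collide under a mask derived from a capacity of 10 but separate under the mask derived from 16:

```java
public class MaskDemo {
    public static void main(String[] args) {
        int h1 = 0b1010100111;
        int h2 = 0b1010100011;
        // capacity 10: mask 9 = 1001B has zero bits, so some hash bits are ignored
        System.out.println((h1 & 9) + " " + (h2 & 9));   // prints: 1 1  -> collision
        // capacity 16: mask 15 = 1111B uses all four low bits
        System.out.println((h1 & 15) + " " + (h2 & 15)); // prints: 7 3  -> different slots
    }
}
```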
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
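A quick sketch (copying tableSizeFor into a standalone class of mine, with MAXIMUM_CAPACITY defined as in HashMap) shows the rounding behavior, namely the smallest power of two that is at least the requested capacity:

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of HashMap.tableSizeFor: smallest power of two >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // prints: 16
        System.out.println(tableSizeFor(16)); // prints: 16
        System.out.println(tableSizeFor(17)); // prints: 32
    }
}
```

The initial `cap - 1` is what makes an exact power of two (like 16) map to itself rather than to the next power up.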
Optimization in Java 8's HashMap: the perturbation (hash spreading) function
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
Under the hood, HashMap's get() fetches the element by calling getNode(), and the extra hash() call applied to the key is the interesting part. Looking inside hash(), the code is also very simple:
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Here the key's own hashCode() is called first, then the result is XORed with its own high 16 bits. The purpose of this operation is to make the final bucket index depend not only on the low 16 bits of the hash code but on the high 16 bits as well, so that an inefficient hashCode() whose variation sits mostly in the high bits does not cause needless hash collisions; the computed hashes come out more uniform.
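A small sketch (class name and sample values are mine) of how the spreading step lets the high bits influence the bucket index:

```java
public class SpreadDemo {
    // Copy of HashMap.hash (JDK 8): XOR the high 16 bits into the low 16
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int a = 0x10001;  // two hash codes that differ only in their high 16 bits
        int b = 0x20001;
        int mask = 16 - 1;
        // Without spreading, both land in bucket 1 - the high bits are discarded
        System.out.println((a & mask) + " " + (b & mask));                       // prints: 1 1
        // With spreading, the high bits change the index and the collision disappears
        System.out.println(((a ^ (a >>> 16)) & mask) + " " + ((b ^ (b >>> 16)) & mask)); // prints: 0 3
    }
}
```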
JDK 1.7 pursues the same idea with a different implementation:
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
How HashMap resolves collisions
Simply put: when a value is inserted, if the target slot in the array is empty the node goes in directly; if an element is already there, the new node is placed behind it. Whether it lands at the back of a linked list or inside a red-black tree is not the focus of this article.
Note that a chain is converted to a red-black tree only when the array length is at least 64 and the length of a single chain reaches 8; below an array length of 64, HashMap resizes the table instead of treeifying.
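As a sketch of separate chaining from the caller's side (the `Key` class is a contrived example of mine), two keys that always collide are still both retrievable, because equals() distinguishes them within the bucket:

```java
import java.util.HashMap;
import java.util.Map;

public class ChainDemo {
    // Hypothetical key whose hashCode always collides
    static final class Key {
        final String name;
        Key(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }  // force every key into one bucket
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        map.put(new Key("a"), "first");
        map.put(new Key("b"), "second");  // same bucket, chained behind the first entry
        // Both entries survive the collision
        System.out.println(map.get(new Key("a")) + " " + map.get(new Key("b"))); // prints: first second
    }
}
```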
Why does the String class's hashCode() choose 31 as the multiplier?
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
Recall from above that when such a key goes into a HashMap, the map first takes this hashCode() result and then applies the h >>> 16 perturbation to it.
The String class uses the prime 31 as its multiplier for two reasons:
- 31 * i == (i << 5) - i, so the JVM can optimize the multiplication into a shift and a subtraction.
- Among the primes, a very small one such as 2 produces too high a collision rate, while large ones such as 101 or 1729 make the hash computation overflow int too quickly and lose information; computing hashCode with 31 keeps the resulting hash values more uniformly distributed.
String's choice of 31 for hashCode rests on the two reasons above.
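Both points can be checked directly (class name and sample string are mine):

```java
public class Hash31Demo {
    public static void main(String[] args) {
        int h = 12345;
        // The multiply-by-31 can be rewritten as a shift and a subtraction
        System.out.println(31 * h == (h << 5) - h);  // prints: true

        // Recompute String.hashCode by hand for "abc" using the 31 * h + c recurrence
        String s = "abc";
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = 31 * hash + s.charAt(i);
        }
        System.out.println(hash == s.hashCode());    // prints: true
    }
}
```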
The annotated source code and the remaining content will be added as a supplement over the next two days.