Features: adding, deleting, and looking up entries all run in O(1) time on average.
Storage: at most one key may be null (keys are unique), and values may also be null. Resizing: the initial capacity is 16 (always a power of 2), the load factor is 0.75, and each resize doubles the capacity.
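A quick illustration of the null rules (the class name and demo values are mine): at most one null key is stored, re-inserting it overwrites the old value, and null values are unrestricted:

```java
import java.util.HashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // at most one null key: this overwrites "first"
        map.put("k", null);      // null values are allowed

        System.out.println(map.size());    // 2 (the null-key entry plus "k")
        System.out.println(map.get(null)); // second
    }
}
```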
Concept: the perturbation function (hash function): HashMap applies its own hash function on top of the key's hashCode() to compensate for poorly distributed hashCode() implementations, thereby reducing hash collisions.
JDK1.7
Data structure: array plus (singly) linked list; not thread-safe — under concurrent resizing an infinite loop can occur.
Structure of array plus linked list: the table is an array of HashMap's internal Entry objects. Computing the array index: HashMap's own hash() perturbation (XOR and shift operations) is applied to the key's hashCode(), and the result is then reduced modulo the table length to get the bucket index. The benefit of this extra hashing is fewer hash collisions.
The code below perturbs the key's hashCode with 9 operations in total: 4 shifts plus 5 XORs.
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
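After perturbation, JDK 1.7 maps the hash to a bucket with indexFor(h, length), which is simply h & (length - 1); because table lengths are always powers of two, this bitwise AND is equivalent to taking the modulus. A minimal sketch (the class name and demo values are mine):

```java
public class IndexForDemo {
    // JDK 1.7's indexFor: a bitwise AND replaces the modulo operation,
    // which is valid because table lengths are always powers of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h = "key".hashCode();
        // For a power-of-two length, h & (length - 1) equals the
        // non-negative remainder of h modulo length.
        System.out.println(indexFor(h, 16) == Math.floorMod(h, 16)); // true
    }
}
```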
The linked list uses head insertion. During a resize, the transfer() method walks each old bucket's list and re-inserts every node at the head of its new bucket. Under concurrent access, if one thread is suspended mid-loop while another thread completes the resize, the two threads can end up making nodes point at each other, forming a circular linked list. A subsequent get() on that bucket then loops forever.
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            // If a thread is suspended at this step during a concurrent
            // resize, the circular linked list described above is produced.
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
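A minimal single-threaded sketch (with a hypothetical Node class, not the real HashMap.Entry) shows why head insertion reverses a bucket's list during transfer; it is this reversal that allows two interleaved threads to cross-link nodes:

```java
import java.util.ArrayList;
import java.util.List;

public class HeadInsertDemo {
    // Hypothetical stand-in for HashMap.Entry, just to illustrate head insertion
    static class Node {
        final int val;
        Node next;
        Node(int val, Node next) { this.val = val; this.next = next; }
    }

    // Re-links a list the way JDK 1.7's transfer() does (head insertion
    // into a single new bucket), then returns the values in their new order.
    static List<Integer> transferOrder(Node head) {
        Node newHead = null;
        for (Node e = head; e != null; ) {
            Node next = e.next; // the step at which a thread may be suspended
            e.next = newHead;   // head insertion: e becomes the new head
            newHead = e;
            e = next;
        }
        List<Integer> order = new ArrayList<>();
        for (Node e = newHead; e != null; e = e.next) order.add(e.val);
        return order;
    }

    public static void main(String[] args) {
        // Build the list 1 -> 2 -> 3; after transfer its order is reversed
        Node head = new Node(1, new Node(2, new Node(3, null)));
        System.out.println(transferOrder(head)); // [3, 2, 1]
    }
}
```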
Besides the infinite loop caused by head insertion, the linked-list structure has another serious problem: when many hash collisions occur, a large number of nodes pile up in a single bucket. Because the list is singly linked, lookups degrade to a sequential scan, which badly hurts search efficiency.
JDK1.8
Data structure: array plus linked list plus red-black tree; still not thread-safe, but tail insertion no longer produces an infinite loop.
The key's hashCode is perturbed with just 2 operations: 1 shift plus 1 XOR.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
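A small sketch of why XOR-ing in the high 16 bits matters: with a 16-slot table the index uses only the low 4 bits, so hash codes that differ only in their high bits would otherwise all collide. The spread helper mirrors the JDK 1.8 hash above; the class name and demo values are mine:

```java
public class SpreadDemo {
    // JDK 1.8's perturbation: fold the high 16 bits into the low 16 bits
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;      // small table: the index uses only the low 4 bits
        int a = 0x10000; // these two hashes differ only in their high bits
        int b = 0x20000;
        // Without perturbation, both land in bucket 0
        System.out.println((a & (n - 1)) + " " + (b & (n - 1))); // 0 0
        // After perturbation, the high bits influence the bucket index
        System.out.println((spread(a) & (n - 1)) + " " + (spread(b) & (n - 1))); // 1 2
    }
}
```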
The linked list uses tail insertion, which avoids the infinite loop; however, under multithreading, concurrent puts can still overwrite each other and lose data.
The threshold for the conversion of the linked list into a red-black tree is 8:
/**
 * The bin count threshold for using a tree rather than list for a
 * bin. Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

...

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    ...
        for (int binCount = 0; ; ++binCount) {
            if ((e = p.next) == null) {
                p.next = newNode(hash, key, value, null);
                if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                    treeifyBin(tab, hash); // convert the bucket to a red-black tree
                break;
            }
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                break;
            p = e;
        }
    ...
}
The question: why does a bucket revert to a linked list at 6 or fewer nodes, but become a red-black tree only at 8 or more?
First, a red-black tree is not always faster than a linked list; it only wins when a bucket holds many nodes. The thresholds 6 and 8 (a bucket with 6 or fewer nodes reverts to a linked list; one reaching 8 becomes a tree) leave a gap of 7 in between, which prevents frequent flip-flopping between list and tree near the boundary. Moreover, the number of nodes per bucket follows a Poisson distribution, under which the probability of a bucket's length reaching 8 is vanishingly small.
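The HashMap source comment models bucket sizes as a Poisson distribution with parameter 0.5 (at the default load factor), giving a probability of about 0.00000006 for a bucket of size 8. A quick sketch to reproduce those probabilities (the class and helper names are mine):

```java
public class PoissonDemo {
    // P(X = k) for a Poisson distribution with the given lambda:
    // exp(-lambda) * lambda^k / k!, computed incrementally to avoid overflow
    static double poisson(double lambda, int k) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) p = p * lambda / i;
        return p;
    }

    public static void main(String[] args) {
        // lambda = 0.5 is the parameter cited in the HashMap source comment
        for (int k = 0; k <= 8; k++) {
            System.out.printf("P(bucket size = %d) = %.8f%n", k, poisson(0.5, k));
        }
        // P(8) is roughly 0.00000006, so treeification is a rare fallback
    }
}
```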