Foreword
Today, while slacking off, I was reading the HashMap source code and remembered that an experienced interviewer likes to ask candidates how the hashing process of the Redis dictionary differs from that of HashMap. Honestly, I have read Redis Design and Implementation (highly recommended), but I could not describe the difference precisely, so I wrote this article.
Note: since this article focuses on the hashing process, for the rest of the source code please refer to the clear write-ups by other experts.
Hash function (hash algorithm)
- A hash function, also known as a hashing algorithm, is a method for creating a small digital "fingerprint" from any kind of data; this "fingerprint" is the hash value.
- Hash functions are used very widely, for example in data protection, in verifying that transmitted information is authentic, and in hash tables. This article discusses their use in hash tables.
- Ideally, the hash function would guarantee that each key maps to its own "fingerprint" (hash value), which is known as perfect hashing. Because of performance, application scenarios and other considerations, however, a small number of hash collisions is acceptable.
- Hash collision: the inputs and outputs of a hash function are not in one-to-one correspondence; for example, two different inputs A and B both produce the hash value C.
- Common hash functions: direct addressing, digit analysis, the mid-square method, the folding method, the random number method, and the division-remainder method.
- As most readers know, HashMap and the Redis dictionary both use an optimized form of the division-remainder method as their hash function. The concept behind each method is not repeated here; please look it up in the references.
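To make the division-remainder method and the notion of a collision concrete, here is a minimal Java sketch. The class and method names are invented for this illustration and do not come from HashMap or Redis.

```java
// Illustrative sketch only; CollisionDemo and slot are hypothetical names.
class CollisionDemo {
    // Division-remainder method: map an arbitrary hash value to a slot in [0, m).
    static int slot(int hash, int m) {
        // floorMod keeps the result non-negative even for negative hash values
        return Math.floorMod(hash, m);
    }
}
```

With a table of 12 slots, the keys 12 and 48 both land in slot 0, which is a collision. Collisions also exist at the hashCode() level: the strings "Aa" and "BB" have the same hashCode() in Java.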
Hash collision resolution
hash? -> choice of hash algorithm -> how to resolve hash collisions
Presumably this is how most colleagues think about it. So, what are the main ways of resolving hash collisions?
Open addressing
- Once a collision occurs, look for the next empty hash address. The simplest variant (linear probing) uses f_i(key) = (f(key) + d_i) % m with d_i = 1, 2, ..., m - 1. For example, with m = 12, insert 12, 25 and 37 in order: 25 % 12 = 37 % 12 = 1, so a hash collision occurs; probing again gives (37 + 1) % 12 = 2, a different slot, and the collision is resolved.
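The probing idea can be sketched in a few lines of Java. This is a minimal illustration of linear probing under simplifying assumptions (integer keys, no deletion, no resizing); LinearProbingTable is a hypothetical class, not something taken from the JDK or Redis.

```java
// Illustrative sketch only: open addressing with linear probing.
class LinearProbingTable {
    private final Integer[] slots;   // null marks an empty slot

    LinearProbingTable(int capacity) {
        slots = new Integer[capacity];
    }

    // Inserts key and returns the index of the slot it ended up in.
    int insert(int key) {
        int m = slots.length;
        int home = Math.floorMod(key, m);      // division-remainder hash
        for (int d = 0; d < m; d++) {
            int i = (home + d) % m;            // probe home, home + 1, home + 2, ...
            if (slots[i] == null || slots[i] == key) {
                slots[i] = key;
                return i;
            }
        }
        throw new IllegalStateException("table is full");
    }
}
```

With a capacity of 12, inserting 12, 25 and 37 in that order places them in slots 0, 1 and 2: 37 would also hash to slot 1, finds it taken by 25, and moves on to the next free slot.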
Rehashing
- Prepare several hash functions in advance; when the first hash function produces a collision, fall back to a spare hash function.
Separate chaining (chain address method)
- The hash value is still obtained with the division-remainder method, but when a collision occurs the colliding nodes are linked one after another into a list attached to that slot. For example, 48 % 12 = 12 % 12 = 0, so 48 and 12 form a linked list in slot 0.
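A minimal Java sketch of separate chaining follows. It assumes integer keys and a fixed number of buckets; ChainedTable and its methods are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch only: each bucket holds a linked list of colliding keys.
class ChainedTable {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedTable(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Appends key to the chain of its bucket unless it is already present.
    void insert(int key) {
        LinkedList<Integer> chain = buckets.get(Math.floorMod(key, buckets.size()));
        if (!chain.contains(key)) {
            chain.addLast(key);
        }
    }

    // Returns the chain stored at the given bucket index.
    List<Integer> chainAt(int index) {
        return buckets.get(index);
    }
}
```

With 12 buckets, inserting 12 and then 48 leaves both keys in the chain of bucket 0, in insertion order.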
Public overflow area
- Use an additional shared storage area to hold the elements whose hash values collide.
HashMap
HashMap hash algorithm
Aside: LOAD_FACTOR (load factor)
- We all know that HashMap's default LOAD_FACTOR is 0.75, but what is it for? Let's follow the source code for clues.
- The source comment on new HashMap() tells us that the capacity is 16 and the load factor is 0.75.
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
- As elements are put into the map, it eventually has to expand; see the resize() function. Only the key part is quoted here.
else { // zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
- As we can see, the threshold is the capacity multiplied by the load factor, that is, 16 * 0.75 = 12. HashMap expands once the number of elements exceeds the threshold. Because the threshold is smaller than the capacity, the probability of hash collisions is reduced: with the same hash function and the same table, putting in 12 elements is less likely to produce a collision than putting in 16 elements.
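The threshold arithmetic can be checked with a tiny Java sketch. The constant names mirror the ones quoted from the HashMap source; the class ThresholdDemo and its method are hypothetical.

```java
// Illustrative sketch only: threshold = capacity * load factor.
class ThresholdDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Number of elements a table of the given capacity holds before it is expanded.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }
}
```

For the defaults this gives 12, and for a table of capacity 32 with the same load factor it gives 24.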
Hash function and formula analysis
- HashMap's hash algorithm is similar to the division-remainder method, but instead of the MOD operation it uses bit operations. For an input key, the hash function is:
f(key) = hash(key) & (table.length - 1)
hash(key) = (h = key.hashCode()) ^ (h >>> 16)
- hash(key) & (table.length - 1) is an optimized version of taking the remainder modulo table.length. The effect is essentially the same, so it is still based on the division-remainder method. Because HashMap's table.length is always a power of two (2^n) after every expansion, the bit operation can replace the remainder and is noticeably more efficient.
- >>> is the unsigned right shift operator. The range of hashCode() is very wide, so collisions between hash codes themselves are unlikely, but after the & with table.length - 1 the probability grows, because table.length is a comparatively small value and only the low bits survive. That is the reason for >>> 16: both the high and the low bits of hashCode() then influence f(key), the distribution is more even, and the probability of a hash collision is smaller.
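The effect of the two formulas can be seen in a short Java sketch. The method bodies follow the hash(key) and f(key) formulas given above; the class SpreadDemo and its method names are made up for this illustration.

```java
// Illustrative sketch only: the mixing step and the index computation.
class SpreadDemo {
    // Fold the high 16 bits of the hash code into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Index in a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return hash & (n - 1);
    }
}
```

The hash codes 0x10000 and 0x20000 differ only in their high bits. Without the mixing step both map to index 0 in a table of length 16; after spread() they map to indexes 1 and 2. For a power-of-two length, hash & (n - 1) also gives the same result as Math.floorMod(hash, n), which is why the bit operation can stand in for the remainder.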
HashMap hash collision resolution
- HashMap obviously resolves hash collisions by separate chaining, which can be seen from its structure.
- It can also be seen from HashMap's resize() process; see the source excerpt and its comments below.
if (oldTab != null) {
    // Traverse the buckets of the old array
    for (int j = 0; j < oldCap; ++j) {
        Node<K,V> e;
        if ((e = oldTab[j]) != null) {
            oldTab[j] = null;
            // The bucket holds a single node (no hash collision)
            if (e.next == null)
                // Compute the new position with the hash algorithm and store the node
                newTab[e.hash & (newCap - 1)] = e;
            else if (e instanceof TreeNode)
                // Red-black tree case, not covered here
                ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
            else { // preserve order
                // Low list and high list
                Node<K,V> loHead = null, loTail = null;
                Node<K,V> hiHead = null, hiTail = null;
                Node<K,V> next;
                do {
                    // Traverse the list of colliding nodes
                    next = e.next;
                    // The oldCap bit of the hash is 0: append to the low list
                    if ((e.hash & oldCap) == 0) {
                        if (loTail == null)
                            loHead = e;
                        else
                            loTail.next = e;
                        loTail = e;
                    }
                    // The oldCap bit of the hash is 1: append to the high list
                    else {
                        if (hiTail == null)
                            hiHead = e;
                        else
                            hiTail.next = e;
                        hiTail = e;
                    }
                } while ((e = next) != null);
                // The low list stays at the original index
                if (loTail != null) {
                    loTail.next = null;
                    newTab[j] = loHead;
                }
                // The high list goes to original index + old capacity
                if (hiTail != null) {
                    hiTail.next = null;
                    newTab[j + oldCap] = hiHead;
                }
            }
        }
    }
}
return newTab;
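Why splitting a bucket into a low list and a high list is enough can be checked with a small Java sketch. It only models the index arithmetic of the excerpt; SplitDemo and newIndex are hypothetical names.

```java
// Illustrative sketch only: where an entry lands when the table doubles.
class SplitDemo {
    // New index of an entry after the table grows from oldCap to 2 * oldCap.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                   // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap;  // low list or high list
    }
}
```

With oldCap = 16, the hashes 5 and 21 share old index 5. After doubling, 5 stays at index 5 (low list) while 21 moves to 5 + 16 = 21 (high list), which matches a full recomputation of hash & (newCap - 1).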
Redis dictionary
Introduction to the data structures
Similarities with HashMap
First, a brief description of the data structures behind the Redis dictionary. They are written in C, the language that cast a shadow over my college days.
typedef struct dictht {
    // Hash table array
    dictEntry **table;
    // Hash table size
    unsigned long size;
    // Hash table size mask, used to compute index values
    // Always equal to size - 1
    unsigned long sizemask;
    // Number of nodes already in use
    unsigned long used;
} dictht;
- There is a hash table array. Doesn't it look familiar? It serves the same purpose as Node<K,V>[]. Next, look at the dictEntry structure.
typedef struct dictEntry {
    // Key
    void *key;
    // Value
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    // Pointer to the next entry
    struct dictEntry *next;
} dictEntry;
- Familiar again: key-value pairs! The key attribute holds the key of the pair, and the v attribute holds the value, which may be a pointer, a uint64_t integer, or an int64_t integer. The next attribute is a pointer to another hash table node; it links several key-value pairs with the same index into a list.
- As we can see, so far there is basically no difference from HashMap, apart from a few additional attributes (hash table size, number of used nodes, mask, and so on).
Differences
- We have just seen that the dictionary is implemented on top of the hash table structure. So what does the dictionary itself really look like?
typedef struct dict {
    // Type-specific functions
    dictType *type;
    // Private data
    void *privdata;
    // Hash tables (described above)
    dictht ht[2];
    // Rehash index
    // -1 when no rehash is in progress
    int rehashidx;
} dict;
- That completes the introduction to the data structures of the Redis dictionary in its ordinary state (no rehash in progress).
Redis dictionary hash algorithm
- Compute the hash value of the key with the hash function configured for the dictionary:
hash = dict->type->hashFunction(key)
- Use the hash table's sizemask attribute and the hash value to compute the index; depending on the situation, ht[x] is ht[0] or ht[1]. This is in fact the same as hash(key) & (table.length - 1) in HashMap because, as the comment says, sizemask always equals size - 1.
index = hash & dict->ht[x].sizemask
- Redis computes hash values with the MurmurHash2 algorithm (as described in Redis Design and Implementation). I am not able to explain it well, so I will only point to external material.
Redis dictionary hash collision resolution
- Redis hash tables also use separate chaining: each node has a next pointer, and several nodes form a singly linked list. The difference from HashMap is that, because the list has no tail pointer, head insertion is used and a new node is added at the head of the list.
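Head insertion combined with the sizemask index can be modelled in Java as follows. This is a simplified model written for this article, not Redis code: DictSketch and Entry are hypothetical names, String.hashCode() stands in for dict->type->hashFunction(key), and the check for an already existing key is left out.

```java
// Illustrative Java model only; Redis itself is written in C.
class DictSketch {
    static final class Entry {
        final String key;
        final Object val;
        final Entry next;

        Entry(String key, Object val, Entry next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    final Entry[] table;
    final int sizemask;   // always size - 1
    int used;

    // size is assumed to be a power of two
    DictSketch(int size) {
        table = new Entry[size];
        sizemask = size - 1;
    }

    // Adds a pair at the head of its bucket's list (no duplicate check in this sketch).
    void add(String key, Object val) {
        int index = key.hashCode() & sizemask;              // stand-in for the dictionary's hash function
        table[index] = new Entry(key, val, table[index]);   // new node becomes the list head
        used++;
    }
}
```

Adding "Aa" and then "BB" (two strings with equal hashCode()) to a table of size 4 puts both in bucket 0, with "BB", the later arrival, at the head of the list.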
Differences between Redis and HashMap
Having looked at the hash algorithms and the collision resolution, there is not much difference between the two. So where do they differ? In rehashing.
Rehash
- As mentioned above, the dictionary data structure contains two hash tables (ht[2]); this is where the secret lies.
- Rehash aims to keep the load factor of the hash table within a reasonable range: when the hash table holds too many or too few key-value pairs, it needs to be expanded or shrunk.
- The steps are as follows:
  - Allocate space for the dictionary's ht[1] hash table. For an expansion, the size of ht[1] is the first 2^n greater than or equal to ht[0].used * 2; for a contraction, the size of ht[1] is the first 2^n greater than or equal to ht[0].used.
  - Rehash all key-value pairs stored in ht[0] onto ht[1], that is, recompute the hash value and index of each key and place the key-value pair at the corresponding position in the ht[1] hash table.
  - When all key-value pairs of ht[0] have been migrated to ht[1] (ht[0] has become an empty hash table), release ht[0], make ht[1] the new ht[0], and create a new empty hash table at ht[1] in preparation for the next rehash.
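The sizing rule and the migration step can be sketched in Java as follows. This is a simplified, non-progressive model under stated assumptions: keys are integers that act as their own hash values, buckets are Java lists, and RehashSketch with all of its method names is hypothetical rather than taken from Redis.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch only: sizing ht[1] and moving every key from ht[0] to ht[1].
class RehashSketch {
    // Smallest power of two that is >= n (for n >= 1).
    static int nextPowerOfTwo(int n) {
        int size = 1;
        while (size < n) {
            size <<= 1;
        }
        return size;
    }

    // Step 1: size of ht[1] for an expansion or a contraction.
    static int expandSize(int used) { return nextPowerOfTwo(used * 2); }
    static int shrinkSize(int used) { return nextPowerOfTwo(used); }

    // Step 2: recompute the index of every key and place it in the new table.
    static List<List<Integer>> rehash(List<List<Integer>> ht0, int newSize) {
        List<List<Integer>> ht1 = new ArrayList<>();
        for (int i = 0; i < newSize; i++) {
            ht1.add(new LinkedList<>());
        }
        for (List<Integer> chain : ht0) {
            for (int key : chain) {
                ht1.get(key & (newSize - 1)).add(0, key);   // head insertion
            }
        }
        return ht1;
    }
}
```

With 5 used entries, expandSize gives 16 and shrinkSize gives 8. Step 3 then amounts to discarding the old table and treating the returned one as the new ht[0].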
- Note: this article does not cover progressive rehash. I really did not have the energy to draw the diagrams, and the data structure is rather complex; please bear with me.
Summary
This article briefly analyzed the hashing process of HashMap and of the Redis dictionary in terms of hash algorithms and hash collision resolution. The views here are not necessarily all correct; corrections from the experts are welcome!
References
- zh.wikipedia.org/wiki/ hash function
- Redis Design and Implementation
- www.zhihu.com/question/26…