Hashing as Seen in HashMap and the Redis Dictionary

Foreword

Today, while slacking off, I read through the HashMap source code and remembered an experienced interviewer asking a candidate how the hashing process of the Redis dictionary differs from that of HashMap. To be honest, I have read Redis Design and Implementation (highly recommended), but I could not describe the difference precisely, so I wrote this article.

Note: this article focuses on the hashing process. For the rest of the source code, please see the lucid write-ups by other experts.

Hash function (hash algorithm)

  • A hash function, also called a hashing algorithm, is a method of producing a small numeric "fingerprint" from any kind of data. This fingerprint is the hash value.

  • Hash functions are used very widely, for example to protect data, to verify that transmitted information is authentic, and to build hash tables. This article discusses their use in hash tables.

  • Ideally, the hash function would give every key its own "fingerprint", i.e. a unique hash value (so-called perfect hashing). Because of performance, application scenarios, and other considerations, this is usually not achievable, and a small number of hash collisions is accepted.

  • Hash collision: the inputs and outputs of a hash function do not correspond one to one. For example, two different inputs A and B both produce the same hash value C.

  • Common hash functions:

Direct addressing, digit analysis, the mid-square method, the folding method, the random number method, and the division-remainder method.

  • As we all know, in scenarios such as HashMap and the Redis dictionary, an optimized variant of the division-remainder method is chosen as the hash function. The concepts behind each method are not repeated here.
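To make the division-remainder method and the idea of a collision concrete, here is a minimal sketch. The class name DivisionHash, the table size 12, and the keys 25 and 37 are my own illustrative assumptions; nothing here comes from HashMap or Redis.

```java
// Hypothetical illustration of the division-remainder method: f(key) = key mod m.
class DivisionHash {
    // Math.floorMod keeps the result in [0, m) even for negative keys.
    static int hash(int key, int m) {
        return Math.floorMod(key, m);
    }

    public static void main(String[] args) {
        int m = 12; // assumed table size
        // 25 and 37 are different inputs that produce the same hash value: a collision.
        System.out.println(hash(25, m));
        System.out.println(hash(37, m));
        System.out.println(hash(25, m) == hash(37, m));
    }
}
```

Any two keys whose difference is a multiple of m collide under this function, which is why a collision resolution strategy is needed.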

Hash collision resolution

  • Hash? -> choice of hash algorithm -> how to resolve hash collisions. Presumably this is how most colleagues think about it. So, what are the main ways to resolve hash collisions?

Open addressing

  • When a collision occurs, look for the next empty hash address. The formula for the simplest version (linear probing) is f_d(key) = (f(key) + d) mod m (d = 1, 2, ..., m-1). For example, set m to 12 and insert 25 and 37 in order: 25 mod 12 = 37 mod 12 = 1, so a hash collision occurs. Probing again with d = 1 gives (37 + 1) mod 12 = 2, which is a different address, so the collision is resolved.
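A rough sketch of open addressing with linear probing might look like the following. The class LinearProbingTable and the keys used in main are assumptions of mine for illustration only, and the sketch ignores deletion and resizing.

```java
// Minimal open-addressing table with linear probing; all names are illustrative.
class LinearProbingTable {
    private final Integer[] slots;

    LinearProbingTable(int m) {
        slots = new Integer[m];
    }

    // Inserts key and returns the slot index used, or -1 if the table is full.
    int insert(int key) {
        int m = slots.length;
        int home = Math.floorMod(key, m);      // f(key) = key mod m
        for (int d = 0; d < m; d++) {          // d = 0 is the home slot, then d = 1..m-1
            int idx = (home + d) % m;
            if (slots[idx] == null || slots[idx] == key) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(12);
        System.out.println(t.insert(25)); // lands in its home slot
        System.out.println(t.insert(37)); // collides with 25 and moves to the next free slot
    }
}
```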

Re-hashing

  • Prepare several hash functions in advance. When the first hash function produces a collision, a backup hash function is used to compute another address.

Separate chaining (chain address method)

  • The hash value is still obtained with the division-remainder method, but when a collision occurs, the colliding entries are inserted one by one as nodes of a linked list. For example, with f(key) = key mod 12, we have 48 mod 12 = 12 mod 12 = 0, so the keys 48 and 12 form a linked list in slot 0.
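The chaining idea can be sketched in a few lines. ChainedTable is a made-up class for illustration; it appends colliding keys to a per-slot list and assumes the same f(key) = key mod 12 example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative separate-chaining table: each slot holds a linked list of colliding keys.
class ChainedTable {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedTable(int m) {
        for (int i = 0; i < m; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // f(key) = key mod m picks the slot; the key is appended to that slot's list.
    void insert(int key) {
        buckets.get(Math.floorMod(key, buckets.size())).addLast(key);
    }

    List<Integer> bucket(int index) {
        return buckets.get(index);
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(12);
        t.insert(48);
        t.insert(12);
        System.out.println(t.bucket(0)); // both keys end up chained in slot 0
    }
}
```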

Public overflow area method

  • An additional, shared storage area is used to hold the elements whose hash values collide.

HashMap

HashMap hash algorithm

An aside: LOAD_FACTOR (load factor)

  • We all know that HashMap's default LOAD_FACTOR is 0.75, but what does it actually do? Let's follow the source code and look for clues.

  • The source comment on the no-argument constructor (new HashMap()) tells us that the capacity is 16 and the load factor is 0.75.

/**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
  • As elements are put into the map, it eventually has to be resized. See the resize() function; the characteristic part is quoted below.
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
  • As we can see, the threshold is capacity multiplied by load_factor, i.e. 16 * 0.75 = 12. HashMap resizes once the number of elements exceeds the threshold. Because the threshold is smaller than the capacity, the probability of hash collisions is reduced: with the same hash function and the same 16 slots, storing 12 elements is less likely to produce a collision than storing 16 elements.
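As a small worked example of that arithmetic, the sketch below applies the same capacity * load factor formula to a few capacities. The class ResizeThreshold is hypothetical and only mirrors the formula quoted from resize(); it is not how HashMap tracks its threshold internally in every code path.

```java
// Hypothetical helper mirroring newThr = (int)(loadFactor * capacity) from the excerpt.
class ResizeThreshold {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Print the resize threshold for a few power-of-two capacities.
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println(capacity + " -> " + threshold(capacity, 0.75f));
        }
    }
}
```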

Hash function and formula analysis

  • HashMap's hash algorithm is similar to the division-remainder method, but it uses a bit operation instead of MOD. Taking key as the input and f(key) as the hash function, the formula is as follows:
f(key) = hash(key) & (table.length - 1) 
hash(key) = (h = key.hashCode()) ^ (h >>> 16)
  • hash(key) & (table.length - 1) is an optimized way of taking the remainder modulo table.length. The effect is essentially the same, so it is still based on the division-remainder method. Because table.length in HashMap is always a power of two (2^n) after every resize, the bit operation is noticeably more efficient.
  • >>> is the unsigned right shift operator. hashCode() has a very wide range, so two hash codes are unlikely to collide in themselves, but after & (table.length - 1) the probability becomes larger, because table.length is a small value and only the low bits take part. That is why >>> 16 is used: XORing the high 16 bits into the low 16 bits lets both the high and the low bits of hashCode() influence f(key), which gives a more even distribution and a smaller probability of hash collisions.
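The effect of the >>> 16 step is easier to see with numbers. The sketch below reuses the two formulas quoted above on plain int values; the class HashSpread and the sample values 0x10000 and 0x20000 are my own assumptions for illustration.

```java
// Illustrative re-implementation of the two formulas quoted above, on int hash codes.
class HashSpread {
    // XOR the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // tableLength is assumed to be a power of two, as in HashMap.
    static int index(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // a and b differ only in their high 16 bits
        int b = 0x20000;
        // Without spreading, only the low bits are used, so both map to the same slot.
        System.out.println(index(a, 16) + " " + index(b, 16));
        // With spreading, the high bits reach the index and the two values separate.
        System.out.println(index(spread(a), 16) + " " + index(spread(b), 16));
    }
}
```

For non-negative hashes and a power-of-two length, index(h, n) gives the same result as h % n, which is the sense in which the mask is an optimized remainder.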

HashMap hash collision resolution

  • HashMap clearly resolves hash collisions with separate chaining, which can be seen from its structure.
  • It can also be seen in HashMap's resize() process. See the source excerpt below, with comments.
        if (oldTab != null) {
            // Walk every bucket of the old array
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // Only one node in this bucket (no hash collision)
                    if (e.next == null)
                        // Compute its position in the new array and store it
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        // Red-black tree bins are split separately
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        // Low list and high list
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            // Walk the list of colliding nodes
                            next = e.next;
                            // The oldCap bit of the hash is 0: append to the low list
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // The oldCap bit of the hash is 1: append to the high list
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // The low list stays at the original index
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // The high list moves to original index + old array size
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
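The low/high split in the excerpt works because the capacity doubles: the new index of a node is either its old index j or j + oldCap, decided by a single bit of the hash. The following sketch checks that claim on plain int hashes. ResizeSplit and the sample hashes are assumptions of mine, not JDK code.

```java
// Illustrative check of the low/high split used when a power-of-two table doubles.
class ResizeSplit {
    // Index in the doubled table, derived the way the excerpt does it:
    // test the single bit selected by oldCap instead of recomputing the full mask.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // All four hashes share old index 5; after doubling they split into two buckets.
        for (int hash : new int[] {5, 21, 37, 53}) {
            System.out.println(hash + " -> " + newIndex(hash, oldCap));
        }
    }
}
```

This is also why the excerpt never recomputes hash & (newCap - 1) for chained nodes: the bit test gives the same answer with less work.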

Redis dictionary

Introduction to Data Structure

What is roughly the same as HashMap

First, a brief description of the Redis dictionary data structures. They are written in C, the language that cast a shadow over my college days.

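typedef struct dictht {
	// Hash table array
	dictEntry **table;

	// Hash table size
	unsigned long size;

	// Hash table size mask, used to compute index values
	// Always equal to size - 1
	unsigned long sizemask;

	// Number of nodes currently used in the hash table
	unsigned long used;

} dictht;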
  • There is a hash table array here. Doesn't it look familiar? It serves the same purpose as Node<K,V>[] in HashMap. Next, look at the dictEntry structure.
typedef struct dictEntry {
	// Key
	void *key;

	// Value
	union {
		void *val;
		uint64_t u64;
		int64_t s64;
	} v;

	// Pointer to the next node
	struct dictEntry *next;
} dictEntry;
  • This looks familiar too: it is a key-value pair! The key attribute holds the key, and the v attribute holds the value, which may be a pointer, a uint64_t integer, or an int64_t integer. next is a pointer to another hash table node; it links multiple key-value pairs with the same index value into a linked list.

  • As we can see, so far there is basically no difference from HashMap, apart from a few extra attributes (hash table size, number of used nodes, mask, and so on).

What is different

  • We have just seen that the dictionary is built on top of a hash table structure. So what does the dictionary itself really look like?
typedef struct dict {
	// Type-specific functions
	dictType *type;

	// Private data
	void *privdata;

	// Hash tables (described above)
	dictht ht[2];

	// Rehash index
	// The value is -1 when no rehash is in progress
	int rehashidx;
} dict;
  • That completes the introduction to the Redis dictionary data structures.

Redis dictionary hashing algorithm

  • First, compute the hash value of the key with the hash function set on the dictionary: hash = dict->type->hashFunction(key)
  • Then use the hash table's sizemask attribute and the hash value to compute the index. Depending on the situation, the table is ht[0] or ht[1]. This is in fact the same as HashMap's hash(key) & (table.length - 1), because, as the comment says, sizemask is always equal to size - 1.

index = hash & dict->ht[x].sizemask

  • Redis computes hash values with the MurmurHash2 algorithm, which the author is not able to describe here.

Redis dictionary hash collision resolution

  • The Redis hash table also uses separate chaining: each node has a next pointer, so multiple nodes form a singly linked list. The difference from HashMap is that, because the table keeps no tail pointer, new nodes are added at the head of the list (head insertion).
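Head insertion is simple enough to sketch directly. The example below is written in Java to stay in one language with the other sketches, even though Redis itself is C; HeadInsertBucket and its Entry class are hypothetical and only model a single bucket.

```java
// Illustrative single bucket that adds new entries at the head of a singly linked list.
class HeadInsertBucket {
    static final class Entry {
        final String key;
        final Entry next;

        Entry(String key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    private Entry head;

    // O(1): the new entry points at the old head, so no tail pointer is needed.
    void add(String key) {
        head = new Entry(key, head);
    }

    // Keys in list order, newest first, joined with commas.
    String keys() {
        StringBuilder sb = new StringBuilder();
        for (Entry e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        HeadInsertBucket bucket = new HeadInsertBucket();
        bucket.add("k0");
        bucket.add("k1");
        bucket.add("k2");
        System.out.println(bucket.keys()); // the most recently added key comes first
    }
}
```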

Differences between Redis and HashMap

Having looked at the hash algorithms and the collision resolution, there is not much difference. So where is the difference? It lies in rehashing.

Rehash

  • As mentioned, the dictionary data structure holds two hash tables (ht[2]), and the secret lies here.

  • Rehash aims to keep the load factor of the hash table within a reasonable range: when the hash table holds too many or too few key-value pairs, it needs to be expanded or shrunk.

  • The steps are as follows:

    1. Allocate space for the dictionary's ht[1] hash table. For an expansion, the size of ht[1] is the first 2^n greater than or equal to ht[0].used * 2; for a contraction, the size of ht[1] is the first 2^n greater than or equal to ht[0].used.
    2. Rehash all key-value pairs stored in ht[0] onto ht[1], i.e. recompute the hash value and index of each key, then place the key-value pair at the corresponding position in ht[1].
    3. After all key-value pairs in ht[0] have been migrated to ht[1] (ht[0] becomes an empty table), release ht[0], set ht[1] as ht[0], and create a new empty hash table at ht[1] in preparation for the next rehash.
  • Note: progressive rehash is not covered in this article. The data structure is fairly complex and I did not have the energy to draw the diagrams, so please bear with me.
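The three steps can be sketched as follows. This is a Java toy, not the Redis implementation: the class TwoTableRehash is hypothetical, the key itself is assumed to be a non-negative int that serves as its own hash value, and the whole migration happens in one pass (so it does not model progressive rehash or any other detail of the real C code).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a one-shot rehash between two bucket arrays; names are illustrative.
class TwoTableRehash {
    // Step 1 sizing rule: smallest power of two that is >= n (n assumed in 1..2^30).
    static int nextPowerOfTwo(int n) {
        int size = 1;
        while (size < n) {
            size <<= 1;
        }
        return size;
    }

    static int expandSize(int used) {
        return nextPowerOfTwo(used * 2);
    }

    static int shrinkSize(int used) {
        return nextPowerOfTwo(used);
    }

    // Steps 2 and 3: move every key of ht0 into a new table of newSize buckets
    // and return it, so the caller can install it as the new ht0.
    static List<List<Integer>> rehash(List<List<Integer>> ht0, int newSize) {
        List<List<Integer>> ht1 = new ArrayList<>();
        for (int i = 0; i < newSize; i++) {
            ht1.add(new ArrayList<>());
        }
        int sizemask = newSize - 1;                  // always size - 1
        for (List<Integer> bucket : ht0) {
            for (int key : bucket) {
                ht1.get(key & sizemask).add(0, key); // index = hash & sizemask, head insertion
            }
        }
        return ht1;
    }

    public static void main(String[] args) {
        System.out.println(expandSize(5));
        System.out.println(shrinkSize(3));
    }
}
```

Returning the new table instead of swapping fields keeps the sketch short; in a fuller model the caller would drop the old ht0 and keep an empty ht1 ready for the next rehash.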

Summary

This article briefly analyzed hashing in HashMap and the Redis dictionary from the angles of hash algorithms and hash collision resolution. The views are not necessarily all correct, so criticism and corrections are welcome!

Origin juejin.im/post/5d67c96c6fb9a06b160f4017