Understanding hash functions

First, let's introduce the concept of a hash function. The input domain of a hash function can be very large (for example, all strings), but its output range is fixed. A hash function has the following properties:

  1. A typical hash function has an infinite input domain.
  2. When the same input value is passed to the hash function, the return value is the same.
  3. When different input values are passed to the hash function, the return values may be the same or different. Because the output range is fixed, some different inputs must map to the same element of the output range; this is the problem of hash collisions.
  4. Most importantly, the return values obtained from many different inputs are distributed uniformly over the output range.

Properties 1 to 3 are the basic nature of a hash function; property 4 is the key criterion for judging how good a hash function is. The more uniformly the return values of different inputs are distributed, and the more that distribution is independent of any pattern in the inputs, the better the hash function.
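
The first three properties can be seen in a small sketch. The function toy_hash, its multiplier 31, and the output range of 16 are illustrative assumptions, not part of any library:

```python
def toy_hash(text: str, buckets: int = 16) -> int:
    """Hypothetical toy hash: polynomial sum folded into the range [0, buckets)."""
    value = 0
    for ch in text:
        value = (value * 31 + ord(ch)) % buckets
    return value

# Property 2: the same input always gives the same output.
same = toy_hash("hello") == toy_hash("hello")

# Property 3: 17 different inputs but only 16 possible outputs,
# so at least two inputs must share a return value (a collision).
inputs = [f"key{i}" for i in range(17)]
outputs = [toy_hash(s) for s in inputs]
has_collision = len(set(outputs)) < len(inputs)
```

Whatever the toy function looks like, the collision in the last line is forced by the fixed output range, not by the quality of the function.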

Constructing a hash function

1) Direct addressing method:

Take the key itself, or a linear function of the key, as the hash address: H(key) = key or H(key) = a · key + b
where a and b are constants. Such a hash function is also called a self function.

Note: Because the set of addresses produced by direct addressing has the same size as the set of keys, different keys never collide. In practice, however, there are few situations where this hash function can be used.
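
As a minimal sketch of direct addressing, the example below counts people by age, using the age itself as the table index. The age data and the table size of 121 are made up for illustration:

```python
def direct_address(key: int, a: int = 1, b: int = 0) -> int:
    """H(key) = a * key + b; with a = 1 and b = 0 this is H(key) = key."""
    return a * key + b

# Hypothetical data: ages 0..120 map directly to a 121-slot table.
table = [0] * 121
for age in [25, 30, 25, 41]:
    table[direct_address(age)] += 1   # each distinct age owns its own slot
```

This only works because the key space (ages) is small and dense, which is why the method is rarely applicable.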

2) Multiplication method:

First multiply the key by a constant A (0 < A < 1) and extract the fractional part of key · A; then multiply this fraction by m and round down.

Note: The biggest advantage of this method is that the choice of m is less restrictive than in the division method. For example, m can simply be an integer power of 2. Although the method works for any value of A, some values give better results. Knuth suggests A ≈ 0.61803..., that is, (√5 − 1) / 2.
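
The two steps can be sketched as follows. The function name multiplication_hash is a placeholder, and floating-point arithmetic is used for simplicity, which is an assumption that only holds up for keys of moderate size:

```python
import math

A = (math.sqrt(5) - 1) / 2   # about 0.61803, the constant mentioned above

def multiplication_hash(key: int, m: int) -> int:
    """H(key) = floor(m * fractional_part(key * A))."""
    fractional = (key * A) % 1.0      # fractional part of key * A
    return math.floor(m * fractional)

# m can be a power of two, as noted above.
addresses = [multiplication_hash(k, 1024) for k in range(1, 1000)]
```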

3) Mid-square method:

Square the key and take the middle few digits of the result as the hash address.

Because the middle digits of a square depend on every digit of the key, the resulting hash addresses are more uniform. This is a fairly common method of constructing a hash function.

Take the set of keys (0100, 0110, 1010, 1001, 0111).
Squaring them gives (0010000, 0012100, 1020100, 1002001, 0012321).
If the table length is 1000, the middle three digits can be taken as the set of hash addresses: (100, 121, 201, 020, 123).
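
The example above can be reproduced with a short sketch. The function name and the width and digits parameters are illustrative; the squares are zero-padded to 7 digits to match the example:

```python
def mid_square_hash(key: int, width: int = 7, digits: int = 3) -> str:
    """Square the key, zero-pad to `width` digits, and return the middle `digits` digits."""
    squared = str(key * key).zfill(width)
    start = (len(squared) - digits) // 2
    return squared[start:start + digits]

keys = [100, 110, 1010, 1001, 111]          # 0100, 0110, 1010, 1001, 0111
addresses = [mid_square_hash(k) for k in keys]
# addresses == ['100', '121', '201', '020', '123']
```

The addresses are kept as strings so that the leading zero of 020 stays visible.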

4) Division method (remainder method):

Take the remainder of the key divided by a number p no greater than the table length m as the hash address: H(key) = key MOD p (p ≤ m).

Note: This is the simplest and most commonly used method of constructing a hash function. The modulo (MOD) can be applied to the key directly, or after a folding or mid-square operation. When using the division method, the choice of p is important. Generally p should be a prime number, or a composite number with no prime factors less than 20.
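
A small sketch shows why the choice of p matters. The keys are hypothetical and were chosen to share a common factor with the composite modulus 16:

```python
def division_hash(key: int, p: int) -> int:
    """H(key) = key MOD p."""
    return key % p

keys = [8, 16, 24, 32, 40, 48]                          # all multiples of 8
with_composite = {division_hash(k, 16) for k in keys}   # {0, 8}: only 2 addresses used
with_prime = {division_hash(k, 13) for k in keys}       # 6 distinct addresses
```

With p = 16 the six keys crowd into two addresses, while the prime p = 13 spreads them over six.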

5) Random number method:

Choose a random function and take its value on the key as the hash address, i.e., H(key) = random(key), where random is a random function. This method is usually more suitable when the keys have unequal lengths.
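
One way to read H(key) = random(key) is to seed a pseudo-random generator with the key, so the same key always produces the same address. This interpretation, and the name random_hash, are assumptions; the sketch uses Python's standard random module:

```python
import random

def random_hash(key: str, m: int) -> int:
    """Seed a generator with the key and take its first value in [0, m)."""
    return random.Random(key).randrange(m)

# Keys of unequal length are handled uniformly.
keys = ["a", "hash", "a much longer key"]
addresses = [random_hash(k, 100) for k in keys]
```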

Hash collision resolution

1) Open addressing method:

When a collision occurs, probe other slots in some fixed sequence until a slot without a collision is found, then store the element there.

If a hash collision occurs at position index, the following probing methods are commonly used:

  • Linear probing method (linear probing rehash)
    probe index + 1, index + 2, ... in sequence until a slot without a collision is found, and store the element there.

  • Quadratic probing method
    instead of probing the slot right after index, probe at quadratic offsets i^2: if index + 1^2 collides, probe index + 2^2 next, and so on, until the collision is resolved.
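
Both probing strategies can be sketched with one insertion routine that takes the offset rule as a parameter. The names probe_insert, linear, and quadratic are illustrative, the table size of 11 is arbitrary, and quadratic probing is assumed to use the usual offsets i^2:

```python
def probe_insert(table: list, key: int, offset) -> int:
    """Insert key with H(key) = key MOD len(table); on collision try index + offset(i)."""
    m = len(table)
    index = key % m
    for i in range(m):
        slot = (index + offset(i)) % m     # wrap around the end of the table
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found")

def linear(i: int) -> int:
    return i            # offsets 0, 1, 2, 3, ...

def quadratic(i: int) -> int:
    return i * i        # offsets 0, 1, 4, 9, ...

keys = (3, 14, 25, 36)  # all four keys hash to slot 3 when m = 11
t1 = [None] * 11
linear_slots = [probe_insert(t1, k, linear) for k in keys]        # [3, 4, 5, 6]
t2 = [None] * 11
quadratic_slots = [probe_insert(t2, k, quadratic) for k in keys]  # [3, 4, 7, 1]
```

Linear probing packs the colliding keys into one contiguous run, while the quadratic offsets scatter them.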

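Note:

(1) When building a hash table with open addressing, all slots in the table (more strictly, the keys stored in the slots) must be set to empty before the table is built.
(2) Advantages and disadvantages of the two probing methods.
     Linear probing always finds a free address as long as the hash table is not full, but it is prone to
     secondary collisions. For example, if after several collisions positions k, k+1, k+2 all hold data,
     then the next items whose addresses are k, k+1, k+2, or k+3 will all be stored at position k+3,
     which produces a secondary collision.
     This introduces a new concept: clustering. When linear probing is used and positions k, k+1, k+2
     already hold data, the next items requesting address k, k+1, k+2, or k+3 all compete for position k+3.

     Quadratic probing reduces clustering, but only if the total capacity of the hash table is a prime of the form 4n+3.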
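
The 4n+3 condition can be checked numerically. The sketch below assumes the variant of quadratic probing that tries both +i^2 and -i^2, which is an assumption not stated above, and counts how many slots that probe sequence can reach from home slot 0:

```python
def reachable_slots(m: int) -> set:
    """Slots reached from home slot 0 with offsets +i^2 and -i^2 for i = 0..m-1."""
    slots = set()
    for i in range(m):
        slots.add((i * i) % m)
        slots.add((-(i * i)) % m)
    return slots

covers_11 = len(reachable_slots(11)) == 11   # 11 = 4*2 + 3: every slot is reachable
covers_13 = len(reachable_slots(13)) == 13   # 13 = 4*3 + 1: some slots are never probed
```

For the prime 11 the probe sequence visits the whole table, while for the prime 13 it reaches only 7 of the 13 slots, so an insertion can fail even though the table is not full.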

2) Chaining method (open hashing)

The basic idea:

With chaining, a singly linked list is attached at each address where a collision occurs, and all colliding items are inserted into that list. Items can be inserted at the head of the list, at the tail, or in the middle according to some rule; in short, the aim is to keep the data in the list ordered. Java's HashMap class resolves collisions with chaining.

Example: Given the set of keys (19, 14, 23, 01, 68, 20, 84, 27, 55, 11, 10, 79), build the hash table using the hash function H(key) = key MOD 13 and chaining to resolve collisions.
(Figure: the resulting chained hash table.)
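
The example can be worked through in a few lines. Python lists stand in for the singly linked lists, and each key is appended at the tail of its chain; the order within each chain in the original figure is not known, so the tail insertion here is an assumption:

```python
def build_chained_table(keys, p: int = 13) -> list:
    """Build a chained hash table with H(key) = key MOD p; each slot holds a list of keys."""
    table = [[] for _ in range(p)]
    for key in keys:
        table[key % p].append(key)     # tail insertion; head insertion is equally valid
    return table

keys = [19, 14, 23, 1, 68, 20, 84, 27, 55, 11, 10, 79]
table = build_chained_table(keys)
# table[1]  == [14, 1, 27, 79]
# table[3]  == [68, 55]
# table[6]  == [19, 84]
# table[7]  == [20]
# table[10] == [23, 10]
# table[11] == [11]
```

All other slots stay empty.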

3) Rehashing method (double hashing):

After a hash collision occurs, use a different hash function to generate a new address, and repeat until no collision occurs. This should be easy to understand.

Rehashing effectively avoids clustering, but its drawbacks are that it increases the computation time and the number of hash functions needed, and it cannot guarantee that a free address will always be found while the hash table is not full.
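
A minimal sketch of this idea tries a list of hash functions in order. The three functions below are arbitrary placeholders chosen for illustration, not recommended hash functions:

```python
def rehash_insert(table: list, key: int, hash_functions) -> int:
    """Try each hash function in turn until one gives a free slot; return that slot."""
    for h in hash_functions:
        slot = h(key) % len(table)
        if table[slot] is None:
            table[slot] = key
            return slot
    # The drawback noted above: free slots may exist, yet none was found.
    raise RuntimeError("every hash function collided")

# Hypothetical hash functions, tried in this order.
hash_functions = [
    lambda k: k,             # H1(key) = key
    lambda k: 7 * k + 1,     # H2(key) = 7 * key + 1
    lambda k: k // 10 + 6,   # H3(key) = key div 10 + 6
]

table = [None] * 11
slots = [rehash_insert(table, k, hash_functions) for k in (5, 16, 27)]
# slots == [5, 3, 8]
```

Key 16 collides under H1 and is placed by H2; key 27 collides under both H1 and H2 and is placed by H3.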

4) Establishing a common overflow area:

Create a base table whose size equals the size of the hash table, and create a separate overflow table. The first record for each hash address is stored in the base table; all colliding records, whatever address the hash function gives them, are placed in the overflow table.

The drawback is that you must know the likely size of the hash table in advance, and the overflow table must not hold too much data, otherwise lookups in the overflow table become slow. In practice, this means collisions should be kept to a minimum.
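
The base table and overflow table can be sketched as a small class. The class name, the use of key MOD m as the hash function, and the unsorted Python list for the overflow area are all assumptions made for illustration:

```python
class OverflowHashTable:
    """Base table plus one shared overflow list (a sketch without deletion support)."""

    def __init__(self, m: int):
        self.base = [None] * m    # one slot per hash address
        self.overflow = []        # every colliding key ends up here

    def insert(self, key: int) -> None:
        slot = key % len(self.base)
        if self.base[slot] is None:
            self.base[slot] = key          # first key for this address
        else:
            self.overflow.append(key)      # collision: goes to the overflow area

    def contains(self, key: int) -> bool:
        slot = key % len(self.base)
        if self.base[slot] == key:
            return True
        return key in self.overflow        # linear scan, slow if the overflow is large

t = OverflowHashTable(7)
for k in (10, 17, 4):
    t.insert(k)
# t.base[3] == 10, t.base[4] == 4, t.overflow == [17]
```

The linear scan in contains is the reason the overflow table has to stay small.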

Reference Links: https://blog.csdn.net/m0_37925202/article/details/82015731

Origin blog.csdn.net/weixin_43234372/article/details/93010592