Hash function

How do you construct a hash function? I have summed up three basic requirements for designing one:

1. The hash value computed by the hash function is a non-negative integer;

2. If key1 = key2, then hash(key1) == hash(key2);

3. If key1 ≠ key2, then hash(key1) ≠ hash(key2).

Let me explain these three points. The first should be easy to understand: array subscripts start at zero, so the hash value produced by the hash function must be a non-negative integer. The second is also easy to understand: the same key must always yield the same hash value.

The third point may be harder to understand, so I will focus on it. In practice it is almost impossible to find a hash function that gives different hash values for all different keys. Even well-known hashing algorithms such as MD5, SHA, and CRC cannot completely avoid hash collisions. Moreover, because the array has limited storage, the probability of collisions increases further, so collisions have to be resolved by other means.
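The three requirements can be checked on a toy example. The following is a minimal sketch, not a production hash: `simple_hash` and `find_collision` are hypothetical names, and the base 31 and the table size of 10 are assumptions chosen only for illustration. It shows that the result is a non-negative index, that equal keys give equal values, and that requirement 3 fails as soon as there are more keys than slots.

```python
def simple_hash(key: str, table_size: int) -> int:
    """Map a string key to an array index in [0, table_size)."""
    h = 0
    for ch in key:
        # Polynomial hash with base 31 (an arbitrary choice for this sketch).
        h = h * 31 + ord(ch)
    # With a positive modulus, Python's % never returns a negative number.
    return h % table_size


def find_collision(keys, table_size):
    """Return the first pair of distinct keys sharing an index, or None."""
    seen = {}
    for key in keys:
        idx = simple_hash(key, table_size)
        if idx in seen and seen[idx] != key:
            return seen[idx], key
        seen[idx] = key
    return None


# 11 distinct keys cannot fit into 10 slots without a collision.
keys = [f"key{i}" for i in range(11)]
pair = find_collision(keys, 10)
print(pair)
```

With 11 keys and 10 slots a collision is guaranteed no matter how good the hash function is, which is why requirement 3 can only be approximated.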

If the part highlighted in red is unclear, check this blog:

https://www.cnblogs.com/yft-javaNotes/p/10779042.html

A hash function (Hash Function) maps a large file to a short string of characters. Like a fingerprint, the result is a short piece of information used to identify a file: it depends on every byte of the file, and it is hard to find a rule for reversing it.

For example:

Suppose a server stores 10 text files, and you want to determine whether a new text file is identical to any of them. Comparing every byte of every file is impractical: two files may each be 5000 bytes long and differ only in the last byte, in which case the first 4999 comparisons are wasted. One solution is to map each file to a hash string when the 10 files are stored. The server then only needs to keep 10 hash strings, and checking the new file comes down to comparing its hash value with the hash values of those 10 files.
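The file comparison described above can be sketched with Python's standard `hashlib` module. This is only an illustration: the file contents are made-up in-memory byte strings, the names `fingerprint` and `is_duplicate` are hypothetical, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-ins for the 10 stored files: 5000 bytes each,
# differing only in the last byte (hypothetical contents).
stored_files = [b"A" * 4999 + bytes([65 + i]) for i in range(10)]
stored_digests = {fingerprint(content) for content in stored_files}


def is_duplicate(new_content: bytes) -> bool:
    """Compare the new file's digest with the stored digests, not the bytes."""
    return fingerprint(new_content) in stored_digests


print(is_duplicate(b"A" * 4999 + b"C"))  # same content as one stored file
print(is_duplicate(b"A" * 4999 + b"Z"))  # differs only in the last byte
```

A matching digest strongly suggests identical content; when certainty is required, a byte-by-byte comparison of the one matching file can confirm it.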

 

Simply put, a hash is a function that compresses a message of arbitrary length into a fixed-length message digest.

Because the set of possible files is unlimited while the number of strings the output can represent is limited, different keys may map to the same hash value. This is why collisions are always possible.
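Both statements can be observed directly. The sketch below assumes SHA-256 from `hashlib`, and it defines a deliberately tiny 8-bit hash (`tiny_hash`, a hypothetical helper that keeps only the first digest byte) so that a collision shows up quickly: 257 distinct inputs cannot all map to different values out of 256.

```python
import hashlib

# The digest length is fixed regardless of the input length.
short_digest = hashlib.sha256(b"a").hexdigest()
long_digest = hashlib.sha256(b"a" * 100_000).hexdigest()
print(len(short_digest), len(long_digest))


def tiny_hash(data: bytes) -> int:
    """A deliberately tiny hash: the first byte of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[0]


def first_collision(limit: int = 257):
    """Hash the messages b"0", b"1", ... until two share a hash value."""
    seen = {}
    for i in range(limit):
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
    return None


print(first_collision())
```

The same pigeonhole argument applies to full-length digests; the output space is just so large that collisions are extremely unlikely to occur by chance.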

A hash algorithm is not reversible, i.e., the key cannot be derived back from its hash value.

Properties of a hash function

A classic hash function has the following five properties:

  1. The input domain is infinite
  2. The output domain is finite
  3. The same input always produces the same output
  4. Different inputs may produce the same output (hash collision)
  5. Different inputs are evenly distributed over the output domain (the hash function is dispersive)

For example, take the 99 numbers 0-98 as the input domain and suppose the output domain of the hash function is 0, 1, 2. When the 99 numbers are passed through the hash function, the counts of the return values 0, 1, and 2 will each be close to 33; no return value appears especially often and none especially rarely.
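The example with the numbers 0-98 can be reproduced in a few lines. The text does not say which hash function is used, so this sketch assumes the simplest one with output domain 0, 1, 2: taking the input modulo 3. The name `bucket_counts` is hypothetical.

```python
from collections import Counter


def bucket_counts(n_inputs: int, n_buckets: int) -> Counter:
    """Count how many of the inputs 0..n_inputs-1 land on each output value."""
    return Counter(x % n_buckets for x in range(n_inputs))


counts = bucket_counts(99, 3)
print(dict(counts))  # {0: 33, 1: 33, 2: 33}
```

With modulo 3 the split is exactly even; for a general hash function the counts would only be close to 33 each.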


How to resolve hash collisions

Even the best hash function cannot avoid hash collisions. So how are collisions actually resolved? Two kinds of methods are commonly used: the open addressing method (open addressing) and the linked list method (chaining).
1. Open addressing method:
The core idea of open addressing is that when a hash collision occurs, we probe again for a free position and insert the data there. How do we probe for a new position? I will describe a relatively simple probing method, linear probing (Linear Probing).
When we insert data into the hash table and the storage location computed by the hash function is already occupied, we start from the current position and check each following position in turn until a free one is found.
This may sound abstract, so here is a concrete example. In the figure, yellow marks free positions and orange marks positions that already hold data.

As the figure shows, the hash table has size 10, and six elements were inserted before element x. The hash algorithm maps x to index 7, but that position already holds data, so there is a collision. We then check the following positions one by one for a free one, and reach the end of the table without finding any. So we continue from the head of the table until we find free position 2, and insert x there.
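The search process in the hash table is somewhat similar to the insertion procedure. We compute the key's hash value, compare the element at that position with the one we are looking for, and if they differ, probe the following positions in turn; reaching a free position means the element is not in the table.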
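The walk-through above can be sketched as a small class. This is a minimal illustration under several assumptions: keys are non-negative integers, the hash function is `key % capacity`, deletion is not supported, and the class name `LinearProbingTable` is hypothetical. Because the figure is not available, the six occupied positions (0, 1, 4, 7, 8, 9) are also an assumption, chosen so that an element hashing to index 7 ends up at position 2 as in the text.

```python
class LinearProbingTable:
    """Open addressing with linear probing; integer keys, no deletion."""

    def __init__(self, capacity: int = 10):
        self.slots = [None] * capacity

    def _probe(self, key: int):
        """Yield slot indexes from the home slot onward, wrapping around once."""
        start = key % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def insert(self, key: int) -> int:
        """Store the key in the first free slot of its probe sequence."""
        for idx in self._probe(key):
            if self.slots[idx] is None or self.slots[idx] == key:
                self.slots[idx] = key
                return idx
        raise RuntimeError("hash table is full")

    def find(self, key: int) -> int:
        """Return the slot index holding the key, or -1 if it is absent."""
        for idx in self._probe(key):
            if self.slots[idx] is None:
                return -1  # a free slot ends the search: key was never stored
            if self.slots[idx] == key:
                return idx
        return -1


table = LinearProbingTable(10)
for k in (10, 21, 34, 47, 58, 69):  # occupy slots 0, 1, 4, 7, 8, 9
    table.insert(k)

slot = table.insert(17)  # home slot 7 is taken; probing wraps around
print(slot)              # 2
print(table.find(17))    # 2
print(table.find(27))    # -1
```

Stopping a search at the first free slot is only correct as long as nothing is deleted; supporting deletion would need extra handling that this sketch leaves out.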

Open addressing + linear probing


2. Linked list method:


Origin www.cnblogs.com/qifei-liu/p/12100833.html