A popular understanding and implementation of the hash table algorithm

Sequential table lookup method

Suppose we have 1,000 personal files that need to be stored in filing cabinets. The requirement is to be able to quickly check whether someone's file has been archived and, if so, to retrieve it quickly. How would you do it? The most obvious method is to place each person's file in a cabinet in turn and label the outside of the cabinet with the person's name. To check whether a person's file has been archived, you compare names cabinet by cabinet. But in the worst case, with 1,000 people, you may have to compare a name 1,000 times, and the more people there are, the larger the maximum number of comparisons becomes. In technical terms, the time complexity of this method is O(n): if the number of people grows by a factor of n, the maximum number of comparisons grows by the same factor. This is fine when there are only a few people, but the more people there are, the slower the query becomes. Is there a better solution? Yes: the hash table algorithm.
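As a concrete illustration, here is a minimal Python sketch of this sequential method (the names and cabinet contents are made up for the example):

```python
# A toy sketch of the sequential method: files are stored in arrival order
# and every query scans the cabinets from the front.
files = ["Zhang San", "Li Si", "Wang Wu"]   # hypothetical archived names

def find_file(name):
    for i, archived_name in enumerate(files):   # up to len(files) comparisons: O(n)
        if archived_name == name:
            return i                            # the cabinet number holding the file
    return None                                 # not archived

print(find_file("Li Si"))     # 1
print(find_file("Zhao Liu"))  # None
```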

hash table algorithm

Suppose, for the moment, that no two people's names have the same number of strokes. We then pass the stroke count of the name of the person to be archived through a function that converts it into a number less than 1,000, and put that person's file in the cabinet with that number. The function is called a hash function, the 1,000 cabinets filled this way form a hash table, and the converted number (the cabinet's serial number) is the hash value. To check whether a person has been archived, we use the hash function to convert the stroke count of their name into a hash value and look in the cabinet with that number: if their file is there, they have been archived and you can retrieve it directly; otherwise, there is no file for that person. This is the hash table algorithm. It is very convenient: computing the hash value once is enough to answer the query. In technical terms, the time complexity of this algorithm is O(1); no matter how many people have been archived, the result is obtained with a single calculation.
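Here is a minimal Python sketch of this ideal case; for simplicity the stroke counts are passed in directly as integers, and the record strings are made up:

```python
CABINETS = 1000
table = [None] * CABINETS            # the 1,000 cabinets

def hash_func(strokes):
    return strokes % CABINETS        # convert a stroke count into a cabinet number

def archive(strokes, record):
    table[hash_func(strokes)] = record   # valid only while no two names share a stroke count

def lookup(strokes):
    return table[hash_func(strokes)]     # one calculation, one access: O(1)

archive(11, "Zhang San's file")
print(lookup(11))   # "Zhang San's file"
print(lookup(25))   # None: no file on record
```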

Of course, the above is only the ideal situation. In reality, stroke counts of names do repeat, so the converted hash values are not unique. What then? If two people's names hash to the same value, should both files go into one cabinet? What if 1,000 people all produce the same hash value? This situation is called a collision, and there are several ways to resolve it.

 

Open addressing (open address method)

With this method, if the cabinet corresponding to the computed hash value already holds someone else's file, then sorry: you transform the hash value again through the hash function, and keep doing so until you find an empty cabinet. Querying works the same way: first look in the cabinet corresponding to the initial hash value to see whether it holds the file you want; if not, keep transforming the hash value and checking the next cabinet. If you reach an empty cabinet before finding the file, then sorry, there is no such person on record.
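Below is a minimal Python sketch of open addressing, using linear probing (simply moving on to the next cabinet) as the "transform the hash value again" step; the strokes() helper is only a stand-in (it sums character code points) rather than a real stroke counter:

```python
CABINETS = 1000
table = [None] * CABINETS                   # each slot holds (name, record) or None

def strokes(name):
    return sum(ord(ch) for ch in name)      # stand-in for counting the strokes of a name

def probe(name, i):
    return (strokes(name) + i) % CABINETS   # i-th transformation: linear probing

def archive(name, record):
    for i in range(CABINETS):
        idx = probe(name, i)
        if table[idx] is None:              # keep probing until an empty cabinet appears
            table[idx] = (name, record)
            return idx
    raise RuntimeError("all cabinets are full")

def lookup(name):
    for i in range(CABINETS):
        idx = probe(name, i)
        if table[idx] is None:              # empty cabinet reached: no such person
            return None
        if table[idx][0] == name:           # this cabinet holds the file we want
            return table[idx][1]
    return None

archive("Zhang San", "Zhang San's file")
archive("San Zhang", "San Zhang's file")    # same hash value, so it probes to the next cabinet
print(lookup("San Zhang"))
```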

Zipper method (chain address method)

With this method (known in English as separate chaining), if the cabinet corresponding to the computed hash value already holds someone else's file, that's fine: rather than hunting for another cabinet, the new file is simply placed in the same cabinet alongside the existing one, and the files within a cabinet are kept in order. The next time you search, the cabinet for a given hash value may contain many files; in the worst case, all 1,000 people's files end up in a single cabinet, the time complexity degrades to O(n) again, and the method is no better than the ordinary one. When the algorithm is implemented, each array element stores not the record itself but the head of a linked list: if a hash value is unique, its list has length 1; otherwise the list's length equals the number of keys sharing that hash value.
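Here is a minimal Python sketch of the zipper method; a plain Python list stands in for the linked list, and strokes() is again a stand-in rather than a real stroke counter:

```python
CABINETS = 1000
table = [[] for _ in range(CABINETS)]    # each cabinet holds a chain of (name, record) pairs

def strokes(name):
    return sum(ord(ch) for ch in name)   # stand-in for counting the strokes of a name

def archive(name, record):
    table[strokes(name) % CABINETS].append((name, record))   # colliding files share a cabinet

def lookup(name):
    for archived_name, record in table[strokes(name) % CABINETS]:
        if archived_name == name:        # walk the chain; O(chain length) in the worst case
            return record
    return None

archive("Zhang San", "Zhang San's file")
archive("San Zhang", "San Zhang's file")  # same hash value: both end up in the same chain
print(lookup("San Zhang"))
```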

public overflow area

This method is similar to the zipper method. If the cabinet corresponding to the computed hash value already holds someone else's file, the new file is placed in a separate filing room set aside for such cases: the hash values are the same, but the files end up in different rooms. When searching, you use the hash value to check the main cabinet first and then, if necessary, the separate room, until the file is found or it is clear there is no such file. That separate filing room is called the public overflow area.
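A minimal Python sketch of a public overflow area might look like this; the strokes() helper is again only a stand-in:

```python
CABINETS = 1000
table = [None] * CABINETS                # primary filing room
overflow = []                            # shared overflow filing room

def strokes(name):
    return sum(ord(ch) for ch in name)   # stand-in for counting the strokes of a name

def archive(name, record):
    idx = strokes(name) % CABINETS
    if table[idx] is None:
        table[idx] = (name, record)      # home cabinet is free
    else:
        overflow.append((name, record))  # cabinet taken: file goes to the overflow area

def lookup(name):
    idx = strokes(name) % CABINETS
    if table[idx] is not None and table[idx][0] == name:
        return table[idx][1]             # found in the home cabinet
    for archived_name, record in overflow:   # otherwise search the overflow area
        if archived_name == name:
            return record
    return None

archive("Zhang San", "Zhang San's file")
archive("San Zhang", "San Zhang's file")   # home cabinet taken: goes to the overflow area
print(lookup("San Zhang"))
```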

 

As the descriptions above show, the complexity of the hash table algorithm is not always the ideal O(1). Even so, it is still much faster than the ordinary sequential table lookup method, because it is practically impossible for all the hash values to be the same; if they are, it only means your hash function is poor, and what you need to do is replace it. A good hash function spreads the stored content as evenly as possible across the hash table. Commonly used hash functions include the direct addressing method, the remainder (division) method, the digit analysis method, the mid-square method, the folding method, the random number method, and so on. The most commonly used of these, the remainder method, is described below.

The remainder (division) hash function

This method divides the stroke count of the name by a constant, and the remainder is the hash value. It is the simplest and most commonly used hash function. Of course, the choice of the constant matters: ideally it should be a prime number whose distance from the nearest power of 2 or 10 is fairly large. The drawback of this method is that it is relatively time-consuming, because a division with remainder takes more CPU clock cycles than other arithmetic operations.
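A minimal Python sketch of the remainder method; 787 is just an example prime, chosen here because it is not particularly close to a power of 2 or 10:

```python
def remainder_hash(strokes, table_size=787):
    # Division (remainder) method: hash value = key mod a constant,
    # preferably a prime not too close to a power of 2 or 10.
    return strokes % table_size

print(remainder_hash(11))    # 11
print(remainder_hash(798))   # 11: a different key that collides
```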

 

 

