An introduction to the construction methods of common hash functions

1. Division-remainder method

The division-remainder method divides the key by a positive integer p that is no larger than the length of the hash table and uses the resulting remainder as the address.

Specifically, the steps of the division-remainder method are as follows:

1. Choose a positive integer p that is no larger than the length of the hash table as the divisor.
2. Divide the key by the divisor p, and take the remainder as the index address in the hash table.

For example, suppose we have a set of keys {apple, banana, cat, dog} that we want to map into a hash table of size 10.

Using the division-remainder method, suppose we choose the divisor p = 7.

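Taking the numeric value of each key to be the sum of its ASCII codes, the mapping is as follows:

apple: index position 5 (the ASCII codes sum to 530, and 530 mod 7 = 5)
banana: index position 0 (the ASCII codes sum to 609, and 609 mod 7 = 0)
cat: index position 4 (the ASCII codes sum to 312, and 312 mod 7 = 4)
dog: index position 6 (the ASCII codes sum to 314, and 314 mod 7 = 6)

Therefore, the keys map to index positions 5, 0, 4, and 6 of the hash table respectively.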
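The calculation for the four example keys can be checked with a short Python sketch. The function name division_hash is made up for illustration, and turning a string into a number by summing its ASCII codes is the convention of this example, not a general rule.

```python
def division_hash(key: str, p: int) -> int:
    """Division-remainder hash of a string key (illustrative sketch)."""
    # Assumption: the string is converted to a number by summing
    # the ASCII codes of its characters, as in the example.
    total = sum(ord(ch) for ch in key)
    # The remainder after dividing by p is the index position.
    return total % p


if __name__ == "__main__":
    for word in ["apple", "banana", "cat", "dog"]:
        ascii_sum = sum(ord(ch) for ch in word)
        print(word, ascii_sum, division_hash(word, 7))
```

With p = 7 this prints the ASCII sums 530, 609, 312 and 314 and the index positions 5, 0, 4 and 6.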

The division-remainder method is a simple and widely used way to construct a hash function, and for a numeric key it computes the hash value in constant time. The choice of divisor, however, has a large effect on hash table performance: different divisors produce different collision patterns, and a prime divisor is a common choice because it tends to spread keys more evenly.

2. Direct addressing method

The method of mapping directly to the corresponding array position based on the key is called direct addressing.

In direct addressing, the range of key values usually needs to match the size of the array so that every key can be mapped to a position in the array.

For example, suppose we have an array of size 10000 and we want to store the key 1232. We use the key directly as the array index and place the record at position 1232.

Direct addressing can be regarded as a special hash function whose calculation is trivial: the key itself is the index value. Its time complexity is therefore O(1), that is, lookups take constant time.

Note that direct addressing requires the range of key values to be small and contiguous; otherwise a large amount of space is wasted. Because each distinct key has its own slot, two different keys never collide. Collisions arise only when several keys are made to share one array position, for example after a large key range is compressed into a smaller array, and a collision resolution method such as chaining with linked lists or open addressing is then required.
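A minimal sketch of a direct-address table in Python follows. The class name DirectAddressTable and its methods are hypothetical, chosen only for illustration; the sketch assumes keys are integers in the range 0 to size - 1.

```python
class DirectAddressTable:
    """Direct addressing: the key itself is the array index (sketch)."""

    def __init__(self, size: int):
        # One slot per possible key; None marks an empty slot.
        self.slots = [None] * size

    def _check(self, key: int) -> None:
        # Assumption: valid keys are integers in 0 .. size - 1.
        if not 0 <= key < len(self.slots):
            raise KeyError(key)

    def insert(self, key: int, value) -> None:
        self._check(key)
        self.slots[key] = value

    def search(self, key: int):
        self._check(key)
        return self.slots[key]

    def delete(self, key: int) -> None:
        self._check(key)
        self.slots[key] = None


if __name__ == "__main__":
    table = DirectAddressTable(10000)
    table.insert(1232, "record for key 1232")
    print(table.search(1232))
```

Every operation is a single array access, which is where the O(1) cost comes from; the price is that the array must be as large as the key range.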

3. Digit analysis method

Digit analysis is a way to construct a hash function by selecting certain digits of the key as the mapped position. If the selected digits vary enough from key to key, the hash function spreads the keys evenly over the hash table.

In the digit analysis method, we choose certain digits of the key as the mapped position. Common choices include the highest digit, the lowest digit, the middle digit, or a specific combination of digits such as the tens and hundreds digits.

By choosing different combinations of digits, we can construct different hash functions to suit specific requirements. Digit analysis is simple and can give a good distribution when the digit patterns of the keys are known in advance.

Suppose we have a set of keys {1234567, 987654321, 555, 8888} that we want to map into a hash table.

Using digit analysis, we can construct a hash function as follows:

1. Write the keys in numeric form:

1234567
987654321
555
8888

2. Take the hundreds and tens digits of each key as the mapped value:

First key: 56
Second key: 32
Third key: 55
Fourth key: 88

3. Use the selected digit pairs as the hash values:

Hash values: 56, 32, 55, 88

4. Take each hash value modulo 10 (the table size these results imply):

56 mod 10 gives index position 6
32 mod 10 gives index position 2
55 mod 10 gives index position 5
88 mod 10 gives index position 8

Finally, the keys map to index positions 6, 2, 5, and 8 of the hash table respectively.
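The digit selection can be sketched in Python with integer arithmetic. The function names are hypothetical, and the table size of 10 is an assumption, since the example does not state the modulus explicitly.

```python
def hundreds_and_tens(key: int) -> int:
    """Return the two-digit number formed by the hundreds and tens digits."""
    # Dropping the units digit and keeping the last two remaining digits
    # leaves exactly the hundreds and tens digits.
    return (key // 10) % 100


def digit_analysis_hash(key: int, table_size: int = 10) -> int:
    """Digit-analysis hash (sketch); table_size = 10 is an assumption."""
    return hundreds_and_tens(key) % table_size


if __name__ == "__main__":
    for key in [1234567, 987654321, 555, 8888]:
        print(key, hundreds_and_tens(key), digit_analysis_hash(key))
```

For the key 987654321 the hundreds and tens digits are 3 and 2, so the selected value is 32 and the index is 2. The keys 1234567, 555 and 8888 give 56, 55 and 88, with indices 6, 5 and 8.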

4. Mid-square method

The mid-square method is a hash function construction method that maps a given key to an index position in the hash table.

The basic idea is to square the key first and then take a section from the middle of the result as the index position in the hash table.

Specific steps are as follows:

1. Square the key.
2. Convert the squared result into a string.
3. If the string has an odd number of digits, take the single middle digit; if it has an even number of digits, take the middle two digits.
4. Convert the extracted string into an integer and use it as the index position in the hash table.

The advantage of the mid-square method is that it is simple and fast, and it suits situations where the keys are evenly distributed. It also has drawbacks. First, if the squared result has many digits, a short middle section reflects only a small part of it, so the hash values may be unevenly distributed. Second, for a particular set of keys, hash collisions may occur, that is, different keys may map to the same index position.

Therefore, in practical applications, it is necessary to choose an appropriate hash function construction method according to the specific situation to avoid conflicts and improve the performance of the hash table.

Suppose we have a set of keys {23, 45, 67, 89, 12} that we want to map into a hash table.

Using the mid-square method, we can construct the hash function as follows:

1. Square operation:

23 squared is 529
45 squared is 2025
67 squared is 4489
89 squared is 7921
12 squared is 144

2. Convert to string:

529 becomes "529"
2025 becomes "2025"
4489 becomes "4489"
7921 becomes "7921"
144 becomes "144"

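3. Extract the middle of the string:

529 has 3 digits; the middle digit is "2"
2025 has 4 digits; the middle two digits are "02"
4489 has 4 digits; the middle two digits are "48"
7921 has 4 digits; the middle two digits are "92"
144 has 3 digits; the middle digit is "4"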

4. Convert to integer:

"2" becomes index position 2
"02" becomes index position 2
"48" becomes index position 48
"92" becomes index position 92
"4" becomes index position 4

Finally, we can map the keys to index positions 2, 2, 48, 92, and 4 of the hash table respectively.
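The four steps can be sketched in Python as follows. The function name mid_square_hash is hypothetical, and the rule of one middle digit for odd lengths and two for even lengths is taken from the steps described in this section.

```python
def mid_square_hash(key: int) -> int:
    """Mid-square hash following the four steps above (sketch)."""
    # Steps 1 and 2: square the key and convert the result to a string.
    digits = str(key * key)
    n = len(digits)
    mid = n // 2
    # Step 3: one middle digit for odd lengths, two for even lengths.
    if n % 2 == 1:
        middle = digits[mid]
    else:
        middle = digits[mid - 1:mid + 1]
    # Step 4: convert the extracted section to an integer index.
    return int(middle)


if __name__ == "__main__":
    for key in [23, 45, 67, 89, 12]:
        print(key, key * key, mid_square_hash(key))
```

For the keys 23, 45, 67, 89 and 12 this gives 2, 2, 48, 92 and 4. Because the result can be as large as 99, the table in this sketch would need at least 100 slots, or the result would need a further modulo step.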

5. Folding method

The folding method is a hash function construction method that maps a given key to an index position in the hash table.

The basic idea is to split the key into several parts, fold the parts together by summing them, and use the resulting hash value to obtain the index position.

Specific steps are as follows:

1. Split the key into several parts (the parts may be of equal or unequal length).
2. Fold the parts together by summing them to obtain a hash value.
3. Take the hash value modulo the table size to obtain the final index position.

The advantage of the folding method is that it handles keys of different lengths and lets every digit of the key contribute to the hash value. It also has drawbacks: folding can still produce hash collisions, and splitting the key may require extra handling of edge cases, such as a key that is shorter than the chosen part length.

For example:

Suppose we have a set of keys {1234567, 987654321, 555, 8888} that we want to map into a hash table.

Using the folding method, we can construct the hash function as follows:

1. Split the keys:


1234567 is split into 12, 34, 567
987654321 is split into 98, 76, 54, 321
555 is split into 5, 55
8888 is split into 8, 888

2. Folding sum:

12+34+567=613
98+76+54+321=549
5+55=60
8+888=896

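3. Take each hash value modulo 10 (the table size these results imply):

613 mod 10 gives index position 3
549 mod 10 gives index position 9
60 mod 10 gives index position 0
896 mod 10 gives index position 6

Finally, the keys map to index positions 3, 9, 0, and 6 of the hash table respectively.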
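A Python sketch of the folding calculation follows. Because the example splits each key into parts of different lengths, this hypothetical fold_hash function takes the part lengths as an explicit argument; that parameter and the table size of 10 are assumptions made for illustration.

```python
from typing import Sequence


def fold_hash(key: int, part_lengths: Sequence[int], table_size: int = 10) -> int:
    """Folding hash: split the digits, sum the parts, take a modulo (sketch)."""
    digits = str(key)
    # Assumption: the caller supplies part lengths that cover every digit.
    if sum(part_lengths) != len(digits):
        raise ValueError("part lengths must cover every digit of the key")
    total = 0
    pos = 0
    for length in part_lengths:
        # Read the next part as an integer and add it to the running sum.
        total += int(digits[pos:pos + length])
        pos += length
    return total % table_size


if __name__ == "__main__":
    print(fold_hash(1234567, [2, 2, 3]))       # 12 + 34 + 567 = 613
    print(fold_hash(987654321, [2, 2, 2, 3]))  # 98 + 76 + 54 + 321 = 549
    print(fold_hash(555, [1, 2]))              # 5 + 55 = 60
    print(fold_hash(8888, [1, 3]))             # 8 + 888 = 896
```

With a table size of 10, the sums 613, 549, 60 and 896 reduce to the index positions 3, 9, 0 and 6. A practical implementation would more likely use one fixed part length for every key rather than a per-key list.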

Origin blog.csdn.net/qq_39939541/article/details/132330607