Data Structures and Algorithms: Basic Principles of Hash Tables and a Python Implementation

Fundamentals of Hash Tables

Implementation principle of hash table (20 minutes)

1. Hash table and hash function

Like binary trees and linked lists, a hash table is a data structure for storing data. Its typical application is to quickly determine whether a given item exists in the data structure and to find its subscript (for example, determining whether 22 is among 1, 65, 34, 97, 12, 22, 48, 89).

A hash table is built on a hash function, which here is simply a modulo (remainder) operation. Let's explain the hash table directly with an example: four values, 15, 23, 4, and 19 (called keywords, or keys), need to be stored, and the size of the array is set to 7, so the subscripts range from 0 to 6. Each keyword is taken modulo 7 and stored at the resulting subscript: 15 goes to subscript 1, 23 to subscript 2, 4 to subscript 4, and 19 to subscript 5, as shown in the figure.
[Figure: the four keywords stored in the 7-slot array]
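
The example above can be sketched in a few lines of Python. This is only a minimal illustration that assumes integer keywords and uses the array size 7 as the modulus; the names `hash_index` and `SIZE` are made up for this sketch.

```python
def hash_index(key: int, size: int) -> int:
    """Map an integer keyword to a subscript in the range 0..size-1."""
    return key % size


SIZE = 7                      # array size from the example
table = [None] * SIZE         # None marks an empty position

for key in (15, 23, 4, 19):
    table[hash_index(key, SIZE)] = key

print(table)  # [None, 15, 23, None, 4, 19, None]
```

No two of these four keywords share a remainder, so no conflict handling is needed yet.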

2. Hash table conflict resolution method

If two keywords give the same modulo result, they map to the same subscript and conflict. There are generally two solutions: the linked list solution and the open address solution.

2.1 Linked list solution

As the name implies, a linked list is attached to each subscript: every stored keyword carries a next pointer, and keywords that conflict at the same subscript are chained together in a linked list.
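
A minimal sketch of the linked list solution might look like the following. The `Node` class and the helper names `chain_insert` and `chain_contains` are hypothetical, and inserting new keywords at the head of the chain is an assumption made for brevity; appending at the tail would work equally well.

```python
class Node:
    """One linked-list node holding a keyword and a next pointer."""

    def __init__(self, key, next=None):
        self.key = key
        self.next = next


def chain_insert(table, key):
    index = key % len(table)
    # Assumption: the new node becomes the head of the chain at this subscript.
    table[index] = Node(key, table[index])


def chain_contains(table, key):
    node = table[key % len(table)]
    while node is not None:      # walk the chain at this subscript
        if node.key == key:
            return True
        node = node.next
    return False


table = [None] * 7
for key in (15, 23, 4, 19, 22):  # 22 % 7 == 1, which conflicts with 15
    chain_insert(table, key)

print(chain_contains(table, 22))  # True
print(chain_contains(table, 8))   # False (8 % 7 == 1, but 8 was never stored)
```

After these insertions, subscript 1 holds a two-node chain: 22 followed by 15.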

2.2 Open address resolution

Open address resolution stores a conflicting keyword at another, still empty subscript of the hash table. There are generally three ways to choose that subscript:

2.2.1 Linear detection method
If a conflict is encountered, new subscript = original subscript + $i$ (where $i$ is the number of repetitions). If the new subscript is also occupied, set $i = i + 1$ and keep looking.
In other words, the subscripts are checked one by one until a vacancy is found.
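
The rule above can be sketched as follows. The function name `linear_probe_insert` is hypothetical, and wrapping around to subscript 0 with a modulo when the end of the array is reached is an assumption that the text does not state explicitly.

```python
def linear_probe_insert(table, key):
    """Insert key, trying home, home + 1, home + 2, ... until a vacancy is found."""
    size = len(table)
    home = key % size
    for i in range(size):
        index = (home + i) % size   # assumption: wrap around at the end of the array
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 7
placed = [linear_probe_insert(table, key) for key in (15, 22, 8)]

print(placed)  # [1, 2, 3]
```

All three keywords have remainder 1 modulo 7, so they end up in three adjacent positions. This kind of dense run is what the next method tries to avoid.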

2.2.2 Square detection method
If there is a conflict, new subscript = original subscript + $i^2$ (where $i$ is the number of repetitions).
The square detection method keeps the keywords from being distributed too densely.
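
A comparable sketch for the square detection method is shown below. The function name `quadratic_probe_insert` is hypothetical, and the modulo wrap-around is again an assumption. Note that a quadratic sequence does not necessarily visit every subscript, so this sketch simply gives up after `size` attempts.

```python
def quadratic_probe_insert(table, key):
    """Insert key, trying home, home + 1, home + 4, home + 9, ..."""
    size = len(table)
    home = key % size
    for i in range(size):
        index = (home + i * i) % size   # assumption: wrap around with a modulo
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no vacancy found along the probe sequence")


table = [None] * 7
placed = [quadratic_probe_insert(table, key) for key in (15, 22, 8)]

print(placed)  # [1, 2, 5]
```

With the same three conflicting keywords, the third one now lands at subscript 5 instead of 3, so the stored keywords are spread further apart.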

2.2.3 Double hash method
If a conflict is encountered, new subscript = original subscript + $i \times \text{hash2}(\text{keyword})$,
where $\text{hash2}(\text{keyword}) = R - (\text{keyword} \bmod R)$ and $R$ must be smaller than the size of the original array. For example, 18 and 35 need to be put into a hash table whose array size is 17. First put in 18: 18 modulo 17 is 1, so it goes to subscript 1. In the same way, 35 modulo 17 is 1, so it should also go to subscript 1, and a conflict occurs. Take $R = 7$: 35 modulo 7 is 0, so $\text{hash2}(35) = 7 - 0 = 7$, and new subscript = original subscript + $1 \times 7$ = 1 + 7 = 8.
PS: $\text{hash2}(\text{keyword}) = \text{keyword} \bmod R$ is not used directly, in order to avoid the special case $\text{hash2}(\text{keyword}) = 0$, which would leave the subscript unchanged.
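
The worked example with 18 and 35 can be checked with a short sketch. The function names `hash2` and `double_hash_insert` are chosen for illustration, and taking the new subscript modulo the array size is an assumption added so that the subscript always stays in range.

```python
def hash2(key, r):
    """Second hash: always in the range 1..r, so the step is never 0."""
    return r - (key % r)


def double_hash_insert(table, key, r=7):
    """Insert key, stepping by hash2(key) after each conflict."""
    size = len(table)
    home = key % size
    step = hash2(key, r)
    for i in range(size):
        index = (home + i * step) % size   # assumption: wrap around with a modulo
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no vacancy found along the probe sequence")


table = [None] * 17
first = double_hash_insert(table, 18)    # 18 % 17 == 1
second = double_hash_insert(table, 35)   # 35 % 17 == 1 conflicts, step is 7

print(first, second)  # 1 8
```

Because the step depends on the keyword itself, two keywords that conflict at the same original subscript usually follow different sequences afterwards.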

Python implementation of hash table

To be added
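
In the meantime, here is a minimal sketch rather than the author's implementation. It assumes integer keywords, a fixed array size of 17, and the linear detection method from section 2.2.1 with wrap-around; the class name `HashTable` and its methods are hypothetical, and deletion and resizing are deliberately left out.

```python
class HashTable:
    """Open-address hash table for integer keywords (linear detection)."""

    def __init__(self, size=17):
        self.size = size
        self.slots = [None] * size   # None marks an empty position

    def _probe(self, key):
        # Yield the original subscript, then the following ones, wrapping around.
        home = key % self.size
        for i in range(self.size):
            yield (home + i) % self.size

    def insert(self, key):
        """Store key and return the subscript it ends up at."""
        for index in self._probe(key):
            if self.slots[index] is None or self.slots[index] == key:
                self.slots[index] = key
                return index
        raise RuntimeError("hash table is full")

    def find(self, key):
        """Return the subscript of key in the table, or -1 if it is absent."""
        for index in self._probe(key):
            if self.slots[index] is None:
                return -1            # an empty position ends the search
            if self.slots[index] == key:
                return index
        return -1

    def __contains__(self, key):
        return self.find(key) != -1


table = HashTable(size=17)
for key in (1, 65, 34, 97, 12, 22, 48, 89):
    table.insert(key)

print(22 in table)     # True
print(table.find(22))  # 5
print(table.find(23))  # -1
```

The usage example answers the question from section 1: 22 is present, and it sits at subscript 5 of the hash table (22 modulo 17). The subscript returned is the position inside the hash table, not the position in the original list of values.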

Origin blog.csdn.net/weixin_41670608/article/details/115356196