A well-designed hash function reduces collisions, but collisions generally cannot be avoided entirely, so collision resolution is another key issue in hashing. Collisions occur both when a hash table is built and when it is searched, and the same resolution method must be used in both cases. The following uses building a hash table as the example. There are four commonly used collision resolution methods:
Open addressing
This method is also called the rehash method. Its basic idea is as follows: when the hash address p = H(key) of a keyword key collides, another hash address p1 is generated from p; if p1 also collides, another hash address p2 is generated from p, and so on, until a non-colliding hash address pi is found and the element is stored there. The rehash function has the general form:
H_i = (H(key) + d_i) % m,  i = 1, 2, ..., n
Here H(key) is the hash function, m is the table length, and d_i is called the increment sequence. Different increment sequences give different rehashing methods. There are three main types:
Linear Probe Rehashing
d_i = 1, 2, 3, ..., m-1
With this method, when a collision occurs, the following cells of the table are examined one by one until an empty cell is found or the whole table has been searched.
Quadratic Probe Rehashing
d_i = 1^2, -1^2, 2^2, -2^2, ..., k^2, -k^2  (k <= m/2)
With this method, when a collision occurs, probing jumps alternately to the right and left of the original address, which makes it more flexible than stepping through adjacent cells.
Pseudo-random Probe Rehashing
d_i = a sequence of pseudo-random numbers
In a concrete implementation, a pseudo-random number generator is set up (for example i = (i + p) % m) and given a random number as its starting point.
For example, suppose the hash table length is m = 11 and the hash function is H(key) = key % 11. Then H(47) = 3, H(26) = 4, and H(60) = 5. Suppose the next keyword is 69; then H(69) = 3, which collides with 47.
If linear probing is used to resolve the collision, the next hash address is H_1 = (3 + 1) % 11 = 4, which still collides; the next is H_2 = (3 + 2) % 11 = 5, which also collides; the next is H_3 = (3 + 3) % 11 = 6, which does not collide, so 69 is stored in cell 6.
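The linear probing steps above can be reproduced with a short Python sketch. The function name linear_probe_insert and the use of None to mark an empty cell are assumptions made for illustration, not part of the original text.

```python
def linear_probe_insert(table, key):
    """Insert key using H_i = (H(key) + i) % m and return the cell used."""
    m = len(table)
    home = key % m                    # H(key) = key % m, as in the example
    for i in range(m):                # i = 0 is the home cell, then d_i = 1 .. m-1
        slot = (home + i) % m
        if table[slot] is None:       # None marks an empty cell (assumption)
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


table = [None] * 11
for k in (47, 26, 60, 69):
    print(k, "->", linear_probe_insert(table, k))
```

With the keys of the example, 47, 26 and 60 land in cells 3, 4 and 5, and 69 ends up in cell 6 after three probes.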
If quadratic probing is used to resolve the collision, the next hash address is H_1 = (3 + 1^2) % 11 = 4, which still collides; the next is H_2 = (3 - 1^2) % 11 = 2, which does not collide, so 69 is stored in cell 2.
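The same example with the increment sequence 1^2, -1^2, 2^2, -2^2, ... might look as follows in Python. The names quadratic_offsets and quadratic_probe_insert are hypothetical, and None again stands for an empty cell.

```python
def quadratic_offsets(m):
    """Yield 1^2, -1^2, 2^2, -2^2, ..., k^2, -k^2 for k <= m // 2."""
    for k in range(1, m // 2 + 1):
        yield k * k
        yield -(k * k)


def quadratic_probe_insert(table, key):
    """Insert key using H_i = (H(key) + d_i) % m and return the cell used."""
    m = len(table)
    home = key % m
    if table[home] is None:
        table[home] = key
        return home
    for d in quadratic_offsets(m):
        slot = (home + d) % m         # Python's % keeps the result in 0 .. m-1
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no free cell found along the probe sequence")


table = [None] * 11
for k in (47, 26, 60, 69):
    print(k, "->", quadratic_probe_insert(table, k))
```

Here 69 is stored in cell 2, since cell 4 is taken and (3 - 1) % 11 = 2 is free. Note that in languages where % can return a negative value for a negative operand, the address would need to be normalized explicitly.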
If pseudo-random probing is used to resolve the collision and the pseudo-random sequence is 2, 5, 9, ..., then the next hash address is H_1 = (3 + 2) % 11 = 5, which still collides; the next is H_2 = (3 + 5) % 11 = 8, which does not collide, so 69 is stored in cell 8.
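A minimal Python sketch of pseudo-random probing, assuming the increment sequence is passed in explicitly so that the example values 2, 5, 9 can be reused. The function name pseudo_random_probe_insert is hypothetical. In a real table the sequence would come from a generator with a fixed starting point, so that lookups replay exactly the same increments as insertions.

```python
def pseudo_random_probe_insert(table, key, offsets):
    """Insert key using H_i = (H(key) + d_i) % m, with d_i taken from offsets."""
    m = len(table)
    home = key % m
    if table[home] is None:           # None marks an empty cell (assumption)
        table[home] = key
        return home
    for d in offsets:
        slot = (home + d) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("probe sequence exhausted without finding a free cell")


SEQUENCE = [2, 5, 9]                  # the first increments given in the example
table = [None] * 11
for k in (47, 26, 60, 69):
    print(k, "->", pseudo_random_probe_insert(table, k, SEQUENCE))
```

For 69, the first increment leads to cell 5, which is occupied, and the second leads to cell 8, which is free.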
Rehashing
This approach prepares several different hash functions:
H_i = RH_i(key),  i = 1, 2, ..., k
When the hash address H_1 = RH_1(key) collides, H_2 = RH_2(key) is computed, and so on, until no collision occurs. This method is less prone to clustering, but it increases computation time.
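As a rough illustration, the functions RH_1, RH_2, ... can be tried in order until one yields a free cell. The three functions below are arbitrary choices made for this sketch, not functions recommended by the text, and the name rehash_insert is hypothetical.

```python
# Hypothetical hash functions RH_1 .. RH_3 for a table of length 11.
HASH_FUNCTIONS = [
    lambda key: key % 11,
    lambda key: (key // 11) % 11,
    lambda key: (7 * key + 1) % 11,
]


def rehash_insert(table, key, hash_functions=HASH_FUNCTIONS):
    """Try each hash function in turn and store key in the first free cell."""
    for rh in hash_functions:
        slot = rh(key)
        if table[slot] is None:       # None marks an empty cell (assumption)
            table[slot] = key
            return slot
    raise OverflowError("all hash functions collided for this key")


table = [None] * 11
for k in (47, 26, 60, 69):
    print(k, "->", rehash_insert(table, k))
```

With these particular functions, 69 collides at RH_1(69) = 3 and is stored at RH_2(69) = 6. A lookup has to try the functions in the same order.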
Chain address method
The basic idea of this method is to link all elements whose hash address is i into a singly linked list, called a synonym chain, and to store the head pointer of that list in the i-th cell of the hash table. Searching, insertion, and deletion therefore take place mainly within the synonym chains. The chain address method is well suited to tables with frequent insertions and deletions.
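A compact Python sketch of the chain address method follows. The class names Node and ChainedHashTable are made up for this illustration, and the sketch assumes integer keys with H(key) = key % m and does not reject duplicate keys.

```python
class Node:
    """One element of a synonym chain."""
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, m=11):
        self.m = m
        self.heads = [None] * m       # cell i holds the head pointer of chain i

    def insert(self, key):
        i = key % self.m
        self.heads[i] = Node(key, self.heads[i])   # push at the head of the chain

    def contains(self, key):
        node = self.heads[key % self.m]
        while node is not None:
            if node.key == key:
                return True
            node = node.next
        return False

    def delete(self, key):
        i = key % self.m
        prev, node = None, self.heads[i]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.heads[i] = node.next      # unlink the head node
                else:
                    prev.next = node.next          # unlink an inner node
                return True
            prev, node = node, node.next
        return False


t = ChainedHashTable()
for k in (47, 26, 60, 69):
    t.insert(k)
print(t.contains(69), t.delete(47), t.contains(47))
```

Keys 47 and 69 both hash to 3 and simply share one chain, so no probing is needed, and deleting 47 only relinks that chain.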
Create a common overflow area
The basic idea of this method is to divide the hash table into two parts, a basic table and an overflow table; every element that collides with an entry in the basic table is placed in the overflow table.
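The idea can be sketched in Python as follows. The class name OverflowHashTable is hypothetical, and the overflow table is assumed here to be a plain list that is searched sequentially, which is one simple choice among several.

```python
class OverflowHashTable:
    def __init__(self, m=11):
        self.m = m
        self.base = [None] * m        # basic table, one key per cell
        self.overflow = []            # common overflow table shared by all cells

    def insert(self, key):
        i = key % self.m
        if self.base[i] is None:
            self.base[i] = key
        else:
            self.overflow.append(key) # any collision goes to the overflow table

    def contains(self, key):
        if self.base[key % self.m] == key:
            return True
        return key in self.overflow   # sequential scan of the overflow table


t = OverflowHashTable()
for k in (47, 26, 60, 69):
    t.insert(k)
print(t.base[3], t.overflow)
```

With the keys of the earlier example, 47 stays in cell 3 of the basic table and 69 is appended to the overflow table. Lookups of overflowed keys cost a scan, so this layout works best when collisions are rare.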