Solutions to Hash Collisions

Hashing objects presupposes that they implement the equals() and hashCode() methods. hashCode() maps an object to a hash value, and ideally different objects get different values. When two different objects produce the same hash value, a hash collision occurs. Because the set of possible keys is normally much larger than the range of hash values, collisions are unavoidable. The following methods resolve them:
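Before turning to those methods, here is a small sketch of why collisions cannot be avoided. The function `toy_hash` and the table size of 7 are assumptions chosen only for illustration; once there are more distinct keys than slots, at least two keys must share an address.

```python
from collections import defaultdict

TABLE_SIZE = 7  # assumed table size, for illustration only


def toy_hash(key: int) -> int:
    """Hypothetical hash function: map an integer key to a slot in 0..6."""
    return key % TABLE_SIZE


def group_by_address(keys):
    """Return {address: [keys hashing to that address]} to expose collisions."""
    groups = defaultdict(list)
    for key in keys:
        groups[toy_hash(key)].append(key)
    return dict(groups)


# Eight keys, seven slots: some address must receive more than one key.
groups = group_by_address([32, 13, 49, 55, 22, 38, 21, 8])
collisions = {addr: ks for addr, ks in groups.items() if len(ks) > 1}
print(collisions)
```

Every entry in `collisions` is a group of keys that would compete for the same slot.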

1. Chain address method 

The chain address method resolves collisions as follows: if the hash address space is 0 to m - 1, set up a one-dimensional array ST[m] of m head pointers, and insert every data element whose hash address is i into the linked list whose head pointer is ST[i]. The idea is similar to that of an adjacency list, and the method suits situations where collisions are frequent.
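The structure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `ChainedHashTable` and its methods are made up here, a Python list stands in for each linked list, the address is taken as `hash(key) % m`, and new entries are placed at the head of the chain (one common choice that the description does not prescribe).

```python
class ChainedHashTable:
    """Chain address method: ST[i] holds every entry whose hash address is i."""

    def __init__(self, m: int = 7):
        self.m = m
        # ST[0..m-1]; each Python list plays the role of one linked list.
        self.st = [[] for _ in range(m)]

    def _address(self, key) -> int:
        # Assumed hash function for this sketch.
        return hash(key) % self.m

    def put(self, key, value) -> None:
        chain = self.st[self._address(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:  # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.insert(0, (key, value))  # assumed policy: insert at the head

    def get(self, key, default=None):
        for k, v in self.st[self._address(key)]:
            if k == key:
                return v
        return default

    def remove(self, key) -> bool:
        chain = self.st[self._address(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]  # deleting only touches this one chain
                return True
        return False


table = ChainedHashTable(7)
table.put(13, "a")
table.put(55, "b")  # 13 and 55 both have address 6, so they share a chain
```

Keys with different addresses never meet in the same chain, so a lookup only walks the synonyms of the key it is looking for.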

The advantages and disadvantages of the chain address method:

Advantages:

① The chain address method (also called the zipper method) handles collisions simply and has no clustering: keys that are not synonyms, that is, keys with different hash addresses, never collide, so the average search length is shorter.
② The node space of each linked list is allocated dynamically, so the method suits cases where the table length cannot be determined before the table is built.
③ The open address method needs a small load factor α to keep collisions down, so it wastes a lot of space when individual nodes are large. With the chain address method, α ≥ 1 is acceptable, and when nodes are large the extra pointer field is negligible, so space is saved.
④ In a hash table built with the chain address method, deleting a node is easy: simply remove the node from its linked list. In a hash table built with the open address method, a deleted node's slot cannot simply be set to empty, because that would cut off the search path of synonym nodes that were inserted after it. The reason is that in every open address method an empty slot (that is, an open address) is the condition for an unsuccessful search. When deleting from a table that uses the open address method, the node can therefore only be marked as deleted; it cannot actually be removed.

Disadvantages:

Pointers take extra space, so when nodes are small the open address method is more space-efficient. If the space saved on pointers is used to enlarge the hash table instead, the load factor drops, which reduces collisions under the open address method and so speeds up the average lookup.

2. Open address method

Open addressing formula:

H_i = (H(key) + d_i) mod m,  i = 1, 2, ..., k  (k <= m - 1)

Where: H(key) is the direct hash address of the key, m is the length of the hash table, and d_i is the address increment used for the i-th re-probe.
To use this method, first compute the direct hash address H(key) of the element. If that storage unit is already occupied by another element, check the unit at address (H(key) + d_i) mod m for i = 1, 2, ... in turn, and repeat until an empty unit is found. The data element with that key is then stored in the empty unit.

The increment d_i can be chosen in different ways, and the method is named accordingly:
(1) d_i = 1, 2, 3, ...: linear probing;
(2) d_i = 1^2, -1^2, 2^2, -2^2, ..., k^2, -k^2 (that is, +1, -1, +4, -4, ...): quadratic probing;
(3) d_i = a pseudo-random sequence: pseudo-random probing.
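The three choices of increment can be written as small generators. This is a sketch under assumptions: the function names are invented for illustration, and the pseudo-random variant is taken here to be a seeded shuffle of 1..m-1, since the sequence must be reproducible for a later lookup to retrace the same probes.

```python
import random
from itertools import count, islice


def linear_increments():
    """d_i = 1, 2, 3, ..."""
    return count(1)


def quadratic_increments():
    """d_i = +1^2, -1^2, +2^2, -2^2, ..."""
    for k in count(1):
        yield k * k
        yield -(k * k)


def pseudo_random_increments(m: int, seed: int = 0):
    """Assumed variant: a seeded, repeatable shuffle of the steps 1..m-1."""
    rng = random.Random(seed)
    steps = list(range(1, m))
    rng.shuffle(steps)
    return iter(steps)


def probe_addresses(home: int, m: int, increments, limit: int):
    """First `limit` probe addresses H_i = (home + d_i) mod m."""
    return [(home + d) % m for d in islice(increments, limit)]


print(probe_addresses(6, 7, linear_increments(), 3))     # [0, 1, 2]
print(probe_addresses(0, 7, quadratic_increments(), 5))  # [1, 6, 4, 3, 2]
```

Python's `%` returns a non-negative result for a positive modulus, so negative increments such as -1 wrap around to the end of the table without extra handling.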

Example 1: Let the hash function be H(key) = key mod 7, the address space of the hash table be 0 to 6, and the key sequence be (32, 13, 49, 55, 22, 38, 21). Construct the hash table once with linear probing and once with quadratic probing.
Solution:
(1) Linear probing:
32 % 7 = 4; 13 % 7 = 6; 49 % 7 = 0;
55 % 7 = 6 collides with 13; the next address (6 + 1) % 7 = 0 also collides (with 49); the next address (6 + 2) % 7 = 1 is free, so 55 is stored there.
22 % 7 = 1 collides with 55; the next address (1 + 1) % 7 = 2 is free;
38 % 7 = 3;
21 % 7 = 0 collides with 49; addresses 1, 2, 3 and 4 are also occupied, so probing continues to address 5, which is free. The resulting hash table is:
subscript:   0   1   2   3   4   5   6
key:        49  55  22  38  32  21  13
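(2) Quadratic probing:
32 % 7 = 4; 13 % 7 = 6; 49 % 7 = 0;
55 % 7 = 6 collides with 13; (6 + 1) % 7 = 0 collides with 49; (6 - 1) % 7 = 5 is free, so 55 is stored there.
22 % 7 = 1; 38 % 7 = 3;
21 % 7 = 0 collides with 49; the addresses (0 + 1) % 7 = 1, (0 - 1) % 7 = 6, (0 + 4) % 7 = 4 and (0 - 4) % 7 = 3 are all occupied; (0 + 9) % 7 = 2 is free. The resulting hash table is:
subscript:   0   1   2   3   4   5   6
key:        49  22  21  38  32  55  13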
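The two tables of Example 1 can be reproduced with a short script. This is a minimal sketch rather than the original author's code: `build_table`, `linear_increments` and `quadratic_increments` are names introduced here for illustration, and the cap of 2 * m probes is an assumed safeguard against looping forever when no free slot can be reached.

```python
from itertools import count


def linear_increments():
    """d_i = 1, 2, 3, ..."""
    return count(1)


def quadratic_increments():
    """d_i = +1, -1, +4, -4, +9, -9, ..."""
    for k in count(1):
        yield k * k
        yield -(k * k)


def build_table(keys, m, make_increments):
    """Insert keys with H(key) = key mod m, probing at (H(key) + d_i) mod m."""
    table = [None] * m
    for key in keys:
        home = key % m
        addr = home
        increments = make_increments()
        probes = 0
        while table[addr] is not None:  # slot occupied: try the next probe
            probes += 1
            if probes > 2 * m:  # assumed safeguard, not part of the method
                raise RuntimeError("no free slot found")
            addr = (home + next(increments)) % m
        table[addr] = key
    return table


keys = [32, 13, 49, 55, 22, 38, 21]
linear_table = build_table(keys, 7, linear_increments)
quadratic_table = build_table(keys, 7, quadratic_increments)
print(linear_table)     # [49, 55, 22, 38, 32, 21, 13]
print(quadratic_table)  # [49, 22, 21, 38, 32, 55, 13]
```

Passing the increment generator as a parameter keeps the insertion loop identical for both strategies; only the sequence d_i changes.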
   Note: Take care when deleting an element from a hash table that uses the open address method to handle collisions. The slot cannot simply be emptied, because that would cut off the search path of other elements whose probe sequence passes through it, such as elements with the same hash address. Deletion is therefore usually done by setting a special flag that marks the element as removed.
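The deletion flag described in the note can be sketched as follows. The class `LinearProbingSet`, the `DELETED` marker and the use of linear probing with H(key) = key mod m are assumptions made for this illustration; keys are assumed to be integers.

```python
EMPTY = None
DELETED = object()  # special flag: "something was here and has been removed"


class LinearProbingSet:
    """Open addressing with linear probing and deletion flags (a sketch)."""

    def __init__(self, m: int = 7):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, key: int):
        home = key % self.m
        for i in range(self.m):
            yield (home + i) % self.m

    def contains(self, key: int) -> bool:
        for addr in self._probe(key):
            slot = self.slots[addr]
            if slot is EMPTY:
                return False  # a truly empty slot ends the search
            if slot is not DELETED and slot == key:
                return True
            # a DELETED slot is skipped and the search continues
        return False

    def add(self, key: int) -> None:
        if self.contains(key):
            return
        for addr in self._probe(key):
            if self.slots[addr] is EMPTY or self.slots[addr] is DELETED:
                self.slots[addr] = key  # flagged slots can be reused
                return
        raise RuntimeError("hash table is full")

    def remove(self, key: int) -> bool:
        for addr in self._probe(key):
            slot = self.slots[addr]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot == key:
                self.slots[addr] = DELETED  # mark, do not empty
                return True
        return False


s = LinearProbingSet(7)
for k in (13, 49, 55):  # 55 probes 6 -> 0 -> 1 and lands in slot 1
    s.add(k)
s.remove(49)  # slot 0 becomes DELETED, so 55 stays reachable
```

If slot 0 were reset to `EMPTY` instead, a search for 55 would stop at slot 0 and wrongly report that 55 is absent.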

3. Re-hashing

     To eliminate primary clustering and secondary clustering, another method can be used: re-hashing. Secondary clustering arises because the probe sequence produced by quadratic probing always uses the same steps, 1, 4, 9, 16, ..., whatever the key.
What is needed is a probe sequence that depends on the key instead of being the same for every key. Different keys can then follow different probe sequences even if they map to the same array index.
The method is to hash the key again with a different hash function and use the result as the step size. For a given key the step size stays the same throughout probing, but different keys use different step sizes. Alternatively, when a collision occurs, a second, third, and further hash function can be applied in turn to compute the address until no collision occurs. Disadvantage: increased computation time.
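A key-dependent step size can be sketched as below. The two hash functions are assumptions: `h1(key) = key mod m` and `h2(key) = 1 + key mod (m - 1)` are a common textbook pairing, not something the text above specifies. With a prime m, every step in 1..m-1 is coprime to m, so the probe sequence visits every slot.

```python
def h1(key: int, m: int) -> int:
    """Assumed primary hash: the home address."""
    return key % m


def h2(key: int, m: int) -> int:
    """Assumed secondary hash: a step size in 1..m-1, never zero."""
    return 1 + key % (m - 1)


def probe_sequence(key: int, m: int):
    """Addresses (h1 + i * h2) mod m for i = 0..m-1."""
    home, step = h1(key, m), h2(key, m)
    return [(home + i * step) % m for i in range(m)]


def insert(table, key: int) -> int:
    """Store key in the first free slot of its probe sequence; return the slot."""
    for addr in probe_sequence(key, len(table)):
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("hash table is full")


# 21 and 49 share home address 0 but probe with different steps (4 and 2).
print(probe_sequence(21, 7))  # [0, 4, 1, 5, 2, 6, 3]
print(probe_sequence(49, 7))  # [0, 2, 4, 6, 1, 3, 5]
```

Two keys with the same home address and the same step would still follow the same path (13 and 55 do under these particular functions), so the choice of the second hash function matters.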

