Hash collision

1. What is a hash collision?

A hash collision (also called a hash conflict) occurs when two different objects have the same hash code. For example, if two different strings are passed through the same hash function and produce the same hash value, a hash collision has occurred. In a hash table, the same problem arises whenever two different keys map to the same address.
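
As a concrete illustration, the sketch below re-creates in Python the formula that Java documents for `String.hashCode()`: s[0]*31^(n-1) + ... + s[n-1], computed with 32-bit `int` overflow. It is a demonstration, not Java's own code. `java_string_hash` is a hypothetical name, and the sketch assumes every character lies in the Basic Multilingual Plane, so that `ord(ch)` matches the UTF-16 code unit Java would use. The two different strings "Aa" and "BB" produce the same hash code.

```python
def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() formula with 32-bit signed overflow.

    Assumes BMP characters only, so ord(ch) equals Java's UTF-16 char value.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the 32-bit pattern as a signed value, like Java's int.
    return h - (1 << 32) if h >= (1 << 31) else h


# Two different inputs, one hash value: a hash collision.
print(java_string_hash("Aa"), java_string_hash("BB"))  # 2112 2112
```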

 

2. How can hash collisions be resolved?

1. Open addressing:

Open addressing probes for a free slot using the formula $H_i = (H(\text{key}) + d_i) \bmod m$, $i = 1, 2, \ldots, k$ ($k \le m - 1$).

Here $m$ is the length of the hash table and $d_i$ is the increment sequence applied after a collision. If $d_i$ takes the values $1, 2, 3, \ldots, m - 1$, the scheme is called linear probing.
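
A minimal sketch of that formula with linear probing ($d_i = i$), assuming Python's built-in `hash()` plays the role of $H(\text{key})$. The class `LinearProbingTable` and its methods are hypothetical names, and the sketch deliberately omits deletion and resizing. The example relies on CPython hashing small non-negative integers to themselves, so 3, 11 and 19 all share home address 3 in a table of length 8.

```python
from typing import Hashable, List, Optional, Tuple


class LinearProbingTable:
    """Open addressing with linear probing: H_i = (H(key) + i) mod m.

    i = 0 is the home address; i = 1, 2, ..., m-1 are the increments d_i.
    Sketch only: no deletion, no resizing.
    """

    def __init__(self, m: int) -> None:
        self.m = m
        self.slots: List[Optional[Tuple[Hashable, object]]] = [None] * m

    def _home(self, key: Hashable) -> int:
        return hash(key) % self.m  # H(key); % is non-negative for m > 0

    def put(self, key: Hashable, value: object) -> int:
        """Insert or update key and return the slot index used."""
        h = self._home(key)
        for i in range(self.m):
            idx = (h + i) % self.m  # wrap around at the end of the table
            slot = self.slots[idx]
            if slot is None or slot[0] == key:
                self.slots[idx] = (key, value)
                return idx
        raise RuntimeError("table is full")

    def get(self, key: Hashable) -> object:
        h = self._home(key)
        for i in range(self.m):
            slot = self.slots[(h + i) % self.m]
            if slot is None:  # an empty slot ends the search: key is absent
                raise KeyError(key)
            if slot[0] == key:
                return slot[1]
        raise KeyError(key)


t = LinearProbingTable(8)
print(t.put(3, "a"), t.put(11, "b"), t.put(19, "c"))  # 3 4 5
print(t.get(19))  # c
```

Each collision pushes the new key one slot further, so synonyms of address 3 end up in a contiguous run.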

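With linear probing, each collision moves the probe one position further along the table, wrapping around at the end. If $d_i$ takes the values $1^2, -1^2, 2^2, -2^2, 3^2, -3^2, \ldots, k^2, -k^2$ ($k \le m/2$), that is, $1, -1, 4, -4, 9, -9, \ldots$, the scheme is called quadratic probing.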
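
The quadratic increment sequence can be sketched as follows. `quadratic_offsets` and `quadratic_probe_addresses` are hypothetical helper names. For the illustrative choice m = 11 and home address 3, the ten probes happen to visit every other slot exactly once.

```python
from typing import Iterator, List


def quadratic_offsets(m: int) -> Iterator[int]:
    """Yield d_i = 1^2, -1^2, 2^2, -2^2, ..., k^2, -k^2 for k = 1..m//2."""
    for k in range(1, m // 2 + 1):
        yield k * k
        yield -(k * k)


def quadratic_probe_addresses(home: int, m: int) -> List[int]:
    """Addresses H_i = (home + d_i) mod m tried after a collision at `home`."""
    return [(home + d) % m for d in quadratic_offsets(m)]


print(list(quadratic_offsets(11)))       # [1, -1, 4, -4, 9, -9, 16, -16, 25, -25]
print(quadratic_probe_addresses(3, 11))  # [4, 2, 7, 10, 1, 5, 8, 9, 6, 0]
```

Compared with linear probing, the jumps grow quickly, so synonyms spread out instead of piling up next to their home address.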

If $d_i$ is a pseudo-random number sequence, the scheme is called pseudo-random probing.
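
A sketch of pseudo-random probing, assuming the increments are a fixed-seed shuffle of 1..m-1 produced by Python's `random.Random`. The seed value and the use of a permutation are illustrative choices, and the function names are hypothetical. The important property is determinism: a search must regenerate exactly the sequence that the insertion used.

```python
import random
from typing import List


def pseudo_random_offsets(m: int, seed: int = 2024) -> List[int]:
    """Hypothetical increments d_1..d_{m-1}: a seeded shuffle of 1..m-1.

    The fixed seed makes every insert and every search see the same sequence.
    """
    offsets = list(range(1, m))
    random.Random(seed).shuffle(offsets)
    return offsets


def pseudo_random_probe_addresses(home: int, m: int, seed: int = 2024) -> List[int]:
    """Addresses H_i = (home + d_i) mod m tried after a collision at `home`."""
    return [(home + d) % m for d in pseudo_random_offsets(m, seed)]


print(pseudo_random_probe_addresses(3, 11))  # exact order depends on the seed
```

Shuffling a permutation of 1..m-1, rather than drawing arbitrary random numbers, guarantees that every slot other than the home address is tried once.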

2. Rehashing:

When a collision occurs, compute the address with a second hash function, then a third, and so on, until an address without a collision is found. Disadvantage: every extra hash function adds computation time.
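
A sketch of rehashing with a list of hash functions, assuming integer keys. The three functions in the example are arbitrary stand-ins chosen only to force collisions, and `rehash_insert` and `rehash_search` are hypothetical names. Searching recomputes the same functions in the same order.

```python
from typing import Callable, List, Optional

HashFn = Callable[[int], int]


def rehash_insert(table: List[Optional[int]], key: int, hash_fns: List[HashFn]) -> int:
    """Store key at the first address among H1(key), H2(key), ... that is free."""
    m = len(table)
    for h in hash_fns:
        idx = h(key) % m
        if table[idx] is None or table[idx] == key:
            table[idx] = key
            return idx
    # Every function collided; a real table would need a fallback or a resize.
    raise RuntimeError(f"no free address for {key}")


def rehash_search(table: List[Optional[int]], key: int, hash_fns: List[HashFn]) -> bool:
    """Retrace H1, H2, ... until the key or an empty slot is found."""
    m = len(table)
    for h in hash_fns:
        slot = table[h(key) % m]
        if slot is None:
            return False  # insertion would have stopped here, so the key is absent
        if slot == key:
            return True
    return False


# Three arbitrary, hypothetical hash functions for integer keys.
fns: List[HashFn] = [lambda k: k, lambda k: k // 10, lambda k: 3 * k + 1]
table: List[Optional[int]] = [None] * 10
print([rehash_insert(table, k, fns) for k in (12, 22, 32)])  # [2, 7, 3]
```

Key 22 collides under both H1 and H2 before H3 places it, which is exactly the extra computation the text warns about.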

3. Separate chaining (the "zipper method"):

All records whose keys are synonyms (that is, keys that hash to the same address) are stored in the same linked list.
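
A minimal chaining sketch, assuming Python lists stand in for the linked lists and the built-in `hash()` picks the bucket. `ChainedHashTable` and its methods are hypothetical names. The example stores ten keys in four buckets, so the load factor is 2.5, which chaining tolerates.

```python
from typing import Hashable, List, Tuple


class ChainedHashTable:
    """Separate chaining: bucket j holds every (key, value) whose key hashes to j."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.buckets: List[List[Tuple[Hashable, object]]] = [[] for _ in range(m)]
        self.size = 0

    def _bucket(self, key: Hashable) -> List[Tuple[Hashable, object]]:
        return self.buckets[hash(key) % self.m]

    def put(self, key: Hashable, value: object) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # update an existing key in place
                return
        bucket.append((key, value))  # synonyms share the same bucket
        self.size += 1

    def get(self, key: Hashable) -> object:
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key: Hashable) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]  # removing the node is all deletion requires
                self.size -= 1
                return
        raise KeyError(key)

    def load_factor(self) -> float:
        return self.size / self.m


t = ChainedHashTable(4)
for k in range(10):
    t.put(k, k * k)
print(t.load_factor())  # 2.5
print(t.get(9))  # 81
```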

Advantages and disadvantages of the zipper method:

Advantages:

① Chaining handles collisions simply and does not suffer from clustering (the "accumulation" of probe sequences seen in open addressing): keys that are not synonyms never collide, so the average search length is shorter;

② Because the nodes on each linked list are allocated dynamically, chaining is better suited to cases where the table size cannot be determined before the table is built;

③ Open addressing needs a small load factor α to keep collisions down, so it wastes a lot of space when the number of records is large. Chaining can use α ≥ 1, and when the records themselves are large, the extra pointer field per node is negligible, so chaining saves space;

④ In a hash table built with chaining, deletion is easy: simply remove the corresponding node from its linked list. In a table built with open addressing, a deleted slot cannot simply be set back to empty, because that would cut off the search path of any synonyms inserted after it: in every open addressing scheme, reaching an empty slot (an "open address") means the search has failed. Therefore, when deleting from a table that resolves collisions by open addressing, the slot can only be marked as deleted; the node cannot actually be removed.
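
The deletion mark described in ④ is often called a tombstone. Below is a sketch on top of linear probing, assuming Python's `hash()` as H(key). The class `TombstoneTable` and the `DELETED` sentinel are hypothetical names. A search skips tombstones and stops only at a truly empty slot, and an insertion may reuse a tombstone.

```python
from typing import Hashable, Iterator

EMPTY = None
DELETED = object()  # tombstone: "something was here, keep probing"


class TombstoneTable:
    """Linear probing where delete() marks the slot instead of emptying it."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.slots: list = [EMPTY] * m

    def _probe(self, key: Hashable) -> Iterator[int]:
        h = hash(key) % self.m
        for i in range(self.m):
            yield (h + i) % self.m

    def put(self, key: Hashable, value: object) -> None:
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break  # the key cannot appear beyond an empty slot
            if slot is DELETED:
                if first_free is None:
                    first_free = idx  # reusable, but keep checking for the key
                continue
            if slot[0] == key:
                self.slots[idx] = (key, value)
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = (key, value)

    def get(self, key: Hashable) -> object:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                raise KeyError(key)  # an empty slot ends an unsuccessful search
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        raise KeyError(key)

    def delete(self, key: Hashable) -> None:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                raise KeyError(key)
            if slot is not DELETED and slot[0] == key:
                self.slots[idx] = DELETED  # mark, do not empty
                return
        raise KeyError(key)


t = TombstoneTable(8)
t.put(0, "a")
t.put(8, "b")    # home slot 0 is taken, so 8 lands in slot 1
t.delete(0)      # slot 0 becomes a tombstone, not EMPTY
print(t.get(8))  # b
```

Had `delete` written `EMPTY` into slot 0, the search for 8 would stop at slot 0 and wrongly report the key as missing.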

Disadvantage:

Pointers take extra space, so when the records are small, open addressing uses less memory. If the space saved on pointers is instead used to enlarge the hash table, the load factor drops, which reduces collisions under open addressing and so speeds up the average lookup.
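
To put rough numbers on the load-factor argument, the sketch below evaluates the classical approximations from Knuth's analysis under a uniform-hashing assumption. For linear probing, a successful search takes about ½(1 + 1/(1 − α)) probes and an unsuccessful one about ½(1 + 1/(1 − α)²). For chaining, a successful search takes about 1 + α/2 comparisons. These are estimates, not measurements, and the function names are hypothetical.

```python
from typing import Tuple


def linear_probing_probes(alpha: float) -> Tuple[float, float]:
    """Approximate (successful, unsuccessful) probe counts for linear probing."""
    if not 0 <= alpha < 1:
        raise ValueError("open addressing requires 0 <= alpha < 1")
    return 0.5 * (1 + 1 / (1 - alpha)), 0.5 * (1 + 1 / (1 - alpha) ** 2)


def chaining_successful_probes(alpha: float) -> float:
    """Approximate comparisons for a successful search with chaining (alpha may exceed 1)."""
    return 1 + alpha / 2


for a in (0.5, 0.75, 0.9):
    hit, miss = linear_probing_probes(a)
    print(f"alpha={a}: linear probing hit={hit:.2f} miss={miss:.2f}, "
          f"chaining hit={chaining_successful_probes(a):.2f}")
```

Going from α = 0.9 to α = 0.5 cuts the unsuccessful-search estimate for linear probing from about 50.5 probes to 2.5. This is why spending the saved pointer space on a larger table can pay off.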

 

 


Origin http://43.154.161.224:23101/article/api/json?id=326520966&siteId=291194637