What is a hash? What is a hash collision? How are hash collisions handled?

Hash collision

If two different input strings have the same hash value, the two strings are said to collide. Because a hash function turns a string of arbitrary length into a string of fixed length, there are infinitely many possible inputs but only finitely many outputs, so some output must correspond to infinitely many inputs, and collisions are inevitable.

A good cryptographic hash function f should satisfy the following three conditions:

(1) For any given y, it is computationally infeasible to find x such that f(x)=y.

(2) Given x1, it is computationally infeasible to find x2≠x1 such that f(x1)=f(x2). This is the weak collision-free property.

(3) It is computationally infeasible to find any pair x1≠x2 such that f(x1)=f(x2). This is the strong collision-free property.

A function that satisfies these conditions is called a secure hash function: there should be no way to attack it faster than enumeration. For condition (3), according to the birthday paradox, finding such x1 and x2 theoretically requires approximately 2^(n/2) enumerations, where n is the length of the hash output in bits.
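The 2^(n/2) estimate can be observed on a deliberately weakened hash. The following is a minimal sketch, assuming a toy n-bit hash built by keeping only the top bits of SHA-256; the names truncated_hash and find_collision are hypothetical helpers introduced for illustration, and this is not an attack on SHA-256 itself.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, n_bits: int = 24) -> int:
    """Toy n-bit hash: the top n_bits of SHA-256(data), as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - n_bits)


def find_collision(n_bits: int = 24):
    """Enumerate inputs until two distinct ones share a truncated hash.

    Returns (x1, x2, tries), where tries is the number of inputs hashed.
    """
    seen = {}  # truncated hash value -> input that produced it
    for i in count():
        msg = str(i).encode()
        h = truncated_hash(msg, n_bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


x1, x2, tries = find_collision(24)
print(x1, x2, tries)
```

With n = 24 the output space has 2^24 (about 16.8 million) values, yet a collision typically appears after a few thousand inputs, which is on the order of 2^12. The exact count depends on the inputs enumerated.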

Because hash functions whose first two properties can be broken are too weak and have been abandoned, almost every reported "crack" of a hash function refers to breaking the third property above, that is, finding a collision. Cryptography also has the concept of a theoretical crack, which means proposing an algorithm that finds collisions with fewer enumerations than the theoretical value.

Collision handling

There are usually two types of methods for handling collisions: open addressing and chaining. The former stores all elements directly in the hash table T[0...m-1]; the latter puts all elements that hash to the same slot into a linked list, and the head pointer of that list is stored in the hash table T[0...m-1].
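Chaining can be sketched as follows. This is a minimal illustration, not a production implementation: the class name ChainedHashTable is hypothetical, Python's built-in hash() stands in for the hash function, and Python lists stand in for the linked lists.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining."""

    def __init__(self, m: int = 8):
        self.m = m
        # One chain per slot; a Python list plays the role of the linked list.
        self.slots = [[] for _ in range(m)]

    def _index(self, key) -> int:
        return hash(key) % self.m

    def insert(self, key, value) -> None:
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))    # new key: append to the slot's chain

    def search(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return None                   # key not found


table = ChainedHashTable(m=8)
for key in (1, 9, 17):                # small ints hash to themselves in CPython
    table.insert(key, f"value-{key}")
print(table.slots[1])                 # all three keys share slot 1
```

Because colliding keys simply extend a chain, a table built this way never becomes "full"; the cost is that long chains slow down search.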

(1) Open addressing method

All elements are stored in the hash table itself, and each entry contains either an element of the dynamic set or NIL. With this approach the hash table may fill up so that no new elements can be inserted. In open addressing, when an element is to be inserted, the entries of the hash table are checked, or probed, one after another until an empty slot is found for the key. Three techniques are commonly used for open addressing: linear probing, quadratic probing, and double hashing.

<1> Linear probing

Given an ordinary hash function h': U → {0, 1, ..., m-1}, the hash function used by linear probing is: h(k,i) = (h'(k) + i) mod m, i = 0, 1, ..., m-1

Probing starts at i=0. T[h'(k)] is probed first, then T[h'(k)+1], and so on up to T[m-1]; after that the probe wraps around to T[0], T[1], and so on, until T[h'(k)-1] has been probed. The probing process ends in one of three cases:

(1) If the currently probed slot is empty, the search fails (for an insertion, the key is written into that slot);
(2) If the currently probed slot contains the key, the search succeeds, but an insertion fails;
(3) If neither an empty slot nor the key has been found by the time T[h'(k)-1] is probed, both search and insertion fail (the table is full at this point).
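The probe sequence and the three termination cases can be sketched as below. This is a minimal illustration under stated assumptions: the class name LinearProbingTable is hypothetical, Python's built-in hash() plays the role of h', None plays the role of NIL, and deletion is not handled.

```python
class LinearProbingTable:
    """Open-addressing hash table using h(k, i) = (h'(k) + i) mod m."""

    def __init__(self, m: int = 8):
        self.m = m
        self.T = [None] * m           # None plays the role of NIL

    def _h(self, k, i: int) -> int:
        return (hash(k) + i) % self.m

    def search(self, k):
        """Return the slot index holding k, or None if k is absent."""
        for i in range(self.m):
            j = self._h(k, i)
            if self.T[j] is None:     # case (1): empty slot, search fails
                return None
            if self.T[j] == k:        # case (2): key found
                return j
        return None                   # case (3): every slot probed, table full

    def insert(self, k) -> bool:
        """Return True if k was written, False if insertion failed."""
        for i in range(self.m):
            j = self._h(k, i)
            if self.T[j] is None:     # case (1): empty slot, write the key
                self.T[j] = k
                return True
            if self.T[j] == k:        # case (2): key already present
                return False
        return False                  # case (3): table is full


t = LinearProbingTable(m=4)
for key in (1, 5, 9, 13):             # all four keys start probing at slot 1
    t.insert(key)
print(t.T)
```

In the usage above every key has the same starting slot, so each insertion probes one slot further than the previous one and the last key wraps around to T[0].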

Let's look at hashes and hash collisions in more detail.

Put simply, a hash is a method of transforming an input of any length into an output of fixed length, where the fixed-length output can stand in for the input in practical application scenarios. Hashes are usually used to verify the consistency of information.
  There are many implementations of hash functions; the SHA-x and MDx families are the most widely used in the security field. Hash functions are also divided into keyed and unkeyed hash functions, and the term "hash function" usually refers to an unkeyed one.
  Because the output of a hash has a fixed length, multiple different inputs will inevitably produce the same output. If two input strings have the same hash value, the two strings are said to collide. In theory, some output string corresponds to infinitely many input strings, so collisions are inevitable.
  If a collision is found, the consistency of the information can be broken without the receiver noticing. The process of searching for an input that collides with a specified input is called "hash cracking". Note that a hash function must be irreversible, so there is no way to crack from the hash value back to the original input (brute-force cracking is not counted here; a rainbow table is the most effective brute-force approach, but there is still no guarantee that the recovered data is the original data). A poorly designed hash algorithm makes it easy to find collisions.
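The fixed-length output and the consistency check described above can be illustrated with Python's standard hashlib module. This is a minimal sketch using SHA-256 as one member of the SHA-x family; the helper names sha256_hex and verify and the sample messages are hypothetical, chosen only for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_digest: str) -> bool:
    """Check that data still matches a previously recorded digest."""
    return sha256_hex(data) == expected_digest


# Inputs of very different lengths give digests of the same length.
short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"a" * 10_000)
print(len(short_digest), len(long_digest))

# The sender records a digest; the receiver recomputes and compares it.
original = b"transfer 100 to Alice"
tampered = b"transfer 900 to Alice"
recorded = sha256_hex(original)
print(verify(original, recorded), verify(tampered, recorded))
```

The check only protects consistency as long as an attacker cannot produce a different message with the same digest, which is why finding collisions counts as breaking the hash.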

Origin blog.csdn.net/YOUYOU0710/article/details/108761052