Hash collision and hash collision attack analysis

1. What is a hash collision?

When data is inserted into the hash table, the h (key) generated by different key values ​​​​are equal, and a conflict occurs at this time.

2. How to resolve hash conflicts?

Several commonly used methods include: open addressing method, zipper method, re-hashing method, and establishing a public overflow area.

1. Open addressing method

The so-called open addressing method is to find the next empty hash address once a conflict occurs. As long as the hash table is large enough, the empty hash address can always be found, and the record is stored in the formula: fi(key) 
= (f(key)+di) MOD m (di=1,2,3,……,m-1)


Detailed explanation: When a conflict occurs, use some detection technology to form a detection sequence in the hash table (according to different rules for generating the sequence, there can be linear detection method, pseudo-random detection method, quadratic detection method, double hashing method, etc.) . Search along this sequence cell by cell until the given keyword is found or an open address is encountered (that is, the address cell is empty).


For example (borrowing an example), our keyword set is {12,67,56,16,25,37,22,29,15,47,48,34}, and the table length is 12. We use the hash function f(key) = key mod l2. 
When calculating the first S numbers {12, 67, 56, 16, 25}, they are all hash addresses without conflicts and are stored directly: 

Write picture description here 
When calculating key = 37, it is found that f(37) = 1, which conflicts with the position of 25. 
So we apply the above formula f(37) = (f(37)+1) mod 12 = 2. So 37 is stored at the location with index 2: 
Write picture description here

2. Zipper method

The zipper method is an effective method to resolve hash conflicts (for example, PHP and Java use this method to resolve hash conflicts). Some hash addresses can be shared by multiple keyword values, so that each hash can be targeted. Create a singly linked list for the desired address.

Principle: Each hash table node has a next pointer. Multiple hash table nodes can use the next pointer to form a one-way linked list. Multiple nodes assigned to the same index can be connected using this one-way linked list.

For example (borrowing an example): The calculated index values ​​of the key-value pair k2, v2 and the key-value pair k1, v1 are both 2. At this time, a conflict occurs, but the nodes where k2 and k1 are located can be connected through the next pointer. This solves the hash conflict problem 
Write picture description here

3. Rehash method

Rehash method is relatively simple to understand. Rehash method is also called double hash method. There are multiple different Hash functions. When a conflict occurs, you can use the second function. If a conflict occurs, continue to use the third one. function, and so on, until there is no conflict. Although the hashing method is not prone to aggregation, it greatly increases the calculation time.

4. Establish public overflow areas

Principle: Divide the hash table into two parts: the basic table and the overflow table. All elements that conflict with the basic table are filled in the overflow table.

3. Attack using hash collisions

I mentioned above that languages ​​such as PHP and Java use the zipper method to resolve hash conflicts.

The attack principle is: constructing malicious data to degenerate the hash table into a linked list. Each time data is inserted, the linked list will be traversed, consuming a large amount of server resources, thereby achieving the purpose of the attack.

For example: PHP arrays are implemented using hash tables. Let’s use an example to see the difference between maliciously constructed data and ordinary data.

    <?php
    $size = pow(2, 16);
    $startTime = microtime(true);
    $array = array();
    for ($key = 0, $maxKey = ($size - 1) * $size; $key <= $maxKey; $key += $size) {
        $array[$key] = 0;
    }
    $endTime = microtime(true);
    echo '插入 ', $size, ' 个恶意的元素需要 ', $endTime - $startTime, ' 秒', "\n";

    $startTime = microtime(true);
    $array = array();
    for ($key = 0, $maxKey = $size - 1; $key <= $maxKey; ++$key) {
        $array[$key] = 0;
    }
    $endTime = microtime(true);
    echo '插入 ', $size, ' 个普通元素需要 ', $endTime - $startTime, ' 秒', "\n";

The above example maliciously constructs data, resulting in a hash conflict every time data is written, causing the entire linked list to be traversed every time it is deposited. The execution results on my machine are as follows:

(This is because my machine configuration is higher, and php7 has optimized the structure and algorithm of the hash table. If you use php5.6, it will take longer)

插入 65536 个恶意的元素需要 7.9304070472717 秒
插入 65536 个普通元素需要 0.0048370361328125 秒

Please refer to Niao Ge's blog address: Hash conflict example of PHP array - The corner of the wind and snow

Guess you like

Origin blog.csdn.net/panjiapengfly/article/details/121145250