Another form of hash structure: open addressing

In this article we explore a basic data structure: the open addressing form of the hash table (Open Addressing).

Everyone who knows Java knows HashMap. It handles hash collisions with separate chaining: colliding elements are strung into a linked list that hangs off a slot of a one-dimensional array. But not every language implements its dictionary with separate chaining. Python, for example, uses another form: open addressing. Where HashMap is a two-dimensional structure, an open addressing table is just a single one-dimensional array.

Open addressing differs from separate chaining in how it handles hash collisions. What should happen when the array position a new element hashes to is already occupied by another element?

Open addressing computes a next position from the current one and moves the colliding element there. If that next position is also occupied, it computes the next one again, and so on until it finds an empty position. You can imagine a virtual chain that strings these related positions together. This virtual chain plays the same role as the second-dimension linked list in separate chaining. The difference is that a linked list has explicit pointer fields, while the virtual chain has none: it is computed entirely by a mathematical function.

root = hash(key) % m   // the first position; m is the length of the array
index_i = (root + p(key, i)) % m  // the i-th position in the chain

index_1 = (root + p(key, 1)) % m
index_2 = (root + p(key, 2)) % m 
...

The mathematical function p in the code above is the probe sequence. Finding an empty position is a step-by-step probing process. Different keys can generate different probe sequences.

When looking up a key, if the key stored at the first position is not the target key, keep probing along the path until you either find the target or reach an empty position.
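The insert and lookup procedures described above can be sketched in Java as follows. This is only a minimal sketch, not how any particular language implements its dictionary: the class name `OpenAddressingSketch`, the fixed capacity, the String keys with int values, and the probe function p(key, i) = i are all assumptions chosen for illustration.

```java
class OpenAddressingSketch {
    private final String[] keys;   // null means "empty position"
    private final int[] values;
    private final int m;           // length of the array

    OpenAddressingSketch(int capacity) {
        m = capacity;
        keys = new String[m];
        values = new int[m];
    }

    // Hypothetical probe function: advance one position per step.
    private int p(String key, int i) {
        return i;
    }

    // The i-th position on the probe path of key (i = 0 is the root).
    private int indexAt(String key, int i) {
        int root = Math.floorMod(key.hashCode(), m);
        return (root + p(key, i)) % m;
    }

    // Walks the probe path until an empty position or the same key is found.
    // Returns false when the whole path is occupied by other keys.
    boolean put(String key, int value) {
        for (int i = 0; i < m; i++) {
            int idx = indexAt(key, i);
            if (keys[idx] == null || keys[idx].equals(key)) {
                keys[idx] = key;
                values[idx] = value;
                return true;
            }
        }
        return false;
    }

    // Walks the same path; an empty position means the key is absent.
    Integer get(String key) {
        for (int i = 0; i < m; i++) {
            int idx = indexAt(key, i);
            if (keys[idx] == null) return null;
            if (keys[idx].equals(key)) return values[idx];
        }
        return null;
    }

    public static void main(String[] args) {
        OpenAddressingSketch t = new OpenAddressingSketch(8);
        // "Aa" and "BB" have the same hashCode(), so they collide at the root.
        t.put("Aa", 1);
        t.put("BB", 2);
        System.out.println(t.get("Aa")); // 1
        System.out.println(t.get("BB")); // 2
        System.out.println(t.get("CC")); // null
    }
}
```

The two colliding keys end up in neighbouring positions, and the lookup for "BB" has to step past "Aa" before it finds its target.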

You may worry that probing could turn into an infinite loop: the probe wanders around and ends up back at the starting position, or back somewhere in the middle of the path. That is entirely possible, so the probe function cannot be chosen arbitrarily. It must guarantee that the probe sequence does not cycle: the m-1 probe steps must produce exactly a full permutation of 1..m-1.

There are many such probe functions. The most common one is the linear probe function. Its probe sequence is independent of the input key, so the final probe path depends only on the initial position.

// m = 2^n, c must be an odd number
p(key, i) = c * i
index_i = (root + c * i) % m

I am not going to prove rigorously here why this function meets the requirement. Instead, we can write some simple code to verify it.

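import java.util.HashSet;
import java.util.Set;

public class HashTest {
    public static void main(String[] args) {
        int m = 1 << 16;   // array length, a power of two
        int c = 111111;    // odd step constant
        Set<Integer> nums = new HashSet<>();
        for (int i = 1; i < m; i++) {
            // Multiply as long: c * i overflows int for large i,
            // and a negative remainder could hide a duplicate.
            int p = (int) ((long) c * i % m);
            // p == 0 would mean the probe has returned to the starting position.
            if (p == 0 || nums.contains(p)) {
                System.out.println("duplicated");
                return;
            }
            nums.add(p);
        }
        System.out.println("no duplicate");
    }
}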

------------
no duplicate
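For contrast, the requirement that c be odd can be checked the same way on a small table. The sketch below counts how many distinct offsets c * i % m appear for i = 1..m-1. The class and method names are made up for this example, and c is assumed to be positive.

```java
import java.util.HashSet;
import java.util.Set;

class ProbeCoverage {
    // Counts the distinct offsets c * i % m for i = 1..m-1 (c assumed positive).
    static int distinctOffsets(int m, int c) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 1; i < m; i++) {
            seen.add((int) ((long) c * i % m));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int m = 1 << 4;
        System.out.println(distinctOffsets(m, 3)); // 15: every offset 1..15 appears once
        System.out.println(distinctOffsets(m, 6)); // 8: the path cycles back after 8 steps
    }
}
```

With the even constant 6, the offsets only ever land on even numbers (including 0, the starting position), so half of the array can never be reached.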

Good, the infinite loop problem is solved. Now there is another problem: how do we delete an element? With separate chaining, deletion is simple: just unlink the element from the linked list. Open addressing is not so easy. You cannot arbitrarily remove an element from a probe path, because that would break the path.

To keep the probe path from being broken, there are two ways to implement deletion:

1. Put a special "deleted" mark at the deleted position. A lookup that meets this mark skips it and continues along the probe path. Note that a marked position is recycled by later insertions. When inserting, you walk the probe path, and when you meet the first position marked as deleted you cannot insert there immediately, because the element may already exist further along the path. Only when you have walked the path to its end and found that the element does not exist do you go back and insert it at the first marked position you met. If too many positions are marked as deleted, lookup and insert performance suffer.

2. Remove all the elements in the part of the probe path after the deleted position, then reinsert them. If the path is long, this can hurt performance.

It looks as if we are done, but there is one thing we have not noticed. With the probe function p(key, i) = c * i, if several different keys hash to the same first position, they all share the same probe path, because the path is determined entirely by the first position and is independent of the input key. These related keys therefore cluster on a single probe path, which can make the resulting data distribution less uniform.
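Returning to the first deletion approach (the special deleted mark), a minimal Java sketch might look like the following. Everything specific here is an assumption made for illustration: the class name `TombstoneTable`, the sentinel object `DELETED`, the helper `slotOf`, String keys with int values, and a probe step of 1.

```java
class TombstoneTable {
    // Sentinel for "deleted"; compared by identity, never by equals().
    private static final String DELETED = new String("<deleted>");
    private final String[] keys;   // null means "empty position"
    private final int[] values;
    private final int m;

    TombstoneTable(int capacity) {
        m = capacity;
        keys = new String[m];
        values = new int[m];
    }

    private int indexAt(String key, int i) {
        return (Math.floorMod(key.hashCode(), m) + i) % m; // assumed step of 1
    }

    // Returns the position holding key, or -1 if it is absent.
    int slotOf(String key) {
        for (int i = 0; i < m; i++) {
            int idx = indexAt(key, i);
            if (keys[idx] == null) return -1;   // an empty position ends the path
            if (keys[idx] != DELETED && keys[idx].equals(key)) return idx;
        }
        return -1;
    }

    Integer get(String key) {
        int idx = slotOf(key);
        return idx < 0 ? null : values[idx];
    }

    boolean remove(String key) {
        int idx = slotOf(key);
        if (idx < 0) return false;
        keys[idx] = DELETED;   // leave a mark so the probe path stays intact
        return true;
    }

    boolean put(String key, int value) {
        int firstDeleted = -1;
        for (int i = 0; i < m; i++) {
            int idx = indexAt(key, i);
            if (keys[idx] == null) {
                // End of path: the key is absent, so reuse the first mark if any.
                int target = firstDeleted >= 0 ? firstDeleted : idx;
                keys[target] = key;
                values[target] = value;
                return true;
            }
            if (keys[idx] == DELETED) {
                if (firstDeleted < 0) firstDeleted = idx;   // remember, keep walking
            } else if (keys[idx].equals(key)) {
                values[idx] = value;   // key already present: update in place
                return true;
            }
        }
        if (firstDeleted < 0) return false;   // table is full
        keys[firstDeleted] = key;
        values[firstDeleted] = value;
        return true;
    }

    public static void main(String[] args) {
        TombstoneTable t = new TombstoneTable(8);
        // All four keys below hash to position 0 when m = 8.
        t.put("Aa", 1);
        t.put("BB", 2);
        t.put("CC", 3);
        t.remove("BB");
        System.out.println(t.get("CC"));     // 3: the mark keeps the path intact
        t.put("DD", 4);
        System.out.println(t.slotOf("DD"));  // 1: the marked position is recycled
    }
}
```

A real table would also count the marks and rebuild itself when they pile up. That part is left out of the sketch.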

If we use a different probe function, one that depends on the input key, we can eliminate this clustering problem. We can replace the constant c in the probe function with a hash function, as long as that function always returns an odd number. Such a hash function is very easy to write.

p(key, i) = h2(key) * i
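One possible way to write such an h2 is sketched below. The bit mixing and the multiplier are arbitrary assumptions of this example, not something the text prescribes. The only property that matters is that the result is odd, which the final `| 1` guarantees. The sketch also assumes m is a power of two and at least 2.

```java
class DoubleHashingSketch {
    // Hypothetical second hash: scramble the bits of hashCode() so the result
    // is not simply determined by the first position, then force it to be odd.
    static int h2(Object key, int m) {
        int h = key.hashCode() * 0x9E3779B9;   // multiplicative scramble (assumed choice)
        h ^= h >>> 16;                          // fold the high bits into the low bits
        return Math.floorMod(h, m) | 1;         // odd value in 1..m-1
    }

    // The i-th position on the probe path: (root + h2(key) * i) % m.
    static int indexAt(Object key, int i, int m) {
        int root = Math.floorMod(key.hashCode(), m);
        return (int) ((root + (long) h2(key, m) * i) % m);
    }

    public static void main(String[] args) {
        int m = 1 << 4;
        // 0, 16 and 32 all have the first position 0 when m = 16.
        for (int key : new int[] {0, 16, 32}) {
            StringBuilder path = new StringBuilder();
            for (int i = 0; i < m; i++) {
                path.append(indexAt(key, i, m)).append(' ');
            }
            System.out.println("key " + key + ": " + path);
        }
    }
}
```

Because the step is always odd and m is a power of two, each printed path still visits every position exactly once. Keys with identical hashCode() values would still share a path, since both hashes are derived from the same number here.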

That is the end. Readers, is there anything you would add?


Origin blog.csdn.net/weixin_33877092/article/details/91379680