C++ study notes (2): Hash table

1. Hash table

There are three main families of search methods: linear table search, tree search (for example, a BST), and hash table search, also called hashing. A hash table involves three things:

  • Hash function
  • Hash table
  • Data record

The hash function is the mapping we define. It maps each input key to a storage address, and the data record is ultimately stored at that address.

  • The hash function is a mapping. The core of the design is to choose a hash function that produces as few conflicts as possible.
  • Hash function: returns a storage address for any key, which we then use to insert, delete, and look up data.
  • Conflict: just as a periodic function takes the same value at different inputs, a hash function may return the same address for different keys. This is a conflict (collision), and the keys involved are called synonyms.
  • Hash table: using the chosen hash function and the chosen conflict resolution method, a set of keys is mapped onto a finite, contiguous range of addresses, and each record is stored at the address computed from its key. The resulting table is called a hash table. The mapping process is called building the hash table, or hashing, and the storage location obtained is called the hash address.

In summary, a hash table has two core problems:

1. Building the hash: design a hash function and use it to map keys to addresses.

2. Dealing with conflicts: it is difficult to design a hash function that never produces conflicts, so a method for handling them is necessary.

2. Construction methods for hash functions

Hashing establishes a fixed correspondence f between the storage location of a record and its key, so that each key corresponds to one storage location f(key). When searching for a given key, compute f(key) using the same correspondence: if the record exists in the search set, it must be at position f(key). This correspondence f is called a hash function. Hashing stores the records in a contiguous block of storage, and this block is called a hash table (Hash-Table).

2.1 Direct addressing method

The direct addressing method uses a linear function of the key as the hash address:

f(key) = a*key + b, where a and b are constants

For example, to tabulate population by year of birth starting from 1990, you can use f(key) = key - 1990 to calculate the hash address.

Address h(key)   Year of birth (key)   Number of people (attribute)
0                1990                  12.85 million
1                1991                  12.81 million
2                1992                  12.8 million
...              ...                   ...
10               2000                  12.5 million
...              ...                   ...
21               2011                  11.8 million
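
The mapping in the table above can be sketched in a few lines of C++. This is only an illustration: the function name direct_address and the default arguments (a = 1, b = -1990, taken from the birth-year example) are assumptions made for this sketch.

```cpp
#include <cassert>

// Direct addressing: f(key) = a * key + b.
// Defaults a = 1 and b = -1990 reproduce the birth-year example.
int direct_address(int key, int a = 1, int b = -1990) {
    int address = a * key + b;
    assert(address >= 0);  // years before 1990 fall outside this table
    return address;
}
```

With these defaults, direct_address(2000) yields address 10 and direct_address(2011) yields address 21, matching the table rows.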

2.2 Division remainder method

This is the most commonly used way to construct a hash function. For a table of length m, the hash formula is:

f(key) = key mod p (p <= m)

Address h(key):  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
Key:             34  18  2   20  -   -   23  7   42  -   27  11  -   30  -   15  -

(- marks an empty slot)

  • Here: p = TableSize = 17
  • Generally, p is chosen to be a prime number
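
As a rough sketch, the table above could be reproduced as follows. The names division_hash, build_table, and kEmpty are assumptions for illustration, and the sketch relies on the example keys not colliding, so it contains no conflict handling.

```cpp
#include <cassert>
#include <vector>

constexpr int kEmpty = -1;  // marks an unused slot (assumes keys are >= 0)

// Division remainder method: f(key) = key mod p.
int division_hash(int key, int p = 17) {
    assert(p > 0 && key >= 0);
    return key % p;
}

// Stores each key at its hash address. No conflict handling:
// a later key would simply overwrite an earlier synonym.
std::vector<int> build_table(const std::vector<int>& keys, int p = 17) {
    std::vector<int> table(p, kEmpty);
    for (int key : keys) {
        table[division_hash(key, p)] = key;
    }
    return table;
}
```

Calling build_table with the keys 34, 18, 2, 20, 23, 7, 42, 27, 11, 30, 15 places 42 at address 8 and leaves addresses 4, 5, 9, 12, 14, and 16 empty.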

2.3 Digit analysis method

Analyze how each digit position of a numeric key varies, and take the positions that vary most randomly as the hash address. Take a mobile phone number as an example: it is stored as a character string, and generally the last 4 digits are the actual subscriber number.

For example, take the last 4 digits of an 11-digit mobile phone number key as the address. The hash function is: h(key) = atoi(key + 7), where key is a char*.
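
A C++ counterpart of atoi(key + 7) might look like the sketch below. The function name phone_hash is an assumption, and the phone numbers used to try it out are made up.

```cpp
#include <cassert>
#include <string>

// Uses the last 4 digits of an 11-digit phone number as the hash address,
// mirroring h(key) = atoi(key + 7) for a char* key.
int phone_hash(const std::string& phone) {
    assert(phone.size() == 11);
    return std::stoi(phone.substr(7));  // characters 7..10 are the last 4 digits
}
```

For the made-up number "13800001234" this returns 1234, so the table needs 10000 addresses (0 to 9999).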

2.4 Folding method

Divide the key into several parts with the same number of digits, then add the parts together:

For example, take the key 56793542.
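
The worked figures for this example are not shown in the notes. As a rough sketch, assume the key is cut into 3-digit parts starting from the right and only the low 3 digits of the sum are kept. Both choices, and the name fold_hash, are assumptions made here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Folding: cut the key into parts of digits_per_part digits (from the right),
// add the parts, and keep the low digits_per_part digits of the sum.
int fold_hash(std::uint64_t key, int digits_per_part = 3) {
    assert(digits_per_part > 0 && digits_per_part < 9);
    std::uint64_t base = 1;
    for (int i = 0; i < digits_per_part; ++i) {
        base *= 10;  // base = 10^digits_per_part
    }
    std::uint64_t sum = 0;
    while (key > 0) {
        sum += key % base;  // next part, lowest digits first
        key /= base;
    }
    return static_cast<int>(sum % base);
}
```

Under these assumptions 56793542 splits into 542, 793, and 56. Their sum is 1391, so the address is 391.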

2.5 Mid-square method

Square the key and take the middle digits of the result as the hash address.

For example, take the key 56793542.
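
The notes do not show the computation for this key. A minimal sketch, assuming the address is the 3 middle digits of the decimal square (the width and the exact choice of "middle" are assumptions, as is the name mid_square_hash):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Mid-square: square the key and take `width` digits from the middle
// of the decimal representation of the square.
int mid_square_hash(std::uint32_t key, std::size_t width = 3) {
    std::uint64_t square = static_cast<std::uint64_t>(key) * key;
    std::string digits = std::to_string(square);
    assert(width > 0 && digits.size() >= width);
    std::size_t start = (digits.size() - width) / 2;  // left-biased middle
    return std::stoi(digits.substr(start, width));
}
```

Here 56793542 squared is 3225506412905764, and the three middle digits selected by this sketch are 641.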

3. Conflict resolution methods

Common ways to deal with conflicts:
1. Move to a different location: open addressing method

2. Keep the conflicting objects together at the same location: chain address method

3.1 Open addressing method

Once a conflict occurs (the address already holds another element), look for another empty address according to a fixed rule. On the i-th conflict, the next address to probe is the original hash address plus an increment d_{i}. The basic formula is:
h_{i}(key) = (h(key) + d_{i}) mod TableSize (1 <= i < TableSize)

The choice of d_{i} determines the conflict resolution scheme: linear detection (linear probing), square detection (quadratic probing), or double hashing. The first two are described in turn below:

3.1.1 Linear detection (linear probing) method

The linear detection method cyclically probes the next storage address with an incremental sequence of 1, 2, ..., (TableSize-1).

[Example 1] Let the key sequence be 47, 7, 29, 11, 9, 84, 54, 20, 30, the hash table length TableSize=13 (filling factor \alpha =\frac{9}{13}\approx 0.69), and the hash function h(key)=key mod 11. Use the linear detection method to handle conflicts, list the hash table after the successive insertions, and estimate the search performance.

[Solution] The initial hash addresses are shown in the following table:

Key                  47  7   29  11  9   84  54  20  30
Hash address h(key)  3   7   7   0   9   7   10  9   8

Several keys share the same hash address, so conflicts occur. The number of conflicts for each key is listed below:

Key                  47  7   29  11  9   84  54  20  30
Hash address h(key)  3   7   7   0   9   7   10  9   8
Number of conflicts  0   0   1   0   0   3   1   3   6

The step-by-step construction of the hash table is shown below (columns 0 to 12 are addresses, - marks an empty slot):

Operation   0   1   2   3   4   5   6   7   8   9   10  11  12  Description
Insert 47   -   -   -   47  -   -   -   -   -   -   -   -   -   No conflict
Insert 7    -   -   -   47  -   -   -   7   -   -   -   -   -   No conflict
Insert 29   -   -   -   47  -   -   -   7   29  -   -   -   -   d_{1}=1
Insert 11   11  -   -   47  -   -   -   7   29  -   -   -   -   No conflict
Insert 9    11  -   -   47  -   -   -   7   29  9   -   -   -   No conflict
Insert 84   11  -   -   47  -   -   -   7   29  9   84  -   -   d_{3}=3
Insert 54   11  -   -   47  -   -   -   7   29  9   84  54  -   d_{1}=1
Insert 20   11  -   -   47  -   -   -   7   29  9   84  54  20  d_{3}=3
Insert 30   11  30  -   47  -   -   -   7   29  9   84  54  20  d_{6}=6
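
The insertion steps above can be sketched as a small C++ function. The names insert_linear and kEmpty are assumptions for illustration. The hash modulus (11) and the table length (13 addresses, 0 to 12, as in the construction table) are passed separately.

```cpp
#include <cassert>
#include <vector>

constexpr int kEmpty = -1;  // marks an unused slot (assumes keys are >= 0)

// Inserts key using linear probing (d_i = i) and returns the number of
// conflicts met before a free slot was found.
int insert_linear(std::vector<int>& table, int key, int mod) {
    const int size = static_cast<int>(table.size());
    assert(key >= 0 && mod > 0 && mod <= size);
    const int home = key % mod;  // h(key)
    for (int i = 0; i < size; ++i) {
        int address = (home + i) % size;  // h_i(key) = (h(key) + i) mod TableSize
        if (table[address] == kEmpty) {
            table[address] = key;
            return i;
        }
    }
    assert(false && "table is full");
    return -1;
}
```

Inserting 47, 7, 29, 11, 9, 84, 54, 20, 30 in that order into a table of 13 empty slots with mod = 11 returns the conflict counts 0, 0, 1, 0, 0, 3, 1, 3, 6.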

Next, analyze the search performance of this hash table. Search performance is generally measured in two ways:

1. Average search length for successful searches (ASLs)

2. Average search length for unsuccessful searches (ASLu)

For the example above, the final hash table and the number of conflicts for each key are (- marks an empty slot):

Address              0   1   2   3   4   5   6   7   8   9   10  11  12
Key                  11  30  -   47  -   -   -   7   29  9   84  54  20
Number of conflicts  0   6   -   0   -   -   -   0   1   0   3   1   3

ASLs: the average number of comparisons needed to find a key that is in the table (for each key, its number of conflicts plus 1).

ASLs = (1+7+1+1+2+1+4+2+4)/9 = 23/9 \approx 2.56

ASLu: the average number of comparisons needed to conclude that a key is not in the hash table (unsuccessful search).

General method: divide the keys that are not in the hash table into classes, for example by their h(key) value. Here h(key) = key mod 11 gives 11 classes (0 to 10). For each class, count the comparisons from address h(key) up to and including the first empty slot.

ASLu = (3+2+1+2+1+1+1+9+8+7+6)/11 = 41/11 \approx 3.73
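
Both averages can be checked with a short program. The sketch below takes the finished table as input and counts comparisons the same way as the figures above, with a hit on an empty slot counting as one comparison. All names (probes_linear, asl_success, asl_failure, kEmpty, kMissing) are assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

constexpr int kEmpty = -1;    // marks an unused slot
constexpr int kMissing = -2;  // stands for any key that is not in the table

// Comparisons made when searching for key from address home with linear
// probing. The search stops on the key itself or on the first empty slot.
int probes_linear(const std::vector<int>& table, int home, int key) {
    const int size = static_cast<int>(table.size());
    for (int i = 0; i < size; ++i) {
        int slot = table[(home + i) % size];
        if (slot == key || slot == kEmpty) {
            return i + 1;
        }
    }
    return size;  // table is full and the key is absent
}

// ASLs: average over all stored keys.
double asl_success(const std::vector<int>& table, int mod) {
    int total = 0;
    int count = 0;
    for (int key : table) {
        if (key == kEmpty) continue;
        total += probes_linear(table, key % mod, key);
        ++count;
    }
    assert(count > 0);
    return static_cast<double>(total) / count;
}

// ASLu: average over the classes h(key) = 0 .. mod - 1 of absent keys.
double asl_failure(const std::vector<int>& table, int mod) {
    assert(mod > 0);
    int total = 0;
    for (int home = 0; home < mod; ++home) {
        total += probes_linear(table, home, kMissing);
    }
    return static_cast<double>(total) / mod;
}
```

For the final table of Example 1 (13 slots, mod = 11) these functions give 23/9 and 41/11, in line with the hand calculation.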

3.1.2 Square detection (quadratic probing) method

The square detection method cyclically probes the next storage address with the incremental sequence 1^{2}, -1^{2}, 2^{2}, -2^{2}, \cdots, q^{2}, -q^{2}, where q <= \lfloor TableSize/2 \rfloor. Reusing the key sequence and hash function of [Example 1], this time with a table of length 11 (addresses 0 to 10), the conflicts obtained are as follows:

Key                  47  7   29  11  9   84  54  20  30
Hash address h(key)  3   7   7   0   9   7   10  9   8
Number of conflicts  0   0   1   0   0   2   0   3   3

ASLs = (1+1+2+1+1+3+1+4+4)/9 = 18/9 = 2

Operation   0   1   2   3   4   5   6   7   8   9   10  Description
Insert 47   -   -   -   47  -   -   -   -   -   -   -   No conflict
Insert 7    -   -   -   47  -   -   -   7   -   -   -   No conflict
Insert 29   -   -   -   47  -   -   -   7   29  -   -   d_{1}=1
Insert 11   11  -   -   47  -   -   -   7   29  -   -   No conflict
Insert 9    11  -   -   47  -   -   -   7   29  9   -   No conflict
Insert 84   11  -   -   47  -   -   84  7   29  9   -   d_{2}=-1
Insert 54   11  -   -   47  -   -   84  7   29  9   54  No conflict
Insert 20   11  -   20  47  -   -   84  7   29  9   54  d_{3}=4
Insert 30   11  30  20  47  -   -   84  7   29  9   54  d_{3}=4

(columns 0 to 10 are addresses, - marks an empty slot)
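
A possible C++ sketch of this probing order follows. The names insert_quadratic and kEmpty are assumptions for illustration. The function returns -1 when no free slot is found within the probe sequence, which can happen with quadratic probing even if the table is not full.

```cpp
#include <cassert>
#include <vector>

constexpr int kEmpty = -1;  // marks an unused slot (assumes keys are >= 0)

// Inserts key with quadratic probing, trying the increments
// +1^2, -1^2, +2^2, -2^2, ..., up to q = TableSize / 2.
// Returns the number of conflicts, or -1 if no free slot was found.
int insert_quadratic(std::vector<int>& table, int key, int mod) {
    const int size = static_cast<int>(table.size());
    assert(key >= 0 && mod > 0 && mod <= size);
    const int home = key % mod;  // h(key)
    if (table[home] == kEmpty) {
        table[home] = key;
        return 0;
    }
    int conflicts = 1;  // the home address was occupied
    for (int q = 1; q <= size / 2; ++q) {
        for (int sign : {+1, -1}) {
            // Add size before the final % so negative offsets wrap around.
            int address = ((home + sign * q * q) % size + size) % size;
            if (table[address] == kEmpty) {
                table[address] = key;
                return conflicts;
            }
            ++conflicts;
        }
    }
    return -1;
}
```

Inserting the keys of Example 1 into 11 empty slots with mod = 11 returns the conflict counts 0, 0, 1, 0, 0, 2, 0, 3, 3 and ends with 30 at address 1, 20 at address 2, and 84 at address 6.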

3.2 Chain address method

The chain address method (separate chaining) stores all keys that conflict at the same address in one singly linked list attached to that address.

[Example 2] Let the key sequence be 47, 7, 29, 11, 16, 92, 22, 8, 3, 50, 37, 89, 94, 21, and the hash function be h(key)=key mod 11. Use the chain address method to handle conflicts.
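
One way to sketch Example 2 in C++ is a vector of std::forward_list, one singly linked list per address. The names build_chains and contains are assumptions, and so is inserting each new key at the head of its chain, since the notes do not say where in the list a key goes.

```cpp
#include <cassert>
#include <forward_list>
#include <vector>

using Chains = std::vector<std::forward_list<int>>;

// Separate chaining: chain number key % mod holds every key with that address.
// New keys are pushed at the head of the chain (an assumption of this sketch).
Chains build_chains(const std::vector<int>& keys, int mod) {
    assert(mod > 0);
    Chains chains(mod);
    for (int key : keys) {
        assert(key >= 0);
        chains[key % mod].push_front(key);
    }
    return chains;
}

// Searches only the chain that the key hashes to.
bool contains(const Chains& chains, int key) {
    assert(key >= 0 && !chains.empty());
    const int address = key % static_cast<int>(chains.size());
    for (int stored : chains[address]) {
        if (stored == key) return true;
    }
    return false;
}
```

With the keys of Example 2 and mod = 11, addresses 0, 3, 4, 6, and 7 each hold two keys (for example 22 and 11 at address 0), while addresses 2 and 9 stay empty.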

[Daily problem]

Two Sum

Given an integer array nums and a target value target, find the two integers in the array whose sum equals target and return their array indices.

You may assume that each input has exactly one answer. However, the same element of the array cannot be used twice.

Example:

Given nums = [2,7,11,15], target = 9
Because nums[0] + nums[1] = 2 + 7 = 9
Return [0,1]

Method 1: Brute-force enumeration

[Answer]

Ideas and algorithm:

The most obvious approach is to enumerate every number x in the array and check whether target-x exists in the array.

When scanning the array for target-x, note that every element before x has already been tried against x, so those pairs need not be checked again. Since an element cannot be used twice, we only need to look for target-x among the elements after x.

Code:

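class Solution {
    public int[] twoSum(int[] nums, int target) {
        int n = nums.length;
        // Try every pair (i, j) with j > i, so no element is used twice.
        for (int i = 0; i < n; ++i) {
            for (int j = i + 1; j < n; ++j) {
                if (nums[i] + nums[j] == target) {
                    return new int[]{i, j};
                }
            }
        }
        // No pair found (the problem statement rules this case out).
        return new int[0];
    }
}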

Method 2: Hash table

Ideas and algorithm:

Method 1 is slow because finding target-x takes O(N) time for each x. We therefore need a faster way to check whether the target element exists in the array and, if it does, to find its index.

Using a hash table, the time complexity of finding target-x can be reduced from O(N) to O(1).

So we create a hash table that maps each value to its index. For each x, we first check whether target-x is already in the hash table, and only then insert x, which ensures that x is never matched with itself.

Code:

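import java.util.HashMap;
import java.util.Map;

class Solution {
    public int[] twoSum(int[] nums, int target) {
        // Maps each value seen so far to its index.
        Map<Integer, Integer> hashtable = new HashMap<>();
        for (int i = 0; i < nums.length; ++i) {
            // Look for the complement before inserting nums[i],
            // so an element is never paired with itself.
            if (hashtable.containsKey(target - nums[i])) {
                return new int[]{hashtable.get(target - nums[i]), i};
            }
            hashtable.put(nums[i], i);
        }
        // No pair found (the problem statement rules this case out).
        return new int[0];
    }
}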
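
Since these are C++ notes, here is a possible C++ counterpart of Method 2 using std::unordered_map. It is a sketch, not part of the original solution. The free function name two_sum is an assumption, and it returns an empty vector when no pair exists.

```cpp
#include <unordered_map>
#include <vector>

// Returns the indices of the two elements that add up to target,
// or an empty vector if there is no such pair.
std::vector<int> two_sum(const std::vector<int>& nums, int target) {
    std::unordered_map<int, int> index_of;  // value -> index of an earlier element
    for (int i = 0; i < static_cast<int>(nums.size()); ++i) {
        // Query before inserting so nums[i] cannot match itself.
        auto it = index_of.find(target - nums[i]);
        if (it != index_of.end()) {
            return {it->second, i};
        }
        index_of[nums[i]] = i;
    }
    return {};
}
```

For nums = [2,7,11,15] and target = 9 this returns [0,1], the same answer as the example.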

 


Origin blog.csdn.net/weixin_38452841/article/details/109079580