Data Structures: The Hash Table

 


Hash Table: Basic Concepts

A hash table is a data structure that accesses data directly at a memory location determined by its key.
It establishes a mapping between the key of each data element and the element's storage location. The mapping function is called a hash function, and the array that stores the data is called the hash table.

 

Hash function construction

A hash table is constructed as follows:

  Suppose n data elements are to be stored in m contiguous memory cells (m ≥ n). Taking the key K_i (0 ≤ i ≤ n − 1) of each data element as the argument, the hash function maps K_i to the memory cell at address hash(K_i), and the element is stored in that cell. Mathematically, a hash function is a mapping from keys to memory cells. We want a hash function that needs only simple operations and that distributes the computed hash addresses as uniformly as possible over the memory cells.

Constructing a hash function involves three main considerations:

First, the computation should be simple and fast, so that insertion into and lookup in the hash table are efficient;
second, the hash function should scatter keys well, to reduce the probability of hash collisions;
third, the hash function should compress well, mapping a large key space into a small address range to save memory.

Several common methods:

1) Direct addressing method
The hash address is a linear function of the key. It can be expressed simply as:
$hash(K) = aK + C$
Its advantage is that it never produces collisions; its drawback is that the space complexity can be high, so it is suitable only when there are few elements.
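A minimal C# sketch (top-level program) of direct addressing, assuming hypothetical keys that are years from 1949 to 2019, so that a = 1 and C = −1949 give every year its own slot. It also shows the space cost: the array must cover the whole key range even if only two keys are stored.

```csharp
using System;

// Direct addressing: hash(K) = a*K + C. The values a = 1 and C = -1949 are
// an illustrative choice for keys that are years in [1949, 2019].
const int A = 1;
const int C = -1949;
int DirectHash(int year) => A * year + C;

// One slot per possible key: distinct keys can never share a slot,
// but the array is as large as the entire key range.
var records = new string[2019 - 1949 + 1];
records[DirectHash(1949)] = "record for 1949";
records[DirectHash(2019)] = "record for 2019";

Console.WriteLine(DirectHash(1949)); // 0
Console.WriteLine(DirectHash(2019)); // 70
```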

2) Mid-square method
If certain digits of the keys repeat with high frequency, we can square the key, which amplifies the differences between keys, and then take the middle digits of the square as the storage address.
For example:
key = 1234: $1234^2 = 1522756$, and 227 is taken as the hash address;
key = 4321: $4321^2 = 18671041$, and 671 is taken as the hash address.
This method suits cases where the distribution of the keys is not known in advance and the keys have few digits.
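The two examples above can be reproduced with a short C# sketch (top-level program). How to pick the "middle" when the square has an even number of digits is an assumption here: the window starts at (length − digits) / 2 with integer division, which is the choice that yields both 227 and 671.

```csharp
using System;

// Mid-square method: square the key, then take `digits` digits from the
// middle of the square's decimal representation.
int MidSquare(long key, int digits)
{
    string sq = (key * key).ToString();
    // When the split is uneven, integer division shifts the window left.
    int start = Math.Max(0, (sq.Length - digits) / 2);
    int len = Math.Min(digits, sq.Length - start);
    return int.Parse(sq.Substring(start, len));
}

Console.WriteLine(MidSquare(1234, 3)); // 1522756  -> 227
Console.WriteLine(MidSquare(4321, 3)); // 18671041 -> 671
```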

3) Division (remainder) method
The key is divided by a constant and the remainder is taken as the hash address. The method is simple and widely applicable, and it is the most frequently used hash function. It can be expressed as:
$hash(K) = K \bmod C \quad (C \le m)$, where m is the table length.
The key to this method is the choice of the constant C: it should generally be close or equal to the table length, and theoretical studies show that a prime constant works best.
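A minimal C# sketch (top-level program) of the division method, assuming an illustrative table length m = 16 and C = 13, the largest prime not exceeding m. Because C#'s `%` operator keeps the sign of the dividend, the sketch normalizes negative keys into [0, C).

```csharp
using System;

// Division method: hash(K) = K mod C, with C <= m.
const int M = 16;   // table length (illustrative)
const int C = 13;   // largest prime <= M
int DivisionHash(int key)
{
    // In C#, -5 % 13 == -5, so shift negative remainders into [0, C).
    int r = key % C;
    return r < 0 ? r + C : r;
}

Console.WriteLine($"m = {M}, C = {C}");
foreach (int key in new[] { 19, 14, 23, 1, 68, -5 })
    Console.WriteLine($"{key} -> {DivisionHash(key)}");
// 19 -> 6, 14 -> 1, 23 -> 10, 1 -> 1, 68 -> 3, -5 -> 8
```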

In practice, keys are not always numbers: they may be strings or combinations of several values, so we need to implement our own hash functions.
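As a sketch of the "combination of several values" case, assume a hypothetical date key stored as a C# tuple (year, month, day). One common approach, borrowed from the string hash below, folds each field into the running value with multiply-by-31-and-add; `DateHash` is an illustrative name, not an existing API.

```csharp
using System;

// Hypothetical composite key (year, month, day). Each field is folded in
// with hash = 31 * hash + field, so field order matters.
int DateHash((int Year, int Month, int Day) d)
{
    int hash = 0;
    hash = 31 * hash + d.Year;
    hash = 31 * hash + d.Month;
    hash = 31 * hash + d.Day;
    return hash;
}

Console.WriteLine(DateHash((2019, 10, 23))); // 1940592
```

In .NET Core 2.1 and later, `System.HashCode.Combine(year, month, day)` performs this kind of combining, but its result is randomized per process, so it should not be stored or compared across runs.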

1. Positive integers

The most common way to hash a positive integer is the division method: choose an array size M that is a prime number, and for any positive integer k, compute the remainder of k divided by M.
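Why a prime M helps can be seen in a small C# sketch (top-level program), using hypothetical keys that are all multiples of 4. With M = 16 the keys share a factor with the table size and pile into 4 buckets; with the prime M = 17 they spread over 16 buckets.

```csharp
using System;
using System.Linq;

// Illustrative keys: 0, 4, 8, ..., 60 (all multiples of 4).
int[] keys = Enumerable.Range(0, 16).Select(i => 4 * i).ToArray();

// Number of distinct buckets k mod m used by the keys.
int BucketsUsed(int[] ks, int m) => ks.Select(k => k % m).Distinct().Count();

Console.WriteLine(BucketsUsed(keys, 16)); // 4  (16 shares the factor 4)
Console.WriteLine(BucketsUsed(keys, 17)); // 16 (17 is prime)
```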

2. Strings

When strings are used as keys, we can treat each one as a large integer and again apply the division method. In practice, we fold the characters of the string into the hash value one at a time, for example:

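public int GetHashCode(string str)
{
    // Horner's rule: fold in one character at a time, multiplying the
    // running value by 31. int arithmetic wraps silently on overflow,
    // because C# code is unchecked by default.
    int hash = 0;
    for (int i = 0; i < str.Length; i++)
    {
        hash = str[i] + (31 * hash);
    }
    return hash;
}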

The method above computes the hash value of a string with Horner's rule. For a string of length L, the formula is:

$h = s[0] \cdot 31^{L-1} + \cdots + s[L-3] \cdot 31^{2} + s[L-2] \cdot 31^{1} + s[L-1] \cdot 31^{0}$

For example, to compute the hash value of the string "call": the Unicode value of c is 99, of a is 97, and of l is 108, so the hash value of "call" is

$3045982 = 99 \cdot 31^{3} + 97 \cdot 31^{2} + 108 \cdot 31^{1} + 108 \cdot 31^{0} = 108 + 31 \cdot (108 + 31 \cdot (97 + 31 \cdot 99))$
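The arithmetic above can be checked with a short C# sketch (top-level program). `Horner` repeats the rule of the GetHashCode method above; `HornerMod` is an illustrative variant, not from the original, that applies the division method at every step, so the running value stays below the table size (97 here, an arbitrary prime) and cannot overflow. The same 31-based formula is the one documented for Java's `String.hashCode()`; .NET's own `string.GetHashCode()` uses a different, randomized algorithm.

```csharp
using System;

// Horner's rule: h = s[i] + 31 * h for each character.
int Horner(string s)
{
    int h = 0;
    foreach (char c in s)
        h = c + 31 * h;   // wraps on overflow (unchecked by default)
    return h;
}

// Same rule, reduced modulo m at every step; the result equals
// Horner(s) mod m whenever Horner(s) does not overflow.
int HornerMod(string s, int m)
{
    long h = 0;
    foreach (char c in s)
        h = (c + 31 * h) % m;
    return (int)h;
}

Console.WriteLine(Horner("call"));        // 3045982
Console.WriteLine(HornerMod("call", 97)); // 85
```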

Hashing every character can take a long time for long strings. To save time, we can sample characters at a fixed interval instead; for example, the following version hashes only about 8 to 12 characters of a long string:

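public int GetHashCode(string str)
{
    // Hash only every skip-th character. Strings shorter than 16 characters
    // are still hashed in full (skip = 1); longer ones contribute only
    // between 8 and 12 characters.
    int hash = 0;
    int skip = Math.Max(1, str.Length / 8);
    for (int i = 0; i < str.Length; i += skip)
    {
        hash = str[i] + (31 * hash);
    }
    return hash;
}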
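The time saving has a cost: skipped characters never influence the hash. A small C# sketch (top-level program) with two hypothetical 16-character keys, made up for illustration, shows two strings that differ only at a skipped position and therefore collide.

```csharp
using System;

// The sampled hash from above: only every skip-th character is read.
int SampledHash(string str)
{
    int hash = 0;
    int skip = Math.Max(1, str.Length / 8);
    for (int i = 0; i < str.Length; i += skip)
        hash = str[i] + 31 * hash;
    return hash;
}

// Hypothetical keys differing only at index 1. With skip = 16 / 8 = 2,
// index 1 is never read, so both keys get the same hash.
string a = "user_0001_record";
string b = "uxer_0001_record";
Console.WriteLine(SampledHash(a) == SampledHash(b)); // True
```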

Hash collision

When constructing a hash table, two different keys may be mapped by the hash function to the same hash address. This phenomenon is called a hash collision.
For example, if we use the division method with C = 3 for the keys 3, 6 and 9, then 3 mod 3 = 6 mod 3 = 9 mod 3 = 0, so the keys 3, 6 and 9 collide.
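Collisions like this one can be found mechanically by grouping keys by their hash address. A minimal C# sketch (top-level program), using the keys from the example plus a hypothetical key 10 that does not collide:

```csharp
using System;
using System.Linq;

// Group keys by hash address; any group with more than one key is a collision.
var keys = new[] { 3, 6, 9, 10 };
int Hash(int k) => k % 3;   // division method with C = 3

var collisions = keys.GroupBy(k => Hash(k))
                     .Where(g => g.Count() > 1)
                     .ToDictionary(g => g.Key, g => g.ToArray());

foreach (var kv in collisions)
    Console.WriteLine($"address {kv.Key}: {string.Join(", ", kv.Value)}");
// address 0: 3, 6, 9
```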

 

 

 

 


Hash collision resolution

1. Open addressing
2. Separate chaining (chain address method)

Open addressing

Let H(key) be the hash function. If H(key_1) = H(key_i), that is, key_i collides with key_1, then the storage location of key_i is

$H_i = (H(key) + d_i) \bmod m$ (m is the table length)

There are three common choices for the increment d_i:
1) Linear probing rehash
$d_i = c \cdot i$ (usually c = 1, giving $d_i = 1, 2, 3, \ldots, m-1$)
2) Quadratic probing rehash
$d_i = 1^2, -1^2, 2^2, -2^2, \ldots, k^2, -k^2 \quad (k \le m/2)$
3) Random probing rehash (also called double probing rehash)
d_i is a pseudo-random number sequence

Note that
the increments d_i should have the following property (completeness): the addresses H_i they generate are all different, and together with H(key), the m − 1 addresses H_i cover every address in the hash table.

    • For quadratic probing, the table length m must be a prime of the form 4j + 3 (a restriction on the table length).
    • For random probing, m and d_i must have no common factor (a restriction on d_i).
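The 4j + 3 restriction can be checked with a small C# sketch (top-level program) that counts how many distinct addresses quadratic probing reaches from a home address, using k = 1 … (m − 1)/2. The table lengths 11 (= 4·2 + 3) and 13 (= 4·3 + 1) are illustrative primes.

```csharp
using System;
using System.Collections.Generic;

// Distinct addresses reached by h, h+1^2, h-1^2, ..., h+k^2, h-k^2 (mod m).
int Reachable(int m, int home)
{
    var seen = new HashSet<int> { home };
    for (int k = 1; k <= (m - 1) / 2; k++)
    {
        // ((x % m) + m) % m keeps the address non-negative for home - k^2.
        seen.Add(((home + k * k) % m + m) % m);
        seen.Add(((home - k * k) % m + m) % m);
    }
    return seen.Count;
}

Console.WriteLine(Reachable(11, 0)); // 11: every slot is reachable
Console.WriteLine(Reachable(13, 0)); // 7: only 7 of 13 slots
```

For m = 13, −1 is itself a square modulo 13, so the +k² and −k² offsets land on the same slots and half the table is never probed.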

Suppose the data
19, 01, 23, 14, 55, 68, 11, 86, 37 are to be stored in an array with table length 11, where H(key) = key MOD 11.
Resolving collisions with the three methods above gives the following storage layouts.
(How to read the tables: data are inserted from left to right. If the insertion position is already occupied, a collision occurs and is shown on a separate row; the address is recomputed until a free one is found, and further collisions continue on separate rows below. The final result is the top row of data, since it shows which key ended up occupying each slot.)
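The insertions can be replayed with a C# sketch (top-level program) for linear probing (c = 1) and quadratic probing; random probing is left out because its result depends on a specific pseudo-random sequence. `Build`, `Linear` and `Quadratic` are illustrative names.

```csharp
using System;

// Open addressing: H_i = (H(key) + d_i) mod m, with H(key) = key mod 11.
const int M = 11;
int[] keys = { 19, 1, 23, 14, 55, 68, 11, 86, 37 };

int Linear(int i) => i;   // d_i = 1, 2, 3, ...
// d_i = 1^2, -1^2, 2^2, -2^2, ...
int Quadratic(int i) => (i % 2 == 1 ? 1 : -1) * ((i + 1) / 2) * ((i + 1) / 2);

int?[] Build(Func<int, int> d)
{
    var table = new int?[M];
    foreach (int key in keys)
    {
        int home = key % M;
        // Try H_0 = home, then H_1, H_2, ...; a key is dropped if all fail.
        for (int i = 0; i < M; i++)
        {
            int offset = i == 0 ? 0 : d(i);
            int addr = ((home + offset) % M + M) % M; // non-negative index
            if (table[addr] == null) { table[addr] = key; break; }
        }
    }
    return table;
}

string Show(int?[] t) => string.Join(" ", Array.ConvertAll(t, x => x?.ToString() ?? "_"));
Console.WriteLine(Show(Build(Linear)));    // 55 1 23 14 68 11 37 _ 19 86 _
Console.WriteLine(Show(Build(Quadratic))); // 55 1 23 14 37 _ 68 _ 19 86 11
```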

 

 

 

 

 

Separate chaining (chain address method)

The principle of separate chaining: when a collision occurs, a new storage space is created at that same address, and the colliding element is inserted into it as a node of a linked list.
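A minimal C# sketch (top-level program) of separate chaining, reusing the keys 19, 1, 23, 14, 55, 68, 11, 86, 37 with H(key) = key mod 11. It assumes non-negative keys and appends new nodes at the tail of each chain; inserting at the head is an equally common choice.

```csharp
using System;
using System.Collections.Generic;

// Separate chaining: each slot holds a linked list of the keys hashed to it.
const int M = 11;
var buckets = new LinkedList<int>[M];

void Insert(int key)
{
    int addr = key % M;                        // H(key) = key mod 11
    buckets[addr] ??= new LinkedList<int>();   // create the chain on first use
    if (!buckets[addr].Contains(key))          // ignore duplicate keys
        buckets[addr].AddLast(key);
}

bool Contains(int key) => buckets[key % M]?.Contains(key) ?? false;

foreach (int key in new[] { 19, 1, 23, 14, 55, 68, 11, 86, 37 })
    Insert(key);

for (int i = 0; i < M; i++)
    if (buckets[i] != null)
        Console.WriteLine($"{i}: {string.Join(" -> ", buckets[i])}");
// 0: 55 -> 11
// 1: 1 -> 23
// 2: 68
// 3: 14
// 4: 37
// 8: 19
// 9: 86
```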

 

 


Origin www.cnblogs.com/-wenli/p/11703385.html