Four Solutions to Hash Conflicts

1) Introduction to Hash Tables

Characteristics of a non-hash table: there is no fixed relationship between a key and its position in the table. A lookup has to compare the given value against the keys one by one, so the efficiency of the search depends on the number of comparisons made.

Characteristics of a hash table: there is a fixed relationship between a key and its position in the table.

Hash function: in general, we need to establish a functional relationship f between a key and its storage location in the table, so that f(key) is the location of the record with that key. This function f is called the hash function.

Hash: hashing converts an input of any length into a fixed-length output through a hashing algorithm, and the output is the hash value.

This transformation is a compression mapping. The space of hash values is usually much smaller than the space of inputs. Different inputs may hash to the same output, so it is impossible to uniquely determine the input from the hash value.

Simply put, it is a function that compresses a message of any length into a message digest of some fixed length.

Hash conflict (in my own words): a key-value pair is stored at the address given by f(key) (this is how a HashMap stores values), but when we get there we find that another entry already occupies the computed address. In other words, the spot is already taken. This is the so-called hash conflict (collision).
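To make the idea of a conflict concrete, here is a minimal sketch. The class name CollisionDemo, the table size of 7, and the function h(key) = key % 7 are assumptions chosen for illustration (non-negative keys only), not anything taken from HashMap itself.

```java
import java.util.ArrayList;
import java.util.List;

class CollisionDemo {
    // Hypothetical hash function for illustration: h(key) = key % 7.
    static int h(int key) {
        return key % 7;
    }

    // Returns the keys whose computed slot was already taken by an earlier key.
    static List<Integer> collidingKeys(int[] keys) {
        boolean[] occupied = new boolean[7];
        List<Integer> collisions = new ArrayList<>();
        for (int key : keys) {
            int slot = h(key);
            if (occupied[slot]) {
                collisions.add(key);    // someone got to this address first
            } else {
                occupied[slot] = true;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // 3 and 10 both map to slot 3, so 10 runs into a conflict.
        System.out.println(collidingKeys(new int[] {3, 10, 5}));
    }
}
```

The four methods in the next section are different answers to the question of what to do with a key like 10 here.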

2) How a hash table handles collisions

1) Open addressing method:

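$H_i = (H(key) + d_i) \bmod m$, for $i = 1, 2, \dots, k$ ($k \le m-1$)

where $H(key)$ is the hash function, m is the length of the table, and $d_i$ is the increment sequence.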

There are three ways to choose the increment $d_i$:

Linear probing: $d_i = 1, 2, 3, \dots, m-1$

Quadratic probing: $d_i = 1^2, -1^2, 2^2, -2^2, 3^2, -3^2, \dots, k^2, -k^2$

(My note: the quadratic probing sequence above means plus 1 squared, minus 1 squared, plus 2 squared, minus 2 squared, plus 3 squared, minus 3 squared, ..., plus k squared, minus k squared. Seriously, teacher, could you be any sloppier? If you only look at how the teacher's ppt writes $d_i$, where $1^2$ is flattened into 12, you may not realize that squares are meant. The sequence above corresponds to the teacher's ppt and should be read together with the picture.)

Pseudo-random probing: $d_i$ is a sequence of pseudo-random numbers

Example:


When I first worked through the example in the picture, I did not recognize the 12 (marked in red at the bottom of the picture) as the increment $d_i$ from above. It is 1 to the power of 2, that is, $1^2$. The teacher was either too lazy to typeset the exponent or did not know how.
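The probing rules above can be sketched in code. This is only an illustration under my own assumptions: the class and method names are made up, keys are non-negative, h(key) = key % m, and -1 marks an empty slot. The quadratic sequence follows the plus/minus squares order described in the note.

```java
import java.util.Arrays;

class OpenAddressingDemo {
    // Linear probing: d_i = i, so H_i = (home + i) mod m.
    static int linearProbe(int home, int i, int m) {
        return (home + i) % m;
    }

    // Quadratic probing: d_i = +1^2, -1^2, +2^2, -2^2, ... for i = 1, 2, 3, 4, ...
    static int quadraticProbe(int home, int i, int m) {
        int k = (i + 1) / 2;
        int d = (i % 2 == 1) ? k * k : -k * k;
        return Math.floorMod(home + d, m);  // floorMod keeps the slot non-negative
    }

    // Inserts non-negative keys with linear probing, using h(key) = key % m.
    // Returns the filled table; -1 marks an empty slot.
    // A key that finds no free slot after m probes is silently skipped in this sketch.
    static int[] insertAllLinear(int[] keys, int m) {
        int[] table = new int[m];
        Arrays.fill(table, -1);
        for (int key : keys) {
            int home = key % m;
            for (int i = 0; i < m; i++) {   // i = 0 is the home slot itself
                int slot = linearProbe(home, i, m);
                if (table[slot] == -1) {
                    table[slot] = key;
                    break;
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // 3, 10 and 17 all have home slot 3, so 10 and 17 are pushed to slots 4 and 5.
        System.out.println(Arrays.toString(insertAllLinear(new int[] {3, 10, 17}, 7)));
    }
}
```

With home slot 3 and m = 7, the first four quadratic probes land on slots 4, 2, 0 and 6, which shows how the alternating signs spread the probes to both sides of the home slot.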


2) Chain address method

The above is just the teacher's ppt; what follows is the test I did myself.


First, use the hash algorithm from the ppt, h(key) = key % 7, to compute the hash value of each key. This hash value determines the position in the array where the key is stored.
Once all the hash values are computed, place the numbers one by one into the array below according to their hash values. That gives the screenshot of my own result.
It is consistent with the calculation on the ppt above.
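The hand calculation above can be reproduced with a short sketch. The class name ChainingDemo and the sample keys are my own assumptions, since the numbers from the ppt are not reproduced here. Keys are appended to the end of each chain, the way a textbook drawing usually shows it.

```java
import java.util.ArrayList;
import java.util.List;

class ChainingDemo {
    // Groups non-negative keys into 7 chains using h(key) = key % 7.
    // Keys that collide simply share the same chain, in arrival order.
    static List<List<Integer>> buildChains(int[] keys) {
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            table.add(new ArrayList<>());
        }
        for (int key : keys) {
            table.get(key % 7).add(key);
        }
        return table;
    }

    public static void main(String[] args) {
        List<List<Integer>> table = buildChains(new int[] {7, 14, 3, 10});
        for (int i = 0; i < table.size(); i++) {
            System.out.println(i + " -> " + table.get(i));
        }
    }
}
```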

This is the approach Java's HashMap uses. Here is a brief explanation of how the HashMap source code builds its linked lists.
At the end of the put() method there is the following call:
addEntry(hash, key, value, i);
The parameters mean the following:
1. hash: a value computed from the key. In the source it is: int hash = hash(key);
It works a bit like an ID card number for the key. Unlike an ID number, though, it is not strictly unique, because different keys can produce the same hash.
2. key: the key we pass when putting a key-value pair into the HashMap. When using the map, don't we look up the value by its key?
3. value: likewise, the value of the stored key-value pair.
4. i: in the source it is: int i = indexFor(hash, table.length); This is the index in the underlying array where the key-value pair is stored.
This i corresponds to the value after the modulo on the ppt; it determines the array index.
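As far as I remember the JDK 1.7 source, indexFor is a one-line bit mask rather than a % operation. The sketch below reproduces it from memory in a made-up wrapper class IndexForDemo, so treat the body as an assumption and check it against the real source. It only works because the table length is always a power of two.

```java
class IndexForDemo {
    // Recalled from JDK 1.7's HashMap.indexFor; length must be a power of two.
    // For such lengths, h & (length - 1) keeps the low bits of h, which gives
    // the same result as a non-negative modulo.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(17, 16));   // low 4 bits of 17
        System.out.println(indexFor(-1, 16));   // still a valid index for a negative hash
    }
}
```

The mask is cheaper than a division and never produces a negative index, which a plain % would do for a negative hash.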

Resizing may also happen during put, but we ignore it here and only look at how the linked list is built and in what order the key-value pairs sit on it.
createEntry(hash, key, value, bucketIndex);
This is the method that actually creates a node and places it in the array.
Its parameters have the same meaning as explained above.

    // First take the current head entry out of the array slot, link it as the next of the new node, then store the new node in the slot.
    // This is why later arrivals end up in front (head insertion). The drawing on the ppt gets this wrong.
    // Teachers rarely bother with details like this.
    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }
    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;
        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
        //******
    }

The above is the model of the elements stored in the HashMap's underlying array, and it is also the key to forming the linked list. If you are interested, have a look at the HashMap source code in JDK 1.7.
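To see the effect of head insertion on the order of a chain, here is a stripped-down sketch that copies only the shape of createEntry. The names HeadInsertDemo, Node and chain are hypothetical, keys are plain non-negative ints, and the bucket index is key % 7 as on the ppt instead of the real indexFor.

```java
import java.util.ArrayList;
import java.util.List;

class HeadInsertDemo {
    // Simplified stand-in for Entry<K,V>: just a key and a next pointer.
    static final class Node {
        final int key;
        final Node next;

        Node(int key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    final Node[] table = new Node[7];

    // Same shape as createEntry: the old head becomes the next of the new node.
    void put(int key) {
        int bucketIndex = key % 7;
        Node e = table[bucketIndex];
        table[bucketIndex] = new Node(key, e);
    }

    // Keys of one bucket, listed from head to tail.
    List<Integer> chain(int bucketIndex) {
        List<Integer> keys = new ArrayList<>();
        for (Node n = table[bucketIndex]; n != null; n = n.next) {
            keys.add(n.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        HeadInsertDemo demo = new HeadInsertDemo();
        demo.put(3);
        demo.put(10);
        demo.put(17);
        // The newest key sits at the head of the chain.
        System.out.println(demo.chain(3));
    }
}
```

Inserting 3, 10, 17 leaves the chain reading 17, 10, 3 from head to tail, which is the reverse of what a tail-appending drawing would show.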

3, 4) Re-hashing and a public overflow area

3. Re-hashing means having more than one hash function. If the address computed by one function is already taken, another function is used, and so on until there is no conflict. This is my own guess.
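A minimal sketch of that guess, under stated assumptions: the three hash functions below are invented for illustration, keys are non-negative, the table has 7 slots, and -1 marks an empty slot. The insert gives up once every function has collided, which a real implementation would have to handle differently.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

class RehashDemo {
    static final int EMPTY = -1;

    // Hypothetical family of hash functions, tried in order.
    static final IntUnaryOperator[] HASHES = {
        key -> key % 7,
        key -> (key / 7) % 7,
        key -> (key * 3 + 1) % 7
    };

    static int[] newTable() {
        int[] table = new int[7];
        Arrays.fill(table, EMPTY);
        return table;
    }

    // Stores key at the first free address produced by the hash functions.
    // Returns the slot used, or -1 if every function hit an occupied slot.
    static int insert(int[] table, int key) {
        for (IntUnaryOperator h : HASHES) {
            int slot = h.applyAsInt(key);
            if (table[slot] == EMPTY) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] table = newTable();
        System.out.println(insert(table, 3));
        System.out.println(insert(table, 10));  // first function collides with 3
        System.out.println(Arrays.toString(table));
    }
}
```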

4. A public overflow area means putting all conflicting items in a separate area instead of in the main table. I do not know the specific implementation; this is also my own guess.
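One way this guess could look in code, again only as a sketch: the class OverflowAreaDemo, the use of a plain list as the overflow area, and h(key) = key % 7 are all my assumptions. Keys are non-negative, -1 marks an empty slot, and nothing is ever deleted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OverflowAreaDemo {
    static final int EMPTY = -1;

    final int[] base = new int[7];                      // the main table
    final List<Integer> overflow = new ArrayList<>();   // shared by all conflicting keys

    OverflowAreaDemo() {
        Arrays.fill(base, EMPTY);
    }

    // The first key for a slot goes into the main table; later ones go to the overflow area.
    void insert(int key) {
        int slot = key % 7;
        if (base[slot] == EMPTY) {
            base[slot] = key;
        } else {
            overflow.add(key);
        }
    }

    // Checks the main table first, then scans the overflow area.
    boolean contains(int key) {
        int slot = key % 7;
        if (base[slot] == EMPTY) {
            return false;   // no key with this hash was ever inserted (no deletions here)
        }
        return base[slot] == key || overflow.contains(key);
    }

    public static void main(String[] args) {
        OverflowAreaDemo demo = new OverflowAreaDemo();
        demo.insert(3);
        demo.insert(10);
        demo.insert(17);
        System.out.println(Arrays.toString(demo.base));
        System.out.println(demo.overflow);
    }
}
```

The trade-off is that every conflicting key ends up in one shared list, so lookups get slow once collisions are common.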

To sum up, the four methods are:
1. Open addressing (linear probing, quadratic probing, pseudo-random probing)
2. Re-hashing
3. Chain address method (this is what Java's HashMap does)
4. Public overflow area


Having read this far, it is still worth sitting down and going through the HashMap source code. The 1.7 version is easy to understand. I also wrote an annotated walkthrough that you can take a look at. The links are as follows:

Java 

Java 1.8 HashMap walkthrough link

Compared with Java 1.7, the HashMap in Java 1.8 adds red-black trees.


