Algorithms and Data Structures (4): Hash Tables

Author: opLW
Reference: "The Beauty of Data Structures and Algorithms" by WANG Zheng
Brief notes from studying "The Beauty of Data Structures and Algorithms". They record the general ideas and are probably not very detailed.

Table of Contents

1. Definition of a hash table
2. How a hash table stores data
3. Key factors that determine hash table performance
4. Design elements of an industrial-grade hash table
5. Specific use cases of hash tables

  • 1. Definition of a hash table: A hash table is derived from an array. It extends the array by means of a hash function and relies on the array's support for random access to elements by index.
  • 2. How a hash table stores data: A hash table stores data as key-value pairs.
    • Storing: The method that maps a key to an array index is called a hash function, and the result it computes is called a hash value. The value is stored at the array index that corresponds to the hash value.
    • Finding: To find the value for a key, compute the array index with the same hash function and read the value stored at that index.
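
The store and find steps above can be sketched in a few lines of Java. This is only an illustration: the class name TinyHashTable and its methods are hypothetical, and collisions are deliberately ignored (a colliding key overwrites the slot), since collision handling is covered later in these notes.

```java
// Hypothetical illustration: a fixed-size table that ignores collisions.
class TinyHashTable {
    private final String[] keys;
    private final String[] values;

    TinyHashTable(int capacity) {
        keys = new String[capacity];
        values = new String[capacity];
    }

    // Hash function: maps a key to a non-negative array index.
    int indexFor(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Store the value in the slot selected by the hash value.
    // In this sketch a colliding key simply overwrites the slot.
    void put(String key, String value) {
        int i = indexFor(key);
        keys[i] = key;
        values[i] = value;
    }

    // Recompute the index from the key and read the slot directly.
    String get(String key) {
        int i = indexFor(key);
        return key.equals(keys[i]) ? values[i] : null;
    }
}
```

Both put and get touch exactly one array slot, which is where the constant-time behavior of a hash table comes from.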
  • 3. Key factors that determine hash table performance
    • 3.1 Hash function design
      • 1) Basic requirements

        1. The hash value computed by the hash function is a non-negative integer, because it is used as an array index and array indices are non-negative.
        2. If key1 = key2, then hash(key1) = hash(key2).
        3. If key1 ≠ key2, then hash(key1) ≠ hash(key2).

      • 2) Problem: Ideally, different keys map to different hash values. Because the array has limited size, however, different keys will inevitably produce the same hash value at some point. This is a hash collision.
      • 3) General requirements
        • Hash values should be as random and evenly distributed as possible. This minimizes hash collisions, and even when collisions occur, the data is spread fairly evenly across the slots.
        • The hash value should be simple and cheap to compute. If the hash function is too complex, the time spent computing it outweighs the benefit.
      • 4) Common methods: direct addressing, the mid-square method, the folding method, and the random number method.
      • 5) Example: the hash function of Java's HashMap
        int hash(Object key) {
            int h = key.hashCode();
            return (h ^ (h >>> 16)) & (capacity - 1); // capacity is the size of the hash table
        }
        

        Highlight 1: After obtaining the object's hashCode, the function shifts it right by 16 bits and XORs the result with the original value: hashCode ^ (hashCode >>> 16). This clever step folds the high 16 bits into the low 16 bits, so the resulting integer carries the characteristics of both the high and the low bits.
        Highlight 2: The & (capacity - 1) operation. Java keeps the capacity of a HashMap at a power of 2, so hash & (capacity - 1) gives the same result as hash modulo capacity but is cheaper to compute, and the preceding XOR step helps keep the resulting indices evenly distributed.
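
To see why both highlights matter, the small sketch below compares masking with and without the XOR step. The class and method names (HashSpreadDemo, spread, indexFor) are hypothetical and only mirror the simplified snippet above; the capacity of 16 is an assumed example value.

```java
class HashSpreadDemo {
    // Fold the high 16 bits into the low 16 bits, as in the snippet above.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Mask to a slot index. Valid only when capacity is a power of two.
    static int indexFor(int h, int capacity) {
        return spread(h) & (capacity - 1);
    }

    public static void main(String[] args) {
        int capacity = 16;
        int a = 0x10000;
        int b = 0x20000; // a and b differ only in their high bits

        // Without the XOR step, both values land in slot 0.
        System.out.println((a & (capacity - 1)) + " " + (b & (capacity - 1))); // prints "0 0"

        // With the XOR step, the high bits influence the slot.
        System.out.println(indexFor(a, capacity) + " " + indexFor(b, capacity)); // prints "1 2"
    }
}
```

For a power-of-two capacity, the mask gives the same result as Math.floorMod(hash, capacity), including for negative hash values.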

    • 3.2 Hash collision resolution
      • 1) Open addressing: If a hash collision occurs, probe for a free position and insert the data there.

        • Linear probing
          • Inserting data: When we insert data into the hash table and the position given by the hash function is already occupied, we start from that position and check the following positions one by one until we find a free one.
          • Finding data: We compute the hash value of the key with the hash function and compare the element stored at that array index with the element we are looking for. If they are equal, we have found it; otherwise we continue checking the following positions in order. If we reach a free position before finding the element, the element is not in the hash table.
          • Deleting data: To keep the search algorithm from failing, a deleted element is specially marked as deleted rather than cleared. When linear probing encounters a slot marked as deleted during a search, it does not stop but continues probing.
          • Conclusion: The worst-case time complexity is O(n).
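
A minimal sketch of linear probing with deletion markers might look like the following. The class LinearProbingTable, its fixed capacity, and the wrap-around at the end of the array are assumptions made for illustration; the notes do not prescribe a particular implementation.

```java
// Hypothetical sketch: open addressing with linear probing and "deleted" markers.
class LinearProbingTable {
    // Marker for deleted slots; compared by identity, so it never equals a user key.
    private static final String DELETED = new String("<deleted>");
    private final String[] keys;
    private final int[] values;

    LinearProbingTable(int capacity) {
        keys = new String[capacity];
        values = new int[capacity];
    }

    private int home(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Returns false only when the table has no usable slot left.
    boolean put(String key, int value) {
        int firstFree = -1;
        int i = home(key);
        for (int n = 0; n < keys.length; n++, i = (i + 1) % keys.length) {
            String k = keys[i];
            if (k == null) {                 // empty slot: the key is not in the table
                if (firstFree < 0) firstFree = i;
                break;
            }
            if (k == DELETED) {              // reusable slot, but keep looking for the key
                if (firstFree < 0) firstFree = i;
            } else if (k.equals(key)) {      // key already present: update in place
                values[i] = value;
                return true;
            }
        }
        if (firstFree < 0) return false;
        keys[firstFree] = key;
        values[firstFree] = value;
        return true;
    }

    Integer get(String key) {
        int i = find(key);
        return i < 0 ? null : values[i];
    }

    // Mark the slot as deleted instead of clearing it.
    boolean remove(String key) {
        int i = find(key);
        if (i < 0) return false;
        keys[i] = DELETED;
        return true;
    }

    private int find(String key) {
        int i = home(key);
        for (int n = 0; n < keys.length; n++, i = (i + 1) % keys.length) {
            String k = keys[i];
            if (k == null) return -1;                      // empty slot ends the search
            if (k != DELETED && k.equals(key)) return i;   // deleted slots are skipped
        }
        return -1;
    }
}
```

If remove cleared the slot to null instead of marking it, a later search for a key stored further along the same probe run would stop early and wrongly report the key as missing.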
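        • Quadratic probing: Linear probing advances by a step of 1 on each probe, so the probe sequence is hash(key)+0, hash(key)+1, hash(key)+2, and so on. Quadratic probing uses the square of that step instead: hash(key)+0, hash(key)+1², hash(key)+2², and so on.
        • Double hashing: Uses a set of hash functions. If the position computed by the first function is occupied, the next function is tried, and so on until a free position is found.
        • Summary of applicable scenarios: Open addressing is suitable when the amount of stored data is not very large, as in Java's ThreadLocalMap.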
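
The two alternative probing strategies can be sketched as small helper functions. The names below (ProbeSequences, quadratic, firstFree) are hypothetical, and the second helper follows the "set of hash functions" description used in these notes rather than any particular library.

```java
import java.util.function.IntUnaryOperator;

class ProbeSequences {
    // Quadratic probing: the i-th probe lands at home + i^2, wrapped to the table size.
    static int quadratic(int home, int i, int capacity) {
        return (int) ((home + (long) i * i) % capacity);
    }

    // Hash-function-set variant: try each function in turn and return the
    // first free slot, or -1 if every candidate slot is occupied.
    static int firstFree(boolean[] occupied, int key, IntUnaryOperator... hashes) {
        for (IntUnaryOperator h : hashes) {
            int slot = Math.floorMod(h.applyAsInt(key), occupied.length);
            if (!occupied[slot]) return slot;
        }
        return -1;
    }
}
```

For example, with a home slot of 3 and a capacity of 8, quadratic probing visits slots 3, 4, 7, 4 for i = 0, 1, 2, 3. The repeat of slot 4 shows that a quadratic sequence does not necessarily visit every slot.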
      • 2) Chaining (linked-list method): When a collision occurs, the element is added to a linked list at the position of the corresponding index.

        • Inserting data: Compute the corresponding slot with the hash function and insert the element into that slot's linked list. The time complexity of insertion is O(1).

        • Finding or deleting data: Compute the corresponding slot with the hash function, then traverse the linked list to find or delete the element. For a reasonably uniform hash function, the number of nodes per list is k = n / m, where n is the number of elements in the hash table and m is the number of slots, so the time complexity is O(k).

        • Advantage: Compared with open addressing, chaining tolerates a much higher load factor.

        • Disadvantage: The pointers in the linked lists require additional space, and when a list grows too long, the query time complexity degrades to O(n).

        • Optimization idea: When a list grows too long, replace it with a more efficient dynamic data structure such as a skip list or a red-black tree. Then even in the extreme case where all data hashes into the same bucket, the lookup time of the degenerate hash table is only O(log n).

          Java 1.8's HashMap does this. When a list's length reaches 8, the list is converted into a red-black tree; when the tree later shrinks (to 6 nodes or fewer in the JDK implementation), it is converted back into a list.
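
A minimal chaining sketch, assuming string keys and a fixed number of slots, is shown below. ChainedTable and its methods are hypothetical names. Unlike the pure O(1) head insertion described above, put here first scans the chain so that an existing key is updated rather than duplicated.

```java
import java.util.LinkedList;

// Hypothetical sketch: collision resolution by chaining.
class ChainedTable {
    private static final class Entry {
        final String key;
        int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] slots;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedTable(int m) {
        slots = new LinkedList[m];
        for (int i = 0; i < m; i++) slots[i] = new LinkedList<>();
    }

    private LinkedList<Entry> chainFor(String key) {
        return slots[Math.floorMod(key.hashCode(), slots.length)];
    }

    void put(String key, int value) {
        LinkedList<Entry> chain = chainFor(key);
        for (Entry e : chain) {
            if (e.key.equals(key)) { e.value = value; return; } // update existing key
        }
        chain.addFirst(new Entry(key, value)); // inserting at the head is O(1)
        size++;
    }

    // Walks one chain only, so the cost is O(k) with k = n / m on average.
    Integer get(String key) {
        for (Entry e : chainFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    boolean remove(String key) {
        boolean removed = chainFor(key).removeIf(e -> e.key.equals(key));
        if (removed) size--;
        return removed;
    }

    // k = n / m: number of stored elements divided by number of slots.
    double averageChainLength() {
        return (double) size / slots.length;
    }
}
```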

    • 3.3 Load factor / threshold
      • 1) Load factor: The load factor expresses how full the hash table is: the ratio of the number of elements currently stored to the length of the hash table. The larger the load factor, the fewer free positions remain, the more collisions occur, and the worse the hash table performs.
      • 2) Threshold: the upper limit of the load factor.
      • 3) Purpose: Whichever probing method is used, the probability of a hash collision rises sharply when few free positions remain. To keep the hash table efficient, we normally try to keep a certain proportion of slots free. When the load factor reaches the threshold, the table is expanded to restore that proportion of free slots.
      • 4) Calculation: load factor = number of elements stored in the table / length of the hash table
      • 5) Size of the load factor
        • When open addressing is used to resolve collisions, the load factor is less than 1.
        • When chaining is used to resolve collisions, elements can keep being added to the list at the corresponding index, so the load factor can be greater than 1.
      • 6) Setting the threshold requires weighing time complexity against space complexity.
        • If memory is not tight and high execution efficiency is required, lower the load factor threshold.
        • If memory is tight and execution efficiency is not critical, raise the load factor threshold.
      • 7) Avoiding inefficient expansion
        • When expansion is needed, do not move all of the existing data to the new table at once; move it in batches instead.
        • Inserting during batch expansion: When new data is inserted, we insert it into the new hash table and also move one item from the old hash table into the new one. Every insertion repeats this process, so each insertion stays fast.
        • Querying during batch expansion: Check the new hash table first, then the old hash table.
        • With batch expansion, the time complexity of inserting one item is O(1) in every case.
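
The batch expansion idea can be sketched as follows. Everything here is an assumption made for illustration: the class name IncrementalResizeTable, the 0.75 threshold, the initial size of 4, the doubling policy, and the use of chaining with {key, value} string pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of batch (incremental) expansion for a chained table.
class IncrementalResizeTable {
    private static final double THRESHOLD = 0.75;        // assumed load factor threshold
    private List<List<String[]>> current = newSlots(4);  // entries are {key, value}
    private List<List<String[]>> old = null;             // non-null only while migrating
    private int oldCursor = 0;                           // next old slot to drain
    private int size = 0;

    private static List<List<String[]>> newSlots(int m) {
        List<List<String[]>> slots = new ArrayList<>(m);
        for (int i = 0; i < m; i++) slots.add(new ArrayList<>());
        return slots;
    }

    private static List<String[]> chain(List<List<String[]>> table, String key) {
        return table.get(Math.floorMod(key.hashCode(), table.size()));
    }

    private static String find(List<List<String[]>> table, String key) {
        for (String[] e : chain(table, key)) {
            if (e[0].equals(key)) return e[1];
        }
        return null;
    }

    void put(String key, String value) {
        remove(key); // drop any existing mapping, in either table
        if (old == null && (double) (size + 1) / current.size() > THRESHOLD) {
            old = current;                       // start expanding: allocate the new
            current = newSlots(old.size() * 2);  // table, but copy nothing yet
            oldCursor = 0;
        }
        chain(current, key).add(new String[] {key, value});
        size++;
        migrateOne(); // each insertion also moves one old entry
    }

    String get(String key) {
        String v = find(current, key);           // check the new table first
        if (v == null && old != null) v = find(old, key);
        return v;
    }

    boolean remove(String key) {
        boolean removed = chain(current, key).removeIf(e -> e[0].equals(key));
        if (!removed && old != null) {
            removed = chain(old, key).removeIf(e -> e[0].equals(key));
        }
        if (removed) size--;
        return removed;
    }

    boolean isMigrating() {
        return old != null;
    }

    // Move a single entry from the old table into the new one.
    private void migrateOne() {
        if (old == null) return;
        while (oldCursor < old.size() && old.get(oldCursor).isEmpty()) oldCursor++;
        if (oldCursor == old.size()) { old = null; return; } // migration finished
        List<String[]> c = old.get(oldCursor);
        String[] e = c.remove(c.size() - 1);
        chain(current, e[0]).add(e);
    }
}
```

In this sketch a new expansion is not started until the previous migration has finished, and skipping empty old slots makes the per-insert cost constant only on average. A production design would need to address both points more carefully.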
  • 4. Design elements of an industrial-grade hash table
    • Basic requirements
      • Support fast query, insert, and delete operations.
      • Use a reasonable amount of memory without wasting too much space.
      • Perform stably: even in extreme cases, performance must not degrade to an unacceptable level.
    • Design ideas
      • Design a suitable hash function.
      • Choose a suitable hash collision resolution method.
      • Define the load factor threshold and design a dynamic expansion strategy.
  • 5. Specific use cases of hash tables
    • How is the spell check feature in Word documents implemented?
      • Idea: Assuming a word takes about 8 bytes of memory, 200,000 words occupy less than 20 MB, so a hash table can hold a 200,000-word English dictionary in memory. Look up each word of the document being edited in the hash table; if a word is not found, flag it as a possible spelling mistake.
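
A rough sketch of this spell check idea, using Java's HashSet as the hash table, could look like this. The class and method names are hypothetical, and the tokenization (lowercasing and splitting on anything that is not a letter or apostrophe) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SpellCheckSketch {
    // Returns the words of the text that are not in the dictionary.
    static List<String> misspelled(Set<String> dictionary, String text) {
        List<String> unknown = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (!word.isEmpty() && !dictionary.contains(word)) {
                unknown.add(word); // not found in the hash table: possible typo
            }
        }
        return unknown;
    }
}
```

With a dictionary built as new HashSet<>(Arrays.asList("hash", "tables", "are", "fast")), the text "Hash tabels are fast!" yields the single suspect word "tabels".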
    • Suppose we have an access log of 100,000 URLs. How do we sort the URLs by number of visits?
      • Idea: Assuming a string takes about 8 bytes of memory, the 100,000-entry access log needs no more than 10 MB. Count the visits per URL with a hash table, then use a TreeMap (a map that keeps its keys sorted) to store each hash table element's value (the visit count) as the key and its array index as the value.
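
One way to sketch this with the standard library is a HashMap for counting followed by a TreeMap for ordering. This variant is an assumption rather than the exact scheme above: since java.util.HashMap does not expose array indices, the TreeMap here maps each visit count to the list of URLs with that count. The names UrlVisitRanking and byVisits are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class UrlVisitRanking {
    // Groups URLs by visit count, with the most visited count first.
    static TreeMap<Integer, List<String>> byVisits(List<String> accessLog) {
        // Step 1: count visits per URL with a hash table.
        Map<String, Integer> counts = new HashMap<>();
        for (String url : accessLog) {
            counts.merge(url, 1, Integer::sum);
        }
        // Step 2: order by count with a TreeMap (descending keys).
        TreeMap<Integer, List<String>> ranked = new TreeMap<>(Comparator.reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            ranked.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return ranked;
    }
}
```

Counting is O(1) per log line on average, and the TreeMap step costs O(log u) per distinct URL, where u is the number of distinct URLs.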

Love and passion, men do not bother mercy.
Should speak there is nothing wrong, the end of the text message informed me,
Should felt that I could, like the collection point to be together.

opLW original seven poems, please indicate the source


Origin blog.csdn.net/qq_36518248/article/details/90903932