Class notes: Hash table search technology

Hash function construction: direct addressing method, division remainder method, digit analysis method, middle-square method, folding method (segment superposition)
Conflict handling methods: open addressing method, chain address method, common overflow area
Overview
The basic idea of hashing: establish a definite correspondence between the storage address of a record and its key. The element being searched for can then be located in a single access, without any key comparisons.
Hash table: hashing stores records in a contiguous block of storage. This contiguous block is called a hash table.
Hash function: a function that maps a key to a storage location in the hash table.
Hash address: the storage location computed by the hash function.
Is hashing technology just a lookup technology?
Hashing is both a lookup technique and a storage technique.
Is hashing a complete storage structure?
Hashing only locates a record by its key and does not fully express the logical relationships between records. Hashing is therefore mainly a search-oriented storage structure.
Limitations of hash search
Hashing is generally unsuitable when multiple records share the same key: the resulting conflicts reduce search efficiency and negate the advantage of computing the address directly.
Hashing is also unsuitable for range searches: it cannot directly find the maximum or minimum value, nor the records whose keys fall within a given range.
The key issues of hashing:
(1) Design of the hash function: how to design a hash function that is simple to compute, distributes keys uniformly, and gives high storage utilization.
(2) Handling of conflicts: how to choose an appropriate method to resolve conflicts.
Conflict (collision): two different keys ki ≠ kj have H(ki) = H(kj), so two different records need to be stored in the same storage location. ki and kj are called synonyms with respect to H.
Hash function
The design of a hash function should generally follow these principles:
(1) Simple calculation. The hash function should not require much computation, otherwise it reduces search efficiency.
(2) Evenly distributed function values (hash addresses). The function values should be spread as evenly as possible over the address space, to make effective use of storage and reduce conflicts.
Hash function-direct addressing method
The hash function is a linear function of the key, that is:
H(key) = a × key + b (a and b are constants)
When is it applicable?
The keys are known in advance, the key set is not very large, and the keys are fairly contiguous.
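As an illustration of direct addressing, the sketch below maps a small contiguous key range onto table slots. The key range (years 1990 to 1999), the stored values, and the helper name make_direct_hash are assumptions chosen for the example, not part of the notes.

```python
def make_direct_hash(a, b):
    """Return the hash function H(key) = a * key + b for constants a and b."""
    def h(key):
        return a * key + b
    return h

# Hypothetical data: keys are the years 1990..1999, mapped to slots 0..9.
h = make_direct_hash(1, -1990)
table = [None] * 10
for year, count in [(1990, 120), (1995, 87), (1999, 143)]:
    table[h(year)] = (year, count)

print(h(1995))   # 5
print(table[5])  # (1995, 87)
```

Because the function is one-to-one on the key range, no conflicts can occur, but the table needs one slot per possible key.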
Hash function-division remainder method
The hash function is: H(key) = key mod p.
How should p be chosen so that few synonyms are produced?
In general, choose p as the largest prime number less than or equal to the table length (preferably close to the table length).
The division remainder method is the simplest and most commonly used way to construct a hash function, and it does not require the key distribution to be known in advance.
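A minimal sketch of the division remainder method, assuming a table length of 15 and a made-up list of integer keys. The helper names is_prime and largest_prime_at_most are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small table lengths."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def largest_prime_at_most(m):
    """Return the largest prime p with p <= m."""
    for p in range(m, 1, -1):
        if is_prime(p):
            return p
    raise ValueError("table length must be at least 2")

def division_hash(key, p):
    """H(key) = key mod p."""
    return key % p

m = 15                           # assumed table length
p = largest_prime_at_most(m)     # 13
keys = [47, 7, 29, 11, 16, 92, 22, 8, 3]
addresses = [division_hash(k, p) for k in keys]
print(p)          # 13
print(addresses)  # [8, 7, 3, 11, 3, 1, 9, 8, 3]
```

In this sample, keys 29, 16 and 3 all land on address 3, so a conflict handling method is still needed.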
Hash function-digit analysis method
According to the distribution of the keys at each digit position, select several digit positions with a relatively uniform distribution to form the hash address.
When is it applicable?
The distribution of the keys is known in advance, and some digit positions are uniformly distributed.
Hash function-middle-square method
After squaring the key, take a few digits from the middle of the square as the hash address, with the number of digits determined by the size of the hash table.
When is it applicable?
The distribution of the keys is not known in advance and the keys do not have many digits.
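The middle-square idea can be sketched as follows. The example assumes decimal digits and a table of 1000 slots (so three middle digits are kept). How the middle is chosen when it cannot be centered exactly is a convention picked for this sketch.

```python
def mid_square_hash(key, digits):
    """Square the key and return `digits` decimal digits taken from the middle."""
    squared = str(key * key).zfill(digits)  # pad very small squares with zeros
    start = (len(squared) - digits) // 2    # leftmost index of the middle window
    return int(squared[start:start + digits])

# 1234 * 1234 = 1522756, whose middle three digits are 227
print(mid_square_hash(1234, 3))  # 227
# 4321 * 4321 = 18671041; the window starting at index 2 is 671
print(mid_square_hash(4321, 3))  # 671
```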
Hash function-folding method
Split the key from left to right into parts with an equal number of digits, add the parts together, and take the last few digits of the sum as the hash address.
When is it applicable?
The keys have many digits, and the distribution of the keys is not known in advance.
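A short sketch of the folding method, assuming decimal keys and a part width of three digits. The last part may be shorter than the others when the key length is not a multiple of the width.

```python
def folding_hash(key, width):
    """Split the decimal key into `width`-digit parts, sum them, keep the last `width` digits."""
    digits = str(key)
    parts = [int(digits[i:i + width]) for i in range(0, len(digits), width)]
    return sum(parts) % (10 ** width)

# 253 + 463 + 587 + 05 = 1308, so the address is 308
print(folding_hash(25346358705, 3))  # 308
```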
Conflict handling
Open hashing (also known as the zipper method, separate chaining, or the chain address method)
Closed hashing (also known as the open addressing method)
Establishing a common overflow area
Conflict handling-open addressing method
When the hash address computed from the key conflicts, find the next empty hash address and store the record there.
How is the next empty hash address found?
(1) Linear probing (2) Quadratic probing (3) Random probing (4) Rehashing
A hash table that handles conflicts by open addressing is called a closed hash table.
Linear probing
When a conflict occurs, look for an empty hash address sequentially, starting from the position after the conflicting one.
For a key with H(key) = d and a closed hash table of length m, the formula for the next hash address after a conflict is:
Hi = (H(key) + di) % m (di = 1, 2, ..., m-1)
Clustering (stacking): the phenomenon in which non-synonyms compete for the same hash address while conflicts are being handled.
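The probing rule above can be sketched as follows, assuming H(key) = key mod 11, a table of length 11, and a made-up key sequence. Deletion is not handled in this sketch.

```python
def insert_linear(table, key, h):
    """Insert key with linear probing and return the slot used."""
    m = len(table)
    d = h(key)
    for i in range(m):            # i = 0 is the home slot, then d_i = 1, ..., m-1
        slot = (d + i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")

def search_linear(table, key, h):
    """Return the slot holding key, or -1 if the key is absent."""
    m = len(table)
    d = h(key)
    for i in range(m):
        slot = (d + i) % m
        if table[slot] is None:   # an empty slot ends the probe sequence
            return -1
        if table[slot] == key:
            return slot
    return -1

h = lambda key: key % 11          # assumed hash function
table = [None] * 11
for k in [47, 7, 29, 11, 16, 92, 22, 8, 3]:
    insert_linear(table, k, h)

print(table)  # [11, 22, None, 47, 92, 16, 3, 7, 29, 8, None]
print(search_linear(table, 3, h))  # 6
```

Key 3 ends up in slot 6 after competing with 92 and 16, which are not its synonyms; this is the clustering effect.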
Quadratic probing
When a conflict occurs, the formula for finding the next hash address is:
Hi = (H(key) + di) % m (di = 1², -1², 2², -2², ..., q², -q², with q ≤ m/2)
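A sketch of insertion with this offset sequence, assuming H(key) = key mod 11 and a table of length 11. The key list is made up for the example.

```python
def quadratic_offsets(m):
    """Yield 1, -1, 4, -4, ..., q*q, -q*q for q = 1 .. m // 2."""
    for q in range(1, m // 2 + 1):
        yield q * q
        yield -(q * q)

def insert_quadratic(table, key, h):
    """Insert key with quadratic probing and return the slot used."""
    m = len(table)
    d = h(key)
    if table[d] is None:
        table[d] = key
        return d
    for offset in quadratic_offsets(m):
        slot = (d + offset) % m   # Python's % gives a non-negative result here
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no free slot on the probe sequence")

h = lambda key: key % 11          # assumed hash function
table = [None] * 11
for k in [47, 7, 29, 11, 16, 92, 22, 8, 3]:
    insert_quadratic(table, k, h)

print(table)  # [11, 22, 3, 47, 92, 16, None, 7, 29, 8, None]
```

Key 3 finds slot 3 and then slot 4 occupied, and is placed at (3 - 1) % 11 = 2.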
Random probing
When a conflict occurs, the displacement to the next hash address is taken from a random number sequence; the formula for finding the next hash address is:
Hi = (H(key) + di) % m (di is a random number sequence, i = 1, 2, ..., m-1)
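For a search to retrace the path taken by an insertion, the "random" displacements have to be reproducible. The sketch below assumes a fixed pseudo-random permutation of 1..m-1 generated from a seed; the seed value and helper names are arbitrary choices for the example.

```python
import random

def make_offsets(m, seed=2020):
    """Fixed pseudo-random permutation of 1..m-1, shared by insert and search."""
    offsets = list(range(1, m))
    random.Random(seed).shuffle(offsets)
    return offsets

def insert_random(table, key, h, offsets):
    """Insert key, trying the home slot first and then each displacement in order."""
    m = len(table)
    d = h(key)
    for off in [0] + offsets:
        slot = (d + off) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")

def search_random(table, key, h, offsets):
    """Follow the same displacement order; return the slot or -1."""
    m = len(table)
    d = h(key)
    for off in [0] + offsets:
        slot = (d + off) % m
        if table[slot] is None:
            return -1
        if table[slot] == key:
            return slot
    return -1

h = lambda key: key % 11          # assumed hash function
offsets = make_offsets(11)
table = [None] * 11
keys = [47, 7, 29, 11, 16, 92, 22, 8, 3]
for k in keys:
    insert_random(table, k, h, offsets)
```

Because the offsets form a permutation of 1..m-1, the probe sequence visits every slot exactly once.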
Conflict handling-zipper method (chain address method)
Basic idea: store all records with the same hash address, that is, all synonyms, in one singly linked list (called a synonym sub-table), and store the head pointers of all synonym sub-tables in the hash table.
A hash table that handles conflicts by the zipper method is called an open hash table.
If n records are stored in a hash table of length m, the average length of a synonym sub-table is n / m.
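A minimal sketch of the zipper method, assuming H(key) = key mod m and insertion at the head of each synonym sub-table. The class and method names are hypothetical.

```python
class Node:
    def __init__(self, key, next=None):
        self.key = key
        self.next = next

class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.heads = [None] * m   # head pointers of the synonym sub-tables
        self.n = 0

    def search(self, key):
        node = self.heads[key % self.m]
        while node is not None:
            if node.key == key:
                return True
            node = node.next
        return False

    def insert(self, key):
        if self.search(key):
            return
        d = key % self.m
        self.heads[d] = Node(key, self.heads[d])  # head insertion
        self.n += 1

    def chain(self, d):
        """Keys in sub-table d, listed from the head."""
        keys, node = [], self.heads[d]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys

t = ChainedHashTable(11)
for k in [47, 7, 29, 11, 16, 92, 22, 8, 3]:
    t.insert(k)

print(t.chain(0))  # [22, 11]
print(t.chain(3))  # [3, 47]
```

Here n / m = 9 / 11, so the sub-tables hold fewer than one record on average.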
Conflict handling-common overflow area
Basic idea: the hash table consists of a basic table and an overflow table (usually of the same size as the basic table), and every conflicting record is stored in the overflow table. To search, compute the hash address of the given value with the hash function and first compare it with the corresponding cell of the basic table; if they are equal, the search succeeds; otherwise, search the overflow table sequentially.
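The scheme can be sketched as follows, assuming H(key) = key mod m. For brevity the overflow table is an unbounded Python list rather than a fixed-size array, which is a simplification of the description above; the class name is hypothetical.

```python
class OverflowHashTable:
    def __init__(self, m):
        self.m = m
        self.base = [None] * m    # basic table
        self.overflow = []        # common overflow table (unbounded here)

    def insert(self, key):
        d = key % self.m
        if self.base[d] is None:
            self.base[d] = key
        elif self.base[d] != key and key not in self.overflow:
            self.overflow.append(key)   # every conflicting record goes here

    def search(self, key):
        """Return ('base', slot), ('overflow', index), or None."""
        d = key % self.m
        if self.base[d] == key:
            return ("base", d)
        for i, k in enumerate(self.overflow):   # sequential search
            if k == key:
                return ("overflow", i)
        return None

t = OverflowHashTable(11)
for k in [47, 7, 29, 11, 16, 92, 22, 8, 3]:
    t.insert(k)

print(t.overflow)    # [29, 22, 3]
print(t.search(22))  # ('overflow', 1)
```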
Hash search performance analysis
Because conflicts exist, searching after a conflict is still a process of comparing the given value with keys.
During a search, the number of key comparisons depends on the probability of conflict. The factors that affect conflicts are:
(1) whether the hash function is uniform
(2) the method of handling conflicts
(3) the load factor of the hash table: α = the number of records stored in the table / the length of the table
Average search length of several conflict handling methods
[Image not available: table of average search lengths]
With equal search probabilities, the average search length of a successful search is calculated by hand as follows:
[Image not available: formula]
where Ci is the number of comparisons required to place each element.
With equal search probabilities, the average search length of an unsuccessful search is calculated by hand as follows:
[Image not available: formula]
where Ci is the number of comparisons made by an unsuccessful search when the function value is i.
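The formulas themselves did not survive in these notes. The usual definitions that match the descriptions of Ci are ASL(success) = (C1 + ... + Cn) / n over the n stored records, and ASL(unsuccessful) = (sum of Ci over the r possible function values) / r. Under that assumption, the sketch below computes both values for a sample linear-probing table with H(key) = key mod 11. The table contents are made up, and the probe of the empty slot that ends an unsuccessful search is counted as one comparison, which is a convention assumed here.

```python
from fractions import Fraction

def asl_success(table, keys, h):
    """Average number of probes needed to find each stored key (linear probing)."""
    m = len(table)
    total = 0
    for key in keys:
        d = h(key)
        for c in range(1, m + 1):
            if table[(d + c - 1) % m] == key:
                total += c
                break
        else:
            raise KeyError(key)
    return Fraction(total, len(keys))

def asl_unsuccessful(table, r):
    """Average number of probes until an empty slot, over hash values 0..r-1."""
    m = len(table)
    total = 0
    for i in range(r):
        c = 1
        # Assumes the table has at least one empty slot.
        while table[(i + c - 1) % m] is not None:
            c += 1
        total += c
    return Fraction(total, r)

h = lambda key: key % 11   # assumed hash function
table = [11, 22, None, 47, 92, 16, 3, 7, 29, 8, None]
keys = [k for k in table if k is not None]

print(asl_success(table, keys, h))   # 5/3
print(asl_unsuccessful(table, 11))   # 42/11
```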
Comparison of open and closed hash tables
[Image not available: comparison table]

Origin blog.csdn.net/qq_43628959/article/details/104328645