Table of contents

Hash table lookup definition

Hash Table Lookup Steps

How to construct a hash function

Direct Addressing

Digit Analysis Method

Mid-Square Method

Folding Method

Division-Remainder Method

Random Number Method

Summary

Methods for dealing with hash collisions

Open Addressing

Rehash Method

Chain Address Method

Public Overflow Area Method

Hash table lookup implementation

Hash table lookup performance analysis

Hash Table Lookup (Hash Table) Overview

Hash table lookup definition

With hashing, the storage location of a record is computed directly from its key, so the record can be found without any key comparisons. This storage technique is called hashing.

storage_location = f(key)

Hashing: hashing establishes a fixed correspondence f between a record's key and its storage location, so that every key corresponds to a storage location f(key). To search, apply the same mapping to the given key; if the record exists in the collection being searched, it must be at position f(key).

We call this correspondence f a hash function. Following this idea, hashing stores records in a contiguous block of storage called a hash table, and the storage location that corresponds to a key is called its hash address.

Hash Table Lookup Steps

When storing, compute the record's hash address with the hash function and store the record at that address. Whatever the record is, the same hash function must be used to compute its address.
When searching, compute the hash address with the same hash function and access the record at that address.
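The two steps above can be sketched as follows. This is a minimal illustration, not the author's code: the table length of 11, the remainder-based function f, and the names store and lookup are all assumptions, and collisions are deliberately ignored here.

```python
TABLE_SIZE = 11  # hypothetical table length

def f(key: int) -> int:
    """Hypothetical hash function: remainder of the key modulo the table length."""
    return key % TABLE_SIZE

table = [None] * TABLE_SIZE

def store(key: int, record: str) -> None:
    # Storing: compute the hash address and place the record at that address.
    table[f(key)] = (key, record)

def lookup(key: int):
    # Searching: recompute the address with the same function and read that slot.
    slot = table[f(key)]
    if slot is not None and slot[0] == key:
        return slot[1]
    return None

store(15, "alice")  # 15 mod 11 = 4
store(27, "bob")    # 27 mod 11 = 5
```

Calling lookup(15) goes straight to slot 4 without comparing against any other stored key.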

Hashing is both a storage method and a lookup method. It differs, however, from structures such as linear lists, trees, and graphs. In those structures the data elements have a logical relationship that can be drawn as lines in a diagram, whereas the records in a hash table have no such logical relationship to one another; each record's position depends only on its key. Hashing is therefore primarily a lookup-oriented storage structure.

Hashing is best suited to finding the record whose key equals a given value. Because the comparison process is simplified, lookup efficiency improves greatly. Everything has trade-offs, though: hashing lacks many capabilities of conventional data structures.

Collision: we often encounter two keys key₁ ≠ key₂ for which f(key₁) = f(key₂). This phenomenon is called a collision, and key₁ and key₂ are called synonyms for this hash function.
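A small sketch of a collision, assuming a hypothetical hash function that takes the key modulo 11 (the function and the two keys are chosen only for illustration):

```python
def f(key: int, m: int = 11) -> int:
    """Hypothetical hash function: key modulo an assumed table length m."""
    return key % m

key1, key2 = 12, 23

# Two different keys that map to the same hash address are synonyms.
collides = key1 != key2 and f(key1) == f(key2)
```

Both 12 and 23 leave remainder 1 when divided by 11, so they are synonyms under this function.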

How to construct a hash function

What counts as a good hash function:

Simple to compute
Evenly distributed hash addresses

Direct Addressing

We can take a linear function of the key as the hash address:

f(key) = a × key + b (a and b are constants)
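As a rough sketch of direct addressing, the function below evaluates the linear formula. The example of indexing records by year, with a = 1 and b = -1980, is a hypothetical choice and does not come from the text.

```python
def direct_address(key: int, a: int = 1, b: int = 0) -> int:
    """Direct addressing: the hash address is a linear function of the key."""
    return a * key + b

# Hypothetical use: records keyed by year, with 1980 mapped to address 0.
address = direct_address(1985, a=1, b=-1980)
```

Because the mapping is one-to-one, distinct keys never collide, but the table must cover the whole key range.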

Digit Analysis Method

The digit analysis method is usually suitable for keys that have many digits. If the distribution of the keys is known in advance and some of their digit positions are evenly distributed, this method can be considered: those digits are extracted and used as the hash address.
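One way to picture digit analysis is extracting the evenly distributed digit positions from each key. The sketch below assumes 11-digit identifiers whose last four digits vary the most; the sample number and the chosen positions are hypothetical.

```python
def digit_analysis(key: str, positions=(7, 8, 9, 10)) -> int:
    """Build the hash address from selected digit positions (0-based) of the key."""
    return int("".join(key[i] for i in positions))

# Hypothetical 11-digit identifier; positions 7..10 are its last four digits.
address = digit_analysis("13912345678")
```

Which positions to extract depends entirely on the known key distribution, which is why the method needs that knowledge in advance.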

Mid-Square Method

Suppose the key is 1234. Its square is 1522756, and extracting the middle 3 digits gives 227, which is used as the hash address. The mid-square method suits situations where the key distribution is unknown and the keys do not have many digits.
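The worked example can be reproduced with a short sketch. The function name and the rule for picking the middle (leaning left when the split is uneven) are assumptions made for illustration.

```python
def mid_square(key: int, width: int = 3) -> int:
    """Square the key and return its middle `width` decimal digits."""
    squared = str(key * key)
    # Start of the middle window; clamp to 0 when the square is shorter than width.
    start = max(0, (len(squared) - width) // 2)
    return int(squared[start:start + width])

address = mid_square(1234)  # 1234 * 1234 = 1522756
```

For 1522756 the window starts at index 2 and covers the digits 2, 2, 7.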

Folding Method

The folding method splits the key from left to right into several parts with the same number of digits (the last part may be shorter if there are not enough digits), sums the parts, and then takes the last few digits of the sum, as many as the hash table length requires, as the hash address.

For example, suppose the key is 9876543210 and hash table addresses have three digits. We split the key into four groups, 987|654|321|0, add them to get 987 + 654 + 321 + 0 = 1962, and take the last 3 digits to obtain the hash address 962.

The folding method does not require knowing the key distribution in advance and suits keys with many digits.
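The folding example above can be checked with a minimal sketch; the function name fold and the width parameter are illustrative assumptions.

```python
def fold(key: int, width: int = 3) -> int:
    """Split the key's decimal digits into groups of `width`, sum them,
    and keep the last `width` digits of the sum."""
    digits = str(key)
    parts = [int(digits[i:i + width]) for i in range(0, len(digits), width)]
    return sum(parts) % (10 ** width)

address = fold(9876543210)  # 987 + 654 + 321 + 0 = 1962
```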

Division-Remainder Method

This is the most commonly used method for constructing hash functions. For a hash table of length m, the hash function is:

f(key) = key mod p (p ≤ m)

Here mod is the modulo (remainder) operation. The modulus can be taken of the key directly, or of the result after folding or squaring the key.
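A brief sketch of why the choice of p matters. The keys below are hypothetical, and picking a prime p is a common rule of thumb rather than something the text states.

```python
def remainder_hash(key: int, p: int) -> int:
    """Division-remainder method: f(key) = key mod p."""
    return key % p

keys = [12, 24, 36, 48]  # hypothetical keys, all multiples of 12

# With p = 12 every key lands on the same address.
with_p_12 = [remainder_hash(k, 12) for k in keys]

# With the prime p = 11 the same keys spread over different addresses.
with_p_11 = [remainder_hash(k, 11) for k in keys]
```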

Random Number Method

Choose a random function and take its value for the key as the hash address, that is, f(key) = random(key), where random is a random function. This method is a better fit when keys differ in length.
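For lookups to work, random(key) has to return the same value every time it is given the same key. One way to sketch that, as an assumption rather than the author's method, is to seed a pseudo-random generator with the key itself:

```python
import random

def random_hash(key, m: int) -> int:
    """Pseudo-random hash address in [0, m), seeded by the key.

    Seeding with the key makes the result repeatable, so storing and
    searching compute the same address for the same key.
    """
    rng = random.Random(key)  # key may be an int or a str of any length
    return rng.randrange(m)

first = random_hash("alice", 16)
second = random_hash("alice", 16)
```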

Summary

In practice, different situations call for different hash functions. The following factors can guide the choice:

The time required to calculate the hash address
Key length
Hash table size
Distribution of keys
Frequency of record lookups

Methods for dealing with hash collisions

Open Addressing

With open addressing, once a collision occurs we look for the next empty hash address. As long as the hash table is large enough, an empty address can always be found and the record stored there.

fᵢ(key) = (f(key) + dᵢ) mod m (dᵢ = 1, 2, 3, …, m-1)

This form of open addressing for resolving collisions is called linear probing.

Clustering: with linear probing, keys that are not synonyms can still end up competing for the same address. This phenomenon is called clustering (accumulation).
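The probing formula and the clustering effect can be sketched as follows. The table length of 12, the base function key mod m, and the sample keys are assumptions chosen for illustration.

```python
def probe_sequence(home: int, m: int):
    """Yield the home address, then (home + d) mod m for d = 1 .. m-1."""
    for d in range(m):
        yield (home + d) % m

def insert(table: list, key: int) -> int:
    """Store key at the first empty address along its probe sequence."""
    m = len(table)
    for addr in probe_sequence(key % m, m):
        if table[addr] is None:
            table[addr] = key
            return addr
    raise OverflowError("hash table is full")

table = [None] * 12
addresses = [insert(table, k) for k in (12, 25, 37, 48)]
```

Here 37 collides with its synonym 25 at address 1 and moves to address 2. Key 48 has home address 0, so it is not a synonym of 25 or 37, yet it still has to step past both of them before settling at address 3.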

Rehash Method

For the rehash method we prepare several different hash functions RHᵢ in advance. Whenever a collision occurs, we switch to the next hash function and compute a new address:

fᵢ(key) = RHᵢ(key) (i = 1, 2, 3, …, k)
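A minimal sketch of the rehash method, assuming three hypothetical hash functions rh1, rh2, and rh3; the text does not say which functions to prepare.

```python
def rh1(key: int, m: int) -> int:
    return key % m

def rh2(key: int, m: int) -> int:
    return (key // m) % m

def rh3(key: int, m: int) -> int:
    return (7 * key + 3) % m

HASH_FUNCTIONS = (rh1, rh2, rh3)  # tried in order: RH1, RH2, RH3

def insert(table: list, key: int) -> int:
    """Try each prepared hash function in turn until an empty address is found."""
    m = len(table)
    for rh in HASH_FUNCTIONS:
        addr = rh(key, m)
        if table[addr] is None:
            table[addr] = key
            return addr
    raise OverflowError("every prepared hash function collided")

table = [None] * 11
addresses = [insert(table, k) for k in (12, 23, 34)]
```

All three keys share address 1 under rh1, so 23 and 34 fall back to rh2. The price of avoiding clustering this way is the extra computation for each additional hash function.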

Chain Address Method

Store all records whose keys are synonyms in a singly linked list, called a synonym sub-list. The hash table itself stores only the head pointers of the synonym sub-lists.
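The structure described above might look like the following sketch. The class names, the key mod m hash function, and inserting at the head of each sub-list are assumptions made for illustration.

```python
class Node:
    """One record in a synonym sub-list (singly linked)."""
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node

class ChainedTable:
    def __init__(self, m: int):
        self.m = m
        self.heads = [None] * m  # the table holds only head pointers

    def insert(self, key: int) -> None:
        addr = key % self.m
        # Push the new node at the front of its synonym sub-list.
        self.heads[addr] = Node(key, self.heads[addr])

    def contains(self, key: int) -> bool:
        node = self.heads[key % self.m]
        while node is not None:
            if node.key == key:
                return True
            node = node.next
        return False

    def chain(self, addr: int) -> list:
        """Return the keys in the sub-list at addr, head first."""
        keys, node = [], self.heads[addr]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys

t = ChainedTable(12)
for k in (12, 48, 25, 37):
    t.insert(k)
```

A collision never forces a key into another address; the cost is walking a linked list during lookup.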

Public Overflow Area Method

A single public overflow area is set up to hold every key that collides. To search, compute the hash address of the given value with the hash function and first compare the value with the record at that position in the base table. If they are equal, the search succeeds; if not, search the overflow table sequentially. When collisions are few relative to the base table, the public overflow area still gives very good lookup performance.
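A minimal sketch of a base table plus a shared overflow table, assuming the hash function key mod m and a Python list for the overflow area (both are illustrative choices):

```python
class OverflowTable:
    def __init__(self, m: int):
        self.m = m
        self.base = [None] * m  # base table, one slot per hash address
        self.overflow = []      # shared overflow area for all colliding keys

    def insert(self, key: int) -> None:
        addr = key % self.m
        if self.base[addr] is None:
            self.base[addr] = key
        else:
            self.overflow.append(key)

    def contains(self, key: int) -> bool:
        # First compare with the record at the computed address in the base table.
        if self.base[key % self.m] == key:
            return True
        # Otherwise search the overflow table sequentially.
        return key in self.overflow

t = OverflowTable(12)
for k in (12, 25, 37, 48):
    t.insert(k)
```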

Hash table lookup implementation
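No code survives under this heading, so what follows is only a sketch of one possible implementation, not the author's. It assumes the division-remainder method with p = m as the hash function and linear probing for collisions; the class name, method names, and sample keys are hypothetical.

```python
class HashTable:
    def __init__(self, m: int = 12):
        self.m = m
        self.slots = [None] * m  # None marks an empty address

    def _hash(self, key: int) -> int:
        return key % self.m      # division-remainder method with p = m

    def insert(self, key: int) -> int:
        addr = self._hash(key)
        for _ in range(self.m):
            if self.slots[addr] is None:
                self.slots[addr] = key
                return addr
            addr = (addr + 1) % self.m  # linear probing
        raise OverflowError("hash table is full")

    def search(self, key: int):
        """Return the address holding key, or None if it is absent."""
        addr = self._hash(key)
        for _ in range(self.m):
            if self.slots[addr] is None:
                return None             # reached an empty slot: not stored
            if self.slots[addr] == key:
                return addr
            addr = (addr + 1) % self.m
        return None                     # probed the whole table

table = HashTable(12)
for k in (12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48, 34):
    table.insert(k)
```

The sketch leaves out deletion, which under open addressing would need some marker for removed slots so that later searches do not stop early.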

Hash table lookup performance analysis

Lookup performance depends on the following factors:

Whether the hash function is uniform
How collisions are handled
The load factor of the hash table
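The text does not define the load factor. Under the usual definition, assumed here, it is the number of records stored divided by the table length:

```python
def load_factor(record_count: int, table_length: int) -> float:
    """Load factor under the usual definition: records stored / table length."""
    return record_count / table_length

alpha = load_factor(9, 12)  # 9 records in a table of length 12
```

The closer the load factor gets to 1, the fuller the table and the more likely a new key is to collide.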
