[Data structure review] Hash table

All figures are from the slides of Data Structure and Algorithm Foundation (Qingdao University, Wang Zhuo). As an aside, the teacher's lectures are very good and highly recommended!

Hash table

1. Concept

1. Definition of hash table

A hash table is a data structure that is accessed directly by key. It locates a record by mapping the key to a position in the table, which speeds up searching. The mapping function is called a hash function, and the array that stores the records is called a hash table.

Hash function: H(key) = Addr.

Example:
Given the hash function H(key) = key and key = 9, the address computed by the hash function is 9, so the record with key 9 is stored at position 9 of the hash table.

2. Related concepts of hash tables

  • Collision (conflict): different keys are mapped to the same hash address. That is, key1 != key2, but H(key1) == H(key2).
  • Synonyms: distinct keys that have the same hash function value.
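The condition key1 != key2 but H(key1) == H(key2) can be checked directly. The sketch below is only an illustration: the hash function `h` (remainder modulo a table size of 7) and the two keys are assumptions chosen to produce a collision, not values from the original figures.

```python
def h(key, m=7):
    """Hypothetical hash function for illustration: remainder modulo table size m."""
    return key % m


key1, key2 = 3, 10

# Two different keys that land on the same address are synonyms,
# and storing both of them causes a collision.
is_collision = key1 != key2 and h(key1) == h(key2)
print(h(key1), h(key2), is_collision)
```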

2. Construction methods for hash functions

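Commonly used hash functions include the direct addressing method, the division-remainder method, the digit analysis method, the mid-square method, the folding method, and the random number method.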

1. Direct addressing method

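The hash address is a linear function of the key: H(key) = a * key + b, where a and b are constants. The earlier example H(key) = key is the special case a = 1, b = 0.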
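A minimal sketch of direct addressing, assuming the usual linear form H(key) = a * key + b. The function name `direct_address`, the constants, the table size, and the sample keys are all chosen here for illustration.

```python
def direct_address(key, a=1, b=0):
    """Direct addressing: the address is a linear function of the key."""
    return a * key + b


# With a = 1 and b = 0 this reduces to H(key) = key.
table = [None] * 10
for key in (3, 9, 5):
    table[direct_address(key)] = key

print(table)
```

Because each key gets its own address, no collisions occur, but the table must be as large as the range of possible keys.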

2. Division-remainder method

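The hash address is the remainder of the key divided by p: H(key) = key mod p.
p is generally taken to be the length m of the hash table, or a prime number no larger than m so that the keys are spread more evenly.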
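A short sketch of the division-remainder method, H(key) = key mod p. The table length 13 and the sample keys are assumptions picked for illustration.

```python
def h_mod(key, p):
    """Division-remainder method: the address is the remainder of key / p."""
    return key % p


m = 13  # assumed table length; p is set to m here
keys = [19, 14, 23, 1, 68]
addresses = [h_mod(k, m) for k in keys]
print(addresses)  # 14 and 1 share address 1, so they are synonyms
```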

3. Other

  • Digit analysis method: analyze the set of keys in advance, for example the dates of birth of a group of employees. The leading digits (the year) are roughly the same for everyone, so using them would make collisions very likely. The trailing digits (the month and day) vary much more, so building the hash address from them significantly reduces the probability of collision. In short, digit analysis looks for regularities in the keys and uses the most evenly distributed digits to construct a hash address with a low probability of collision.
  • Mid-square method: square the key and take the middle digits of the result as the hash address.
  • Folding method: divide the key into several parts with the same number of digits (the last part may have fewer), then take the sum of these parts, discarding the carry, as the hash address.
  • Random number method: choose a random function and use the key as its seed to generate a random value as the hash address. It is usually used when the keys differ in length.
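The four methods above can each be sketched in a few lines. These are illustrative versions only: the function names, the number of digits kept, the "YYYYMMDD" date format, and the sample keys are all assumptions, and real implementations tune these choices to the data.

```python
import random


def digit_analysis(birth_date):
    """Digit analysis: keep the last four digits (month and day) of 'YYYYMMDD'."""
    return int(birth_date[-4:])


def mid_square(key, digits=2):
    """Mid-square: take `digits` digits from the middle of key squared."""
    squared = str(key * key)
    start = len(squared) // 2 - digits // 2
    return int(squared[start:start + digits])


def fold(key, part_len=3):
    """Folding: split the key into parts, add them, and discard the carry."""
    s = str(key)
    parts = [int(s[i:i + part_len]) for i in range(0, len(s), part_len)]
    return sum(parts) % 10 ** part_len


def random_hash(key, m):
    """Random number method: seed a generator with the key, draw an address in [0, m)."""
    return random.Random(key).randrange(m)


print(digit_analysis("19950312"))   # month and day only
print(mid_square(1234))             # 1234 * 1234 = 1522756
print(fold(987654321))              # 987 + 654 + 321 = 1962, carry dropped
print(random_hash("hash", 13) == random_hash("hash", 13))
```

Seeding with the key makes `random_hash` repeatable: the same key always yields the same address, which any hash function requires.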

3. Methods for handling collisions


1. Open addressing (open address method)

The basic idea: when a collision occurs, probe for the next empty hash address. As long as the hash table is large enough, an empty address can always be found and the record stored there.
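A minimal sketch of open addressing, assuming linear probing (try the next slot, wrapping around) as the probing sequence and H(key) = key mod m as the hash function. Other probing sequences, such as quadratic or pseudo-random probing, fit the same structure. The class name and sample keys are hypothetical.

```python
class LinearProbingTable:
    """Open addressing with linear probing (illustrative sketch, no deletion)."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m

    def _probe(self, key):
        # Visit H(key), H(key) + 1, ... modulo m, at most m slots in total.
        start = key % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key):
        for addr in self._probe(key):
            if self.slots[addr] is None or self.slots[addr] == key:
                self.slots[addr] = key
                return addr
        raise OverflowError("hash table is full")

    def search(self, key):
        for addr in self._probe(key):
            if self.slots[addr] is None:
                return -1          # reached an empty slot: key is absent
            if self.slots[addr] == key:
                return addr
        return -1


table = LinearProbingTable(7)
positions = [table.insert(k) for k in (3, 10, 17, 4)]
print(positions)         # 3, 10 and 17 all hash to address 3
print(table.search(17))
print(table.search(24))
```

Deletion is left out on purpose: simply emptying a slot would break the probe sequence for keys stored after it, so a real table needs a "deleted" marker.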

2. Chain address method (separate chaining)

The basic idea: records with the same hash address are linked into a singly linked list, so m hash addresses give m singly linked lists. An array stores the head pointers of these m lists, forming a dynamic structure.

Example: [figure omitted]
Steps to build a hash table with the chain address method: [figure omitted]
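One common way to build a table with the chain address method is: compute the hash address of the key, search the list at that address, and insert a new node only if the key is not already there. The sketch below follows that outline under stated assumptions: H(key) = key mod m, insertion at the head of the list, and hypothetical class names and sample keys. It is not a reconstruction of the omitted figure.

```python
class Node:
    """One record in a singly linked list."""

    def __init__(self, key, next=None):
        self.key = key
        self.next = next


class ChainedTable:
    """Chain address method: an array of m head pointers."""

    def __init__(self, m):
        self.m = m
        self.heads = [None] * m

    def search(self, key):
        # Walk the list at H(key) until the key is found or the list ends.
        node = self.heads[key % self.m]
        while node is not None and node.key != key:
            node = node.next
        return node

    def insert(self, key):
        if self.search(key) is None:
            addr = key % self.m
            # Head insertion: the new node points at the old first node.
            self.heads[addr] = Node(key, self.heads[addr])

    def chain(self, addr):
        keys, node = [], self.heads[addr]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


table = ChainedTable(7)
for k in (3, 10, 17, 4, 3):   # the second 3 is a duplicate and is skipped
    table.insert(k)
print(table.chain(3), table.chain(4), table.chain(0))
```

Inserting at the tail instead would keep each list in arrival order; either choice gives a valid table.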

4. Searching a hash table

Note: ASL stands for average search length, here for a successful search. It is the expected number of keys that must be compared with the given value in order to locate a record in the lookup table.
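For a successful search, the ASL can be computed as the total number of comparisons needed to find each stored key, divided by the number of keys n. The sketch below does this for the two collision strategies. It assumes H(key) = key mod m, linear probing for open addressing, tail insertion for chaining, distinct keys, and equal search probability for every key. The function names and sample keys are hypothetical.

```python
def asl_linear_probing(keys, m):
    """ASL (success) for open addressing with linear probing; keys must be distinct."""
    if len(keys) > m:
        raise ValueError("more keys than slots")
    slots = [None] * m
    total = 0
    for key in keys:
        addr, probes = key % m, 1
        while slots[addr] is not None:
            addr = (addr + 1) % m
            probes += 1
        slots[addr] = key
        # Finding the key later repeats exactly these probes.
        total += probes
    return total / len(keys)


def asl_chaining(keys, m):
    """ASL (success) for the chain address method with tail insertion."""
    chains = [[] for _ in range(m)]
    for key in keys:
        chains[key % m].append(key)
    # The i-th node of a chain costs i comparisons to reach.
    total = sum(i for chain in chains for i in range(1, len(chain) + 1))
    return total / len(keys)


keys = [3, 10, 17, 4]
print(asl_linear_probing(keys, 7))  # (1 + 2 + 3 + 3) / 4
print(asl_chaining(keys, 7))        # (1 + 2 + 3 + 1) / 4
```

On this small example chaining gives the shorter ASL, because the key 4 is not pushed out of its own address by the synonyms of 3.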

Origin blog.csdn.net/qq_43424037/article/details/113727318