Hash table summary (personal interview notes, so there may be errors)

Everything in my articles is personal opinion. I hope you will look at other articles after reading this one, so that my mistakes do not mislead you.

A hash table is a data structure in which every value goes through a mapping before it is inserted into the table.

The more formal statement is: a hash table (also called a hash map) is a data structure that is accessed directly according to the key. It accesses records by mapping a key to a location in the table, which speeds up access. This mapping function is called a hash function, and the array that stores the records is called a hash table.

As I understand it, a key has to be mapped once before it enters the hash table. The mapping is generally a modulus operation, and the key is then placed at the resulting position. That position may already be occupied by another key, in which case the hash conflict has to be resolved. Once the conflict is resolved, the key is stored normally. If instead the table does not have enough space, the expansion mechanism needs to be triggered.

This introduction is therefore divided into five parts:

1. Hash table insertion

2. Hash collision

3. Hash expansion

4. Hash lookup

5. Hash delete

Hash table insertion

The data inserted into a hash table is the key. The key can be a number, a letter, a string, and so on, or data contained in different objects, which I was asked about in an interview. In fact, all of these cases can be handled with one way of thinking.

Storing numbers is very simple: take the modulus of the number to determine its position. You then need to check whether that position is already occupied (a conflict) and, if so, move to another position.
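As a minimal sketch of the modulus step (the table size of 5 and the helper name `slot_for` are assumptions made only for illustration), two different numbers can land in the same slot, which is exactly the conflict that has to be checked:

```python
TABLE_SIZE = 5  # assumed table size, chosen only for illustration


def slot_for(key: int, size: int = TABLE_SIZE) -> int:
    """Map an integer key to a slot index by taking the modulus."""
    return key % size


# 7 and 12 are different keys, but both map to the same slot: a conflict.
print(slot_for(7), slot_for(12))
```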

Storing letters raises the question of how to map them. Ignoring case, the data is based on the 26 letters, so subtracting 'a' from a letter maps a-z to 0-25, which is a simple mapping. Mixed-case letters are a bit more complicated, so they are generally converted to a single case first.
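The letter mapping can be sketched as follows. The function name `letter_slot` is hypothetical, and the sketch assumes plain ASCII letters that are lowercased first:

```python
def letter_slot(ch: str) -> int:
    """Map a single letter to 0-25, ignoring case."""
    lower = ch.lower()
    if len(lower) != 1 or not ("a" <= lower <= "z"):
        raise ValueError(f"expected a single ASCII letter, got {ch!r}")
    return ord(lower) - ord("a")  # subtracting 'a' gives 0 for a, 25 for z


print(letter_slot("a"), letter_slot("Z"))
```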

Storing strings is a more common use. Relatively long data can be stored as follows.

You can store the characters layer by layer, for example with the first layer indexed by the first letter, and so on. Done naively, this may require a very large space, because the first layer has 26 entries and the second has 26*26, so linked lists are used to save space, just like the hashmap of C#. The stored data may also be more complicated. How to map it in that case brings us to a concept: the hash function.
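One common alternative to the layered layout is to fold the whole string into a single number and then take the modulus. The sketch below is only an illustration under assumptions: the name `string_slot`, the default table size of 16, and the multiplier 31 (the constant Java's `String.hashCode` uses) are choices made here, not something prescribed by this article.

```python
def string_slot(text: str, size: int = 16, base: int = 31) -> int:
    """Polynomial hash: fold each character into one number, modulo size."""
    h = 0
    for ch in text:
        # Multiply the running value by the base, add the character code,
        # and reduce modulo the table size so the value stays small.
        h = (h * base + ord(ch)) % size
    return h


print(string_slot("hash"), string_slot("table"))
```

Strings of any length map into the range 0 to size - 1, so the space problem of the layered layout does not arise, at the cost of possible conflicts.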

As the name implies, the hash function is the function that determines how keys are mapped. Because the stored data can be complicated, there are several standard methods to choose from, so the mapping does not have to be rewritten during development. The methods fall into several categories:

The direct addressing method: uses the data itself directly as the hash address

The digit analysis method: chooses which parts of the data to use as the key according to how the data is distributed; for example, a date such as 1999.3.26 is split into its parts one by one

The mid-square method: takes the middle digits of the square of the key as the hash address

The folding method: divides the key into several parts with the same number of digits, then takes the sum of these parts (discarding any carry) as the hash address

The division (remainder) method: the more common method; the remainder obtained after dividing the key is the hash address

The random number method: selects a random function and takes its value for the key as the hash address

With a hash function chosen, insertion works as follows:

Step1. Take the key of a data element and calculate its storage address D=H(key) in the hash table. If the storage space at address D is not occupied, store the data element there; otherwise a conflict occurs, and Step 2 is executed.

Step2. Using the chosen conflict handling method, calculate the next storage address for the data element whose key is key. If the storage space at that address is not occupied, store the element there; otherwise repeat Step 2 until an unoccupied storage address is found.

One thing to pay attention to when inserting into a hash table: determine whether a suitable position exists before attempting the insert.
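The two steps can be sketched as follows. This is a minimal illustration, assuming H(key) = key mod table size as the hash function and linear probing as the conflict handling method; the function name `insert` and the table size of 5 are hypothetical.

```python
from typing import List, Optional


def insert(table: List[Optional[int]], key: int) -> int:
    """Insert key with linear probing and return the slot that was used.

    Raises OverflowError when no suitable position exists.
    """
    size = len(table)
    start = key % size                      # Step 1: D = H(key)
    for step in range(size):                # Step 2: try the next address
        slot = (start + step) % size
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


table: List[Optional[int]] = [None] * 5
# 7, 12 and 17 all hash to slot 2, so the later keys are pushed onward.
print([insert(table, k) for k in (7, 12, 17)])
```

The loop visits each slot at most once, so a full table is reported as an error instead of looping forever.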

In fact, during interviews I often forgot that a hash table stores values after mapping them (through the hash function). As a result, I had no idea how letters and strings would be stored when I was asked, which is something to watch out for.
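To make the mapping step more concrete, here is a rough sketch of three of the hash function methods listed earlier: taking the middle digits of the square, folding, and taking the remainder. The function names, the digit counts, and the restriction to non-negative integer keys are all assumptions made for this illustration.

```python
def mid_square(key: int, digits: int = 3) -> int:
    """Square the key and keep `digits` digits from the middle."""
    squared = str(key * key)
    start = max(0, len(squared) // 2 - digits // 2)
    return int(squared[start:start + digits])


def folding(key: int, part_len: int = 3) -> int:
    """Split the key into parts of part_len digits and add them up.

    The carry is discarded by keeping only the last part_len digits.
    The final part may be shorter when the digits do not divide evenly.
    """
    text = str(key)
    parts = [int(text[i:i + part_len]) for i in range(0, len(text), part_len)]
    return sum(parts) % (10 ** part_len)


def division_remainder(key: int, size: int) -> int:
    """Use the remainder of key divided by size as the address."""
    return key % size


print(mid_square(1234), folding(123456789), division_remainder(100, 7))
```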

Hash collision

A hash collision is the problem of two different values ending up at the same position after the hash function maps them.

The methods used to resolve it are:

The zipper method (separate chaining): uses a linked list at each position to store the conflicting values, so they do not need to go elsewhere

The multi-hashing method: prepares several hash functions and hashes again with the next one when a key maps to an address that is already taken. This method is more useful for storing alphabetic or Chinese text or some special numbers. It may be tested during the interview.

The open address method: computes the next address to try with the formula H_i = (H(key) + d_i) MOD m, i = 1, 2, ..., k. With linear probing, d_i increases by 1 each time until a vacancy is found; with quadratic probing, d_i is a square that may be positive or negative (1, -1, 4, -4, ...); with random probing, d_i is an arbitrary value
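The open address formula can be sketched as a function that lists the addresses tried after the home address H(key). The name `probe_sequence` and its parameters are hypothetical, and random probing is left out because it depends on the random function chosen.

```python
from typing import List


def probe_sequence(home: int, size: int, method: str, count: int) -> List[int]:
    """Return the first `count` addresses (home + d_i) % size."""
    if method == "linear":
        offsets = list(range(1, count + 1))        # d_i = 1, 2, 3, ...
    elif method == "quadratic":
        offsets = []
        for i in range(1, count + 1):              # d_i = 1, -1, 4, -4, ...
            k = (i + 1) // 2
            offsets.append(k * k if i % 2 else -(k * k))
    else:
        raise ValueError(f"unknown method: {method}")
    return [(home + d) % size for d in offsets]


print(probe_sequence(2, 11, "linear", 3))
print(probe_sequence(2, 11, "quadratic", 4))
```

Quadratic probing spreads the attempts out on both sides of the home address, which avoids the long runs of occupied slots that linear probing tends to build up.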

Hash expansion

Hash expansion is the mechanism that runs when the hash table can no longer store data comfortably. We do not wait until the table is completely full; expansion is triggered once the table reaches a certain fill percentage (the load factor, generally 0.75, that is, 75%). The expansion takes three steps:

  1. Create a new array of length newCap, twice the old length.
  2. Traverse oldTab, take out the key of each element, and re-hash it to compute the new index i = (newCap - 1) & hash. This bit mask equals hash mod newCap only when newCap is a power of two.
  3. Reinsert the element at that index.

The new size is generally twice the original size. If the growth is too small, expansion happens frequently and costs performance. If it is too large, much of the space may never be used. We will talk about this later.
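The expansion steps can be sketched with a tiny chained table. Everything here is an assumption for illustration: the class name `ChainedSet`, the starting capacity of 4, the use of Python lists in place of linked lists, and Python's built-in `hash` as the hash function. The 0.75 threshold and the index formula come from the text above.

```python
from typing import List

LOAD_FACTOR = 0.75  # threshold quoted in the text


class ChainedSet:
    """Tiny set of ints with chaining; capacity must be a power of two."""

    def __init__(self, capacity: int = 4) -> None:
        self.buckets: List[List[int]] = [[] for _ in range(capacity)]
        self.count = 0

    @staticmethod
    def _index(key: int, capacity: int) -> int:
        return (capacity - 1) & hash(key)        # i = (cap - 1) & hash

    def add(self, key: int) -> None:
        bucket = self.buckets[self._index(key, len(self.buckets))]
        if key in bucket:
            return
        bucket.append(key)
        self.count += 1
        if self.count > LOAD_FACTOR * len(self.buckets):
            self._resize()

    def _resize(self) -> None:
        new_cap = 2 * len(self.buckets)          # 1. new array, twice as long
        new_buckets: List[List[int]] = [[] for _ in range(new_cap)]
        for bucket in self.buckets:              # 2. traverse the old table
            for key in bucket:
                i = self._index(key, new_cap)    #    re-hash for the new size
                new_buckets[i].append(key)       # 3. reinsert the element
        self.buckets = new_buckets


s = ChainedSet()
for k in range(4):
    s.add(k)
print(len(s.buckets))  # the fourth insert pushed the table past 75% full
```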

Hash lookup

Lookup uses the hash function to locate data, which is a bit like hash insertion, except that insertion first finds an empty address and stores the value once a suitable address is confirmed. The time complexity of a lookup is very small (O(1) on average), and normally no probing loop is required. However, if the data was stored through conflict resolution, the search has to follow the same conflict path to the address where the data ended up.
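A lookup under linear probing can be sketched like this, assuming the same H(key) = key mod table size used for insertion; the name `find` and the sample table are hypothetical.

```python
from typing import List, Optional


def find(table: List[Optional[int]], key: int) -> int:
    """Return the slot holding key, or -1 if it is absent."""
    size = len(table)
    start = key % size
    for step in range(size):
        slot = (start + step) % size
        if table[slot] is None:      # a never-used slot ends the search
            return -1
        if table[slot] == key:
            return slot
    return -1                        # every slot was checked without a match


# 7, 12 and 17 all hashed to slot 2 and were placed by linear probing.
table: List[Optional[int]] = [None, None, 7, 12, 17]
print(find(table, 7), find(table, 17), find(table, 22))
```

Finding 7 takes a single step, while finding 17 has to walk past the two keys that conflicted with it.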

Hash delete

Hash deletion mainly has to handle two cases, linear probing and the open chain method (separate chaining), because the two structures need to be processed differently.

1. Linear probing: the data is stored in the array itself, and a deleted entry must leave a mark on its position, indicating that the slot is free but was occupied before, because that position may have been passed over earlier while resolving a hash collision.

Suppose you store three keys k1, k2, and k3 that all map to the same address, then delete k2 and look for k3.

If the search for k3 finds the position of k2 empty and reports a failed search, the result is wrong. So a mark needs to be left at the position of k2 to show that it was occupied before, which avoids this misjudgment during the search.
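The k1, k2, k3 scenario can be sketched with a deletion mark. The sentinel names `EMPTY` and `DELETED`, the concrete keys 7, 12, and 17, and the table size of 5 are assumptions for illustration.

```python
from typing import List

EMPTY = object()    # slot has never been used
DELETED = object()  # mark: slot was occupied, then its key was removed


def find(table: List[object], key: int) -> int:
    """Return the slot holding key, or -1 if it is absent."""
    size = len(table)
    for step in range(size):
        slot = (key % size + step) % size
        if table[slot] is EMPTY:     # only a never-used slot stops the search
            return -1
        if table[slot] == key:       # a DELETED mark never equals a key
            return slot
    return -1


def delete(table: List[object], key: int) -> bool:
    """Replace key with a DELETED mark; return False if key is absent."""
    slot = find(table, key)
    if slot == -1:
        return False
    table[slot] = DELETED            # mark the slot instead of emptying it
    return True


# k1 = 7, k2 = 12, k3 = 17 all hash to slot 2 in a table of size 5.
table: List[object] = [EMPTY, EMPTY, 7, 12, 17]
delete(table, 12)
print(find(table, 17))  # k3 is still reachable past the mark left by k2
```

If `delete` reset the slot to `EMPTY` instead, the search for 17 would stop at slot 3 and wrongly report a failure.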

The other structure is the open chain method, that is, an array stores the head nodes and each head links to its own linked list. If the slot has a conflict (several nodes in its list), the corresponding node is simply removed from the linked list; if there is no conflict (that is, the list at that array address holds only one entry), delete it and mark the array address as empty.
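Deletion under the open chain method can be sketched as below. Python lists stand in for the linked lists here, and the name `remove` and the sample buckets are hypothetical; an empty list plays the role of the mark on an unused array address.

```python
from typing import List


def remove(buckets: List[List[int]], key: int) -> bool:
    """Delete key from a chained table; return True if it was present."""
    chain = buckets[key % len(buckets)]
    if key not in chain:
        return False
    chain.remove(key)    # unlink only this key; the rest of the chain stays
    return True


# 7, 12 and 17 conflict at index 2 and share one chain.
buckets: List[List[int]] = [[], [], [7, 12, 17], [], []]
remove(buckets, 12)
print(buckets[2])
```

Unlike linear probing, no special mark is needed for later searches, because the remaining keys stay linked in the same chain.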

Epilogue

Questions about hash tables come up frequently in interviews, but my answers were vague every time, so I wrote this article specifically. There may be some mistakes, but I am writing it down first and will come back to revise it. I used to think a hash table was just a structure that stores data after mapping it. However, my incomplete understanding cost me in many interviews and taught me a lot of lessons. I hope you will dig deeper.

Origin blog.csdn.net/hoxidohanabi/article/details/127964348