Introduction
Why do we need hash tables? Because many applications require this kind of data structure. For example, a compiler must manage variables and their attributes efficiently in order to decide whether an operation is legal (whether a variable is defined more than once, whether a data type is valid, and so on). Another example is string comparison: comparing strings costs more than comparing numbers, so can a string be converted into a number and then processed?
These questions can all be answered step by step in the process of learning the hash table.
Main text
Background
First review the known search methods:
Here is an interesting practical example. When you log in to QQ, how does the server quickly find and verify your information among its huge user base?
Binary search, a static search method, can find the record quickly, but it needs the data kept in sorted order, so adding or deleting records is expensive.
The data structure that suits this scenario must therefore be both efficient and dynamic (records can be added or deleted at any time). First, consider the essence of search, which is to find an object's location given the object. There are two ways of thinking:
- Arrange objects in order, which can be total order (array, etc.) or partial order (search tree)
- Directly "calculate" the position of the object, which leads to hashing
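Basic introduction
The basic work of a hash table is to compute a position from the key with a hash function and to resolve conflicts when two keys land on the same position.
The cost of an operation therefore depends on how expensive the function is to compute and on how often conflicts occur.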
Hash collision
This brings in another concept: the hash collision (conflict), where different keys are mapped to the same position.
What should be done when a conflict occurs? The element that arrives later has no position. An intuitive idea is to use a two-dimensional array, giving each position extra slots to hold such elements, but this relief is limited. How can the problem be solved effectively? It requires designing a "good" hash function, which generally meets the following conditions: it is simple to compute, and it distributes keys evenly over the address space.
Hash function
- Direct addressing
- Division remainder method
- Digit analysis: for example, the last few digits of an ID card number are more random
- Folding method / mid-square method
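The methods listed above for numeric keys can be sketched as follows. This is a minimal illustration, not the author's code: the function names, the parameters `a`, `b`, `digits` and `chunk_digits`, and the sample values are all assumptions chosen for the example.

```python
def direct_address(key, a=1, b=0):
    """Direct addressing: a linear function of the key, h(key) = a * key + b."""
    return a * key + b


def division_hash(key, table_size):
    """Division remainder method: h(key) = key mod table_size.

    table_size is usually chosen to be a prime number.
    """
    return key % table_size


def digit_analysis(id_number, digits=4):
    """Digit analysis: keep only the digits that vary most.

    Here we assume, as in the text, that the last few digits are the
    most random ones. id_number is a string of decimal digits.
    """
    return int(id_number[-digits:])


def folding_hash(key, chunk_digits=3):
    """Folding method: cut the key into chunks of chunk_digits digits,
    add the chunks, and keep the lowest chunk_digits digits of the sum."""
    base = 10 ** chunk_digits
    total = 0
    while key > 0:
        total += key % base   # take the lowest chunk
        key //= base          # drop it and continue with the rest
    return total % base


if __name__ == "__main__":
    print(direct_address(1995, a=1, b=-1990))    # birth year mapped to a small index
    print(division_hash(47, 11))
    print(digit_analysis("000000199001011234"))  # hypothetical ID string
    print(folding_hash(56793542))                # 542 + 793 + 056 = 1391, keep 391
```

Each function trades generality for speed in a different way; the division method is the usual default when nothing is known about the keys.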
What if the key is a string?
- ASCII code sum method
However, the range of the sum is too narrow, so many collisions occur. It can be improved with a cleverer, faster calculation that is worth studying.
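One widely used improvement of this kind treats the string as a number in base 32 and builds it up with left shifts. The sketch below assumes that technique; it is not necessarily the author's exact code, and the names `ascii_sum_hash` and `shift_hash` and the 32-bit mask are choices made for this example.

```python
def ascii_sum_hash(key, table_size):
    """ASCII code sum method: add the character codes, then take the remainder.

    Strings with the same letters in a different order always collide.
    """
    return sum(ord(ch) for ch in key) % table_size


def shift_hash(key, table_size):
    """Shift method: treat the string as a base-32 number.

    h = h * 32 + code is computed as (h << 5) + code, which is cheap.
    The mask keeps h within 32 bits, mimicking an unsigned int
    (an assumption of this sketch).
    """
    h = 0
    for ch in key:
        h = ((h << 5) + ord(ch)) & 0xFFFFFFFF
    return h % table_size


if __name__ == "__main__":
    size = 1009  # a prime table size, chosen for illustration
    for word in ("ab", "ba"):
        print(word, ascii_sum_hash(word, size), shift_hash(word, size))
```

With the sum method, "ab" and "ba" land on the same position, while the shift method separates them because each character's contribution depends on where it sits in the string.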
Conflict handling method
When a conflict occurs, there are two common ideas:
- Change location and store again (open addressing)
1. Linear probing
The disadvantage is that conflicts tend to follow one another: each new conflict probes the following positions in turn, so keys pile up around the place where the conflicts started, which produces clustering. It can be understood from the figure below.
The performance of a hash table can be evaluated by the average search length for successful searches, ASLs (the average number of comparisons until a key is found), and the average search length for unsuccessful searches, ASLu (the average number of comparisons until the search concludes that the key is absent).
2. Quadratic probing
Intuitively, one can also expect a more even distribution. Observe the process of storing a series of values in the hash table in the following table:
Its disadvantage is that the space may not be fully used, because some slots are never probed. Its advantage is that it alleviates the clustering problem.
3. Double hashing
Here d_i is the probe offset.
- Keep the conflicting objects at the same location in one singly linked list (separate chaining)
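The three open addressing strategies differ only in the offset d_i that is added to the home position on the i-th probe. The sketch below makes that explicit and measures ASLs as the average number of comparisons needed to reach each stored key. It is an illustration under assumptions: the home position is `key % size`, the quadratic offsets alternate +1, -1, +4, -4 and so on, the second hash `7 - key % 7` is one common textbook choice, and the sample keys are made up.

```python
def linear_offset(i, key):
    """Linear probing: d_i = i."""
    return i


def quadratic_offset(i, key):
    """Quadratic probing: d_i = 0, +1, -1, +4, -4, +9, -9, ..."""
    step = (i + 1) // 2
    return step * step if i % 2 == 1 else -(step * step)


def double_hash_offset(i, key):
    """Double hashing: d_i = i * h2(key), with h2(key) = 7 - key % 7 (assumed)."""
    return i * (7 - key % 7)


def insert(table, key, offset):
    """Store key and return how many positions were examined."""
    size = len(table)
    home = key % size
    for i in range(size):
        pos = (home + offset(i, key)) % size
        if table[pos] is None or table[pos] == key:
            table[pos] = key
            return i + 1
    # Can happen with quadratic probing: some slots are never probed.
    raise RuntimeError("no free slot reached for key %r" % key)


def build(keys, size, offset):
    """Insert all keys; return the table and the successful ASL."""
    table = [None] * size
    probes = [insert(table, k, offset) for k in keys]
    return table, sum(probes) / len(probes)


if __name__ == "__main__":
    keys = [47, 7, 29, 11, 9, 84, 54, 20, 30]
    for name, fn in [("linear", linear_offset),
                     ("quadratic", quadratic_offset),
                     ("double", double_hash_offset)]:
        table, asl = build(keys, 11, fn)
        print(name, table, round(asl, 2))
```

For these sample keys and a table of 11 slots, linear probing needs 26 comparisons in total (ASLs of about 2.89), while quadratic probing needs 18 (ASLs of 2.0), which shows the clustering effect described above. The measurement assumes that no key is ever deleted, so the number of probes at insertion equals the number of comparisons in a later successful search.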
Performance analysis
Application
The key point to grasp: problems suited to hash search need efficient lookup and are mostly dynamic.
1. Word frequency statistics (hash search makes both lookup and insertion fast)
The implementation code is as follows:
2. Find the "phone maniac" (the mobile phone number that appears most frequently)
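Both applications reduce to the same operation: look a key up, insert it if it is new, and otherwise increase its counter. A minimal sketch using separate chaining is given below. It is an assumption-laden illustration rather than the original implementation: the class name `ChainedCounter`, the default table size, the shift-based hash and the tie-breaking rule (smallest key wins) are all choices made here, and Python lists stand in for singly linked lists.

```python
class ChainedCounter:
    """Hash table with separate chaining that counts occurrences of strings."""

    def __init__(self, size=101):
        # One chain per slot; each entry is a [key, count] pair.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Shift-based string hash, kept within 32 bits.
        h = 0
        for ch in key:
            h = ((h << 5) + ord(ch)) & 0xFFFFFFFF
        return h % len(self.buckets)

    def add(self, key):
        """Find key in its chain and increment it, or insert it with count 1."""
        bucket = self.buckets[self._index(key)]
        for entry in bucket:
            if entry[0] == key:
                entry[1] += 1
                return
        bucket.append([key, 1])

    def count(self, key):
        """Return how many times key was added (0 if never)."""
        for k, c in self.buckets[self._index(key)]:
            if k == key:
                return c
        return 0

    def most_frequent(self):
        """Return (key, count) with the highest count; ties go to the smallest key."""
        best = None
        for bucket in self.buckets:
            for k, c in bucket:
                if best is None or c > best[1] or (c == best[1] and k < best[0]):
                    best = (k, c)
        return best


if __name__ == "__main__":
    words = ChainedCounter()
    for w in "the quick brown fox jumps over the lazy dog the end".split():
        words.add(w)
    print(words.count("the"))

    calls = ChainedCounter()
    for number in ["5550101", "5550102", "5550101", "5550103", "5550101", "5550102"]:
        calls.add(number)  # hypothetical phone numbers
    print(calls.most_frequent())
```

Chaining keeps insertion simple because a full slot never forces a search for another position, and the table keeps working even when it holds more keys than slots.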