One article to clarify the hash table

Introduction

Why do we need hash tables? Because many applications require this kind of data structure. For example, a compiler must manage variables and their attributes efficiently in order to decide whether an operation is legal (whether a variable is defined more than once, whether a data type is valid, and so on). As another example, some applications compare strings, which costs more than comparing numbers. Can a string be converted into a number and then processed?

These questions are answered step by step as we study the hash table.

Main text

Groundwork

First, review the search methods already known, such as binary search on a sorted array and search trees.
Here is a very interesting practical example. When you log in to QQ, how does the server quickly find your record among its huge user base and verify it?
A static search method such as binary search (dichotomy) can find the record, but it requires the records to stay sorted, so adding or deleting a user is expensive.
So the data structure that suits this scenario must be both efficient and dynamic (records can be added or deleted at any time). Let's first discuss the essence of search, which is to find an object's location based on the object itself. There are two ways of thinking:

  • Arrange the objects in order, either a total order (a sorted array, etc.) or a partial order (a search tree)
  • Directly "calculate" the position of the object, which leads to hashing

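Basic introduction

A hash table has two basic tasks: compute the storage position of each key with a hash function, and resolve conflicts when several keys are mapped to the same position.
In the ideal case a lookup takes roughly constant time, but the actual complexity depends on the cost of computing the hash function and on how conflicts are handled.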
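To make the idea of "calculating" a position concrete, here is a minimal sketch. It assumes integer keys, a hypothetical table size of 11, and keys chosen so that no two of them land on the same slot; the names `TABLE_SIZE`, `h`, `insert` and `find` are made up for illustration.

```python
TABLE_SIZE = 11  # hypothetical table size, chosen only for illustration


def h(key: int) -> int:
    """Map a key to a slot index (a simple remainder is assumed here)."""
    return key % TABLE_SIZE


table = [None] * TABLE_SIZE


def insert(key: int) -> None:
    # Store the key at its computed position.
    # Conflicts are deliberately ignored in this sketch.
    table[h(key)] = key


def find(key: int) -> bool:
    # One hash computation plus one comparison,
    # no matter how many keys are stored.
    return table[h(key)] == key


for k in (18, 23, 11, 20):  # these keys map to slots 7, 1, 0 and 9
    insert(k)
```

With this setup `find(23)` succeeds after a single comparison, and `find(12)` fails after a single comparison because slot 1 holds 23.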

Hash collision

This brings in another concept: the hash conflict (collision), which occurs when two different keys are mapped to the same position.
What should be done when there is a conflict? The key that arrives later has no position. The intuitive idea is to use a two-dimensional array so that each position has extra room for conflicting elements. But this kind of relief is limited, so how can the problem be solved effectively? The first step is to design a "good" hash function, which should generally be simple and fast to compute and should distribute keys evenly over the address space so that conflicts are rare.

Hash function

  • Direct addressing: take a linear function of the key, h(key) = a × key + b

  • Division (remainder) method: h(key) = key mod p, where p is usually a prime no larger than the table size

  • Digit analysis: use only the digits of the key that vary the most. For example, the last few digits of an ID card number are more random

  • Folding method / mid-square method: split the key into pieces and add them up, or square the key and take the middle digits
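The methods listed above can be sketched as follows. This is only an illustration: the function names, the parameters (`a`, `b`, `p`, `positions`, `width`) and the sample keys are assumptions, not something taken from the original figures.

```python
def direct_addressing(key: int, a: int = 1, b: int = 0) -> int:
    # Linear function of the key: h(key) = a * key + b
    return a * key + b


def division(key: int, p: int) -> int:
    # Remainder method: h(key) = key mod p
    return key % p


def digit_analysis(key: str, positions: tuple) -> int:
    # Keep only the digits at the given (more random) positions.
    return int("".join(key[i] for i in positions))


def folding(key: int, width: int, table_size: int) -> int:
    # Cut the decimal digits into pieces of `width` digits and add them.
    digits = str(key)
    parts = [int(digits[i:i + width]) for i in range(0, len(digits), width)]
    return sum(parts) % table_size


def mid_square(key: int, width: int) -> int:
    # Square the key and take `width` digits from the middle.
    squared = str(key * key)
    start = (len(squared) - width) // 2
    return int(squared[start:start + width])
```

For example, `folding(56793542, 3, 1000)` adds 567 + 935 + 42 = 1544 and keeps 544, while `mid_square(1234, 2)` squares 1234 to 1522756 and takes the middle digits 22.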

What if the key is a string?

  • ASCII code addition method: add up the ASCII codes of the characters and take the remainder by the table size
    But the range of the resulting values is too narrow, so many hash conflicts occur, and the method can be improved.
    [Figure: improved hash function for strings]
    The following is a very clever and fast way to code the calculation, which is worth studying:
    [Figure: code for the fast string hash calculation]
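Since the original code is not reproduced here, the following is a hedged sketch of one widely used improvement rather than the author's exact code: treat the string as a number in base 32 and do the multiplication with a left shift. The function names and the table sizes in the example are assumptions.

```python
def additive_hash(key: str, table_size: int) -> int:
    # ASCII addition: character order is ignored, so anagrams always conflict.
    return sum(ord(ch) for ch in key) % table_size


def shift_hash(key: str, table_size: int) -> int:
    # Assumed improvement: read the string as base-32 digits.
    # (h << 5) is h * 32, computed with a shift instead of a multiply.
    h = 0
    for ch in key:
        h = (h << 5) + ord(ch)
    return h % table_size
```

With the additive version, "tea" and "eat" both sum to 314 and therefore conflict. With the shift version and a table size of 1009 they land in slots 24 and 699. In a language with fixed-width integers, `h` would normally be an unsigned type so that overflow simply wraps around.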

Conflict handling method

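Usually, when a conflict is encountered, there are two common ideas:

  1. Change the location and store again (open address method). If position h(key) is already occupied, the i-th attempt tries position (h(key) + d_i) modulo the table size, where d_i is the detection offset.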

1. Linear detection

Linear detection tries the following positions one at a time, so the offset d_i is simply i.
The disadvantage is that conflicts tend to feed on each other: every conflict probes the neighboring positions, so keys pile up around the conflict point and the pile keeps growing. This is the phenomenon of conflict aggregation (clustering).
[Figure: keys piling up around a conflict position under linear detection]
The performance of a hash table can be evaluated by the average search length for successful searches, ASLs (the average number of comparisons needed to find a key that is in the table), and the average search length for unsuccessful searches, ASLu (the average number of comparisons needed to conclude that a key is not in the table).
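Linear detection and the two average search lengths can be sketched as follows. The table size of 11, the hash function key mod 11 and the list of keys are illustrative assumptions; the class and function names are made up for this example.

```python
EMPTY = None


class LinearProbingTable:
    def __init__(self, size: int):
        self.size = size
        self.slots = [EMPTY] * size

    def insert(self, key: int) -> int:
        """Insert key and return how many positions were examined."""
        home = key % self.size
        for d in range(self.size):          # d is the offset: 0, 1, 2, ...
            pos = (home + d) % self.size
            if self.slots[pos] is EMPTY or self.slots[pos] == key:
                self.slots[pos] = key
                return d + 1
        raise RuntimeError("table is full")


def asl_unsuccessful(table: LinearProbingTable) -> float:
    """Average comparisons for a miss, taken over every home position."""
    total = 0
    for home in range(table.size):
        d = 0
        while d < table.size and table.slots[(home + d) % table.size] is not EMPTY:
            d += 1
        total += d + 1                      # the empty slot is counted too
    return total / table.size


keys = [47, 7, 29, 11, 9, 84, 54, 20, 30]
table = LinearProbingTable(11)
probes = [table.insert(k) for k in keys]    # also the cost of finding each key later
asl_successful = sum(probes) / len(keys)
```

Here the keys need 1, 1, 2, 1, 1, 4, 3, 5 and 8 comparisons respectively, so ASLs = 26/9 ≈ 2.89, and ASLu = 56/11 ≈ 5.09. The last few keys show the aggregation effect: each one has to walk past the cluster built by the earlier conflicts.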

2. Square detection

Square detection uses squared offsets, for example +1², -1², +2², -2², and so on. Intuitively, this should give a more even distribution. Observe the process of storing a series of values in the hash table in the following table:
[Figure: table of values stored with square detection]

Its disadvantage is that the space may not be fully utilized: some positions may never be detected. Its advantage is that it alleviates the problem of conflict aggregation.
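A small sketch makes the "never detected" problem visible. It assumes the offset sequence +1², -1², +2², -2², ... and the hash function key mod size; the helper names are hypothetical.

```python
def quadratic_probe_sequence(home: int, size: int):
    """Yield home, home+1, home-1, home+4, home-4, ... (all modulo size)."""
    yield home % size
    for i in range(1, size // 2 + 1):
        yield (home + i * i) % size
        yield (home - i * i) % size


def insert(slots: list, key: int) -> int:
    """Return the slot used, or -1 if no reachable slot is free."""
    size = len(slots)
    for pos in quadratic_probe_sequence(key % size, size):
        if slots[pos] is None:
            slots[pos] = key
            return pos
    return -1


slots = [None] * 5
placed = [insert(slots, k) for k in (5, 6, 7)]   # fills slots 0, 1, 2
result = insert(slots, 11)                       # home slot is 1
```

With a table of size 5, key 11 only ever probes slots 1, 2 and 0, so `result` is -1 even though slots 3 and 4 are empty. With a prime table size of the form 4k + 3, such as 11, this offset sequence does reach every slot.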

3. Double hash detection

Here the detection offset d_i is computed from a second hash function, for example d_i = i × h2(key).
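As a hedged sketch of double hash detection, the code below assumes a commonly used second hash function, h2(key) = p - (key mod p) with p a prime smaller than the table size. That choice, the table size of 11 and the sample keys are assumptions made for illustration.

```python
def double_hash_insert(slots: list, key: int, p: int = 7) -> int:
    """Insert key using offsets d_i = i * h2(key); return the slot used or -1."""
    size = len(slots)
    home = key % size
    step = p - (key % p)        # h2(key), always between 1 and p, never 0
    for i in range(size):
        pos = (home + i * step) % size
        if slots[pos] is None:
            slots[pos] = key
            return pos
    return -1


slots = [None] * 11
positions = [double_hash_insert(slots, k) for k in (18, 29, 40)]
```

All three keys have the same home slot 7, but their steps differ (29 uses a step of 6, 40 uses a step of 2), so they end up in slots 7, 2 and 9 instead of forming one cluster. A prime table size keeps every step coprime with the size, so the probe sequence can visit the whole table.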

  2. Organize conflicting objects at the same location in the same singly linked list (chain address method). Each position of the table holds the head of a list, and a search only walks the list at position h(key).
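The chain address method can be sketched like this. The class names, the hash function key mod size and the choice to insert at the head of the list are assumptions made for illustration.

```python
class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedTable:
    def __init__(self, size: int):
        self.size = size
        self.heads = [None] * size      # one singly linked list per position

    def find(self, key):
        node = self.heads[key % self.size]
        while node is not None and node.key != key:
            node = node.next
        return node                     # None when the key is absent

    def insert(self, key) -> bool:
        if self.find(key) is not None:
            return False                # already stored
        idx = key % self.size
        self.heads[idx] = Node(key, self.heads[idx])   # insert at the head
        return True

    def chain(self, idx: int) -> list:
        keys, node = [], self.heads[idx]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


table = ChainedTable(11)
for k in (7, 18, 29, 3):
    table.insert(k)
```

Keys 7, 18 and 29 all map to position 7 and end up in one list, `[29, 18, 7]`, while key 3 sits alone at position 3. Insertion and deletion never disturb other positions, which suits the dynamic scenarios this article is concerned with.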

Performance analysis

[Figure: performance analysis of the hash table]
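Performance is usually discussed in terms of the load factor α, the number of stored keys divided by the table size. As a rough sketch, the functions below evaluate the classic textbook approximations for the expected number of comparisons under open addressing. They are brought in here as an assumption for illustration and may differ from the analysis in the original figure; the function names are hypothetical.

```python
import math


def linear_probing_asl(alpha: float) -> tuple:
    """Approximate (successful, unsuccessful) search lengths, 0 < alpha < 1."""
    successful = 0.5 * (1 + 1 / (1 - alpha))
    unsuccessful = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    return successful, unsuccessful


def double_hashing_asl(alpha: float) -> tuple:
    """Same quantities when probes behave like independent random slots."""
    successful = -math.log(1 - alpha) / alpha
    unsuccessful = 1 / (1 - alpha)
    return successful, unsuccessful
```

At α = 0.5 linear detection needs about 1.5 comparisons for a hit and 2.5 for a miss, while double hash detection needs about 1.39 and 2. Both grow quickly as α approaches 1, which is why open addressing tables are kept well below full.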

Application

The key point to grasp: problems that suit hash search are those that need efficient lookup and whose data is mostly dynamic.

1. Word frequency statistics (using hash search to quickly find and insert)

[Figure: description of the word frequency statistics problem]
The implementation code is as follows:
[Figure: implementation code for word frequency statistics]
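The original implementation is not reproduced here, so the following is only a minimal sketch of the same idea. It leans on Python's built-in `dict`, which is itself a hash table, and the tokenization rule (lowercase, split on whitespace, strip common punctuation) is an assumption.

```python
def word_frequencies(text: str) -> dict:
    """Count how often each word occurs, using a hash table for find and insert."""
    counts = {}
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'()")
        if not word:
            continue
        # Find the word; insert it with count 0 if it is not there yet.
        counts[word] = counts.get(word, 0) + 1
    return counts


freq = word_frequencies("the cat and the hat. The end")
```

Here `freq["the"]` is 3 and every other word has a count of 1. Each word costs one hash lookup on average, regardless of how many distinct words have been seen so far.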

2. Find the "Phone Maniac" (the mobile phone number that appears most frequently)

[Figure: problem description and solution for the "Phone Maniac" problem]
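A hedged sketch of the "Phone Maniac" search follows. It assumes that the input is a list of (caller, callee) pairs, that both numbers in a record count as one appearance each, and that ties are broken by the smallest number; none of these details are confirmed by the text above, and the sample numbers are made up.

```python
def phone_maniac(calls: list):
    """Return (number, count) for the most frequent number, or None if empty."""
    counts = {}
    for caller, callee in calls:
        for number in (caller, callee):
            counts[number] = counts.get(number, 0) + 1
    if not counts:
        return None
    # Highest count first; among ties, the smallest number (assumed rule).
    best = min(counts, key=lambda n: (-counts[n], n))
    return best, counts[best]


calls = [
    ("13005711862", "13588625832"),
    ("13505711862", "13088625832"),
    ("13588625832", "18087925832"),
    ("15005713862", "13588625832"),
]
winner = phone_maniac(calls)
```

For these four records `winner` is `("13588625832", 3)`. The numbers are kept as strings, so the hash table never has to deal with an 11-digit integer range directly.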

Origin blog.csdn.net/weixin_41896265/article/details/108425506