Hash table theoretical basis

The internal implementation principle of the hash table

Textbook definition: a hash table is a data structure that is accessed directly by the value of a key. (Many readers will probably not find this intuitive.)

Let's approach the hash table through something familiar, the array. In fact, an array is itself a simple hash table.

The key in the hash table is the array index, and elements are accessed directly through that index.

A typical application of a hash table is to quickly determine whether an element appears in a set.

Example: suppose we want to check whether a given name belongs to a student at a school.

If we enumerate every student, the time complexity is O(n), but with a hash table the lookup takes only O(1) time on average.

How it works: we only need to put the names of the school's students into a hash table once; a query can then tell directly, through the index, whether the student is in the school.
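The school example can be sketched with the standard library's hash set. This is only an illustration: the class name SchoolRoster and its contains method are made up for this sketch, and std::unordered_set hides the hash function and the table itself.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: builds the roster once, after which each lookup
// takes O(1) time on average instead of scanning all names.
class SchoolRoster {
public:
    explicit SchoolRoster(const std::vector<std::string>& names)
        : names_(names.begin(), names.end()) {}

    // True if `name` was among the names passed to the constructor.
    bool contains(const std::string& name) const {
        return names_.find(name) != names_.end();
    }

private:
    std::unordered_set<std::string> names_;
};
```

Building the roster costs O(n) once; every later query avoids the O(n) enumeration.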

So how does a student's name relate to the hash table?

This is where the hash function comes in: it maps student names onto the hash table.

hash function

The hash function maps a student's name directly to an index in the hash table, so that querying that index quickly tells us whether the student is in the school.

The hash function converts a name into a number through hashCode, which uses a specific encoding to turn other data formats into numeric values.

But what if the value produced by hashCode is larger than the size of the hash table (larger than tableSize)?

To make sure every mapped index falls inside the hash table, we take the value modulo tableSize.
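The two steps (hashCode, then modulo) might look like the sketch below. Here std::hash stands in for the hashCode step, and the function name indexFor is an assumption of this example; the actual hash values are implementation-defined, so only the range of the result is guaranteed.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Maps a name to a slot in [0, tableSize). Assumes tableSize > 0.
std::size_t indexFor(const std::string& name, std::size_t tableSize) {
    // Step 1: turn the name into a (possibly very large) number.
    std::size_t hashCode = std::hash<std::string>{}(name);
    // Step 2: fold that number into the table with a modulo operation.
    return hashCode % tableSize;
}
```

The same name always yields the same index, which is what makes the later lookup possible.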

But what if the number of students is larger than the size of the hash table? Then, no matter how evenly the hash function spreads the names, several students' names will inevitably be mapped to the same index. This brings us to hash collisions.

hash collision

When multiple elements are mapped to the same index, this is called a hash collision.

There are generally two ways to resolve collisions: the zipper method (separate chaining) and linear probing.

zipper method

Suppose Xiao Li and Xiao Wang collide at index 1. The colliding elements are stored in a linked list attached to that slot, so both Xiao Li and Xiao Wang can still be found through the index.

(The size of the data is dataSize, the size of the hash table is tableSize)

The zipper method requires choosing an appropriate hash table size, so that memory is not wasted on empty array slots and lookup time is not inflated by overly long linked lists.
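A minimal sketch of the zipper method, assuming a fixed tableSize greater than 0 and no resizing. The class name ChainedNameSet is hypothetical, and std::hash again stands in for the hash function.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Hypothetical chained hash set: each slot holds a linked list of the
// names that collided at that index.
class ChainedNameSet {
public:
    explicit ChainedNameSet(std::size_t tableSize) : buckets_(tableSize) {}

    void insert(const std::string& name) {
        std::list<std::string>& chain = buckets_[slot(name)];
        for (const std::string& existing : chain) {
            if (existing == name) return;  // already stored
        }
        chain.push_back(name);  // colliding names simply extend the chain
    }

    bool contains(const std::string& name) const {
        const std::list<std::string>& chain = buckets_[slot(name)];
        for (const std::string& existing : chain) {
            if (existing == name) return true;
        }
        return false;
    }

private:
    std::size_t slot(const std::string& name) const {
        return std::hash<std::string>{}(name) % buckets_.size();
    }

    std::vector<std::list<std::string>> buckets_;
};
```

With a very small table every name lands in the same few chains and lookups degrade toward O(n), which is the trade-off described above.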

linear probing

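Linear probing requires tableSize to be greater than dataSize, because collisions are resolved using the free slots in the hash table itself: if the target slot is already occupied, we move forward to the next empty slot and store the element there.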
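Linear probing could be sketched as follows. The class name ProbingNameSet is hypothetical; the sketch assumes C++17 (for std::optional), a fixed tableSize greater than 0, and no deletion, which would need extra bookkeeping.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical open-addressing set: a collision is resolved by walking
// forward to the next free slot of the same table.
class ProbingNameSet {
public:
    explicit ProbingNameSet(std::size_t tableSize) : slots_(tableSize) {}

    // Returns false if the table is full and the name could not be stored.
    bool insert(const std::string& name) {
        std::size_t start = home(name);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            std::size_t i = (start + step) % slots_.size();
            if (!slots_[i].has_value()) {
                slots_[i] = name;
                return true;
            }
            if (*slots_[i] == name) return true;  // already present
        }
        return false;
    }

    bool contains(const std::string& name) const {
        std::size_t start = home(name);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            std::size_t i = (start + step) % slots_.size();
            if (!slots_[i].has_value()) return false;  // empty slot ends the probe
            if (*slots_[i] == name) return true;
        }
        return false;
    }

private:
    std::size_t home(const std::string& name) const {
        return std::hash<std::string>{}(name) % slots_.size();
    }

    std::vector<std::optional<std::string>> slots_;
};
```

Once every slot is taken, insert has nowhere left to probe, which is why tableSize has to exceed dataSize.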

Three common hash structures

array
set (collection)
map (mapping)

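In C++, set and map each come in three variants, which differ in their underlying implementation:

std::set, std::multiset, std::map, std::multimap: typically a red-black tree underneath, keys kept in order, lookup and insertion/deletion in O(log n).
std::unordered_set, std::unordered_map: a hash table underneath, keys unordered, lookup and insertion/deletion in O(1) on average.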

How to choose:

To solve a hashing problem, prefer unordered_set, because its lookup, insertion and deletion are the most efficient.
If the collection must be ordered, use set.
If the collection must be ordered and may contain duplicates, use multiset.
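The three choices can be compared side by side in a short sketch. The function names are made up for this illustration; each one simply loads the same input into a different container.

```cpp
#include <set>
#include <unordered_set>
#include <vector>

// Only membership matters: unordered_set (hash table, O(1) average lookup).
bool containsValue(const std::vector<int>& values, int target) {
    std::unordered_set<int> seen(values.begin(), values.end());
    return seen.count(target) > 0;
}

// Distinct values in ascending order: set.
std::vector<int> sortedUnique(const std::vector<int>& values) {
    std::set<int> ordered(values.begin(), values.end());
    return std::vector<int>(ordered.begin(), ordered.end());
}

// Ascending order with duplicates kept: multiset.
std::vector<int> sortedWithDuplicates(const std::vector<int>& values) {
    std::multiset<int> ordered(values.begin(), values.end());
    return std::vector<int>(ordered.begin(), ordered.end());
}
```

For the input {3, 1, 3, 2}, sortedUnique yields 1, 2, 3 and sortedWithDuplicates yields 1, 2, 3, 3, while an unordered_set gives no guarantee about iteration order.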

Summary: whenever we need to quickly determine whether an element appears in a set, hashing should come to mind. (It trades space for time.)

242. Valid Anagram

First thoughts on reading the problem

Problem description:
Given two strings s and t, write a function to determine whether t is an anagram of s.

Anagram: the two strings contain the same letters the same number of times; only the order may differ.

My idea is as follows:
To judge whether the letters of one word appear in another, the first thing that comes to mind is a hash table. Define an array to record the number of occurrences of each character in s.

Since there are only 26 lowercase letters and the ASCII codes from a to z are contiguous, define an array record of size 26, initialized to 0.

Traverse the string s and increment the element at index s[i] - 'a'. This counts the occurrences of each character in s.

Then traverse the string t and decrement the element at the index that each character of t maps to.

Finally, if any element of the record array is not 0, the two strings differ in the count of some character, so return false.

If all elements of the record array are 0, the strings s and t are anagrams, so return true.

Thoughts after reading Code Caprice

The idea is the same as mine.

Difficulties Encountered During Implementation

No difficulties encountered.

Code

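#include <string>
using std::string;

class Solution {
public:
    // Assumes s and t contain only lowercase letters 'a'..'z'.
    bool isAnagram(string s, string t) {
        int record[26] = {0};  // record[k] counts occurrences of the letter 'a' + k
        for (size_t i = 0; i < s.size(); i++) {
            record[s[i] - 'a']++;  // count each letter of s
        }
        for (size_t i = 0; i < t.size(); i++) {
            record[t[i] - 'a']--;  // cancel out each letter of t
        }
        for (int i = 0; i < 26; i++) {
            // A nonzero count means s and t differ in that letter.
            if (record[i] != 0) {
                return false;
            }
        }
        return true;
    }
};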

349. Intersection of Two Arrays

First thoughts on reading the problem

Problem description:

Given two arrays nums1 and nums2, return their intersection. Each element in the output must be unique. The order of the output does not matter.

My idea:

Note: each element in the output must be unique, that is, the result is deduplicated, and the order of the output does not matter.

The problem has since been changed to bound the range of the values, so an array can also be used as the hash table.
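The array-as-hash-table variant might look like this. The bound of 1000 on the values is an assumption of this sketch (check the constraints of the problem you are solving), and the function name intersectionBounded is made up here.

```cpp
#include <vector>

// Assumes every value lies in [0, 1000]; that bound is an assumption
// of this sketch, not something guaranteed by the code.
std::vector<int> intersectionBounded(const std::vector<int>& nums1,
                                     const std::vector<int>& nums2) {
    constexpr int kMaxValue = 1000;
    bool inFirst[kMaxValue + 1] = {false};  // inFirst[v]: v occurs in nums1
    bool emitted[kMaxValue + 1] = {false};  // emitted[v]: v is already in the result
    for (int num : nums1) {
        inFirst[num] = true;
    }
    std::vector<int> result;
    for (int num : nums2) {
        if (inFirst[num] && !emitted[num]) {
            emitted[num] = true;  // keeps the output free of duplicates
            result.push_back(num);
        }
    }
    return result;
}
```

The value itself serves as the index, so no hash function is needed; the price is an array sized by the value range rather than by the number of elements.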

Thoughts after reading Code Caprice

The video covers two approaches, set and array. Since I have used arrays before, I focused on understanding how to use set.

When to use set: if the hash values are few, widely scattered, and span a large range, an array would waste a lot of space. In that case a set should be used.

As noted above, C++ provides three set data structures:

std::set
std::multiset
std::unordered_set

The underlying implementations of std::set and std::multiset are red-black trees, and the underlying implementation of std::unordered_set is a hash table.

Since this problem does not require the data to be sorted and the result must not contain duplicates, unordered_set is chosen: of the three, it has the most efficient reads and writes.

The idea: put the elements of nums1 into an unordered_set, then traverse nums2 and insert every element that is found in that set into a result set.

Difficulties Encountered During Implementation

I am not yet very familiar with the unordered_set template and need to study it.

Code

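#include <unordered_set>
#include <vector>
using std::unordered_set;
using std::vector;

class Solution {
public:
    vector<int> intersection(vector<int>& nums1, vector<int>& nums2) {
        unordered_set<int> result_set;  // a set, so the result is deduplicated automatically
        unordered_set<int> nums_set(nums1.begin(), nums1.end());
        for (int num : nums2) {
            // Keep num only if it also appears in nums1.
            if (nums_set.find(num) != nums_set.end()) {
                result_set.insert(num);
            }
        }
        return vector<int>(result_set.begin(), result_set.end());
    }
};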

202. Happy Number

First thoughts on reading the problem

Problem description:

Write an algorithm to determine whether a number n is a happy number.

A "happy number" is defined as follows: for a positive integer, repeatedly replace the number with the sum of the squares of its digits. Repeat this process until the number becomes 1, or until it loops endlessly without ever reaching 1. If the process ends at 1, the number is a happy number.

Return True if n is a happy number; False if not.
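To make the definition concrete, the sketch below records the sequence of sums. The names digitSquareSum and happyTrace are made up for this illustration, and n is assumed to be a positive integer.

```cpp
#include <unordered_set>
#include <vector>

// Sum of the squares of the decimal digits of n (n assumed positive).
int digitSquareSum(int n) {
    int sum = 0;
    while (n > 0) {
        int digit = n % 10;
        sum += digit * digit;
        n /= 10;
    }
    return sum;
}

// Returns the successive sums starting from n, stopping when the sum
// reaches 1 or when a sum shows up a second time (the repeat is included).
std::vector<int> happyTrace(int n) {
    std::vector<int> trace;
    std::unordered_set<int> seen;
    while (true) {
        n = digitSquareSum(n);
        trace.push_back(n);
        if (n == 1 || !seen.insert(n).second) {
            return trace;
        }
    }
}
```

For n = 19 the trace is 82, 68, 100, 1, so 19 is a happy number. For n = 2 the trace is 4, 16, 37, 58, 89, 145, 42, 20, 4: the sum 4 comes back, so the process would loop forever.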

My idea:

Key point: if the process loops forever, some sum must appear again during the summing.

Therefore, use hashing to check whether a sum has been seen before. If it repeats, return false; otherwise keep computing sums until the result is 1.

An unordered_set can be used to check whether a sum has repeated.

Thoughts after reading Code Caprice

Same idea.

Difficulties Encountered During Implementation

I am not very familiar with extracting the individual digits of a number.

Code

#include <unordered_set>
using std::unordered_set;

class Solution {
public:
    // Returns the sum of the squares of the digits of n.
    int getSum(int n) {
        int sum = 0;
        while (n) {
            sum += (n % 10) * (n % 10);
            n /= 10;
        }
        return sum;
    }
    bool isHappy(int n) {
        unordered_set<int> set;
        while (1) {
            int sum = getSum(n);
            if (sum == 1) {
                return true;
            }
            // If this sum has appeared before, we are stuck in an infinite loop: return false immediately.
            if (set.find(sum) != set.end()) {
                return false;
            } else {
                set.insert(sum);
            }
            n = sum;
        }
    }
};

1. Two Sum

First thoughts on reading the problem

Problem description:

Given an integer array nums and a target value target, find the two integers in the array whose sum equals target, and return their array indices.

I found this problem difficult, and I don't know much about map.

Thoughts after reading Code Caprice

This problem needs a collection that stores the elements we have already traversed. While traversing the array, we ask this collection whether a given element has been seen, that is, whether it appears in the collection.

Here we need to know not only whether an element has been traversed, but also its index. So we need a key-value structure: the key stores the element and the value stores its index, which makes map the appropriate choice.

Why not use the array or set used before?

An array has a fixed size; if there are few elements but the hash values are large, memory is wasted.
A set is a collection whose elements can only be keys. For Two Sum it is not enough to judge whether y exists; we must also record the index of y, because the indices of x and y must be returned. So set cannot be used either.

A map is a key-value structure: the key stores the element's value, and the value stores the index where that element is located.

Among the three map types in C++, choose std::unordered_map: this problem does not need the keys to be ordered, so std::unordered_map is more efficient.

When using a map, two points need to be clear:

What is the map used for?
What do the key and value in the map represent?

On the first point, the map stores the elements we have already visited. While traversing the array, we need to record which elements we have seen and their indices, so that we can find the one that pairs with the current element (that is, the one that adds up to target).

On the second point: given an element, we need to determine whether it has appeared before and, if so, return its index.

To check whether an element has appeared, the element must be the key. So the array element is the key, and the value associated with that key stores the index.

So the storage structure in the map is {key: array element, value: index of that element}.

While traversing the array, we only need to query the map for a key that pairs with the current element (target minus the current element). If there is one, we have found the matching pair. If not, we put the current element into the map, because the map stores the elements we have visited.

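For example, with nums = [2, 7, 11, 15] and target = 9: at index 0 the map is empty, so 2 is stored with index 0; at index 1 we look up 9 - 7 = 2, find it stored with index 0, and return [0, 1].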

Difficulties Encountered During Implementation

I am not very familiar with extracting the individual digits of a number.

Code

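#include <unordered_map>
#include <utility>
#include <vector>
using std::pair;
using std::vector;

class Solution {
public:
    vector<int> twoSum(vector<int>& nums, int target) {
        std::unordered_map<int, int> map;  // key: array element, value: its index
        for (int i = 0; i < static_cast<int>(nums.size()); i++) {
            // Look in the map for a key that pairs with the current element.
            auto iter = map.find(target - nums[i]);
            if (iter != map.end()) {
                return {iter->second, i};
            }
            // No match found: add the visited element and its index to the map.
            map.insert(pair<int, int>(nums[i], i));
        }
        return {};
    }
};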

Today's takeaways

1. I now understand the basic theory of hash tables.

2. I know the application scenarios of set and map.

3. I am still not proficient with the template library and need to practice more.

Study time today: 3 hours.

The pictures in this article are all from Carl’s Code Caprice, for which I am very grateful.

Code Caprice Algorithm Training Camp, Day 6 | Hash Table Theoretical Basis, 242. Valid Anagram, 349. Intersection of Two Arrays, 202. Happy Number, 1. Two Sum
