Hash tables and unordered containers

1. Hash table

Ⅰ Concept

        A hash table (also called a hash map) is a data structure that provides fast insertion and lookup. No matter how many elements the table holds, insertion and lookup take O(1) time on average. Because lookups are so fast, hash tables are used in many programs.

Ⅱ Features

        A hash table is built on an array, and growing an array after it has been created is relatively expensive, because every element has to be placed again in the new, larger array. So when a hash table fills up, its performance degrades severely.
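Growing the array is costly because a key's slot depends on the array's size. The sketch below is a toy and does not reflect how std::unordered_map is implemented. It assumes a hypothetical hash, slot = key % capacity, and counts how many keys end up in a different slot when the capacity changes. Every key has to be rehashed to find that out, and the keys that moved have to be copied to their new slots.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy hash: slot = key % capacity (real tables use better hashes).
std::size_t slotFor(unsigned key, std::size_t capacity) {
    return key % capacity;
}

// Growing the array changes `capacity`, so every key's slot must be
// recomputed; this counts how many keys land in a different slot.
std::size_t movedAfterResize(const std::vector<unsigned>& keys,
                             std::size_t oldCap, std::size_t newCap) {
    std::size_t moved = 0;
    for (unsigned k : keys) {
        if (slotFor(k, oldCap) != slotFor(k, newCap)) {
            ++moved;
        }
    }
    return moved;
}
```

For example, keys 3, 11 and 19 all share slot 3 in an 8-slot array. After the array grows to 16 slots, 11 moves to slot 11, while 3 and 19 stay in slot 3.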

        A hash table is somewhat like a dictionary: a hash function converts a "key" into an array subscript. Different keys can produce the same address, however, and this is called a hash collision. Collisions are usually resolved in one of two ways: "open addressing" or "separate chaining" (the linked-list method).
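Here is a minimal illustration of a collision. It uses a deliberately weak, hypothetical hash (the sum of the character codes modulo the array size), which is not what the standard library uses. Under this hash, any two anagrams such as "abc" and "cba" produce the same subscript.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy hash: sum of character codes, reduced to an array index.
// It is deliberately weak so that collisions are easy to see.
std::size_t toyIndex(const std::string& key, std::size_t capacity) {
    std::size_t sum = 0;
    for (unsigned char c : key) {
        sum += c;
    }
    return sum % capacity;
}

// Two different keys collide when they map to the same array subscript.
bool collide(const std::string& a, const std::string& b, std::size_t capacity) {
    return a != b && toyIndex(a, capacity) == toyIndex(b, capacity);
}
```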

        With open addressing, if the computed slot is already occupied, the key goes into a later free position in the array. In linear probing, for example, the following slots are tried one by one.
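Below is a minimal sketch of open addressing, with linear probing as the probing strategy: try the next slot, wrapping around at the end of the array. Several parts are simplifying assumptions made for illustration: the class name LinearProbingSet, the toy hash key % capacity, the fixed non-zero capacity, non-negative int keys, and the absence of deletion.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical minimal open-addressing set using linear probing.
// Assumptions: fixed capacity > 0, non-negative int keys, no deletion.
class LinearProbingSet {
public:
    explicit LinearProbingSet(std::size_t capacity) : slots_(capacity) {}

    // Places key in its home slot or, if that is occupied, in the next free
    // slot further along the array (wrapping around). Returns false if full.
    bool insert(int key) {
        std::size_t i = home(key);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            if (!slots_[i] || *slots_[i] == key) {
                slots_[i] = key;
                return true;
            }
            i = (i + 1) % slots_.size();  // occupied by another key: probe the next slot
        }
        return false;
    }

    // Index of the slot holding key, or -1 if key is absent.
    long slotOf(int key) const {
        std::size_t i = home(key);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            if (!slots_[i]) return -1;  // empty slot ends the probe: key was never inserted
            if (*slots_[i] == key) return static_cast<long>(i);
            i = (i + 1) % slots_.size();
        }
        return -1;
    }

    bool contains(int key) const { return slotOf(key) >= 0; }

private:
    // Toy hash: key mod capacity.
    std::size_t home(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }
    std::vector<std::optional<int>> slots_;
};
```

If 3, 11 and 19 are inserted into an 8-slot table, they occupy slots 3, 4 and 5, because all three hash to slot 3. The sketch leaves out deletion because simply emptying a slot would cut the probe sequence for keys stored after it. Real implementations typically mark such slots as deleted instead.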

        With separate chaining, each array slot holds a linked list, and keys that map to the same address are appended to that list.
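The same idea with separate chaining, again as a hypothetical minimal sketch. The class name ChainedSet, its toy hash key % bucket count, non-negative int keys, and std::list as the chain type are assumptions chosen for clarity.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical minimal separate-chaining set: each array slot holds a
// linked list, and keys that map to the same slot are appended to it.
// Assumptions: bucket count > 0, non-negative int keys.
class ChainedSet {
public:
    explicit ChainedSet(std::size_t buckets) : table_(buckets) {}

    void insert(int key) {
        std::list<int>& chain = table_[index(key)];
        for (int k : chain) {
            if (k == key) return;  // already present
        }
        chain.push_back(key);      // collision (or empty slot): append to the chain
    }

    bool contains(int key) const {
        for (int k : table_[index(key)]) {  // only one chain is scanned
            if (k == key) return true;
        }
        return false;
    }

    // Number of keys stored in slot i (the length of its chain).
    std::size_t chainLength(std::size_t i) const { return table_[i].size(); }

private:
    // Toy hash: key mod bucket count.
    std::size_t index(int key) const {
        return static_cast<std::size_t>(key) % table_.size();
    }
    std::vector<std::list<int>> table_;
};
```

Here 3, 11 and 19 all end up in the chain for slot 3, so looking up any of them scans only that one list.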

[Figure: open addressing and separate chaining]

2. Unordered containers

        When you need a hash table in practice, you can use C++'s unordered containers directly. The C++11 standard defines four unordered associative containers: unordered_map, unordered_set, unordered_multimap and unordered_multiset.
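A small sketch contrasting the four containers (requires C++11 or later). If the same key is inserted twice, unordered_map and unordered_set keep one element, while the multi variants keep two. The struct Counts and the function insertTwice are illustrative names, not library APIs.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// The four unordered associative containers:
//   unordered_map      - key -> value, keys unique
//   unordered_set      - keys only, unique
//   unordered_multimap - key -> value, duplicate keys allowed
//   unordered_multiset - keys only, duplicates allowed
struct Counts {
    std::size_t map, set, multimap, multiset;
};

// Inserts the key "apple" twice into each container and reports the sizes.
Counts insertTwice() {
    std::unordered_map<std::string, int> m;
    std::unordered_set<std::string> s;
    std::unordered_multimap<std::string, int> mm;
    std::unordered_multiset<std::string> ms;
    for (int i = 0; i < 2; ++i) {
        m.insert({"apple", i});   // second insert is ignored: the key already exists
        s.insert("apple");
        mm.insert({"apple", i});  // duplicates are kept
        ms.insert("apple");
    }
    return {m.size(), s.size(), mm.size(), ms.size()};
}
```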

        In storage, unordered containers are organized as a collection of buckets, each holding zero or more elements, and a hash function maps elements to buckets. To access an element, the container first computes the element's hash value, which tells it which bucket to search. The container keeps all elements with the same hash value in the same bucket. If the container allows duplicate keys, all elements with the same key also sit in the same bucket. The performance of an unordered container therefore depends on the quality of its hash function and on the number and size of its buckets. Computing a hash and searching a bucket are usually fast. The exception is a bucket that holds many elements, because finding a specific element there takes many comparisons.
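To make the point about bucket sizes concrete, the sketch below plugs a deliberately bad, hypothetical hasher into std::unordered_set. ConstantHash returns 0 for every key, so every element gets the same hash value and all of them fall into one bucket. A lookup then has to compare against each element in turn. largestBucket is an illustrative helper built on bucket_count() and bucket_size().

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// A deliberately bad hash (hypothetical, for illustration): every key hashes
// to 0, so all elements share one bucket and lookups degrade to linear scans.
struct ConstantHash {
    std::size_t operator()(const std::string&) const { return 0; }
};

// Returns the number of elements in the fullest bucket of the set.
template <class Set>
std::size_t largestBucket(const Set& s) {
    std::size_t best = 0;
    for (std::size_t n = 0; n < s.bucket_count(); ++n) {
        if (s.bucket_size(n) > best) best = s.bucket_size(n);
    }
    return best;
}
```

For example, std::unordered_set<std::string, ConstantHash> bad{"a", "b", "c", "d"} puts all four strings in a single bucket. With the default std::hash<std::string>, they are normally spread across several buckets, although the exact layout depends on the implementation.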

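Bucket management functions:

// Bucket interface
c.bucket_count();           // number of buckets in use
c.max_bucket_count();       // largest number of buckets the container can hold
c.bucket_size(n);           // number of elements in bucket n
c.bucket(k);                // bucket in which elements with key k are (or would be) stored
// Bucket iteration
local_iterator              // iterator type for accessing the elements of a single bucket
const_local_iterator        // const version of the bucket iterator
c.begin(n), c.end(n)        // iterator to the first element and one past the last element of bucket n
c.cbegin(n), c.cend(n)      // const versions of begin(n) and end(n)
// Hash policy
h = c.hash_function();      // h is the hash function object used by c
eq = c.key_eq();            // eq is c's key-equality predicate
c.load_factor();            // load factor: average number of elements per bucket, size()/bucket_count(), as float
c.max_load_factor();        // maximum load factor: the average bucket size c tries to stay at or below, as float
// The calls below may rehash every element, which can be very expensive (worst case O(N^2))
c.max_load_factor(d);       // sets the maximum load factor to the float d; if the load factor becomes too large, c grows its bucket array
c.rehash(n);                // reorganizes storage so that bucket_count() >= n and bucket_count() > size()/max_load_factor()
c.reserve(n);               // reorganizes storage so c can hold n elements without rehashing; same as c.rehash(ceil(n/c.max_load_factor()))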
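A short sketch that exercises some of these calls. countInBucket is a hypothetical helper that looks up a key by scanning only bucket c.bucket(k) with the local iterators. It finds every element with that key because equal keys always share a bucket. bucketsAfterReserve, also illustrative, shows that with maximum load factor z, reserve(n) yields at least n / z buckets.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Counts the elements with key k by scanning only bucket c.bucket(k)
// through the bucket (local) iterators c.begin(n) and c.end(n).
std::size_t countInBucket(const std::unordered_multimap<std::string, int>& c,
                          const std::string& k) {
    std::size_t n = c.bucket(k);  // bucket where key k is (or would be) stored
    std::size_t found = 0;
    for (auto it = c.begin(n); it != c.end(n); ++it) {  // const_local_iterator
        if (it->first == k) ++found;
    }
    return found;
}

// Sets the maximum load factor to z, then reserves room for n elements.
// reserve(n) behaves like rehash(ceil(n / z)), so the result is >= n / z.
std::size_t bucketsAfterReserve(std::size_t n, float z) {
    std::unordered_map<int, int> m;
    m.max_load_factor(z);
    m.reserve(n);
    return m.bucket_count();
}
```

The exact bucket counts differ between standard library implementations. The standard only guarantees the lower bounds stated for rehash and reserve.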

        Because they are associative containers, unordered containers also support the operations common to all containers. They do not support the position-specific operations of sequential containers, such as push_front and push_back. Elements in an associative container are stored and located by key, so those operations have no meaning for them.
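A sketch of these key-based operations on std::unordered_map: it counts words, then removes one. The function name wordCounts and its parameters are illustrative. insert, emplace, find, erase and operator[] are standard member functions.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Counts word occurrences, then removes the word `drop` if present.
// Elements are located by key, so there is insert/find/erase but no push_back.
std::unordered_map<std::string, std::size_t>
wordCounts(const std::vector<std::string>& words, const std::string& drop) {
    std::unordered_map<std::string, std::size_t> counts;
    for (const auto& w : words) {
        ++counts[w];               // operator[] value-initializes a missing key to 0
    }
    counts.insert({"extra", 1});   // insert does nothing if the key already exists
    counts.emplace("extra", 99);   // likewise ignored: "extra" is already present
    if (counts.find(drop) != counts.end()) {  // find returns end() for a missing key
        counts.erase(drop);        // erase by key
    }
    return counts;
}
```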

[Figure: some common operations of associative containers]

 

3. An example using a hash table

        Given a string s, find the length of the longest substring that does not contain repeated characters.

Example:

Input: s = "abcabcbb"
Output: 3
Explanation: the longest substring without repeating characters is "abc", so its length is 3.

C++ implementation

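#include <algorithm>
#include <string>
#include <unordered_map>
using namespace std;

class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        int length = 0;   // length of the current substring without repeats
        int start = 0;    // index where the current substring begins
        int result = 0;   // longest length seen so far
        unordered_map<char, int> hashtable;  // character -> index of its last occurrence
        for (int i = 0; i < int(s.size()); i++) {
            auto j = hashtable.find(s[i]);
            // s[i] already occurs inside the current substring: move start past it
            if (j != hashtable.end() && j->second >= start) {
                start = j->second + 1;
                length = i - start;   // the increment below counts s[i] itself
            }
            length++;
            hashtable[s[i]] = i;      // record the latest position of s[i]
            result = max(length, result);
        }
        return result;
    }
};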

A brief explanation of the approach:

        First, length records the length of the current substring without repeated characters, and start records where that substring begins. Each time a repeat occurs, start is updated and length is reset to i - start before the increment that counts s[i]. result keeps the largest length seen so far.

        The key idea is to walk through the string and store each character in the hash table together with its position. Each time a character is stored, length is incremented. When a repeated character appears, find locates its earlier occurrence. If the iterator returned by find is not end() and the stored position is at or after start, the earlier occurrence lies inside the current substring, so start and length are updated. After the loop, result holds the length of the longest substring without repeated characters.
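The same reasoning can be condensed by computing the window length directly as i - start + 1, instead of resetting length and then incrementing it. Below is a hedged sketch of that equivalent formulation as a free function. The name longestUniqueSubstring is hypothetical, and this is a rewrite for comparison, not the author's code.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>

// Sliding window [start, i]: last[c] holds the most recent index of
// character c. When c reappears inside the window, start jumps just
// past its previous occurrence.
int longestUniqueSubstring(const std::string& s) {
    std::unordered_map<char, int> last;
    int start = 0;
    int result = 0;
    for (int i = 0; i < static_cast<int>(s.size()); ++i) {
        auto it = last.find(s[i]);
        if (it != last.end() && it->second >= start) {
            start = it->second + 1;  // drop everything up to the earlier occurrence
        }
        last[s[i]] = i;
        result = std::max(result, i - start + 1);  // current window length
    }
    return result;
}
```

The check it->second >= start matters for inputs like "abba". When the final 'a' is reached, its earlier occurrence at index 0 already lies outside the window, so start must not move backwards. Without the check, the result would be 3 instead of the correct 2.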

        


Origin blog.csdn.net/qq_43575504/article/details/130296923