unordered_set: a C++ container implemented with a hash table

1. Basic concepts

unordered_set is an unordered collection, that is, its elements are not stored in any particular order. It is a container built on a hash table. A hash table is essentially an array of buckets: a hash function turns a key into an array index, so the matching entry can be reached directly instead of by scanning. In a general hash table each entry is a key-value pair, meaning that a value can be obtained from its key, somewhat like a mapping in mathematics. In unordered_set the element itself serves as the key, so there is no separate value. A simple example illustrates the idea.

Assumption: There is a Chinese dictionary that contains all the Chinese characters, but the characters are printed in no particular order. To find a certain character, you have to check the entries one by one from beginning to end. If you are unlucky and the character is at the very end, you need to traverse the entire dictionary to find it.

Optimization: Because there is a definite relationship between Chinese characters and pinyin, all Chinese characters are now sorted according to pinyin (the key) to speed up the search (pinyin can be further sorted by the first letter, the second letter, and so on), and each pinyin has a corresponding page number (the index). Starting from that page, the Chinese characters (the values) corresponding to the pinyin are stored. So once you find the pinyin, you can find the corresponding Chinese characters on the corresponding page. There is a fixed mapping between pinyin and page number, which can be calculated in a certain way (the hash function).
From the above example, it can be seen that this container is well suited to lookups: finding an element takes constant time on average, so it is worth considering for search problems. In addition, the elements stored in this container are unique, which can be used to quickly check whether a data sequence contains duplicate values.

2. Usage

  1. Definition and initialization
// constructing unordered_sets
#include <iostream>
#include <string>
#include <unordered_set>

template<class T>
T cmerge (T a, T b) {
  T t(a); t.insert(b.begin(),b.end()); return t;
}

int main ()
{
  std::unordered_set<std::string> first;                                // empty
  std::unordered_set<std::string> second ( {"red","green","blue"} );    // init list
  std::unordered_set<std::string> third ( {"orange","pink","yellow"} ); // init list
  std::unordered_set<std::string> fourth ( second );                    // copy
  std::unordered_set<std::string> fifth ( cmerge(third,fourth) );       // move
  std::unordered_set<std::string> sixth ( fifth.begin(), fifth.end() ); // range

  std::cout << "sixth contains:";
  for (const std::string& x: sixth) std::cout << " " << x;
  std::cout << std::endl;

  return 0;
}

output:

sixth contains: pink yellow red green orange blue
  2. Member methods
    (1) begin(): Returns an iterator pointing to the first element. An iterator is an object that traverses a container; it gives every container type a common interface and decouples traversal from the container's underlying implementation.
    prototype:
<1> container iterator (1)	
      iterator begin() noexcept;
      const_iterator begin() const noexcept;
<2> bucket iterator (2)	
      local_iterator begin ( size_type n );
      const_local_iterator begin ( size_type n ) const;

According to the function prototypes, begin can return an iterator or a const_iterator (and, for a single bucket, a local_iterator or const_local_iterator). In many containers an iterator allows the pointed-to element to be modified while a const_iterator does not. In unordered_set, however, both are constant iterators: the iterator itself can be moved to point to other elements, but the element it points to cannot be modified, because changing a stored value would invalidate its position in the hash table.
(2) end(): Returns an iterator pointing to the position just past the last element. It marks the end of the range and must not be dereferenced.
prototype:

<1> container iterator (1)	
      iterator end() noexcept;
      const_iterator end() const noexcept;
<2> bucket iterator (2)	
      local_iterator end (size_type n);
      const_local_iterator end (size_type n) const;

Examples of use of begin and end :

// unordered_set::begin/end example
#include <iostream>
#include <string>
#include <unordered_set>

int main ()
{
  std::unordered_set<std::string> myset =
  {"Mercury","Venus","Earth","Mars","Jupiter","Saturn","Uranus","Neptune"};

  std::cout << "myset contains:";
  for ( auto it = myset.begin(); it != myset.end(); ++it )
    std::cout << " " << *it;
  std::cout << std::endl;

  std::cout << "myset's buckets contain:\n";
  for ( unsigned i = 0; i < myset.bucket_count(); ++i) {
    std::cout << "bucket #" << i << " contains:";
    for ( auto local_it = myset.begin(i); local_it!= myset.end(i); ++local_it )
      std::cout << " " << *local_it;
    std::cout << std::endl;
  }

  return 0;
}

output :

myset contains: Venus Jupiter Neptune Mercury Earth Uranus Saturn Mars
myset's buckets contain:
bucket #0 contains:
bucket #1 contains: Venus
bucket #2 contains: Jupiter
bucket #3 contains: 
bucket #4 contains: Neptune Mercury
bucket #5 contains: 
bucket #6 contains: Earth
bucket #7 contains: Uranus Saturn
bucket #8 contains: Mars
bucket #9 contains: 
bucket #10 contains: 

(3) bucket(const key_type& k): Returns the number of the bucket that holds the element with value k (or that would hold it, if k is not in the container). In an unordered_set the elements are not sorted in any order; instead they are grouped into buckets according to the hash of each element's value, so an element can be reached quickly from its value (O(1) time on average). This is similar to looking up a Chinese character through its pinyin in the dictionary.
Prototype:

size_type bucket ( const key_type& k ) const;

Example of use

// unordered_set::bucket
#include <iostream>
#include <string>
#include <unordered_set>

int main ()
{
  std::unordered_set<std::string> myset = {"water","sand","ice","foam"};

  for (const std::string& x: myset) {
    std::cout << x << " is in bucket #" << myset.bucket(x) << std::endl;
  }

  return 0;
}

output :

ice is in bucket #0
foam is in bucket #2
sand is in bucket #2
water is in bucket #4

(4) bucket_count(): Returns the number of buckets in the container
Method prototype:

size_type bucket_count() const noexcept;

Example of use

// unordered_set::bucket_count
#include <iostream>
#include <string>
#include <unordered_set>

int main ()
{
  std::unordered_set<std::string> myset =
  {"Mercury","Venus","Earth","Mars","Jupiter","Saturn","Uranus","Neptune"};

  unsigned n = myset.bucket_count();

  std::cout << "myset has " << n << " buckets.\n";

  for (unsigned i=0; i<n; ++i) {
    std::cout << "bucket #" << i << " contains:";
    for (auto it = myset.begin(i); it!=myset.end(i); ++it)
      std::cout << " " << *it;
    std::cout << "\n";
  }

  return 0;
}

output

myset has 11 buckets.
bucket #0 contains: 
bucket #1 contains: Venus
bucket #2 contains: Jupiter
bucket #3 contains: 
bucket #4 contains: Neptune Mercury
bucket #5 contains: 
bucket #6 contains: Earth
bucket #7 contains: Uranus Saturn
bucket #8 contains: Mars
bucket #9 contains: 
bucket #10 contains: 

(5) cbegin() and cend(): These do the same job as begin() and end(), but always return constant iterators: cbegin and cend return a const_iterator (or a const_local_iterator in the bucket overloads), even when called on a non-const container.
Method prototype:

container iterator (1)	
          const_iterator cbegin() const noexcept;
bucket iterator (2)	
          const_local_iterator cbegin ( size_type n ) const;
container iterator (1)	
          const_iterator cend() const noexcept;
bucket iterator (2)	
          const_local_iterator cend ( size_type n ) const;

There are two return types here: the container-wide const_iterator and the const_local_iterator. As the name implies, a local iterator traverses only the elements of a single bucket, the one whose number is passed as n. An example of use is as follows:

// unordered_set::cbegin/cend example
#include <iostream>
#include <string>
#include <unordered_set>

int main ()
{
  std::unordered_set<std::string> myset =
  {"Mercury","Venus","Earth","Mars","Jupiter","Saturn","Uranus","Neptune"};

  std::cout << "myset contains:";
  for ( auto it = myset.cbegin(); it != myset.cend(); ++it ) // it is a const_iterator here
    std::cout << " " << *it;    // cannot modify *it
  std::cout << std::endl;

  std::cout << "myset's buckets contain:\n";
  for ( unsigned i = 0; i < myset.bucket_count(); ++i) {
    std::cout << "bucket #" << i << " contains:";
    for ( auto local_it = myset.cbegin(i); local_it!= myset.cend(i); ++local_it ) // local_it is a const_local_iterator here
      std::cout << " " << *local_it;
    std::cout << std::endl;
  }

  return 0;
}

output

myset contains: Venus Jupiter Neptune Mercury Earth Uranus Saturn Mars
myset's buckets contain:
bucket #0 contains:
bucket #1 contains: Venus
bucket #2 contains: Jupiter
bucket #3 contains: 
bucket #4 contains: Neptune Mercury
bucket #5 contains: 
bucket #6 contains: Earth
bucket #7 contains: Uranus Saturn
bucket #8 contains: Mars
bucket #9 contains: 
bucket #10 contains: 

(6) clear(): Removes all elements from the container. It destroys the stored elements, but it does not call the container's own destructor (~unordered_set), and the container stays usable afterwards. Note that clear() only guarantees that the number of elements becomes 0; the memory the container has requested, such as its bucket array, is not necessarily released. So how can this memory be released? Blogger BOOM Zhao Chaochao summed up three methods for containers to release memory, shown here for a vector named num whose element type is T:
1. Method 1: Construct an unnamed temporary container of the same type and swap it with the original container; the temporary takes over the old buffer and is destroyed automatically at the end of the statement:

vector<T>().swap(num);

2. Method 2: Declare a named temporary object first, and then swap it with the target container:

vector<T> temp;
temp.swap(num);

The temporary object is default-constructed, so its buffer size is 0 and it holds no data. After the swap, num owns that empty buffer, and the old buffer is released when temp goes out of scope;

3. Method 3: Clear the target container first, and then swap it with a temporary copy of itself; the copy allocates only what the now empty contents need, namely:

num.clear(); vector<T>(num).swap(num);

Method prototype:

void clear() noexcept;

noexcept has two uses: as a specifier and as an operator. The specifier, written in a function declaration, states whether the function may throw exceptions. The operator performs a compile-time check and returns true if the given expression is declared not to throw any exception. In the prototype above it is a specifier: clear() never throws.
Example of use

// clearing unordered_set
#include <iostream>
#include <string>
#include <unordered_set>

int main ()
{
  std::unordered_set<std::string> myset =
    { "chair", "table", "lamp", "sofa" };

  std::cout << "myset contains:";
  for (const std::string& x: myset) std::cout << " " << x;
  std::cout << std::endl;

  myset.clear();
  myset.insert("bed");
  myset.insert("wardrobe");
  myset.insert("nightstand");

  std::cout << "myset contains:";
  for (const std::string& x: myset) std::cout << " " << x;
  std::cout << std::endl;

  return 0;
}

output

myset contains: sofa lamp table chair
myset contains: nightstand wardrobe bed

(7) count(const key_type& k): Count the number of elements whose value is k in the container. Since the elements stored in unordered_set are unique, this method will only return 0 or 1.
Method prototype:

size_type count ( const key_type& k ) const;

Example of use

// unordered_set::count
#include <iostream>
#include <string>
#include <unordered_set>

int main ()
{
  std::unordered_set<std::string> myset = { "hat", "umbrella", "suit" };

  for (auto& x: {"hat","sunglasses","suit","t-shirt"}) {
    if (myset.count(x)>0)
      std::cout << "myset has " << x << std::endl;
    else
      std::cout << "myset has no " << x << std::endl;
  }

  return 0;
}

output

myset has hat
myset has no sunglasses
myset has suit
myset has no t-shirt

(8) emplace(Args&&... args): Constructs a new element in place from args. If the container has no equal element yet, the new element is inserted and the method returns a pair holding an iterator to it and true. If an equal element already exists, nothing is inserted and the method returns a pair holding an iterator to the existing element and false.
Method prototype:

template <class... Args>
pair <iterator,bool> emplace ( Args&&... args );

Example of use

// unordered_set::emplace
#include <iostream>
#include <string>
#include <unordered_set>

int main ()
{
  std::unordered_set<std::string> myset;

  myset.emplace ("potatoes");
  myset.emplace ("milk");
  myset.emplace ("flour");

  std::cout << "myset contains:";
  for (const std::string& x: myset) std::cout << " " << x;

  std::cout << std::endl;
  return 0;
}

output

myset contains: potatoes flour milk

Origin blog.csdn.net/yyl80/article/details/123860099